Executive summary
Generative AI is now doing real knowledge work — drafting analyses, summarizing sources, answering questions over an organization's own material. But for professionals in regulated or confidentiality-bound fields, two problems block adoption: the output can be confidently wrong (ungrounded claims, fabricated citations, silently missing considerations), and getting useful output usually means sending sensitive source material to someone else's model.
Echo Angel Research is a local-first application that addresses both. It does not generate text with a model of its own. The professional's own assistant does the reasoning, and the application supplies a disciplined set of model-free tools centered on a deterministic loop: search the sources, verify a quotation, record a citation, check coverage, and finalize. The application refuses to emit a finished document unless every claim is grounded in a captured source and every standard element of the work has been addressed. Because it never calls a model of its own, adopting it introduces no new party that sees your content.
This paper describes what the system does, why it can be trusted, how it has been tested and within what bounds, and where its limits lie. It deliberately does not describe how to rebuild it. A separate, NDA-gated technical annex is available to serious evaluators who need architectural and evaluation depth.
1. The problem: useful AI that you cannot yet trust — or safely feed
Three gaps keep capable AI out of high-stakes professional work:
- Grounding. Language models produce fluent text that is not always faithful to any source. In knowledge work, an unsupported sentence is not a stylistic flaw — it is a liability. Overreliance on ungrounded model output is recognized as a top risk in the OWASP Top 10 for LLM Applications and a core concern of the NIST AI Risk Management Framework.
- Omission. Even a fully "grounded" answer can be wrong by what it leaves out — the consideration not raised, the standard element not addressed. Most tools defend against fabrication (commission) and say nothing about omission.
- Confidentiality. The dominant way to apply AI to your material is to send that material to a hosted model. For privileged, regulated, or commercially sensitive work, that transmission is itself the risk — independent of whether the answer is any good.
Emerging regulation (the EU AI Act, sector privacy regimes) and accepted risk frameworks all point the same way: AI used for consequential work must be traceable, grounded, and auditable, and sensitive data must stay under the owner's control. The market has tools that address pieces of this. None that we assessed resolves all three together the way a confidentiality-bound professional needs.
2. Why existing approaches fall short
The current generation of grounded-AI platforms is genuinely good, and we do not claim otherwise. The mainstream pattern is retrieval-augmented generation with verification: a hosted large language model generates an answer, and the platform then grounds, scores, cites, or corrects it. Public descriptions make the pattern explicit:
Salesforce describes a "citation architecture that allows users to verify AI-generated responses against original sources to reduce hallucination risk." ¹
Google Cloud's grounding check is designed to evaluate "a human-generated blurb or a machine-generated response," returns "a support score of 0 to 1," and is applied "at inference time." ²
Vectara's reliability layer is, in its own engineers' description, a system comprising "a generative model, a hallucination detection model and a hallucination correction model," where the system "actually corrects the issue" before delivering the answer. ³
These are capable systems. But two properties of the pattern matter for confidentiality-bound work:
- There is a model in the platform's pipeline. Grounding, scoring, and correction are applied to text the platform's own model generated. That means the sensitive prompt and source material pass through that model. For privileged content, the verification quality is beside the point — the transmission is the exposure.
- The checks run at generation time. Grounding scores and corrections are produced as the answer is generated. They are quality signals on a draft, not a gate on the final, exact artifact a person actually exports and relies on.
Echo Angel Research is built around the inverse of both properties. The difference is narrow, deliberate, and architectural: the claim is not that nobody else does grounding, but that the model is yours, not ours, and that enforcement happens at export.
3. The Echo Angel approach
Four design choices define the system. Each is described here at the level of what it achieves; the how is reserved for the technical annex.
3.1 Client-driven — the application makes no model call.
The reasoning is done by the professional's own assistant (the client they already use and trust). Echo Angel Research supplies model-free, local tools; at their core is a deterministic verify-and-finalize loop — find a passage in your sources, return its exact captured text for verification, record a citation only if its source is actually present, check whether a draft's claims are all grounded, check whether the standard elements of the work are all addressed, and finalize. None of the application's tools — in this loop or beyond it — sends your content to a model, and the application does not run a model of its own. The practical consequence: adopting the tool introduces no new party that ever sees your material. This is the load-bearing property for privileged or confidential work.
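To make the verification step concrete without describing the product's internals, here is a minimal, hypothetical Python sketch of checking a quotation against text already captured on the machine. It is a generic illustration, not Echo Angel's implementation; the names `CapturedPassage` and `verify_quote`, and the choice to treat runs of whitespace as equivalent, are assumptions. What it shows is that verification can be a plain string comparison, with no model involved.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CapturedPassage:
    """Hypothetical record of text captured from a local source document."""
    source_id: str
    text: str


def verify_quote(quote: str, passage: CapturedPassage) -> Optional[str]:
    """Return the exact captured span matching `quote`, or None if absent.

    Assumption for illustration: runs of whitespace are treated as equal
    (so line wrapping in a PDF does not cause a false mismatch); every
    other character, including case and punctuation, must match exactly.
    """
    words = quote.split()
    if not words:
        return None  # an empty quotation verifies nothing
    pattern = r"\s+".join(re.escape(w) for w in words)
    match = re.search(pattern, passage.text)
    return match.group(0) if match else None


passage = CapturedPassage("contract-07", "The supplier shall give\n30 days' written notice.")
print(verify_quote("shall give 30 days' written notice", passage))
print(verify_quote("shall give 60 days' notice", passage))
```

The first call returns the span exactly as captured, line break included, which is what the assistant then cites; the second returns None because the quoted wording does not appear in the source.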
3.2 Grounded by construction.
Every claim in a finished document is tied to a specific, captured source passage. A citation cannot be recorded against a source that was not captured — the check is preventive, not an after-the-fact audit. Claims that cannot be grounded are flagged, not asserted. Sources can be ordinary local documents — including PDFs and scanned or image material — extracted entirely on your machine; where extraction fidelity is in question (as with scanned text), that caveat is carried visibly through to the finished document.
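The difference between a preventive check and an after-the-fact audit can be sketched in a few lines. The Python below is a hypothetical illustration of the general pattern, not the product's code; `CitationLedger`, `UncapturedSourceError`, and the claim and source identifiers are invented for the example. The ledger refuses to write a citation whose source was never captured, and it lists uncited claims so they can be flagged rather than asserted.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


class UncapturedSourceError(ValueError):
    """Raised when a citation names a source that was never captured."""


@dataclass
class CitationLedger:
    """Hypothetical ledger in which a citation may only point at a captured source."""

    captured: Dict[str, str] = field(default_factory=dict)   # source_id -> captured text
    citations: Dict[str, str] = field(default_factory=dict)  # claim_id -> source_id

    def capture(self, source_id: str, text: str) -> None:
        self.captured[source_id] = text

    def record(self, claim_id: str, source_id: str) -> None:
        # Preventive check: refuse before anything is written, rather than
        # auditing the ledger after the fact.
        if source_id not in self.captured:
            raise UncapturedSourceError(
                f"claim {claim_id!r}: source {source_id!r} was never captured"
            )
        self.citations[claim_id] = source_id

    def ungrounded(self, claim_ids: Iterable[str]) -> List[str]:
        """Claims with no recorded citation: these are flagged, not asserted."""
        return [c for c in claim_ids if c not in self.citations]


ledger = CitationLedger()
ledger.capture("memo-2024-03", "Quarterly revenue rose 4%.")
ledger.record("c1", "memo-2024-03")
try:
    ledger.record("c2", "report-never-captured")
except UncapturedSourceError as err:
    print("refused:", err)
print(ledger.ungrounded(["c1", "c2"]))  # ['c2']
```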
3.3 Governed at the point of export.
The trust properties are not advisory. When a professional finalizes a draft into a shareable document, the application re-checks that exact submission — every claim grounded, every standard element of its work type addressed — and, at the refuse-or-emit export gate, renders the document only if those checks pass; otherwise it refuses and says what is missing. Enforcement happens on the precise artifact being emitted, so a check that passed on some earlier draft cannot let a flawed final version through.
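A refuse-or-emit gate of this kind can be sketched as a single function that re-runs both checks on whatever is submitted. This is a hypothetical Python illustration of the pattern, not Echo Angel's gate; `Draft`, `finalize`, the element names, and the idea of stamping the output with a SHA-256 digest of the exact text are all assumptions made for the example. The digest shows one way a passed check can be bound to one specific artifact: any later edit changes it.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Draft:
    """Hypothetical finalize submission: the exact text plus its structure."""
    text: str
    claims: Tuple[str, ...]   # claim ids asserted in the text
    elements: Dict[str, str]  # standard element -> "addressed" or a disposition note


def finalize(draft: Draft, cited: Set[str], required: List[str]) -> Tuple[bool, str]:
    """Re-check the exact submission; return (True, document) or (False, reasons)."""
    problems = []
    uncited = [c for c in draft.claims if c not in cited]
    if uncited:
        problems.append("ungrounded claims: " + ", ".join(uncited))
    # Coverage: each required element must be addressed or visibly dispositioned.
    missing = [e for e in required if not draft.elements.get(e)]
    if missing:
        problems.append("unaddressed elements: " + ", ".join(missing))
    if problems:
        return False, "; ".join(problems)
    # Bind the passed checks to this exact text (illustrative assumption).
    digest = hashlib.sha256(draft.text.encode("utf-8")).hexdigest()
    return True, f"{draft.text}\n\n[checked artifact sha256:{digest[:16]}]"


draft = Draft("Revenue rose 4% [c1].", ("c1",), {"scope": "addressed"})
print(finalize(draft, cited={"c1"}, required=["scope", "risks"]))
# (False, 'unaddressed elements: risks')
```

Because the function receives the final text itself rather than a flag saying an earlier draft passed, there is no path by which a stale result can approve a changed document.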
3.4 Local-first, with a clearly-marked exception.
By default the application runs in Local mode, which is fully offline and fail-closed: your sources, drafts, citations, and logs stay on your machine, and any source step that would require the network is refused. An optional, loudly flagged web mode can fetch sources you explicitly choose (public pages, or sources you are authorized to access, which you declare and reach through your own sign-in) behind a clear consent notice, and on every use the application tells you that anything sent in this mode leaves your machine. Before fetched pages can enter your source library, a quality check rejects and reports any it identifies as block pages, error pages, or empty shells. The boundary is always visible; the safe default is always local.
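"Fail-closed" means the network path has to be switched on deliberately, and anything short of that is refused. The hypothetical Python sketch below illustrates the pattern only; `Mode`, `acquire`, the consent flag, the block-page markers, and the 200-character empty-page threshold are all invented for the example and are not the product's behavior. The fetcher is passed in as a function so the guard logic can be exercised without any network access.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    LOCAL = "local"
    WEB = "web"


class NetworkRefused(RuntimeError):
    """Raised when a source step would need the network without permission."""


class RejectedPage(ValueError):
    """Raised when fetched content looks like a block page, error, or empty shell."""


# Hypothetical markers and threshold, for illustration only.
_BLOCK_MARKERS = ("access denied", "verify you are human", "404 not found")
_MIN_CHARS = 200


def acquire(url: str, mode: Mode, consented: bool, fetch: Callable[[str], str]) -> str:
    """Fail-closed acquisition: any mode other than WEB refuses the network."""
    if mode is not Mode.WEB:
        raise NetworkRefused(f"Local mode: refusing network fetch of {url}")
    if not consented:
        raise NetworkRefused("Web mode requires explicit consent for each fetch")
    print(f"NOTICE: web mode. This request to {url} leaves your machine.")
    body = fetch(url)
    lowered = body.lower()
    # Quality check before the page can enter the source library.
    if len(body.strip()) < _MIN_CHARS or any(m in lowered for m in _BLOCK_MARKERS):
        raise RejectedPage(f"{url}: looks like a block, error, or empty page")
    return body


try:
    acquire("https://example.org/report", Mode.LOCAL, consented=True, fetch=lambda u: "")
except NetworkRefused as err:
    print(err)  # Local mode: refusing network fetch of https://example.org/report
```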
4. Trust and governance properties
| Property | What it means for you | How it is enforced |
|---|---|---|
| No exfiltration to a model | Adopting the tool adds no new party that sees your content | The application makes no model call of its own; reasoning is done by your own client |
| Grounded output | Every asserted claim carries a verifiable source | A citation is refused unless its source passage was actually captured |
| Omission defense | Silently skipping a standard element fails loudly | A coverage check requires every standard element of the work type to be addressed or visibly dispositioned |
| Export-time gate | Nothing uncited or structurally incomplete ships | Grounding and coverage are re-checked on the exact finalized artifact, or it is refused |
| Visible boundaries | You always know when anything leaves your machine | Local is the default; the web mode is a separate, loudly flagged opt-in for sources you explicitly choose: public pages, or sources you are authorized to access, which you declare and reach through your own sign-in |
| Content-free work record | For the most sensitive work, even the application's own record holds none of your content | The session record can be set to content-free: structure and governance outcomes are kept, and content is never stored |
| Disclaimers by construction | Finished work carries the right not-advice notice | The application refuses to render a final artifact for which no disclaimer resolves |
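One way a content-free work record can be made hard to violate by accident is to accept only closed vocabularies and integer counts, so there is simply no field that could hold a claim or a quotation. The Python below is a hypothetical sketch of that idea; `content_free_event` and the allowed step, outcome, and count names are assumptions for illustration, not the application's record format.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict

# Hypothetical closed vocabularies: free text cannot enter the record.
ALLOWED_STEPS = {"search", "verify", "cite", "coverage", "finalize"}
ALLOWED_OUTCOMES = {"passed", "refused", "flagged"}
ALLOWED_COUNTS = {"claims", "ungrounded", "citations", "missing_elements", "attempts"}


def content_free_event(step: str, outcome: str, **counts: int) -> Dict[str, Any]:
    """Build one session-record entry holding structure and outcome only."""
    if step not in ALLOWED_STEPS or outcome not in ALLOWED_OUTCOMES:
        raise ValueError("step and outcome must come from the fixed vocabularies")
    unknown = set(counts) - ALLOWED_COUNTS
    if unknown:
        raise ValueError(f"unknown count names: {sorted(unknown)}")
    for name, value in counts.items():
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"{name} must be an integer count")
    return {
        "at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "step": step,
        "outcome": outcome,
        "counts": dict(counts),
    }


event = content_free_event("finalize", "refused", claims=12, ungrounded=1)
print(json.dumps(event))
```

The record still shows that a finalize attempt was refused and why, in aggregate, which is what governance review needs, while holding nothing a reader could reconstruct the work from.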
This posture maps directly onto recognized frameworks: the grounding and overreliance concerns of the OWASP Top 10 for LLM Applications, the traceability and governance expectations of the NIST AI RMF, and the transparency and record-keeping direction of the EU AI Act.
5. Validation — bounded and reproducible
Trust claims should be tested, not asserted. Two checks were run; both are deliberately bounded in what they prove.
5.1 Capability — enforcement integrity.
Because the application makes no model call, the capability under test is not "is the model accurate" — it is "can a fabricated or ungrounded claim reach a finished document?" An adversarial evaluation drove a mix of grounded and fabricated claims through the real verify-and-finalize loop. Result: the fabricated citations were refused, a draft with a silent omission was refused, and a clean draft was emitted — no fabrication reached a finalized artifact. The evaluation is reproducible and bounded to the tested paths; it demonstrates the gate, not a guarantee about every conceivable input.
5.2 Security — local, open-source scanning.
The application's source code was scanned entirely on the development machine (no code was sent to any third party) using open-source tooling aligned to OWASP / CWE:
- Static analysis (Bandit) of the first-party code (91 files, 8,026 lines): no high-severity findings (3 medium, 18 low), and zero findings in the grounding, citation, and finalize core. All 21 medium- and low-severity findings sit outside the enforcement core, in the web-fetching and supporting tooling and in an optional server entry point whose configuration is documented; all were reviewed.
- Dependency audit (pip-audit) across 77 dependencies: zero known vulnerabilities remaining. An earlier audit found vulnerabilities in third-party libraries; these were remediated, and the application's reproducibility guarantees were re-verified after the update. The single finding in the current audit concerned the environment's own installer tool, not an application dependency; it was upgraded and re-audited to zero.
- The application's automated test suite (over 300 tests) runs green in continuous integration.
These results were validated at product commit 221e1a0 on 2026-06-10 and are re-verified at each release. This is bounded OSS scanning, not a third-party security audit, and it is described as such.
6. Limitations and honest scope
In the spirit of an AI system card, the boundaries of the claims above:
- The guarantees attach to the finalized document. Text a user copies out of an exploratory chat is not a finalized artifact: it carries no disclaimer and has not been re-checked. The discipline is "draft freely, but produce anything you rely on through finalize."
- The optional web mode does reach the network. "No exfiltration" is precise: the application makes no model call, and its default mode is fully local. The opt-in web mode deliberately sends your chosen queries and fetches to the sites you target — public pages, or sources you are authorized to access, which you declare and reach through your own sign-in — and it says so on every use.
- Enforcement is structural, not semantic. The export gate guarantees that every claim is cited and every standard element is addressed. It does not, and cannot, certify that a cited source actually says what the prose claims — that remains a human responsibility. The tool removes whole classes of error; it does not remove judgment.
- It grounds; it does not opine. Output is research synthesis, not professional advice. Finished work carries a not-advice disclaimer directing the reader to a qualified professional.
- v1 scope. v1 ships as an installable, self-contained Windows application: no development environment is required, and all modes, including the fully local default, run in the installed form. A hosted governance connector (for licensing and a shared prompt library, never your content) and broader platform support are on the roadmap. Optional capabilities (scanned-document OCR, web acquisition) rely on system tools you provision. Final acceptance on a clean machine is in progress.
7. Conclusion
For professionals who cannot send their material to someone else's model and cannot afford a confidently wrong document, the useful question is not "which AI writes best" but "which AI can I trust, and prove I can trust, without giving anything away." Echo Angel Research answers it with a narrow, tested posture: your own model does the thinking; the application grounds, governs, and gates the result; and, in the default local mode, nothing you own leaves your machine.
Process-level governance — the session record around the work itself — is covered in the companion orchestrator overview and the platform case study.
Evidence register
External factual claims in §2 are quoted from public sources, captured and verified verbatim:
- ¹ Salesforce Engineering, Grounding Enterprise AI with Live Web Retrieval and Verifiable Citations. engineering.salesforce.com/grounding-enterprise-ai-with-live-web-retrieval-and-verifiable-citations/
- ² Google Cloud, Check grounding (Generative AI App Builder documentation). cloud.google.com/generative-ai-app-builder/docs/check-grounding
- ³ VentureBeat, Guardian agents: …automatically correcting hallucinations… (quoting Vectara). venturebeat.com/ai/beyond-detection-why-automatically-correcting-hallucinations-could-transform-enterprise-ai-adoption
Framework references: OWASP Top 10 for LLM Applications; NIST AI Risk Management Framework (AI RMF 1.0); EU Artificial Intelligence Act.
Evaluate it
Serious technical evaluators can request the NDA-gated technical annex — architectural and evaluation depth, under protection — and a guided assessment.