2026-05-05 · 8 min read · Architecture · RAG · Knowledge Base

Not a Chatbot: The Architecture Behind a Domain AI System

A chatbot connects you to a general-purpose model. A domain AI system connects you to your own accumulated knowledge — structured, embedded, and retrieved against a model constrained to reason in your field. The difference is architectural.

The distinction that matters

When most people think about “AI for their office,” they picture a chatbot: a conversation window where you type a question and receive an answer. The model behind the window is general-purpose — trained on a broad corpus, able to respond to almost anything, specialised in nothing.

This architecture is adequate for general tasks. It is inadequate for professional work — not because the model is weak, but because the model does not know your domain. It has never seen your firm's casework. It does not know your lab's prior protocols. It cannot retrieve the evaluation report your team wrote eighteen months ago and use it as a template for the one you are writing today.

A domain AI system is built differently. It sits on top of your accumulated knowledge — not the internet's knowledge, yours. The architecture that makes this possible consists of three components: a knowledge base, a retrieval pipeline, and a constrained generation layer.

The knowledge base: your corpus, structured

The knowledge base is a structured repository of your domain's accumulated material. For a law firm, this includes past case memos, review reports, brief templates, and relevant statute and precedent excerpts. For a research lab, it includes prior papers, protocol documentation, grant reports, and literature synthesis notes. For a social science researcher, it includes coded interview transcripts, past evaluation reports, and literature review matrices.

This material is processed at ingestion. Documents are parsed, cleaned, and (for legal and medical domains) de-identified. De-identification happens at the intake gate — before any document enters the knowledge base, all client-identifying information is stripped. This is not an optional step; it is a structural constraint of the system.
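As a rough illustration of the intake gate, here is a minimal sketch in Python. The pattern set and the function names (`deidentify`, `intake`) are assumptions made for this example, not the system's actual rules; a real gate would need a much broader, domain-reviewed set of identifiers (names, addresses, case numbers).

```python
import re

# Hypothetical patterns for illustration only. Two regexes are nowhere near
# sufficient for real legal or medical de-identification.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def deidentify(text: str) -> str:
    """Replace every match of a known pattern with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def intake(raw_documents: list[str]) -> list[str]:
    """The gate: only de-identified text is returned for ingestion."""
    return [deidentify(doc) for doc in raw_documents]
```

The point of the design is that nothing downstream ever sees the raw text: the only way into the knowledge base is through `intake`.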

After de-identification, documents are split into segments and converted into numerical representations called embeddings. An embedding is a vector — a list of numbers — that encodes the semantic content of a text segment. Two segments about similar topics will have similar embeddings; two segments about unrelated topics will be far apart in the embedding space. This is what makes retrieval possible.
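To make "similar embeddings" concrete, the sketch below splits text into segments and compares vectors with cosine similarity. The `toy_embed` function is a deliberate stand-in: it counts words from a small hypothetical vocabulary, whereas a production system would call a trained embedding model. The segment size is also an assumption.

```python
import math
from collections import Counter


def split_into_segments(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive segments of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(segment: str, vocabulary: list[str]) -> list[float]:
    """Toy embedding: one dimension per vocabulary term, valued by word count."""
    counts = Counter(segment.lower().split())
    return [float(counts[term]) for term in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With a vocabulary of `["contract", "breach", "protein", "assay"]`, the segments "contract breach claim" and "breach of contract" map to the same vector, while "protein assay protocol" is orthogonal to both. A real embedding model captures meaning far beyond shared words, but the geometry is the same.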

RAG: retrieval-augmented generation

When a user asks the system a question — or when an agent initiates a task — the system does not simply send the question to a large language model (LLM) and return the result. It first retrieves.

Retrieval-augmented generation (RAG) works in two steps. First, the query is converted into an embedding using the same model used to embed the knowledge base. This embedding is used to find the knowledge-base segments most semantically similar to the query — think of it as a meaning-based search rather than a keyword search. Second, those retrieved segments are provided to the LLM as context, alongside the query. The LLM generates a response grounded in the retrieved material.
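The two steps can be sketched as follows. Everything named here (`toy_embed`, `retrieve`, `build_prompt`, the vocabulary, the prompt wording) is hypothetical and chosen for illustration; the call to the LLM itself is left out, since it depends on the provider.

```python
import math
from typing import Sequence

Vector = Sequence[float]

# Hypothetical vocabulary for the toy embedding below.
VOCABULARY = ("memo", "precedent", "protocol")


def toy_embed(text: str) -> list[float]:
    """Stand-in for an embedding model: counts vocabulary terms in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCABULARY]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Vector]], top_k: int = 2) -> list[str]:
    """Step 1: embed the query with the same function used to build the
    index, then return the top_k most similar segments."""
    query_vec = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [segment for segment, _ in ranked[:top_k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 2: place the retrieved segments alongside the query."""
    numbered = "\n".join(f"[{i}] {seg}" for i, seg in enumerate(context, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {query}"
    )
```

The index is built once at ingestion, as a list of `(segment, vector)` pairs, so only the query needs embedding at request time. The string returned by `build_prompt` is what would be sent to the LLM.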

The result is a system that generates against your knowledge base rather than from general training data alone, which reduces hallucination, though it does not eliminate it. When JurisCorpus drafts a case memo, it retrieves the most relevant prior memos and precedent segments from your firm's corpus, then uses those as the generation context. The output is grounded in your practice, not in a generic training corpus.

Domain constraints: the axiom layer

RAG alone is not enough. Retrieval gives the model access to relevant material. It does not tell the model how to reason about it. Without domain constraints, a model given a retrieved case memo and asked to generate a new one might produce something fluent but structurally incorrect, following the surface pattern of legal writing without the underlying logic.

This is where the domain axiom document (described in the previous article) becomes the third architectural layer. The axioms are not just documentation — they are encoded into the system's generation prompts, agent behaviours, and output validation rules. When an agent produces a first-draft memo, it is operating against the axiom constraints: the issue framing rules, the precedent citation conventions, the hedging requirements for contested statutory interpretation. These constraints are what make the output professionally acceptable, not merely grammatically coherent.
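One way to picture output validation is as a list of named checks run over each draft. The sketch below is an assumption about how such rules might be expressed, not the system's actual rule set: the three checks are crude string tests, and the rule names are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Axiom:
    name: str
    check: Callable[[str], bool]  # returns True when the draft satisfies the rule


# Hypothetical rules. Real axioms would be far richer than substring tests.
AXIOMS = [
    Axiom("has_issue_statement", lambda draft: "issue:" in draft.lower()),
    Axiom("cites_precedent", lambda draft: " v. " in draft),
    Axiom("no_unhedged_certainty", lambda draft: "clearly establishes" not in draft.lower()),
]


def validate(draft: str, axioms: list[Axiom] = AXIOMS) -> list[str]:
    """Return the names of the axioms the draft violates (empty if none)."""
    return [axiom.name for axiom in axioms if not axiom.check(draft)]
```

A draft such as "Issue: whether notice was timely. See Smith v. Jones, which arguably supports the claim." (a fictional citation) passes all three checks, while "The statute clearly establishes liability." fails each of them. In practice a failed check would send the draft back for regeneration or flag it for human review.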

Why the system compounds

The compounding effect is a direct consequence of the knowledge-base architecture. Every document that flows through the system — once reviewed and accepted — can be ingested back into the knowledge base. A case memo drafted by the system and refined by a senior attorney becomes part of the corpus. The next memo retrieves it. Over time, the system's knowledge base reflects the actual practice of the firm or lab — not a generic professional corpus, but your specific accumulated expertise.
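The feedback loop can be sketched as a gate on ingestion: only documents that have been reviewed and accepted re-enter the corpus. The `Document` and `KnowledgeBase` classes and their fields are hypothetical, and the sketch stores raw text where a real system would also segment and embed it.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    reviewed: bool = False
    accepted: bool = False


@dataclass
class KnowledgeBase:
    segments: list[str] = field(default_factory=list)

    def ingest(self, doc: Document) -> bool:
        """Add the document only if it was reviewed and accepted.

        Returns True when the document entered the knowledge base.
        """
        if not (doc.reviewed and doc.accepted):
            return False
        self.segments.append(doc.text)
        return True
```

An unreviewed draft is rejected, while the version a senior attorney has refined and accepted is stored and becomes retrievable for the next memo. Gating on review matters because anything ingested shapes future output.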

This is why the analogy we often use is that of a junior attorney who has read every memo your firm has ever produced. Not one who has read every legal brief ever written — one who has read yours. The specificity is the value.

General-purpose AI stays static with use: it never learns from your work. A domain AI system compounds: every cycle of operation adds reviewed material to the knowledge base, so retrieval has more of your own relevant work to draw on. The longer it runs, the stronger it gets.
