Chapter 7 — Implementing Access Control

Seventh post of the chapter-by-chapter walkthrough of LLM Primer III: Enhancing Enterprise AI with RAG. Permission models designed for relational databases and filesystems do not quite fit retrieval. The unit of access is no longer a row or a file but an embedding — and an embedding can leak the original through similarity even when the document itself is gated.

Why this chapter exists

Chapter 6 produced the threat model. The most important control it implies — and the one most early production systems get wrong — is access control at the retrieval layer, so the LLM never sees content the user cannot see. The naive alternative, "filter at generation time," builds a confused deputy: the model has already read the gated documents and will leak their substance through paraphrase.

This chapter walks the four mechanisms that compose into a working access control stack — document-level ACLs as the foundation, RBAC integrated with the enterprise's existing sensitivity labels, ReBAC for the relationship-shaped reality of enterprise knowledge, and the pre-filter versus post-filter discipline that runs under all of them.

One line: enforce authorisation at the retrieval layer with stable identifiers resolved at query time, combine a coarse pre-filter with a precise post-filter and a deliberate over-fetch, and treat the prompt template and response cache as part of the authorisation surface — anything else is a leak waiting to happen.

7.1 Document-level ACLs and metadata filtering

Every chunk knows who is allowed to see it. The implementation is straightforward in description; the failure modes are subtle enough that almost every early production system gets it wrong at least once. Three details matter most.

Granularity: a long report may have a public summary and a confidential appendix, and a single document-level ACL copied uniformly down to chunks either over-shares the appendix or under-shares the summary. The pattern that scales is to carry section-level permissions through layout-aware parsing.

Freshness: permissions change. Baking the ACL into the chunk at indexing time and never re-evaluating it gives you a system that lies. Store a stable identifier in the chunk metadata and resolve the live ACL against the source system at query time, behind a short-TTL cache.

Negative space: if the answer lives in a gated document, the system should not hallucinate or confidently say "I don't know"; it should say "there is material on this topic you are not authorised to see." That requires either a second unfiltered call or a vector database that distinguishes "no match" from "matched but filtered," and most implementations punt.
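A minimal sketch of the freshness and negative-space points, under stated assumptions: chunks carry only a stable source_id, fetch_acl stands in for a call to the source system, and the principal strings are made up for illustration. None of these names come from the book.

```python
import time
from typing import Callable, Dict, List, Set, Tuple


class LiveAclResolver:
    """Resolves the current ACL for a source ID behind a short-TTL cache."""

    def __init__(
        self,
        fetch_acl: Callable[[str], Set[str]],  # hypothetical source-system lookup
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_acl = fetch_acl
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Set[str]]] = {}

    def allowed_principals(self, source_id: str) -> Set[str]:
        now = self._clock()
        hit = self._cache.get(source_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        acl = set(self._fetch_acl(source_id))  # copy so later source edits are not shared
        self._cache[source_id] = (now, acl)
        return acl


def filter_chunks(
    chunks: List[dict], user_principals: Set[str], resolver: LiveAclResolver
) -> Tuple[List[dict], int]:
    """Return the chunks the user may see plus a count of matched-but-withheld chunks."""
    visible: List[dict] = []
    withheld = 0
    for chunk in chunks:
        acl = resolver.allowed_principals(chunk["source_id"])
        if acl & user_principals:
            visible.append(chunk)
        else:
            withheld += 1
    return visible, withheld
```

A non-zero withheld count with an empty visible list is the signal the caller needs to answer "there is material on this topic you are not authorised to see" rather than "I don't know." The TTL is the window during which a revoked permission can still be honoured, so it is a policy decision as much as a performance one.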

7.2 RBAC and Microsoft Purview sensitivity labels

RBAC compresses the permission space — instead of millions of user-to-document edges, the policy reduces to a few hundred role-to-classification edges, which is both auditable and maintainable. It fits RAG cleanly when the enterprise already runs on it. In Microsoft environments that means Entra ID groups and Purview sensitivity labels: Public, General, Confidential, Highly Confidential, with optional sub-labels. The label moves with the document; the parser reads it at indexing time and writes the stable label ID into the chunk metadata.
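To make the role-to-classification compression concrete, here is a small sketch. The role names and label IDs are hypothetical placeholders (real Purview label IDs are GUIDs), and the policy table is invented for illustration rather than taken from the book.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical role -> label-ID policy. The short strings stand in for
# the stable label IDs written into chunk metadata at indexing time.
ROLE_TO_LABEL_IDS: Dict[str, Set[str]] = {
    "all-staff": {"lbl-public", "lbl-general"},
    "finance": {"lbl-public", "lbl-general", "lbl-confidential"},
    "executive": {
        "lbl-public",
        "lbl-general",
        "lbl-confidential",
        "lbl-highly-confidential",
    },
}


def allowed_label_ids(roles: Iterable[str]) -> Set[str]:
    """Union of the label IDs granted by each role; unknown roles grant nothing."""
    allowed: Set[str] = set()
    for role in roles:
        allowed |= ROLE_TO_LABEL_IDS.get(role, set())
    return allowed


def filter_by_label(chunks: List[dict], roles: Iterable[str]) -> List[dict]:
    """Keep chunks whose label ID is allowed; chunks without a label are dropped."""
    allowed = allowed_label_ids(roles)
    return [c for c in chunks if c.get("label_id") in allowed]
```

Dropping unlabelled chunks is a deliberate fail-closed choice in this sketch: a parser that misses a label should cost recall, not confidentiality.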

The integration is straightforward; the drift is not. If the indexer runs as a service account that can decrypt everything, but the retrieval system enforces a role-based filter on top of the index, a document re-labelled from General to Confidential will not have its already-indexed chunks re-labelled unless the indexer notices the change. The systems that get this right run a continuous reconciliation against the source. The systems that get it wrong discover the drift during an audit, and the finding is severe.
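The reconciliation step can be sketched as a plain diff between what the index believes and what the source reports. Both dictionaries here are assumptions: in practice they would be produced by scanning chunk metadata and by querying the source system, and the function name is hypothetical.

```python
from typing import Dict, Optional


def find_label_drift(
    indexed_labels: Dict[str, str], source_labels: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Map each drifted document ID to its current source label.

    A value of None means the document no longer exists in the source,
    so its chunks are candidates for removal from the index.
    """
    drift: Dict[str, Optional[str]] = {}
    for doc_id, indexed_label in indexed_labels.items():
        current = source_labels.get(doc_id)  # None when the source has no such document
        if current != indexed_label:
            drift[doc_id] = current
    return drift
```

Run on a schedule, or on change notifications where the source offers them, the result drives re-labelling of the affected chunks so the gap is closed well before an audit would find it.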

7.3 ReBAC with Zanzibar and SpiceDB

RBAC cannot express "anyone in Sales who is also assigned to the Acme Corp deal." That requires reasoning about a relationship between the user and the resource, not just a role. Relationship-based access control, formalised in the Zanzibar paper at Google and available open-source as SpiceDB and OpenFGA, stores a graph: "Alice is a member of Engineering," "Engineering is a viewer of folder Specs," "Spec-101 is in Specs." Permission checks become graph traversals.
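The following toy check shows what "permission checks become graph traversals" means for the Alice example. It is an illustrative in-memory sketch, not the SpiceDB or OpenFGA API: the tuple layout, the "group#member" subject-set notation, and the rule that view permission is inherited from a parent folder are all assumptions made for this example.

```python
from typing import Optional, Set, Tuple

RelationTuple = Tuple[str, str, str]  # (object, relation, subject)

TUPLES: Set[RelationTuple] = {
    ("group:engineering", "member", "user:alice"),
    ("folder:specs", "viewer", "group:engineering#member"),
    ("doc:spec-101", "parent", "folder:specs"),
}


def has_relation(
    tuples: Set[RelationTuple],
    obj: str,
    relation: str,
    user: str,
    _seen: Optional[Set[Tuple[str, str]]] = None,
) -> bool:
    """True if user holds relation on obj, directly or through a subject set."""
    seen = set() if _seen is None else _seen
    if (obj, relation) in seen:  # already explored; also guards against cycles
        return False
    seen.add((obj, relation))
    for o, r, subject in tuples:
        if o != obj or r != relation:
            continue
        if subject == user:
            return True
        if "#" in subject:  # subject set such as "group:engineering#member"
            sub_obj, sub_rel = subject.split("#", 1)
            if has_relation(tuples, sub_obj, sub_rel, user, seen):
                return True
    return False


def can_view(
    tuples: Set[RelationTuple],
    obj: str,
    user: str,
    _visited: Optional[Set[str]] = None,
) -> bool:
    """View is granted by a viewer relation on obj or on any ancestor folder."""
    visited = set() if _visited is None else _visited
    if obj in visited:
        return False
    visited.add(obj)
    if has_relation(tuples, obj, "viewer", user):
        return True
    parents = [s for o, r, s in tuples if o == obj and r == "parent"]
    return any(can_view(tuples, p, user, visited) for p in parents)
```

Here can_view(TUPLES, "doc:spec-101", "user:alice") walks from the document to its folder, from the folder to the Engineering group, and from the group to Alice. A production system delegates this traversal to the authorisation service rather than reimplementing it.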

The integration pattern with RAG is clean. SpiceDB receives the question "which documents can this user view?" and returns a list of document IDs; the retrieval system passes that list as a metadata filter to the vector search. Zanzibar's zookies (ZedTokens in SpiceDB) let the retrieval call insist on consistency at least as recent as a freshly granted access: provided the token from the grant is passed along, a user added to a project at 10:00 and asking a question at 10:01 will see the new documents. The operational cost is that SpiceDB becomes a critical query-path dependency that needs HA and aggressive short-TTL caching of per-user document lists. Mature systems often use both RBAC and ReBAC, with RBAC for the broad sensitivity policy and ReBAC for the fine-grained relationship policy, combined as the intersection of allowed sets.
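The "intersection of allowed sets" can be sketched in a few lines. The inputs are assumptions: rebac_doc_ids stands for the list returned by the relationship service, doc_label_ids for a document-to-label lookup, and rbac_label_ids for the labels the user's roles permit. The function name is hypothetical.

```python
from typing import Dict, Set


def authorised_doc_ids(
    rebac_doc_ids: Set[str],
    doc_label_ids: Dict[str, str],
    rbac_label_ids: Set[str],
) -> Set[str]:
    """Documents allowed by both the relationship policy and the label policy."""
    rbac_doc_ids = {
        doc_id
        for doc_id, label_id in doc_label_ids.items()
        if label_id in rbac_label_ids
    }
    # A document with no known label is absent from rbac_doc_ids,
    # so the intersection excludes it (fail closed).
    return rebac_doc_ids & rbac_doc_ids
```

Because the result is an intersection, either policy can veto a document on its own; a relationship grant never overrides a sensitivity label the user's roles do not cover.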

7.4 Pre-filter, post-filter, and the discipline that runs under both

Pre-filtering applies the authorisation predicate before the vector search: the index restricts the candidate set first, then runs similarity over the restriction. It is conceptually clean and the safer default, but its performance depends on the index structure. HNSW with a highly selective filter can degrade sharply as the graph traversal walks through many non-matching nodes; filterable-HNSW variants in Weaviate and Qdrant, per-tenant namespaces in Pinecone, and partitions in Milvus mitigate but do not eliminate the cost.

Post-filtering reverses the order: similarity search runs over the whole index and the authorisation predicate is applied to the results afterwards. That buys full HNSW speed at the price of weaker security: the top-K leak, the timing-based information leak, and the correctness leak when the entire top-K is filtered away. The pragmatic production answer is to layer both: pre-filter on the coarsest, fastest predicate (tenant, broad role), post-filter on the expensive precise predicates (SpiceDB lists, Purview labels), and over-fetch (top-50 instead of top-10, say) so the post-filter is far more likely to leave a full ranked set. Two more places leak: the prompt template that cites a confidential document title, and the response cache keyed only on the query string. Both need to be part of the authorisation surface.
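A toy version of the layered approach, with brute-force cosine similarity standing in for the vector index. Everything here is an assumption for illustration: the chunk fields, the is_authorised callback (which would wrap the SpiceDB list or label check), and the overfetch multiplier. The cache_key helper sketches one way to bind a cached response to the caller's authorisation context instead of the query string alone.

```python
import hashlib
import json
import math
from typing import Callable, Iterable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def layered_search(
    query_vec: Sequence[float],
    chunks: List[dict],
    tenant: str,
    is_authorised: Callable[[dict], bool],  # hypothetical precise check
    k: int = 10,
    overfetch: int = 5,
) -> List[dict]:
    # Pre-filter: cheap, coarse predicate applied before any similarity work.
    candidates = [c for c in chunks if c["tenant"] == tenant]
    ranked = sorted(
        candidates, key=lambda c: cosine(query_vec, c["vector"]), reverse=True
    )
    # Over-fetch so the precise post-filter has spare candidates to discard.
    fetched = ranked[: k * overfetch]
    # Post-filter: expensive, precise predicate on the small fetched set.
    return [c for c in fetched if is_authorised(c)][:k]


def cache_key(
    query: str,
    tenant: str,
    allowed_label_ids: Iterable[str],
    allowed_doc_ids: Iterable[str],
) -> str:
    """Hash the query together with the authorisation context it was answered under."""
    payload = json.dumps(
        {
            "q": query,
            "tenant": tenant,
            "labels": sorted(allowed_label_ids),
            "docs": sorted(allowed_doc_ids),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With overfetch set to 1 the post-filter can return fewer than k results even when more authorised chunks exist further down the ranking, which is the correctness leak in miniature. Two users who ask the same question under different allowed sets get different cache keys, so neither can be served the other's answer.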

Worth holding onto: push as much as possible to query-time resolution against stable identifiers and as little as possible to baked-in metadata — the early choices about what to store in the chunk and what to resolve at query time determine how far the architecture can scale. Permissions change; embeddings do not change themselves.

What Chapter 7 sets up

Access control answers who can see what. It assumes there is something to gate. It does not ask whether the chunk should have been embedded in the form it was — whether the customer names, the social security numbers, the proprietary code paths should be sitting in the vector store at all, waiting for the right authorisation to surface them. That is the question of anonymisation, and it is the subject of the next chapter.

Next — Chapter 8: Data Anonymization in the RAG Pipeline. Pre-generation versus post-generation, masking versus synthetic replacement versus differential privacy, and the utility-privacy tradeoff that every choice has to navigate.

Want the full picture? The book carries a full SpiceDB schema for an enterprise RAG deployment, the embedding-layer leakage analysis with rate-limiting countermeasures, the structured audit-log schema regulators ask for, and a layered end-to-end pipeline that composes all four mechanisms.