The Promise and the Reality
Every organisation that uses or plans to use Retrieval-Augmented Generation (RAG) faces a promise: the AI searches the entire knowledge base and delivers precise answers. The promise is technically achievable. The systematically underestimated challenge, however, is not a question of the model's intelligence but of architecture.
Because the AI sees everything. It must see everything to function. But the employee asking the question is not allowed to see everything. Between what the AI knows and what it is permitted to answer lies an architectural no-man's-land that most implementations fail to address.
This post describes the architecture that structures this no-man's-land. It is aimed at decision-makers who want to understand why AI projects on enterprise data will fail without well-thought-out access control — not because of the technology, but because of compliance and trust.
The AI is the Broker, not the Subject
In classic IT architectures, access control is clearly structured: a user (subject) accesses a resource (object), and a rule set decides whether access is permitted. Identity and Access Management (IAM) and Attribute-Based Access Control (ABAC) are established patterns for this.
With AI on enterprise data, this pattern shifts fundamentally. The AI is not a user in the classical sense. It is generally authorised — otherwise it could not access the data stored in the vector store. The actual access control does not occur between AI and data, but between AI and the requesting user.
The question is not: May the AI read this chunk? The question is: May the AI include this chunk in the response to this user?
This means: the AI acts as a broker. It mediates between the organisation's knowledge base and the individual employee. And like any broker, it needs clear rules about what it may pass on to whom.
A Three-Layer Architecture
The solution requires three separate layers, each addressing an independent concern. This separation is architecturally critical: if you mix the layers, you end up with a system that is neither auditable nor maintainable.
Layer 1: Ingestion — Classification at Data Intake
When loading documents into the vector store, the resulting chunks must be tagged with metadata. Not with permission information — because permissions change — but with classification attributes: confidentiality level, organisational unit, data type, regulatory category, source system.
This separation is key: the question "What properties does this document have?" is stable. The question "Who may see it?" changes with every personnel change.
There are three approaches for the origin of these attributes:
Source system inheritance. Metadata is taken from existing systems — SharePoint classifications, DMS categories, SAP permission groups. This is the lowest-cost approach, but only as reliable as the data maintenance in the source system. In practice, this reliability is often limited.
AI-assisted classification. A specialised model classifies each chunk during ingestion. This scales, but raises a trust problem: you are securing data with a system that can itself make mistakes. A misclassified chunk is a silent data leak.
Hybrid approach. Source system attributes as a baseline, AI re-classification with human review for high confidentiality levels. Realistic, but labour-intensive.
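The classification attributes described above can be sketched as a simple schema attached to each chunk at ingestion. The field names and example values are illustrative assumptions, not a fixed standard:

```python
from dataclasses import dataclass, asdict

# Hypothetical classification attributes attached to each chunk at ingestion.
# They describe stable document properties -- not who may see the document.
@dataclass
class ChunkClassification:
    confidentiality: str      # e.g. "public", "internal", "confidential"
    org_unit: str             # owning organisational unit
    data_type: str            # e.g. "contract", "report", "manual"
    regulatory_category: str  # e.g. "gdpr_personal", "none"
    source_system: str        # e.g. "sharepoint", "confluence"

def tag_chunk(text: str, classification: ChunkClassification) -> dict:
    """Bundle chunk text with its classification metadata for the vector store."""
    return {"text": text, "metadata": asdict(classification)}

chunk = tag_chunk(
    "Q3 revenue figures by business unit ...",
    ChunkClassification(
        confidentiality="confidential",
        org_unit="finance",
        data_type="report",
        regulatory_category="none",
        source_system="sharepoint",
    ),
)
```

Because permissions are deliberately absent from this schema, a personnel change never requires re-ingesting or re-tagging data.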
Layer 2: Permission Checking at Runtime
When a user asks a question, the vector store returns the most relevant chunks. Before these enter the language model's prompt, a Policy Decision Point (PDP) checks for each chunk: may this user with these roles access a chunk with these attributes?
A promising candidate for this role is the Open Policy Agent (OPA). OPA enables Policy-as-Code in the declarative language Rego. Policies can be versioned, tested, and audited — all requirements that are non-negotiable in an enterprise context.
The architecture works as follows: the user asks a question. The retrieval system fetches the Top-K relevant chunks from the vector store. A Policy Enforcement Point (PEP) passes the chunk metadata and user context to OPA for each chunk. OPA decides per chunk: approved or blocked. Only approved chunks flow into the prompt.
OPA does not decide whether the user may open a document. OPA decides whether a piece of information may flow into an AI-generated response. That is IAM at a new level of granularity.
The decisive advantage of this approach: permissions are checked dynamically at runtime. When an employee changes departments or leaves the organisation, the policy inputs change — not the chunk metadata. The data in the vector store remains unchanged.
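A minimal sketch of this per-chunk check, with the OPA call replaced by a local stub (in production the PEP would query OPA's REST API with the same input instead); the function names and the toy rule are illustrative assumptions:

```python
# Minimal PEP sketch: filter retrieved chunks through a per-chunk policy
# decision. A local stub stands in for the actual OPA query here.

def policy_decision(user: dict, chunk_meta: dict) -> bool:
    """Stand-in for an OPA query: may this user see a chunk with these attributes?"""
    # Toy rule: confidential chunks require membership in the owning org unit.
    if chunk_meta["confidentiality"] == "confidential":
        return chunk_meta["org_unit"] in user["org_units"]
    return True

def enforce(user: dict, chunks: list[dict]) -> list[dict]:
    """Policy Enforcement Point: only approved chunks flow into the prompt."""
    return [c for c in chunks if policy_decision(user, c["metadata"])]

user = {"id": "u42", "org_units": ["finance"]}
chunks = [
    {"text": "Public FAQ entry",
     "metadata": {"confidentiality": "internal", "org_unit": "it"}},
    {"text": "Board minutes",
     "metadata": {"confidentiality": "confidential", "org_unit": "board"}},
]
approved = enforce(user, chunks)  # only the first chunk survives
```

The real rule lives in Rego, versioned and tested like any other code; the PEP only forwards attributes and enforces the answer.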
Excursus: Integration with Existing Permission Systems
This pattern becomes particularly elegant when connecting systems like Notion, Confluence, or SharePoint. These systems already have a maintained permission system. Instead of replicating these permissions, the PEP can query the source system directly: may this user see this page?
This eliminates the synchronisation problem — the permissions are always current because they are not copied, but queried live. OPA acts as an abstraction layer that normalises different permission models onto a unified attribute schema. One PEP, many source systems, one policy language.
The trade-off is latency and coupling: every request requires a round-trip to the source system. For scenarios with high requirements for currency, this is acceptable. For scenarios with high performance requirements, a periodic sync with a defined time window may be the better choice.
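The trade-off can be softened with a short-lived cache in front of the live check. A sketch, with the source-system lookup stubbed and all names hypothetical:

```python
import time

# Sketch of the latency/currency trade-off: live permission lookups against a
# source system (stubbed below), wrapped in a short TTL cache. The names are
# illustrative, not a real Confluence or SharePoint API.
class CachedPermissionCheck:
    def __init__(self, live_check, ttl_seconds: float = 60.0):
        self.live_check = live_check   # callable (user_id, page_id) -> bool
        self.ttl = ttl_seconds
        self._cache: dict[tuple, tuple[bool, float]] = {}

    def allowed(self, user_id: str, page_id: str) -> bool:
        key = (user_id, page_id)
        hit = self._cache.get(key)
        if hit is not None and time.monotonic() - hit[1] < self.ttl:
            return hit[0]                                # fresh cached decision
        decision = self.live_check(user_id, page_id)     # round-trip to source system
        self._cache[key] = (decision, time.monotonic())
        return decision

calls = []
def fake_source_system(user_id, page_id):
    calls.append((user_id, page_id))
    return page_id == "page-1"

pep = CachedPermissionCheck(fake_source_system, ttl_seconds=60)
pep.allowed("u1", "page-1")
pep.allowed("u1", "page-1")  # second call is served from the cache
```

The TTL is the explicit, auditable answer to "how stale may a permission be?" — a question the periodic-sync approach answers implicitly and often less honestly.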
Layer 3: GDPR Masking — The Chicken-and-Egg Problem
Independent of the permission check, there is a second, orthogonal concern: personal data. An employee may be authorised to see a document, and yet certain personal data within it must not be passed to a language model.
This risk is independent of the requesting user: it exists even when the Data Protection Officer personally asks the question, because the problem is personal data ending up in an LLM prompt at all.
Masking occurs in stages:
Stage 1 — Rule-based. Pattern matching for structured personal data: email addresses, IBANs, phone numbers, social security numbers. Deterministic, auditable, performant.
Stage 2 — Local NER model. A specialised Named Entity Recognition model running on-premise recognises personal names, places, and organisations in free text. No cloud, no external processing — from a data protection standpoint, this is internal processing.
Stage 3 — Semantic understanding. And here the dilemma begins.
A sentence like "The colleague from accounting who had the incident last month" contains no named entity. In the context of the organisation, it is nonetheless unambiguously personal. Only someone who understands the sentence can recognise this — meaning a language model. Meaning exactly the system the data is supposed to be protected from.
To protect data from AI, you need an AI that sees the data. This is not a technical failure. This is a structural conflict.
The honest answer: perfect GDPR compliance before LLM access is not achievable with reasonable effort. Stages 1 and 2 catch the majority. For the rest, you end up with a Data Protection Impact Assessment, weighing legitimate interest against residual risk, with technical and organisational measures.
A possible architectural compromise: a weaker, locally operated model handles the semantic masking. The already-cleaned data then goes to the more capable external model. This is justifiable from a data protection standpoint, but requires that the local model is good enough to recognise contextual personal references. With current European open-source models, this is an open question.
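Stage 1 above can be sketched with plain pattern matching. The patterns are deliberately simplified illustrations and would need hardening (checksum validation, locale variants) before real use:

```python
import re

# Stage-1 masking sketch: deterministic pattern matching for structured
# personal data. Simplified illustrative patterns -- not production-grade.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){3,7}\b"),
    "PHONE": re.compile(r"\+\d{1,3}[ \d]{7,14}\d"),
}

def mask(text: str) -> str:
    """Replace each match with a typed placeholder, e.g. [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

masked = mask("Contact anna.schmidt@example.com or +49 30 1234567.")
# masked == "Contact [EMAIL] or [PHONE]."
```

Typed placeholders rather than blanket redaction keep the masked text useful as LLM context: the model still knows *that* there was an email address, just not *which*.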
The Complete Pipeline
In summary, a pipeline with four filter steps between user request and LLM response emerges:
| # | Step | Function | Technology |
|---|------|----------|------------|
| 1 | Retrieval | Relevant chunks from vector store | Vector DB (Weaviate, Milvus, Qdrant) |
| 2 | Permission filter | Check: may this user see this chunk? | OPA / Rego + PEP |
| 3 | GDPR masking | Remove or pseudonymise personal data | Regex + NER + local LLM |
| 4 | Prompt assembly | Cleaned chunks in LLM context | Orchestration (LangChain, custom) |
After LLM generation, an optional fifth step can re-check the generated response — a response check that ensures no new, impermissible insights emerge from the combination of individual pieces of information. This topic — inference attacks through the combination of harmless individual pieces of information — deserves its own examination.
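The four steps of the table can be wired together in a few lines, with every stage stubbed; function names and signatures are illustrative assumptions, not a real framework API:

```python
# End-to-end sketch of the four-step pipeline, every stage stubbed.

def retrieve(question: str, k: int = 5) -> list[dict]:
    """Step 1: top-k chunks from the vector store (stubbed)."""
    return [
        {"text": "Internal FAQ: VPN setup",
         "metadata": {"confidentiality": "internal"}},
        {"text": "Salary table 2024",
         "metadata": {"confidentiality": "confidential"}},
    ]

def permission_filter(user: dict, chunks: list[dict]) -> list[dict]:
    """Step 2: drop chunks the user may not see (PDP call stubbed)."""
    return [c for c in chunks
            if c["metadata"]["confidentiality"] in user["clearances"]]

def gdpr_mask(chunks: list[dict]) -> list[dict]:
    """Step 3: mask personal data (a no-op in this sketch)."""
    return chunks

def assemble_prompt(question: str, chunks: list[dict]) -> str:
    """Step 4: cleaned chunks into the LLM context."""
    context = "\n".join(c["text"] for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

user = {"id": "u7", "clearances": ["public", "internal"]}
prompt = assemble_prompt("How do I set up VPN?",
                         gdpr_mask(permission_filter(user, retrieve("vpn"))))
```

The point of the sketch is the ordering: masking runs after the permission filter, so no effort is wasted on chunks that will be blocked anyway, and the prompt only ever sees chunks that passed both gates.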
The Regulatory Dilemma
This architecture is technically clean. It is also complex. Each layer is a potential point of failure. And here a tension emerges that goes beyond the technology.
In other markets — particularly the US — the regulatory framework permits building first and optimising later. Companies go into production with simpler architectures, gather experience, and improve iteratively.
In Europe, the complexity of the infrastructure arises predominantly from compliance requirements. And if the masking filter lets a sentence through, that is not merely a technical bug — it is a risk of regulatory fines.
A paradoxical situation arises: whoever ignores it violates applicable law. Whoever does it half-heartedly bears the greatest risk. And whoever does it conscientiously documents their residual risks in the Data Protection Impact Assessment — and thereby hands the supervisory authority the audit basis at the same time.
This is not an argument against data protection. It is an argument for a regulatory practice that recognises the difference between conscientious implementation and omission. Currently, the system penalises those who take it seriously and rewards those who wait.
Recommendations for Practice
Despite the complexity, the use of AI on enterprise data is possible — if the architecture is considered from the start. The following recommendations are aimed at organisations that want to take this path:
Separate classification from authorisation. Metadata on data describes properties. Permissions are checked dynamically at runtime. This separation is the foundation for a maintainable architecture.
Use existing permission systems. Notion, Confluence, SharePoint already have access controls. Do not replicate them — integrate them. OPA as an abstraction layer normalises different permission models.
Accept the residual risk in GDPR masking — but document it. Perfect automatic recognition of contextual personal references is not achievable with today's technology. A clean Data Protection Impact Assessment that names this residual risk and describes the measures taken is the professional way to handle it.
Start with a limited scope. Do not start with the entire document base. Choose a department or a document type with clear classification rules and clean source system metadata. Gather experience before scaling.
Invest in ingestion, not in the model. The quality of AI on enterprise data is determined not by the language model, but by the quality of data preparation, classification, and access control. This is less glamorous than prompt engineering — but it is what determines success or failure.
Conclusion
Enterprise AI is not a prompt engineering topic. It is an architecture topic. And architecture starts with the data — its classification, its access control, and the honest handling of the limits of what is technically cleanly solvable.
The architecture described here is not a theoretical construct. It addresses real requirements that every organisation with regulatory obligations has when it wants to apply AI to its own data. The complexity is high. But the alternative — AI without access control or no AI at all — is not an option.
About the Author
Andre Jahn is a freelance Solution Architect with over 30 years of experience in enterprise IT. He advises companies on modernising complex IT landscapes, with a particular focus on legacy systems, compliance architectures, and the secure integration of AI into existing infrastructures.