Most teams deploying LLMs think about security too late and too narrowly. They add a content filter, maybe a WAF, and call it done. That covers the most obvious attacks but misses most of the attack surface.

Securing an LLM in production means covering the full path from user input to model output to the data that shaped it. Skip any one and the others become less effective.

01 - Map the attack surface

Before you can secure anything you need to know what you're securing. Inventory every place text reaches the model: end-user prompts, system prompts, tool responses, retrieved documents, etc.

Most teams undercount this and focus only on the user input. This misses dangerous inputs that the model fetches itself - retrieved documents, tool responses, external API calls. Anything the model reads is part of your attack surface.

A good starting point is the OWASP Top 10 for LLM Applications - it maps the most common attack vectors and gives you a structured baseline for coverage. If you're operating in a higher-risk environment, MITRE ATLAS extends that into adversarial ML territory beyond the prompt layer.

02 - Deploy runtime inspection

Inspect every prompt going in and every response coming out, on the inference path, before output is returned. Direct injection from users, indirect injection from retrieved content, jailbreak attempts, exfiltration through crafted outputs - these need to be caught before the response is delivered, not after.

Building this inline with low latency is where most teams run into trouble. It's best to use purpose-built runtime security tools to handle this without your team having to build and maintain detection logic from scratch.

03 - Monitor the data layer

Most runtime detection misses the data layer entirely - by the time a poisoned input reaches the model it's already too late. Poisoned RAG documents, backdoors introduced during fine-tuning, shortcut learning baked into weights - none of these are visible at runtime. Provenance and continuous monitoring are what close this gap. Know where every piece of data came from, when it entered the pipeline, and whether it has changed. An unaudited RAG update is a live attack surface from the moment it's ingested.

This is also an area where building it yourself is expensive. But the baseline for provenance tracking will give you a great visibility into what is entering your system and how much it can be trusted.

04 - Validate outputs against policy

Enforce output schemas, allow-listed tool calls, and sensitive-data filters on everything the model returns. Treat tool-use as a privileged action that requires explicit authorisation and weight what a model can access versus what it really needs.

The blast radius of a successful injection scales with what the model is allowed to do. A model that can only answer questions is one thing. A model that can write to a database, call external APIs, and send emails is another. Least privilege applies here exactly as it does everywhere else.

05 - Generate audit evidence

Every prompt, response, retrieval, and tool call should produce evidence mapped to the regulatory frameworks you operate under - EU AI Act, GDPR, FINMA, HIPAA, sector regulators.

The earlier a team has their audit chain and logging right the less of a headache it will be in the future.

06 - Red-team continuously

Run adversarial test suites against your model. Knowing what jailbreaks and injections work will inform how you deal with security incidents and let you adapt your security as attackers evolve.

One-time red team exercises before launch are not sufficient. Attackers adapt. Continuous adversarial testing is what tells you whether your controls are still working or need to be improved.

You don't have to navigate all of this alone.

Blindsight's platform covers the full stack described above, performing runtime inspection, monitoring the data and providing provenance and the logging you need. If you'd rather focus on deploying AI than securing it from scratch, that's what we're here for.