Prompt injection is the SQL injection of LLMs. The model can’t reliably tell the difference between instructions from the developer and instructions embedded in user input or external data. This has been a known problem since 2022 and nobody has fully solved it.
Two flavors
Direct injection: the user puts adversarial instructions in their own input. “Ignore your previous instructions and do X instead.” Simple, but it works more often than you’d expect.
Indirect injection: the attacker plants instructions in content the LLM will read later. A malicious website, a poisoned document, a crafted email. When the LLM processes that content (through RAG, browsing, or file reading), it follows the embedded instructions. This is the scarier version because the user doesn’t even know it’s happening.
Why it’s unsolved
LLMs process everything as tokens. There’s no architectural boundary between “system instructions” and “user data.” It’s like building a database that can’t distinguish between queries and data. Every defense so far (instruction hierarchy, special tokens, input filtering) has been bypassed.
Real-world impact
Indirect injection enables data exfiltration from LLM-powered apps, worm propagation between AI agents, and poisoning of information retrieval systems. It’s the reason security researchers are nervous about giving LLMs access to tools and the internet.
Current status
Active research area with no complete solution. OWASP lists it as the #1 risk for LLM applications.
Key papers
“Not What You’ve Signed Up For” by Greshake et al. (2023). “Formalizing and Benchmarking Prompt Injection” (USENIX Security 2024).