Three Security Tiers For AI Systems

The AI security landscape is fundamentally broken, with current guardrail solutions failing to provide meaningful protection against determined attackers. This creates significant risks as AI systems gain more agency and control over real-world systems.

Why AI Security Is Different From Traditional Cybersecurity

"You can patch a bug but you can't patch a brain"
- With traditional software bugs, you can be 99.99% sure the bug is solved after patching
- With AI systems, you can be 99.99% sure the problem is still there after attempted fixes
The attack surface is effectively infinite
- "The number of possible attacks against an LLM is equivalent to the number of possible prompts"
- For a model like GPT-5, this means "one followed by a million zeros" possible attack vectors
- When guardrail providers claim "99% effectiveness," they're still leaving infinite attack vectors

Three-Tier Risk Assessment Framework for AI Deployments

Tier 1: Read-Only Chatbots (Lowest Risk)

If your AI system can only respond to users with text and has no ability to take actions:
- Security concerns are primarily reputational, not operational
- Users can only harm themselves through malicious prompts
- Even with guardrails, determined users can make the model say anything they want
- Adding guardrails provides minimal security benefit since users could get the same harmful content from public models

Tier 2: Systems With Limited Actions (Medium Risk)

If your AI can take actions but only affecting the user's own data:
- Focus on classical cybersecurity principles
- Ensure proper data permissioning and access controls
- Verify the AI can only access what the user explicitly authorized
- Implement proper logging of all inputs and outputs

Tier 3: Agentic Systems With Broad Access (Highest Risk)

If your AI can take actions affecting multiple users or systems:
- Consider implementing Camel (Google's framework) to restrict permissions based on user intent
  - Analyzes what the user is asking for and only grants the minimum permissions needed
  - Example: If user asks for email summary, only grant read permissions, not send permissions
- Recognize that when both read and write permissions are needed, vulnerability remains
- Consider human-in-the-loop verification for sensitive operations
- Assume any data the AI has access to can be leaked, and any action it can take can be triggered

Why Current Solutions Don't Work

Automated red teaming has limited value
- Always finds vulnerabilities in any transformer-based model
- Doesn't provide actionable insights beyond confirming known weaknesses
- "It works too well" - finding vulnerabilities isn't the challenge
AI guardrails provide false security
- Can be bypassed by determined attackers
- Don't dissuade sophisticated attackers
- Often fail on non-English inputs
- Create overconfidence in security posture
Prompt-based defenses are even worse
- Adding instructions like "don't follow malicious instructions" is easily bypassed
- Provides virtually no protection against determined attackers

The Path Forward

Invest in the intersection of AI security and classical cybersecurity
- Have experts who understand both domains
- Focus on proper permissioning and containment
- Think of AI as "an angry god in a box that wants to hurt you"
Education is critical
- Understand the fundamental limitations of current defenses
- Recognize when AI deployment creates genuine security risks
- Know when not to deploy certain AI capabilities
For frontier labs and AI companies
- Consider adversarial training earlier in the model development process
- Develop new architectures with security as a core consideration
- Invest more resources in solving the fundamental problem
For enterprises
- Inventory all AI systems (many companies have more than they realize)
- Focus on governance, compliance and monitoring
- Be extremely cautious with agentic AI systems that can take actions