Your Most Privileged User Isn’t Human: Securing AI Agents in Production 🤖💡

Ever wonder who’s the most powerful entity in your system today? It’s probably not a human. It’s your AI agent. And while these autonomous digital workers are revolutionizing how we operate, they’re also creating gaps in our security postures that we urgently need to address.

Atulpriya Sharma, a Senior Developer Advocate at Improving (formerly InfraCloud), CNCF Ambassador, and co-chair of KubeCon India 2025, recently shed light on this critical challenge. He compellingly argued that our existing security paradigms, built for humans, are simply not ready for the unique nature of AI agents.

Let’s dive into why this matters and, more importantly, how we can fix it!

The Human Security Blueprint: Checks Everywhere You Go ✈️🛂

Think about your journey to a major event like KubeCon, or even just flying into Amsterdam. What’s the first thing you encounter? Checks, checks, and more checks!

  • Who are you? At the airport, they know your identity the moment you scan your passport. At KubeCon, your badge pickup reveals your name, company, and purpose.
  • What are you allowed to do? As a speaker, you access specific areas. As a sponsor, you have different privileges. They know your role and its associated permissions.
  • Why are you here (or why are you stopped)? If something goes wrong—a suspicious item in your bag, a misconfigured badge—they immediately stop you and explain why.

Crucially, every single one of these human interactions leaves a full audit trail and ensures complete accountability. We have robust governance around physical access. But do we have the same for our AI agents running in production? Atulpriya emphatically says no.

The AI Agent Anomaly: A Security Blind Spot 🤖🚫

When an AI agent takes an action in your Kubernetes environment, what do your logs reveal? Often, it’s just a generic principal like “system:serviceaccount:ai-ops:agent”. This single identity, a service account token, is often the only trace of an AI agent that has just performed a complex operation.

This glaring lack of detail means we lose the fundamental “who, what, and why” that our human security systems provide. Our existing security measures—network policies, RBAC permissions, pod security standards, audit logging—were meticulously designed for human users and deterministic systems. They were not built to handle the unique characteristics of AI agents:

  • Non-deterministic: Their actions can be unpredictable.
  • Autonomous: They operate independently, making decisions.
  • Continuous: They run constantly, without direct human oversight.
  • High Bypass Potential: They can often bypass traditional security systems due to their broad access and dynamic behavior.

This isn’t a mistake in our security systems; it’s an evolutionary gap. Our security wasn’t designed for this new breed of digital user.

Unpacking the 3 Critical Gaps in AI Agent Security 🕵️‍♀️💥

Atulpriya highlighted three major gaps that arise from this mismatch:

1. Lost Attribution: Who Really Did That? 🤷‍♀️

When an AI agent acts, your audit logs typically show only the agent’s name or its service account token. You have no idea who initiated that action. Was it Alice typing into a chatbot? A Slackbot deployed by a team? A GitHub action triggered by an engineer?

Impact: Without knowing the human context, accountability vanishes. If an agent scales a deployment and costs skyrocket, you can’t trace it back to the original human requestor.

2. Permission Escalation: Beyond Binary RBAC 🪜🔓

AI agents often need broad capabilities to perform their tasks. However, existing RBAC (Role-Based Access Control) is binary and static. It tells an agent “you can do this” or “you cannot do that.” It lacks the dynamic context to enforce nuanced policies.

Example: RBAC can grant an agent the ability to scale deployments, but it can’t express “you may scale deployments, yet never beyond 5 replicas.” The request’s content is invisible to the authorization decision.

Impact: Agents can easily overstep intended boundaries, leading to unexpected resource consumption, security breaches, or exceeding cost limits.
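The binary nature of the decision is easy to see in a simplified model of an RBAC rule. This is a hypothetical sketch, not the real Kubernetes authorizer: it only captures the point that the rule matches on verb and resource, so the request body (such as the desired replica count) never enters the evaluation.

```python
# Hypothetical, simplified model of an RBAC rule: there is nowhere in the
# rule to say "at most 5 replicas".
rbac_rule = {
    "apiGroups": ["apps"],
    "resources": ["deployments/scale"],
    "verbs": ["update", "patch"],
}

def rbac_allows(rule: dict, verb: str, resource: str) -> bool:
    # Binary decision: only verb and resource are consulted, so scaling
    # to 3 replicas and scaling to 300 look identical.
    return verb in rule["verbs"] and resource in rule["resources"]

print(rbac_allows(rbac_rule, "update", "deployments/scale"))  # True either way
```

Whatever replica count the agent requests, the answer is the same — which is precisely why a second, context-aware layer is needed (see the runtime-policy section below for the fix Atulpriya proposes).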

3. Invisible Tool Chain: A Gateway for Attackers 🔗👾

This is perhaps the most insidious gap. You might configure your AI agent with a single service token, granting it initial, seemingly limited access. But internally, the agent often invokes a cascade of other tools and CLIs to fulfill its task.

The Chain: An agent might use kubectl to access Kubernetes secrets. From those secrets, it gains credentials for your cloud CLI. Suddenly, that initial “limited” access transforms into control over your entire cloud infrastructure.

Impact: Your security team sees one entry point, but an attacker sees a complete tool chain. If they compromise just one part of this chain, they gain access to everything the agent can touch.
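The tool chain described above is really a reachability problem: the agent’s effective access is everything transitively unlocked by its initial credential, not just the first hop. The sketch below models that idea with a hypothetical credential graph mirroring the kubectl-to-cloud chain from the talk.

```python
# The "invisible tool chain" as a reachability problem. Edges are
# hypothetical "this credential or tool unlocks that one" relationships.
reachable_from = {
    "agent-service-token": ["kubectl"],
    "kubectl": ["k8s-secrets"],
    "k8s-secrets": ["cloud-cli-credentials"],
    "cloud-cli-credentials": ["cloud-account"],
}

def effective_access(start: str) -> set[str]:
    """Everything transitively reachable from the initial credential."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(reachable_from.get(node, []))
    return seen - {start}

print(sorted(effective_access("agent-service-token")))
# ['cloud-account', 'cloud-cli-credentials', 'k8s-secrets', 'kubectl']
```

The security review that approved “one service token” implicitly approved the whole closure — which is what an attacker computes, too.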

Real-World Risks: When AI Goes Rogue 💥🚨

The dangers of these gaps are not theoretical. Atulpriya pointed to a recent incident involving Trib, where a bot exploited a misconfigured GitHub Action. The bot obtained an engineer’s personal access token, which then caused significant disruption across Trib’s entire Git repository.

This scenario perfectly illustrates the “invisible tool chain” and “lost attribution” at play:

  • A bot (AI agent) leveraged a vulnerability.
  • It escalated privileges by obtaining a human’s token.
  • The audit trail was missing, making it difficult to attribute the original trigger or understand the full scope of permissions.

Bridging the Gaps: Actionable Solutions for AI Agent Security 🛠️✨

The good news? These gaps are fixable! Atulpriya outlined practical strategies to secure your AI agents:

1. For Lost Attribution: Identify the Human Initiator 🧑‍💻

  • User Context: Ensure every action taken by an AI agent is attributed to the human who initiated it. Whether it’s via a Slackbot, a chatbot, or a GitHub Action, the original human context must be preserved.
  • Annotations: In Kubernetes, use annotations to tag requests coming from AI agents with the invoking user’s context.
  • Policy Enforcement: Implement policies that either accept or reject requests based on the presence and validity of this user attribution.
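The three bullets above can be sketched as a small attribution layer: tag each agent-originated manifest with the invoking human, then accept or reject on that basis. The annotation key and the admission check below are illustrative assumptions, not a standard; in a real cluster this check would live in an admission webhook or policy engine.

```python
# Hypothetical annotation key carrying the invoking user's context.
INITIATOR_KEY = "agent.example.com/initiated-by"

def annotate_request(manifest: dict, human_user: str) -> dict:
    """Tag an agent-originated manifest with the human who triggered it."""
    annotations = manifest.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[INITIATOR_KEY] = human_user
    return manifest

def admit(manifest: dict) -> bool:
    """Policy enforcement: reject any request that lacks a human initiator."""
    who = manifest.get("metadata", {}).get("annotations", {}).get(INITIATOR_KEY)
    return bool(who)

# A request Alice triggered via a chatbot is accepted; an unattributed one is not.
req = annotate_request({"kind": "Deployment", "metadata": {"name": "web"}},
                       "alice@example.com")
print(admit(req))                                      # True
print(admit({"kind": "Deployment", "metadata": {}}))   # False
```

With this in place, the audit question from earlier — “was it Alice, a Slackbot, or a GitHub Action?” — has an answer recorded on the request itself.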

2. For Permission Escalation: Dynamic Runtime Policies 🚦

  • Beyond RBAC: Move beyond static RBAC. Implement dynamic runtime policies that understand context and enforce granular permissions.
  • Granular Control: These policies can prevent agents from performing actions like increasing replicas beyond a specific number (e.g., “not more than 5”) or accessing sensitive resources under certain conditions.
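A minimal sketch of such a context-aware rule — the kind of check a policy engine like Kyverno or OPA could enforce at admission time. The 5-replica cap echoes the example from the permission-escalation gap; the function shape and threshold are illustrative.

```python
def evaluate_scale_request(current: int, requested: int,
                           max_replicas: int = 5) -> tuple[bool, str]:
    """Dynamic runtime policy: unlike RBAC, the request's *content*
    (the desired replica count) drives the decision."""
    if requested > max_replicas:
        return False, f"denied: {requested} exceeds the {max_replicas}-replica cap"
    return True, "allowed"

print(evaluate_scale_request(2, 3))   # (True, 'allowed')
print(evaluate_scale_request(2, 12))  # (False, 'denied: ...')
```

The same pattern extends to other contextual limits — time of day, namespace sensitivity, or cost budgets — anything static RBAC cannot see.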

3. For Invisible Tool Chain: Comprehensive Instrumentation 📊🔍

  • OpenTelemetry for GenAI: Leverage OpenTelemetry, which now includes a General Availability (GA) spec for GenAI features.
  • Trace Every Call: Instrument your AI agents so that every internal tool invocation, every LLM call, and the reasoning behind it is recorded in your audit trails. This provides a complete, transparent view of the agent’s actions and the tools it uses.
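To show the shape of the data such instrumentation produces, here is a hand-rolled span recorder that mimics what tracing each tool call would capture — one trace ID tying the whole chain together, with the agent’s reasoning attached as an attribute. This is an illustrative stand-in; in production you would emit real spans through the OpenTelemetry SDK, and all names and attributes below are assumptions.

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Span:
    """A minimal span: what OTel-style instrumentation records per tool call."""
    name: str
    attributes: dict
    trace_id: str
    start: float = field(default_factory=time.monotonic)

trace_log: list[Span] = []

def record_tool_call(trace_id: str, tool: str, reasoning: str, **attrs) -> Span:
    """Record one internal tool invocation, including why the agent made it."""
    span = Span(name=f"tool.{tool}", trace_id=trace_id,
                attributes={"agent.reasoning": reasoning, **attrs})
    trace_log.append(span)
    return span

# One trace ID spans the agent's whole (previously invisible) tool chain.
trace_id = uuid.uuid4().hex
record_tool_call(trace_id, "kubectl", "fetch secret to read DB endpoint",
                 resource="secrets/db")
record_tool_call(trace_id, "cloud_cli", "scale node pool for new replicas",
                 action="resize")

print([s.name for s in trace_log])  # ['tool.kubectl', 'tool.cloud_cli']
```

The payoff: the “invisible tool chain” from gap 3 becomes a single queryable trace, and the reasoning attribute restores the “why” that human-facing security checks always had.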

Securing Our Intelligent Future 🚀🔒

AI agents are here to stay, and their capabilities will only grow. As they become increasingly integrated into our production environments, we must evolve our security practices to match their unique operational model. By focusing on attribution, dynamic permissions, and comprehensive instrumentation, we can empower our AI agents to operate securely and accountably.

Want to dive deeper into the technical implementation? Atulpriya has written a detailed blog post with demos and Kubernetes policies that you can explore! Connect with him on LinkedIn via the QR code from his talk to discuss further.

Let’s build a future where our most privileged users – human or AI – are always secure and accountable.
