Building an Ironclad Castle: GitOps-Driven Multi-Tenant Isolation in Kubernetes 🚀
Ever dreamed of a perfect platform? One where your Kubernetes clusters hum along, applications sync flawlessly, and tenants coexist peacefully? Well, as Luke Phillips, a Principal Engineer at FICO, and Christian Hernandez, a Technical Marketing Engineer at Isovalent (and an Argo CD maintainer!), showed us, the reality of multi-tenant Kubernetes can quickly turn that dream into a thrilling challenge!
In a live, high-stakes demo, Luke (the diligent Platform Engineer at “BigCorp”) faced off against Christian (the “rogue” Tenant A). Their mission: to transform a seemingly perfect, yet vulnerable, multi-tenant platform into an unbreachable fortress using the power of GitOps, Argo CD, and Cilium. Let’s dive into their journey!
The Illusion of Isolation: When “Perfect” Isn’t Enough 🛡️
Luke proudly presented his “perfect platform.” Tenants were neatly divided by Kubernetes namespaces, workloads were running, and Argo CD was syncing in the background. What more could anyone want?
Christian, playing the part of the curious (and slightly mischievous) Tenant A,
quickly put that perfection to the test. He immediately tried to curl another
tenant’s application endpoint. Surprise! He got access! Then he tried the
shared config API, and even the admin endpoint. All accessible!
💡 The Hard Truth: Kubernetes namespaces are fantastic for organization, but they offer zero network isolation by default. The default Kubernetes network model is flat – meaning all pods can reach each other. This is by design, making it easy to get started, but it’s a huge security gap in a multi-tenant environment.
Blocking the Basics: L3/L4 Network Policies 🧱
Luke, witnessing Tenant A’s network exploration, realized his platform needed some serious hardening. His first line of defense: Kubernetes Network Policies. He applied a default policy to block all ingress and egress for all tenants.
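To make the default-deny idea concrete, here is a minimal sketch that builds such a policy per tenant namespace and prints it as JSON. The namespace names and the policy name are hypothetical, and this is not the manifest from the demo, just one common way to express "block everything" with a standard Kubernetes NetworkPolicy.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that denies all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # "default-deny-all" is an illustrative name, not one from the talk.
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Declaring both types without any allow rules denies both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def policy_list(namespaces: list) -> dict:
    """Wrap one policy per namespace in a v1 List so it can be applied in one go."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [default_deny_policy(ns) for ns in namespaces],
    }


if __name__ == "__main__":
    # Hypothetical tenant namespaces.
    print(json.dumps(policy_list(["tenant-a", "tenant-b"]), indent=2))
```

Because the API server accepts JSON as well as YAML, the printed output could be piped to kubectl apply -f - on a test cluster. Any shared service that tenants still need (DNS, for example) would require an explicit allow rule on top of this.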
Christian tried his curl commands again. This time, the application endpoint
timed out – traffic was blocked! Progress! But then he tried the shared config
API and the admin path again. Still accessible!
🚧 The Challenge: While Kubernetes Network Policies provide essential Layer 3 (IP) and Layer 4 (port/protocol) blocking, they aren’t granular enough. They couldn’t differentiate between a legitimate shared config access and a rogue attempt to hit an admin path.
Unleashing L7 Power with Cilium 🚀
Enter Cilium Network Policies. Luke, having “stumbled upon” Cilium, knew this was the upgrade his platform needed. After cleaning up the basic K8s policies, he deployed comprehensive Cilium policies.
✨ The Game Changer: Cilium network policies, powered by eBPF, are a superset of Kubernetes policies. They extend isolation all the way up to Layer 7 (Application Layer). This means you can filter traffic based on:
- HTTP paths and methods (e.g., only allow GET requests to /api/v1/config, but deny access to /api/v1/admin).
- DNS requests.
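A Layer 7 rule of this kind might look roughly like the sketch below, which builds a CiliumNetworkPolicy as a Python dict and prints it as JSON. The two paths come from the demo; the namespace, app label, port, and policy name are assumptions made for illustration.

```python
import json


def config_read_only_policy(namespace: str, app_label: str, client_namespace: str) -> dict:
    """Allow only GET /api/v1/config from one tenant namespace to the shared API."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        # Hypothetical policy name.
        "metadata": {"name": "allow-config-read-only", "namespace": namespace},
        "spec": {
            # Pods the policy protects (assumed to carry an "app" label).
            "endpointSelector": {"matchLabels": {"app": app_label}},
            "ingress": [
                {
                    # Callers are matched by the namespace label Cilium attaches to pods.
                    "fromEndpoints": [
                        {"matchLabels": {"k8s:io.kubernetes.pod.namespace": client_namespace}}
                    ],
                    "toPorts": [
                        {
                            # Port 8080 is an assumption; Cilium expects the port as a string.
                            "ports": [{"port": "8080", "protocol": "TCP"}],
                            # Only requests matching an HTTP rule pass; others get a 403.
                            "rules": {"http": [{"method": "GET", "path": "/api/v1/config"}]},
                        }
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(config_read_only_policy("shared", "config-api", "tenant-a"), indent=2))
```

Since /api/v1/admin matches no HTTP rule, a request to it from the allowed namespace is rejected at the proxy rather than silently dropped, which is consistent with the 403 seen in the demo.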
Christian re-ran his tests. The shared config API returned a 200 OK
(expected!), but the admin path now returned a crisp 403 Forbidden! And
crucially, the error message clearly indicated that the Cilium Envoy proxy
blocked the request at the HTTP path level. Intelligent blocking!
Beyond the Network: Securing the Deployment Plane with Argo CD ⚙️
Network locked down? Christian, the rogue tenant, simply changed tactics. He tried to deploy a new application into another tenant’s namespace using Argo CD. And guess what? It worked! A new pod popped up in Tenant B’s namespace.
🚨 The Next Vulnerability: Argo CD’s default project is highly permissive. It ships with wildcard (“star”) settings, meaning it can deploy from any source repository to any destination cluster and namespace. Argo CD projects are your orchestration boundary, and they need to mirror your network policies.
Luke quickly implemented stricter Argo CD project configurations:
- Tenant A’s project was now restricted to deploying only into Tenant A’s namespace.
- He emphasized setting the default project to empty configurations or even removing it entirely to prevent accidental broad permissions.
- He also added RBAC configurations directly into the projects, defining precisely what actions each tenant role could perform.
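The three changes above could be sketched as follows: a per-tenant AppProject pinned to one namespace with a project role, plus a default project emptied of sources and destinations. The repository URL, role name, and chosen actions are hypothetical; the demo’s actual manifests were not shown in this recap.

```python
import json

# The in-cluster API server address Argo CD uses for its local cluster.
IN_CLUSTER = "https://kubernetes.default.svc"


def tenant_project(tenant: str, repo_url: str) -> dict:
    """AppProject that may only deploy into the tenant's own namespace."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "AppProject",
        "metadata": {"name": tenant, "namespace": "argocd"},
        "spec": {
            # Only this repository may be used as a source (hypothetical URL).
            "sourceRepos": [repo_url],
            # Only this namespace on the local cluster is a valid destination.
            "destinations": [{"server": IN_CLUSTER, "namespace": tenant}],
            "roles": [
                {
                    # Hypothetical role: may view and sync apps in this project only.
                    "name": "developer",
                    "policies": [
                        f"p, proj:{tenant}:developer, applications, get, {tenant}/*, allow",
                        f"p, proj:{tenant}:developer, applications, sync, {tenant}/*, allow",
                    ],
                }
            ],
        },
    }


def locked_down_default_project() -> dict:
    """Default project with no permitted sources or destinations."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "AppProject",
        "metadata": {"name": "default", "namespace": "argocd"},
        "spec": {"sourceRepos": [], "destinations": []},
    }


if __name__ == "__main__":
    project = tenant_project("tenant-a", "https://example.com/bigcorp/tenant-a.git")
    print(json.dumps(project, indent=2))
    print(json.dumps(locked_down_default_project(), indent=2))
```

With a project like this in place, an Application in the tenant-a project that targets another namespace should fail validation instead of syncing.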
After the cleanup, Christian tried to deploy his rogue workload again. Blocked! Argo CD prevented the deployment into the unauthorized namespace.
💪 The Takeaway: Achieving true multi-tenant isolation requires a dual-layer approach: network isolation with Cilium and orchestration isolation with finely-tuned Argo CD projects and RBAC.
Scaling Security: GitOps and Argo CD Application Sets 📈
With five tenants, manual configuration of policies and projects was manageable.
But the mandate was clear: 100 tenants by Monday! Handcrafting configurations
for each tenant is simply not sustainable. Plus, out-of-band changes (someone
running kubectl edit on a Cilium policy) could easily undermine security.
🌟 The Solution: GitOps to the rescue, specifically Argo CD Application Sets.
- Luke demonstrated how Application Sets use generators (like a Git generator based on folder structure) to dynamically create Argo CD applications from templates.
- By simply adding a new folder for “Tenant C” to the Git repository and pushing it, Argo CD automatically onboarded the new tenant, deploying their workload, project, and all necessary Cilium policies.
- Self-healing became a critical feature: if anyone made an out-of-band change to a Cilium policy, Argo CD would detect the drift and automatically revert it to the desired state defined in Git.
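The folder-per-tenant pattern described above can be approximated with the sketch below, which builds an ApplicationSet using the Git directory generator and automated sync with self-heal. The repository URL, the tenants/* layout, and the convention that project and namespace share the folder name are all assumptions, not details confirmed by the talk.

```python
import json


def tenant_applicationset(repo_url: str, tenants_glob: str = "tenants/*") -> dict:
    """ApplicationSet that creates one Application per tenant folder in Git."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": "tenants", "namespace": "argocd"},
        "spec": {
            "generators": [
                {
                    # The Git directory generator emits one parameter set per matching folder.
                    "git": {
                        "repoURL": repo_url,
                        "revision": "HEAD",
                        "directories": [{"path": tenants_glob}],
                    }
                }
            ],
            "template": {
                # {{path.basename}} is filled in by Argo CD with the folder name.
                "metadata": {"name": "{{path.basename}}"},
                "spec": {
                    # Assumes an AppProject named after the tenant folder exists.
                    "project": "{{path.basename}}",
                    "source": {
                        "repoURL": repo_url,
                        "targetRevision": "HEAD",
                        "path": "{{path}}",
                    },
                    "destination": {
                        "server": "https://kubernetes.default.svc",
                        "namespace": "{{path.basename}}",
                    },
                    # selfHeal reverts out-of-band edits; prune removes deleted resources.
                    "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical repository URL.
    print(json.dumps(tenant_applicationset("https://example.com/bigcorp/platform.git"), indent=2))
```

Under these assumptions, adding a tenants/tenant-c folder and pushing it is all it takes for the generator to produce a new Application on its next reconciliation.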
💯 The Impact: Automation, consistency, and unparalleled scalability. Onboarding new tenants became a simple Git commit.
Seeing is Believing: Network Observability with Hubble 👁️
With hundreds of tenants and policies, how do you know your network isolation is truly working?
🔭 The Answer: Hubble, Cilium’s built-in observability platform. Hubble provides:
- Visualizations of network flows.
- Real-time insights into policy denials (like that 403 Forbidden for the admin path).
- Layer 7 details (HTTP paths, methods) for blocked requests.
This allows platform engineers to verify policies, troubleshoot issues, and see exactly why traffic is being blocked, even in a massive, complex environment.
The Ultimate Lockdown: Identity-Aware Isolation 🔐
The final frontier of security: Argo CD’s sync process itself. By default, Argo CD runs with a highly privileged service account (often cluster-admin). If a rogue tenant could somehow inject into this process, they could gain immense power.
🛡️ The Ultimate Defense: A two-pronged approach:
- Argo CD Service Account Impersonation: Luke enabled this feature in Argo CD. This allows Argo CD to impersonate a specific, less privileged service account when syncing a tenant’s application. He restricted Tenant A’s project to use a dedicated tenant-A service account.
- Cilium Service Account-Aware Network Policies: Cilium allows you to write network policies that not only match IP addresses, ports, and HTTP paths but also Kubernetes service accounts.
By combining these, the platform achieved identity-aware isolation. Even if an application sync somehow went rogue, the impersonated service account would lack the necessary permissions, and Cilium would block any unauthorized network traffic based on that specific service account’s identity.
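As a rough sketch of how the two pieces might be wired together, the code below builds three fragments: a patch that switches on sync impersonation in the argocd-cm ConfigMap, an AppProject fragment binding a namespace to a service account, and a Cilium policy that selects endpoints by service account label. Service account names are hypothetical and written in lowercase because Kubernetes object names cannot contain capitals. Impersonation is a relatively new Argo CD feature, so treat the exact keys as something to confirm against the documentation for your version.

```python
import json

IN_CLUSTER = "https://kubernetes.default.svc"
# Label Cilium derives from the service account a pod runs as.
SA_LABEL = "io.cilium.k8s.policy.serviceaccount"


def impersonation_settings_patch() -> dict:
    """Merge-patch fragment for the argocd-cm ConfigMap enabling sync impersonation."""
    return {"data": {"application.sync.impersonation.enabled": "true"}}


def project_service_account_binding(tenant: str, service_account: str) -> dict:
    """AppProject fragment: sync into the tenant namespace as a restricted account."""
    return {
        "spec": {
            "destinationServiceAccounts": [
                {
                    "server": IN_CLUSTER,
                    "namespace": tenant,
                    # Hypothetical account; it needs RBAC only for the tenant namespace.
                    "defaultServiceAccount": service_account,
                }
            ]
        }
    }


def service_account_ingress_policy(namespace: str, server_sa: str, client_sa: str) -> dict:
    """Allow ingress to pods running as server_sa only from pods running as client_sa."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": f"allow-{client_sa}-to-{server_sa}", "namespace": namespace},
        "spec": {
            "endpointSelector": {"matchLabels": {SA_LABEL: server_sa}},
            "ingress": [{"fromEndpoints": [{"matchLabels": {SA_LABEL: client_sa}}]}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(impersonation_settings_patch(), indent=2))
    print(json.dumps(project_service_account_binding("tenant-a", "tenant-a-sync"), indent=2))
    print(json.dumps(service_account_ingress_policy("tenant-a", "backend", "frontend"), indent=2))
```

One design point worth keeping in mind: the Cilium label reflects the service account a workload pod runs as, while the impersonated account governs what the sync itself may create, so the two controls cover different parts of the path.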
Your Perfect Platform, Truly Achieved! ✨
Luke and Christian masterfully demonstrated how to build a truly robust, scalable, and secure multi-tenant Kubernetes platform. Their architecture provides:
- Layer 3 to Layer 7 Traffic Isolation with DNS-aware egress using Cilium Network Policies.
- Orchestration Boundary and RBAC with Argo CD Projects.
- Tenant Self-Service within defined boundaries.
- Scalability for platform engineers and tenant teams using Argo CD Application Sets.
- Self-healing capabilities via Argo CD to prevent out-of-band configuration drift.
- Observability and Verification of network policies with Hubble.
- Identity-Aware Isolation by combining Argo CD Service Account Impersonation with Cilium Service Account-Aware Network Policies.
This is how you move from a seemingly “perfect” but vulnerable platform to one that is genuinely secure, scalable, and resilient. It’s not just about installing tools; it’s about thoughtfully integrating them into a comprehensive GitOps-driven strategy. Your perfect platform awaits!