From Bare Metal to Bulletproof: Securing Your Cloud-Native Kingdom with TPMs and SPIRE 🚀
The quest for secure, modern infrastructure is a constant battle, especially in the dynamic world of cloud-native. Geico Insurance, a company at the forefront of digital transformation, is tackling this challenge head-on, not just in the cloud, but right down to the foundational bare metal of their data centers. Tyler Shade, a Software Engineer at Geico, shared his team’s compelling journey into bootstrapping trust and building a truly secure, identity-first infrastructure.
The “Bare Metal” Blues: Bootstrapping Trust in a Hybrid World 🌐
Moving beyond the carefully managed environments of public clouds presents a unique hurdle: how do you establish the initial, unshakeable trust that your entire infrastructure rests on? In public clouds, providers handle this foundational piece. But in on-prem data centers, you’re building it from the ground up on commodity servers. The core problem? Uniquely and securely proving the identity of each machine. This is where the journey to a robust security posture truly begins.
Identity is King (and Queen, and the Whole Royal Court!) 👑
Shade champions an identity-first approach to security. Forget just locking the doors; we need to know who (or what) is knocking. This means the identity of processes, machines, and even network requests is always in scope. In today’s hybrid and multi-cloud landscape, a simple perimeter defense just won’t cut it. Identity, in its many forms – user, machine, and workload – is the bedrock of true security.
The Limits of Traditional Machine Identity 💾
While cloud providers offer platform-specific solutions for machine identity (like AWS IMDS or Azure Managed Service Identity), these don’t translate to the on-prem world. So, what are the traditional on-prem workarounds, and why do they fall short?
- Static Credentials: Essentially a shared password baked into the machine. Long-lived secrets like these are easy to copy and hard to rotate, which makes them a poor basis for secure machine identity.
- Network-Based Identity: Using DHCP-assigned hostnames or IP addresses to define network zones doesn’t actually identify the machine itself, and these attributes are easily changed or spoofed.
- SSH Certificate Authority (SSHCA): A step up, signing each machine's unique host key into a host certificate, but adoption hasn’t been widespread.
- X.509 Certificates: Better still, but a certificate's private key can be exfiltrated if it sits on disk, so this approach requires stringent permission management and complex provisioning after the OS is installed.
Enter the Trusted Platform Module (TPM): Your Hardware Guardian 🛡️
This is where the magic begins. Shade introduces the Trusted Platform Module (TPM), a secure cryptographic processor found in most modern computers and servers. Think of it as a tamper-resistant vault for your machine’s most sensitive secrets.
TPMs offer a suite of powerful capabilities:
- Hardware Random Number Generator: For truly unpredictable cryptographic operations.
- Secure Key Generation and Persistence: Keys are generated and stored securely, surviving reboots.
- System Integrity Verification: Hashing hardware and software configurations to detect tampering.
- Persistent Non-Volatile Memory: For storing sensitive key material.
- Volatile Memory: For session-based operations.
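System integrity verification relies on the TPM's extend operation. A Platform Configuration Register (PCR) cannot be written directly. It can only be extended, so its value is a running hash over every measurement recorded into it. The sketch below models a single SHA-256 PCR in software. The measuring software hashes each component, and the TPM computes new = SHA256(old || digest). The component names are hypothetical placeholders for real firmware, bootloader, and kernel images.

```python
import hashlib

PCR_SIZE = 32  # SHA-256 digest length; a SHA-256 PCR starts as 32 zero bytes at reset


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """Model a PCR extend: hash the component, then new = SHA256(old || digest)."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()


def measure_boot(components: list[bytes]) -> bytes:
    """Extend each measured component, in order, into a freshly reset PCR."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


# Hypothetical boot components standing in for real binaries.
good_boot = [b"firmware-v1", b"bootloader-v2", b"kernel-6.8"]
tampered_boot = [b"firmware-v1", b"bootloader-EVIL", b"kernel-6.8"]

print(measure_boot(good_boot).hex())
print(measure_boot(good_boot) == measure_boot(tampered_boot))  # False: any change alters the final value
```

Because each step hashes the previous value, the final PCR depends on every measurement and on their order. Software cannot "roll back" a PCR to hide a tampered component.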
At its core, TPM 2.0 features:
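- Endorsement Key (EK): A key pair unique to each TPM, derived from a seed provisioned by the manufacturer (usually accompanied by a manufacturer-signed EK certificate), that serves as the TPM's identity credential.
- Attestation Identity Key (AIK): A temporary, TPM-generated signing key (called the Attestation Key, or AK, in TPM 2.0 specifications) used to prove that an operation was performed by that TPM. Crucially, the TPM performs cryptographic operations internally and never exposes its private keys.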
Spire and Spiffy: Orchestrating Identity from the Ground Up 🛠️
But how do you harness the power of TPMs for machine identity at scale? Enter SPIFFE (the Secure Production Identity Framework for Everyone, a set of standards for attesting and issuing identities) and SPIRE (the SPIFFE Runtime Environment, a CNCF project that implements the SPIFFE specifications). SPIFFE is gaining broad support across the cloud-native ecosystem.
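Every identity SPIFFE issues is named by a SPIFFE ID, a URI of the form `spiffe://<trust-domain>/<path>`. In an X.509-SVID, it is carried as the certificate's URI SAN. The sketch below checks the main syntax rules from the SPIFFE ID specification. The trust domain may contain only lowercase letters, digits, dots, dashes, and underscores. The path may contain only letters, digits, dots, dashes, and underscores. Empty, `.`, and `..` segments are rejected, as are trailing slashes, ports, and query strings. It is a minimal, non-exhaustive sketch rather than a full implementation, and `example.org` and the sample paths are hypothetical.

```python
import re

# Character rules paraphrased from the SPIFFE ID specification.
_TRUST_DOMAIN = re.compile(r"[a-z0-9._-]+")   # lowercase letters, digits, '.', '-', '_'
_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")     # letters, digits, '.', '-', '_'
_PREFIX = "spiffe://"


def parse_spiffe_id(uri: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path); raise ValueError if malformed."""
    if not uri.startswith(_PREFIX):
        raise ValueError("scheme must be spiffe://")
    rest = uri[len(_PREFIX):]
    trust_domain, sep, path = rest.partition("/")
    if not _TRUST_DOMAIN.fullmatch(trust_domain):
        # Also rejects ports (':') and empty trust domains.
        raise ValueError(f"invalid trust domain: {trust_domain!r}")
    if not sep:
        return trust_domain, ""  # an ID naming just the trust domain has no path
    for segment in path.split("/"):
        # An empty segment covers both '//' and a trailing slash.
        if segment in ("", ".", "..") or not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid path segment: {segment!r}")
    return trust_domain, "/" + path


print(parse_spiffe_id("spiffe://example.org/agent/node-1"))
```

Running it prints `('example.org', '/agent/node-1')`. The rules are strict so that two systems comparing IDs byte-for-byte always agree on whether they name the same identity.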
SPIRE’s elegant architecture involves:
- A centralized server with a signing Certificate Authority (CA).
- An agent running on each node, responsible for attesting the node’s identity to the server.
- In return for successful attestation, the server issues the agent an X.509 certificate carrying its SPIFFE identity (an X.509-SVID).
This attestation-based approach, powered by TPMs, is the key to bootstrapping a machine’s root of trust.
The SPIRE TPM Plugin: A Masterclass in Secure Bootstrapping 💡
Shade dives deep into the SPIRE TPM plugin, the secret sauce for enabling TPM attestation. Here’s how this secure bootstrapping flow unfolds:
- Agent Awakens: The SPIRE agent on a node initiates contact with the SPIRE server.
- Server Demands Proof: The server begins the agent attestation process.
- TPM Steps In: The SPIRE agent delegates the attestation task to the TPM plugin, which then opens a session with the TPM.
- EK & AIK in Action: The TPM plugin reads the public EK from the TPM and has the TPM generate a new AIK. Both public keys are sent to the server.
- EK Validation: The SPIRE server hashes the public EK and checks it against a pre-defined allow list. This ensures it belongs to a known and trusted TPM.
- The Challenge: The server generates a secret, encrypts it to the public EK, and binds the challenge to the AIK, so that only a TPM holding both the EK private key and that AIK can recover the secret.
- TPM’s Decryption Dance: The TPM decrypts the secret with its EK private key, releasing it only if the AIK named in the challenge is loaded in that same TPM. The agent then returns the recovered secret to the server.
- Server’s Victory Lap: The server compares the returned secret with its original. A match means success!
- Unique Machine Identity Achieved: With successful validation, the server now has a unique, cryptographically verified identity for that machine.
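The flow above can be traced end to end in a toy simulation. This is a minimal sketch under loud assumptions, not the plugin's implementation. The `ToyTPM` and `ToyServer` classes, the `ek_hash` encoding, and the SPIFFE ID path are all hypothetical. The EK is textbook RSA with tiny primes (p=61, q=53) so the arithmetic stays visible. The binding between the secret and the AIK, which the real TPM2_MakeCredential/TPM2_ActivateCredential pair enforces cryptographically, is modeled here as a plain equality check.

```python
import hashlib
import secrets


class ToyTPM:
    """Software stand-in for a TPM: private values stay inside the object.

    Hypothetical: the EK is textbook RSA with tiny numbers; a real EK is a
    2048-bit RSA or ECC key whose private part never leaves the chip.
    """

    def __init__(self, n: int, e: int, d: int):
        self._n, self._e, self._d = n, e, d      # EK; d is the private exponent
        self._aik_name = secrets.token_hex(16)   # hypothetical AIK identifier

    def ek_public(self) -> tuple[int, int]:
        return self._n, self._e

    def create_aik(self) -> str:
        return self._aik_name

    def activate_credential(self, aik_name: str, ciphertext: int) -> int:
        # Release the secret only if the AIK the server bound into the
        # challenge is resident in this TPM (a plain check in this toy).
        if aik_name != self._aik_name:
            raise PermissionError("AIK not resident in this TPM")
        return pow(ciphertext, self._d, self._n)  # decrypt with the EK private key


def ek_hash(ek_pub: tuple[int, int]) -> str:
    # Hypothetical encoding; a real deployment would hash the EK's encoded public key.
    n, e = ek_pub
    return hashlib.sha256(f"{n}:{e}".encode()).hexdigest()


class ToyServer:
    def __init__(self, trust_domain: str, allowed_ek_hashes: set[str]):
        self.trust_domain = trust_domain
        self.allowed = allowed_ek_hashes
        self._pending: dict[str, int] = {}

    def challenge(self, ek_pub: tuple[int, int], aik_name: str) -> tuple[str, int]:
        h = ek_hash(ek_pub)
        if h not in self.allowed:                # EK validation against the allow list
            raise PermissionError("EK not in allow list")
        n, e = ek_pub
        secret = secrets.randbelow(n - 2) + 2    # fresh secret in [2, n-1]
        self._pending[h] = secret
        return aik_name, pow(secret, e, n)       # secret encrypted to the public EK

    def verify(self, ek_pub: tuple[int, int], answer: int) -> str:
        h = ek_hash(ek_pub)
        if self._pending.pop(h, None) != answer:
            raise PermissionError("challenge failed")
        # Hypothetical ID layout derived from the EK hash.
        return f"spiffe://{self.trust_domain}/spire/agent/tpm/{h}"


# Usage: one successful attestation round.
tpm = ToyTPM(n=3233, e=17, d=2753)               # classic textbook RSA values
server = ToyServer("example.org", {ek_hash(tpm.ek_public())})
aik_name, ciphertext = server.challenge(tpm.ek_public(), tpm.create_aik())
answer = tpm.activate_credential(aik_name, ciphertext)
print(server.verify(tpm.ek_public(), answer))
```

Deriving the identity from the EK hash means that the same physical TPM always maps to the same ID. The secret is single-use (popped on verification), so a recorded answer cannot be replayed.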
Building the Future: Opportunities and Next Steps ✨
Shade also highlighted areas where the community can contribute, particularly in testing TPM Platform Configuration Register (PCR) values to verify that servers boot with a consistent configuration. Geico is actively hiring individuals eager to tackle these complex infrastructure and security challenges, especially those passionate about controlling their own data center environments.
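At its simplest, testing PCR values means comparing the values a machine reports against a known-good baseline. The sketch below assumes the baseline is computed by replaying event-log digests from a reference machine. The PCR indices, component names, and the `replay` and `check_pcrs` helpers are hypothetical. In practice, the reported values would come from a TPM quote signed by the AIK rather than a plain dictionary.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest of a measured component."""
    return hashlib.sha256(data).digest()


def replay(event_digests: list[bytes]) -> bytes:
    """Recompute a SHA-256 PCR from an event log: pcr = SHA256(pcr || digest) per event."""
    pcr = bytes(32)
    for digest in event_digests:
        pcr = hashlib.sha256(pcr + digest).digest()
    return pcr


def check_pcrs(reported: dict[int, bytes], golden: dict[int, bytes]) -> list[int]:
    """Return the PCR indices whose reported value is missing or differs from the baseline."""
    return sorted(i for i, want in golden.items() if reported.get(i) != want)


# Hypothetical baseline: one PCR for firmware, one for the boot chain.
golden = {0: replay([h(b"firmware-v1")]), 4: replay([h(b"shim"), h(b"grub")])}
node = {0: replay([h(b"firmware-v1")]), 4: replay([h(b"shim"), h(b"grub-patched")])}

print(check_pcrs(node, golden))  # [4]
```

Reporting which indices drifted, rather than a single pass/fail, points an operator at the part of the boot chain that changed.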
For those keen to dive deeper, Shade recommends:
- “A Practical Guide to TPM 2.0”: A comprehensive, free resource.
- Code References: Dive into the technical implementation details.
During the Q&A, valuable insights emerged:
- Geico utilizes a streamlined version of Ubuntu 24 for its infrastructure.
- Geico runs Kubernetes on OpenStack virtual machines. Passing host TPMs through to those VMs is challenging because a physical TPM cannot follow a VM during live migration, so the team is actively exploring vTPMs (virtual TPMs) instead. They also attest VMs at the hypervisor level using OpenStack’s instance metadata.
This deep dive into TPMs and Spire showcases a powerful, identity-first approach to securing cloud-native workloads from the very foundation. It’s a testament to building trust, one securely attested machine at a time.