Unleashing the Power of eBPF: Cilium’s DNS Parsing Revolution 🚀

Hey tech enthusiasts! Ever found yourself frustrated by network policies that feel a bit… clunky? Especially when dealing with those ever-changing IP addresses tied to domain names? Well, get ready for some exciting news! Hant, a rockstar Cilium CNCF maintainer and principal engineer at Microsoft on the Azure container networking team, has just dropped a bombshell: Cilium can now parse DNS directly within eBPF! This isn’t just an incremental update; it’s a leap forward that promises to transform how we handle FQDN (Fully Qualified Domain Name) network policies.

Let’s dive into what this means for you and your Kubernetes clusters.

The Old Way: A Tangled Web of User-Space Proxies 🕸️

Before this breakthrough, Cilium’s FQDN network policies worked by intercepting DNS requests. These requests were then shunted off to a user-space DNS proxy. This proxy was the middleman, resolving domain names to IP addresses, checking them against your policies, and then feeding that information back to the Cilium agent.

While this system got the job done, it came with a baggage train of challenges:

  • The Coupling Conundrum: The DNS proxy was tightly coupled with the Cilium agent. Think of it like this: if the agent sneezed, the proxy caught a cold and went down too. This meant that during upgrades or restarts, DNS resolution would fail, leading to those dreaded “blackholed” requests. 😩
  • HA Hiccups: Even with fancy HA (High Availability) setups for standalone proxies, updating the data path with new policies when the Cilium agent was down was a major headache. This “HA” mode was only truly effective if domain names had already been resolved.
  • DNS Protocol Puzzles: Let’s be honest, the DNS wire protocol, born in 1987 (RFC 1035), is a bit of a relic. Features like label compression and truncation make it a complex beast to parse, especially within the strict confines of eBPF programming. 📜
  • eBPF’s Past Limitations: Before kernel 5.3, the eBPF verifier rejected loops outright, so any iteration had to be unrolled at compile time. That made it tough to handle variable-length DNS responses, and the verifier’s safety checks added further complexity to such tasks.
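
To make the label compression problem concrete, here is a minimal user-space sketch in Python of decoding a compressed name as described in RFC 1035. It is not Cilium's eBPF code. The function name `read_name`, the `MAX_STEPS` cap, and the backwards-only pointer rule are illustrative assumptions; the fixed iteration bound mimics the kind of hard limit a verifier-checked loop needs.

```python
from typing import Tuple

MAX_STEPS = 128      # hypothetical cap on labels plus pointer jumps
MAX_NAME_LEN = 255   # RFC 1035 limit on an encoded name, in bytes


def read_name(msg: bytes, offset: int) -> Tuple[str, int]:
    """Decode the domain name that starts at `offset` in a DNS message.

    Returns (name, next_offset), where next_offset is the position just
    past the name in the original byte stream (before any pointer jump).
    """
    labels = []
    next_offset = None   # fixed the first time we follow a pointer
    pos = offset
    total = 0
    for _ in range(MAX_STEPS):            # bounded loop, never `while True`
        if pos >= len(msg):
            raise ValueError("name runs past end of message")
        length = msg[pos]
        if length == 0:                   # root label ends the name
            if next_offset is None:
                next_offset = pos + 1
            return ".".join(labels), next_offset
        if length & 0xC0 == 0xC0:         # top two bits set: compression pointer
            if pos + 1 >= len(msg):
                raise ValueError("truncated compression pointer")
            target = ((length & 0x3F) << 8) | msg[pos + 1]
            if next_offset is None:
                next_offset = pos + 2
            if target >= pos:             # assumption: only backwards pointers
                raise ValueError("pointer does not point backwards")
            pos = target
            continue
        if length & 0xC0:                 # 0x40 and 0x80 prefixes are reserved
            raise ValueError("reserved label type")
        end = pos + 1 + length
        if end > len(msg):
            raise ValueError("label runs past end of message")
        total += length + 1
        if total > MAX_NAME_LEN:
            raise ValueError("name too long")
        labels.append(msg[pos + 1:end].decode("ascii"))
        pos = end
    raise ValueError("too many labels or pointer jumps")


# 12-byte header, "example.com" at offset 12, then "www" + pointer to offset 12.
msg = bytes(12) + b"\x07example\x03com\x00" + b"\x03www\xc0\x0c"
print(read_name(msg, 25))  # ('www.example.com', 31)
```

Every branch either consumes bytes, jumps strictly backwards, or fails, so the loop always terminates within the cap. That is the property an in-kernel parser has to prove to the verifier.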

The New Era: eBPF-Native DNS Parsing Takes Flight! ✈️

This is where the magic happens! The Cilium team has engineered a solution that brings DNS parsing right into the kernel, leveraging the incredible power of eBPF.

Here’s how they’ve cracked the code:

  • Looping with Confidence: Kernel 5.3 introduced bounded loops, but the real game-changer is the bpf_loop() helper function, available in kernel 5.17 and above. This allows eBPF programs to execute loops efficiently and verifiably, which is absolutely critical for parsing DNS messages of all shapes and sizes. 🦾
  • Mastering TCP with sockops and sk_skb: DNS can switch from UDP to TCP when payloads get too large. To handle this gracefully, the team uses the sockops and sk_skb eBPF program types.
    • A sockops program acts as a gatekeeper, intercepting socket connections and registering them in a sockmap.
    • Then, sk_skb programs with stream_parser and stream_verdict attachment types kick in. The stream_parser uses the first two bytes of the TCP payload (the length prefix defined for DNS over TCP in RFC 1035, section 4.2.2) to determine the full DNS message length. 📏
    • The stream_verdict program then dives deep, parsing the entire DNS message, even handling multiple answers!
  • Pre-Allocated FQDN Identities: A smart move here is pre-allocating FQDN identities when network policies are created, rather than waiting until they’re needed. This simplifies everything and ensures identities are always ready. These identities are now elegantly stored in eBPF maps. 💾
  • The eBPF IP Cache: A dedicated eBPF map now serves as the central hub, holding the crucial mapping between IP addresses and their corresponding identities. This makes policy enforcement logic lightning-fast as it can directly query this map. ⚡
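
The TCP framing and IP cache ideas above can be sketched in user-space Python. This is an analogy, not the talk's eBPF code: `split_dns_frames` plays the role of the length-prefix logic, and `FqdnPolicyState` is a toy stand-in for the identity and IP cache maps. Both names, the identity numbering starting at 1000, and the example addresses are assumptions made for illustration.

```python
import struct
from typing import Dict, List, Optional, Tuple


def split_dns_frames(stream: bytes) -> Tuple[List[bytes], bytes]:
    """Split a TCP byte stream into complete DNS messages.

    Over TCP, each DNS message is preceded by a two-byte big-endian length
    (RFC 1035, section 4.2.2). Returns (messages, leftover), where leftover
    holds the bytes of a message that has not fully arrived yet.
    """
    messages = []
    pos = 0
    while len(stream) - pos >= 2:
        (length,) = struct.unpack_from("!H", stream, pos)
        if len(stream) - pos - 2 < length:
            break                          # wait for the rest of this message
        messages.append(stream[pos + 2:pos + 2 + length])
        pos += 2 + length
    return messages, stream[pos:]


class FqdnPolicyState:
    """Toy model of two maps: FQDN -> identity and IP -> identity."""

    def __init__(self, allowed_fqdns: List[str]) -> None:
        # Identities are allocated up front, when the policy is created.
        # The 1000+ numbering is hypothetical.
        self.fqdn_identity: Dict[str, int] = {
            name.lower().rstrip("."): 1000 + i
            for i, name in enumerate(allowed_fqdns)
        }
        self.ipcache: Dict[str, int] = {}

    def learn(self, fqdn: str, ip: str) -> bool:
        """Record an IP seen in a DNS answer for `fqdn`.

        Returns False (and stores nothing) if the name is not in the policy.
        """
        identity = self.fqdn_identity.get(fqdn.lower().rstrip("."))
        if identity is None:
            return False
        self.ipcache[ip] = identity
        return True

    def identity_of(self, ip: str) -> Optional[int]:
        """Policy enforcement only needs this single lookup per packet."""
        return self.ipcache.get(ip)


frames, leftover = split_dns_frames(b"\x00\x03abc\x00\x02de\x00\x05xy")
print(frames, leftover)  # [b'abc', b'de'] b'\x00\x05xy'
```

Because the identity for each allowed name exists before any DNS answer arrives, learning a new IP is just one map write, and the enforcement path never has to ask a user-space component for anything.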

The Astonishing Impact: What This Means for You ✨

This eBPF-native approach isn’t just a technical feat; it brings tangible benefits:

  • Goodbye, Single Point of Failure! By ditching the user-space DNS proxy, Cilium eliminates a major potential point of failure. This means significantly enhanced cluster stability and availability. 🙌
  • True HA for FQDN Policies: Your FQDN policies are now resilient! Even if the Cilium agent takes a brief nap during a restart, FQDN policy enforcement continues seamlessly because the parsing logic lives in the kernel. 🛡️
  • Blazing Fast Performance: Parsing in the kernel avoids the round trip to a user-space proxy, so latency for DNS resolution and policy enforcement should drop. Exact numbers are still waiting on benchmarks, but your applications will thank you! 💨
  • Fortified Security: Direct kernel-level parsing makes DNS hijacking a much tougher nut to crack. The entire process is observed and controlled within the secure eBPF environment. 🔐
  • Proxyless DNS Visibility? This innovation opens the door to potential proxyless DNS visibility, offering richer telemetry without the overhead of a dedicated proxy. Imagine getting more insights with less impact! 🧐

A Glimpse into the Future and How You Can Get Involved 🚀

The demo on a two-node Kind cluster was impressive! Watching the eBPF parser dissect DNS queries, handle compression, resolve IPs, and correctly block unauthorized domains was a testament to the power of this solution.

The journey isn’t over, though! Future plans include:

  • Community Feedback: Publishing a CFP (Cilium Feature Proposal) to gather feedback from the wider Cilium community.
  • Wildcard Wonders: Implementing support for wildcard matching in FQDN policies (e.g., *.something.com). 🃏
  • Performance Deep Dive: Rigorous benchmarking to quantify the exact performance gains. 📊
  • Expanding DNS Visibility: Leveraging the eBPF parser to enhance proxyless DNS visibility capabilities.
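
Wildcard support is still a plan, so the talk does not define its semantics. As a purely hypothetical sketch in Python, assuming that `*` stands for exactly one label and that comparison is case-insensitive, matching could look like this (`matches_wildcard` is an invented name):

```python
def matches_wildcard(pattern: str, name: str) -> bool:
    """Label-by-label match where "*" matches exactly one label (assumed)."""
    p_labels = pattern.lower().rstrip(".").split(".")
    n_labels = name.lower().rstrip(".").split(".")
    if len(p_labels) != len(n_labels):
        return False
    return all(p == "*" or p == n for p, n in zip(p_labels, n_labels))


print(matches_wildcard("*.something.com", "api.something.com"))  # True
print(matches_wildcard("*.something.com", "something.com"))      # False
```

Comparing whole labels keeps the work per name bounded by the label count, which would suit a verifier-checked loop, though the real design may well choose different rules.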

And here’s a golden opportunity: the Azure Container Networking team is actively looking for talented individuals to join them in building these cutting-edge networking solutions! If you’re passionate about this space, consider reaching out. 👨‍💻👩‍💻

This eBPF-native DNS parsing in Cilium is a monumental step forward. It’s a testament to the innovation happening in the cloud-native networking space and a clear indication of what’s possible when we push the boundaries of kernel technology. Get ready for more performant, reliable, and secure network policies!
