pidfd: What have we been up to?

Presenters

Christian Brauner

Source

All Systems Go! 2025

Diving Deep into Linux Namespaces: A Look at pidfds and the Future of Containerization 🚀

Ever wondered how Linux manages containers and isolates processes? It’s a surprisingly intricate dance of kernel features, and the recent presentation by Christian Brauner, alongside David Howell and Joseph Saffer, offered a fascinating glimpse into the ongoing evolution. This isn’t just about making containers work; it’s about designing a kernel that’s flexible, secure, and adaptable. Let’s break down the key takeaways. It’s a journey worth taking! 🛠️

What Are pidfds and Why Do We Need Them? 💡

At the heart of this evolution lies the pidfd, a process file descriptor. Think of it as a file descriptor that refers to a process rather than to a file. 🤯 Why the shift? Because identifying and managing processes by numeric PID had become cumbersome and limiting: a PID can be recycled for an unrelated process, while a pidfd keeps referring to the process it was opened for. pidfds offer a more consistent and powerful API, enabling new features and improvements to kernel functionality.

Here’s a quick rundown of the core concepts:

Globally Unique Identifiers: Each pidfd identifies its process uniquely across the entire system, which is crucial for consistent identification.
Namespaces are Key: pidfds are deeply intertwined with namespaces (PID, mount, network, user, etc.). They provide a way to uniquely identify processes within specific namespaces.
pidfd Sentinel Values: The move towards pidfds comes with new sentinel values such as PIDFD_SELF and PIDFD_SELF_THREAD_GROUP, which simplify common operations by letting a caller refer to itself without opening a pidfd first.
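
To make the core idea concrete, here is a minimal sketch (not code from the talk) that opens a pidfd for a child process, signals it through that descriptor, and waits for it to exit. It assumes Linux 5.3 or newer and Python 3.9 or newer; the helper name terminate_via_pidfd is made up for illustration, and the function returns None wherever pidfds are unavailable.

```python
import os
import select
import signal
import subprocess
import sys


def terminate_via_pidfd(timeout_s=10.0):
    """Spawn a sleeping child, stop it via a pidfd, and return its exit status.

    Returns None when pidfds are unavailable (non-Linux platform,
    old Python or kernel, or a sandbox that blocks the system calls).
    """
    if not (hasattr(os, "pidfd_open") and hasattr(signal, "pidfd_send_signal")):
        return None
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    pidfd = None
    try:
        # The pidfd refers to this exact process, even if its PID is reused later.
        pidfd = os.pidfd_open(child.pid)
        signal.pidfd_send_signal(pidfd, signal.SIGTERM)
        # A pidfd becomes readable once the process it refers to has exited.
        poller = select.poll()
        poller.register(pidfd, select.POLLIN)
        poller.poll(int(timeout_s * 1000))
        return child.wait()
    except OSError:
        # Fall back to the classic PID-based cleanup path.
        child.kill()
        child.wait()
        return None
    finally:
        if pidfd is not None:
            os.close(pidfd)


if __name__ == "__main__":
    print(terminate_via_pidfd())
```

Because the signal goes to the descriptor rather than to a number, there is no window in which a recycled PID could receive it by mistake.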

Recent Developments: Unique Identifiers & API Enhancements 🌐

The presentation highlighted some exciting new features and improvements:

Unique Namespace Identifiers (“Cookies”): These non-recyclable identifiers are now available for all namespace types. Accessing them is currently done via socket options – a bit of a historical quirk.
System Call Evolution: The kernel is gradually adopting pidfds where regular file descriptors were previously used, often by “overloading” existing system calls. This is an incremental process, leading to some API compromises.
Mount API Injection: The ability to inject mounts into containers using a specific file descriptor provides increased flexibility in container creation and management.
Systemd Integration: The design has been heavily influenced by systemd, with consideration for how these technologies can be exposed and utilized within the systemd ecosystem.
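
The socket-option route for namespace cookies mentioned above can be sketched for the network namespace, which is the one case this example is confident about. It assumes Linux 5.14 or newer and uses the SO_NETNS_COOKIE option; the literal value 71 is an assumption that holds on x86 and most other architectures but not all, and netns_cookie is a made-up helper name.

```python
import socket
import struct

# Python's socket module may not export SO_NETNS_COOKIE, so fall back to the
# value used on x86 and most other architectures. This literal is an
# assumption; check <asm/socket.h> on your platform.
SO_NETNS_COOKIE = getattr(socket, "SO_NETNS_COOKIE", 71)


def netns_cookie():
    """Return the cookie of the caller's network namespace, or None."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            raw = sock.getsockopt(socket.SOL_SOCKET, SO_NETNS_COOKIE, 8)
    except OSError:
        # Older kernel, different platform, or sockets not permitted.
        return None
    if len(raw) != 8:
        return None
    # The cookie is an unsigned 64-bit integer in native byte order.
    return struct.unpack("=Q", raw)[0]


if __name__ == "__main__":
    print(netns_cookie())
```

Any socket created in the same network namespace reports the same cookie, which is what makes it usable as a stable namespace identifier.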

Navigating the Challenges: Complexity vs. Flexibility 🎯

This isn’t a straightforward path. The team openly discussed the challenges and tradeoffs involved:

Incremental Changes & API Design: The incremental transition to pidfds means dealing with awkward API designs and compromises.
The Container Object Debate: Should the kernel have a dedicated “container object”? While it would simplify things, it would also limit flexibility and potentially miss out on benefits from systemd.
UID/GID Mapping Limitations: The current UID/GID mapping system, while flexible, has limitations on the number of containers and potential issues with high UIDs/GIDs.
Security is Paramount: The complexity of the system introduces potential security vulnerabilities, something the team is keenly aware of. They’re striving to avoid creating new “foot guns” and CVEs.
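
The UID/GID mapping limit above comes down to simple arithmetic. The sketch below assumes a hypothetical allocation policy, not one stated in the talk: every container gets its own non-overlapping block of 65536 IDs, the first block stays with the host, and the last block is skipped because it contains 4294967295, which is reserved as the "no change" value (uid_t)-1. The function names are illustrative.

```python
UID_SPACE = 2**32    # uid_t and gid_t are 32 bits wide on Linux
RANGE_SIZE = 65536   # assumed per-container block size (hypothetical policy)


def max_containers(range_size=RANGE_SIZE):
    """Blocks left after reserving the first (host) and last (sentinel) block."""
    return UID_SPACE // range_size - 2


def uid_map_line(index, range_size=RANGE_SIZE):
    """Build the uid_map entry for the index-th container (0-based).

    The format is "<first ID inside> <first ID outside> <count>".
    """
    if not 0 <= index < max_containers(range_size):
        raise ValueError("no free ID range left for this container")
    outside_start = (index + 1) * range_size
    return f"0 {outside_start} {range_size}"


if __name__ == "__main__":
    print(max_containers())
    print(uid_map_line(0))
```

Under these assumptions the ceiling is 65534 containers, and the later containers end up with host-side IDs in the billions, which is where tools that mishandle high UIDs/GIDs start to matter.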

Looking Ahead: The Future of Linux Namespaces ✨

So, what does the future hold for Linux namespaces? Here’s a glimpse:

Namespace Iteration API: A clean and efficient API to iterate through all namespaces is on the horizon.
pidfd Tag Inheritance: A robust and secure mechanism for inheriting pidfd tags across fork calls is a key area of development.
Generalization of Extended Attributes: The possibility of allowing userspace to define and manage their own extended attributes, with careful consideration for security and resource usage, is being explored.
Pragmatism Over Abstraction: The team emphasizes a pragmatic approach, preferring to provide low-level building blocks that userspace tools can use to build higher-level functionality. They’re wary of introducing overly complex, high-level abstractions directly into the kernel.
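
For context on why a namespace iteration API is wanted, here is a rough sketch of how userspace can approximate it today by walking /proc. This is not the API discussed in the talk, just the existing workaround; namespaces_by_type is a made-up helper name.

```python
import os
from collections import defaultdict


def namespaces_by_type(ns_type="net", proc_root="/proc"):
    """Group visible PIDs by the namespace of the given type they live in.

    Keys are link targets of the form "net:[<inode>]"; values are PID lists.
    """
    groups = defaultdict(list)
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return {}
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        try:
            target = os.readlink(os.path.join(proc_root, entry, "ns", ns_type))
        except OSError:
            continue  # process exited, or we lack permission to look
        groups[target].append(int(entry))
    return dict(groups)


if __name__ == "__main__":
    for namespace, pids in namespaces_by_type("net").items():
        print(namespace, len(pids))
```

The walk is racy, needs permission to inspect each process, and cannot see namespaces that are kept alive only by an open file descriptor or a bind mount. Those gaps are what a dedicated iteration API would close.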

Key Takeaways: A Kernel Developer’s Perspective 💾

This presentation wasn’t just about technology; it was a window into the mindset of kernel developers. Here’s what stood out:

David Howell’s Pragmatism: He prioritizes incremental improvements over grand, sweeping changes.
Focus on Low-Level Building Blocks: The team prefers to provide foundational tools, letting userspace tools handle higher-level abstractions.
Resistance to “Container” Concepts: There’s a reluctance to define “containers” or other high-level abstractions within the kernel itself.
Collaboration is Key: The team is open to collaborating with userspace tools like systemd to leverage the power of pidfds.

It’s a fascinating journey, demonstrating the ongoing evolution of Linux and the constant balancing act between flexibility, security, and practicality. 📡
