Unveiling the Security Secrets of Academic Open Source 🛡️: A Deep Dive into UC System Projects
Ever wondered about the security of the open source projects born from our academic institutions? Juanita, a PhD candidate at UC Santa Cruz and a dedicated Python community member, recently pulled back the curtain on the open source landscape within the University of California (UC) system. Her groundbreaking research reveals a picture that’s both fascinating and, frankly, a little concerning when it comes to security best practices. Let’s dive into what she discovered! 🚀
Mapping the Digital Footprint: How Many UC Projects Are Out There? 🗺️
Juanita’s journey began with a monumental task: mapping the vast ocean of UC-affiliated open source projects. Imagine sifting through thousands of repositories! She used clever keyword searches across campus systems and the power of GitHub’s REST API to gather data from an astonishing 52,000 repositories. To make sense of this deluge, she employed the smarts of machine learning and LLMs, estimating that approximately 50% of these repositories are indeed tied to the UC system. That’s a significant digital footprint!
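For readers who want to try something similar, here’s a minimal sketch of the kind of keyword-driven harvesting described above, using GitHub’s REST search API. The keywords, pagination depth, and helper name are illustrative assumptions, not Juanita’s actual pipeline:

```python
import requests

API = "https://api.github.com/search/repositories"
KEYWORDS = ["ucsc.edu", "berkeley.edu", "ucla.edu"]  # illustrative campus terms

def search_repos(keyword, token, pages=3):
    """Collect repository metadata matching one campus-affiliated keyword."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    repos = []
    for page in range(1, pages + 1):
        resp = requests.get(
            API,
            headers=headers,
            params={"q": keyword, "per_page": 100, "page": page},
        )
        resp.raise_for_status()
        repos.extend(resp.json().get("items", []))
    return repos
```

Note that the search API caps each query at 1,000 results, which is one reason assembling 52,000 candidates takes many query variations, with ML and LLM classification doing the filtering afterwards.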
The Uncomfortable Truth: Security Lapses in Academic Code 🚨
While many academic projects shine in their descriptions and READMEs (a positive sign of good documentation habits!), the security posture tells a different story. The findings are stark:
- Security Policies are Almost Non-Existent: A jaw-dropping 0.1% of the 25,000 identified projects (roughly 25 repositories) actually have a security policy in place. Yes, you read that right. This is a critical gap that leaves many projects vulnerable.
- Licenses are Often Missing: Even among the top 2,000 projects ranked by stars (a proxy for more established projects), a significant number lack detected licenses. Alarmingly, six of the top 10 projects flagged with “no license detected” genuinely had no license at all (rather than one the tooling failed to recognize), yet still held a criticality score above 0.4. A sketch of how checks like these can be automated follows this list.
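Here’s a minimal sketch of checking a single repository for both gaps via the GitHub REST API. The SECURITY.md locations and helper names are my assumptions about how such an audit could be scripted:

```python
import requests

def has_security_policy(owner, repo, headers):
    # GitHub recognizes SECURITY.md in the repo root, .github/, or docs/.
    for path in ("SECURITY.md", ".github/SECURITY.md", "docs/SECURITY.md"):
        url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
        if requests.get(url, headers=headers).status_code == 200:
            return True
    return False

def detected_license(owner, repo, headers):
    # The licenses endpoint returns 404 when no license file is detected.
    url = f"https://api.github.com/repos/{owner}/{repo}/license"
    resp = requests.get(url, headers=headers)
    return resp.json()["license"]["spdx_id"] if resp.ok else None
```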
Gauging Importance: The Criticality Score 💯
To understand which projects matter most, Juanita turned to the OpenSSF’s criticality score. This metric, ranging from 0 to 1, reflects a project’s influence based on factors like contributor count and dependencies. Think of giants like Linux (0.8), Chromium (0.7), NumPy (0.7), and scikit-learn (0.74) as benchmarks.
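For the curious, the published criticality_score algorithm boils down to a weighted sum of log-scaled signals, normalized into [0, 1]. Here’s a paraphrase in Python; the example signal values, weights, and thresholds are invented for illustration:

```python
import math

def criticality(signals):
    """signals: list of (value, weight, threshold) tuples, one per signal."""
    total_weight = sum(weight for _, weight, _ in signals)
    score = sum(
        weight * math.log(1 + value) / math.log(1 + max(value, threshold))
        for value, weight, threshold in signals
    )
    return score / total_weight  # normalized into [0, 1]

# Illustrative numbers: 400 contributors (weight 2, threshold 5000)
# and 12 dependent projects (weight 1, threshold 500000).
print(round(criticality([(400, 2, 5000), (12, 1, 500000)]), 2))  # ~0.53
```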
Within the UC system, while the average criticality score for the top projects wasn’t sky-high, several projects did surpass the 0.5 mark. This highlights a mixed landscape where some academic projects carry considerable weight in the broader open source ecosystem.
Scorecards Reveal Widespread Security Weaknesses 📉
To get a clearer picture of project security, Juanita used the OpenSSF’s Scorecards. The average total score for the top 2,000 projects by stars was a modest 2.5, with a peak of 6.7. While comparable to other published studies (one analysis of peer-reviewed research repositories averaged 3.5), it still signals a clear need for improvement.
Digging deeper into the Scorecards metrics revealed pervasive issues:
- Many Metrics Uncomputed: A common problem was the prevalence of “-1” values, meaning the metric couldn’t be calculated for a large chunk of projects. Metrics like “packaging” and “dangerous workflow,” which have high potential scores, were among those with the most uncomputed values. This suggests limited implementation and awareness. (See the sketch after this list for one way to tally these.)
- Low Averages Across the Board: Many individual metrics showed poor average scores, painting a consistent picture of security gaps.
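Reproducing this kind of tally is straightforward once the Scorecard CLI is installed and a GitHub token is exported. The sketch below runs it per repo and counts the “-1” (uncomputed) checks; the repo list is a placeholder, and the JSON field names reflect the CLI’s output format as I understand it:

```python
import json
import subprocess
from collections import Counter

uncomputed = Counter()
for repo in ["github.com/example-uc-lab/some-project"]:  # placeholder repo
    result = subprocess.run(
        ["scorecard", f"--repo={repo}", "--format=json"],
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    for check in report["checks"]:
        if check["score"] == -1:  # -1 means the metric could not be computed
            uncomputed[check["name"]] += 1

print(uncomputed.most_common(5))
```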
Criticality vs. Security: A Developing Correlation 📈
Here’s a glimmer of hope! The data suggests a positive correlation between a project’s criticality score and its Scorecards total score. As projects become more important, their security practices tend to improve. However, this relationship isn’t perfect, and outliers remind us that even critical projects can have serious security blind spots.
Key Metrics Showing Improvement with Criticality:
- Binary artifacts
- Maintained
- CI tests
- Code review
- Branch protection
- Licenses
- Dependency update tools
Interestingly, some metrics like “packaging” and “signed releases” showed no correlation with criticality, with the vast majority of projects having uncomputed values for these.
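Checking a trend like this is a one-liner once you have per-project score pairs. A minimal sketch using rank correlation (my choice of statistic, not necessarily the one used in the talk), with placeholder data:

```python
from scipy.stats import spearmanr

# Placeholder per-project (criticality, Scorecard total) pairs.
criticality_scores = [0.12, 0.35, 0.41, 0.55, 0.63]
scorecard_totals = [1.8, 2.4, 3.1, 4.0, 5.2]

rho, p_value = spearmanr(criticality_scores, scorecard_totals)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```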
Four Pillars of Enhanced Academic Open Source Security 🛠️
Based on her findings, Juanita pinpointed four critical metrics that desperately need more attention:
- Token Permissions: A high-risk area with huge potential for improvement (see the workflow-permissions sketch after this list). 🔑
- Security Policy: Crucial, easy to implement, and offers substantial security benefits. 📜
- Static Analysis: This metric doesn’t improve with criticality, indicating a widespread need for adoption and education. 💡
- Signed Releases: The current low adoption rate needs a serious boost. Let’s turn those “zeros” into “ones” and “tens”! 🚀
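To make the first pillar concrete: the token-permissions concern essentially asks whether GitHub Actions workflows restrict the default GITHUB_TOKEN. Here’s a rough local approximation; the policy (flag any workflow without a top-level permissions block, or with blanket write-all) is a simplification of what a real checker like Scorecard does:

```python
from pathlib import Path

import yaml  # pip install pyyaml

def risky_workflows(repo_root):
    """Flag GitHub Actions workflows that never restrict GITHUB_TOKEN."""
    flagged = []
    for wf in Path(repo_root, ".github", "workflows").glob("*.y*ml"):
        doc = yaml.safe_load(wf.read_text()) or {}
        perms = doc.get("permissions")
        # No top-level permissions block, or blanket write-all, is risky.
        if perms is None or perms == "write-all":
            flagged.append(wf.name)
    return flagged

print(risky_workflows("."))
```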
The Bus Factor: A Looming Threat to Academic Projects 🚌
Perhaps one of the most alarming findings is the “bus factor”: the smallest number of contributors needed to account for 60% of contributions. A staggering 86% of analyzed projects have a bus factor of one! This means the entire project could grind to a halt if a single individual is no longer involved. Over 95% of projects have a bus factor of one or two. Worryingly, this metric shows little correlation with criticality, meaning even highly important projects are often precariously balanced.
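The definition is simple enough to compute directly. A minimal sketch, assuming you already have a contributor-to-commit-count mapping (e.g., from git shortlog or the GitHub API):

```python
def bus_factor(contributions, threshold=0.60):
    """Smallest number of contributors covering `threshold` of all commits."""
    counts = sorted(contributions.values(), reverse=True)
    target = threshold * sum(counts)
    covered, factor = 0, 0
    for count in counts:
        covered += count
        factor += 1
        if covered >= target:
            break
    return factor

# One dominant contributor yields a bus factor of one:
print(bus_factor({"alice": 880, "bob": 70, "carol": 50}))  # -> 1
```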
Static Analysis: Uncovering Hidden Vulnerabilities 🔎
Running a static analysis tool like Semgrep on the top 100 most critical projects revealed over 2,000 triggered rules! The most frequent rule, “plain text HTTP link,” is a simple fix, raising questions about developer awareness or project inactivity. More concerning were findings like “null library function” and “run shell injection,” with some projects exhibiting a high number of these critical rule triggers. Juanita aptly suggests this points to a significant need for education, as researchers developing these tools aren’t always security experts.
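A scan like this can be approximated with Semgrep’s CLI and a little counting. The invocation below (registry rules via --config auto, JSON output tallied per rule ID) is my guess at a comparable setup, not the study’s exact command:

```python
import json
import subprocess
from collections import Counter

result = subprocess.run(
    ["semgrep", "scan", "--config", "auto", "--json", "path/to/project"],
    capture_output=True, text=True,
)
findings = json.loads(result.stdout)["results"]

# Tally how often each rule fired across the codebase.
by_rule = Counter(f["check_id"] for f in findings)
print(by_rule.most_common(10))
```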
Moving Forward: Education, Outreach, and Better Metrics 🤝
Juanita isn’t just pointing out problems; she’s proposing solutions:
- Direct Outreach: Sharing these findings directly with UC open source projects and even offering to help with minor security improvements via Pull Requests.
- Education is Paramount: Emphasizing that security should be a foundational part of development, not an afterthought. Open Source Program Offices (OSPOs) can be instrumental in providing basic security education to researchers.
- Rethinking Metrics: Exploring the need for improved metrics that offer more actionable insights into security.
Discussion Highlights: Academia vs. Industry & The Bus Factor Challenge 🗣️
During the Q&A, a key question arose: “Is academia not doing that much worse than industry?” Juanita suggests that while the findings are concerning, the core difference often lies in the scale of involvement and external support. Industry projects typically benefit from larger teams and more external scrutiny, which naturally drives best practices.
The challenge of increasing the bus factor was also a hot topic. Juanita proposed enhancing general open source practices, such as providing clear contributing guides, and leveraging showcase websites to boost awareness and encourage more contributions. The idea of creating template repositories with “sane defaults” for essential elements like codes of conduct and issue templates was also put forward as a practical way to improve project onboarding and adherence to best practices.
Juanita’s work is ongoing, and she’s actively seeking feedback and ideas to bolster academic open source security. This research is a crucial step in ensuring the vital projects born from our universities are as secure as they are innovative! ✨