Hugging Face’s Rocket Ride: Scaling AI with MongoDB Atlas 🚀
Hey tech enthusiasts! 👋 Arek Borucki, a Machine Learning Platform and Database Engineer at Hugging Face, recently shared an incredible story about how they’ve scaled their platform to serve nearly 3 million models to a global community of developers, all powered by MongoDB Atlas. If you’re curious about how to handle massive growth in the AI space, buckle up – this is for you!
The Hugging Face Phenomenon: Scale & Growth 📈
Hugging Face isn’t just a niche platform anymore; it’s a powerhouse in the AI world. Let’s look at the numbers that Arek showcased:
- 13+ Million Users: A rapidly expanding user base.
- Nearly 3 Million Public Models: A staggering number, and it’s growing fast. Just two years ago, this number was a mere 20,000!
- 500,000+ Datasets: Fueling the AI revolution.
- 50,000+ Organizations: From hobbyists to Fortune 500 companies (over 30% of Fortune 500 use Hugging Face!).
MongoDB is the backbone of Hugging Face, serving almost all their use cases. While they currently use a replica set, they’re gearing up to shard their collections and leverage sharded clusters to handle even more users, data, and throughput.
Architecture Deep Dive: What Powers the Hub? 💡
Ever wondered how Hugging Face serves all those models so efficiently? Arek broke it down:
- The Flow: User requests hit the Hub frontend, then the Hub API, and finally MongoDB Atlas.
- Source of Truth: MongoDB Atlas is the source of truth for all metadata.
- Model Storage: Crucially, Hugging Face doesn’t store the models themselves in MongoDB. Instead, MongoDB stores pointers to the model artifacts, which physically reside in S3. This is a key architectural decision for efficiency.
- Atlas Search for Speed: They leverage MongoDB Atlas Search to index model cards, dataset cards, and Python application files. This powers their lightning-fast, full-text search capabilities on the Hub.
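To make the "pointers, not payloads" idea concrete, here is a minimal sketch of what a metadata document and a full-text query could look like. It only builds plain Python dicts, so nothing here talks to a real cluster. The field names (`artifact`, `card_text`, `tags`), the index name `cards`, and the bucket and key values are assumptions made up for illustration, not Hugging Face's actual schema; the `$search` stage and its `text` operator are standard Atlas Search syntax.

```python
from __future__ import annotations

from typing import Any


def model_metadata(repo_id: str, bucket: str, key: str, tags: list[str]) -> dict[str, Any]:
    """Build a metadata document that points at an artifact in S3.

    All field names are hypothetical; the point is that the document stores
    a pointer to the weights, never the weights themselves.
    """
    return {
        "_id": repo_id,
        "tags": tags,
        "artifact": {"store": "s3", "bucket": bucket, "key": key},
    }


def search_pipeline(query: str, limit: int = 10) -> list[dict[str, Any]]:
    """Build an aggregation pipeline that starts with an Atlas Search stage."""
    return [
        {
            "$search": {
                "index": "cards",  # hypothetical search index name
                "text": {"query": query, "path": ["card_text", "tags"]},
            }
        },
        {"$limit": limit},
        # Return only the pointer plus the relevance score, not the full card.
        {"$project": {"_id": 1, "artifact": 1, "score": {"$meta": "searchScore"}}},
    ]
```

With a driver such as PyMongo, a pipeline like this would be passed to `collection.aggregate(...)`, and the application would then fetch the actual model files from S3 using the returned pointer.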
A Glimpse into the Hugging Face Hub (Live Demo!) 💻
Arek gave us a peek at the Hugging Face Hub, highlighting its four pillars:
- Models: Searchable by a multitude of attributes (task, parameters, libraries like Transformers). The speed of these searches? Directly powered by MongoDB Atlas metadata.
- Datasets: Similar to models, datasets can be quickly filtered and searched, with MongoDB Atlas enabling efficient querying. Arek even shared a personal anecdote about using MongoDB embeddings from Hugging Face to learn Atlas Vector Search!
- Spaces: The home for AI applications. Arek also showed off Hugging Chat, an AI chatbot that uses MongoDB for its metadata, settings, and conversations. This highlights how MongoDB powers multiple critical services within Hugging Face.
- Buckets (New Feature!): A newer offering, these are cheaper storage solutions for AI agents and teams, leveraging Hugging Face’s CDN and deduplication. Guess what powers these new buckets? A brand new MongoDB Atlas cluster!
Infrastructure & Scaling Strategies 🛠️
Hugging Face runs a robust MongoDB Atlas setup:
- 7-Node Replica Set: This configuration includes three electable nodes (any can become primary) and three read-only nodes to isolate heavy queries. They also utilize one hidden analytics node for ad-hoc queries and dashboards, keeping analytical traffic separate from production workloads.
- Read Preferences: They strategically use `secondaryPreferred` to offload read traffic from the primary and `nearest` for specific tuning scenarios, depending on consistency needs.
- Performance: They utilize fast SSDs and, in some cases, local NVMe disks for enhanced database performance.
- Atlas Autoscaling: A lifesaver! Atlas automatically scales storage when disks reach 80-85% capacity and can even scale cluster tiers during traffic spikes.
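Read preferences are usually set per connection, so here is a small sketch of how a connection string with `readPreference` could be assembled. The host and database names are placeholders, credentials are left out on purpose, and the helper `atlas_uri` is hypothetical. `readPreference` and `maxStalenessSeconds` are standard MongoDB connection string options, and MongoDB rejects staleness bounds below 90 seconds.

```python
from __future__ import annotations

from urllib.parse import urlencode

READ_PREFERENCES = {
    "primary",
    "primaryPreferred",
    "secondary",
    "secondaryPreferred",
    "nearest",
}


def atlas_uri(
    host: str,
    database: str,
    read_preference: str = "secondaryPreferred",
    max_staleness_seconds: int | None = None,
) -> str:
    """Build a mongodb+srv URI that carries the chosen read preference."""
    if read_preference not in READ_PREFERENCES:
        raise ValueError(f"unknown read preference: {read_preference}")
    options = {"readPreference": read_preference}
    if max_staleness_seconds is not None:
        # A staleness bound only makes sense when secondaries may serve reads.
        if read_preference == "primary":
            raise ValueError("maxStalenessSeconds cannot be used with primary reads")
        if max_staleness_seconds < 90:
            raise ValueError("maxStalenessSeconds must be at least 90")
        options["maxStalenessSeconds"] = str(max_staleness_seconds)
    return f"mongodb+srv://{host}/{database}?{urlencode(options)}"
```

The trade-off is the one the talk points at: `secondaryPreferred` takes load off the primary but may return slightly stale data, so reads that must see the latest write should stay on the primary.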
Data Modeling & Indexing: The Art of Efficiency 🎨
Arek emphasized that MongoDB’s flexible schema doesn’t mean no schema. Key strategies include:
- Precomputation: Using MongoDB’s Computed pattern to precompute values at write time, so reads don’t have to recalculate them and the production database carries less load.
- Embedding Wisely: Embedding related data inside a document where appropriate, while staying mindful of the 16MB document size limit. Only embed essential information.
- Strategic Indexing: Employing compound and multikey indexes. Arek performs daily log analysis (using tools like Keyhole) to identify and fix queries that aren’t using indexes effectively, preventing performance bottlenecks. They also monitor index usage to drop unused indexes.
- Atlas Search & Vector Search: Beyond full-text search, they’re exploring and planning to leverage Vector Search for even more advanced AI capabilities.
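Two of the strategies above are easy to sketch with plain dicts. The first function builds an update in the spirit of the Computed pattern: a running total is maintained at write time so reads never have to aggregate. The second filters the kind of output the `$indexStats` aggregation stage returns (`name` plus `accesses.ops`) to list indexes that have served no operations. The field names `downloads_total` and `download_events` and the sample index names are hypothetical, not taken from the talk.

```python
from __future__ import annotations

from typing import Any


def computed_update(downloads: int) -> dict[str, Any]:
    """Update spec that keeps precomputed counters current on every write."""
    if downloads < 0:
        raise ValueError("downloads must be non-negative")
    return {"$inc": {"downloads_total": downloads, "download_events": 1}}


def unused_indexes(index_stats: list[dict[str, Any]]) -> list[str]:
    """Names of indexes with zero recorded operations.

    Expects documents shaped like $indexStats output. Counters are tracked
    per node and reset on restart, so check every node and the 'since'
    timestamp before dropping anything.
    """
    return [
        stat["name"]
        for stat in index_stats
        if stat["name"] != "_id_" and stat["accesses"]["ops"] == 0
    ]


# Hypothetical sample in the shape of $indexStats output.
SAMPLE_STATS = [
    {"name": "_id_", "accesses": {"ops": 0}},
    {"name": "author_1_downloads_-1", "accesses": {"ops": 48213}},
    {"name": "legacy_tag_1", "accesses": {"ops": 0}},
]
```

Here `unused_indexes(SAMPLE_STATS)` flags only `legacy_tag_1`: the `_id_` index is skipped because it cannot be dropped, and the compound index has clearly been used.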
The Road to Sharding: Preparing for the Future 🌐
As Hugging Face continues its explosive growth, they’re actively preparing for sharding.
- Why Shard? To handle increasing users, data, and throughput.
- Sharded Clusters: These have more components (config servers, routers, balancers), but Atlas automates much of the complexity.
- The Shard Key Challenge: Choosing the right shard key is critical. Arek is actively using MongoDB’s sampling features and shard key analyzers to make informed decisions, moving away from guesswork.
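The talk does not name the exact commands, but MongoDB 7.0 ships `configureQueryAnalyzer` (query sampling) and `analyzeShardKey` (metrics for a candidate key), which fit the description. Assuming those are the features meant, here is a sketch of the command documents. The namespace `hub.models` and the candidate key fields are made up for illustration, and the 50 samples per second ceiling reflects the documented range as I understand it.

```python
from __future__ import annotations

from typing import Any


def configure_sampling(namespace: str, samples_per_second: float = 1.0) -> dict[str, Any]:
    """Command document that turns on query sampling for one collection."""
    if not 0 < samples_per_second <= 50:
        raise ValueError("samples_per_second must be in (0, 50]")
    # The command name has to be the first key; dicts keep insertion order.
    return {
        "configureQueryAnalyzer": namespace,
        "mode": "full",
        "samplesPerSecond": samples_per_second,
    }


def analyze_shard_key(namespace: str, key_fields: list[str]) -> dict[str, Any]:
    """Command document that asks for metrics on a candidate shard key."""
    if not key_fields:
        raise ValueError("a candidate shard key needs at least one field")
    return {
        "analyzeShardKey": namespace,
        "key": {field: 1 for field in key_fields},
        "keyCharacteristics": True,    # cardinality, frequency, monotonicity
        "readWriteDistribution": True,  # based on the sampled queries
    }
```

Both are admin commands, so with PyMongo they would be sent via `client.admin.command(...)`. Sampling needs to run for a while on representative traffic before the read and write distribution metrics say anything useful.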
Beyond the Hub: Hugging Chat & Mongoku 💬
- Hugging Chat: This open-source chatbot, powered by MongoDB for metadata and conversations, showcases another vital use case.
- Mongoku: Hugging Face uses, and has open-sourced, Mongoku, a web GUI for MongoDB. It’s a fantastic tool for internal teams to share queries and manage their databases efficiently.
Arek concluded by thanking everyone and highlighting the pleasure of sharing Hugging Face’s scaling journey with MongoDB Atlas. It’s clear that with smart architecture, strategic use of database features, and a focus on performance, Hugging Face is building a truly remarkable platform for the future of AI! ✨