From Data to Decisions: Governing AI in Production 🚀

Hey tech enthusiasts! 👋 We’re wrapping up an incredible day at MongoDB.local.london, and what better way to cap it off than with a deep dive into making AI production-ready, especially in regulated industries? Our session, “From Data to Decisions: Governed AI in Production,” featured Deepak Deshpande from Intellect and Boris Bialek, who shared invaluable insights from their real-world deployments. Get ready to level up your AI game! 💡

The Challenge: Bridging the Gap from POC to Production 🌉

Many organizations struggle to move AI from proof-of-concept (POC) to production. While model development gets a lot of attention, the real hurdles lie in building a robust architecture and ensuring proper governance. Deepak highlighted a critical use case with a sovereign wealth fund managing trillions of dollars. They needed to analyze ESG (Environmental, Social, and Governance) factors across thousands of companies, focusing on complex areas like oceans, taxes, and child labor.

The Problem: Previously, they could only achieve this for a mere 2% of their investments, on a selective basis. The sheer volume of unstructured data and the need for decision-grade data to make multi-billion dollar investment decisions presented a massive challenge. The core issue wasn’t just finding information, but ensuring its completeness, accuracy, and relevance for enterprise-level decisions.

Why AI Productionization Fails: Beyond Just Models 🤖

Deepak outlined four typical patterns where AI retrieval fails:

  1. Weak Retrieval: Over-reliance on semantic search while neglecting the power of lexical search for specific terms and dates.
  2. Fragmented Context: Storing data in separate silos (operational data, documents, vectors) leading to data drift and inconsistencies.
  3. Stale Knowledge: Failing to keep the AI’s knowledge base fresh and relevant, especially when dealing with rapidly changing data like ESG factors.
  4. Delayed Governance: Treating governance as an afterthought, leading to insecure and unsafe AI-driven decisions.

Boris emphasized a key point: “A good architecture and design gives you 100% improvement on algorithms. An awesome developer gets you 10% improvement.” The foundation matters more than just optimizing the model itself.

The Intellect & MongoDB Solution: A Unified Foundation 🛠️

Intellect’s Purple Fabric platform, built on MongoDB Atlas, addresses these challenges head-on. Their approach prioritizes:

  • Separating Workflow from Retrieval: Keeping retrieval logic atomic and distinct from workflow logic to avoid complexity.
  • Sourcing and Ingestion Excellence: Ensuring data is sourced and ingested correctly, with MongoDB Atlas serving as the robust foundation.
  • Inherited Freshness and Governance: Knowledge freshness and policy enforcement are built into the ingestion process, not an afterthought.

Deepak stressed that real-time data isn’t “free.” Even a one-minute ETL delay can be too long when markets can shift drastically in that time.
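
To make the “inherited freshness and governance” idea a bit more concrete, here is a minimal Python sketch. It is not Purple Fabric’s actual code: it simply assumes every record is stamped at ingestion time with an expiry and an access list, so retrieval can check both without a separate governance pass. The names `ingest`, `is_usable`, `expires_at`, and `allowed_roles` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def ingest(raw_text, source, allowed_roles, max_age, now=None):
    """Stamp freshness and policy metadata onto a record at ingestion time."""
    now = now or datetime.now(timezone.utc)
    return {
        "text": raw_text,
        "source": source,
        "ingested_at": now,
        "expires_at": now + max_age,  # hypothetical freshness rule
        "allowed_roles": sorted(allowed_roles),  # hypothetical access policy
    }


def is_usable(record, role, now=None):
    """A record is usable only if it is still fresh and the role may read it."""
    now = now or datetime.now(timezone.utc)
    return now < record["expires_at"] and role in record["allowed_roles"]
```

Because the policy travels with the record, a stale or restricted document can be excluded before it ever reaches the model.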

A Real-World Query: Equinor’s Methane Reduction 📊

To illustrate their approach, they used a complex real-world query: “What is Equinor’s methane reduction target for 2030 and what is the policy and strategy to achieve it? Evidence this.”

This query has five critical criteria:

  1. Entity: Equinor
  2. Metric: Methane reduction
  3. Timeframe: 2030
  4. Policy & Strategy: Information potentially spread across multiple documents.
  5. Evidence: Supporting documentation.

The Pitfall: If even one criterion is missed, the entire answer is wrong.
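
The all-or-nothing nature of the five criteria can be sketched as a simple coverage check. This is an illustrative assumption, not the method shown in the session: it uses naive substring matching, and the `CRITERIA` table (including the `source:` marker standing in for “evidence”) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    required_terms: tuple[str, ...]  # any one of these must appear


# Hypothetical decomposition of the Equinor query into its five criteria.
CRITERIA = (
    Criterion("entity", ("equinor",)),
    Criterion("metric", ("methane",)),
    Criterion("timeframe", ("2030",)),
    Criterion("policy_strategy", ("policy", "strategy")),
    Criterion("evidence", ("source:",)),
)


def missing_criteria(passages: list[str]) -> list[str]:
    """Return the names of criteria that no retrieved passage covers."""
    text = " ".join(passages).lower()
    return [
        c.name
        for c in CRITERIA
        if not any(term in text for term in c.required_terms)
    ]


def is_decision_grade(passages: list[str]) -> bool:
    """All-or-nothing: one uncovered criterion fails the whole answer."""
    return not missing_criteria(passages)
```

A set of passages about the wrong company would cover four of the five criteria and still fail, which is exactly the failure mode described above.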

Vector-Only vs. Hybrid + Re-rank Retrieval 🎯

They compared two retrieval architectures for the same query, model, and prompt:

  • Vector-Only: Focused on semantic understanding, it missed crucial lexical details like company names and dates. It confidently provided an answer about Shell Corporation instead of Equinor and focused on methane but not the specific policy.
  • Hybrid + Re-rank: This approach combined lexical search (for names, dates) with semantic search (for intent) and then applied a re-ranking step. The result? An accurate answer that met all five criteria.

Key Takeaway: The retrieval architecture, not just the model, delivered the accurate outcome.

Building a Production-Ready Retrieval Architecture 🏗️

The journey involves meticulous steps:

  1. Sourcing Messy Documents: Ingesting data from PDFs, images, and other formats, ensuring fidelity is protected.
  2. Intelligent Chunking: Avoiding fixed token windows that lose accuracy. Chunking must respect semantic boundaries and preserve context, especially for tables.
  3. Metadata is King: Using metadata as the cheapest and most effective filter, especially when dealing with millions of documents. This drastically reduces the search space.
  4. Document Design over Vector-Only: Focusing on structuring documents intelligently, utilizing parent-child hierarchies, and optimizing token sizes for efficient search while allowing for long context retrieval.
  5. Schema for Filtering and Reconstruction: Using a defined schema with parameters to filter correctly, reconstruct information, and enforce policies during ingestion. Governance is integrated from the start.
  6. Retrieval Pipeline Order Matters: Prioritizing the cheapest filters first (metadata), then using hybrid search with Reciprocal Rank Fusion (RRF) to combine vector and lexical results, followed by re-ranking before sending to the LLM.
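
Steps 2 and 4 in the list above (boundary-aware chunking and parent-child structure) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the presenters’ implementation: paragraphs are assumed to be separated by blank lines, a block starting with `|` is treated as a table, and `chunk_document`, `parent_id`, and `seq` are hypothetical names.

```python
def chunk_document(doc_id, text, max_chars=400):
    """Split on blank lines (paragraph boundaries); never cut inside a block.

    A block that starts with '|' is treated as a table and kept whole as its
    own chunk, so rows stay together with their header.
    """
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    chunks, current = [], ""
    for block in blocks:
        is_table = block.startswith("|")
        # Flush the running chunk before a table or when it would overflow.
        if current and (is_table or len(current) + len(block) + 2 > max_chars):
            chunks.append(current)
            current = ""
        if is_table:
            chunks.append(block)
        else:
            current = f"{current}\n\n{block}" if current else block
    if current:
        chunks.append(current)
    # Each child chunk points back to its parent so context can be rebuilt.
    return [
        {"parent_id": doc_id, "seq": i, "text": chunk}
        for i, chunk in enumerate(chunks)
    ]
```

Joining the child chunks of one `parent_id` in `seq` order reconstructs the parent text, which is what allows small chunks for search and long context for the final answer.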

Boris highlighted that this integrated workflow, leveraging MongoDB, reduces the document set from thousands to just a handful for the LLM to process, significantly improving quality, speed, and cost.
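
The ordering described above (cheap metadata filter first, then hybrid fusion, then re-ranking, then a small top K for the LLM) can be sketched in plain Python. This is an in-memory illustration, not MongoDB’s or Intellect’s API: `metadata_filter`, `retrieve`, and the search and re-rank callables are hypothetical stand-ins. RRF scores each document as the sum of 1 / (k + rank) over the ranked lists; k = 60 is a commonly used default, not a value quoted in the session.

```python
from collections import defaultdict


def metadata_filter(docs, **required):
    """Cheapest step first: keep only docs whose metadata matches exactly."""
    return [
        d for d in docs
        if all(d["meta"].get(key) == value for key, value in required.items())
    ]


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def retrieve(docs, lexical_search, vector_search, rerank, top_k=3, **required):
    """Filter -> hybrid fuse -> re-rank -> keep only top_k for the LLM."""
    candidates = metadata_filter(docs, **required)
    fused = reciprocal_rank_fusion(
        [lexical_search(candidates), vector_search(candidates)]
    )
    return rerank(fused)[:top_k]
```

Here `lexical_search` and `vector_search` each take the filtered candidates and return a ranked list of ids, and `rerank` takes the fused list and returns it reordered; in a real deployment those would be backed by the search engine and a re-ranking model.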

The Metrics That Matter: Accuracy, Performance, and Cost 💰

The session presented real production metrics:

  • Accuracy Jump: Moving from vector-only (71% accuracy) to hybrid + RRF raised accuracy by 17 percentage points (to 88%). Re-ranking further boosted this.
  • Stability and Latency: The final chosen architecture halved the P95 latency and dramatically improved production stability, addressing issues like throttling and model availability. Stability, rather than cost or performance alone, was the primary driver of the choice.
  • Cost Efficiency: While POCs are cheap, production at scale becomes expensive. Their architecture optimizes for cost by minimizing the data sent to the LLM. For instance, a million queries that might have been prohibitively expensive with a less optimized approach became manageable.
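
A back-of-the-envelope sketch shows why minimizing the data sent to the LLM matters at scale. Every figure here is a hypothetical placeholder chosen for easy arithmetic, not a number from the session, and `llm_input_cost` is an illustrative helper that counts input tokens only.

```python
def llm_input_cost(queries, chunks_per_query, tokens_per_chunk, usd_per_million_tokens):
    """Input-token cost only; output tokens and retrieval costs are ignored."""
    total_tokens = queries * chunks_per_query * tokens_per_chunk
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical: 1M queries, 500-token chunks, 3 USD per million input tokens.
unfiltered = llm_input_cost(1_000_000, 50, 500, 3.0)  # 50 chunks per query
narrowed = llm_input_cost(1_000_000, 5, 500, 3.0)     # 5 chunks per query

print(f"unfiltered: ${unfiltered:,.0f}  narrowed: ${narrowed:,.0f}")
```

Under these assumed figures, sending a tenth of the chunks cuts the input cost by the same factor, which is the mechanism behind the cost point above.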

The Last 9%: Workflow and Trust 🤝

While achieving 91% accuracy is impressive, the remaining 9% is handled through:

  • Follow-up Questions: Instead of guessing, the system asks clarifying questions for ambiguous queries.
  • Data Gap Identification: Recognizing when required data is missing and flagging it.
  • Human-in-the-Loop: Crucial for financial services, ensuring human oversight and intervention.
  • Feedback Loops: Continuously improving the system based on user interactions.

The key is converting this remaining 9% into workflows and designing the user experience to build trust, even when the AI is not entirely certain. Knowing where the AI is unreliable is part of making it trustworthy.
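
One way to picture “converting the remaining 9% into workflows” is a small routing function that decides what happens before an answer is shown. This is a sketch under stated assumptions, not the presenters’ design: the `Route` names, the order of the checks, and the 0.8 confidence threshold are all hypothetical.

```python
from enum import Enum


class Route(Enum):
    ANSWER = "answer"
    ASK_FOLLOW_UP = "ask_follow_up"
    FLAG_DATA_GAP = "flag_data_gap"
    HUMAN_REVIEW = "human_review"


def route(query_is_ambiguous, missing_criteria, confidence, threshold=0.8):
    """Pick a workflow instead of guessing when the system is unsure."""
    if query_is_ambiguous:
        return Route.ASK_FOLLOW_UP  # clarify the question first
    if missing_criteria:
        return Route.FLAG_DATA_GAP  # required data is not in the corpus
    if confidence < threshold:
        return Route.HUMAN_REVIEW   # human-in-the-loop for low confidence
    return Route.ANSWER
```

Logging which route each query takes, along with the user’s reaction, would supply the feedback loop mentioned in the list above.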

Key Takeaways for Productionizing AI: 🔑

  1. Strengthen Your Stack, Don’t Just Upgrade Models: AI productionization is an architectural and engineering problem.
  2. Metadata is Your Best Friend: Collect and leverage metadata for efficient pre-filtering. MongoDB is excellent for this.
  3. Embrace Hybrid Search with RRF: Combine vector and lexical search for maximum accuracy.
  4. Re-rank with a Pre-filtered Top K: Treat re-ranking as a product decision, benchmark it, and make informed choices.
  5. Governance is Non-Negotiable: Integrate governance from the pipeline’s inception, not as an afterthought.

The session concluded with a powerful demonstration of their process, enabling a full production deployment in as little as six to eight weeks. This repeatability and structured approach, powered by Intellect’s Purple Fabric and MongoDB Atlas, is the key to successful, governed AI in production.

Thanks for sticking with us, and now it’s time to enjoy the happy hour! Cheers! 🍻

Appendix