🚀 Agentic Systems Without Chaos: Building the Next-Gen AI Operating Model

If your team has AI running in a proof of concept but struggles to deploy it reliably in production, you are navigating the most critical gap in modern engineering. In this episode of the Next Generation Playbook for AI, Shweta Vohra sits down with industry veteran Joseph Stein to decode how we move from simple chatbots to autonomous agentic systems without descending into architectural chaos.

Joe, who has been engineering complex systems since 1997, views this moment not just as an evolution, but as an entirely new domain in the IT Venn diagram. 🌐


🎯 Defining the Agentic Shift: Beyond the Chatbot

We often confuse simple automation with agentic behavior. Joe clarifies the distinction with a sharp focus on determinism versus non-determinism.

  • The Non-Agentic World: These are the chatbots and deterministic systems we know. They function like a compiler; you know exactly what they will do. They are functional, idempotent, and lack side effects. 🤖
  • The Agentic World: These systems plan, act, and execute. Imagine a production response system that detects an anomaly and makes a non-deterministic decision. It calls APIs (tools), gathers info, and orchestrates a goal—like isolating a compromised server—in 30 seconds instead of the 45 minutes a human might require.
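The plan–act–observe cycle described above can be sketched as a minimal loop. This is an illustrative toy, not Joe's system: `call_llm`, the tool functions, and the host names are hypothetical stand-ins, with the non-deterministic model decision stubbed out so the loop runs end to end.

```python
# A minimal agentic-loop sketch: the model picks a tool (the
# non-deterministic part), the runtime executes it (deterministic),
# and the result feeds back into the next decision.
# `call_llm` and both tools are hypothetical stand-ins.

def fetch_alerts(source: str) -> str:
    return f"3 anomalies found in {source}"

def isolate_server(host: str) -> str:
    return f"{host} isolated from network"

TOOLS = {"fetch_alerts": fetch_alerts, "isolate_server": isolate_server}

def call_llm(goal: str, history: list) -> dict:
    # Stand-in for a real model call; a real agent would send the goal
    # and history to an LLM and parse its tool-choice response.
    if not history:
        return {"tool": "fetch_alerts", "args": {"source": "prod-logs"}}
    if "anomalies" in history[-1]:
        return {"tool": "isolate_server", "args": {"host": "web-42"}}
    return {"done": True}

def run_agent(goal: str, max_steps: int = 5) -> list:
    history = []
    for _ in range(max_steps):          # hard step budget: a boundary
        decision = call_llm(goal, history)
        if decision.get("done"):
            break
        result = TOOLS[decision["tool"]](**decision["args"])
        history.append(result)          # a record of every action taken
    return history

print(run_agent("contain the compromised server"))
```

Note the two safety rails even in a toy: a hard step budget and a history of every action, both of which return as architectural pillars later.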

The “aha” moment for agentic AI arrives when the system evolves its own path. Joe points to tools like Open Claw, Nanobot, and Picobot (which can now run on a Raspberry Pi) as examples of the ingenuity emerging from the open-source community. 👾


🛠️ The Architect’s New Mission: Boundaries and Orchestration

Moving from pioneering to stability requires a radical shift in how we design systems. We are moving from the Co-pilot era to the Command Center era. 🕹️

Key Architectural Pillars:

  1. Boundaries: You must define the limits of an autonomous unit and the boundaries between multiple units.
  2. Orchestration at Scale: It is no longer about one agent; it is about managing tens of thousands of agents making simultaneous API calls.
  3. Evidence of Action: Architects must capture a record of evidence for every action an agent takes, sufficient to satisfy the organization’s risk appetite.
  4. Observability: We need a new layer of metrics. We must observe prompts, tool calls, and orchestrations with the same rigor we apply to traditional distributed systems. 📡
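The observability pillar above can be sketched as structured telemetry around every tool call, exactly as we would trace an RPC in a distributed system. This is a toy assumption-laden sketch: real deployments would emit these events to a collector (e.g. OpenTelemetry) rather than an in-memory list, and `lookup_cmdb` is a hypothetical tool.

```python
import functools
import json
import time
import uuid

# Sketch of agent observability: wrap each tool call in a structured
# event (trace id, tool name, arguments, status, latency), the same
# rigor we apply to spans in traditional distributed tracing.
# Events go to an in-memory list here; a real system would export them.

EVENTS = []

def traced_tool(fn):
    @functools.wraps(fn)
    def wrapper(**kwargs):
        event = {"trace_id": str(uuid.uuid4()), "tool": fn.__name__,
                 "args": kwargs, "start": time.time()}
        try:
            result = fn(**kwargs)
            event["status"] = "ok"
            return result
        except Exception as exc:
            event["status"] = f"error: {exc}"
            raise
        finally:
            event["latency_s"] = round(time.time() - event["start"], 4)
            EVENTS.append(event)
    return wrapper

@traced_tool
def lookup_cmdb(ci_name: str) -> str:
    # Hypothetical tool: fetch a configuration item record.
    return f"record for {ci_name}"

lookup_cmdb(ci_name="web-42")
print(json.dumps(EVENTS[-1], default=str))
```

The same wrapper pattern extends naturally to prompts and orchestration steps: one event schema, one trace id threading through the whole agent run.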

⚠️ New Risks, New Responsibilities

Agentic systems bring 10x opportunity but also 10x responsibility. Traditional security models are insufficient for the following threats:

  • Prompt Injection & Hijacking: Research like the Morris II worm shows how an email agent could be hijacked to take over the orchestration layer. 🐛
  • Token-Drain DoS: A denial-of-service attack no longer just causes downtime; it exhausts tokens, which translates directly into massive financial cost. 💸
  • Supply Chain Malware: LLMs can be trained to generate code that includes hidden backdoors or malware.
  • Tool Chain Escalation: Standards like MCP (Model Context Protocol) act like the stored procedures or EJBs of the future. Without strict intent control, agents might call APIs incorrectly due to limited context windows.
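One concrete defense against the token-drain risk above is a per-caller token budget enforced before a request ever reaches the model. The sketch below uses a standard token-bucket rate limiter; the class name, limits, and replenishment policy are illustrative, not a real gateway API.

```python
import time

# Token-budget guard against token-drain DoS: each caller gets a
# rolling token allowance (a token bucket), and requests that would
# exceed it are rejected before reaching the expensive model.

class TokenBudget:
    def __init__(self, tokens_per_minute: int):
        self.rate = tokens_per_minute / 60.0    # tokens replenished per second
        self.capacity = tokens_per_minute
        self.available = float(tokens_per_minute)
        self.last = time.monotonic()

    def allow(self, estimated_tokens: int) -> bool:
        now = time.monotonic()
        # Replenish proportionally to elapsed time, capped at capacity.
        self.available = min(self.capacity,
                             self.available + (now - self.last) * self.rate)
        self.last = now
        if estimated_tokens <= self.available:
            self.available -= estimated_tokens
            return True
        return False

budget = TokenBudget(tokens_per_minute=10_000)
print(budget.allow(4_000))   # True: within the rolling budget
print(budget.allow(4_000))   # True
print(budget.allow(4_000))   # False: budget exhausted, request rejected
```

An attacker hammering the endpoint now burns their own allowance, not your GPU bill; legitimate callers recover as the bucket refills.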

🏗️ Scaling with a Centralized AI Platform

Joe shares a blueprint from his own organization, where they integrated 350+ products onto a central AI platform. This approach ensures governance while enabling rapid innovation. 🏢

The Tech Stack & Operating Model:

  • Infrastructure: Private cloud data centers running open-source models on dedicated GPUs.
  • Vector Database: A RAG-as-a-service system built on pgvector, with an eight-step data pipeline feeding hybrid search.
  • Integration: Deep integration with ServiceNow and CMDB to track every AI interaction.
  • Agentic Studio: A tool called Work HQ allows non-coders to wire agents together, set prompts, and process data sets in production.
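The hybrid-search step in a stack like this has to merge two rankings: keyword results (e.g. Postgres full-text search) and vector-similarity results (e.g. pgvector). One common way to do that is reciprocal rank fusion (RRF), sketched below. The document IDs are made up, and Joe's pipeline may combine scores differently.

```python
# Reciprocal rank fusion (RRF): merge several ranked result lists by
# summing 1 / (k + rank) per document. Documents that rank well in
# multiple lists float to the top; k dampens the influence of any
# single list's top hit.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_7", "doc_2", "doc_9"]   # e.g. full-text search order
vector_hits  = ["doc_2", "doc_4", "doc_7"]   # e.g. pgvector cosine order

print(rrf([keyword_hits, vector_hits]))      # doc_2 wins: strong in both
```

RRF is attractive here because it needs no score normalization across the two very different retrieval systems, only their rank order.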

💰 The Economics of AI: PhDs vs. Master’s Degrees

Sustainability and cost are the next great frontiers. Joe argues that we will soon look at token bills with the same shock we once felt looking at cloud bills. 📉

To manage costs and Total Cost of Ownership (TCO), Joe suggests a tiered model approach:

  • The PhD Models: Large, “thinking” models (like Qwen 383B) for complex reasoning.
  • The Master’s Degree Models: Smaller, faster models (like Llama 3 8B or Qwen Vision) that are just good enough for tasks like invoice processing.

By applying over-subscription and rate limiting to a fixed pool of GPUs, teams can maximize their hardware while avoiding the trap of always chasing the newest, most expensive model.
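The tiered approach can be sketched as a simple router: cheap “Master’s” models by default, the expensive “PhD” model only when a task hints at deep reasoning. The model names, per-token prices, and the keyword heuristic below are all illustrative assumptions; a production router would likely use a classifier rather than keywords.

```python
# Tiered model routing sketch: default to a small, cheap model and
# escalate to the large "thinking" model only for reasoning-heavy tasks.
# Tiers, prices, and routing hints are illustrative, not real quotes.

TIERS = {
    "phd":     {"model": "qwen-large", "usd_per_1k_tokens": 0.0100},
    "masters": {"model": "llama-3-8b", "usd_per_1k_tokens": 0.0002},
}

REASONING_HINTS = ("prove", "plan", "multi-step", "root cause")

def route(task: str) -> dict:
    tier = "phd" if any(h in task.lower() for h in REASONING_HINTS) else "masters"
    return TIERS[tier]

def estimated_cost(task: str, tokens: int) -> float:
    return route(task)["usd_per_1k_tokens"] * tokens / 1000

print(route("extract the total from this invoice")["model"])  # llama-3-8b
print(route("find the root cause of this outage")["model"])   # qwen-large
print(f"${estimated_cost('extract the total from this invoice', 2000):.4f}")
```

With a 50x price gap between tiers, routing even 80% of traffic to the small model dominates the token bill far more than squeezing the large model.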


💡 Q&A: Insights from the Field

Shweta: Should organizations wait for standards to emerge before starting?

Joe: Start yesterday. 🚀 You don’t have to ship to production immediately, but if you don’t understand these tools, your competitors will. Even if a standard like MCP changes, the experience you gain in data exchange is foundational.

Shweta: What was your biggest failure over the last year?

Joe: My success. The system exploded so fast that we were firefighting to lay down tracks while the train was moving at 90 miles per hour. We had 250 users within three months, and half of them ignored the “do not use in production” label. Be prepared for scale from day one. 🎢


🔮 Looking Ahead: December and Beyond

By the end of the year, Joe predicts a shift toward Agent-to-Agent (A2A) communication. We will stop emailing invoices and instead allow agentic systems to communicate directly via new standards. We are also moving toward Natural Language Programming for hardware, where you become the product manager of your own devices. 🦾

The Bottom Line: Be in a hurry to learn, but do not rush a half-hearted solution into the wild. The future belongs to those who can build predictable systems in a non-deterministic world.

Ready to build? Explore the patterns that scale at boston.qcon.ai.
