🚀 From 15,000 Lines of Code to Markdown: The Future of Agentic Programming

In the world of software development, we often equate progress with more code, more features, and more complexity. But what if the secret to building better tools isn’t writing more code, but deleting it?

At a recent talk, David Gomes shared a radical transformation within Cursor, the AI-powered code editor. By leveraging the power of LLMs and Markdown, the team managed to replace a massive, complex feature with a few simple instructions.


🌳 The Foundation: Understanding Git Worktrees

Before diving into the “how,” we need to understand the “what.” In Cursor, the team utilizes a feature called Git worktrees.

Think of a worktree as a separate checkout of your repository. It allows different agents to work on different tasks in parallel without interfering with your primary workspace.

  • Isolation: Anything the agent runs—commands, lints, or tests—stays scoped to that specific worktree.
  • Parallelism: You can have a grid of agents working for you simultaneously.
  • Best-of-N: You can give the same task to multiple models (like GPT-4, Claude Opus, or Grok) and compare their implementations side-by-side.
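Under the hood, this maps onto plain `git worktree` commands. A minimal sketch of the lifecycle (directory and branch names are illustrative, not Cursor's actual conventions):

```shell
set -e
# Set up a throwaway repo for the demo; any existing repo works the same way.
rm -rf /tmp/wt-demo && mkdir -p /tmp/wt-demo && cd /tmp/wt-demo
git init -q main-checkout && cd main-checkout
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "init"

# Spin up an isolated checkout on a fresh branch for an agent.
git worktree add ../agent-task-1 -b agent/task-1

# Anything the agent writes stays scoped to that directory;
# the primary checkout at /tmp/wt-demo/main-checkout is untouched.
echo "agent output" > ../agent-task-1/notes.md

# Clean up when the agent is done (--force because the worktree has
# untracked files), then delete the agent's branch.
git worktree remove --force ../agent-task-1
git branch -D agent/task-1
```

Because each worktree is just another directory sharing the same object store, spinning up N of them for N agents is cheap compared to N full clones.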

🏗️ The Old Way: 15,000 Lines of Complexity

When Cursor 2.0 launched last October, the worktree feature was a beast of engineering. To make it work, the team had to build and maintain:

  • Logic for creating and managing worktrees.
  • Security boundaries to ensure agents didn’t escape their assigned worktree.
  • Setup scripts that users configured to run whenever an agent started.
  • A judging harness to rank which model’s implementation was best.
  • Cleanup logic to prevent users’ disks from blowing up after spinning up hundreds of worktrees.

This implementation spanned roughly 15,000 lines of code. It was heavy, hard to maintain, and rigid.


💡 The Breakthrough: Markdown as the New Code

David and his team realized they could replace this entire infrastructure by combining two existing Cursor primitives: Agent Skills and Sub-agents.

Instead of hard-coded logic, they wrote Markdown instructions. The new implementation of the best-of-N feature is only about 40 lines of text.

How it Works 🛠️

Users now use simple slash commands:

  • /worktree [task]: Spins up an agent in an isolated worktree.
  • /best-of-n: Instructs a parent agent to create sub-agents for different models, wait for them to finish, and then provide a comparison table of the results.

These aren’t just static files; they are dynamic prompts controlled on the backend. This allows the team to iterate on the feature’s logic instantly without requiring users to update their Cursor version.
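As a rough illustration only (Cursor's actual skill files aren't public, so the frontmatter fields and wording here are assumptions), such a Markdown instruction file might look like this:

```markdown
---
name: worktree
description: Run the given task in an isolated git worktree
---

When the user invokes /worktree [task]:

1. Create a new git worktree on a fresh branch, e.g.
   `git worktree add ../wt-<task-slug> -b agent/<task-slug>`.
2. Perform the task entirely inside that worktree directory.
   Never edit files in the primary checkout.
3. When finished, report the branch name and a short summary
   of the changes so the user can review or merge them.
```

The point is that the "logic" is now a prompt: tightening step 2, adding OS-specific commands, or changing the reporting format is a text edit on the backend, not a code release.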


⚖️ The Tradeoffs: Efficiency vs. Control

Moving from hard-coded logic to vibes-based Markdown isn’t without its challenges. David highlighted several key tradeoffs:

The Pros ✅

  1. Massive Deletion: Removing 15,000 lines of code significantly reduces the maintenance burden.
  2. Multi-Repo Support: The old version failed in multi-repo setups; the Markdown-based agent handles it naturally by creating worktrees across all relevant repos.
  3. Superior Judging: The parent agent now has more context. It can even stitch together the best parts of different model implementations upon request.
  4. Flexibility: Users can now switch into a worktree halfway through a chat, which was previously impossible.

The Cons ❌

  1. Hallucinations: Without hard-coded boundaries, a model might occasionally go haywire and edit files outside the worktree.
  2. Perceived Speed: Because you see the agent “thinking” through the worktree creation in the chat, it feels slower, even if the actual execution time is similar.
  3. Discoverability: Removing the dedicated UI dropdown makes the feature a power user secret hidden behind a slash command.

🧪 Making Agents Better with Evals and RL

To address the problem of keeping agents on track, David is turning to Evals and Reinforcement Learning (RL).

By using tools like Braintrust and the Cursor CLI, the team runs headless evaluations to score models on two criteria:

  1. Did the model do the work in the worktree as expected?
  2. Did the model accidentally touch the primary checkout?
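The second criterion in particular reduces to a plain git assertion against the primary checkout. A sketch of what such a check could look like (paths and the simulated "agent" are illustrative; the real harness drives an actual model through the Cursor CLI):

```shell
set -e
# Illustrative layout: a primary checkout plus one agent worktree.
rm -rf /tmp/eval-demo && mkdir -p /tmp/eval-demo && cd /tmp/eval-demo
git init -q primary && cd primary
git -c user.name=ci -c user.email=ci@example.com commit -q --allow-empty -m "init"
git worktree add -q ../agent-wt -b agent/run-1

# Simulate a well-behaved agent: its edits land only in the worktree.
echo "change" > ../agent-wt/result.txt

# Criterion 2: the primary checkout must be untouched.
if [ -z "$(git status --porcelain)" ]; then
  echo "PASS: primary checkout clean"
else
  echo "FAIL: agent touched primary checkout"
fi

# Criterion 1: the expected work exists inside the worktree.
[ -f ../agent-wt/result.txt ] && echo "PASS: work done in worktree"
```

`git status --porcelain` emits nothing when the working tree is clean, which makes it a convenient machine-readable pass/fail signal for a scoring harness.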

Early data shows that while smaller models like Haiku often deviate, flagship models like Grok and Cursor's own Composer perform much better. The team is now feeding these scenarios into its RL pipeline so that future versions of Composer will be natively “worktree-aware.”


🌐 The Road Ahead: Cursor 3.0 and Beyond

The journey doesn’t end with Markdown. David teased several upcoming shifts:

  • Cursor 3.0: A new, more agentic interface is coming. This UI is designed specifically for local parallelization and will feature a more native (but still lightweight) worktree implementation.
  • Beyond Git: Git worktrees can be slow and disk-heavy. The team is exploring new parallelization primitives that don’t rely on Git at all, aiming for even faster and more universal local isolation.

💬 Q&A Highlights

  • Q: How do you handle different operating systems?
  • A: The Markdown skill includes specific instructions for Windows, Linux, and macOS to ensure the agents use the correct shell commands for each environment.

David’s message is clear: the most powerful code you can write might actually be a well-structured Markdown file. By trusting the model to handle the “how,” developers can focus on the “what.” 🚀✨
