Unbundling Data Systems: Inside Venice’s Architecture and the CAP Theorem 🚀

Welcome back to the GoTo Podcast, where we dive deep into the cutting-edge world of software development! Today, we’re thrilled to host Félix GV, a true rockstar in the data space who’s built planetary-scale data systems, and we’re eager to learn from his experience and insights.

Introducing Venice: A Deep Dive into an Unbundled Data System 🌐

Félix kicks off the conversation by introducing Venice, the distributed database he worked on for a decade at LinkedIn. Venice belongs to a category of databases called “unbundled” systems. This means each component of the database is a separate, standalone entity. Let’s break down the core components:

  • Write-Ahead Log (WAL) / Commit Log: Initially, Venice used Kafka for this crucial role. Kafka served as the write-ahead log, ensuring data durability, and also acted as the commit log for replicating data between different replicas.
  • Servers: These are Java processes, each housing a local RocksDB database. They handle network requests and can perform computations directly within the server.
  • Clients: Venice offers a flexible client ecosystem:
    • Remote Query Client: A traditional client-server architecture where requests travel across the network to the Venice servers.
    • Eager Cache Client: This client embeds the RocksDB database locally, allowing it to pull data directly into the application process. This pattern, known as the “eager cache,” essentially turns your client application into another follower replica of the database, offering significantly better performance by keeping data in local RAM or SSDs, though at a higher resource cost.
    • Change Capture Stream Listener: Clients can also tap into the stream of data changes.
  • Control Plane: A separate, distributed system responsible for metadata management, partition placement, and overall orchestration. It leverages ZooKeeper and Apache Helix.

The “unbundled” nature means that every single piece of Venice is a distributed system in itself – from Kafka and the server fleet to the control plane and the clients. This architecture is designed for massive scale and resilience.
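The “eager cache” pattern described above can be sketched with a toy model. This is purely illustrative, not Venice’s actual API: the client tails the commit log and applies each entry to a local store, effectively becoming another follower replica that serves reads from local memory.

```python
# Hypothetical sketch of the "eager cache" pattern: the client consumes
# the commit log and applies entries to an embedded local store, so reads
# never cross the network. Names and shapes are illustrative only.

class EagerCacheClient:
    def __init__(self):
        self.local_store = {}   # stand-in for an embedded RocksDB instance
        self.offset = 0         # position in the commit log

    def apply(self, log):
        """Catch up by applying unseen log entries to the local store."""
        for key, value in log[self.offset:]:
            self.local_store[key] = value
        self.offset = len(log)

    def get(self, key):
        # Reads are served from local RAM: no network round trip.
        return self.local_store.get(key)

commit_log = [("user:1", {"score": 0.9}), ("user:2", {"score": 0.4})]
client = EagerCacheClient()
client.apply(commit_log)
print(client.get("user:1"))  # served locally, like a follower replica
```

The trade-off matches the article’s description: much better read latency, at the cost of holding a copy of the data in every client process.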

Chaos Engineering and Load Testing: Ensuring Real-World Reliability 🛠️

When discussing reliability, the host brings up a recent infrastructure outage, leading to a crucial question: “Did you really have multi-zone?” Félix confirms that Venice did indeed have multi-zone capabilities and, importantly, exercised them rigorously.

They practiced a pattern similar to Netflix’s “chaos monkey,” which they called load tests. These tests simulated failures by draining traffic from data centers and concentrating it onto the remaining ones. For instance, in a three-data-center setup, each data center normally serves roughly a third of total traffic; simulating a single data center failure concentrates traffic onto the remaining two, so each must absorb 50% of total traffic, about 1.5× its normal share.

During these load tests, they would push the boundaries even further, concentrating more than 50% of peak traffic onto a single data center during peak hours. This proactive approach allowed them to identify and fix issues before they impacted production, underscoring the philosophy: “having reliability mechanisms that you do not regularly test basically means you don’t have reliability mechanisms.”
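The failover arithmetic behind these load tests can be made concrete. Assuming traffic is spread evenly, this hypothetical helper (not from the podcast) computes the share of total traffic each surviving data center must carry after some centers are drained:

```python
def surviving_share(total_dcs: int, failed_dcs: int) -> float:
    """Fraction of total traffic each surviving data center must carry,
    assuming an even spread across the survivors."""
    survivors = total_dcs - failed_dcs
    if survivors <= 0:
        raise ValueError("at least one data center must survive")
    return 1.0 / survivors

# Three data centers, one simulated failure: each survivor carries 50%
# of total traffic, up from its normal ~33% share.
print(surviving_share(3, 1))  # 0.5
```

Pushing past this, as described above, means deliberately routing more than half of peak traffic to one data center to prove the headroom actually exists.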

The Derived Data Model: Asynchronous Writes and Consistency Trade-offs ✍️

Venice is primarily a derived data system, meaning the data it hosts is machine-generated rather than user-generated. This design choice influences its write path, which is typically asynchronous. Data is ingested via streams (like Kafka) or batch jobs.

This asynchronous write path offers a significant advantage: very high ingestion throughput. However, it comes with a critical trade-off: the sacrifice of strong consistency, or linearizability. This means that after a write operation is acknowledged as durable (backed by the WAL), the data might not yet be indexed and immediately readable. You lose the “read your own writes” guarantee.

This differs from traditional databases like PostgreSQL, where writes are acknowledged only after being persisted to the WAL and after the in-memory indices are updated, thus providing linearizability. The nuance lies in the distinction between derived data systems (like Venice and Apache Pinot) and primary data systems, which prioritize immediate consistency for user-facing applications.
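The loss of read-your-writes can be shown with a toy model (illustrative only, not Venice’s implementation): a write is acknowledged as soon as it is appended to the durable log, but reads are served from an index that catches up asynchronously.

```python
class AsyncIngestStore:
    """Toy model of a derived data system: durable log, lagging index."""

    def __init__(self):
        self.wal = []     # write-ahead log: the durability boundary
        self.index = {}   # readable state, updated asynchronously
        self.applied = 0  # how far ingestion has advanced in the WAL

    def write(self, key, value):
        self.wal.append((key, value))
        return "ack"      # acknowledged before the data is indexed

    def read(self, key):
        return self.index.get(key)  # may lag behind the WAL

    def ingest_step(self):
        """Background ingestion applying one log entry to the index."""
        if self.applied < len(self.wal):
            key, value = self.wal[self.applied]
            self.index[key] = value
            self.applied += 1

store = AsyncIngestStore()
store.write("k", "v")    # durable and acknowledged
print(store.read("k"))   # None: not yet indexed, no read-your-writes
store.ingest_step()      # asynchronous ingestion catches up
print(store.read("k"))   # "v"
```

In a primary system like PostgreSQL, the acknowledgment would only arrive after the equivalent of both steps, which is what restores linearizability at the cost of write throughput.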

The discussion naturally leads to the CAP theorem, a fundamental concept in distributed systems that states you can only choose two out of three properties: Consistency, Availability, and Partition Tolerance. Félix provides a clear explanation:

  • Consistency (C): In this context, it means “read your own writes” or linearizability.
  • Availability (A): The system remains operational and responsive.
  • Partition Tolerance (P): The system continues to operate despite network partitions (communication failures between nodes).

In practice, network partitions cannot be avoided, so Partition Tolerance (P) is effectively mandatory, leaving two primary choices: CP (Consistent and Partition Tolerant) or AP (Available and Partition Tolerant).

Félix argues that derived data systems like Venice, due to their asynchronous write paths, are inherently AP systems. They are highly available, but they continuously sacrifice strong consistency, almost as if they are “continuously network partitioned” from a traditional database’s perspective due to high write latency.

He points out that this trade-off isn’t exclusive to derived data systems. Multi-region primary data systems that replicate asynchronously between regions face similar consistency challenges. This is why you might see an order in one browser but not in another: the two sessions are served by regions that have not yet converged, because replication across geographically distributed data centers lags behind writes. Multi-region presence buys increased availability and reliability, but at the cost of continuous consistency compromises.
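The stale-order scenario can be sketched with another toy model (illustrative, with invented names): a write lands in the local region immediately, and reaches the remote region only when the asynchronous replication queue is drained.

```python
# Toy model of two regions with asynchronous replication. A write is
# acknowledged by the local region; the peer region only sees it once
# the replication queue is drained, so reads can diverge in between.

class Region:
    def __init__(self):
        self.data = {}

def write(local: Region, queue: list, key, value):
    local.data[key] = value     # acknowledged in the writing region
    queue.append((key, value))  # shipped to the peer asynchronously

def replicate(queue: list, remote: Region):
    while queue:
        key, value = queue.pop(0)
        remote.data[key] = value

us, eu, queue = Region(), Region(), []
write(us, queue, "order:42", "placed")
print(us.data.get("order:42"))   # "placed": visible where it was written
print(eu.data.get("order:42"))   # None: the other region hasn't caught up
replicate(queue, eu)
print(eu.data.get("order:42"))   # "placed": converged after replication
```

Until `replicate` runs, a browser routed to `us` sees the order while one routed to `eu` does not, which is exactly the continuous consistency compromise described above.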

DuckDB Integration: Enhancing Data Exploration and Analytics 📊

One fascinating topic explored is the integration of DuckDB, an in-process SQL OLAP database, with Venice. What began as a hackathon project to test the flexibility of Venice’s abstractions has opened doors for new use cases.

The motivation behind this integration includes:

  • Leveraging DuckDB’s Extensions: DuckDB’s rich set of extensions, including potential vector search capabilities, can significantly enhance Venice’s functionalities. This is particularly relevant as Venice has had basic vector math capabilities since around 2019, predating the recent explosion of vector databases.
  • Streamlined Data Exploration and Debugging: For scenarios where data sets are small or a single partition suffices, loading data into DuckDB allows for rapid querying with arbitrary SQL. This provides a much faster and less tedious way for internal users to explore data shapes, check cardinalities, and perform aggregation queries compared to traditional ETL pipelines to systems like Spark.

While not yet widely adopted in production, the Venice-DuckDB integration offers a powerful new option for data exploration and analysis, showcasing the flexibility of both systems.
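The exploration workflow described above can be approximated with Python’s built-in `sqlite3` standing in for DuckDB’s in-process model (DuckDB’s own Python API is similar in spirit, but this sketch avoids the extra dependency): load a small dataset, or a single partition’s worth of records, into an in-process SQL engine and run ad-hoc aggregations with no ETL pipeline.

```python
import sqlite3

# Hypothetical records from a small dataset or a single partition.
records = [("alice", "ads", 3), ("bob", "feed", 5), ("carol", "feed", 2)]

con = sqlite3.connect(":memory:")   # in-process, like DuckDB
con.execute("CREATE TABLE events (user TEXT, surface TEXT, clicks INT)")
con.executemany("INSERT INTO events VALUES (?, ?, ?)", records)

# Ad-hoc exploration: check cardinalities and run aggregations with
# arbitrary SQL, instead of shipping the data to Spark first.
rows = con.execute(
    "SELECT surface, COUNT(*), SUM(clicks) "
    "FROM events GROUP BY surface ORDER BY surface"
).fetchall()
print(rows)  # [('ads', 1, 3), ('feed', 2, 7)]
```

With DuckDB in place of `sqlite3`, the same in-process pattern additionally gains columnar OLAP performance and the extension ecosystem mentioned above.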

This has been an incredibly insightful conversation with Félix GV, shedding light on the intricate design choices behind large-scale data systems and the practical implications of theoretical concepts like the CAP theorem.

Thanks for tuning into the GoTo Podcast! Don’t forget to visit gotopia.tech for more valuable content from the brightest minds in software development.
