Unlocking Java’s Observability: The Power of eBPF and OpenTelemetry!
Hey tech enthusiasts! Ever struggled to instrument complex Java applications, especially when their traffic is TLS-encrypted? You’re not alone! Nikola Grcevski from Grafana Labs and Endre Sara from Causely are here to shed some light on this challenge and introduce an exciting solution that’s changing the game: OpenTelemetry eBPF Instrumentation, known as OBI.
Let’s dive into the world of Java, its unique instrumentation hurdles, and how eBPF is stepping in to provide a powerful, zero-touch approach.
Why Java Still Reigns Supreme (and Why It’s Tricky to Monitor)
Java remains a titan in the enterprise world, powering critical systems in finance and beyond. Frameworks like Spring Boot, Quarkus, and libraries for asynchronous messaging are ubiquitous. And let’s not forget the OpenTelemetry Java SDK: it’s mature and comprehensive!
However, getting this robust instrumentation into production isn’t always a walk in the park.
The Production Bottleneck
- Operational Hurdles: One large customer had Java instrumentation running in staging within weeks, but it took over six months to get the first production application instrumented. The delay wasn’t due to technical limitations, but to the sheer operational complexity of modifying code.
- Third-Party & Legacy Systems: Many applications, especially those from third parties or older systems running on legacy JVMs, are untouchable. You simply can’t rebuild or modify their startup behavior.
- Financial Sector Sensitivities: In financial institutions, the risk and impact of injecting code directly into the runtime are hard to measure, so the approach is often considered too risky.
- GraalVM Native Image: The rise of GraalVM native images, designed to shave off startup milliseconds, also poses a challenge for traditional agents that rely on dynamic loading.
eBPF & Java: A Love-Hate Relationship
eBPF (extended Berkeley Packet Filter) is a revolutionary technology that lets us attach probes to the Linux kernel and to running applications. It’s powerful for user-space probing, but Java presents a unique problem.
The “File” Conundrum
In Linux, everything is a file, and eBPF user-space probes (uprobes) attach to an offset within a file on disk. Java’s dynamically compiled code, however, often resides in anonymous memory regions with no backing file, which makes direct eBPF user-space probing tricky. Hacks like `ptrace` and remapping memory regions exist, but they’re complex and brittle.
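To make the problem concrete, here is a minimal sketch that classifies lines in the `/proc/<pid>/maps` format. The `MapsLine` class and its method are illustrative names, not part of OBI, and the sample lines are made up to follow the documented layout. An executable mapping with no path is the kind of region JIT-compiled code lives in, so there is no file for a probe to attach to.

```java
final class MapsLine {
    /**
     * Returns true when a /proc/<pid>/maps line describes an executable
     * mapping with no backing path (the pathname column is empty).
     * Line layout: address perms offset dev inode [pathname]
     */
    static boolean isAnonymousExecutable(String line) {
        String[] fields = line.trim().split("\\s+", 6);
        if (fields.length < 5) {
            return false; // not a well-formed maps line
        }
        boolean executable = fields[1].indexOf('x') >= 0;
        boolean hasPath = fields.length == 6 && !fields[5].isEmpty();
        return executable && !hasPath;
    }

    public static void main(String[] args) {
        // Made-up sample lines: a file-backed library, an anonymous
        // executable region, and an anonymous data region.
        String[] sample = {
            "7f2c5e1a2000-7f2c5e33a000 r-xp 00028000 08:01 1573423 /usr/lib/libc.so.6",
            "7f2c4c000000-7f2c4c270000 rwxp 00000000 00:00 0",
            "7f2c4d000000-7f2c4d100000 rw-p 00000000 00:00 0"
        };
        for (String line : sample) {
            System.out.println(isAnonymousExecutable(line) + "  " + line);
        }
    }
}
```

Running the same check against the real `/proc/<pid>/maps` of a JVM is one way to see how much executable memory has no file behind it.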
The Ever-Changing Codebase
Java’s Just-In-Time (JIT) compilation means that code changes from run to run. The JVM interprets bytecode, collects profiling data, and then optimizes code in stages. This dynamic nature makes building instrumentation at the assembly or binary level incredibly challenging. You’d need to cast a “really wide net” of probes, leading to an unstable instrumentation solution.
Cracking the TLS Code: The Missing Piece
One of the most significant challenges Nikola and Endre aimed to solve was TLS-encrypted traffic.
The Ubiquity of TLS
It turns out, many Java applications do interact with TLS, especially when communicating with cloud services or external APIs. Even if TLS is terminated at load balancers, client calls to external services are overwhelmingly over TLS.
The Java TLS Dilemma
Unlike other languages, where you can tap into libraries like OpenSSL’s libssl or BoringSSL, Java typically implements TLS in Java itself. There is no native TLS library to attach probes to, so the network traffic, even if it’s HTTP or Postgres, is encrypted and opaque to kernel-level eBPF probes. You can’t simply parse the payloads anymore.
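A quick way to check this on your own JVM is to ask the default `SSLContext` which classes it hands out. This is only a small sketch (the `TlsProviderCheck` class is an illustrative name); on a stock OpenJDK build the provider is typically SunJSSE and the engine is an ordinary Java class, but other runtimes or custom security providers may differ.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import java.security.NoSuchAlgorithmException;

final class TlsProviderCheck {
    /** Returns the class name of the SSLEngine created by the default SSLContext. */
    static String defaultEngineClass() {
        try {
            SSLEngine engine = SSLContext.getDefault().createSSLEngine();
            return engine.getClass().getName();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("no default SSLContext available", e);
        }
    }

    /** Returns the name of the security provider behind the default SSLContext. */
    static String defaultProviderName() {
        try {
            return SSLContext.getDefault().getProvider().getName();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("no default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("provider: " + defaultProviderName());
        System.out.println("engine:   " + defaultEngineClass());
    }
}
```

If the engine class comes from the JDK itself rather than a native wrapper, the encryption happens inside the JVM, which is why probes on a shared TLS library see nothing.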
Introducing OBI: Bridging the Gap with a Tiny Java Agent
To tackle these limitations, the team developed a small, dynamically injectable Java agent for OBI. It works in tandem with eBPF (specifically, the technology that was previously Grafana Beyla) to fill in the gaps.
How OBI Works: Key Innovations
- Dynamic Java Agent: OBI injects a minimal Java agent into running processes. This agent is bare-bones, focusing only on instrumenting TLS and Java’s thread pools. It doesn’t attempt to instrument every application library, in keeping with the OpenTelemetry philosophy of generic instrumentation.
- TLS Instrumentation: It taps into Java’s TLS mechanisms to capture and decrypt traffic, enabling visibility into otherwise opaque encrypted communication.
- Thread Pool Correlation: Addresses the common Java pattern where incoming and outgoing requests are handled by different threads, ensuring proper trace context propagation.
- Route Harvesting: OBI harvests routes embedded in the symbols of generated Java classes (e.g., `/users/{id}` in Spring Boot) and matches them to incoming kernel traffic, providing accurate endpoint identification.
- Asynchronous TLS Handling: OBI can correlate buffers even in asynchronous TLS implementations (like Netty). It uses a portion of the encrypted text as a key to map to the actual buffer, enabling correlation across encryption and decryption points.
- eBPF Integration: The Java agent communicates buffer information to the kernel via a fast `ioctl` system call. eBPF kprobes intercept this call, read the data, and feed it into the OBI machinery to generate telemetry.
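The asynchronous TLS correlation in the list above can be sketched roughly as follows. This is not OBI’s code: the `BufferCorrelator` class, its method names, and the 16-byte key length are all assumptions for illustration, and a simple XOR stands in for real TLS encryption. The idea is that the plaintext is recorded at the point where both forms are visible, keyed by the first bytes of the ciphertext, and looked up later where only the ciphertext is seen.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class BufferCorrelator {
    private static final int KEY_BYTES = 16; // assumed key length, chosen arbitrarily

    /** Map key wrapping the leading ciphertext bytes (byte[] has identity equals). */
    private static final class Prefix {
        private final byte[] bytes;
        Prefix(byte[] data) {
            bytes = Arrays.copyOf(data, Math.min(KEY_BYTES, data.length));
        }
        @Override public boolean equals(Object o) {
            return o instanceof Prefix && Arrays.equals(bytes, ((Prefix) o).bytes);
        }
        @Override public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    private final Map<Prefix, byte[]> pending = new ConcurrentHashMap<>();

    /** Called where encryption happens: plaintext and ciphertext are both visible. */
    void onEncrypt(byte[] plaintext, byte[] ciphertext) {
        pending.put(new Prefix(ciphertext), plaintext.clone());
    }

    /** Called where only ciphertext is visible; returns the plaintext or null. */
    byte[] onSocketWrite(byte[] ciphertext) {
        return pending.remove(new Prefix(ciphertext));
    }

    /** Stand-in for TLS encryption so the sketch is runnable; NOT real cryptography. */
    static byte[] fakeEncrypt(byte[] plaintext) {
        byte[] out = new byte[plaintext.length];
        for (int i = 0; i < plaintext.length; i++) {
            out[i] = (byte) (plaintext[i] ^ 0x5A);
        }
        return out;
    }

    public static void main(String[] args) {
        BufferCorrelator correlator = new BufferCorrelator();
        byte[] plain = "GET /users/42 HTTP/1.1".getBytes(StandardCharsets.UTF_8);
        byte[] cipher = fakeEncrypt(plain);

        correlator.onEncrypt(plain, cipher);              // encryption point
        byte[] found = correlator.onSocketWrite(cipher);  // later, possibly on another thread
        System.out.println(new String(found, StandardCharsets.UTF_8));
    }
}
```

The sketch assumes the chosen bytes are distinctive enough to avoid collisions between buffers in flight; a real implementation would also need to expire entries that are never matched.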
OBI + Beyla: A Powerful Duo
OpenTelemetry eBPF Instrumentation (OBI) builds on the capabilities of Beyla (now donated to OpenTelemetry). Beyla installs kernel probes, network programs, and user probes to capture telemetry at the kernel level. OBI then enriches this with Java-specific insights.
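One of those Java-specific insights is the thread-pool correlation mentioned earlier. The sketch below shows only the underlying idea, under the assumption that it is enough to remember which thread submitted a task while a worker thread runs it. The `ThreadLinks` class and its methods are hypothetical names, and the explicit wrapper is an illustration: the source says the agent instruments Java’s thread pools itself, without any application changes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLinks {
    // worker thread id -> id of the thread that submitted the task it is running
    static final Map<Long, Long> parentOf = new ConcurrentHashMap<>();

    /** Wraps a task so the worker is linked to the submitting thread while it runs. */
    static Runnable linked(Runnable task) {
        final long submitter = Thread.currentThread().getId(); // captured at submit time
        return () -> {
            long worker = Thread.currentThread().getId();
            parentOf.put(worker, submitter);
            try {
                task.run();
            } finally {
                parentOf.remove(worker); // the link only holds while the task runs
            }
        };
    }

    /** Runs one task on a pool thread and returns the submitter id seen by the worker. */
    static long observedParent() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        long[] seen = { -1L };
        try {
            pool.submit(linked(() -> seen[0] = parentOf.get(Thread.currentThread().getId()))).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("submitting thread: " + Thread.currentThread().getId());
        System.out.println("parent seen by worker: " + observedParent());
    }
}
```

With a link like this, an outgoing call made on the worker thread can be attributed to the incoming request that was being handled on the submitting thread.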
The combined solution installs as a system process (or a Kubernetes DaemonSet) and can monitor multiple applications simultaneously. Telemetry can be sent to an OTLP endpoint or processed locally.
Practical Applications and Real-World Scenarios
Endre shared some compelling use cases:
- Edge Security: Java clients (Spring Boot) interacting with Keycloak over HTTPS.
- Digital Marketing: Extensive use of Google Pub/Sub over SSL.
- Financial Services: Numerous Java applications connecting to managed databases (like Postgres) and Kafka over TLS.
He even developed open-source examples deployable via Helm charts, allowing you to experiment with Beyla’s instrumentation both with and without native OpenTelemetry instrumentation.
Challenges and Future Directions
While OBI and Beyla offer incredible capabilities, there are still areas for refinement:
- TLS Context Propagation: This is an ongoing area of development.
- Kafka Protocol Versions: Support for the latest Kafka protocol versions (e.g., V13) is being actively developed.
- Reactive Libraries: Instrumentation for reactive libraries like RxJava is not yet supported.
eBPF vs. Java Agent: Complementary, Not Competitive
Nikola emphasized that eBPF and Java agents aren’t mutually exclusive; they are complementary.
- Java Agent: Ideal for fully instrumenting every single library, especially when custom business metrics are crucial, and when you can modify the code.
- OBI/Beyla (eBPF): Perfect for zero-touch, day-zero instrumentation of third-party applications or when code modification is impossible.
In practice, many customers roll out Beyla everywhere for broad visibility and then selectively use Java agents for specific, high-value applications. Beyla detects existing native instrumentation and avoids redundant probing. Furthermore, Beyla provides semantically consistent instrumentation across multiple languages, not just Java.
Key Takeaways
- Beyla is a Game-Changer: Especially valuable where deploying custom Java agents is not feasible.
- Deep Visibility: OBI provides out-of-the-box understanding of system behavior, service interactions, and even granular API, Kafka topic, and SQL query-level insights, including over TLS.
- Strategic Choices: Make informed decisions about when to leverage the Java SDK versus OBI/Beyla for optimal observability.
- Community Driven: Feedback, contributions, and pull requests are highly welcomed!
This is a monumental step forward in making Java applications more observable, especially in complex, security-conscious environments. Give OBI and Beyla a try and unlock deeper insights into your Java ecosystem!