Bridging the Gap: How gRPC Powers Serverless in the Cloud Service Mesh 🚀

The term serverless is perhaps the most misleading name in cloud computing. Let’s clear the air: there are still servers. We haven’t figured out how to run code on pure magic just yet! What serverless actually means is that you—the developer—are no longer responsible for managing that infrastructure.

At a recent talk, Ishita Chanwani and Prangi Sakenna from the gRPC Go team at Google dove deep into how they are making serverless platforms like Cloud Run work seamlessly within a Cloud Service Mesh (CSM).


🏗️ The Serverless Powerhouse: Google Cloud Run

Google Cloud Run is a managed implementation of the open-source Knative technology. It is built on containers, offering incredible flexibility for developers.

Why Cloud Run?

  • Massive Scalability: It scales from 0 to 1,000 instances based purely on demand. 📈
  • Zero Waste: When there are no requests, the platform scales down to zero, saving costs.
  • Language Agnostic: You can deploy virtually any container image, whether written in Go, Python, Java, or C++, as long as it responds to HTTP requests (see the minimal server sketch after this list).
  • Versatile Workloads: It handles everything from simple REST/GraphQL APIs to complex machine learning models and asynchronous batch processing.
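
To make this concrete, here is a minimal sketch of a container-ready gRPC server that fits the Cloud Run model: it listens on the port supplied in the PORT environment variable (Cloud Run's convention, defaulting to 8080) and serves until terminated. The standard gRPC health service stands in for your own generated service so the sketch stays self-contained.

```go
package main

import (
	"log"
	"net"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Cloud Run tells the container which port to listen on via $PORT.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	lis, err := net.Listen("tcp", ":"+port)
	if err != nil {
		log.Fatalf("failed to listen on port %s: %v", port, err)
	}

	s := grpc.NewServer()
	// Register your generated service here; the standard health service is
	// used as a stand-in so the example compiles on its own.
	healthpb.RegisterHealthServer(s, health.NewServer())

	log.Printf("gRPC server listening on :%s", port)
	if err := s.Serve(lis); err != nil {
		log.Fatalf("serve error: %v", err)
	}
}
```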

🧠 The Brain: Cloud Service Mesh (CSM)

To make a serverless vision work at scale, you need a “brain” to coordinate traffic. That is where Cloud Service Mesh comes in. CSM unifies Traffic Director and Anthos Service Mesh into a single Google-managed control plane.

CSM uses the xDS API for dynamic configuration and supports a proxyless mode, which is the gold standard for running gRPC on Cloud Run. This setup allows for PSM (Proxyless Service Mesh) Security, where the control plane sends TLS configurations and root certificates directly to the application.
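
As an illustration of the proxyless model, a gRPC Go client can simply target an xds:/// name and use xDS-aware transport credentials, so the security material pushed by the control plane is applied automatically. The service name below is a hypothetical placeholder; in practice the client also needs an xDS bootstrap file, typically pointed to by the GRPC_XDS_BOOTSTRAP environment variable.

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	xdscreds "google.golang.org/grpc/credentials/xds"
	_ "google.golang.org/grpc/xds" // registers the "xds" resolver and balancers
)

func main() {
	// xDS credentials: apply the TLS/mTLS material sent by the control plane,
	// falling back to plaintext if the control plane sends no security config.
	creds, err := xdscreds.NewClientCredentials(xdscreds.ClientOptions{
		FallbackCreds: insecure.NewCredentials(),
	})
	if err != nil {
		log.Fatalf("failed to create xDS credentials: %v", err)
	}

	// "helloworld-service" is a placeholder for the logical service name
	// configured in the mesh.
	conn, err := grpc.NewClient("xds:///helloworld-service",
		grpc.WithTransportCredentials(creds))
	if err != nil {
		log.Fatalf("failed to create client: %v", err)
	}
	defer conn.Close()

	// Use conn with your generated service stubs as usual.
}
```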


🚧 The Challenge: The Ingress Proxy Problem

Serverless platforms introduce a unique hurdle: the Ingress Proxy. Because serverless instances scale to zero, they do not have persistent or static IP addresses. The platform uses an ingress proxy with a persistent IP to manage routing and scaling.

However, this proxy creates a “wall” between the client and the server, leading to three major technical challenges:

  1. The Trust Gap: Conflicting security requirements.
  2. The Routing Riddle: Directing traffic to the correct backend.
  3. The Identity Crisis: Authenticating the client through a proxy.

🛡️ Challenge 1: Closing the Trust Gap

In a standard mesh, clients and servers use private root CAs provided by the xDS control plane. However, the Cloud Run proxy is a public endpoint that presents standard public PKI certificates. When a mesh client, which trusts only the private mesh CA, is presented with a public certificate it cannot verify, it fails the TLS handshake and drops the connection.

The Solution: system_root_certs

The team updated the xDS protocol to include a new boolean field: system_root_certs.

  • If this flag is true, gRPC ignores the private mesh CA and uses the operating system’s default root trust store.
  • This allows the client to verify the public proxy certificate dynamically, without any manual code changes (a conceptual sketch of the effect follows this list). 🛠️
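
In Go terms, the effect of the flag can be sketched as follows. This is not the actual xDS plumbing inside gRPC, just an illustration: with system_root_certs set, the client validates the proxy's certificate against the operating system's trust store rather than a mesh-provided CA bundle.

```go
// Conceptual sketch only: illustrates "trust the OS root store" versus
// "trust the private mesh CA"; the real switch happens inside gRPC's xDS
// security code, driven by the system_root_certs field.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"

	"google.golang.org/grpc/credentials"
)

func trustStoreFor(systemRootCerts bool, meshCA *x509.CertPool) (credentials.TransportCredentials, error) {
	roots := meshCA
	if systemRootCerts {
		// Use the operating system's default root trust store, which is what
		// lets the client accept the Cloud Run proxy's public PKI certificate.
		sys, err := x509.SystemCertPool()
		if err != nil {
			return nil, err
		}
		roots = sys
	}
	return credentials.NewTLS(&tls.Config{RootCAs: roots}), nil
}

func main() {
	creds, err := trustStoreFor(true, x509.NewCertPool())
	if err != nil {
		log.Fatal(err)
	}
	_ = creds // pass to grpc.WithTransportCredentials(creds) when dialing
}
```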

📍 Challenge 2: Mastering Traffic Routing

The Cloud Run ingress proxy routes requests by host name, so it needs the physical Cloud Run service host in the request's authority. However, mesh clients typically send a logical xDS cluster name instead, which the proxy cannot resolve.

The Solution: Secure Authority Overwriting

The xDS control plane now instructs the client to perform a Secure Authority Overwrite.

  • The gRPC client checks for an auto_host_rewrite flag in its configuration.
  • If authorized, the client overwrites the logical :authority header with the physical Cloud Run host name just before the request leaves the client (a sketch of this decision follows this list). 📡
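
The decision the client makes can be sketched with a couple of hypothetical types. Only the auto_host_rewrite flag comes from the route configuration; everything else below is illustrative, not gRPC's internal implementation.

```go
// Illustrative sketch of the authority-rewrite decision; the types and
// helper here are hypothetical, not gRPC's internal representation.
package main

import "fmt"

type routeAction struct {
	autoHostRewrite bool   // mirrors the auto_host_rewrite flag from the route config
	physicalHost    string // the Cloud Run host name (placeholder below)
}

// authorityFor returns the value to place in the :authority header just
// before the request leaves the client.
func authorityFor(logicalAuthority string, r routeAction) string {
	if r.autoHostRewrite {
		// Overwrite the logical mesh name with the physical host so the
		// Cloud Run ingress proxy can route the request.
		return r.physicalHost
	}
	return logicalAuthority
}

func main() {
	r := routeAction{
		autoHostRewrite: true,
		physicalHost:    "my-service-abc123-uc.a.run.app", // hypothetical host
	}
	fmt.Println(authorityFor("helloworld-service", r))
}
```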

🔑 Challenge 3: Solving the Identity Crisis

In a standard mesh, the server verifies the client via mTLS. But the Cloud Run proxy hides the client’s identity from the server. To fix this, Cloud Run requires JWTs (JSON Web Tokens) for authentication.

The Solution: Automated JWT Injection

Normally, developers would have to write custom code to fetch and attach these tokens. gRPC now automates this via the GCP Auth Filter:

  1. Automatic Fetching: gRPC queries a metadata server to fetch a valid JWT for a specific audience.
  2. Smart Caching: To avoid latency hits on every RPC, gRPC caches the tokens and only refreshes them when they are near expiration. ⏱️
  3. Header Injection: The GCP Auth Filter automatically injects the token into the request’s authorization header as a Bearer token (a simplified sketch of the whole flow follows this list).
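
The flow the GCP Auth Filter automates can be approximated with an ordinary client interceptor. This is a deliberate simplification of the real filter, but the metadata-server identity endpoint and the Metadata-Flavor: Google header are the documented way to obtain an ID token for a given audience on GCP; the audience value, token lifetime, and helper names are placeholders.

```go
// Simplified sketch of what the GCP Auth Filter automates: fetch a JWT for a
// target audience from the metadata server, cache it until it is close to
// expiring, and inject it as an Authorization bearer header on every RPC.
package gcpauthsketch

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/metadata"
)

type tokenCache struct {
	mu      sync.Mutex
	token   string
	expires time.Time
}

// get returns a cached token, refreshing it from the metadata server only
// when the current one is near expiration (a fixed 1h lifetime is assumed
// here for simplicity; the real filter reads the expiry from the token).
func (c *tokenCache) get(ctx context.Context, audience string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.token != "" && time.Until(c.expires) > time.Minute {
		return c.token, nil
	}
	url := "http://metadata.google.internal/computeMetadata/v1/" +
		"instance/service-accounts/default/identity?audience=" + audience
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Metadata-Flavor", "Google")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	c.token, c.expires = string(body), time.Now().Add(time.Hour)
	return c.token, nil
}

// jwtUnaryInterceptor injects "authorization: Bearer <jwt>" into every call.
func jwtUnaryInterceptor(cache *tokenCache, audience string) grpc.UnaryClientInterceptor {
	return func(ctx context.Context, method string, req, reply any,
		cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
		tok, err := cache.get(ctx, audience)
		if err != nil {
			return fmt.Errorf("fetching identity token: %w", err)
		}
		ctx = metadata.AppendToOutgoingContext(ctx, "authorization", "Bearer "+tok)
		return invoker(ctx, method, req, reply, cc, opts...)
	}
}
```

In application code, such an interceptor could be attached with grpc.WithUnaryInterceptor when creating the channel; the point of the GCP Auth Filter is that the mesh configuration wires this in for you, with no application changes.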

🎯 Conclusion

By evolving the XDS protocol and implementing smart filters within gRPC, the team has removed the friction of integrating serverless workloads into a global service mesh. Developers can now enjoy the rapid scaling of Cloud Run with the robust security and observability of Cloud Service Mesh—all without changing a single line of application code.

Ready to dive deeper? Check out the official documentation at grpc.io, watch the tutorials on their YouTube channel, or join the mailing list to stay updated on the latest in high-performance networking! 🌐✨
