🚀 Supercharging Your Deployments: Progressive Rollouts with Argo Rollouts and Metrics! 💡

Hey tech enthusiasts! Ever felt the nail-biting tension of a new application release? We’ve all been there. Today, we’re diving deep into a game-changer for Kubernetes deployments: Argo Rollouts. Get ready to transform your releases from risky gambles into smooth, confidence-inspiring journeys.

This session, brought to you by Chris Detsicas and Antonio Jimenez Martinez from ThousandEyes, is all about mastering progressive delivery using continuous verification driven by metrics. We’re moving beyond just checking if your app is “healthy” to ensuring it’s “behaving as expected.” Pretty cool, right?

🗺️ Our Journey Today:

  • Argo Rollouts: What it is and why it’s a must-have.
  • Deployment Strategies: Exploring the power of Blue/Green and Canary.
  • Analysis Templates: Automating your validation with metrics.
  • Live Demo: Seeing Argo Rollouts in action!
  • Real-World Examples: How ThousandEyes uses this magic.
  • Best Practices: Tips for success.

📦 Kubernetes Deployments: The Good, The Bad, and The Argo!

You’re likely familiar with Kubernetes’ native deployment strategies, RollingUpdate and Recreate. RollingUpdate is fantastic for zero downtime: it progressively replaces old pods with new ones, waiting for each batch to pass its readiness checks. Recreate, by contrast, destroys all old pods before creating new ones, so it incurs downtime and only really makes sense for apps that can’t run multiple versions simultaneously.

But here’s the catch: these native strategies have limitations. They don’t support gating a rollout on SLOs, tests, or metrics, and traffic control is coarse, since the share of traffic each version receives simply follows its pod count.

This is where Argo Rollouts steps in! 🦸‍♂️

Argo Rollouts isn’t just another tool; it’s a Kubernetes controller with a set of Custom Resource Definitions (CRDs), and its Rollout resource takes the place of your standard Kubernetes Deployment. It ensures your application is not only healthy but also behaving as expected throughout the rollout process.
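
Because a Rollout stands in for a Deployment, its manifest looks very familiar. Here is a minimal sketch: apart from apiVersion, kind, and the strategy block, it mirrors a Deployment spec. The name my-app, the labels, the image, and the step values are placeholders for illustration, not taken from the talk.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app                              # hypothetical name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: my-app
  template:                                 # same pod template you would put in a Deployment
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: example.com/my-app:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
  strategy:
    canary:                                 # takes the place of rollingUpdate/recreate
      steps:
        - setWeight: 25
        - pause: { duration: 1m }
```

Changing the pod template (for example, the image tag) is what triggers a new rollout, just as it would for a Deployment.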

✨ Key Argo Rollouts Features:

  • Advanced Deployment Strategies: Canary and Blue/Green are now at your fingertips.
  • Traffic Shifting: Move beyond basic pod counts to intelligent traffic management using Ingress or Service Meshes.
  • Automated Verification: Check metrics, run tests, and execute CI/CD jobs within Kubernetes for automated validation.
  • Automatic Promotion & Rollback: Let Argo Rollouts handle the promotion to new versions or roll back swiftly if issues arise.
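
To give a feel for the traffic shifting feature, here is a sketch of a canary strategy wired to an Istio VirtualService, so that weights are applied by the mesh rather than approximated by pod counts. The Service names, the VirtualService name, and the route name are all assumptions for illustration.

```yaml
# Fragment of a Rollout spec; names below are hypothetical
spec:
  strategy:
    canary:
      stableService: my-app-stable     # Service selecting the current version
      canaryService: my-app-canary     # Service selecting the new version
      trafficRouting:
        istio:
          virtualService:
            name: my-app-vsvc          # existing VirtualService that the controller updates
            routes:
              - primary                # route whose weights get adjusted
      steps:
        - setWeight: 10                # 10% of requests go to the canary
        - pause: { duration: 5m }
```

Without a trafficRouting block, setWeight is approximated by scaling the canary and stable ReplicaSets, so small percentages need many replicas.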

Important Distinction: Don’t confuse Argo CD and Argo Rollouts! Argo CD ensures your desired state (from Git) matches your Kubernetes deployment. Argo Rollouts, on the other hand, manages the progressive rollout of new versions, always verifying application behavior.

🛠️ The Argo Rollouts Toolkit:

  • Rollout CRD: The core definition for your progressive deployments.
  • Analysis Template: Reusable gates for defining what to check, how often, and what constitutes success or failure.
  • Analysis Run: The actual execution of an Analysis Template.
  • Experiment: Run two application versions side by side for a limited time, optionally with analysis, so you can compare them and decide which one to keep.

And for a real-time view of your deployments, the Argo Rollouts kubectl plugin (invoked as kubectl argo rollouts) is a lifesaver! 📊 It provides insights into availability, weights, and step progression. If an analysis fails, the rollout is aborted and marked Degraded, and the new version is scaled down automatically, all visible in the plugin and the Argo Rollouts UI.


🔵🟢 Blue/Green vs. 🎭 Canary: Choosing Your Strategy

Argo Rollouts offers two powerful deployment strategies:

🔵 Blue/Green Strategy: The Swift Switch ⚡

This is the simpler of the two. You run two versions of your application simultaneously:

  • Blue: The current production version serving traffic.
  • Green: The new preview version.

You validate the Green version in a preview environment. Once you’re confident, you perform a fast cutover, switching 100% of traffic from Blue to Green in a single step.

Code Snippet:

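# Example snippet
spec:
  strategy:
    blueGreen:
      activeService: my-app-active     # Service that carries production traffic (Blue)
      previewService: my-app-preview   # Service that exposes the new version for validation (Green)
      autoPromotionEnabled: false      # Wait for manual promotion, or set autoPromotionSeconds to promote after a delay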
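
The two Services named in the snippet have to exist before the Rollout can use them. A minimal sketch follows, assuming the pods carry an app: my-app label and listen on port 8080 (both are assumptions). The selectors look identical on purpose: the controller adds a rollouts-pod-template-hash selector to each Service so that one points at Blue and the other at Green.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-active        # receives production traffic
spec:
  selector:
    app: my-app              # hypothetical pod label
  ports:
    - port: 80
      targetPort: 8080       # assumed container port
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-preview       # used to validate the new version before cutover
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```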

🎭 Canary Strategy: The Gradual Rollout 📈

This strategy offers more control. You run the old and new versions side-by-side, gradually shifting production traffic to the new version (Canary) in a controlled manner.

  • Benefit: You’re testing with real production traffic but not exposing the entire user base to potential issues. If a problem arises, you can roll back quickly.

Code Snippet:

# Example snippet
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - setWeight: 20
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: {} # Pauses indefinitely until manual promotion

The beauty of Canary is its phased approach, allowing you to pause at different traffic percentages for validation.


📊 Automating Validation with Analysis Templates 🎯

While Blue/Green and Canary strategies are powerful, they still often rely on manual validation. This is where human error can creep in. Argo Rollouts introduces Analysis Templates to automate this crucial step.

Think of Analysis Templates as reusable release gates. They define:

  • What to check: Metrics, tests, or jobs.
  • How often: The interval for checks.
  • Success/Failure criteria: What passes and what fails.

This creates a metrics-driven policy loop, moving away from solely human checks. If an analysis run fails (e.g., success rate drops below a threshold), the rollout aborts, and the Canary version is scaled back to zero traffic.

Example Analysis Template (Success Rate):

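# Example snippet
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
    - name: success-rate
      interval: 1m                         # Take one measurement per minute
      successCondition: result[0] > 0.95   # Pass while more than 95% of requests succeed
      failureLimit: 3                      # Tolerate up to 3 failed measurements before the analysis fails
      provider:
        prometheus:
          address: http://prometheus.example.com:9090
          query: |
            sum(rate(http_requests_total{status="2xx"}[5m])) / sum(rate(http_requests_total[5m]))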

💡 Integrating Analysis Templates:

You reference an Analysis Template from your Rollout object, either as a step in the rollout or as a background analysis that runs alongside the steps. You can pass arguments to the template, making it highly reusable across different applications or teams.
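
Here is one way the argument passing could look: the template declares an argument and interpolates it into its query, and the Rollout supplies a value when it runs the template as a step. The service label on the metric and the value my-app-canary are assumptions, so adapt them to whatever labels your metrics actually carry.

```yaml
# Template side: declares the argument and uses it in the query
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name               # supplied by each Rollout that uses the template
  metrics:
    - name: success-rate
      interval: 1m
      count: 5                         # finite run, so an inline step can complete
      successCondition: result[0] > 0.95
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.example.com:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status="2xx"}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
---
# Rollout side (fragment of a Rollout spec): runs the template and passes the argument
spec:
  strategy:
    canary:
      steps:
        - setWeight: 20
        - analysis:                    # the rollout waits here until the run finishes
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: my-app-canary   # hypothetical value
        - setWeight: 50
```

Another team can reuse the same template simply by passing a different service-name.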

🗄️ Supported Metric Types:

  • Prometheus Metrics: The most common.
  • DataDog, Amazon CloudWatch: Integrations with other popular monitoring tools.
  • HTTP Endpoints: Make calls to your application’s endpoints.
  • Kubernetes Jobs: Run quick scripts or end-to-end tests. A successful job completion counts as a successful analysis.
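
As an illustration of the HTTP endpoint option, here is a sketch of an Analysis Template that uses the web provider. It assumes the application exposes a /healthz endpoint returning JSON such as {"status": "ok"}; the URL, the template name, and the response shape are all hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: http-health-check              # hypothetical name
spec:
  metrics:
    - name: health-endpoint
      interval: 30s
      count: 5                         # five measurements, then the run completes
      failureLimit: 1
      successCondition: result == "ok" # compares the value extracted by jsonPath
      provider:
        web:
          url: http://my-app-canary.default.svc.cluster.local/healthz
          timeoutSeconds: 10
          jsonPath: "{$.status}"       # pulls the status field out of the JSON body
```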

🎬 Live Demo: Seeing is Believing! ✨

Antonio walked us through a live demonstration, showcasing the power of Argo Rollouts:

  1. Setting the Stage: Reviewing Rollout and Analysis CRDs.
  2. Real-time Monitoring: Using the kubectl argo-rollouts plugin to visualize the deployment.
  3. Successful Rollout: Deploying a new version (blue) and seeing it progressively promoted after passing analysis.
  4. Automated Rollback: Deploying a failing version (yellow) that triggers an analysis failure, leading to an automatic rollback. The plugin and UI clearly showed the failure and the rollback process, highlighting the magic of metric-driven validation! 🤯

The demo vividly illustrated how Argo Rollouts automatically aborts and rolls back when the Analysis Template identifies a problem, ensuring only stable versions reach your users.


🌍 Real-World Use Cases at ThousandEyes 🚀

Chris shared how ThousandEyes leverages Argo Rollouts:

1. Public API Service: 🌐

  • Strategy: Canary deployment with a background analysis.
  • Validation: Checks the percentage of successful requests against the total, using Istio service mesh metrics.
  • Rollback Trigger: 5 or more failed measurements.
  • Flow: Analysis runs throughout the canary deployment. If successful, it pauses at 70% traffic for 1 minute, then proceeds to 100%.
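
The flow described above might be expressed roughly like this. It is a reconstruction from the description, not the team's actual manifest: the template name, the argument, and the Service name are hypothetical.

```yaml
# Fragment of a Rollout spec
spec:
  strategy:
    canary:
      analysis:                              # background analysis: runs alongside the steps
        templates:
          - templateName: istio-success-rate # hypothetical template querying Istio metrics
        args:
          - name: service-name
            value: public-api-canary.default.svc.cluster.local
      steps:
        - setWeight: 70
        - pause: { duration: 1m }
        # after the last step the rollout is promoted to 100%
```

In the referenced template, a rollback after "5 or more failed measurements" would correspond to failureLimit: 4, since a run fails once the number of failed measurements exceeds the limit.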

2. Authentication Service: 🔒 (Mission Critical!)

  • Strategy: Phased traffic shifts (1%, 5%, 10%, etc.).
  • Validation:
    • End-to-End Tests: Run before any traffic shift (Kubernetes Jobs).
    • Background Metrics: Continuously monitor authentication availability, latency, and resource usage.
  • Impact: Teams report saving approximately 15 minutes per region and achieving much higher reliability compared to manual checks.
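
A sketch of how such a phased rollout could be laid out, with an end-to-end test gate before each traffic shift and SLO checks running in the background. Template names and pause durations are assumptions, and weights as small as 1% realistically require a traffic router such as a service mesh.

```yaml
# Fragment of a Rollout spec; template names are hypothetical
spec:
  strategy:
    canary:
      analysis:                        # background SLO checks
        templates:
          - templateName: auth-slo-checks
        startingStep: 1                # zero-indexed: start once the canary gets its first traffic
      steps:
        - analysis:                    # step 0: end-to-end tests before any traffic shift
            templates:
              - templateName: auth-e2e-tests
        - setWeight: 1                 # step 1
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: auth-e2e-tests
        - setWeight: 5
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: auth-e2e-tests
        - setWeight: 10
        - pause: { duration: 5m }
```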

Analysis Template Examples:

  • Job Template: Runs a standard Kubernetes job (e.g., for end-to-end tests). Success of the job means success of the analysis.
  • SLO Checks: Validates availability, latency, and resource usage metrics.
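
For the job-based variety, a minimal sketch could look like the following. The image and command are placeholders; the only contract is that the Job's exit status decides the outcome.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: auth-e2e-tests                 # hypothetical name
spec:
  metrics:
    - name: e2e-tests
      provider:
        job:
          spec:                        # a regular Kubernetes Job spec
            backoffLimit: 0            # do not retry: a failed test run fails the analysis
            template:
              spec:
                restartPolicy: Never
                containers:
                  - name: e2e
                    image: example.com/e2e-tests:latest            # placeholder image
                    command: ["/bin/sh", "-c", "./run-tests.sh"]   # placeholder command
```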

🚀 Bonus: Argo CD & Kargo for Consistency!

For managing deployments across multiple complex environments (regions, staging, production), ThousandEyes uses Kargo. Kargo treats new software versions as “freight” that moves between stages. Stages can be configured with verifications using Analysis Templates, ensuring consistency across all environments. If a verification fails in staging, the freight is not promoted and the rollout to production is canceled.
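
To show where the verification hooks in, here is a rough sketch of a Kargo Stage that verifies freight with an Analysis Template. Field names follow the kargo.akuity.io/v1alpha1 Stage API as best we understand it, so check them against the Kargo version you run; the Warehouse, namespace, and template names are hypothetical, and promotion steps are omitted.

```yaml
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: staging
  namespace: my-project                # hypothetical Kargo project namespace
spec:
  requestedFreight:
    - origin:
        kind: Warehouse
        name: my-warehouse             # hypothetical Warehouse producing freight
      sources:
        direct: true                   # accept freight straight from the Warehouse
  verification:
    analysisTemplates:
      - name: success-rate             # Analysis Template run after each promotion
```

A production Stage would then request freight from the staging Stage rather than directly, so only freight that passed verification in staging is eligible to move on.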


🏆 Best Practices for Argo Rollouts 👨‍💻

Argo Rollouts is a powerful tool, but it’s not a silver bullet for every application.

🤔 When to Use Argo Rollouts:

  • Horizontally Scalable Applications: Apps that can run multiple versions simultaneously.
  • Applications with Metrics: Where metrics provide clear indicators of application health and behavior.
  • Need for Automated Promotion/Rollback: When you want to automate these critical processes.

❌ When NOT to Use Argo Rollouts:

  • System-level Components: Like cert-manager or ingress-nginx controllers.
  • Single-version Applications: Or applications with very long lifecycles that aren’t replaced periodically.

⚠️ Application Requirements:

  • Simultaneous Version Support: Both V1 and V2 must be able to run at the same time.
  • Safe Inter-version Communication: Crucially, avoid simultaneous writes to the same files or schema incompatibilities between versions.
  • Automated Promotion/Rollback Capability: The application’s architecture must support this.

⏳ Rollout Guidelines:

  • Rollouts should ideally not exceed one hour.
  • Avoid long approval processes.
  • Argo Rollouts is not designed for permanent parallel releases.

💡 Mindset Shift: Metrics First! 💡

  • Embrace Metrics: Use metrics (success rate, failures, latency, SLO KPIs) to determine application health.
  • Timeliness is Key: Aim for metric validation within five minutes for immediate rollback decisions.
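
One way to keep a verdict inside that five-minute window is to bound the analysis with interval and count. The sketch below checks p95 latency; the metric name, the 0.5-second threshold, and the Prometheus address are assumptions for illustration.

```yaml
# Fragment of an AnalysisTemplate spec
spec:
  metrics:
    - name: p95-latency
      initialDelay: 30s                # give the new pods time to receive traffic
      interval: 30s
      count: 9                         # 30s delay + 8 further intervals: done in about 4.5 minutes
      failureLimit: 1                  # a second failed measurement aborts early
      successCondition: result[0] < 0.5
      provider:
        prometheus:
          address: http://prometheus.example.com:9090
          query: |
            histogram_quantile(0.95,
              sum(rate(http_request_duration_seconds_bucket[1m])) by (le))
```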
  • Automate Everything: Manual promotion should be reserved for debugging or testing, not production.

🚀 Beyond Metrics: Broader Validation

Remember that validation can encompass more than just metrics:

  • CI/CD in Blue/Green or Preview: Run automated pipelines against the preview version before it takes traffic.
  • Tests on Blue/Green: Execute comprehensive test suites against the Green version before cutover.
  • Experiment Comparisons: Compare two versions directly.
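
For the experiment option, a canary step can launch short-lived copies of the stable and new versions and run an analysis against them. A sketch, with the analysis template name and duration as assumptions:

```yaml
# Fragment of a Rollout spec
spec:
  strategy:
    canary:
      steps:
        - experiment:
            duration: 10m              # experiment pods are torn down afterwards
            templates:
              - name: baseline
                specRef: stable        # pod spec of the current version
              - name: canary
                specRef: canary        # pod spec of the new version
            analyses:
              - name: compare
                templateName: baseline-vs-canary   # hypothetical template
        - setWeight: 20                # only reached if the experiment succeeds
```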

📣 Notifications: Alert Sooner, Better!

Integrate notifications (Slack, webhooks) to alert teams immediately when a problem is identified. The goal is to identify and share issues as fast as possible.
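
Subscriptions are declared as annotations on the Rollout. The sketch below assumes a Slack integration has already been configured for the Argo Rollouts notification engine and that a channel named deploy-alerts exists; both are assumptions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app                         # hypothetical name
  annotations:
    # notify when a rollout is aborted (for example after a failed analysis)
    notifications.argoproj.io/subscribe.on-rollout-aborted.slack: deploy-alerts
    # notify as soon as an analysis run fails
    notifications.argoproj.io/subscribe.on-analysis-run-failed.slack: deploy-alerts
```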


This session has truly illuminated the path to more robust, reliable, and less stressful application deployments. By embracing progressive delivery with Argo Rollouts and leveraging the power of metrics for continuous verification, you can build confidence and accelerate your development cycles. Happy rolling out! 🎉
