🚀 From Anxiety to Autonomy: Revolutionizing Deployments at a Global Scale 🌍

In the fast-paced world of tech, deploying new features and updates can often feel like a high-stakes gamble. For a massive corporation serving over 100 million users, this process was once fraught with manual toil and the ever-present fear of production failures. But a remarkable transformation has occurred, shifting their deployment strategy from a source of constant anxiety to one of peace of mind. Join us as we dive into the journey of how they embraced Argo CD and the GitOps model to achieve unprecedented agility and reliability.


🌌 The Dawn of Deployments: From Manual Labor to Automated Pipelines 🛠️

Our story begins seven to eight years ago, when the focus was on the code itself rather than on infrastructure or deployment models. The initial approach was a click-ops model: everything was handled manually through consoles and UIs, a far cry from modern CI/CD practices.

The first major leap was automating this process. This involved:

  • Infrastructure as Code: Using Terraform to define and manage infrastructure, ensuring consistency and repeatability.
  • CI/CD Automation: Defining continuous integration and continuous deployment pipelines in YAML files.
  • Unified Pipelines: The ultimate goal was to achieve a single pipeline capable of deploying both the application and its associated infrastructure simultaneously.

😟 The QA’s Struggle: The “Space Stone Age” of Quality Assurance 🗿

While the engineering team made strides in automation, the Quality Assurance (QA) team found themselves in what they describe as the “space stone age”. Three critical challenges emerged:

  • Late-Stage Quality Checks: QA was treated as a final customs office, only appearing at the very end of the process. This meant that any issues discovered late carried an enormous cost to fix, especially considering the impact on 120 million users of applications like Mercadona’s app, used for discounts and payments.
  • Resource Imbalance: In multidisciplinary teams, a single QA engineer was often responsible for seven developers. This imbalance made thorough testing extremely difficult.
  • Disconnection and Test Instability: The manual nature of deployments led to a disconnect. QA tests would sometimes fail not due to bugs, but because of uncontrolled changes in the environment, leaving QA with insufficient time to update their tests.

This era was characterized by a desire to implement quality but a lack of clear understanding of how to achieve it effectively.


⏳ The “Stone Age” Deployment Process: A Slow and Risky Orbit 🚀

About two years before the talk, the deployment process had improved on manual clicks but was still far from agile. The pipeline involved:

  • Multiple Environments: Navigating four to five different environments, each dependent on the other.
  • Slow Production Deployments: Reaching production, even for urgent bug fixes, could take 10 to 20 minutes or more, because every change had to pass through all environments in sequence.
  • Limited Test Coverage: While unit tests ran on pull requests, the overall coverage was not very good.
  • Production Freeze: A one-week freeze in UAT (User Acceptance Testing) was implemented before production deployments to allow for manual regression testing.
  • Manual Sanity Checks: Even after all these steps, manual sanity tests were required in production to validate the deployment.

This process was not only slow but also inherently risky, with manual validations increasing the potential for errors.


💡 The Azure Web Apps Bottleneck & The Kubernetes Revolution 🌐

As the product and user base grew, the Azure Web Apps platform, initially used for deployments, began to falter. It simply couldn’t keep up with the increasing demands. This, combined with the slow deployment pipelines, prompted a search for a new solution.

The answer was Kubernetes. The team acknowledges that Kubernetes is not a silver bullet and that its steep learning curve calls for expert guidance, but it proved to be the right fit for their needs.

However, the initial move to Kubernetes didn’t immediately adopt the GitOps model. Instead, they maintained a similar pipeline structure:

  • Separate CI/CD Pipelines: One pipeline for building container images and another for deploying them to various clusters.
  • Phased Adoption: This step-by-step approach was crucial for a large corporation with many teams: first introduce Kubernetes, then, once teams were familiar with it, transition to GitOps.

🌟 Embracing GitOps and Argo CD: The Single Source of Truth ✨

The realization that they needed a single source of truth led them to the GitOps model. This shift addressed several key issues:

  • Eliminating Disconnection: Because every change goes through Git, GitOps provides a clear audit trail that answers who changed what, when, and why.
  • Declarative Approach: Moving from an imperative (defining every step) to a declarative (defining the desired end state) way of working.
  • Self-Healing Systems: The system constantly monitors the desired state against the live state and automatically corrects any drift.
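
The declarative, self-healing idea in the list above can be illustrated with a small Python sketch. It is not Argo CD code: the resource names, the dictionary layout, and the functions `detect_drift` and `reconcile` are hypothetical and only model the principle of comparing a desired state (what Git declares) with a live state (what the cluster runs) and correcting the difference.

```python
from copy import deepcopy


def detect_drift(desired: dict, live: dict) -> dict:
    """Map each differing resource name to its (desired_spec, live_spec) pair."""
    drift = {}
    for name in desired.keys() | live.keys():
        if desired.get(name) != live.get(name):
            drift[name] = (desired.get(name), live.get(name))
    return drift


def reconcile(desired: dict, live: dict) -> dict:
    """Return a new live state in which every drifted resource is corrected."""
    healed = deepcopy(live)
    for name, (want, _have) in detect_drift(desired, live).items():
        if want is None:
            healed.pop(name)               # not declared in Git: prune it
        else:
            healed[name] = deepcopy(want)  # missing or modified: re-apply it
    return healed


# Hypothetical desired state, as it would be declared in Git.
desired_state = {
    "deployment/api": {"image": "api:1.4.0", "replicas": 3},
    "service/api": {"port": 8080},
}

# Hypothetical live state after two manual, untracked changes.
live_state = {
    "deployment/api": {"image": "api:1.4.0", "replicas": 1},  # scaled by hand
    "configmap/debug": {"verbose": "true"},                   # created by hand
}

drift = detect_drift(desired_state, live_state)
healed_state = reconcile(desired_state, live_state)

print(sorted(drift))                  # resources that differ from Git
print(healed_state == desired_state)  # True once drift is corrected
```

Nothing in the sketch describes how to get from one state to the other step by step; the desired state alone drives the correction. In Argo CD itself, automatic pruning and self-healing are opt-in settings of the automated sync policy.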

To implement GitOps, Argo CD was chosen. Argo CD is a powerful, declarative, continuous delivery tool for Kubernetes that leverages Git repositories as the source of truth. Its key benefits for the team included:

  • Source of Truth: Using Git to define the desired state of applications.
  • Automated Synchronization: Continuously comparing the desired state in Git with the live state in the cluster and synchronizing them.
  • Rollback Capabilities: The ability to easily roll back to previous versions.
  • Multicluster Support: Crucial for managing numerous products and clusters, enabling rapid environment recovery.

🚀 The Modern Deployment Flow: Argo CD in Action 🤖

The current deployment flow at the company is a testament to their transformation:

  1. Event Trigger: A backend developer creates a pull request.
  2. Argo CD Components Engage:
    • API Server: Exposes the API used by the web UI, the CLI, and CI/CD systems, and receives Git webhook events that signal new changes.
    • Repo Server: Clones the repository, renders templates, and prepares the deployment.
    • Application Controller: Operates in a continuous loop, comparing the live state in the cluster with the desired state defined in Git.
  3. Automatic Synchronization: If any drift is detected, Argo CD automatically synchronizes the cluster to match the Git repository.

This streamlined process has drastically reduced recovery time. What once took one week for disaster recovery can now be accomplished in less than five minutes!


🏭 The Power of Application Sets and Rollouts: Scaling with Precision 🎯

To manage their vast number of applications and clusters, the team leverages two key Argo CD features:

  • Application: The basic unit in Argo CD. It links a source in Git to a destination cluster and namespace, so each Application maps one-to-one to a set of resources deployed in a cluster.
  • ApplicationSets: These function as a “factory of applications”. Using generators (like list or cluster generators), they can create a large number of Applications from a single template, enabling rapid deployment of new products and quick recovery.
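
To make the “factory of applications” idea concrete, here is a minimal Python simulation of how a list generator could expand one template into several Applications. It only mimics the behaviour; it is not the ApplicationSet controller. The field layout loosely follows the ApplicationSet structure (generators, list elements, a template with `{{param}}` placeholders), while the product name `loyalty-api`, the cluster names, and all URLs are invented for illustration.

```python
def render(template, params):
    """Recursively replace {{key}} placeholders in every string value."""
    if isinstance(template, dict):
        return {k: render(v, params) for k, v in template.items()}
    if isinstance(template, list):
        return [render(v, params) for v in template]
    if isinstance(template, str):
        for key, value in params.items():
            template = template.replace("{{" + key + "}}", value)
    return template


def expand(application_set):
    """Produce one Application per element of each list generator."""
    apps = []
    for generator in application_set["generators"]:
        for element in generator["list"]["elements"]:
            apps.append(render(application_set["template"], element))
    return apps


# Hypothetical ApplicationSet-like definition: one template, three clusters.
application_set = {
    "generators": [
        {"list": {"elements": [
            {"cluster": "dev", "url": "https://dev.example.internal"},
            {"cluster": "staging", "url": "https://staging.example.internal"},
            {"cluster": "prod", "url": "https://prod.example.internal"},
        ]}}
    ],
    "template": {
        "metadata": {"name": "loyalty-api-{{cluster}}"},
        "spec": {
            "source": {
                "repoURL": "https://git.example.internal/loyalty-api.git",
                "path": "deploy/{{cluster}}",
            },
            "destination": {"server": "{{url}}", "namespace": "loyalty"},
        },
    },
}

applications = expand(application_set)
for app in applications:
    print(app["metadata"]["name"], "->", app["spec"]["destination"]["server"])
```

Adding a fourth cluster means adding one element to the list, which is why this pattern helps both with launching new products and with rebuilding an environment quickly.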

The final piece of the puzzle is Argo Rollouts. This controller revolutionizes deployments by moving beyond Kubernetes’ “all or nothing” approach:

  • Canary Deployments: Argo Rollouts enables phased rollouts using strategies like canary deployments.
  • Automated Testing: New pods are created and made reachable only through a canary header, allowing end-to-end tests and contract tests (using tools like Pact) to run without impacting end users.
  • Progressive Rollouts: If the tests pass, traffic is increased gradually (e.g., to 5%, then 30%), with performance checks on response time and failure rate at each step.
  • Automatic Rollback: If any validation fails at any stage, Argo Rollouts automatically executes a rollback, providing a safety net.
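
The canary flow described above can be sketched as a small Python simulation. This is not how Argo Rollouts is implemented or configured; the thresholds, the `Metrics` fields, and the functions `analysis_passes` and `run_canary` are assumptions chosen to show the control flow: header-only tests first, then stepwise traffic increases gated by response time and failure rate, with a rollback on the first failed check.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Metrics:
    p95_response_ms: float
    failure_rate: float  # fraction of failed requests, 0.0 to 1.0


# Hypothetical thresholds; real values would come from the team's own rules.
MAX_P95_MS = 300.0
MAX_FAILURE_RATE = 0.01


def analysis_passes(metrics: Metrics) -> bool:
    """A step is healthy only if both latency and failure rate are in bounds."""
    return (metrics.p95_response_ms <= MAX_P95_MS
            and metrics.failure_rate <= MAX_FAILURE_RATE)


def run_canary(
    weights: List[int],
    header_tests_pass: Callable[[], bool],
    measure: Callable[[int], Metrics],
) -> Tuple[str, List[str]]:
    """Walk through the canary steps and return (final_status, log)."""
    log = []
    # Step 0: canary pods receive no user traffic, only header-routed tests.
    if not header_tests_pass():
        log.append("header tests failed -> rollback")
        return "RolledBack", log
    log.append("header tests passed")
    # Next steps: shift real traffic gradually, validating at each weight.
    for weight in weights:
        if not analysis_passes(measure(weight)):
            log.append(f"{weight}% failed analysis -> rollback")
            return "RolledBack", log
        log.append(f"{weight}% healthy")
    log.append("promoted to 100%")
    return "Healthy", log


def healthy_metrics(weight: int) -> Metrics:
    return Metrics(p95_response_ms=120.0, failure_rate=0.001)


def degrading_metrics(weight: int) -> Metrics:
    # Simulates a release that only breaks under higher load.
    if weight < 30:
        return Metrics(p95_response_ms=120.0, failure_rate=0.001)
    return Metrics(p95_response_ms=850.0, failure_rate=0.04)


ok_status, ok_log = run_canary([5, 30], lambda: True, healthy_metrics)
bad_status, bad_log = run_canary([5, 30], lambda: True, degrading_metrics)
print(ok_status, ok_log)
print(bad_status, bad_log)
```

In the second run the release looks fine at 5% but is rolled back at 30%, before most users ever reach it. The pass/fail rules live in `analysis_passes`, which is the part QA would own in this model.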

For QA, this shift is monumental. Their role transforms from validation after disaster to defining the rules for a successful landing.


📹 A Glimpse into the Future: The Demo and Beyond 🎬

A recorded demo showcased the power of Argo Rollouts, illustrating the progression through different stages, automated analysis, and the creation of new pods. While sensitive information was blurred, the visual representation highlighted the speed and efficiency of the process. The demo concluded with the successful completion of the rollout, turning the indicator green.

The journey from manual, anxiety-ridden deployments to an automated, GitOps-driven system with Argo CD and Argo Rollouts has profoundly changed their lives and the lives of their users. The next frontier is to achieve fully automated processes without any human intervention.


This incredible transformation showcases how embracing modern tools and methodologies can turn even the most daunting challenges into opportunities for innovation and success. The team’s dedication to continuous improvement and their willingness to adopt new technologies have paved the way for a more reliable, agile, and ultimately, more peaceful deployment future. 🕊️
