Sailing Argo CD Through the Storm: GitOps at the Edge 🌊⚓️
Welcome, fellow tech enthusiasts! It’s fantastic to be here, surrounded by so much passion and innovation. For many of us, this might be our first time at a conference of this scale, and it’s truly exhilarating to step out and connect! 🚀 I’m Carson Culler, a DevOps engineer at Liatrio, and today, we’re diving deep into a topic that might make some of us a little nervous but is incredibly rewarding: making Argo CD work its magic in challenging environments. Think high latency, intermittent connectivity, and even completely disconnected scenarios – the “edge” of our technological world.
The Cloud’s Cozy Assumptions ☁️
We all know and love Argo CD, right? It’s a cornerstone of our GitOps workflows. But have you ever stopped to think why it works so beautifully in our typical cloud deployments? It all boils down to two fundamental assumptions that are almost always true in the cloud:
- Git is Reachable: Argo CD can reliably connect to your Git repositories.
- Images are Pullable: Container images can be effortlessly downloaded.
These assumptions are safe because the cloud provides us with inherent guarantees of availability and connectivity. Our workloads can talk to the internet, and they can do it quickly.
When the Foundation Crumbles: The Edge Challenges 🚢
But what happens when we take Kubernetes and Argo CD away from the cloud’s comforting embrace and deploy them at the edge? This is where things get really interesting, and frankly, a bit daunting.
Imagine deploying a Kubernetes cluster on a ship in the middle of the ocean. 🌊 Internet connectivity? It’s a satellite relay, a journey through space, prone to the whims of weather and orbital mechanics. One bad storm, one dropped signal, and your connection is gone.
This physical, geographical reality directly impacts our workloads’ ability to communicate and do so promptly. So, what happens when Argo CD finds itself in such an environment, and the cluster goes offline?
The Triple Threat of Disconnection 💥
When Argo CD is cut off from the network, we face a trifecta of critical problems:
- No Git Sync: If Argo can’t talk to Git, it has no desired state to compare against. This means it can’t detect or reconcile any drift in our cluster, leaving our applications potentially out of sync with our intended configuration.
- Broken Image Pulls: This is arguably the biggest hurdle. If we go offline right after a sync, but before all necessary container images have been pulled, our cluster could be left in a severely broken state.
- Failed Rollbacks: If the cluster is unhealthy and offline, we can’t fix issues going forward. And since Argo CD doesn’t cache old revisions and their images endlessly, the odds of performing an offline rollback become extremely slim, leaving us stranded.
In essence, when the cluster is offline, Argo can’t update, its updates might break things, and if they do, we might not be able to recover. 😱
Rethinking the Workflow: Shifting the Failure Modes 🔄
The good news? This isn’t an insurmountable challenge. The key lies in understanding how Argo CD makes changes and then strategically reordering those steps to mitigate risks.
Argo CD’s deployment process typically involves three main stages:
- Refresh: Pulling the latest updates from Git.
- Sync: Deploying those updates to the cluster.
- Image Pulls: Downloading any necessary container images.
The critical insight is that failures after changes are deployed are unacceptable. Failures before deployment, however, can be managed. So, what if we shifted the risky parts – the image pulls and Git checks – to happen before Argo CD even attempts to deploy changes?
By ensuring that manifests and their required images are readily available before the cluster needs them, we can dramatically reduce the risk of a broken deployment.
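To make "pull before you sync" concrete, here is a minimal sketch of staging images ahead of deployment. The manifest directory and the use of `ctr` (containerd's CLI, which ships with K3s) are illustrative assumptions, not the actual Liatrio tooling.

```shell
#!/bin/sh
# Sketch: stage every image referenced by the static manifests BEFORE
# any sync is triggered, so a mid-sync outage can't strand the cluster.
set -eu

MANIFEST_DIR="${1:-./manifests}"   # hypothetical location of rendered manifests

# Collect the unique "image:" references from the manifests.
images=$(grep -rhoE 'image: *[^ ]+' "$MANIFEST_DIR" | awk '{print $2}' | sort -u)

for img in $images; do
  echo "pre-pulling $img"
  ctr images pull "$img"   # docker/podman pull would work the same way
done
```

If any pull fails here, nothing has been deployed yet, so the cluster stays in its last known-good state.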
The Liatrio Solution: Building Resilient Edge Workflows 🛠️💡
At Liatrio, we’ve developed an elegant solution to bring Argo CD’s benefits to these challenging edge environments. It’s not about rewriting Argo CD, but about creating a robust ecosystem around it.
Tackling Git Availability: Local Mirrors 📡
The first problem is Git connectivity. Instead of relying on a distant cloud-hosted Git provider, we bring Git closer to Argo CD.
- `git-server` (git-annex): We run a `git-server` instance right alongside Argo CD.
- Pull-Through Mirror: This `git-server` is configured as a pull-through mirror for our upstream GitHub repositories.
- Local Clones: It periodically attempts to pull changes. If the cluster is offline, these pulls simply fail gracefully.
- Argo CD Configuration: Argo CD is then pointed at these local Git repository clones.
This ensures that as long as the Argo CD pods are running, it has guaranteed connectivity to its Git source of truth.
Securing Image Pulls: Bundling and Verification 📦
The second and third problems – image pulls and rollback capabilities – are tackled through a combination of tools and a clever pre-sync hook.
- Image Bundling: During our release process, we identify all container images associated with our static Kubernetes manifests. These images are then pulled, bundled together, and stored in a container registry.
- Local Registry: On the edge machine, we maintain a local container registry.
- `git-server` Integration: A scheduled job on the edge machine queries our local `git-server` for new revisions.
- Bundle Transfer: If new revisions are found, the corresponding image bundle is pulled onto the local registry. This step requires an internet connection, but it happens before deployment.
- Argo CD Pre-Sync Hook: This is the crucial piece! An Argo CD pre-sync hook verifies two things before any deployment occurs:
  - The correct Git revision is present in our local `git-server`.
  - The corresponding image bundle is available in the local registry.
By enforcing these checks, we ensure that all necessary components are in place before Argo CD attempts to sync changes to the cluster. This allows for both successful syncs and reliable rollbacks, even in air-gapped scenarios.
Ensuring Cluster State Persistence 💾
Finally, to ensure that our Kubernetes cluster (we’re using K3S in this scenario) can gracefully handle restarts and updates without losing critical data, we configure its internal data to be stored directly on the file system.
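For reference, pinning K3s state to persistent on-disk storage can be expressed with its `--data-dir` option; the path shown is just K3s's conventional default, and this is a sketch rather than the exact configuration used.

```shell
# Sketch: keep K3s's embedded datastore and state on a persistent path so
# it survives restarts and upgrades. The path is an illustrative choice.
k3s server --data-dir /var/lib/rancher/k3s

# Equivalently, in /etc/rancher/k3s/config.yaml:
#   data-dir: /var/lib/rancher/k3s
```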
The Takeaway: Innovation at the Edge ✨
Building resilient systems at the edge is undeniably challenging. It forces us to confront a new set of problems that the cloud environment often shields us from. But it’s precisely in these demanding situations that we discover the true capabilities of our tools and, more importantly, our own ingenuity.
The moment we shift from saying, “This won’t work here,” to asking, “What would it take to make this work here?” is when real innovation blossoms. This is how we take powerful cloud-native tools like Argo CD and make them sail smoothly, even in the middle of the ocean. This is how we push the boundaries of GitOps and literally take it to the edge.
Thank you for joining me on this journey. If you’re passionate about tackling complex technological challenges and transforming how digital value is delivered, consider reaching out to Liatrio. We’re always excited to collaborate and find solutions together. Let’s keep building the future, wherever it may be! 🌐👨💻