From Chaos to Control: Morgan Stanley’s Epic GitOps Adventure with Flux! 🚀
Ever felt like wrangling Kubernetes infrastructure is like herding cats? You’re not alone! Morgan Stanley’s journey into the world of GitOps with Flux is a masterclass in transforming complex, regulated environments into streamlined, production-ready powerhouses. Simo Borasa and Tiffany Wang shared their incredible story, and let me tell you, it’s packed with valuable lessons for anyone looking to level up their DevOps game.
The Early Days: When Visibility Was a Foreign Concept 🌌
Picture this: Morgan Stanley, a titan in the financial world, grappling with a fragmented Kubernetes landscape. Platform engineers built the clusters, but application teams were off doing their own thing with code and manifests. This created a breeding ground for problems:
- Blind Spots Galore: No one truly knew what was running where. Workloads and platform resources were like ghosts in the machine.
- The Dreaded Data Loss: Cluster rebuilds meant manual redeployments. Without a GitOps agent to enforce the desired state, configuration drift was a constant threat, and valuable data could easily vanish.
- Kubernetes Newbies: Many developers were just starting their Kubernetes adventure, making these challenges even more pronounced.
Enter Flux! It promised continuous reconciliation and drift detection, the holy grail for managing declarative state. But integrating it into a highly regulated enterprise? That’s where the real adventure began.
Navigating the Regulatory Maze 🛡️
The biggest hurdle? Authentication and authorization. Early Flux v2 lacked mTLS support, forcing Morgan Stanley to get creative. They leveraged Flux’s multi-tenancy capabilities and service account impersonation, which upheld the principle of least privilege across namespaces and significantly bolstered their security posture.
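To make the impersonation idea concrete, here is a minimal sketch in Python that builds a tenant-scoped Flux Kustomization as a plain dictionary. The `spec.serviceAccountName` field is what asks Flux to apply manifests as that service account instead of with the controller's own privileges. The tenant name, namespace, source name, path, and interval below are made-up placeholders for illustration, not Morgan Stanley's actual configuration.

```python
import json


def tenant_kustomization(tenant: str, namespace: str, source_name: str, path: str) -> dict:
    """Build a Flux Kustomization that reconciles as a namespaced service account.

    All argument values are hypothetical; only the field names follow the
    Flux Kustomization API (kustomize.toolkit.fluxcd.io/v1).
    """
    return {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        "metadata": {"name": tenant, "namespace": namespace},
        "spec": {
            "interval": "10m",
            "path": path,
            "prune": True,
            # Flux impersonates this service account when applying manifests,
            # so the tenant can only touch what its RBAC bindings allow.
            "serviceAccountName": f"{tenant}-reconciler",
            # Keep the applied resources inside the tenant's own namespace.
            "targetNamespace": namespace,
            "sourceRef": {"kind": "GitRepository", "name": source_name},
        },
    }


manifest = tenant_kustomization("team-a", "team-a-apps", "team-a-repo", "./deploy/prod")
print(json.dumps(manifest, indent=2))
```

In Flux's documented multi-tenancy lockdown, this is usually paired with controller flags such as `--no-cross-namespace-refs` and `--default-service-account`, so that a tenant cannot skip impersonation by omitting the field. Whether Morgan Stanley used exactly those flags is not stated in the talk summary.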
Building a Self-Service GitOps Dream Machine 🛠️
To empower their application teams, Morgan Stanley didn’t just hand over Flux; they built a self-service tooling ecosystem. This was a game-changer:
- CMDB Platform Power: Custom tooling ensured privilege checks, namespace creation, and seamless updates to their configuration management database (CMDB).
- GitOps Platform Prowess: For GitOps workflows, namespaces were provisioned through a push service at CI time. This service acted as a gatekeeper, verifying entitlements and enforcing change control for higher environments before pushing anything to the source store.
- Developer’s Best Friend: A standardized repository structure and a sample application provided a clear roadmap for application teams, drastically simplifying their onboarding.
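The gatekeeper role of the push service can be sketched roughly as follows. This is an illustrative Python model only: the `PushRequest` type, the `check_push` function, the environment names, and the ticket format are all hypothetical, since the talk summary does not describe the real service's interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical set of "higher" environments that require change control.
CONTROLLED_ENVIRONMENTS = {"qa", "prod"}


@dataclass(frozen=True)
class PushRequest:
    user: str
    namespace: str
    environment: str
    change_ticket: Optional[str] = None


def check_push(request: PushRequest, entitlements: dict[str, set[str]]) -> tuple[bool, str]:
    """Return (allowed, reason) for a CI-time push to the source store."""
    # 1. Entitlement check: the user must be entitled to the target namespace.
    if request.namespace not in entitlements.get(request.user, set()):
        return False, "user is not entitled to this namespace"
    # 2. Change control: higher environments need an approved change ticket.
    if request.environment in CONTROLLED_ENVIRONMENTS and not request.change_ticket:
        return False, "change ticket required for this environment"
    return True, "ok"


entitlements = {"alice": {"payments-dev", "payments-prod"}}
print(check_push(PushRequest("alice", "payments-dev", "dev"), entitlements))
print(check_push(PushRequest("alice", "payments-prod", "prod"), entitlements))
print(check_push(PushRequest("alice", "payments-prod", "prod", "CHG-1234"), entitlements))
print(check_push(PushRequest("bob", "payments-dev", "dev"), entitlements))
```

Putting both checks in front of the source store means nothing reaches Flux unless it has already passed entitlement and change-control rules, which keeps the cluster-side configuration simple.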
This meticulous approach allowed them to introduce Flux custom resources, Kubernetes concepts, and declarative Helm usage to their developers. The result? Flux was successfully onboarded onto over 50 clusters, all while maintaining compliance. Impressive!
Scaling to Production: Listening, Learning, and Evolving ✨
Hitting production was a huge win, but the journey didn’t stop there. Morgan Stanley doubled down on user feedback and continuous improvement:
- Effortless Onboarding: GitOps provisioning became fully self-service, removing an extra step for end-users and making life much easier.
- The Supercharged Source Store: To meet high availability and disaster recovery (HA/DR) requirements, they replaced their self-hosted Git provider with an S3 bucket as their source store.
- Granular Control and Shared Clusters: They refined their architecture to support multiple applications within a single namespace. Cluster owners gained the ability to manage resources across tenants, especially for critical resource quotas on shared clusters.
- Observability Nirvana: A central solution featuring Grafana and enhanced OSS Flux Grafana dashboards became their single pane of glass, offering crystal-clear visibility into cluster status, Flux health, and application reconciliation.
The Era of Scale: Performance Prowess and Happy Developers 👨‍💻
As Flux became the backbone of their platform, Morgan Stanley scaled massively. We’re talking over 500 clusters, 2,000+ nodes, and 100,000+ containers! This gargantuan scale necessitated a laser focus on:
- Optimizing Everything: They dove deep into Flux performance, scrutinizing metrics like control plane CPU utilization, reconciliation intervals, and concurrency. Key tweaks to flux max workq and controller concurrency yielded significant performance improvements, freeing up valuable headroom and boosting overall cluster performance.
- Developer Delight with Notifications: Understanding the developer’s need for timely feedback, they integrated notifications for reconciliation and deployment success or failure. By leveraging event metadata from object annotations, they could provide specific, granular notifications tied to the exact version being deployed.
- Tailored Insights: While the OSS dashboards were great, they augmented Flux metrics with kube-state-metrics custom resource state configuration to provide even more meaningful and actionable insights for their application developers.
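The annotation-driven notifications mentioned above can be illustrated with a small Python sketch. To my understanding, Flux copies object annotations that start with `event.toolkit.fluxcd.io/` into the metadata of the events it emits; treat that prefix as something to verify against your Flux version. The annotation keys, values, and message format here are hypothetical.

```python
# Prefix Flux uses for annotations that are copied into event metadata
# (an assumption based on the Flux docs; verify for your Flux version).
EVENT_PREFIX = "event.toolkit.fluxcd.io/"


def event_metadata(annotations: dict) -> dict:
    """Keep only event annotations and strip the prefix from their keys."""
    return {
        key[len(EVENT_PREFIX):]: value
        for key, value in annotations.items()
        if key.startswith(EVENT_PREFIX)
    }


def format_notification(kind: str, name: str, succeeded: bool, annotations: dict) -> str:
    """Render a one-line message tying a reconciliation result to a version."""
    meta = event_metadata(annotations)
    status = "succeeded" if succeeded else "failed"
    details = ", ".join(f"{k}={v}" for k, v in sorted(meta.items()))
    return f"{kind}/{name} reconciliation {status}" + (f" ({details})" if details else "")


annotations = {
    "event.toolkit.fluxcd.io/appVersion": "1.4.2",     # hypothetical key and value
    "event.toolkit.fluxcd.io/deploymentID": "abc123",  # hypothetical key and value
    "unrelated.example.com/owner": "team-a",           # ignored: wrong prefix
}
message = format_notification("Kustomization", "payments", True, annotations)
print(message)
```

Because the version travels with the object itself, the notification identifies exactly which release succeeded or failed without a separate lookup.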
The Future is Bright: Flux Sharding, OCI, and Beyond! 🌟
Morgan Stanley isn’t resting on their laurels. They’re actively exploring exciting new frontiers:
- Flux Sharding: To manage increasing scale for specific users, they’re looking at enabling Flux configuration per cluster.
- Exploring OCI: Evaluating alternatives like OCI for their source store could further enhance their GitOps journey.
- Continuous Refinement: Ongoing engagement with users ensures continuous fine-tuning of Flux configurations.
- Elevating Developer Experience: Features like the Flux Operator, ResourceSets, and granular dependency management are on the roadmap to simplify Flux management even further.
- Progressive Delivery with Flagger: They’re considering enabling Flagger for progressive delivery, empowering their developer communities with safer, more controlled rollouts.
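On the sharding item: Flux's documented approach is to run additional controller instances that each reconcile only objects carrying a matching `sharding.fluxcd.io/key` label. The summary does not say how Morgan Stanley plans to assign workloads to shards, so the sketch below simply assumes a stable hash of the tenant namespace. The function names and the `shardN` naming are hypothetical.

```python
import hashlib

# Label that Flux shard controllers are typically configured to select on.
SHARD_LABEL = "sharding.fluxcd.io/key"


def shard_for(namespace: str, shard_count: int) -> str:
    """Map a namespace to a shard name using a stable (non-randomized) hash."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % shard_count
    return f"shard{index + 1}"


def with_shard_label(manifest: dict, shard_count: int) -> dict:
    """Return a copy of a Flux object with the shard label derived from its namespace."""
    metadata = dict(manifest.get("metadata", {}))
    labels = dict(metadata.get("labels", {}))
    labels[SHARD_LABEL] = shard_for(metadata["namespace"], shard_count)
    metadata["labels"] = labels
    return {**manifest, "metadata": metadata}


original = {"kind": "Kustomization", "metadata": {"name": "payments", "namespace": "team-a-apps"}}
labelled = with_shard_label(original, shard_count=3)
print(labelled["metadata"]["labels"])
```

A hash-based assignment keeps a tenant on the same shard across runs, but changing the shard count reshuffles tenants; an explicit per-tenant mapping would avoid that at the cost of more bookkeeping.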
Morgan Stanley’s story is a powerful testament that achieving production-ready GitOps is as much a cultural shift as it is a technical one. It’s about continuous engagement, iterative improvement, and an unwavering commitment to a stellar developer experience. If you’re on your own GitOps journey, take inspiration from their triumphs – your production-ready future awaits!