Supercharging Argo Workflows: The Power of Artifact Driver Plugins
In the world of cloud-native orchestration, data is king. Whether you are passing machine learning models between steps or saving logs for compliance, how you handle artifacts defines the efficiency of your pipelines. At a recent talk, Alan Clucas, a maintainer of Argo Workflows from PipeKit, unveiled a game-changing feature: Artifact Driver Plugins.
If you have ever felt limited by built-in storage options or struggled with complex scripts to move data, this post is for you. Let’s dive into how these plugins make Argo Workflows more flexible than ever!
What Exactly is an Artifact?
In Argo Workflows, artifacts are the files or directories produced by your jobs. Think of them as the “bridge” between steps.
- Output Artifacts: Files generated by a pod (like a processed dataset) that you want to save.
- Input Artifacts: Files acquired from an external system or a previous step to be used in the current task.
Typically, a step might upload a file to S3, and a subsequent step downloads it. While Argo has built-in support for S3, Git (read-only), and various blob storages, the community needed more. That is where the Artifact Driver Plugin system, introduced in version 4.0.0, comes into play.
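To make that pattern concrete, here is a minimal sketch of the built-in flow that plugins generalize: one step saves an output artifact (to the configured S3 artifact repository, in this example) and a later step consumes it as an input. The image, key, and file names are illustrative.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: generate
            template: generate
        - - name: consume
            template: consume
            arguments:
              artifacts:
                - name: dataset
                  from: "{{steps.generate.outputs.artifacts.dataset}}"
    - name: generate
      container:
        image: alpine:3.20
        command: [sh, -c, "echo hello > /tmp/dataset.txt"]
      outputs:
        artifacts:
          - name: dataset
            path: /tmp/dataset.txt
            s3:
              key: datasets/dataset.txt  # bucket/endpoint come from the configured artifact repository
    - name: consume
      container:
        image: alpine:3.20
        command: [sh, -c, "cat /tmp/dataset.txt"]
      inputs:
        artifacts:
          - name: dataset
            path: /tmp/dataset.txt
```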
Why Choose Plugins Over Scripts?
Before version 4.0.0, developers often wrote custom scripts within their workflows to handle unique storage backends. While functional, this approach lacks the “magic” of native integration. By using a plugin, you gain:
- Garbage Collection (GC): Automatically delete artifacts via a TTL (Time To Live) when a workflow is deleted (see the sketch after this list).
- UI Visibility: View and download your custom artifacts directly from the Argo UI.
- Automatic Compression: Built-in support for archiving (tar) and compression (gz).
- Decoupled Releases: You own the image and the release cycle. You don’t need to fork Argo or wait for a new CRD change to support your private organization’s backend.
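As a concrete example of the first bullet, the GC behavior maps onto existing workflow-level fields; a minimal sketch assuming the standard `ttlStrategy` and `artifactGC` options:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  ttlStrategy:
    secondsAfterCompletion: 3600   # delete the workflow an hour after it finishes...
  artifactGC:
    strategy: OnWorkflowDeletion   # ...and have the driver's delete method remove its artifacts
  templates:
    - name: main
      container:
        image: alpine:3.20
        command: [sh, -c, "date > /tmp/out.txt"]
      outputs:
        artifacts:
          - name: out
            path: /tmp/out.txt
```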
How to Build Your Own Driver
Creating a plugin is surprisingly straightforward. You essentially need to build a gRPC server that implements the Artifact Service contract.
The Technical Requirements
- Language Agnostic: Use any language with gRPC support (Python, Go, Java, etc.). This is powerful because you can use a storage system’s native SDK without worrying about Go compatibility.
- The Contract: Your server must adhere to the `artifact.proto` file. Key methods include:
  - `save`: Upload a file to your repository.
  - `load`: Download a file from your repository.
  - `listObjects` & `isDirectory`: Help Argo navigate your storage.
  - `delete`: Enable garbage collection.
  - `openStream`: Provide a network stream of the data (optional but recommended for efficiency).
Packaging and Security
You must package your gRPC server as a Docker image.
- User Permissions: The image must run as a non-root user.
- Storage: The `/tmp` directory must be writable by the default user.
- Minimalism: Keep your image small to ensure fast pod startup times.
Under the Hood: The Orchestration Flow
How does Argo actually use your plugin? It uses a combination of init containers and sidecars.
For Inputs:
Argo injects your plugin as an init container. If you use 15 different plugins for 15 different artifacts, Argo will run 15 init containers in sequence. These containers download the files into a shared `emptyDir` volume so your main application can access them.
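Conceptually, the pod that results looks something like the sketch below; the container names, image, and mount paths are illustrative, not the controller's actual generated spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-step   # illustrative; in reality the controller generates this pod
spec:
  initContainers:
    - name: artifact-plugin-load                   # your plugin, injected by Argo
      image: ghcr.io/example/my-custom-driver:v1   # hypothetical image
      volumeMounts:
        - name: input-artifacts
          mountPath: /argo/inputs
  containers:
    - name: main
      image: alpine:3.20
      command: [sh, -c, "cat /argo/inputs/dataset.txt"]
      volumeMounts:
        - name: input-artifacts
          mountPath: /argo/inputs
  volumes:
    - name: input-artifacts
      emptyDir: {}   # shared scratch space between the init container and main
```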
For Outputs:
Your plugin runs as a legacy sidecar alongside your workload. When your job finishes, the wait container calls the `save` method on your plugin sidecar to upload the data.
For the UI:
To see artifacts in the Argo UI, you must also run your plugin as a sidecar to the Argo Server. This allows the server to call `openStream` and pipe the file directly to your browser.
Registration and Configuration
To get your plugin running, you need to register it in two places:
- Workflow Controller ConfigMap: Define the plugin name and the Docker image URI.
- Argo Server: Manually add the plugin as a sidecar. Note: If you register a plugin in the ConfigMap but forget the sidecar on the Argo Server, the server will enter a crash loop to alert you of the configuration error.
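The Argo Server half of that setup is an ordinary Kubernetes sidecar. A hedged sketch, using a hypothetical plugin image and a strategic-merge patch against the argo-server Deployment:

```yaml
# server-sidecar-patch.yaml -- apply with:
#   kubectl patch deployment argo-server -n argo --patch-file server-sidecar-patch.yaml
spec:
  template:
    spec:
      containers:
        - name: my-custom-driver                       # should match the name registered in the ConfigMap
          image: ghcr.io/example/my-custom-driver:v1   # hypothetical image
          securityContext:
            runAsNonRoot: true                         # plugin images must not run as root
```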
In your workflow YAML, you simply replace `s3` or `git` with the `plugin` keyword:

```yaml
artifacts:
  - name: my-custom-artifact
    plugin:
      name: my-custom-driver
      config: "bucket=my-private-store;region=us-west" # Passed as a raw string to your driver
```
Tradeoffs and Challenges
While powerful, there are a few things to keep in mind:
- Configuration: Unlike built-in drivers, plugins receive configuration as a raw string. There is no structured validation via Argo’s CRD, so your driver must handle the parsing (Alan suggests using YAML within that string; see the sketch after this list).
- Performance: Using multiple init containers can add overhead to pod startup times.
- Manual Setup: You currently have to manually configure the Argo Server sidecars, though this may be automated in future Helm chart updates.
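One way to follow that first suggestion is to embed YAML in the opaque config string (via a block scalar) and parse it inside your driver; a sketch reusing the hypothetical driver from earlier:

```yaml
artifacts:
  - name: my-custom-artifact
    plugin:
      name: my-custom-driver
      config: |   # still a raw string to Argo; your driver parses it as YAML
        bucket: my-private-store
        region: us-west
```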
The Future of Artifacts
Alan shared exciting prospects for the ecosystem:
- Community Index: A shared repository where developers can discover and contribute to existing drivers.
- Image Volumes: Once Kubernetes Image Volumes become stable across all supported versions, Argo may move away from init containers to mount artifact binaries directly, further optimizing performance.
Q&A Segment
No questions were asked during the session. Alan wrapped up by encouraging everyone to visit the PipeKit booth (494) at KubeCon to discuss workflows or share what they are building.
Final Thoughts
Artifact Driver Plugins represent a massive leap in extensibility for Argo Workflows. By moving storage logic out of the core controller and into pluggable containers, the community can now support any storage backend imaginable without waiting for upstream PRs.
Are you ready to build your first driver? Check out the Argo Workflows documentation and start streamlining your data flow today!