🚀 Supercharging Argo Workflows: The Power of Artifact Driver Plugins

In the world of cloud-native orchestration, data is king. Whether you are passing machine learning models between steps or saving logs for compliance, how you handle artifacts defines the efficiency of your pipelines. At a recent talk, Alan Clucas, a maintainer of Argo Workflows from PipeKit, unveiled a game-changing feature: Artifact Driver Plugins.

If you have ever felt limited by built-in storage options or struggled with complex scripts to move data, this post is for you. Let’s dive into how these plugins make Argo Workflows more flexible than ever! 🌍


📦 What Exactly is an Artifact?

In Argo Workflows, artifacts are the files or directories produced by your jobs. Think of them as the “bridge” between steps. 🌉

  • Output Artifacts: Files generated by a pod (like a processed dataset) that you want to save.
  • Input Artifacts: Files acquired from an external system or a previous step to be used in the current task.

Typically, a step might upload a file to S3, and a subsequent step downloads it. While Argo has built-in support for S3, Git (read-only), and various blob storages, the community needed more. That is where the Artifact Driver Plugin system, introduced in version 4.0.0, comes into play. 🛠️


✨ Why Choose Plugins Over Scripts?

Before version 4.0.0, developers often wrote custom scripts within their workflows to handle unique storage backends. While functional, this approach lacks the “magic” of native integration. By using a plugin, you gain:

  1. Garbage Collection (GC): Automatically delete artifacts via a TTL (Time To Live) when a workflow is deleted. 🧹
  2. UI Visibility: View and download your custom artifacts directly from the Argo UI.
  3. Automatic Compression: Built-in support for archiving (tar) and compression (gz).
  4. Decoupled Releases: You own the image and the release cycle. You don’t need to fork Argo or wait for a new CRD change to support your private organization’s backend. 🔓

๐Ÿ—๏ธ How to Build Your Own Driver

Creating a plugin is surprisingly straightforward. You essentially need to build a gRPC server that implements the Artifact Service contract.

The Technical Requirements 💾

  • Language Agnostic: Use any language with gRPC support (Python, Go, Java, etc.). This is powerful because you can use a storage system’s native SDK without worrying about Go compatibility.
  • The Contract: Your server must adhere to the artifact.proto file. Key methods include:
    • save: Upload a file to your repository.
    • load: Download a file from your repository.
    • listObjects & isDirectory: Help Argo navigate your storage.
    • delete: Enable Garbage Collection.
    • openStream: Provide a network stream of the data (optional but recommended for efficiency).
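Putting the methods above together, the service contract might look roughly like this. This is a hypothetical sketch loosely based on the method names listed above; the message types, RPC signatures, and naming here are placeholders, not the real artifact.proto, which you should consult directly.

```protobuf
// Hypothetical sketch of the artifact driver contract.
// The real definitions live in Argo's artifact.proto.
syntax = "proto3";

service ArtifactService {
  rpc Save(SaveRequest) returns (SaveResponse);                   // upload to your repository
  rpc Load(LoadRequest) returns (LoadResponse);                   // download from your repository
  rpc ListObjects(ListObjectsRequest) returns (ListObjectsResponse); // help Argo navigate storage
  rpc IsDirectory(IsDirectoryRequest) returns (IsDirectoryResponse);
  rpc Delete(DeleteRequest) returns (DeleteResponse);             // enables Garbage Collection
  rpc OpenStream(OpenStreamRequest) returns (stream StreamChunk); // optional streaming for the UI
}
```

Because the contract is plain gRPC, code generation for your language of choice gives you a server skeleton to fill in.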

Packaging and Security 🛡️

You must package your gRPC server as a Docker image.

  • User Permissions: The image must run as a non-root user.
  • Storage: The /tmp directory must be writable by the default user.
  • Minimalism: Keep your image small to ensure fast pod startup times.
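A minimal Dockerfile satisfying these three requirements could look like the sketch below. The base image, user ID, and binary name are illustrative assumptions, not requirements from the talk.

```dockerfile
# Hypothetical packaging sketch for an artifact driver image.
FROM alpine:3.20

# Copy the pre-built gRPC driver binary into a minimal image
# to keep pod startup times fast.
COPY my-artifact-driver /usr/local/bin/my-artifact-driver

# Run as a non-root user and keep /tmp writable for that user.
RUN adduser -D -u 1000 driver && chmod 1777 /tmp
USER 1000

ENTRYPOINT ["my-artifact-driver"]
```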

โš™๏ธ Under the Hood: The Orchestration Flow

How does Argo actually use your plugin? It uses a combination of init containers and sidecars. 🦾

📥 For Inputs:

Argo injects your plugin as an init container. If you use 15 different plugins for 15 different artifacts, Argo will run 15 init containers in sequence. These containers download the files into a shared emptyDir volume so your main application can access them.

📤 For Outputs:

Your plugin runs as a legacy sidecar alongside your workload. When your job finishes, the wait container calls the save method on your plugin sidecar to upload the data.

๐Ÿ–ฅ๏ธ For the UI:

To see artifacts in the Argo UI, you must also run your plugin as a sidecar to the Argo Server. This allows the server to call openStream and pipe the file directly to your browser. 🔍


๐Ÿ› ๏ธ Registration and Configuration

To get your plugin running, you need to register it in two places:

  1. Workflow Controller ConfigMap: Define the plugin name and the Docker image URI.
  2. Argo Server: Manually add the plugin as a sidecar. ⚠️ Note: If you register a plugin in the ConfigMap but forget the sidecar on the Argo Server, the server will enter a crash loop to alert you of the configuration error.
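The two registration points might look roughly like this. This is a hedged sketch only: the exact ConfigMap key names and sidecar wiring are assumptions, so check the Argo Workflows documentation for the real schema.

```yaml
# Hypothetical sketch -- key names are assumptions, not the real schema.
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  artifactDrivers: |
    - name: my-custom-driver
      image: registry.example.com/my-artifact-driver:v1
---
# And on the argo-server Deployment, the same image added manually as a
# sidecar so the UI can stream artifacts:
# spec.template.spec.containers:
#   - name: my-custom-driver
#     image: registry.example.com/my-artifact-driver:v1
```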

In your workflow YAML, you simply replace s3 or git with the keyword plugin:

artifacts:
  - name: my-custom-artifact
    plugin:
      name: my-custom-driver
      config: "bucket=my-private-store;region=us-west" # Passed as a raw string to your driver

โš–๏ธ Tradeoffs and Challenges

While powerful, there are a few things to keep in mind:

  • Configuration: Unlike built-in drivers, plugins receive configuration as a raw string. There is no structured validation via Argo’s CRD, so your driver must handle the parsing (Alan suggests using YAML within that string). 📝
  • Performance: Using multiple init containers can add overhead to pod startup times.
  • Manual Setup: You currently have to manually configure the Argo Server sidecars, though this may be automated in future Helm chart updates.
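Because the driver receives that configuration as a raw string, parsing (and validating) it is the driver’s job. Here is a minimal sketch assuming the semicolon-delimited key=value format from the example above; `parse_config` is a hypothetical helper, and a real driver following Alan’s suggestion would instead feed the string to a YAML parser.

```python
def parse_config(raw: str) -> dict:
    """Parse a raw 'key=value;key=value' plugin config string into a dict.

    Matches the example config string shown earlier; the talk suggests
    embedding YAML in the string instead, which a real driver would
    hand to a YAML library.
    """
    config = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing/duplicate semicolons
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed config entry: {pair!r}")
        config[key.strip()] = value.strip()
    return config


print(parse_config("bucket=my-private-store;region=us-west"))
# {'bucket': 'my-private-store', 'region': 'us-west'}
```

Failing loudly on malformed entries matters here, since the CRD gives you no validation before the string reaches your driver.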

🔮 The Future of Artifacts

Alan shared exciting prospects for the ecosystem:

  • Community Index: A shared repository where developers can discover and contribute to existing drivers. 🤝
  • Image Volumes: Once Kubernetes Image Volumes become stable across all supported versions, Argo may move away from init containers to mount artifact binaries directly, further optimizing performance.

โ“ Q&A Segment

No questions were asked during the session, so Alan wrapped up by encouraging everyone to visit the PipeKit booth (494) at KubeCon to discuss workflows or share what they are building.


🚀 Final Thoughts

Artifact Driver Plugins represent a massive leap in extensibility for Argo Workflows. By moving storage logic out of the core controller and into pluggable containers, the community can now support any storage backend imaginable without waiting for upstream PRs.

Are you ready to build your first driver? Check out the Argo Workflows documentation and start streamlining your data flow today! 👾✨
