Clowder Architecture

Clowder is a sophisticated AI inference orchestration system designed from the ground up to deploy, operate, and manage AI models at scale. It is an open-source framework released under the Apache 2.0 license, ensuring that it is freely available for use and modification.

Clowder consists of several components that work together to provide a complete solution for AI model management. We recommend reading the Clowder Core Concepts to understand the terminology and concepts used in this document.

  • Worker Node - a physical or virtual machine that runs the AI models.
  • Storage Node - a storage system, local to the worker nodes, that stores the AI models and their metadata.
  • Controller - software that manages the worker and storage nodes and provides an API for administering them.
  • Load Balancer - LLM-aware software that intelligently routes inference requests to the worker nodes based on their load, capabilities, availability, and cache state (see the sketch after this list).
  • Runtime - the software engine that loads the AI models into memory and executes them on the worker nodes.

And lastly:

  • Kubernetes - Clowder is designed to run on Kubernetes, a container orchestration platform that provides a scalable and reliable environment for deploying and managing applications.
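To make the Load Balancer's routing decision concrete, below is a minimal sketch of one way such a selection could work. Everything here is illustrative: the WorkerState fields, the select_worker function, and the cache-first policy are assumptions for this example, not Clowder's actual internals.

```python
from dataclasses import dataclass

@dataclass
class WorkerState:
    """Illustrative snapshot of a worker node; all field names are assumptions."""
    address: str
    load: float                              # 0.0 (idle) through 1.0 (saturated)
    available: bool                          # node passes health checks
    capabilities: frozenset = frozenset()    # e.g. {"gpu", "fp16"}
    cached_aliases: frozenset = frozenset()  # AImages already on the local storage node

def select_worker(workers, alias, required=frozenset()):
    """Pick a worker for an inference request on the given AImage alias.

    Hypothetical policy: only healthy nodes with the required capabilities
    are eligible; among those, a node that already has the AImage cached
    beats any cache miss, and ties are broken by the lowest load.
    """
    eligible = [w for w in workers if w.available and required <= w.capabilities]
    if not eligible:
        raise RuntimeError(f"no eligible worker for {alias!r}")
    return max(eligible, key=lambda w: (alias in w.cached_aliases, 1.0 - w.load))
```

Preferring cache hits reflects the cache-state criterion in the list above: routing to a node that already holds the AImage avoids a blocking download (step 5 of the request flow below).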

Inference Request Flow

The inference request flow is as follows:

  1. Administrator defines an AImage via the control plane API, providing both a source URL and a representative name, or alias, e.g. "model1" (steps 1 and 2 are sketched in code after this list).
  2. User makes inference request to the API gateway, providing the alias and the input data.
  3. API gateway forwards the request to the control plane, which selects the worker node to handle it, using the load-balancing criteria described above.
  4. API gateway asks the storage manager on the selected node to verify that the AImage is available.
  5. If the model and its components are not available, the controller instructs the storage manager to download the AImage.
  6. Storage manager responds that the AImage is available.
  7. Control plane responds to API gateway with the address of the worker process on the selected node.
  8. API gateway forwards request to worker process, which performs inference.
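As a rough illustration of steps 1 and 2 from the outside, here is a sketch using Python's requests library. The host names, ports, endpoint paths, and payload fields are all assumptions invented for this example, not Clowder's documented API.

```python
import requests

CONTROL_PLANE = "http://clowder-control-plane:8080"  # assumed address
API_GATEWAY = "http://clowder-gateway:8000"          # assumed address

# Step 1: the administrator registers an AImage under an alias.
# The /v1/aimages path and the payload shape are invented for illustration.
requests.post(f"{CONTROL_PLANE}/v1/aimages", json={
    "alias": "model1",
    "source_url": "https://models.example.com/model1.aimage",
}).raise_for_status()

# Step 2: a user sends an inference request to the API gateway by alias.
# Steps 3-7 (worker selection, AImage availability check, download if
# needed) happen behind the gateway before the worker runs inference.
response = requests.post(f"{API_GATEWAY}/v1/inference", json={
    "alias": "model1",
    "input": "Hello, Clowder!",
})
response.raise_for_status()
print(response.json())
```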

[Sequence diagram: inference request flow]

[Flow diagram: inference request flow]