Clowder Architecture
Clowder is a sophisticated AI inference orchestration system designed from the ground up to deploy, operate, and manage AI models at scale. It is an open-source framework released under the Apache 2.0 license, ensuring that it is freely available for use and modification.
Clowder consists of several components that work together to provide a complete solution for AI model management. We recommend reading the Clowder Core Concepts to understand the terminology and concepts used in this document.
- Worker Node - a physical or virtual machine that runs the AI models.
- Storage Node - a storage system, local to the worker nodes, that stores the AI models and their metadata.
- Controller - software that manages the worker nodes and storage nodes and exposes an API for administering them.
- Load Balancer - LLM-aware software that intelligently routes inference requests to the worker nodes based on their load, capabilities, availability, and cache state.
- Runtime - software engine that loads the AI models into memory and executes them on the worker nodes.
And lastly:
- Kubernetes - Clowder is designed to run on Kubernetes, a container orchestration platform that provides a scalable and reliable environment for deploying and managing applications.
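To make the Load Balancer's role concrete, here is a minimal sketch of LLM-aware routing over the worker-node state described above. The class and field names (`WorkerNode`, `cached_models`, and so on) are illustrative assumptions, not Clowder's actual API; the scoring rule is one plausible policy, preferring nodes that already have the model cached, then the least-loaded node.

```python
from dataclasses import dataclass, field

@dataclass
class WorkerNode:
    """Per-worker state the load balancer could consult (hypothetical fields)."""
    name: str
    load: float                                       # 0.0 (idle) .. 1.0 (saturated)
    available: bool                                   # passes health checks
    capabilities: set = field(default_factory=set)    # e.g. accelerator classes
    cached_models: set = field(default_factory=set)   # AImages already on local storage

def route(workers, model, required_capability):
    """Pick a worker for an inference request on `model`.

    Illustrative policy: filter to healthy, capable nodes; a cache hit
    outranks load (it avoids a download); among equals, lower load wins.
    """
    candidates = [w for w in workers
                  if w.available and required_capability in w.capabilities]
    if not candidates:
        raise RuntimeError("no eligible worker")
    return min(candidates, key=lambda w: (model not in w.cached_models, w.load))

workers = [
    WorkerNode("w1", load=0.2, available=True, capabilities={"gpu"}),
    WorkerNode("w2", load=0.7, available=True, capabilities={"gpu"},
               cached_models={"model1"}),
    WorkerNode("w3", load=0.1, available=False, capabilities={"gpu"},
               cached_models={"model1"}),
]
print(route(workers, "model1", "gpu").name)  # w2: cached and healthy beats lower load
```

Note that w3 is skipped despite its cache hit and low load because it fails the availability check; a real balancer would also weigh the runtime's cache state (e.g. KV caches) and queue depth.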
Inference Request Flow
The inference request flow is as follows:
- Administrator defines an AImage via the control plane API, providing both a source URL and a representative name, or alias, e.g. "model1".
- User makes inference request to the API gateway, providing the alias and the input data.
- API gateway connects to the control plane, which determines which worker node should handle the request (see Load Balancer above).
- API gateway asks the storage manager on the selected node to verify that the AImage is available.
- If the model and its components are not available, the controller asks the storage manager to download the AImage.
- Storage manager responds that the AImage is available.
- Control plane responds to API gateway with the address of the worker process on the selected node.
- API gateway forwards request to worker process, which performs inference.
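The steps above can be walked through in code. This is a toy, in-process sketch of the flow, not Clowder's actual implementation: every class and method name here (`ControlPlane`, `ensure_available`, `define_aimage`, etc.) is a hypothetical stand-in for the corresponding component in the list.

```python
class StorageManager:
    def __init__(self):
        self.local = {}                       # alias -> downloaded AImage (path/bytes)

    def ensure_available(self, alias, source_url):
        if alias not in self.local:           # cache miss: download triggered
            self.local[alias] = f"downloaded from {source_url}"

class Worker:
    def __init__(self, storage):
        self.storage = storage

    def infer(self, alias, data):             # runtime loads and executes the model
        return f"{alias} output for {data!r}"

class ControlPlane:
    def __init__(self):
        self.aimages = {}                     # alias -> source URL (admin-defined)
        self.workers = []

    def define_aimage(self, alias, url):      # step 1: administrator defines an AImage
        self.aimages[alias] = url

    def select_worker(self, alias):           # steps 3-6: pick node, ensure model present
        worker = self.workers[0]              # real selection is load/cache aware
        worker.storage.ensure_available(alias, self.aimages[alias])
        return worker

class ApiGateway:
    def __init__(self, control_plane):
        self.cp = control_plane

    def handle(self, alias, data):            # steps 2 and 7-8: resolve, then forward
        worker = self.cp.select_worker(alias)
        return worker.infer(alias, data)

cp = ControlPlane()
cp.workers.append(Worker(StorageManager()))
cp.define_aimage("model1", "https://example.com/model1.aimage")
gateway = ApiGateway(cp)
print(gateway.handle("model1", "hello"))      # model1 output for 'hello'
```

A second request for the same alias would find the AImage already in the storage manager's local cache and skip the download, which is why the load balancer prefers nodes with a warm cache.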
The same flow can also be expressed as a sequence diagram and as a flow diagram (diagrams not reproduced here).