Distributed Inference Runtime
Distributed inference across multiple runtime engines, serving multiple models, on Kubernetes.
LLM-aware load balancing
Load balancing that is aware of the LLMs running on each node, their current load, and their context history, routing each inference request to the optimal node.
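To make the idea concrete, here is a minimal sketch of what LLM-aware routing could look like. It assumes hypothetical node metadata (`LoadedModels`, `ActiveReqs`, `Capacity`, `WarmContexts`) rather than the runtime's actual API: nodes that already have the model loaded and the request's context warm in cache score higher, and heavily loaded nodes score lower.

```go
package main

import (
	"fmt"
	"math"
)

// Node describes one inference node as this sketch assumes it is reported:
// which models it has loaded, its current load, and which conversation
// contexts (by ID) are still warm in its cache. These fields are illustrative.
type Node struct {
	Name         string
	LoadedModels map[string]bool
	ActiveReqs   int
	Capacity     int
	WarmContexts map[string]bool
}

// pickNode scores each candidate node for a request and returns the best one.
func pickNode(nodes []Node, model, contextID string) (string, error) {
	best, bestScore := "", math.Inf(-1)
	for _, n := range nodes {
		if !n.LoadedModels[model] {
			continue // skip nodes that would first have to load the model
		}
		score := 1.0 - float64(n.ActiveReqs)/float64(n.Capacity) // prefer lightly loaded nodes
		if n.WarmContexts[contextID] {
			score += 0.5 // bonus for context/cache affinity
		}
		if score > bestScore {
			best, bestScore = n.Name, score
		}
	}
	if best == "" {
		return "", fmt.Errorf("no node has model %q loaded", model)
	}
	return best, nil
}

func main() {
	nodes := []Node{
		{Name: "node-a", LoadedModels: map[string]bool{"llama-3-8b": true}, ActiveReqs: 7, Capacity: 10, WarmContexts: map[string]bool{}},
		{Name: "node-b", LoadedModels: map[string]bool{"llama-3-8b": true}, ActiveReqs: 3, Capacity: 10, WarmContexts: map[string]bool{"chat-42": true}},
	}
	target, _ := pickNode(nodes, "llama-3-8b", "chat-42")
	fmt.Println("route request to:", target) // node-b: less loaded and has the context warm
}
```

In practice the score would combine more signals (queue depth, GPU memory headroom, prefix-cache hit rates), but the shape of the decision is the same: filter by model availability, then rank by load and context affinity.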
Orchestration of multiple models and formats
Manage which models are available, control replica counts, and scale and distribute models automatically or manually, with full support for multiple model formats.
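A small sketch of the orchestration side, again using hypothetical names (`ModelDeployment`, `desiredReplicas`) rather than the runtime's real objects: each served model carries a format plus replica bounds, and an autoscaling rule picks a replica count from observed load.

```go
package main

import "fmt"

// ModelDeployment is a hypothetical orchestration record: which model to
// serve, in what format, and how far its replicas may scale.
type ModelDeployment struct {
	Name        string
	Format      string // e.g. "safetensors", "gguf", "onnx"
	MinReplicas int
	MaxReplicas int
}

// desiredReplicas is a simple autoscaling rule: one replica per
// targetPerReplica in-flight requests, clamped to the deployment's bounds.
// Setting MinReplicas == MaxReplicas pins the model to a fixed (manual) count.
func desiredReplicas(d ModelDeployment, inflight, targetPerReplica int) int {
	want := (inflight + targetPerReplica - 1) / targetPerReplica // ceiling division
	if want < d.MinReplicas {
		return d.MinReplicas
	}
	if want > d.MaxReplicas {
		return d.MaxReplicas
	}
	return want
}

func main() {
	d := ModelDeployment{Name: "llama-3-8b", Format: "safetensors", MinReplicas: 1, MaxReplicas: 8}
	fmt.Println(desiredReplicas(d, 35, 10)) // 4 replicas for 35 in-flight requests
}
```

On Kubernetes this kind of record would typically live in a custom resource, with a controller reconciling the actual replica count toward the desired one; the snippet only illustrates the scaling decision itself.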