Clowder

Distributed Ainekko Inference Engine

Distributed Inference Runtime

Distributed inference across multiple runtime engines, running multiple models, on Kubernetes.

LLM-aware load balancing

Load balancing that accounts for which LLMs are running on each node, their current load, and their context history, so each inference request is routed to the best-suited node.
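As a rough illustration of what LLM-aware routing can look like, here is a minimal Python sketch. It is not Clowder's implementation or API: the `Node` fields, the `score` and `pick_node` functions, and the 0.5 cache bonus are all hypothetical, chosen only to show how model placement, load, and context history might be combined into one routing decision.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical view of one inference node (not a Clowder type)."""
    name: str
    models: set[str]              # models currently loaded on the node
    active_requests: int          # requests in flight
    capacity: int                 # max concurrent requests
    cached_sessions: set[str] = field(default_factory=set)  # sessions with warm context


def score(node: Node, model: str, session_id: str) -> Optional[float]:
    """Return a routing score, or None if the node cannot serve the request."""
    if model not in node.models or node.active_requests >= node.capacity:
        return None
    free_fraction = 1.0 - node.active_requests / node.capacity
    # Assumed weighting: prefer nodes that already hold this session's context.
    cache_bonus = 0.5 if session_id in node.cached_sessions else 0.0
    return free_fraction + cache_bonus


def pick_node(nodes: list[Node], model: str, session_id: str) -> Optional[Node]:
    """Pick the highest-scoring eligible node, or None if there is none."""
    best, best_score = None, None
    for node in nodes:
        s = score(node, model, session_id)
        if s is not None and (best_score is None or s > best_score):
            best, best_score = node, s
    return best


nodes = [
    Node("a", {"llama"}, active_requests=2, capacity=10),
    Node("b", {"llama"}, active_requests=5, capacity=10, cached_sessions={"s1"}),
    Node("c", {"mistral"}, active_requests=0, capacity=10),
]
chosen = pick_node(nodes, "llama", "s1")
print(chosen.name if chosen else "no node available")
```

In this sketch a busier node can still win when it already holds the session's context, which is the trade-off a context-aware balancer has to weigh against raw load.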

Orchestration of multiple models and formats

Manage which models are available, control replica counts, and scale and distribute models automatically or manually. Full support for multiple model formats.