# Quick Start
Welcome to the Clowder quickstart guide.
## Requirements
What do you need to run Clowder?
Clowder runs on top of Kubernetes, so you will need a Kubernetes cluster. You can use any Kubernetes provider, whether a self-managed distribution (such as k3s, kind, or minikube) or a managed Kubernetes service (such as Amazon EKS, Google GKE, or Azure AKS).
Because Clowder runs inference on AI models, you will also want hardware capable of running your model: a CPU, a GPU, or a dedicated inference accelerator.
If you want to run on an 0xide sled, you will need to deploy a Kubernetes cluster to it. The good folks at Ainekko have created a single-binary Kubernetes installer and controller for 0xide sleds.
## Deployment
Once you have your Kubernetes cluster up and running, and credentials to access it, deploy Clowder to the cluster.
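Before deploying, it is worth confirming that your credentials work. A quick sanity check with standard `kubectl` commands:

```sh
# Verify that kubectl can reach the cluster with your credentials.
kubectl cluster-info

# Confirm the nodes are up and Ready.
kubectl get nodes
```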
You can deploy Clowder using either `helm` or `kubectl`. The recommended way is to use `helm`, but you can also use `kubectl` directly.
### With helm
The easiest way to install Clowder is with the Helm package manager:
```sh
helm install clowder oci://docker.io/aifoundryorg/clowder
```
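You can then confirm the release is installed and wait for its pods to come up. A minimal check, assuming Clowder was installed into your current namespace (pod names will vary):

```sh
# The "clowder" release should show a deployed status.
helm list

# Watch the Clowder pods until they are Running.
kubectl get pods --watch
```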
### With kubectl
Alternatively, you can use the Kubernetes CLI directly. For this to work, clone this repository first:
```sh
git clone https://github.com/clowder-dev/clowder.git
cd clowder
```
Then deploy all the parts with a single command:
```sh
kubectl apply -f k8s/
```
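As with the Helm install, wait for the pods to become ready before moving on. The exact resource names depend on the manifests in k8s/:

```sh
# List everything that was just created and check that the pods reach Running.
kubectl get pods,svc,deploy
```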
## Local Access
The Clowder API is exposed using a single Kubernetes `Service`. If you want it exposed to your local machine, use `kubectl port-forward`:
```sh
# Start this and leave it running in a separate terminal window.
kubectl port-forward svc/nekko-lb-svc 3090:3090
```
## Use it!
For our examples, we will assume you have forwarded the Clowder API to your local machine on port `3090`, per the above instructions. If you are running it elsewhere, adjust the API URL accordingly.
Let's download a model and run inference on it. For access, we need the authentication token for the Clowder API. Unless configured otherwise, the default token is `nekko-admin-token`. Since we did a default quickstart installation, we can use this token to access the API.
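As a quick smoke test, you can call the worker listing endpoint (used again below) to confirm that both the port-forward and the token work. On a fresh installation the list should be empty:

```sh
# Expect an HTTP 200 and an empty workers list before any models are deployed.
curl -H "Authorization: Bearer nekko-admin-token" \
  http://localhost:3090/api/v1/workers/list
```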
First, download the model. We will use the SmolLM2-135M-Instruct model from Hugging Face.
Let's get a list of the physical nodes to decide which node to deploy our model and its runtime on:
```sh
kubectl get nodes

NAME                                 STATUS   ROLES                       AGE   VERSION
ainekko-control-plane-0-10ahb6ro     Ready    control-plane,etcd,master   85m   v1.32.4+k3s1
ainekko-worker-1747986577-r7qql3ie   Ready    <none>                      85m   v1.32.4+k3s1
```
Note that the worker node is called `ainekko-worker-1747986577-r7qql3ie`. We will use this node to deploy our model.
In the future, you will be able to:
- use the Clowder API to get a list of nodes
- select a node based on its labels
- select a node based on its capabilities, such as GPU or CPU
- tell Clowder to automatically select a node for you
For now, we will use the node name directly.
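If you would rather not copy the long name by hand, you can capture it into a shell variable first; a small sketch, assuming the worker node's name contains `worker`:

```sh
# Grab the first node whose name contains "worker"; adjust the pattern for your cluster.
NODE_NAME=$(kubectl get nodes --no-headers | awk '/worker/ {print $1; exit}')
echo "$NODE_NAME"
```

With the node chosen, ask Clowder to download the model and start a worker runtime on it: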
curl -H "Authorization: Bearer nekko-admin-token" \
-X POST \
--data '{"modelUrl": "hf:///unsloth//SmolLM2-135M-Instruct-GGUF/SmolLM2-135M-Instruct-Q4_K_M.gguf", "modelAlias": "smol", "nodeName": "ainekko-worker-1747986577-r7qql3ie", "credentials": "YOUR_HUGGING_FACE_TOKEN"}' \
-i \
http://localhost:3090/api/v1/workers
This downloads the model and starts a worker runtime pod. Let's check:
curl -H "Authorization: Bearer nekko-admin-token" http://localhost:3090/api/v1/workers/list
This gives the result:

```json
{"count":1,"status":"success","workers":[{"name":"","model_name":"smol","model_alias":"smol"}]}
```
We can now use `http://localhost:3090/v1/chat_completions` as we would any OpenAI API-compatible chat completions endpoint.
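For example, here is a minimal chat completion request. This is a sketch that assumes the endpoint accepts the standard OpenAI-style request body, with the `smol` alias from above as the model name:

```sh
curl -H "Authorization: Bearer nekko-admin-token" \
  -H "Content-Type: application/json" \
  -X POST \
  --data '{"model": "smol", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}' \
  http://localhost:3090/v1/chat_completions
```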
## Web UI
By default, the Open WebUI client app is deployed on the cluster.
Expose it locally with:
```sh
# Start this and leave it running in a separate terminal window.
kubectl port-forward svc/open-webui 4080:4080
```
Now we can open the UI at `http://localhost:4080`, select the model, and have a chat.