Clowder Fine-Tuning

In general, a base model may be fine-tuned in one of two ways:

By changing the base model’s weights. This produces a new, standalone model with no external layers.
By adding EP layers. This keeps the original base model intact and supplies additional layers that modify the model’s operation when it is loaded into memory to run.

This is a little like making a cherry cake. You can take a regular cake and put some cherries on top, or you can bake a cake with cherries in it. In both cases, you have a cake with cherries. However, one is clearly a cake layer plus a cherry layer, while the other is a brand-new cake.

The end result is the same, but the way to get there is different.

From Clowder's perspective, if you have fine-tuned a model and redeployed it as a brand-new model, it is just another model. Clowder does not know or care that it started as a base model that was later fine-tuned, and it treats it like any other model.

The second option, where the base model is unchanged and EP layers are added, is more interesting. It allows for multiple fine-tunings of the same base model, and it allows for the base model to be reused in multiple different ways.

EP layers are modifiers that are “layered” on top of a base model to modify, or fine-tune, its behavior at runtime.

There are two main ways to use EP layers:

When loading the model into memory, modify the model weights with the information in the EP layers. In memory, the result looks as if the original model itself had been modified.
When loading the model into memory, load both the base model and the EP layers. On each inference, the affected model layers perform additional steps after applying the base model’s weights.

Scenario 1: Changed base model

Fine-tuning modifies the actual weights of the base model, leading to a new model.

Scenario 2: Load-time modify weights

Fine-tuning generates an EP layer. The base model is unchanged, and the base model and EP layer are stored separately. When the model is loaded into memory for inference, the information in the EP layer is used to modify the base model’s weights in memory.
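As a rough illustration of a load-time merge, the sketch below assumes the EP layer stores a low-rank additive update (two small matrices B and A whose product is added to the base weights), in the style of LoRA adapters. That format, and the names `matmul`, `merge_ep_layer`, `ep_b`, and `ep_a`, are assumptions made for this example; nothing here describes how Clowder or any particular EP format actually works.

```python
# Hypothetical sketch: the EP layer format (low-rank factors B and A) is an
# assumption for illustration only.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def merge_ep_layer(base_weights, ep_b, ep_a):
    """Return new in-memory weights W + B @ A; the stored base is not changed."""
    delta = matmul(ep_b, ep_a)
    return [[w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base_weights, delta)]

# Base model weights (2x2) as stored; these are never modified.
base_weights = [[1.0, 0.0], [0.0, 1.0]]

# EP layer stored separately: B is 2x1 and A is 1x2 (rank 1).
ep_b = [[1.0], [2.0]]
ep_a = [[0.5, -1.0]]

# One-time cost at load: the in-memory weights now look fine-tuned.
loaded_weights = merge_ep_layer(base_weights, ep_b, ep_a)
print(loaded_weights)
```

After the merge, inference runs against `loaded_weights` exactly as it would against a model whose weights had been changed directly, which is why this scenario adds load-time work but no per-inference cost.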

Scenario 3: Inference-time modify operations

Fine-tuning generates an EP layer. The base model is unchanged, and the base model and EP layer are stored separately. When the model is loaded into memory for inference, the operations at the affected layers are modified: after the base weights are applied, the output is further adjusted using the EP layer information and then passed on to the next model layer.
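A minimal sketch of an inference-time modification follows. It again assumes, purely for illustration, that an EP layer is a pair of low-rank matrices (B, A) whose correction B(Ax) is added to the base output Wx. The function names and the tuple layout of `ep_layer` are hypothetical and are not taken from Clowder.

```python
# Hypothetical sketch: the EP layer is assumed to be a (B, A) pair of
# low-rank matrices; real EP formats may differ.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def forward(base_weights, x, ep_layer=None):
    """Apply the base weights, then add the EP correction if one is attached."""
    y = matvec(base_weights, x)
    if ep_layer is not None:
        ep_b, ep_a = ep_layer
        # Extra per-inference step: project down with A, back up with B.
        correction = matvec(ep_b, matvec(ep_a, x))
        y = [yi + ci for yi, ci in zip(y, correction)]
    return y

# One copy of the base weights in memory, never modified.
base_weights = [[1.0, 0.0], [0.0, 1.0]]

# Two separately stored EP layers that share the same base weights.
ep_layer_1 = ([[1.0], [2.0]], [[0.5, -1.0]])
ep_layer_2 = ([[0.0], [1.0]], [[1.0, 1.0]])

x = [4.0, 1.0]
base_out = forward(base_weights, x)
tuned_out_1 = forward(base_weights, x, ep_layer_1)
tuned_out_2 = forward(base_weights, x, ep_layer_2)
print(base_out, tuned_out_1, tuned_out_2)
```

Because `base_weights` is only read, the same in-memory copy can serve the plain base model and both fine-tunings, at the price of the extra `correction` step on every call.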

The net result of inference is the same in all three scenarios. The practical differences are as follows:

Feature	Changed Model	Load-Time Modify Weights	Inference-Time Modify Operations
Inference Results	Fine-tuned	Fine-tuned	Fine-tuned
Storage	Same as original model	Extra	Extra
Memory Usage	Same as original model	Same as original model	Extra
Processing Usage	Same as original model	Extra one-time cost on loading	Extra cost with each inference
Map to Original Model	No	Yes	Yes
Reusable Base Model Storage	No	Yes	Yes
Reusable Base Model GPU Memory	No	No	Possibly

The runtime results of inference in all three scenarios are the same, but the storage, modifiability, and reusability differ.
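To make the trade-off concrete, the sketch below checks that a load-time merge and an inference-time correction give the same output for one layer, then compares how many weight values would sit in memory when several fine-tunings of one base model are served. It assumes a LoRA-style low-rank EP layer (matrices B and A), and the dimensions used in the count (a 1024x1024 layer, rank 8, three fine-tunings) are made-up numbers. Whether a runtime actually shares base weights between fine-tunings depends on the runtime.

```python
import math

# Hypothetical sketch: EP layers are assumed to be low-rank (B, A) pairs.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def merged_weights(w, ep_b, ep_a):
    """Load-time path: fold B @ A into a copy of the base weights."""
    rank = len(ep_a)
    return [[w[i][j] + sum(ep_b[i][k] * ep_a[k][j] for k in range(rank))
             for j in range(len(w[0]))] for i in range(len(w))]

w = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
ep_b = [[1.0], [2.0], [0.0]]
ep_a = [[0.5, -1.0, 1.0]]
x = [4.0, 1.0, 2.0]

# Load-time modify weights: (W + B A) x
load_time_out = matvec(merged_weights(w, ep_b, ep_a), x)
# Inference-time modify operations: W x + B (A x)
inference_time_out = [y + c for y, c in
                      zip(matvec(w, x), matvec(ep_b, matvec(ep_a, x)))]
# Compare with a tolerance, since floating-point rounding can differ by path.
same = all(math.isclose(a, b)
           for a, b in zip(load_time_out, inference_time_out))

def values_in_memory(d, rank, n_tunings, share_base):
    """Count weight values held in memory for one d x d layer."""
    base = d * d
    ep = 2 * d * rank  # B is d x rank, A is rank x d
    if share_base:
        return base + n_tunings * ep  # one base copy plus small EP layers
    return n_tunings * base           # one full merged copy per fine-tuning

print(same)
print(values_in_memory(1024, 8, 3, share_base=False))
print(values_in_memory(1024, 8, 3, share_base=True))
```

With these made-up dimensions, three merged copies hold roughly three times as many values as one shared base plus three EP layers, which is the situation the "Reusable Base Model GPU Memory" row describes.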
