Cold boots

To make good use of resources, we only run the models that are actually being used. When a model hasn’t been used for a little while, we turn it off.

How quickly a request gets a response depends on whether the model is "warm" or "cold". A warm model is already running, so it responds quickly. A cold model has to start up first, so its first response is slower.

Machine learning models are resource-intensive and sometimes very large. Starting a cold model can require fetching and loading several gigabytes of code, which in some cases takes several minutes.
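Because a cold model can take minutes to respond, client code generally needs a generous timeout and a polling loop rather than a short fixed wait. The following is a minimal sketch in Python, not an official client. The `get_status` callable and the status strings "starting", "succeeded" and "failed" are hypothetical placeholders for whatever your own client exposes, and the ten-minute default timeout is an assumption chosen to cover a startup of several minutes.

```python
import time
from typing import Callable


def wait_for_result(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,       # assumed budget: covers a multi-minute cold boot
    initial_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it reports a final state or the timeout expires.

    get_status is a hypothetical callable returning a status string.
    "succeeded" and "failed" are treated as final; anything else
    (for example "starting" while the model is cold) means keep waiting.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status in ("succeeded", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"no final status after {timeout_s} seconds")
        # Never sleep past the deadline.
        sleep(min(delay, max(0.0, deadline - clock())))
        # Back off exponentially so a long cold boot is not polled aggressively.
        delay = min(delay * 2, max_delay_s)
```

A warm model returns on the first or second poll, so the backoff costs little in the fast case. For a cold model the delay grows to `max_delay_s`, which keeps the number of status checks small over a startup that lasts minutes.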
