Commit graph

7 commits

Author SHA1 Message Date
kingbri
871c89063d Model: Add Tensor Parallel support
Use the tensor parallel loader when the flag is enabled. The new loader
has its own autosplit implementation, so gpu_split_auto isn't valid
here.

Also make it easier to determine which cache type to use rather than
multiple if/else statements.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
kingbri
3e42211c3e Config: Embeddings: Make embeddings_device a default when API loading
When loading from the API, the fallback for embeddings_device will be
the same as the config.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 13:59:49 -04:00
kingbri
54aeebaec1 API: Fix return of current embeddings model
Return a ModelCard instead of a ModelList.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 13:43:31 -04:00
kingbri
dc3dcc9c0d Embeddings: Update config, args, and parameter names
Use embeddings_device as the parameter for device to remove ambiguity.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 15:32:26 -04:00
kingbri
bfa011e0ce Embeddings: Add model management
Embedding models are managed on a separate backend, but are run
in parallel with the model itself. Therefore, manage this in a separate
container with separate routes.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 15:19:27 -04:00
kingbri
71de3060bb Downloader: Make timeout configurable
Add an API parameter to set the timeout in seconds. Keep it to None
by default for uninterrupted downloads.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 21:42:38 -04:00
kingbri
9ad69e8ab6 API: Migrate universal routes to core
Place OAI specific routes in the appropriate folder. This is in
preperation for adding new API servers that can be optionally enabled.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 14:08:48 -04:00