tabbyAPI-ollama/endpoints/core
kingbri 871c89063d Model: Add Tensor Parallel support
Use the tensor parallel loader when the flag is enabled. The new loader
has its own autosplit implementation, so gpu_split_auto isn't valid
here.

Also simplify cache type selection by replacing the multiple
if/else statements with a single lookup.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
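The two changes described in the commit message can be sketched as follows. This is a minimal illustration, not tabbyAPI's actual code: the class names (`CacheFP16`, `CacheQ4`, ...) and helper functions are hypothetical, and only the ideas — a lookup table instead of an if/else chain, and ignoring `gpu_split_auto` on the tensor-parallel path — come from the commit message.

```python
# Hypothetical cache classes standing in for the real cache implementations.
class CacheFP16: ...
class CacheQ8: ...
class CacheQ4: ...

# A single lookup table replaces a chain of if/else statements.
CACHE_CLASSES = {
    "FP16": CacheFP16,
    "Q8": CacheQ8,
    "Q4": CacheQ4,
}

def resolve_cache_class(cache_mode: str):
    """Return the cache class for a mode string, defaulting to FP16."""
    return CACHE_CLASSES.get(cache_mode.upper(), CacheFP16)

def pick_loader(tensor_parallel: bool, gpu_split_auto: bool):
    """Choose a loader path. The tensor parallel loader has its own
    autosplit implementation, so gpu_split_auto is not valid there."""
    if tensor_parallel:
        return ("tensor_parallel_loader", None)  # autosplit handled internally
    return ("standard_loader", gpu_split_auto)
```

The dict lookup also gives a single obvious place to register new cache types later.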
types Model: Add Tensor Parallel support 2024-08-22 14:15:19 -04:00
utils Embeddings: Add model management 2024-07-30 15:19:27 -04:00
router.py API: Fix return of current embeddings model 2024-08-01 13:43:31 -04:00