tabbyAPI-ollama

kingbri 871c89063d	Model: Add Tensor Parallel support	2024-08-22 14:15:19 -04:00

Use the tensor parallel loader when the flag is enabled. The new loader has its own autosplit implementation, so gpu_split_auto isn't valid here. Also make it easier to determine which cache type to use rather than multiple if/else statements.

Signed-off-by: kingbri <bdashore3@proton.me>

types	Model: Add Tensor Parallel support	2024-08-22 14:15:19 -04:00
utils	Embeddings: Add model management	2024-07-30 15:19:27 -04:00
router.py	API: Fix return of current embeddings model	2024-08-01 13:43:31 -04:00