tabbyAPI-ollama

kingbri 9f649647f0 Model + API: GPU split updates and fixes		2025-02-15 21:50:14 -05:00
For the TP loader, GPU split cannot be an empty array. However, defaulting the parameter to an empty array makes it easier to calculate the device list. Therefore, cast an empty array to None using falsy comparisons at load time. Also add draft_gpu_split to the load request.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
grammar.py	Grammar: Cache the engine vocabulary	2024-12-05 21:36:37 -08:00
model.py	Model + API: GPU split updates and fixes	2025-02-15 21:50:14 -05:00
utils.py	Dependencies: Update torch, exllamav2, and flash-attn	2025-02-09 01:27:48 -05:00
version.py	Dependencies: Update torch, exllamav2, and flash-attn	2025-02-09 01:27:48 -05:00
vision.py	Dependencies: Fix OpenAPI generation	2024-11-22 17:59:20 -05:00
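
The latest commit message describes casting an empty GPU split to None with a falsy comparison at load time. A minimal sketch of that idea follows. It is not the code from model.py: the function names normalize_gpu_split and used_devices are hypothetical, and treating a share greater than zero as "device in use" is an assumption made only for illustration.

```python
from typing import List, Optional


def normalize_gpu_split(gpu_split: Optional[List[float]] = None) -> Optional[List[float]]:
    """Return None for an empty or missing split, otherwise the split unchanged.

    Hypothetical helper: an empty list is falsy in Python, so `or None`
    converts [] to None while leaving a populated list untouched.
    """
    return gpu_split or None


def used_devices(gpu_split: List[float]) -> List[int]:
    """Indices of devices with a non-zero share (assumed rule, for illustration).

    Defaulting to an empty list keeps this simple: no None check is needed.
    """
    return [index for index, share in enumerate(gpu_split) if share > 0]


default_split: List[float] = []        # parameter default: no manual split
manual_split = [20.0, 0.0, 24.0]       # example values, not from the repository

print(used_devices(default_split))         # device list from the empty default
print(used_devices(manual_split))          # device list from a manual split
print(normalize_gpu_split(default_split))  # value handed to a loader that rejects []
print(normalize_gpu_split(manual_split))
```

Keeping the empty list as the default and converting only at load time means the device-list calculation never has to handle None, while a loader that rejects an empty array still receives None.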