Use the tensor parallel loader when the flag is enabled. The new loader has its own autosplit implementation, so `gpu_split_auto` isn't valid here. Also, replace the chained if/else statements for picking a cache type with a simpler lookup.

Signed-off-by: kingbri <bdashore3@proton.me>
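The cache-type change described above can be sketched as a dict lookup replacing an if/elif chain. This is a minimal illustration only; the class and mode names below are placeholders, not the project's actual API.

```python
# Placeholder cache classes, standing in for the real cache implementations.
class CacheFP16: ...
class CacheQ4: ...
class CacheQ8: ...

# Hypothetical mode-to-class mapping; one lookup instead of several branches.
CACHE_CLASSES = {
    "FP16": CacheFP16,
    "Q4": CacheQ4,
    "Q8": CacheQ8,
}

def select_cache(mode: str):
    # Fall back to the FP16 cache when the mode string is unrecognized.
    return CACHE_CLASSES.get(mode.upper(), CacheFP16)
```

Adding a new cache type then only requires a new dictionary entry rather than another if/else branch.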
Changed paths:
- core
- Kobold
- OAI
- server.py
- utils.py