tabbyAPI-ollama/endpoints
kingbri 871c89063d Model: Add Tensor Parallel support
Use the tensor parallel loader when the flag is enabled. The new loader
has its own autosplit implementation, so gpu_split_auto isn't valid
here.

Also make it easier to determine which cache type to use, rather than
relying on multiple if/else statements.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
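A minimal sketch of the two changes the commit describes, using hypothetical names (the classes, loader labels, and `pick_cache_class`/`load_model` helpers below are illustrative, not tabbyAPI's actual API): model loading branches on a tensor-parallel flag whose loader does its own autosplit, and cache-type selection becomes a single dict lookup instead of chained if/else statements.

```python
# Hypothetical cache classes standing in for the real cache implementations.
class CacheFP16: ...
class CacheFP8: ...
class CacheQ4: ...

# Map cache-mode names to classes so selection is one lookup, not an if/else chain.
CACHE_CLASSES = {
    "FP16": CacheFP16,
    "FP8": CacheFP8,
    "Q4": CacheQ4,
}

def pick_cache_class(cache_mode: str):
    """Return the cache class for a mode string, defaulting to FP16."""
    return CACHE_CLASSES.get(cache_mode, CacheFP16)

def load_model(tensor_parallel: bool, gpu_split_auto: bool) -> str:
    """Choose a loader path (labels are illustrative).

    The tensor-parallel loader has its own autosplit implementation,
    so gpu_split_auto is not valid in that branch and is ignored.
    """
    if tensor_parallel:
        return "tp_loader"
    return "autosplit_loader" if gpu_split_auto else "manual_split_loader"
```

The dict lookup also gives one obvious place to register a new cache type, instead of extending an if/else ladder in each call site.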
core Model: Add Tensor Parallel support 2024-08-22 14:15:19 -04:00
Kobold Kobold: Fix max length type 2024-07-26 23:00:26 -04:00
OAI Templates: Switch to async jinja engine 2024-08-17 12:03:41 -04:00
server.py API: Add setup function to routers 2024-07-26 22:24:33 -04:00
utils.py Main: Make openapi export store locally 2024-07-08 14:54:06 -04:00