tabbyAPI-ollama/OAI/types
kingbri 2f568ff573 Config: Expose auto GPU split reserve config
The GPU reserve is used as a VRAM buffer to prevent GPU overflow
when automatically deciding how to load a model on multiple GPUs.
Make this configurable.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-08 22:09:50 -05:00
chat_completion.py API: Add logprobs for chat completions 2024-02-08 21:26:53 -05:00
common.py Model: Add logprobs support 2024-02-08 21:26:53 -05:00
completion.py Model: Add logprobs support 2024-02-08 21:26:53 -05:00
lora.py Tree: Refactor code organization 2024-01-25 00:15:40 -05:00
model.py Config: Expose auto GPU split reserve config 2024-02-08 22:09:50 -05:00
sampler_overrides.py API: Add sampler override switching 2024-01-25 00:15:40 -05:00
template.py API: Add template switching and unload endpoints 2024-01-25 00:15:40 -05:00
token.py Tree: Refactor code organization 2024-01-25 00:15:40 -05:00
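
The latest commit message above describes the GPU reserve as a VRAM buffer applied when a model is automatically split across several GPUs. As a rough illustration only, the sketch below shows how a per-GPU reserve might reduce the memory budget handed to an auto-split routine. The function name `usable_vram_mb`, its parameters, and the use of megabytes are assumptions for this example and are not taken from the repository's code.

```python
from typing import List


def usable_vram_mb(total_mb: List[float], reserve_mb: List[float]) -> List[float]:
    """Return the VRAM budget per GPU after subtracting a reserve.

    Hypothetical helper: `total_mb` holds the total VRAM of each GPU and
    `reserve_mb` holds the amount to keep free on each GPU. GPUs without
    a matching reserve entry are assumed to need no reserve.
    """
    # Pad the reserve list with zeros so every GPU has an entry.
    padded = list(reserve_mb) + [0.0] * (len(total_mb) - len(reserve_mb))
    # Clamp at zero so an oversized reserve never yields a negative budget.
    return [max(total - reserve, 0.0) for total, reserve in zip(total_mb, padded)]


if __name__ == "__main__":
    # Two 24 GB GPUs, with an arbitrary example reserve on the first one only.
    print(usable_vram_mb([24576.0, 24576.0], [96.0]))
```

Making the reserve a list rather than a single number lets each GPU carry its own buffer, which can help when one device also drives a display or hosts other processes.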