tabbyAPI-ollama/OAI/types
kingbri b0c295dd2f API: Add more methods to semaphore
The semaphore/queue model for Tabby is as follows:
- Any load requests go through the semaphore by default
- Any load request can include the skip_queue parameter to bypass
the semaphore
- Any unload requests are immediately executed
- All completion requests are placed inside the semaphore by default

This model preserves the parallelism of single-user mode while adding
convenience methods for queueing in multi-user mode. It also helps
mitigate problems that were previously present in the concurrency stack.
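
The rules above can be sketched roughly as follows. This is only an
illustration built on asyncio.Semaphore, not the code from this commit:
RequestQueue, run, make_job, and demo are hypothetical names, and only
skip_queue comes from the description above.

```python
import asyncio


class RequestQueue:
    """Hypothetical wrapper around asyncio.Semaphore (not Tabby's real class)."""

    def __init__(self, limit: int = 1):
        self._semaphore = asyncio.Semaphore(limit)

    async def run(self, job, skip_queue: bool = False):
        """Await job() inside the semaphore unless skip_queue is set."""
        if skip_queue:
            return await job()
        async with self._semaphore:
            return await job()


async def demo() -> list:
    queue = RequestQueue(limit=1)
    events = []

    def make_job(name: str, delay: float):
        # Stand-in for a real load/unload/completion call.
        async def job():
            events.append(f"start {name}")
            await asyncio.sleep(delay)
            events.append(f"end {name}")

        return job

    await asyncio.gather(
        # Completions go through the semaphore by default.
        queue.run(make_job("completion", 0.05)),
        # A plain load also waits its turn behind the completion.
        queue.run(make_job("load", 0.01)),
        # A load with skip_queue bypasses the semaphore.
        queue.run(make_job("priority load", 0.01), skip_queue=True),
        # Unloads never touch the semaphore and run immediately.
        make_job("unload", 0.0)(),
    )
    return events


if __name__ == "__main__":
    for event in asyncio.run(demo()):
        print(event)
```

In this sketch the queued load only starts once the completion has
released the semaphore, while the skip_queue load and the unload start
right away.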

Also change how the program's loop runs so it exits when the API thread
dies.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-04 23:21:40 -05:00
chat_completion.py    API: Add logprobs for chat completions            2024-02-08 21:26:53 -05:00
common.py             Model: Add logprobs support                       2024-02-08 21:26:53 -05:00
completion.py         Model: Add logprobs support                       2024-02-08 21:26:53 -05:00
lora.py               API: Add more methods to semaphore                2024-03-04 23:21:40 -05:00
model.py              API: Add more methods to semaphore                2024-03-04 23:21:40 -05:00
sampler_overrides.py  API: Add sampler override switching               2024-01-25 00:15:40 -05:00
template.py           API: Add template switching and unload endpoints  2024-01-25 00:15:40 -05:00
token.py              Tree: Refactor code organization                  2024-01-25 00:15:40 -05:00