tabbyAPI-ollama/common
kingbri b0c295dd2f API: Add more methods to semaphore
The semaphore/queue model for Tabby is as follows:
- Any load requests go through the semaphore by default
- Any load request can include the skip_queue parameter to bypass
the semaphore
- Any unload requests are immediately executed
- All completion requests are placed inside the semaphore by default

This model preserves the parallelism of single-user mode while adding
convenience methods for queues in multi-user mode. It also helps
mitigate problems that were previously present in the concurrency stack.
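
The queue model described above can be illustrated with a minimal asyncio sketch. This is not Tabby's actual code: the names `run_job`, `fake_work`, and `unload_model` are hypothetical, and only the `skip_queue` parameter comes from the commit message. A semaphore with a single slot is assumed purely for illustration.

```python
import asyncio


async def run_job(semaphore, name, work, skip_queue=False):
    """Run `work` under the shared semaphore unless skip_queue is set."""
    if skip_queue:
        # Bypass the queue entirely (hypothetical handling of skip_queue).
        return await work(name)
    async with semaphore:
        return await work(name)


async def main():
    events = []  # records the order in which work starts and finishes
    semaphore = asyncio.Semaphore(1)  # assumed: one queued job at a time

    async def fake_work(name):
        # Stand-in for a model load or a completion request.
        events.append(f"start {name}")
        await asyncio.sleep(0.01)
        events.append(f"end {name}")

    async def unload_model():
        # Unload requests never wait on the semaphore.
        events.append("unload")

    await asyncio.gather(
        run_job(semaphore, "completion-1", fake_work),
        run_job(semaphore, "completion-2", fake_work),
        run_job(semaphore, "load", fake_work, skip_queue=True),
        unload_model(),
    )
    return events


if __name__ == "__main__":
    for event in asyncio.run(main()):
        print(event)
```

In this sketch the second completion waits for the first to release the semaphore, while the load with `skip_queue=True` and the unload both start without waiting.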

Also change how the program's loop runs so it exits when the API thread
dies.
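
One way a main loop can exit when the API thread dies is to poll the thread instead of sleeping forever. The sketch below is an assumption about the general approach, not the code from this commit; `serve_api` and `run_until_api_exits` are hypothetical names.

```python
import threading
import time


def serve_api(lifetime):
    """Hypothetical stand-in for the API server thread."""
    time.sleep(lifetime)  # pretend to serve requests, then die


def run_until_api_exits(api_thread, poll_interval=0.01):
    """Keep the main thread alive only while the API thread is alive."""
    while api_thread.is_alive():
        time.sleep(poll_interval)


def main():
    api_thread = threading.Thread(target=serve_api, args=(0.05,), daemon=True)
    api_thread.start()
    run_until_api_exits(api_thread)
    # Reaching this point means the API thread has stopped.
    return api_thread.is_alive()


if __name__ == "__main__":
    print("API thread still alive:", main())
```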

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-04 23:21:40 -05:00
args.py         Config: Add experimental torch cuda malloc backend     2024-02-14 21:45:56 -05:00
auth.py         Auth: Create keys on different exception               2024-02-04 01:56:42 -05:00
config.py       Launch: Make exllamav2 requirement more friendly       2024-02-02 23:36:17 -05:00
gen_logging.py  Tree: Refactor code organization                       2024-01-25 00:15:40 -05:00
generators.py   API: Fix issues with concurrent requests and queueing  2024-03-04 23:21:40 -05:00
logger.py       Tree: Refactor code organization                       2024-01-25 00:15:40 -05:00
sampling.py     Model: Add EBNF grammar support                        2024-02-24 23:40:11 -05:00
templating.py   API: Add template switching and unload endpoints       2024-01-25 00:15:40 -05:00
utils.py        API: Add more methods to semaphore                     2024-03-04 23:21:40 -05:00