The semaphore/queue model for Tabby is as follows: - Any load requests go through the semaphore by default - Any load request can include the skip_queue parameter to bypass the semaphore - Any unload requests are immediately executed - All completion requests are placed inside the semaphore by default This model preserves the parallelism of single-user mode with extra convenience methods for queues in multi-user. It also helps mitigate problems that were previously present in the concurrency stack. Also change how the program's loop runs so it exits when the API thread dies. Signed-off-by: kingbri <bdashore3@proton.me> |
||
|---|---|---|
| .. | ||
| chat_completion.py | ||
| common.py | ||
| completion.py | ||
| lora.py | ||
| model.py | ||
| sampler_overrides.py | ||
| template.py | ||
| token.py | ||