tabbyAPI-ollama


kingbri b0c295dd2f	API: Add more methods to semaphore	2024-03-04 23:21:40 -05:00

The semaphore/queue model for Tabby is as follows:
- Any load requests go through the semaphore by default
- Any load request can include the skip_queue parameter to bypass the semaphore
- Any unload requests are immediately executed
- All completion requests are placed inside the semaphore by default

This model preserves the parallelism of single-user mode with extra convenience methods for queues in multi-user. It also helps mitigate problems that were previously present in the concurrency stack.

Also change how the program's loop runs so it exits when the API thread dies.

Signed-off-by: kingbri <bdashore3@proton.me>
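
The queue model described in the commit message above can be illustrated with a minimal sketch. This is not the code from utils.py or generators.py: the class `RequestQueue`, its methods, and the `work` job are hypothetical names chosen for illustration, and `asyncio.Semaphore` is assumed as the underlying primitive. Only the four rules and the `skip_queue` parameter name come from the commit message.

```python
import asyncio


class RequestQueue:
    """Hypothetical stand-in for the semaphore model; not tabbyAPI's actual code."""

    def __init__(self, limit: int = 1):
        # Assumption: one shared semaphore gates all queued requests.
        self._semaphore = asyncio.Semaphore(limit)
        self.log = []  # records start/end order for inspection

    async def _run(self, name, job):
        self.log.append(f"start {name}")
        result = await job()
        self.log.append(f"end {name}")
        return result

    async def load(self, name, job, skip_queue=False):
        # Load requests queue by default; skip_queue bypasses the semaphore.
        if skip_queue:
            return await self._run(name, job)
        async with self._semaphore:
            return await self._run(name, job)

    async def unload(self, name, job):
        # Unload requests are executed immediately, never queued.
        return await self._run(name, job)

    async def completion(self, name, job):
        # Completion requests are placed inside the semaphore by default.
        async with self._semaphore:
            return await self._run(name, job)


async def demo():
    queue = RequestQueue(limit=1)

    async def work():
        # Placeholder for a real load, unload, or generation call.
        await asyncio.sleep(0.01)
        return "done"

    results = await asyncio.gather(
        queue.completion("completion-1", work),
        queue.completion("completion-2", work),
        queue.unload("unload", work),
        queue.load("load-skip", work, skip_queue=True),
    )
    return queue.log, results


if __name__ == "__main__":
    log, results = asyncio.run(demo())
    for entry in log:
        print(entry)
```

In this sketch the second completion waits for the first to release the semaphore, while the unload and the `skip_queue` load start right away. A limit of 1 is an assumption made to keep the ordering easy to see.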
args.py	Config: Add experimental torch cuda malloc backend	2024-02-14 21:45:56 -05:00
auth.py	Auth: Create keys on different exception	2024-02-04 01:56:42 -05:00
config.py	Launch: Make exllamav2 requirement more friendly	2024-02-02 23:36:17 -05:00
gen_logging.py	Tree: Refactor code organization	2024-01-25 00:15:40 -05:00
generators.py	API: Fix issues with concurrent requests and queueing	2024-03-04 23:21:40 -05:00
logger.py	Tree: Refactor code organization	2024-01-25 00:15:40 -05:00
sampling.py	Model: Add EBNF grammar support	2024-02-24 23:40:11 -05:00
templating.py	API: Add template switching and unload endpoints	2024-01-25 00:15:40 -05:00
utils.py	API: Add more methods to semaphore	2024-03-04 23:21:40 -05:00