Add a sequential lock and wait until jobs are completed before executing
any loading requests that directly alter the model. However, we also
need to block any new requests that come in until the load is finished,
so add a condition that triggers once the lock is free.
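A rough sketch of the intended pattern (the identifiers below are placeholders, not tabby's actual names), using an asyncio lock plus a condition:

```python
import asyncio

# Illustrative sketch only; names are placeholders.
load_lock = asyncio.Lock()            # serializes model load requests
load_condition = asyncio.Condition()  # blocks new requests during a load

async def _swap_model(model_path: str):
    # Stand-in for unloading the old model and loading the new one.
    await asyncio.sleep(0.1)

async def load_model(model_path: str):
    # Sequential lock: a second load request waits here until the
    # current load finishes.
    async with load_lock:
        await _swap_model(model_path)
    # Wake up any requests that were blocked while the model changed.
    async with load_condition:
        load_condition.notify_all()

async def generate(prompt: str) -> str:
    # New requests wait until no load is holding the lock.
    async with load_condition:
        await load_condition.wait_for(lambda: not load_lock.locked())
    return f"completion for {prompt!r}"

async def main():
    await asyncio.gather(load_model("new-model"), generate("hello"))

if __name__ == "__main__":
    asyncio.run(main())
```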
Signed-off-by: kingbri <bdashore3@proton.me>
Any request cancellation, at any point, releases the semaphore. This is
an issue since an arbitrary request can desync the semaphore's count,
causing multiple tasks to be processed at once and breaking generation.
Remove this behavior from the networking handlers and, with it, the
release_semaphore function itself.
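A small sketch of why the manual release was fragile and what the paired acquire/release looks like (placeholder names, not tabby's actual handlers):

```python
import asyncio

# Illustrative sketch only; identifiers are placeholders.
generate_semaphore = asyncio.Semaphore(1)  # one generation task at a time

# Problematic pattern (now removed): releasing on every cancellation,
# even for requests that never acquired the semaphore, pushes the
# counter above its limit and lets multiple generations run at once.
def release_semaphore():
    generate_semaphore.release()

# Safer pattern: acquire and release are paired by the context manager,
# so a cancelled request only ever releases what it actually holds.
async def handle_generation(prompt: str) -> str:
    async with generate_semaphore:
        await asyncio.sleep(0.1)  # stand-in for the actual generation
        return f"completion for {prompt!r}"

if __name__ == "__main__":
    print(asyncio.run(handle_generation("hello")))
```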
Signed-off-by: kingbri <bdashore3@proton.me>
Run these iterators on a background thread. On startup, the API spawns
a background thread as needed so sync code can run without blocking
the event loop.
Use asyncio's to_thread function since it allows errors to be
propagated.
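A minimal sketch of the approach, assuming asyncio.to_thread and placeholder iterator names:

```python
import asyncio

def sync_token_iterator(prompt: str):
    # Stand-in for a blocking token stream from the model backend.
    for token in ["Hello", " ", "world"]:
        yield token

def collect_tokens(prompt: str) -> list[str]:
    # Runs entirely on the worker thread.
    return list(sync_token_iterator(prompt))

async def generate(prompt: str) -> str:
    # asyncio.to_thread runs the sync call on a thread and propagates
    # any exception raised there back to the awaiting coroutine.
    tokens = await asyncio.to_thread(collect_tokens, prompt)
    return "".join(tokens)

if __name__ == "__main__":
    print(asyncio.run(generate("test")))
```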
Signed-off-by: kingbri <bdashore3@proton.me>
Async generation removes many of the roadblocks around managing tasks
with threads. It should allow for abortable requests and other modern
paradigms.
NOTE: ExLlamaV2 itself is not an asynchronous library. It has simply
been wrapped into tabby's async design to allow for a fast and
concurrent API server. It is still being debated whether to run
stream_ex in a separate thread or to manage it manually with
asyncio.sleep(0).
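A schematic comparison of the two options (FakeGenerator and the dict shape returned by stream_ex below are assumptions for the example, not ExLlamaV2's exact API):

```python
import asyncio

class FakeGenerator:
    # Stand-in for the real streaming generator.
    def __init__(self):
        self._chunks = ["Hello", " ", "world"]

    def stream_ex(self):
        # Blocking call that returns one chunk per invocation.
        if self._chunks:
            return {"chunk": self._chunks.pop(0), "eos": not self._chunks}
        return {"chunk": "", "eos": True}

async def stream_in_thread(generator):
    # Option 1: run each blocking stream_ex call on a worker thread so
    # the event loop never stalls; a cancel aborts between chunks.
    while True:
        result = await asyncio.to_thread(generator.stream_ex)
        yield result["chunk"]
        if result["eos"]:
            break

async def stream_with_manual_yield(generator):
    # Option 2: call stream_ex on the event loop thread, but hand
    # control back to the loop between chunks with asyncio.sleep(0) so
    # other tasks and cancellations still get a turn.
    while True:
        result = generator.stream_ex()
        yield result["chunk"]
        if result["eos"]:
            break
        await asyncio.sleep(0)

async def main():
    async for chunk in stream_in_thread(FakeGenerator()):
        print(chunk, end="")
    print()

if __name__ == "__main__":
    asyncio.run(main())
```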
Signed-off-by: kingbri <bdashore3@proton.me>