jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	6f03be9523	API: Split functions into their own files Previously, generation functions were bundled with the request function, causing the overall code structure and API to look ugly and unreadable. Split these up and clean up a lot of the methods that were previously overlooked in the API itself. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	104a6121cb	API: Split into separate folder Moving the API into its own directory helps compartmentalize it and allows for cleaning up the main file to just contain bootstrapping and the entry point. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	5a2de30066	Tree: Update to cleanup globals Use the module singleton pattern to share global state. This can also be a modified version of the Global Object Pattern. The main reason this pattern is used is for ease of use when handling global state rather than adding extra dependencies for a DI parameter. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	b373b25235	API: Move to ModelManager This is a shared module which manages the model container and provides extra utility functions around it to help slim down the API. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	8b46282aef	Model: Fix state flag sets on unload The load state should be false only if the models are unloaded. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	894be4a818	Startup: Check if the port is available and fallback Similar to Gradio, fall back to port + 1 if the config port isn't bindable. If both ports aren't available, let the user know and exit. An infinite loop of finding a port isn't advisable. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-11 21:57:28 -04:00
kingbri	7c6fd7ac60	Main: Cleanup Remove leftover debug statements. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-11 18:10:35 -04:00
kingbri	53d889e0f0	Logging: Fix legacy warn statement Warn is not a valid method with loguru. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-11 01:31:43 -04:00
kingbri	ba3da6d92f	Logging: Escape rich markup sequences Rich markup sequences inside the log string were causing issues with printing. Fix this by using their escape function. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-11 00:28:48 -04:00
kingbri	4cc0b59bdc	Requirements: Add sse-starlette Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 19:41:08 -04:00
kingbri	42c0dbe795	Generation: Explicitly release semaphore on disconnect This prevents any lockups when querying another request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:54:48 -04:00
kingbri	2025a1c857	Requirements: Unpin uvicorn v0.28.0 works now and the underlying errors were fixed. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:48:43 -04:00
kingbri	bbb1a4ec20	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:45:09 -04:00
kingbri	045262f51f	Logging: Loglevel INFO This is the max that Tabby should log because debug and trace aren't used within the application. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:44:19 -04:00
kingbri	d45e847c7a	API: Fix disconnect handling on streaming responses Starlette's StreamingResponse has an issue where it yields after a request has disconnected. A bugfix to starlette will fix this issue, but FastAPI uses starlette <= 0.36 which isn't ideal. Therefore, switch back to sse-starlette which handles these disconnects correctly. Also don't try yielding after the request is disconnected. Just return out of the generator instead. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:43:13 -04:00
kingbri	6b4f100db2	Logger: Escape tags Angle brackets should be escaped to avoid mistaken color formatting. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 01:24:50 -05:00
kingbri	e33971859b	Requirements: Pin uvicorn Pin uvicorn due to issues with request disconnection in the latest version. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 01:23:36 -05:00
kingbri	a69ee976f0	API: Let the user know if a disconnect occurred If a user disconnects from a request, log this in the console. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-09 15:48:27 -05:00
kingbri	c77259bfbb	Logger: Fix reformatting of message Use the reformatted message when splitting lines instead of the raw message to prevent exceptions. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-09 15:40:37 -05:00
kingbri	4d09226364	Logging: Fix Uvicorn hook The Uvicorn logging config wasn't being set. Fix that when creating a new server. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 17:56:48 -05:00
kingbri	2295b12643	Progress: Fix bar with draft models Show two bars and clarify which bar is which. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:48:06 -05:00
kingbri	c9b4b7c509	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	cad72315f4	Init: Switch to display redoc endpoint Redoc looks much better than Swagger docs, so show that by default. Both endpoints still exist. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	ef2dc326f5	Logging: Fix inconsistent formatting Some colorization was incorrect and the separator insertion has become more robust. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	228c227c1e	Logging: Switch to loguru Loguru is a flexible logger that allows for easier hooking and imports into Rich with no problems. Also makes progress bars stick to the bottom of the terminal window. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	fe0ff240e7	Progress: Switch to Rich Rich is a more mature library for displaying progress bars, logging, and console output. This should help properly align progress bars within the terminal. Side note: "We're Rich!" Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	39617adb65	Requirements: Update Exllamav2 v0.0.15 Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-06 22:29:55 -05:00
Brian Dashore	47c42a23d4	Merge pull request #72 from djmaze/patch-1 Remove explicit install of pytorch & exllamav2 in Dockerfile	2024-03-06 01:13:37 -05:00
kingbri	9a007c4707	Model: Add support for Q4 cache Add this in addition to 8bit cache and 16bit cache. Passing "Q4" with the cache_mode request parameter will set this on model load. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-06 00:59:28 -05:00
kingbri	0b25c208d6	API: Fix error reporting Make a disconnect on load error consistently. It should be safer to warn the user to run unload (or re-run load) if a model does not load correctly. Also don't log the traceback for request errors that don't have one. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-05 18:16:02 -05:00
kingbri	165cc6fc2d	API: Remove unnecessary endpoint This used to be a shim for ooba, but it's no longer necessary. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-04 23:21:40 -05:00
kingbri	d2c6ae2d35	API: Back to async According to FastAPI docs, if you're using a generic function, running it in async will make it more performant (which makes sense since running def functions for routes will automatically run the caller through a threadpool). Tested and everything works fine. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-04 23:21:40 -05:00
kingbri	b0c295dd2f	API: Add more methods to semaphore The semaphore/queue model for Tabby is as follows: - Any load requests go through the semaphore by default - Any load request can include the skip_queue parameter to bypass the semaphore - Any unload requests are immediately executed - All completion requests are placed inside the semaphore by default This model preserves the parallelism of single-user mode with extra convenience methods for queues in multi-user. It also helps mitigate problems that were previously present in the concurrency stack. Also change how the program's loop runs so it exits when the API thread dies. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-04 23:21:40 -05:00
kingbri	c82697fef2	API: Fix issues with concurrent requests and queueing This is the first in many future commits that will overhaul the API to be more robust and concurrent. The model is admin-first where the admin can do anything in case something goes awry. Previously, calls to long running synchronous background tasks would block the entire API, making it ignore any terminal signals until generation is completed. To fix this, leverage FastAPI's run_in_threadpool to offload the long running tasks to another thread. However, signals to abort the process still kept the background thread running and made the terminal hang. This was due to an issue with Uvicorn not propagating the SIGINT signal across threads in its event loop. To fix this in a catch-all way, run the API processes in a separate thread so the main thread can still kill the process if needed. In addition, make request error logging more robust and refer to the console for full error logs rather than creating a long message on the client-side. Finally, add state checks to see if a model is fully loaded before generating a completion. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-04 23:21:40 -05:00
Brian Dashore	de91eade4b	Merge pull request #75 from DocShotgun/main Additional clarification for override_base_seq_len	2024-03-03 01:30:45 -05:00
DocShotgun	8245488926	Additional clarification for override_base_seq_len	2024-03-02 09:29:50 -08:00
Martin Honermeyer	4afb4137f7	Remove explicit pytorch & exllamav2 in Dockerfile These packages are already installed via requirements.txt.	2024-02-25 18:03:01 +01:00
kingbri	fc857893ee	Model: Remove Exllamav2 patches These classes are in the newest version now. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-24 23:40:11 -05:00
kingbri	73a1d9ef78	Model: Fix imports Use the standard import ordering. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-24 23:40:11 -05:00
kingbri	f6d749c771	Model: Add EBNF grammar support Using the Outlines library, add support to supply EBNF strings and pass them to the library for parsing. From there, a wrapper is created and a filter is passed to generation. Replace with an in-house solution at some point that's more flexible. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-24 23:40:11 -05:00
kingbri	57b3d69949	API + Model: Add support for JSON schema constraints Add the ability to constrain the return value of a model to be JSON. Built using the JSON schema standard to define the properties of what the model should return. This feature should be more accurate than using GBNF/EBNF to yield the same results due to the use of lmformatenforcer. GBNF/EBNF will be added in a different commit/branch. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-24 23:40:11 -05:00
kingbri	ccd41d720d	Requirements: Bump ExllamaV2 v0.0.14 Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-24 12:26:08 -05:00
kingbri	360802762c	Model: Fix logit bias token checks Accidentally checked on the token bias tensor which didn't contain the token IDs. Check if the index exists on the id_to_piece list instead. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-22 21:44:15 -05:00
kingbri	5a23b9ebc9	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-22 01:28:30 -05:00
kingbri	bee26a2f2c	API: Auto-unload on a load request Automatically unload the existing model when calling /load. This was requested many times, and does make more sense in the long run. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-21 23:00:11 -05:00
kingbri	368eb2e2d9	Update README Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-20 00:19:31 -05:00
kingbri	a19a4eb1be	Model: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-18 18:31:31 -05:00
kingbri	7def32e4de	Model: Fix logit bias handling If the token doesn't exist, gracefully warn instead of erroring out. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-18 18:30:58 -05:00
kingbri	aa34b2e5fd	Model: Prefer auto over manual GPU split For safety reasons, always use auto unless a manual split is provided and auto is forced off. If auto is forced off and a manual split isn't provided, a manual split will be attempted. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-17 00:21:48 -05:00
kingbri	ea00a6bd45	Requirements: Update Exllamav2 Update to v0.0.13.post2 Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-14 21:51:25 -05:00
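
Commit 894be4a818 describes a startup check that falls back to port + 1 when the configured port cannot be bound, and gives up if both are taken. A minimal sketch of that idea, using only the Python standard library, might look like the following. The function names `port_is_bindable` and `pick_port` are hypothetical and are not taken from the repository; the actual implementation may differ.

```python
import socket


def port_is_bindable(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except (OSError, OverflowError):
            # OSError: address in use or not permitted.
            # OverflowError: port number outside 0-65535.
            return False
    return True


def pick_port(host: str, port: int) -> int:
    """Try the configured port, then port + 1, and stop there.

    Only two candidates are tried, mirroring the commit's note that an
    open-ended search for a free port is not advisable.
    """
    for candidate in (port, port + 1):
        if port_is_bindable(host, candidate):
            return candidate
    raise RuntimeError(f"Ports {port} and {port + 1} are both unavailable")
```

A caller would pass the configured host and port, start the server on the returned value, and report the error to the user and exit if `RuntimeError` is raised.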


641 commits
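
Commit b0c295dd2f describes a queue model in which completion and load requests pass through a semaphore, load requests may bypass it with `skip_queue`, and unload requests run immediately. The sketch below illustrates that model with `asyncio.Semaphore`. The `RequestQueue` class, its `run` method, and the `demo` coroutine are hypothetical names introduced for illustration; only the `skip_queue` parameter name comes from the commit message, and the repository's real code may be structured differently.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


class RequestQueue:
    """Hypothetical wrapper: queued work shares a semaphore, urgent work skips it."""

    def __init__(self, slots: int = 1) -> None:
        # Created inside a running event loop by the caller (see demo below).
        self._semaphore = asyncio.Semaphore(slots)

    async def run(
        self, task: Callable[[], Awaitable[T]], *, skip_queue: bool = False
    ) -> T:
        if skip_queue:
            # Bypass the semaphore entirely, as an unload request
            # or a load with skip_queue would.
            return await task()
        async with self._semaphore:
            # Completions and default loads wait here for a free slot.
            return await task()


async def demo() -> List[str]:
    queue = RequestQueue(slots=1)
    events: List[str] = []

    async def completion(name: str) -> None:
        events.append(f"{name} start")
        await asyncio.sleep(0.05)  # stand-in for generation work
        events.append(f"{name} end")

    async def unload() -> None:
        events.append("unload")

    await asyncio.gather(
        queue.run(lambda: completion("a")),
        queue.run(lambda: completion("b")),
        queue.run(unload, skip_queue=True),
    )
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the unload runs while completion `a` still holds the only slot, and completion `b` starts only after `a` has finished.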