jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	ae879a623f	Main: Add await to an async function load_loras wasn't properly updated. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-02 21:24:43 -04:00
kingbri	5bb4995a7c	API: Move OAI to APIRouter This makes the API more modular for other API implementations in the future. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-06 01:25:31 -04:00
kingbri	6dfcbbd813	Common: Migrate request utils to networking Helps organize the project better. Utils is meant to be for simple functions like unwrap. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-21 23:21:57 -04:00
kingbri	14d8ec2007	Signal: Fix signal handlers for uvicorn Add the ability to override uvicorn's signal handler in addition to using main's signal handler for any SIGINTs before the API server starts. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-16 23:23:31 -04:00
kingbri	7fded4f183	Tree: Switch to async generators Async generation helps remove many roadblocks to managing tasks using threads. It should allow for abortables and modern-day paradigms. NOTE: Exllamav2 itself is not an asynchronous library. It's just been added into tabby's async nature to allow for a fast and concurrent API server. It's still being debated to run stream_ex in a separate thread or manually manage it using asyncio.sleep(0) Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-16 23:23:31 -04:00
kingbri	104a6121cb	API: Split into separate folder Moving the API into its own directory helps compartmentalize it and allows for cleaning up the main file to just contain bootstrapping and the entry point. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	5a2de30066	Tree: Update to cleanup globals Use the module singleton pattern to share global state. This can also be a modified version of the Global Object Pattern. The main reason this pattern is used is for ease of use when handling global state rather than adding extra dependencies for a DI parameter. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	b373b25235	API: Move to ModelManager This is a shared module which manages the model container and provides extra utility functions around it to help slim down the API. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	894be4a818	Startup: Check if the port is available and fallback Similar to Gradio, fall back to port + 1 if the config port isn't bindable. If both ports aren't available, let the user know and exit. An infinite loop of finding a port isn't advisable. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-11 21:57:28 -04:00
kingbri	7c6fd7ac60	Main: Cleanup Remove leftover debug statements. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-11 18:10:35 -04:00
kingbri	42c0dbe795	Generation: Explicitly release semaphore on disconnect This prevents any lockups when querying another request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:54:48 -04:00
kingbri	bbb1a4ec20	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:45:09 -04:00
kingbri	d45e847c7a	API: Fix disconnect handling on streaming responses Starlette's StreamingResponse has an issue where it yields after a request has disconnected. A bugfix to starlette will fix this issue, but FastAPI uses starlette <= 0.36 which isn't ideal. Therefore, switch back to sse-starlette which handles these disconnects correctly. Also don't try yielding after the request is disconnected. Just return out of the generator instead. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:43:13 -04:00
kingbri	a69ee976f0	API: Let the user know if a disconnect occurred If a user disconnects from a request, log this in the console. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-09 15:48:27 -05:00
kingbri	4d09226364	Logging: Fix Uvicorn hook The Uvicorn logging config wasn't being set. Fix that when creating a new server. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 17:56:48 -05:00
kingbri	2295b12643	Progress: Fix bar with draft models Show two bars and clarify which bar is which. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:48:06 -05:00
kingbri	cad72315f4	Init: Switch to display redoc endpoint Redoc looks much better than Swagger docs, so show that by default. Both endpoints still exist. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	228c227c1e	Logging: Switch to loguru Loguru is a flexible logger that allows for easier hooking and imports into Rich with no problems. Also makes progress bars stick to the bottom of the terminal window. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	fe0ff240e7	Progress: Switch to Rich Rich is a more mature library for displaying progress bars, logging, and console output. This should help properly align progress bars within the terminal. Side note: "We're Rich!" Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	9a007c4707	Model: Add support for Q4 cache Add this in addition to 8bit cache and 16bit cache. Passing "Q4" with the cache_mode request parameter will set this on model load. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-06 00:59:28 -05:00
kingbri	0b25c208d6	API: Fix error reporting Make a disconnect on load error consistently. It should be safer to warn the user to run unload (or re-run load) if a model does not load correctly. Also don't log the traceback for request errors that don't have one. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-05 18:16:02 -05:00
kingbri	165cc6fc2d	API: Remove unnecessary endpoint This used to be a shim for ooba, but it's no longer necessary. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-04 23:21:40 -05:00
kingbri	d2c6ae2d35	API: Back to async According to FastAPI docs, if you're using a generic function, running it in async will make it more performant (which makes sense since running def functions for routes will automatically run the caller through a threadpool). Tested and everything works fine. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-04 23:21:40 -05:00
kingbri	b0c295dd2f	API: Add more methods to semaphore The semaphore/queue model for Tabby is as follows: - Any load requests go through the semaphore by default - Any load request can include the skip_queue parameter to bypass the semaphore - Any unload requests are immediately executed - All completion requests are placed inside the semaphore by default This model preserves the parallelism of single-user mode with extra convenience methods for queues in multi-user. It also helps mitigate problems that were previously present in the concurrency stack. Also change how the program's loop runs so it exits when the API thread dies. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-04 23:21:40 -05:00
kingbri	c82697fef2	API: Fix issues with concurrent requests and queueing This is the first in many future commits that will overhaul the API to be more robust and concurrent. The model is admin-first where the admin can do anything in case something goes awry. Previously, calls to long running synchronous background tasks would block the entire API, making it ignore any terminal signals until generation is completed. To fix this, leverage FastAPI's run_in_threadpool to offload the long running tasks to another thread. However, signals to abort the process still kept the background thread running and made the terminal hang. This was due to an issue with Uvicorn not propagating the SIGINT signal across threads in its event loop. To fix this in a catch-all way, run the API processes in a separate thread so the main thread can still kill the process if needed. In addition, make request error logging more robust and refer to the console for full error logs rather than creating a long message on the client-side. Finally, add state checks to see if a model is fully loaded before generating a completion. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-04 23:21:40 -05:00
kingbri	5a23b9ebc9	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-22 01:28:30 -05:00
kingbri	bee26a2f2c	API: Auto-unload on a load request Automatically unload the existing model when calling /load. This was requested many times, and does make more sense in the long run. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-21 23:00:11 -05:00
kingbri	949248fb94	Config: Add experimental torch cuda malloc backend This option saves some VRAM, but does have the chance to error out. Add this in the experimental config section. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-14 21:45:56 -05:00
kingbri	c02fe4d1db	API: Fix response creation Change chat completion and text completion responses to be more flexible. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-08 21:26:53 -05:00
kingbri	0af6a38af3	Model: Add logprobs support Returns token offsets, selected tokens, probabilities of tokens post-sampling, and normalized probability of selecting a token pre-sampling (for efficiency purposes). Only for text completions. Chat completions in a later commit. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-08 21:26:53 -05:00
kingbri	284f20263f	API: Clean up tokenizing endpoint Split the get tokens function into separate wrapper encode and decode functions for overall code cleanliness. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-08 21:26:53 -05:00
kingbri	58590a6c57	Config: Add option to force streaming off Many APIs automatically ask for request streaming without giving the user the option to turn it off. Therefore, give the user more freedom by giving a server-side kill switch. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-07 21:09:59 -05:00
kingbri	1919bf7705	Launch: Make exllamav2 requirement more friendly Add the ability to use an unsafe config flag if needed and migrate the exl2 check to a different file within the exl2 backend code. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-02 23:36:17 -05:00
kingbri	2ea063cea9	Tree: Require exllamav2 version for startup Exllamav2 is currently supported on all GPUs and versions. Therefore, it should be expected that users use the latest version of exllamav2 to get the latest features. Doing this helps reduce checks that don't really serve any purpose. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-02 23:36:17 -05:00
kingbri	d3781920b3	OAI: Split up utility functions Just like types, put utility functions in their own separate module based on the route. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-02 23:36:17 -05:00
kingbri	b14c5443fd	API: Add sampler override switching Allow users to switch the currently overridden samplers via the API so a restart isn't required to switch the overrides. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-25 00:15:40 -05:00
kingbri	de0ba7214c	API: Add template switching and unload endpoints Templates can be switched and unloaded without reloading the entire model. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-25 00:15:40 -05:00
kingbri	6c30f24c83	Tree: Unify sampler parameters and add override support Unify API sampler params into a superclass which should make them easier to manage and inherit generic functions from. Not all frontends expose all sampling parameters due to connections with OAI (that handles sampling themselves with the exception of a few sliders). Add the ability for the user to customize fallback parameters from server-side. In addition, parameters can be forced to a certain value server-side in case the repo automatically sets other sampler values in the background that the user doesn't want. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-25 00:15:40 -05:00
kingbri	78f920eeda	Tree: Refactor code organization Move common functions into their own folder and refactor the backends to use their own folder as well. Also clean up imports and alphabetize import statements themselves. Finally, move colab and docker into their own folders as well. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-25 00:15:40 -05:00
kingbri	902e841c39	Main: Add logging for API routes Helps users get started with accessing the docs. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-10 23:50:11 -05:00
kingbri	c1642076c2	API: Switch unload method to POST GET and POST can be used interchangeably in this case, but adhere to the HTTP spec. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-04 21:11:36 -05:00
kingbri	451042aadf	Main: Don't load if model_name/loras is blank Previously, if model_name was commented out, a load would not occur. Add the case if model_name or loras is blank which returns None when parsing the YAML. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-02 13:56:25 -05:00
kingbri	6b04463051	API: Fix CFG reporting The model endpoint wasn't reporting if CFG is on. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-02 13:54:16 -05:00
kingbri	bb7a8e4614	Config: Add override argparser Add an argparser that casts over to dictionaries of subgroups to integrate with the config. This argparser doesn't contain everything in the config due to complexity issues with CLI args, but will eventually progress to parity. In addition, it's used to override the config.yml rather than replace it. A config arg is also provided if the user wants to fully override the config yaml with another file path. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-01 14:27:12 -05:00
kingbri	79a57588d5	API: Add template list endpoint Fetches all template names that a user has in the templates directory for chat completions. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-29 22:58:55 -05:00
kingbri	dce8c74edc	API: Add clarification and cleanup autodocs It's possible to override parts of the example JSON to give proper examples of values. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-29 10:28:06 -05:00
kingbri	3622710582	API: Fix num_experts_per_token reporting This wasn't linked to the model config. This value can be 1 if a MoE model isn't loaded. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-28 00:31:14 -05:00
kingbri	c5bbfd97b2	Entrypoint: Load loras after model Prevents an error if the model isn't loaded on startup. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-27 23:55:02 -05:00
kingbri	ac0d6f8869	Tree: Format and cleanup start Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-27 01:17:31 -05:00
kingbri	a71b96a20c	Main: Switch to entrypoint Allows for other modules to access the startup function. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-27 00:34:50 -05:00
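
The port fallback described in commit 894be4a818 (try the configured port, then port + 1, then give up instead of searching indefinitely) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from that commit: the names `is_port_bindable` and `pick_port`, and the injectable `probe` parameter, are hypothetical.

```python
import socket


def is_port_bindable(host, port):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
    return True


def pick_port(host, configured_port, probe=is_port_bindable):
    """Try the configured port, then configured_port + 1, and stop there.

    `probe` is a hypothetical hook so the check can be swapped out in tests.
    """
    for candidate in (configured_port, configured_port + 1):
        if probe(host, candidate):
            return candidate
    # Only two candidates are tried; no open-ended search for a free port.
    raise RuntimeError(
        "Ports %d and %d are both unavailable"
        % (configured_port, configured_port + 1)
    )
```

A caller would catch the `RuntimeError`, report it to the user, and exit. Note that a bind probe like this is inherently racy: another process may take the port between the check and the server's own bind.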

116 commits
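
The semaphore/queue model listed in commit b0c295dd2f (loads and completions queue by default, a load may pass `skip_queue` to bypass the queue, unloads run immediately) might look roughly like the sketch below. It assumes an `asyncio.Semaphore`; the `RequestQueue` class and the `demo` coroutine are hypothetical names for illustration, not tabbyAPI's actual implementation.

```python
import asyncio
from contextlib import asynccontextmanager


class RequestQueue:
    """Hypothetical wrapper around a semaphore that gates requests."""

    def __init__(self, max_concurrent=1):
        self._semaphore = asyncio.Semaphore(max_concurrent)

    @asynccontextmanager
    async def slot(self, skip_queue=False):
        """Hold a semaphore slot for the block unless skip_queue is set."""
        if skip_queue:
            yield
            return
        async with self._semaphore:
            yield


async def demo():
    queue = RequestQueue(max_concurrent=1)  # created inside the running loop
    events = []

    async def queued(name, skip_queue=False):
        # Stands in for a completion or load request.
        async with queue.slot(skip_queue=skip_queue):
            events.append(name + " start")
            await asyncio.sleep(0.01)  # placeholder for real work
            events.append(name + " end")

    async def unload():
        # Unload requests never touch the semaphore.
        events.append("unload")

    await asyncio.gather(
        queued("completion-1"),
        queued("completion-2"),           # waits for completion-1 to finish
        queued("load", skip_queue=True),  # bypasses the queue
        unload(),                         # runs immediately
    )
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a single slot, the second completion only starts once the first has released the semaphore, while the `skip_queue` load and the unload proceed without waiting.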