* Model: Implement basic lora support
* Add ability to load loras from config on launch
* Supports loading multiple loras and lora scaling
* Add function to unload loras
* Colab: Update for basic lora support
* Model: Test vram alloc after lora load, add docs
* Git: Add loras folder to .gitignore
* API: Add basic lora-related endpoints
* Add /loras/ endpoint for querying available loras
* Add /model/lora endpoint for querying currently loaded loras
* Add /model/lora/load endpoint for loading loras
* Add /model/lora/unload endpoint for unloading loras
* Move lora config-checking logic to main.py for better compat with API endpoints
* Revert bad CRLF line ending changes
* API: Add basic lora-related endpoints (fixed)
* Add /loras/ endpoint for querying available loras
* Add /model/lora endpoint for querying currently loaded loras
* Add /model/lora/load endpoint for loading loras
* Add /model/lora/unload endpoint for unloading loras
* Move lora config-checking logic to main.py for better compat with API endpoints
* Model: Unload loras first when unloading model
* API + Models: Cleanup lora endpoints and functions
Condenses the endpoint and model load code. Also makes the lora routes
behave the same way as the model routes to avoid confusing the end user.
Signed-off-by: kingbri <bdashore3@proton.me>
* Loras: Optimize load endpoint
Return successes and failures, and consolidate the request into the
rewritten load_loras function (a sketch follows after this message).
Signed-off-by: kingbri <bdashore3@proton.me>
---------
Co-authored-by: kingbri <bdashore3@proton.me>
Co-authored-by: DocShotgun <126566557+DocShotgun@users.noreply.github.com>
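For reference, a minimal sketch of what the consolidated lora load route could look like. The route path is taken from the bullets above, while the request shape, the load_loras method, and the model_container placeholder are assumptions rather than the project's exact code:

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    app = FastAPI()
    model_container = None  # set once a model has been loaded

    class LoraLoadInfo(BaseModel):
        name: str
        scaling: float = 1.0

    class LoraLoadRequest(BaseModel):
        loras: list[LoraLoadInfo]

    @app.post("/model/lora/load")
    async def load_loras_endpoint(data: LoraLoadRequest):
        # Loras attach to an already-loaded model, so reject the request early
        if model_container is None:
            raise HTTPException(status_code=400, detail="A model must be loaded first")

        # load_loras (hypothetical) applies each lora with its scaling factor
        # and reports which ones loaded and which failed
        successes, failures = model_container.load_loras(data.loras)
        return {"success": successes, "failure": failures}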
Draft config wasn't being parsed correctly after the new changes removed
the draft_enabled bool. There's still more work to be done on returning
exceptions.
Signed-off-by: kingbri <bdashore3@proton.me>
Models do not fully unload if an exception is caught in load. Therefore,
leave it to the client to unload on cancel.
Also add handlers for the event that an SSE stream is cancelled. These
packets can't be sent back to the client since the client has severed
the connection, so print them in the terminal instead.
Signed-off-by: kingbri <bdashore3@proton.me>
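A rough sketch of the cancellation handling described above, assuming cancellation surfaces as asyncio.CancelledError inside the streaming generator; token_gen is a stand-in for the model's token stream:

    import asyncio

    async def stream_with_cancel_handling(token_gen):
        last_packet = None
        try:
            async for last_packet in token_gen:
                yield last_packet
        except asyncio.CancelledError:
            # The client severed the connection, so this data can't be sent
            # back; print it in the terminal instead and re-raise so the
            # request is still torn down
            print("SSE stream cancelled by the client. Pending packet:", last_packet)
            raise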
Chat completions previously always yielded a final packet to signal that
a generation had finished. However, this caused errors stating that a
yield was executed after GeneratorExit. The error is legitimate: Python's
garbage collector can't clean up the generator after exit because the
finally block still executes and yields.
In addition, SSE endpoints close off the connection, so the finish packet
can only be yielded when the response has completed normally; ignore the
yield on exception.
Signed-off-by: kingbri <bdashore3@proton.me>
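Roughly, the generator now behaves like this sketch; names are illustrative and the "[DONE]" string stands in for the finish packet:

    async def stream_generation(token_gen):
        try:
            async for chunk in token_gen:
                yield chunk
            # Reached only when the stream ran to completion
            yield "data: [DONE]\n\n"
        except GeneratorExit:
            # The client already exited the stream; yielding after
            # GeneratorExit raises again, so skip the finish packet
            pass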
FastAPI is kinda weird with queueing: if an await is used within an
async def endpoint, requests aren't executed sequentially. Get sequential
requests back by using a semaphore to limit concurrent execution of the
generator functions.
generator functions.
Also scaffold the framework to move generator functions to their own
file.
Signed-off-by: kingbri <bdashore3@proton.me>
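A minimal sketch of the semaphore approach; the names are illustrative:

    import asyncio

    # Allow only one generation at a time so requests queue sequentially again
    generate_semaphore = asyncio.Semaphore(1)

    async def generate_sequentially(token_gen):
        async with generate_semaphore:
            async for chunk in token_gen:
                yield chunk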
sse_starlette kept firing a ping response if it was taking too long to
set an event. Rather than using a hacky workaround, switch to FastAPI's
built-in streaming response and construct SSE packets with a utility
function.
This makes the API more robust and removes an extra requirement.
Signed-off-by: kingbri <bdashore3@proton.me>
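A sketch of the utility-function approach with FastAPI's StreamingResponse; get_sse_packet and token_stream are illustrative names, not the project's exact code:

    import json
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    async def token_stream(request: dict):
        # Stand-in for the model's async token generator
        for word in ("Hello", " world"):
            yield {"text": word}

    def get_sse_packet(json_data: str) -> str:
        # Wrap a JSON payload in the SSE wire format
        return f"data: {json_data}\n\n"

    @app.post("/v1/completions")
    async def completions(request: dict):
        async def generator():
            async for chunk in token_stream(request):
                yield get_sse_packet(json.dumps(chunk))

        return StreamingResponse(generator(), media_type="text/event-stream")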
Some API frontends require an OAI model to be present in the models
endpoint response. Fix this by adding a GPT 3.5 turbo entry as the first
item in the list to cover as many frontends as possible.
Signed-off-by: kingbri <bdashore3@proton.me>
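The shape of the fix, sketched; field values are illustrative:

    def get_model_cards(local_models: list[str]) -> dict:
        # A dummy gpt-3.5-turbo entry goes first so clients that insist on an
        # OpenAI model name keep working; the local models follow
        names = ["gpt-3.5-turbo"] + local_models
        return {
            "object": "list",
            "data": [{"id": name, "object": "model"} for name in names],
        }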
The OAI spec requires a finish reason to be provided once streaming of a
chat completion has completed. This differs from a non-streaming chat
completion response.
Also fix some errors that were raised by the endpoint.
References #15
Signed-off-by: kingbri <bdashore3@proton.me>
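A sketch of the final streamed chunk, trimmed to the relevant fields (id/created omitted):

    import json

    def get_finish_chunk(model_name: str) -> str:
        # The last chunk carries an empty delta plus a finish_reason,
        # as required by the OAI streaming spec
        payload = {
            "object": "chat.completion.chunk",
            "model": model_name,
            "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
        }
        return f"data: {json.dumps(payload)}\n\n"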
If the generator errors, there's no proper handling to send an error
packet and close the connection.
This is especially important for unloading the model if a load fails at
any stage so the user's VRAM is reclaimed. Raising an exception caused
the model_container object to lock and never get freed by the GC.
It therefore made sense to propagate SSE errors across all generator
functions rather than relying on abort signals.
Signed-off-by: kingbri <bdashore3@proton.me>
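A sketch of the error propagation, assuming the error is surfaced as one last SSE packet before the stream closes; the helper and callback names are hypothetical:

    import json
    import traceback

    def get_error_packet(message: str) -> str:
        # Hypothetical helper: wrap an error message in the SSE wire format
        return f"data: {json.dumps({'error': {'message': message}})}\n\n"

    async def stream_with_error_handling(token_gen, unload_model):
        try:
            async for chunk in token_gen:
                yield chunk
        except Exception as exc:
            traceback.print_exc()
            # Unload so the user's VRAM is reclaimed instead of leaving the
            # model container locked, then tell the client what went wrong
            unload_model()
            yield get_error_packet(str(exc))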
The default encoding when opening files on Windows is cp1252, which
doesn't support all of Unicode and can cause issues.
Signed-off-by: kingbri <bdashore3@proton.me>
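The fix boils down to passing the encoding explicitly whenever a file is opened, for example:

    # Request UTF-8 explicitly instead of relying on the platform default
    # (cp1252 on Windows)
    with open("config.yml", "r", encoding="utf-8") as config_file:
        contents = config_file.read()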
On /v1/model/load, some internal server errors weren't being sent back,
so migrate the directory checking out and add a check to make sure the
proposed model path exists.
Signed-off-by: kingbri <bdashore3@proton.me>
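A sketch of the path check; the status code and function name are assumptions:

    from pathlib import Path
    from fastapi import HTTPException

    def check_model_path(model_dir: str, model_name: str) -> Path:
        # Resolve the requested model inside the configured model directory
        # and surface a clean error instead of an internal server error
        model_path = Path(model_dir) / model_name
        if not model_path.exists():
            raise HTTPException(status_code=400, detail=f"Model path {model_path} does not exist")
        return model_path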
Models can be loaded with a child object called "draft" in the POST
request. As with the main model, draft models need to be located within
the draft model dir to get loaded.
Signed-off-by: kingbri <bdashore3@proton.me>
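A sketch of the request shape, assuming pydantic models; the field names are illustrative:

    from typing import Optional
    from pydantic import BaseModel

    class DraftModelLoadRequest(BaseModel):
        # Resolved against the configured draft model directory
        draft_model_name: str

    class ModelLoadRequest(BaseModel):
        name: str
        # Optional child object; when present, a draft model is loaded
        # alongside the main model
        draft: Optional[DraftModelLoadRequest] = None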
Speculative decoding makes use of draft models that ingest the prompt
before forwarding it to the main model.
Add options in the config to support this. API options will occur
in a different commit.
Signed-off-by: kingbri <bdashore3@proton.me>
Responses were not being properly sent as JSON. Only run pydantic's
JSON function on streaming responses; FastAPI handles serialization for
static responses.
Signed-off-by: kingbri <bdashore3@proton.me>
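The distinction, sketched (pydantic v1 spelling of .json(); v2 renames it model_dump_json()):

    from pydantic import BaseModel

    class CompletionResponse(BaseModel):
        text: str

    response = CompletionResponse(text="hello")

    # Streaming packets bypass FastAPI's serializer, so dump the JSON manually
    sse_packet = f"data: {response.json()}\n\n"

    # Static responses: return `response` itself from the endpoint and let
    # FastAPI serialize it; returning response.json() would send a quoted,
    # double-encoded string instead of a JSON object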
Add safe fallbacks if any part of the config tree doesn't exist. This
prevents random internal server errors from showing up.
Signed-off-by: kingbri <bdashore3@proton.me>
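A minimal sketch of the fallback pattern; the unwrap helper and key names are illustrative:

    import yaml

    def unwrap(value, default):
        # Fall back to a default when a config value is missing or None
        return value if value is not None else default

    with open("config.yml", "r", encoding="utf-8") as config_file:
        config = unwrap(yaml.safe_load(config_file), {})

    model_config = unwrap(config.get("model"), {})
    model_dir = unwrap(model_config.get("model_dir"), "models")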
Chat completions is the endpoint that OAI will focus on going forward,
so it makes sense to support it even though the completions endpoint
will be used more often.
Also unify common parameters between the chat completion and completion
requests since they're very similar.
Signed-off-by: kingbri <bdashore3@proton.me>
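A sketch of the unified request models, assuming pydantic; the exact fields and defaults are illustrative:

    from typing import Optional
    from pydantic import BaseModel

    class CommonCompletionRequest(BaseModel):
        # Sampling parameters shared by both request types
        max_tokens: Optional[int] = 150
        temperature: Optional[float] = 1.0
        top_p: Optional[float] = 1.0
        stream: Optional[bool] = False

    class CompletionRequest(CommonCompletionRequest):
        prompt: str

    class ChatCompletionRequest(CommonCompletionRequest):
        messages: list[dict]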
Models can be loaded and unloaded via the API. Also add authentication
for using the API and for administrator tasks; each type of
authorization uses its own key.
Also fix the unload function to properly free all used VRAM.
Signed-off-by: kingbri <bdashore3@proton.me>
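A sketch of the two-key scheme; the header names, key storage, and values are assumptions:

    from fastapi import Header, HTTPException

    # Placeholder keys; in practice these would be generated and stored elsewhere
    API_KEY = "user-key"
    ADMIN_KEY = "admin-key"

    async def check_api_key(x_api_key: str = Header(None)):
        if x_api_key not in (API_KEY, ADMIN_KEY):
            raise HTTPException(status_code=401, detail="Invalid API key")

    async def check_admin_key(x_admin_key: str = Header(None)):
        # Administrator tasks (model load/unload) require the separate admin key
        if x_admin_key != ADMIN_KEY:
            raise HTTPException(status_code=401, detail="Invalid admin key")

The generation routes would then depend on check_api_key, while the load
and unload routes depend on check_admin_key.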
The models endpoint fetches all the models that OAI has to offer.
However, since this is an OAI clone, just list the models inside
the user's configured model directory instead.
Signed-off-by: kingbri <bdashore3@proton.me>
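A sketch of the directory listing; treating each subdirectory as a model is an assumption:

    from pathlib import Path

    def list_local_models(model_dir: str) -> list[str]:
        # Each subdirectory of the configured model folder counts as a model
        return sorted(p.name for p in Path(model_dir).iterdir() if p.is_dir())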
Add support for /v1/completions with the option to use streaming if
needed. Also rewrite API endpoints to use async when possible since that
improves request performance.
Model container parameter names also needed rewrites, and fallback cases
are now set to their disabled values.
Signed-off-by: kingbri <bdashore3@proton.me>
YAML is a more flexible format when it comes to configuration.
Command-line arguments are difficult to remember and configure,
especially for an API with complicated option names. Rather than using
half-baked text files, implement a proper config solution.
Also add a progress bar when loading models on the command line.
Signed-off-by: kingbri <bdashore3@proton.me>
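A sketch of the config loading plus the progress bar, assuming pyyaml and tqdm; load_model_modules is a stand-in for the container's incremental loader:

    import time
    import yaml
    from tqdm import tqdm

    with open("config.yml", "r", encoding="utf-8") as config_file:
        config = yaml.safe_load(config_file) or {}

    model_config = config.get("model", {})

    def load_model_modules(cfg):
        # Stand-in loader that yields once per module so progress can be shown
        for _ in range(10):
            time.sleep(0.1)
            yield

    for _ in tqdm(load_model_modules(model_config), total=10, desc="Loading model"):
        pass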
Use command-line arguments to load an initial model if necessary. API
routes are broken, but we should be using the container from now on as
the primary interface to the exllama2 library.
Also, these args should be turned into a YAML configuration file in the
future.
Signed-off-by: kingbri <bdashore3@proton.me>