jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
AlpinDale	fa47f51f85	feat: workflows for formatting/linting (#35) * add github workflows for pylint and yapf * yapf * docstrings for auth * fix auth.py * fix generators.py * fix gen_logging.py * fix main.py * fix model.py * fix templating.py * fix utils.py * update formatting.sh to include subdirs for pylint * fix model_test.py * fix wheel_test.py * rename utils to utils_oai * fix OAI/utils_oai.py * fix completion.py * fix token.py * fix lora.py * fix common.py * add pylintrc and fix model.py * finish up pylint * fix attribute error * main.py formatting * add formatting batch script * Main: Remove unnecessary global Linter suggestion. Signed-off-by: kingbri <bdashore3@proton.me> * switch to ruff * Formatting + Linting: Add ruff.toml Signed-off-by: kingbri <bdashore3@proton.me> * Formatting + Linting: Switch scripts to use ruff Also remove the file and recent file change functions from both scripts. Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Format and lint Signed-off-by: kingbri <bdashore3@proton.me> * Scripts + Workflows: Format Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Remove pylint flags We use ruff now Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Format Signed-off-by: kingbri <bdashore3@proton.me> * Formatting: Line length is 88 Use the same value as Black. Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Format Update to new line length rules. Signed-off-by: kingbri <bdashore3@proton.me> --------- Authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com> Co-authored-by: kingbri <bdashore3@proton.me>	2023-12-22 16:20:35 +00:00
kingbri	a14abfe21c	Templates: Support bos_token and eos_token fields These are commonly seen in huggingface provided chat templates and aren't that difficult to add in. For feature parity, honor the add_bos_token and ban_eos_token parameters when constructing the prompt. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-22 10:33:11 -05:00
kingbri	8fa764bfbe	Auth: Add option to disable authentication This creates a massive security hole, but it's gated behind a flag for users who only use localhost. A warning will pop up when users disable authentication. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-21 23:40:16 -05:00
kingbri	99a798e117	API: Add auth enforcement to draft list This didn't have an API key gate. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-21 23:14:04 -05:00
kingbri	1a8afcb6ad	Generator: Fix semaphore scheduling Non-streaming tasks were not regulated by the semaphore, causing these tasks to interfere with streaming generations. Add helper functions to take in both sync and async functions for callbacks and sequential blocking with the semaphore. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-21 21:39:45 -05:00
kingbri	c9e43e51aa	API: Add route for draft model list Does the same thing as model list except with draft models. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 23:45:53 -05:00
kingbri	da69ad8cd3	Requirements: Pin versions for some dependencies Pydantic and Jinja2 need pinned versions. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 21:48:04 -05:00
kingbri	1fd38c61de	API: Remove model check dependency for lora list This isn't needed for listing stuff. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 21:35:29 -05:00
kingbri	de9a19b5d3	Templating: Add generation prompt appending Append generation prompts if given the flag on an OAI chat completion request. This appends the "assistant" message to the instruct prompt. Defaults to true since this is intended behavior. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	51ca1ff396	Tree: Switch to Pydantic 2 Pydantic 2 has more modern methods and stability compared to Pydantic 1 Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	f631dd6ff7	Templates: Switch to Jinja2 Jinja2 is a lightweight template parser that's used in Transformers for parsing chat completions. It's much more efficient than Fastchat and can be imported as part of requirements. Also allows for unblocking Pydantic's version. Users now have to provide their own template if needed. A separate repo may be usable for common prompt template storage. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	1a331afe3a	OAI: Add cache_mode parameter to model Mistakenly forgot that the user can choose what cache mode to use when loading a model. Also add when fetching model info. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-16 02:47:50 -05:00
kingbri	eb8ccb9783	Tree: Fix linter issues Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-12 23:58:19 -05:00
kingbri	083df7d585	Tree: Add generation logging support Generations can be logged in the console along with sampling parameters if the user enables it in config. Metrics are always logged at the end of each prompt. In addition, the model endpoint tells the user if they're being logged or not for transparency purposes. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-12 23:43:35 -05:00
kingbri	db87efde4a	OAI: Add ability to specify fastchat prompt template Sometimes fastchat may not be able to detect the prompt template from the model path. Therefore, add the ability to set it in config.yml or via the request object itself. Also send the provided prompt template on model info request. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 15:43:58 -05:00
kingbri	9f195af5ad	Main: Fix function calls Some function names were declared twice. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 13:28:21 -05:00
kingbri	fd9f3eac87	Model: Add params to current model endpoint Grabs the current model rope params, max seq len, and the draft model if applicable. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 00:40:56 -05:00
kingbri	0f4290f05c	Model: Format Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-09 22:48:42 -05:00
kingbri	5ae2a91c04	Tree: Use unwrap and coalesce for optional handling Python doesn't have proper handling of optionals. The only way to handle them is checking via an if statement if the value is None or by using the "or" keyword to unwrap optionals. Previously, I used the "or" method to unwrap, but this caused issues due to falsy values falling back to the default. This is especially the case with booleans, where "False" changed to "True". Instead, add two new functions: unwrap and coalesce. Both serve to properly implement a functional way of "None" coalescing. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-09 21:52:17 -05:00
DocShotgun	7380a3b79a	Implement lora support (#24) * Model: Implement basic lora support * Add ability to load loras from config on launch * Supports loading multiple loras and lora scaling * Add function to unload loras * Colab: Update for basic lora support * Model: Test vram alloc after lora load, add docs * Git: Add loras folder to .gitignore * API: Add basic lora-related endpoints * Add /loras/ endpoint for querying available loras * Add /model/lora endpoint for querying currently loaded loras * Add /model/lora/load endpoint for loading loras * Add /model/lora/unload endpoint for unloading loras * Move lora config-checking logic to main.py for better compat with API endpoints * Revert bad CRLF line ending changes * API: Add basic lora-related endpoints (fixed) * Add /loras/ endpoint for querying available loras * Add /model/lora endpoint for querying currently loaded loras * Add /model/lora/load endpoint for loading loras * Add /model/lora/unload endpoint for unloading loras * Move lora config-checking logic to main.py for better compat with API endpoints * Model: Unload loras first when unloading model * API + Models: Cleanup lora endpoints and functions Condenses down endpoint and model load code. Also makes the routes behave the same way as model routes to help not confuse the end user. Signed-off-by: kingbri <bdashore3@proton.me> * Loras: Optimize load endpoint Return successes and failures along with consolidating the request to the rewritten load_loras function. Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: kingbri <bdashore3@proton.me> Co-authored-by: DocShotgun <126566557+DocShotgun@users.noreply.github.com>	2023-12-08 23:38:08 -05:00
kingbri	f8e9e22c43	API: Fix model load endpoint with draft Draft wasn't being parsed correctly with the new changes which removed the draft_enabled bool. There's still some more work to be done with returning exceptions. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-06 18:05:55 -05:00
kingbri	8ba3bfa6b3	API: Fix load exception handling Models do not fully unload if an exception is caught in load. Therefore, leave it to the client to unload on cancel. Also add handlers in the event a SSE stream is cancelled. These packets can't be sent back to the client since the client has severed the connection, so print them in terminal. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-05 00:23:15 -05:00
kingbri	7c92968558	API: Fix mistaken debug statement Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-04 18:07:12 -05:00
kingbri	5e54911cc8	API: Fix semaphore handling and chat completion errors Chat completions previously always yielded a final packet to say that a generation finished. However, this caused errors that a yield was executed after GeneratorExit. This is correctly stated because python's garbage collector can't clean up the generator after exiting due to the finally block executing. In addition, SSE endpoints close off the connection, so the finish packet can only be yielded when the response has completed, so ignore yield on exception. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-04 15:51:25 -05:00
kingbri	ed6c962aad	API: Fix sequential requests FastAPI is kinda weird with queueing. If an await is used within an async def, requests aren't executed sequentially. Get the sequential requests back by using a semaphore to limit concurrent execution from generator functions. Also scaffold the framework to move generator functions to their own file. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-03 22:54:34 -05:00
kingbri	ae69b18583	API: Use FastAPI streaming instead of sse_starlette sse_starlette kept firing a ping response if it was taking too long to set an event. Rather than using a hacky workaround, switch to FastAPI's inbuilt streaming response and construct SSE requests with a utility function. This helps the API become more robust and removes an extra requirement. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-01 01:54:35 -05:00
kingbri	6493b1d2aa	OAI: Add ability to send dummy models Some APIs require an OAI model to be sent against the models endpoint. Fix this by adding a GPT 3.5 turbo entry as first in the list to cover as many APIs as possible. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-01 00:27:28 -05:00
kingbri	aef411bed5	OAI: Fix chat completion streaming Chat completions require a finish reason to be provided in the OAI spec once the streaming is completed. This is different from a non-streaming chat completion response. Also fix some errors that were raised from the endpoint. References #15 Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-01 00:14:24 -05:00
kingbri	e703c716ee	Merge branch 'main' of https://github.com/ziadloo/tabbyAPI into ziadloo-main	2023-11-30 01:01:48 -05:00
kingbri	56f9b1d1a8	API: Add generator error handling If the generator errors, there's no proper handling to send an error packet and close the connection. This is especially important for unloading models if the load fails at any stage to reclaim a user's VRAM. Raising an exception caused the model_container object to lock and not get freed by the GC. It made sense to propagate SSE errors across all generator functions rather than relying on abort signals. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-30 00:37:48 -05:00
kingbri	2bc3da0155	YAML: Force all files to open with utf8 The default encoding method when opening files on Windows is cp1252 which doesn't support all unicode and can cause issues. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-29 22:04:29 -05:00
Mehran Ziadloo	b0c42d0f05	Leveraging local variables	2023-11-27 20:56:56 -08:00
Mehran Ziadloo	ead503c75b	Adding token usage support	2023-11-27 20:05:05 -08:00
kingbri	d929e0c826	API: Fix error points and exceptions On /v1/model/load, some internal server errors weren't being sent, so migrate directory checking out and also add a check to make sure the proposed model path exists. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-25 00:27:02 -05:00
kingbri	d47c39da54	API: Don't include draft directory in response The draft directory should be returned for a draft model request (TBD). Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-23 00:07:56 -05:00
kingbri	f47919b1d3	API: Add draft model support Models can be loaded with a child object called "draft" in the POST request. Again, models need to be located within the draft model dir to get loaded. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-19 00:32:25 -05:00
kingbri	27ebec3b35	Model: Add speculative decoding support via config Speculative decoding makes use of draft models that ingest the prompt before forwarding it to the main model. Add options in the config to support this. API options will occur in a different commit. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-18 01:42:20 -05:00
kingbri	4669e49ff0	API: Fix errors with token endpoint Handle None cases if the provided text/token lists are empty. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-17 01:39:06 -05:00
kingbri	021981fce0	API: Re-add depends endpoints Mistakenly removed API key authentication for the models endpoints in testing. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-17 00:50:42 -05:00
kingbri	ac4e9c2277	API: Add CORS support Tell CORS to go fly a kite. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 22:19:47 -05:00
kingbri	08a183540b	Config: Add warning on exceptions and clarify parameters Due to how YAML works, double quotes are bad. Specify a linter in the top of the config_sample file. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 22:19:47 -05:00
kingbri	282b5b2931	API: Fix responses and some params Responses were not being properly sent as JSON. Only run pydantic's JSON function on stream responses. FastAPI does the rest with static responses. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 17:11:55 -05:00
kingbri	d8d61fa19b	API: Add fallback if model isn't loaded Most endpoints require the model to be loaded, so add a depends. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 12:20:35 -05:00
kingbri	60eb076b43	Tree: Basic formatting and comments Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 11:48:40 -05:00
kingbri	5defb1b0b4	Config: Fix errors when stuff doesn't exist Add safe fallbacks if any part of the config tree doesn't exist. This prevents random internal server errors from showing up. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 11:41:03 -05:00
kingbri	5e8419ec0c	OAI: Add chat completions endpoint Chat completions is the endpoint that will be used by OAI in the future. Makes sense to support it even though the completions endpoint will be used more often. Also unify common parameters between the chat completion and completion requests since they're very similar. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 01:06:07 -05:00
kingbri	1f444c8fb7	Requirements: Add fastchat and override pydantic Use an older version of pydantic to stay compatible Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-15 01:00:08 -05:00
kingbri	d0b6b11068	OAI: Make freq and presence pen floats Also rename the completions typing file. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-15 00:55:15 -05:00
kingbri	126afdfdc2	Model: Fix gpu split params GPU split auto is a bool and GPU split is an array of integers for GBs to allocate per GPU. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-15 00:55:15 -05:00
kingbri	8fea5391a8	Api: Add token endpoints Support for encoding and decoding with various parameters. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-15 00:55:15 -05:00
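
Commit 5ae2a91c04 above describes why the "or" keyword is a poor way to unwrap optionals: a falsy value such as False falls back to the default. The sketch below shows one way the two helpers it names, unwrap and coalesce, could look. Only the function names come from the commit message; the signatures and bodies here are assumptions for illustration, not the project's actual code.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def unwrap(wrapped: Optional[T], default: Optional[T] = None) -> Optional[T]:
    """Return `wrapped` unless it is None, in which case return `default`.

    Hypothetical signature: the commit message names the function but does
    not show its parameters.
    """
    return default if wrapped is None else wrapped


def coalesce(*args):
    """Return the first argument that is not None, or None if all are None."""
    return next((arg for arg in args if arg is not None), None)


# The pitfall described in the commit: `or` treats False as "missing".
with_or = False or True              # falls back to the default
with_unwrap = unwrap(False, True)    # keeps the explicit False
```

With these definitions an explicit False survives unwrap, while None still falls back to the default.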


59 commits
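
Commits ed6c962aad and 1a8afcb6ad in the log describe using a semaphore so that streaming and non-streaming generations run one at a time, with helpers that accept both sync and async callbacks. A minimal sketch of that idea with asyncio follows. The helper names generate_with_semaphore and call_with_semaphore, and the demo itself, are hypothetical and not taken from the repository.

```python
import asyncio


async def generate_with_semaphore(semaphore, generator_factory):
    """Hold the semaphore for the whole lifetime of an async generator."""
    async with semaphore:
        async for item in generator_factory():
            yield item


async def call_with_semaphore(semaphore, callback):
    """Run a sync or async callback while holding the semaphore."""
    async with semaphore:
        result = callback()
        if asyncio.iscoroutine(result):
            result = await result
        return result


async def demo():
    semaphore = asyncio.Semaphore(1)  # one generation at a time
    events = []

    async def stream(name):
        async def chunks():
            for i in range(2):
                events.append(f"{name}-{i}")
                await asyncio.sleep(0)  # give other tasks a chance to run
                yield i

        async for _ in generate_with_semaphore(semaphore, chunks):
            pass

    async def one_shot():
        # A non-streaming task goes through the same semaphore.
        return await call_with_semaphore(semaphore, lambda: events.append("sync"))

    await asyncio.gather(stream("a"), stream("b"), one_shot())
    return events


events = asyncio.run(demo())
```

Because each stream holds the semaphore until its generator is exhausted, the chunks of one stream are never interleaved with those of another, and the non-streaming call waits its turn.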