jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	c9e43e51aa	API: Add route for draft model list Does the same thing as model list except with draft models. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 23:45:53 -05:00
kingbri	f631dd6ff7	Templates: Switch to Jinja2 Jinja2 is a lightweight template parser that's used in Transformers for parsing chat completions. It's much more efficient than Fastchat and can be imported as part of requirements. Also allows for unblocking Pydantic's version. Users now have to provide their own template if needed. A separate repo may be usable for common prompt template storage. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	db87efde4a	OAI: Add ability to specify fastchat prompt template Sometimes fastchat may not be able to detect the prompt template from the model path. Therefore, add the ability to set it in config.yml or via the request object itself. Also send the provided prompt template on model info request. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 15:43:58 -05:00
kingbri	5ae2a91c04	Tree: Use unwrap and coalesce for optional handling Python doesn't have proper handling of optionals. The only way to handle them is checking via an if statement if the value is None or by using the "or" keyword to unwrap optionals. Previously, I used the "or" method to unwrap, but this caused issues due to falsy values falling back to the default. This is especially the case with booleans, where "False" changed to "True". Instead, add two new functions: unwrap and coalesce. Both implement a functional form of "None" coalescing. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-09 21:52:17 -05:00
DocShotgun	7380a3b79a	Implement lora support (#24) * Model: Implement basic lora support * Add ability to load loras from config on launch * Supports loading multiple loras and lora scaling * Add function to unload loras * Colab: Update for basic lora support * Model: Test vram alloc after lora load, add docs * Git: Add loras folder to .gitignore * API: Add basic lora-related endpoints * Add /loras/ endpoint for querying available loras * Add /model/lora endpoint for querying currently loaded loras * Add /model/lora/load endpoint for loading loras * Add /model/lora/unload endpoint for unloading loras * Move lora config-checking logic to main.py for better compat with API endpoints * Revert bad CRLF line ending changes * API: Add basic lora-related endpoints (fixed) * Add /loras/ endpoint for querying available loras * Add /model/lora endpoint for querying currently loaded loras * Add /model/lora/load endpoint for loading loras * Add /model/lora/unload endpoint for unloading loras * Move lora config-checking logic to main.py for better compat with API endpoints * Model: Unload loras first when unloading model * API + Models: Cleanup lora endpoints and functions Condenses down endpoint and model load code. Also makes the routes behave the same way as model routes to help not confuse the end user. Signed-off-by: kingbri <bdashore3@proton.me> * Loras: Optimize load endpoint Return successes and failures along with consolidating the request to the rewritten load_loras function. Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: kingbri <bdashore3@proton.me> Co-authored-by: DocShotgun <126566557+DocShotgun@users.noreply.github.com>	2023-12-08 23:38:08 -05:00
kingbri	61f6e51fdb	OAI: Add separator style fallback Some models may return None for separator style with FastChat. Fall back to LLAMA2 if this is the case. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-01 23:30:19 -05:00
kingbri	aef411bed5	OAI: Fix chat completion streaming Chat completions require a finish reason to be provided in the OAI spec once the streaming is completed. This is different from a non-streaming chat completion response. Also fix some errors that were raised from the endpoint. References #15 Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-01 00:14:24 -05:00
Mehran Ziadloo	ead503c75b	Adding token usage support	2023-11-27 20:05:05 -08:00
kingbri	d47c39da54	API: Don't include draft directory in response The draft directory should be returned for a draft model request (TBD). Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-23 00:07:56 -05:00
kingbri	2248705c4a	Requirements: Don't force fastchat installation Fastchat requires a lot of dependencies such as transformers, peft, and accelerate which are heavy. This is not useful unless a user wants to add a shim for the chat completion endpoint. Instead, try importing fastchat and notify the console of the error. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 01:26:46 -05:00
kingbri	5e8419ec0c	OAI: Add chat completions endpoint Chat completions is the endpoint that will be used by OAI in the future. Makes sense to support it even though the completions endpoint will be used more often. Also unify common parameters between the chat completion and completion requests since they're very similar. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-16 01:06:07 -05:00
kingbri	d0b6b11068	OAI: Make freq and presence pen floats Also rename the completions typing file. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-15 00:55:15 -05:00
kingbri	4670a77c26	API: Don't use response_class This arg in routes caused many errors and isn't even needed for responses. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-14 22:09:26 -05:00
kingbri	b625bface9	OAI: Add API-based model loading/unloading and auth routes Models can be loaded and unloaded via the API. Also add authentication to use the API and for administrator tasks. Both types of authorization use different keys. Also fix the unload function to properly free all used vram. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-14 01:17:19 -05:00
kingbri	47343e2f1a	OAI: Add models support The models endpoint fetches all the models that OAI has to offer. However, since this is an OAI clone, just list the models inside the user's configured model directory instead. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-13 21:38:34 -05:00
kingbri	eee8b642bd	OAI: Implement completion API endpoint Add support for /v1/completions with the option to use streaming if needed. Also rewrite API endpoints to use async when possible since that improves request performance. Model container parameter names also needed rewrites, and fallback cases are now set to their disabled values. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-13 18:31:26 -05:00
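
Commit 2248705c4a describes making fastchat optional: try the import and report the failure on the console instead of requiring the package. A minimal sketch of that pattern follows. The helper name `try_import` and the message wording are hypothetical and are not taken from the repository.

```python
import importlib


def try_import(module_name: str):
    """Return the imported module, or None if it is not installed.

    Hypothetical helper: the commit only says the import is attempted and
    the console is notified on failure.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Notify the console rather than crashing at startup.
        print(f"Optional dependency '{module_name}' is unavailable: {exc}")
        return None


# fastchat is treated as optional; features that need it can check this flag.
fastchat = try_import("fastchat")
chat_completions_available = fastchat is not None
```

With this shape, the heavy dependencies named in the commit (transformers, peft, accelerate) are only pulled in by users who install fastchat themselves.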

16 commits
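
Commit 5ae2a91c04 names two helpers, unwrap and coalesce, but the log does not show their code. The sketch below is one plausible shape for them. The signatures are assumptions rather than the repository's actual definitions, and `use_cache` is a made-up variable used only to show the falsy-value pitfall the commit message describes.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def unwrap(wrapped: Optional[T], default: Optional[T] = None) -> Optional[T]:
    """Return `wrapped` unless it is None, in which case return `default`.

    Assumed signature; only the function name comes from the commit message.
    """
    return default if wrapped is None else wrapped


def coalesce(*args):
    """Return the first argument that is not None, or None if all are None.

    Assumed signature; only the function name comes from the commit message.
    """
    return next((arg for arg in args if arg is not None), None)


# The pitfall: `or` treats every falsy value as missing, not just None.
use_cache = False                  # hypothetical setting explicitly disabled
print(use_cache or True)           # True: the explicit False is overridden
print(unwrap(use_cache, True))     # False: only None falls back to the default
print(coalesce(None, 0, 5))        # 0: the first non-None value wins
```

Checking `is None` explicitly is what keeps `False`, `0`, and empty strings from being replaced by the default.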