jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
Brian Dashore	1bf062559d	Merge pull request #158 from AlpinDale/embeddings feat: add embeddings support via Infinity-emb	2024-07-31 20:33:12 -04:00
kingbri	dc3dcc9c0d	Embeddings: Update config, args, and parameter names Use embeddings_device as the parameter for device to remove ambiguity. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:32:26 -04:00
kingbri	bfa011e0ce	Embeddings: Add model management Embedding models are managed on a separate backend, but are run in parallel with the model itself. Therefore, manage this in a separate container with separate routes. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:19:27 -04:00
kingbri	f13d0fb8b3	Embeddings: Add model load checks Same as the normal model container. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:17:36 -04:00
kingbri	fbf1455db1	Embeddings: Migrate and organize Infinity Use Infinity as a separate backend and handle the model within the common module. This separates out the embeddings model from the endpoint which allows for model loading/unloading in core. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:00:23 -04:00
kingbri	ac1afcc588	Embeddings: Use response classes instead of dicts Follows the existing code style. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-29 14:15:40 -04:00
kingbri	3f21d9ef96	Embeddings: Switch to Infinity Infinity-emb is an async batching engine for embeddings. This is preferable to sentence-transformers since it handles scalable use cases without the need for external thread intervention. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-29 13:42:03 -04:00
kingbri	c9a5d2c363	OAI: Refactor embeddings Move files and rewrite routes to adhere to Tabby's code style. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 14:10:51 -04:00
kingbri	7b8b3fe23d	Kobold: Fix max length type Was mistakenly a string instead of an integer. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 23:00:26 -04:00
kingbri	e3226ed930	Kobold: Add untracked file Model types weren't added. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:57:55 -04:00
kingbri	3038f668e8	Kobold: Add extra routes for horde compatibility Needed to connect to horde. Also do some reordering to clean the router file up. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:55:54 -04:00
kingbri	2773517a16	API: Add setup function to routers This helps prepare the router before exposing it to the parent app. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:24:33 -04:00
Brian Dashore	6365427d38	Merge pull request #155 from Vhallo/main Simple Typo Fix	2024-07-26 21:35:50 -04:00
kingbri	884b6f5ecd	API: Add log options for initialization Make each API log their respective URLs to help inform users. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 21:32:05 -04:00
kingbri	e8fc13a1f6	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:33:04 -04:00
kingbri	ea80b62e30	Sampling: Reorder aliased params and add kobold aliases Also add dynatemp range which is an alternative way of calculating min and max temp. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:32:33 -04:00
kingbri	7522b1447b	Model: Add support for HuggingFace config and bad_words_ids This is necessary for Kobold's API. Current models use bad_words_ids in generation_config.json, but for some reason, they're also present in the model's config.json. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:23:22 -04:00
kingbri	545e26608f	Kobold: Move params to aliases Some of the parameters the API provides are aliases for their OAI equivalents. It makes more sense to move them to the common file. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 16:46:54 -04:00
kingbri	b7cb6f0b91	API: Add KoboldAI server Used for interacting with applications that use KoboldAI's API such as horde. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 16:37:30 -04:00
AlpinDale	5adfab1cbd	ruff: formatting	2024-07-26 02:53:14 +00:00
AlpinDale	f20cd330ef	feat: add embeddings support via sentence-transformers	2024-07-26 02:45:07 +00:00
kingbri	5c082b7e8c	Async: Add option to use Uvloop/Winloop These are faster event loops for asyncio which should improve overall performance. Gate these under an experimental flag for now to stress test these loops. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 18:59:20 -04:00
kingbri	71de3060bb	Downloader: Make timeout configurable Add an API parameter to set the timeout in seconds. Keep it to None by default for uninterrupted downloads. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 21:42:38 -04:00
Vhallo	b2064bbfb4	Typo fix in completion.py	2024-07-23 23:49:43 +02:00
Vhallo	88e4b108b4	Typo fix in chat_completion.py	2024-07-23 23:48:50 +02:00
kingbri	3e8ffebdd3	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 14:32:50 -04:00
kingbri	300f034233	API: Add config option to select servers Always enable the core endpoints and allow servers to be selected as needed. Use the OAI server by default. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 14:27:42 -04:00
kingbri	9ad69e8ab6	API: Migrate universal routes to core Place OAI specific routes in the appropriate folder. This is in preparation for adding new API servers that can be optionally enabled. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 14:08:48 -04:00
kingbri	64c2cc85c9	OAI: Migrate model depends into proper file Use amongst multiple routers. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 13:59:56 -04:00
kingbri	d1706fb067	OAI: Remove double logging if request is cancelled Uvicorn can log in both the request disconnect handler and the CancelledError. However, these sometimes don't work and both need to be checked. But, don't log twice if one works. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:48:59 -04:00
kingbri	3826815edb	API: Add request logging Log all the parts of a request if the config flag is set. The logged fields are all server side anyways, so nothing is being exposed to clients. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:40:00 -04:00
kingbri	ad4d17bca2	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 12:24:34 -04:00
kingbri	0eedc8ca14	API: Switch from request ID middleware to depends Middleware runs on both the request and response. Therefore, streaming responses had increased latency when processing tasks and sending data to the client which resulted in erratic streaming behavior. Use a depends to add request IDs since it only executes when the request is run rather than expecting the response to be sent as well. For the future, it would be best to think about limiting the time between each tick of chunk data to be safe. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 12:19:46 -04:00
kingbri	cae94b920c	API: Add ability to use request IDs Identify which request is being processed to help users disambiguate which logs correspond to which request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-21 21:01:05 -04:00
kingbri	38185a1ff4	Auth: Fix key check coalesce Prefer the auth-specific headers before the generic authorization header. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-19 10:08:57 -04:00
kingbri	c1b61441f4	OAI: Fix usage chunk return Place the logic into their proper utility functions and cleanup the code with formatting. Also, OAI's docs specify that a [DONE] return is needed when everything is finished. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-12 14:37:20 -04:00
Volodymyr Kuznetsov	b149d3398d	OAI: support stream_options argument	2024-07-11 18:37:50 -07:00
kingbri	9fc3fc4c54	OAI: Amend comments Clarify what the user can and can't see. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	1f46a1130c	OAI: Restrict list permissions for API keys API keys are not allowed to view all the admin's models, templates, draft models, loras, etc. Basically anything that can be viewed on the filesystem outside of anything that's currently loaded is not allowed to be returned unless an admin key is present. This change helps preserve user privacy while not erroring out on list endpoints that the OAI spec requires. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	dfb4c51d5f	OAI: Fix function idioms Make functions mean the same thing to avoid confusion. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	b9a58ff01b	Auth: Make key permission check work on Requests Pass a request and internally unwrap the headers. In addition, allow X-admin-key to get checked in an API key request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:49 -04:00
Colin Kealty	279e900ea5	Add on the fly model loading to requests	2024-07-11 10:52:10 -04:00
kingbri	5c293499bd	OAI: Reorder functions Reordering routes changes the order of appearance on documentation. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 15:27:08 -04:00
kingbri	521d21b9f2	OAI: Add return types for docs Adding return types allows for responses to get included in the autogenerated docs. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 15:23:41 -04:00
kingbri	6613e38436	Main: Make openapi export store locally This runs faster than always making a syscall to check if the env var is set. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 14:54:06 -04:00
kingbri	933268f7e2	API: Integrate OpenAPI export script Move OpenAPI export as an env var within the main function. This allows for easy export by running main. In addition, an env variable provides global and explicit state to disable conditional wheel imports (ex. Exl2 and torch) which caused errors at first. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 12:34:32 -04:00
kingbri	5e82b7eb69	API: Add standalone method to fetch OpenAPI docs Generates and stores an export of the openapi.json file for use in static websites. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-07 21:35:52 -04:00
kingbri	27d2d5f3d2	Config + Model: Allow for default fallbacks from config for model loads Previously, the parameters under the "model" block in config.yml only handled the loading of a model on startup. This meant that any subsequent API request required each parameter to be filled out or use a sane default (usually defaults to the model's config.json). However, there are cases where admins may want an argument from the config to apply if the parameter isn't provided in the request body. To help alleviate this, add a mechanism that works like sampler overrides where users can specify a flag that acts as a fallback. Therefore, this change both preserves the source of truth of what parameters the admin is loading and adds some convenience for users that want customizable defaults for their requests. This behavior may change in the future, but I think it solves the issue for now. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-06 17:50:58 -04:00
DocShotgun	156b74f3f0	Revision to paged attention checks (#133) * Model: Clean up paged attention checks * Model: Move cache_size checks after paged attn checks Cache size is only relevant in paged mode * Model: Fix no_flash_attention * Model: Remove no_flash_attention Ability to use flash attention is auto-detected, so this flag is unneeded. Uninstall flash attention to disable it on supported hardware.	2024-06-09 17:28:11 +02:00
DocShotgun	55d979b7a5	Update dependencies, support Python 3.12, update for exl2 0.1.5 (#134) * Dependencies: Add wheels for Python 3.12 * Model: Switch fp8 cache to Q8 cache * Model: Add ability to set draft model cache mode * Dependencies: Bump exllamav2 to 0.1.5 * Model: Support Q6 cache * Config: Add Q6 cache and draft_cache_mode to config sample	2024-06-09 17:27:39 +02:00

200 commits