jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	0bcb4e4a7d	Model: Attach request ID to logs. If multiple logs come in at once, track which log corresponds to which request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 00:25:54 -04:00
kingbri	9390d362dd	Model: Log generation params and metrics after the prompt/response. A user's prompt and response can be large in the console, so always log the smaller payloads (e.g. gen params + metrics) after the large chunks. However, it's recommended to keep prompt logging off anyway since it results in console spam. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 00:19:21 -04:00
Brian Dashore	1bf062559d	Merge pull request #158 from AlpinDale/embeddings. feat: add embeddings support via Infinity-emb	2024-07-31 20:33:12 -04:00
kingbri	f111052e39	Dependencies: Use hosted pip index instead of GitHub. Installing directly from GitHub causes pip's HTTP cache to not recognize that the correct version of a package is already installed, which triggers a redownload. The Start.bat script updates dependencies automatically to keep users on the latest package versions for security reasons. A simple pip cache website alleviates this problem and lets pip find the cached wheels when invoked with an upgrade argument. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 20:46:37 -04:00
kingbri	46304ce875	Model: Properly pass in max_batch_size from config. The override wasn't being passed in before. Also, the default is now None since Exl2 can automatically calculate the max batch size. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 18:42:25 -04:00
kingbri	dc3dcc9c0d	Embeddings: Update config, args, and parameter names. Use embeddings_device as the device parameter name to remove ambiguity. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:32:26 -04:00
kingbri	bfa011e0ce	Embeddings: Add model management. Embedding models are managed on a separate backend but run in parallel with the main model. Therefore, manage them in a separate container with separate routes. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:19:27 -04:00
kingbri	f13d0fb8b3	Embeddings: Add model load checks. These mirror the checks in the normal model container. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:17:36 -04:00
kingbri	01c7702859	Signal: Fix async signal handling. Run async unload functions before exiting the program. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:11:05 -04:00
kingbri	fbf1455db1	Embeddings: Migrate and organize Infinity. Use Infinity as a separate backend and handle the model within the common module. This separates the embeddings model from the endpoint, which allows the model to be loaded and unloaded in core. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:00:23 -04:00
kingbri	ac1afcc588	Embeddings: Use response classes instead of dicts. Follows the existing code style. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-29 14:15:40 -04:00
kingbri	3f21d9ef96	Embeddings: Switch to Infinity. Infinity-emb is an async batching engine for embeddings. It is preferable to sentence-transformers since it handles scalable use cases without the need for external thread intervention. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-29 13:42:03 -04:00
kingbri	c9a5d2c363	OAI: Refactor embeddings. Move files and rewrite routes to adhere to Tabby's code style. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 14:10:51 -04:00
kingbri	d85414738d	Dependencies: Update Flash Attention 2. Use v2.6.3 with torch 2.3 wheels. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 13:50:15 -04:00
kingbri	c79e0832d5	Revert "Dependencies: Update pytorch and flash_attention". This reverts commit `f47d96790c`. See https://github.com/pytorch/pytorch/issues/131662 for more information. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 13:49:04 -04:00
kingbri	7b8b3fe23d	Kobold: Fix max length type. It was mistakenly a string instead of an integer. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 23:00:26 -04:00
kingbri	e3226ed930	Kobold: Add untracked file. Model types weren't added. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:57:55 -04:00
kingbri	3038f668e8	Kobold: Add extra routes for horde compatibility. These are needed to connect to horde. Also do some reordering to clean up the router file. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:55:54 -04:00
kingbri	2773517a16	API: Add setup function to routers. This helps prepare the router before exposing it to the parent app. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:24:33 -04:00
Brian Dashore	6365427d38	Merge pull request #155 from Vhallo/main. Simple typo fix	2024-07-26 21:35:50 -04:00
kingbri	884b6f5ecd	API: Add log options for initialization. Make each API log its respective URLs to help inform users. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 21:32:05 -04:00
kingbri	e8fc13a1f6	Tree: Format. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:33:04 -04:00
kingbri	ea80b62e30	Sampling: Reorder aliased params and add kobold aliases. Also add dynatemp range, which is an alternative way of calculating min and max temp. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:32:33 -04:00
kingbri	7522b1447b	Model: Add support for HuggingFace config and bad_words_ids. This is necessary for Kobold's API. Current models use bad_words_ids in generation_config.json, but for some reason, they're also present in the model's config.json. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:23:22 -04:00
kingbri	545e26608f	Kobold: Move params to aliases. Some of the parameters the API provides are aliases for their OAI equivalents, so it makes more sense to move them to the common file. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 16:46:54 -04:00
kingbri	b7cb6f0b91	API: Add KoboldAI server. Used for interacting with applications that use KoboldAI's API, such as horde. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 16:37:30 -04:00
kingbri	4e808cbed7	Auth: Fix disable auth when checking for key permissions. Since authentication is disabled, remove the limited permissions for requests. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 15:04:29 -04:00
kingbri	f47d96790c	Dependencies: Update pytorch and flash_attention. Update to v2.4.0 and v2.6.3 respectively, and use torch 2.4 wheels. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-25 23:39:52 -04:00
AlpinDale	5adfab1cbd	ruff: formatting	2024-07-26 02:53:14 +00:00
AlpinDale	765d3593b3	remove submodule	2024-07-26 02:52:18 +00:00
AlpinDale	f20cd330ef	feat: add embeddings support via sentence-transformers	2024-07-26 02:45:07 +00:00
kingbri	a1c3f6cc1c	Dependencies: Update ExllamaV2 to v0.1.8. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 22:00:43 -04:00
kingbri	27f9559d83	Dependencies: Switch to fastapi-slim. Reduces dependency size since the full fastapi package isn't required. Add httptools since it makes requests faster and was previously installed with fastapi. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 21:59:56 -04:00
kingbri	42bc4adcfb	Config: Add option to set priority to realtime. Realtime process priority directs system resources toward tabby's processes. Running as administrator gives realtime priority, while running as a normal user sets high priority. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 21:50:06 -04:00
kingbri	5c082b7e8c	Async: Add option to use Uvloop/Winloop. These are faster event loops for asyncio, which should improve overall performance. Gate them under an experimental flag for now to stress test these loops. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 18:59:20 -04:00
kingbri	71de3060bb	Downloader: Make timeout configurable. Add an API parameter to set the timeout in seconds. Keep it None by default for uninterrupted downloads. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 21:42:38 -04:00
kingbri	8c02fe9771	Downloader: Disable timeout. This prevents TimeoutErrors from showing up. However, a longer timeout may be necessary since this is in the API. Turning it off for now will help resolve immediate errors. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 21:38:46 -04:00
Vhallo	b2064bbfb4	Typo fix in completion.py	2024-07-23 23:49:43 +02:00
Vhallo	88e4b108b4	Typo fix in chat_completion.py	2024-07-23 23:48:50 +02:00
kingbri	3e8ffebdd3	Tree: Format. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 14:32:50 -04:00
kingbri	300f034233	API: Add config option to select servers. Always enable the core endpoints and allow servers to be selected as needed. Use the OAI server by default. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 14:27:42 -04:00
kingbri	9ad69e8ab6	API: Migrate universal routes to core. Place OAI-specific routes in the appropriate folder. This is in preparation for adding new API servers that can be optionally enabled. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 14:08:48 -04:00
kingbri	64c2cc85c9	OAI: Migrate model depends into proper file. These are used across multiple routers. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 13:59:56 -04:00
kingbri	d1706fb067	OAI: Remove double logging if request is cancelled. Uvicorn can log in both the request disconnect handler and the CancelledError handler. However, these sometimes don't work, so both need to be checked. Don't log twice if one of them works. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:48:59 -04:00
kingbri	14dfaf600a	Args: Add request logging. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:41:42 -04:00
kingbri	3826815edb	API: Add request logging. Log all the parts of a request if the config flag is set. The logged fields are all server side anyway, so nothing is exposed to clients. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:40:00 -04:00
kingbri	522999ebb4	Config: Change from gen_logging to logging. This more accurately reflects the sections of config.yml. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:15:16 -04:00
kingbri	191600a150	Revert "Model: Skip empty token chunks". This reverts commit `21516bd7b5`. The reverted change skips EOS, and implementing it the proper way seems more costly than necessary. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 18:34:00 -04:00
kingbri	15f891b277	Args: Update to latest config.yml. Fix the order of params to follow the same flow as config.yml. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 16:26:41 -04:00
kingbri	ad4d17bca2	Tree: Format. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 12:24:34 -04:00

612 commits