jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	2a33ebbf29	Model: Bypass lock checks when shutting down Previously, when a SIGINT was emitted and a model load is running, the API didn't shut down until the load finished due to waitng for the lock. However, when shutting down, the lock doesn't matter since the process is being killed anyway. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 16:05:34 -04:00
Brian Dashore	65c16f2a7c	Merge pull request #161 from theroyallab/new-start-scripts Fix pip index bandwidth costs and update start scripts	2024-08-03 15:21:02 -04:00
kingbri	8703b23f89	Start: Make linux scripts executable Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 15:19:31 -04:00
kingbri	b795bfc7b2	Start: Split some prints up Newlines can be helpful at times. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 15:14:40 -04:00
kingbri	65e758e134	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 15:08:24 -04:00
kingbri	7ce46cc2da	Start: Rewrite start scripts Start scripts now don't update dependencies by default due to mishandling caches from pip. Also add dedicated update scripts and save options to a JSON file instead of a text one. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 13:03:24 -04:00
kingbri	e66d213aef	Revert "Dependencies: Use hosted pip index instead of Github" This reverts commit `f111052e39`. This was a bad idea since the netlify server has limited bandwidth. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 11:35:26 -04:00
kingbri	7bf2b07d4c	Signals: Exit on async cleanup The async signal exit function should be the internal for exiting the program. In addition, prevent the handler from being called twice by adding a boolean. May become an asyncio event later on. In addition, make sure to skip_wait when running model.unload. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-02 15:11:57 -04:00
kingbri	b124797949	Dependencies: Re-add sentence-transformers This is actually required for infinity to load a model. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-02 14:35:58 -04:00
kingbri	56619810bf	Dependencies: Switch sentence-transformers to infinity-emb Leftover before the transition. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-02 13:34:47 -04:00
kingbri	3e42211c3e	Config: Embeddings: Make embeddings_device a default when API loading When loading from the API, the fallback for embeddings_device will be the same as the config. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 13:59:49 -04:00
kingbri	54aeebaec1	API: Fix return of current embeddings model Return a ModelCard instead of a ModelList. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 13:43:31 -04:00
kingbri	0bcb4e4a7d	Model: Attach request ID to logs If multiple logs come in at once, track which log corresponds to which request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 00:25:54 -04:00
kingbri	9390d362dd	Model: Log generation params and metrics after the prompt/response A user's prompt and response can be large in the console. Therefore, always log the smaller payloads (ex. gen params + metrics) after the large chunks. However, it's recommended to keep prompt logging off anyways since it'll result in console spam. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 00:19:21 -04:00
Brian Dashore	1bf062559d	Merge pull request #158 from AlpinDale/embeddings feat: add embeddings support via Infinity-emb	2024-07-31 20:33:12 -04:00
kingbri	f111052e39	Dependencies: Use hosted pip index instead of Github Installing directly from github causes pip's HTTP cache to not recognize that the correct version of a package is already installed. This causes a redownload. When using the Start.bat script, it updates dependencies automatically to keep users on the latest versions of a package for security reasons. A simple pip cache website helps alleviate this problem and allows pip to find the cached wheels when invoked with an upgrade argument. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 20:46:37 -04:00
kingbri	46304ce875	Model: Properly pass in max_batch_size from config The override wasn't being passed in before. Also, the default is now none since Exl2 can automatically calculate the max batch size. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 18:42:25 -04:00
kingbri	dc3dcc9c0d	Embeddings: Update config, args, and parameter names Use embeddings_device as the parameter for device to remove ambiguity. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:32:26 -04:00
kingbri	bfa011e0ce	Embeddings: Add model management Embedding models are managed on a separate backend, but are run in parallel with the model itself. Therefore, manage this in a separate container with separate routes. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:19:27 -04:00
kingbri	f13d0fb8b3	Embeddings: Add model load checks Same as the normal model container. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:17:36 -04:00
kingbri	01c7702859	Signal: Fix async signal handling Run unload async functions before exiting the program. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:11:05 -04:00
kingbri	fbf1455db1	Embeddings: Migrate and organize Infinity Use Infinity as a separate backend and handle the model within the common module. This separates out the embeddings model from the endpoint which allows for model loading/unloading in core. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:00:23 -04:00
kingbri	ac1afcc588	Embeddings: Use response classes instead of dicts Follows the existing code style. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-29 14:15:40 -04:00
kingbri	3f21d9ef96	Embeddings: Switch to Infinity Infinity-emb is an async batching engine for embeddings. This is preferable to sentence-transformers since it handles scalable usecases without the need for external thread intervention. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-29 13:42:03 -04:00
kingbri	c9a5d2c363	OAI: Refactor embeddings Move files and rewrite routes to adhere to Tabby's code style. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 14:10:51 -04:00
kingbri	d85414738d	Dependencies: Update Flash Attention 2 v2.6.3 with torch 2.3 wheels. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 13:50:15 -04:00
kingbri	c79e0832d5	Revert "Dependencies: Update pytorch and flash_attention" This reverts commit `f47d96790c`. See https://github.com/pytorch/pytorch/issues/131662 for more information. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 13:49:04 -04:00
kingbri	7b8b3fe23d	Kobold: Fix max length type Was mistakenly a string instead of an integer. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 23:00:26 -04:00
kingbri	e3226ed930	Kobold: Add untracked file Model types weren't added. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:57:55 -04:00
kingbri	3038f668e8	Kobold: Add extra routes for horde compatability Needed to connect to horde. Also do some reordering to clean the router file up. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:55:54 -04:00
kingbri	2773517a16	API: Add setup function to routers This helps prepare the router before exposing it to the parent app. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:24:33 -04:00
Brian Dashore	6365427d38	Merge pull request #155 from Vhallo/main Simple Typo Fix	2024-07-26 21:35:50 -04:00
kingbri	884b6f5ecd	API: Add log options for initialization Make each API log their respective URLs to help inform users. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 21:32:05 -04:00
kingbri	e8fc13a1f6	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:33:04 -04:00
kingbri	ea80b62e30	Sampling: Reorder aliased params and add kobold aliases Also add dynatemp range which is an alternative way of calculating min and max temp. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:32:33 -04:00
kingbri	7522b1447b	Model: Add support for HuggingFace config and bad_words_ids This is necessary for Kobold's API. Current models use bad_words_ids in generation_config.json, but for some reason, they're also present in the model's config.json. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:23:22 -04:00
kingbri	545e26608f	Kobold: Move params to aliases Some of the parameters the API provides are aliases for their OAI equivalents. It makes more sense to move them to the common file. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 16:46:54 -04:00
kingbri	b7cb6f0b91	API: Add KoboldAI server Used for interacting with applications that use KoboldAI's API such as horde. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 16:37:30 -04:00
kingbri	4e808cbed7	Auth: Fix disable auth when checking for key permissions Since authentication is disabled, remove the limited permissions for requests. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 15:04:29 -04:00
kingbri	f47d96790c	Dependencies: Update pytorch and flash_attention v2.4.0 and v2.6.3 Also use torch 2.4 wheels. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-25 23:39:52 -04:00
AlpinDale	5adfab1cbd	ruff: formatting	2024-07-26 02:53:14 +00:00
AlpinDale	765d3593b3	remove submodule	2024-07-26 02:52:18 +00:00
AlpinDale	f20cd330ef	feat: add embeddings support via sentence-transformers	2024-07-26 02:45:07 +00:00
kingbri	a1c3f6cc1c	Dependencies: Update ExllamaV2 v0.1.8 Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 22:00:43 -04:00
kingbri	27f9559d83	Dependencies: Switch to fastapi-slim Reduces dependency size since the full fastapi package isn't required. Add httptools since it makes requests faster and it was installed with fastapi previously. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 21:59:56 -04:00
kingbri	42bc4adcfb	Config: Add option to set priority to realtime Realtime process priority assigns resources to point to tabby's processes. Running as administrator will give realtime priority while running as a normal user will set as high priority. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 21:50:06 -04:00
kingbri	5c082b7e8c	Async: Add option to use Uvloop/Winloop These are faster event loops for asyncio which should improve overall performance. Gate these under an experimental flag for now to stress test these loops. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 18:59:20 -04:00
kingbri	71de3060bb	Downloader: Make timeout configurable Add an API parameter to set the timeout in seconds. Keep it to None by default for uninterrupted downloads. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 21:42:38 -04:00
kingbri	8c02fe9771	Downloader: Disable timeout This prevents TimeoutErrors from showing up. However, a longer timeout may be necessary since this is in the API. Turning it off for now will help resolve immediate errors. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 21:38:46 -04:00
Vhallo	b2064bbfb4	Typo fix in completion.py	2024-07-23 23:49:43 +02:00

1 2 3 4 5 ...

624 commits