jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	8703b23f89	Start: Make linux scripts executable Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 15:19:31 -04:00
kingbri	b795bfc7b2	Start: Split some prints up Newlines can be helpful at times. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 15:14:40 -04:00
kingbri	65e758e134	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 15:08:24 -04:00
kingbri	7ce46cc2da	Start: Rewrite start scripts Start scripts now don't update dependencies by default due to mishandling caches from pip. Also add dedicated update scripts and save options to a JSON file instead of a text one. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 13:03:24 -04:00
kingbri	e66d213aef	Revert "Dependencies: Use hosted pip index instead of Github" This reverts commit `f111052e39`. This was a bad idea since the netlify server has limited bandwidth. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 11:35:26 -04:00
kingbri	7bf2b07d4c	Signals: Exit on async cleanup The async signal exit function should be the internal for exiting the program. In addition, prevent the handler from being called twice by adding a boolean. May become an asyncio event later on. In addition, make sure to skip_wait when running model.unload. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-02 15:11:57 -04:00
kingbri	b124797949	Dependencies: Re-add sentence-transformers This is actually required for infinity to load a model. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-02 14:35:58 -04:00
kingbri	56619810bf	Dependencies: Switch sentence-transformers to infinity-emb Leftover before the transition. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-02 13:34:47 -04:00
kingbri	3e42211c3e	Config: Embeddings: Make embeddings_device a default when API loading When loading from the API, the fallback for embeddings_device will be the same as the config. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 13:59:49 -04:00
kingbri	54aeebaec1	API: Fix return of current embeddings model Return a ModelCard instead of a ModelList. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 13:43:31 -04:00
kingbri	0bcb4e4a7d	Model: Attach request ID to logs If multiple logs come in at once, track which log corresponds to which request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 00:25:54 -04:00
kingbri	9390d362dd	Model: Log generation params and metrics after the prompt/response A user's prompt and response can be large in the console. Therefore, always log the smaller payloads (ex. gen params + metrics) after the large chunks. However, it's recommended to keep prompt logging off anyways since it'll result in console spam. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 00:19:21 -04:00
Brian Dashore	1bf062559d	Merge pull request #158 from AlpinDale/embeddings feat: add embeddings support via Infinity-emb	2024-07-31 20:33:12 -04:00
kingbri	f111052e39	Dependencies: Use hosted pip index instead of Github Installing directly from github causes pip's HTTP cache to not recognize that the correct version of a package is already installed. This causes a redownload. When using the Start.bat script, it updates dependencies automatically to keep users on the latest versions of a package for security reasons. A simple pip cache website helps alleviate this problem and allows pip to find the cached wheels when invoked with an upgrade argument. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 20:46:37 -04:00
kingbri	46304ce875	Model: Properly pass in max_batch_size from config The override wasn't being passed in before. Also, the default is now none since Exl2 can automatically calculate the max batch size. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 18:42:25 -04:00
kingbri	dc3dcc9c0d	Embeddings: Update config, args, and parameter names Use embeddings_device as the parameter for device to remove ambiguity. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:32:26 -04:00
kingbri	bfa011e0ce	Embeddings: Add model management Embedding models are managed on a separate backend, but are run in parallel with the model itself. Therefore, manage this in a separate container with separate routes. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:19:27 -04:00
kingbri	f13d0fb8b3	Embeddings: Add model load checks Same as the normal model container. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:17:36 -04:00
kingbri	01c7702859	Signal: Fix async signal handling Run unload async functions before exiting the program. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:11:05 -04:00
kingbri	fbf1455db1	Embeddings: Migrate and organize Infinity Use Infinity as a separate backend and handle the model within the common module. This separates out the embeddings model from the endpoint which allows for model loading/unloading in core. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:00:23 -04:00
kingbri	ac1afcc588	Embeddings: Use response classes instead of dicts Follows the existing code style. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-29 14:15:40 -04:00
kingbri	3f21d9ef96	Embeddings: Switch to Infinity Infinity-emb is an async batching engine for embeddings. This is preferable to sentence-transformers since it handles scalable use cases without the need for external thread intervention. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-29 13:42:03 -04:00
kingbri	c9a5d2c363	OAI: Refactor embeddings Move files and rewrite routes to adhere to Tabby's code style. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 14:10:51 -04:00
kingbri	d85414738d	Dependencies: Update Flash Attention 2 v2.6.3 with torch 2.3 wheels. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 13:50:15 -04:00
kingbri	c79e0832d5	Revert "Dependencies: Update pytorch and flash_attention" This reverts commit `f47d96790c`. See https://github.com/pytorch/pytorch/issues/131662 for more information. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 13:49:04 -04:00
kingbri	7b8b3fe23d	Kobold: Fix max length type Was mistakenly a string instead of an integer. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 23:00:26 -04:00
kingbri	e3226ed930	Kobold: Add untracked file Model types weren't added. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:57:55 -04:00
kingbri	3038f668e8	Kobold: Add extra routes for horde compatibility Needed to connect to horde. Also do some reordering to clean the router file up. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:55:54 -04:00
kingbri	2773517a16	API: Add setup function to routers This helps prepare the router before exposing it to the parent app. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:24:33 -04:00
Brian Dashore	6365427d38	Merge pull request #155 from Vhallo/main Simple Typo Fix	2024-07-26 21:35:50 -04:00
kingbri	884b6f5ecd	API: Add log options for initialization Make each API log their respective URLs to help inform users. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 21:32:05 -04:00
kingbri	e8fc13a1f6	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:33:04 -04:00
kingbri	ea80b62e30	Sampling: Reorder aliased params and add kobold aliases Also add dynatemp range which is an alternative way of calculating min and max temp. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:32:33 -04:00
kingbri	7522b1447b	Model: Add support for HuggingFace config and bad_words_ids This is necessary for Kobold's API. Current models use bad_words_ids in generation_config.json, but for some reason, they're also present in the model's config.json. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:23:22 -04:00
kingbri	545e26608f	Kobold: Move params to aliases Some of the parameters the API provides are aliases for their OAI equivalents. It makes more sense to move them to the common file. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 16:46:54 -04:00
kingbri	b7cb6f0b91	API: Add KoboldAI server Used for interacting with applications that use KoboldAI's API such as horde. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 16:37:30 -04:00
kingbri	4e808cbed7	Auth: Fix disable auth when checking for key permissions Since authentication is disabled, remove the limited permissions for requests. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 15:04:29 -04:00
kingbri	f47d96790c	Dependencies: Update pytorch and flash_attention v2.4.0 and v2.6.3 Also use torch 2.4 wheels. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-25 23:39:52 -04:00
AlpinDale	5adfab1cbd	ruff: formatting	2024-07-26 02:53:14 +00:00
AlpinDale	765d3593b3	remove submodule	2024-07-26 02:52:18 +00:00
AlpinDale	f20cd330ef	feat: add embeddings support via sentence-transformers	2024-07-26 02:45:07 +00:00
kingbri	a1c3f6cc1c	Dependencies: Update ExllamaV2 v0.1.8 Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 22:00:43 -04:00
kingbri	27f9559d83	Dependencies: Switch to fastapi-slim Reduces dependency size since the full fastapi package isn't required. Add httptools since it makes requests faster and it was installed with fastapi previously. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 21:59:56 -04:00
kingbri	42bc4adcfb	Config: Add option to set priority to realtime Realtime process priority assigns resources to point to tabby's processes. Running as administrator will give realtime priority while running as a normal user will set as high priority. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 21:50:06 -04:00
kingbri	5c082b7e8c	Async: Add option to use Uvloop/Winloop These are faster event loops for asyncio which should improve overall performance. Gate these under an experimental flag for now to stress test these loops. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 18:59:20 -04:00
kingbri	71de3060bb	Downloader: Make timeout configurable Add an API parameter to set the timeout in seconds. Keep it to None by default for uninterrupted downloads. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 21:42:38 -04:00
kingbri	8c02fe9771	Downloader: Disable timeout This prevents TimeoutErrors from showing up. However, a longer timeout may be necessary since this is in the API. Turning it off for now will help resolve immediate errors. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 21:38:46 -04:00
Vhallo	b2064bbfb4	Typo fix in completion.py	2024-07-23 23:49:43 +02:00
Vhallo	88e4b108b4	Typo fix in chat_completion.py	2024-07-23 23:48:50 +02:00
kingbri	3e8ffebdd3	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 14:32:50 -04:00

622 commits