624 commits

Author SHA1 Message Date
kingbri
2a33ebbf29 Model: Bypass lock checks when shutting down
Previously, when a SIGINT was emitted and a model load is running,
the API didn't shut down until the load finished due to waiting for
the lock. However, when shutting down, the lock doesn't matter since
the process is being killed anyway.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 16:05:34 -04:00
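
The behavior described in this commit can be sketched roughly as below. This is only an illustration, not the project's code: `ModelContainer`, `load_lock`, and `loaded` are hypothetical names, and `skip_wait` is borrowed from the wording of a later commit message in this log.

```python
import asyncio


class ModelContainer:
    """Hypothetical stand-in for a model container guarded by a load lock."""

    def __init__(self):
        self.load_lock = asyncio.Lock()
        self.loaded = True

    async def unload(self, skip_wait: bool = False):
        if skip_wait:
            # Shutting down: the process is about to exit anyway,
            # so do not wait for an in-progress load to release the lock.
            self.loaded = False
            return

        # Normal path: wait for any running load to finish first.
        async with self.load_lock:
            self.loaded = False


async def demo() -> bool:
    container = ModelContainer()

    # Simulate a model load that is still holding the lock.
    await container.load_lock.acquire()

    # Without skip_wait this call would block until the lock is released;
    # with skip_wait it returns right away.
    await asyncio.wait_for(container.unload(skip_wait=True), timeout=1)
    return container.loaded


still_loaded = asyncio.run(demo())
```

The timeout in `demo` exists only so the sketch fails loudly instead of hanging if the bypass is removed.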
Brian Dashore
65c16f2a7c
Merge pull request #161 from theroyallab/new-start-scripts
Fix pip index bandwidth costs and update start scripts
2024-08-03 15:21:02 -04:00
kingbri
8703b23f89 Start: Make linux scripts executable
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 15:19:31 -04:00
kingbri
b795bfc7b2 Start: Split some prints up
Newlines can be helpful at times.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 15:14:40 -04:00
kingbri
65e758e134 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 15:08:24 -04:00
kingbri
7ce46cc2da Start: Rewrite start scripts
Start scripts now don't update dependencies by default due to mishandling
caches from pip. Also add dedicated update scripts and save options
to a JSON file instead of a text one.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 13:03:24 -04:00
kingbri
e66d213aef Revert "Dependencies: Use hosted pip index instead of Github"
This reverts commit f111052e39.

This was a bad idea since the netlify server has limited bandwidth.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 11:35:26 -04:00
kingbri
7bf2b07d4c Signals: Exit on async cleanup
The async signal exit function should be the internal function for exiting
the program. In addition, prevent the handler from being called
twice by adding a boolean. May become an asyncio event later on.

In addition, make sure to skip_wait when running model.unload.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-02 15:11:57 -04:00
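
A minimal sketch of the "boolean guard" idea from this commit, assuming a module-level flag. The names `shutting_down`, `cleanup_calls`, and `signal_handler_async` are hypothetical; a real handler would unload the model and exit the process where the comment indicates.

```python
import asyncio

# Hypothetical module-level guard; the commit notes it may become an asyncio event later.
shutting_down = False
cleanup_calls = 0


async def signal_handler_async() -> bool:
    """Run cleanup once; return False if a shutdown is already in progress."""
    global shutting_down, cleanup_calls

    if shutting_down:
        # A second SIGINT arrived while cleanup is running: ignore it.
        return False

    shutting_down = True
    cleanup_calls += 1
    # A real handler would unload the model here and then exit the program.
    return True


async def demo():
    first = await signal_handler_async()
    second = await signal_handler_async()
    return first, second


first, second = asyncio.run(demo())
```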
kingbri
b124797949 Dependencies: Re-add sentence-transformers
This is actually required for infinity to load a model.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-02 14:35:58 -04:00
kingbri
56619810bf Dependencies: Switch sentence-transformers to infinity-emb
Leftover before the transition.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-02 13:34:47 -04:00
kingbri
3e42211c3e Config: Embeddings: Make embeddings_device a default when API loading
When loading from the API, the fallback for embeddings_device will be
the same as the config.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 13:59:49 -04:00
kingbri
54aeebaec1 API: Fix return of current embeddings model
Return a ModelCard instead of a ModelList.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 13:43:31 -04:00
kingbri
0bcb4e4a7d Model: Attach request ID to logs
If multiple logs come in at once, track which log corresponds to
which request.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 00:25:54 -04:00
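
One way to picture request-tagged logging is sketched below. The helper `log_with_request_id` and the bracketed prefix format are assumptions made for illustration; the commit does not say how the ID is generated or rendered.

```python
import logging
from uuid import uuid4

logger = logging.getLogger("request_log_sketch")


def log_with_request_id(request_id: str, message: str) -> str:
    """Prefix a log line with the request ID so interleaved logs can be told apart."""
    line = f"[{request_id}] {message}"
    logger.info(line)
    return line


# Assumption: each incoming request gets a random hex ID.
request_id = uuid4().hex
line = log_with_request_id(request_id, "Generation finished")
```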
kingbri
9390d362dd Model: Log generation params and metrics after the prompt/response
A user's prompt and response can be large in the console. Therefore,
always log the smaller payloads (ex. gen params + metrics) after
the large chunks.

However, it's recommended to keep prompt logging off anyways since
it'll result in console spam.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 00:19:21 -04:00
Brian Dashore
1bf062559d
Merge pull request #158 from AlpinDale/embeddings
feat: add embeddings support via Infinity-emb
2024-07-31 20:33:12 -04:00
kingbri
f111052e39 Dependencies: Use hosted pip index instead of Github
Installing directly from github causes pip's HTTP cache to not
recognize that the correct version of a package is already installed.
This causes a redownload.

When using the Start.bat script, it updates dependencies automatically
to keep users on the latest versions of a package for security reasons.

A simple pip cache website helps alleviate this problem and allows pip
to find the cached wheels when invoked with an upgrade argument.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 20:46:37 -04:00
kingbri
46304ce875 Model: Properly pass in max_batch_size from config
The override wasn't being passed in before. Also, the default is now
none since Exl2 can automatically calculate the max batch size.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 18:42:25 -04:00
kingbri
dc3dcc9c0d Embeddings: Update config, args, and parameter names
Use embeddings_device as the parameter for device to remove ambiguity.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 15:32:26 -04:00
kingbri
bfa011e0ce Embeddings: Add model management
Embedding models are managed on a separate backend, but are run
in parallel with the model itself. Therefore, manage this in a separate
container with separate routes.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 15:19:27 -04:00
kingbri
f13d0fb8b3 Embeddings: Add model load checks
Same as the normal model container.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:17:36 -04:00
kingbri
01c7702859 Signal: Fix async signal handling
Run unload async functions before exiting the program.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:11:05 -04:00
kingbri
fbf1455db1 Embeddings: Migrate and organize Infinity
Use Infinity as a separate backend and handle the model within the
common module. This separates out the embeddings model from the endpoint
which allows for model loading/unloading in core.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:00:23 -04:00
kingbri
ac1afcc588 Embeddings: Use response classes instead of dicts
Follows the existing code style.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-29 14:15:40 -04:00
kingbri
3f21d9ef96 Embeddings: Switch to Infinity
Infinity-emb is an async batching engine for embeddings. This is
preferable to sentence-transformers since it handles scalable use cases
without the need for external thread intervention.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-29 13:42:03 -04:00
kingbri
c9a5d2c363 OAI: Refactor embeddings
Move files and rewrite routes to adhere to Tabby's code style.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-28 14:10:51 -04:00
kingbri
d85414738d Dependencies: Update Flash Attention 2
v2.6.3 with torch 2.3 wheels.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-28 13:50:15 -04:00
kingbri
c79e0832d5 Revert "Dependencies: Update pytorch and flash_attention"
This reverts commit f47d96790c.

See https://github.com/pytorch/pytorch/issues/131662 for more information.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-28 13:49:04 -04:00
kingbri
7b8b3fe23d Kobold: Fix max length type
Was mistakenly a string instead of an integer.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 23:00:26 -04:00
kingbri
e3226ed930 Kobold: Add untracked file
Model types weren't added.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 22:57:55 -04:00
kingbri
3038f668e8 Kobold: Add extra routes for horde compatibility
Needed to connect to horde. Also do some reordering to clean the
router file up.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 22:55:54 -04:00
kingbri
2773517a16 API: Add setup function to routers
This helps prepare the router before exposing it to the parent app.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 22:24:33 -04:00
Brian Dashore
6365427d38
Merge pull request #155 from Vhallo/main
Simple Typo Fix
2024-07-26 21:35:50 -04:00
kingbri
884b6f5ecd API: Add log options for initialization
Make each API log their respective URLs to help inform users.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 21:32:05 -04:00
kingbri
e8fc13a1f6 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 18:33:04 -04:00
kingbri
ea80b62e30 Sampling: Reorder aliased params and add kobold aliases
Also add dynatemp range which is an alternative way of calculating
min and max temp.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 18:32:33 -04:00
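
As a rough illustration of how a dynatemp range can stand in for explicit min and max temperatures, the sketch below assumes the range is applied symmetrically around the base temperature and that the lower bound is clamped at zero. The function name and the clamping are assumptions, not taken from the commit.

```python
def dynatemp_bounds(temperature: float, dynatemp_range: float) -> tuple[float, float]:
    """Derive (min_temp, max_temp) from a base temperature and a symmetric range."""
    # Assumption: the lower bound never goes below zero.
    min_temp = max(0.0, temperature - dynatemp_range)
    max_temp = temperature + dynatemp_range
    return min_temp, max_temp


bounds = dynatemp_bounds(1.0, 0.5)
clamped = dynatemp_bounds(0.25, 0.5)
```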
kingbri
7522b1447b Model: Add support for HuggingFace config and bad_words_ids
This is necessary for Kobold's API. Current models use bad_words_ids
in generation_config.json, but for some reason, they're also present
in the model's config.json.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 18:23:22 -04:00
kingbri
545e26608f Kobold: Move params to aliases
Some of the parameters the API provides are aliases for their OAI
equivalents. It makes more sense to move them to the common file.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 16:46:54 -04:00
kingbri
b7cb6f0b91 API: Add KoboldAI server
Used for interacting with applications that use KoboldAI's API
such as horde.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 16:37:30 -04:00
kingbri
4e808cbed7 Auth: Fix disable auth when checking for key permissions
Since authentication is disabled, remove the limited permissions
for requests.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 15:04:29 -04:00
kingbri
f47d96790c Dependencies: Update pytorch and flash_attention
v2.4.0 and v2.6.3

Also use torch 2.4 wheels.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-25 23:39:52 -04:00
AlpinDale
5adfab1cbd ruff: formatting
2024-07-26 02:53:14 +00:00
AlpinDale
765d3593b3 remove submodule
2024-07-26 02:52:18 +00:00
AlpinDale
f20cd330ef feat: add embeddings support via sentence-transformers
2024-07-26 02:45:07 +00:00
kingbri
a1c3f6cc1c Dependencies: Update ExllamaV2
v0.1.8

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-24 22:00:43 -04:00
kingbri
27f9559d83 Dependencies: Switch to fastapi-slim
Reduces dependency size since the full fastapi package isn't required.
Add httptools since it makes requests faster and it was installed
with fastapi previously.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-24 21:59:56 -04:00
kingbri
42bc4adcfb Config: Add option to set priority to realtime
Realtime process priority assigns resources to point to tabby's
processes. Running as administrator will give realtime priority
while running as a normal user will set as high priority.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-24 21:50:06 -04:00
kingbri
5c082b7e8c Async: Add option to use Uvloop/Winloop
These are faster event loops for asyncio which should improve overall
performance. Gate these under an experimental flag for now to stress
test these loops.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-24 18:59:20 -04:00
kingbri
71de3060bb Downloader: Make timeout configurable
Add an API parameter to set the timeout in seconds. Keep it to None
by default for uninterrupted downloads.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 21:42:38 -04:00
kingbri
8c02fe9771 Downloader: Disable timeout
This prevents TimeoutErrors from showing up. However, a longer
timeout may be necessary since this is in the API. Turning it off
for now will help resolve immediate errors.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 21:38:46 -04:00
Vhallo
b2064bbfb4
Typo fix in completion.py
2024-07-23 23:49:43 +02:00