Commit graph

612 commits

Author SHA1 Message Date
kingbri
0bcb4e4a7d Model: Attach request ID to logs
If multiple logs come in at once, track which log corresponds to
which request.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 00:25:54 -04:00
kingbri
9390d362dd Model: Log generation params and metrics after the prompt/response
A user's prompt and response can be large in the console. Therefore,
always log the smaller payloads (e.g. gen params + metrics) after
the large chunks.

However, it's recommended to keep prompt logging off anyway since
it'll result in console spam.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 00:19:21 -04:00
Brian Dashore
1bf062559d Merge pull request #158 from AlpinDale/embeddings
feat: add embeddings support via Infinity-emb
2024-07-31 20:33:12 -04:00
kingbri
f111052e39 Dependencies: Use hosted pip index instead of GitHub
Installing directly from GitHub causes pip's HTTP cache to not
recognize that the correct version of a package is already installed.
This causes a redownload.

The Start.bat script updates dependencies automatically to keep
users on the latest versions of a package for security reasons.

A simple pip cache website helps alleviate this problem and allows pip
to find the cached wheels when invoked with an upgrade argument.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 20:46:37 -04:00
kingbri
46304ce875 Model: Properly pass in max_batch_size from config
The override wasn't being passed in before. Also, the default is now
none since Exl2 can automatically calculate the max batch size.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 18:42:25 -04:00
kingbri
dc3dcc9c0d Embeddings: Update config, args, and parameter names
Use embeddings_device as the parameter for device to remove ambiguity.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 15:32:26 -04:00
kingbri
bfa011e0ce Embeddings: Add model management
Embedding models are managed on a separate backend, but are run
in parallel with the model itself. Therefore, manage this in a separate
container with separate routes.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 15:19:27 -04:00
kingbri
f13d0fb8b3 Embeddings: Add model load checks
Same as the normal model container.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:17:36 -04:00
kingbri
01c7702859 Signal: Fix async signal handling
Run unload async functions before exiting the program.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:11:05 -04:00
kingbri
fbf1455db1 Embeddings: Migrate and organize Infinity
Use Infinity as a separate backend and handle the model within the
common module. This separates out the embeddings model from the endpoint
which allows for model loading/unloading in core.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:00:23 -04:00
kingbri
ac1afcc588 Embeddings: Use response classes instead of dicts
Follows the existing code style.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-29 14:15:40 -04:00
kingbri
3f21d9ef96 Embeddings: Switch to Infinity
Infinity-emb is an async batching engine for embeddings. This is
preferable to sentence-transformers since it handles scalable use cases
without the need for external thread intervention.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-29 13:42:03 -04:00
kingbri
c9a5d2c363 OAI: Refactor embeddings
Move files and rewrite routes to adhere to Tabby's code style.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-28 14:10:51 -04:00
kingbri
d85414738d Dependencies: Update Flash Attention 2
v2.6.3 with torch 2.3 wheels.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-28 13:50:15 -04:00
kingbri
c79e0832d5 Revert "Dependencies: Update pytorch and flash_attention"
This reverts commit f47d96790c.

See https://github.com/pytorch/pytorch/issues/131662 for more information.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-28 13:49:04 -04:00
kingbri
7b8b3fe23d Kobold: Fix max length type
Was mistakenly a string instead of an integer.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 23:00:26 -04:00
kingbri
e3226ed930 Kobold: Add untracked file
Model types weren't added.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 22:57:55 -04:00
kingbri
3038f668e8 Kobold: Add extra routes for horde compatibility
Needed to connect to horde. Also do some reordering to clean the
router file up.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 22:55:54 -04:00
kingbri
2773517a16 API: Add setup function to routers
This helps prepare the router before exposing it to the parent app.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 22:24:33 -04:00
Brian Dashore
6365427d38 Merge pull request #155 from Vhallo/main
Simple Typo Fix
2024-07-26 21:35:50 -04:00
kingbri
884b6f5ecd API: Add log options for initialization
Make each API log its respective URLs to help inform users.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 21:32:05 -04:00
kingbri
e8fc13a1f6 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 18:33:04 -04:00
kingbri
ea80b62e30 Sampling: Reorder aliased params and add kobold aliases
Also add dynatemp range, which is an alternative way of calculating
min and max temp.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 18:32:33 -04:00
kingbri
7522b1447b Model: Add support for HuggingFace config and bad_words_ids
This is necessary for Kobold's API. Current models use bad_words_ids
in generation_config.json, but for some reason, they're also present
in the model's config.json.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 18:23:22 -04:00
kingbri
545e26608f Kobold: Move params to aliases
Some of the parameters the API provides are aliases for their OAI
equivalents. It makes more sense to move them to the common file.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 16:46:54 -04:00
kingbri
b7cb6f0b91 API: Add KoboldAI server
Used for interacting with applications that use KoboldAI's API
such as horde.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 16:37:30 -04:00
kingbri
4e808cbed7 Auth: Fix disable auth when checking for key permissions
Since authentication is disabled, remove the limited permissions
for requests.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 15:04:29 -04:00
kingbri
f47d96790c Dependencies: Update pytorch and flash_attention
v2.4.0 and v2.6.3

Also use torch 2.4 wheels.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-25 23:39:52 -04:00
AlpinDale
5adfab1cbd ruff: formatting
2024-07-26 02:53:14 +00:00
AlpinDale
765d3593b3 remove submodule
2024-07-26 02:52:18 +00:00
AlpinDale
f20cd330ef feat: add embeddings support via sentence-transformers
2024-07-26 02:45:07 +00:00
kingbri
a1c3f6cc1c Dependencies: Update ExllamaV2
v0.1.8

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-24 22:00:43 -04:00
kingbri
27f9559d83 Dependencies: Switch to fastapi-slim
Reduces dependency size since the full fastapi package isn't required.
Add httptools since it makes requests faster and it was installed
with fastapi previously.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-24 21:59:56 -04:00
kingbri
42bc4adcfb Config: Add option to set priority to realtime
Realtime process priority tells the OS to favor tabby's processes
when assigning resources. Running as administrator grants realtime
priority, while running as a normal user sets high priority.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-24 21:50:06 -04:00
kingbri
5c082b7e8c Async: Add option to use Uvloop/Winloop
These are faster event loops for asyncio which should improve overall
performance. Gate these under an experimental flag for now to stress
test these loops.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-24 18:59:20 -04:00
kingbri
71de3060bb Downloader: Make timeout configurable
Add an API parameter to set the timeout in seconds. Default to None
for uninterrupted downloads.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 21:42:38 -04:00
kingbri
8c02fe9771 Downloader: Disable timeout
This prevents TimeoutErrors from showing up. However, a longer
timeout may be necessary since this is in the API. Turning it off
for now will help resolve immediate errors.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 21:38:46 -04:00
Vhallo
b2064bbfb4 Typo fix in completion.py
2024-07-23 23:49:43 +02:00
Vhallo
88e4b108b4 Typo fix in chat_completion.py
2024-07-23 23:48:50 +02:00
kingbri
3e8ffebdd3 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 14:32:50 -04:00
kingbri
300f034233 API: Add config option to select servers
Always enable the core endpoints and allow servers to be selected
as needed. Use the OAI server by default.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 14:27:42 -04:00
kingbri
9ad69e8ab6 API: Migrate universal routes to core
Place OAI specific routes in the appropriate folder. This is in
preparation for adding new API servers that can be optionally enabled.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 14:08:48 -04:00
kingbri
64c2cc85c9 OAI: Migrate model depends into proper file
This allows the depends to be shared across multiple routers.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 13:59:56 -04:00
kingbri
d1706fb067 OAI: Remove double logging if request is cancelled
Uvicorn can log a cancellation in both the request disconnect handler
and the CancelledError handler. However, either one sometimes fails
to fire, so both need to be checked. Don't log twice if both work.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 21:48:59 -04:00
kingbri
14dfaf600a Args: Add request logging
Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 21:41:42 -04:00
kingbri
3826815edb API: Add request logging
Log all the parts of a request if the config flag is set. The logged
fields are all server side anyway, so nothing is being exposed to
clients.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 21:40:00 -04:00
kingbri
522999ebb4 Config: Change from gen_logging to logging
More accurately reflects the config.yml's sections.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 21:15:16 -04:00
kingbri
191600a150 Revert "Model: Skip empty token chunks"
This reverts commit 21516bd7b5.

This skipped the EOS token, and implementing it properly seems more
costly than necessary.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 18:34:00 -04:00
kingbri
15f891b277 Args: Update to latest config.yml
Fix the order of params to follow the same flow as config.yml.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 16:26:41 -04:00
kingbri
ad4d17bca2 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 12:24:34 -04:00