Commit graph

275 commits

Author SHA1 Message Date
kingbri
f13d0fb8b3 Embeddings: Add model load checks
Same as the normal model container.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:17:36 -04:00
kingbri
01c7702859 Signal: Fix async signal handling
Run unload async functions before exiting the program.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:11:05 -04:00
kingbri
fbf1455db1 Embeddings: Migrate and organize Infinity
Use Infinity as a separate backend and handle the model within the
common module. This separates out the embeddings model from the endpoint
which allows for model loading/unloading in core.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:00:23 -04:00
kingbri
3f21d9ef96 Embeddings: Switch to Infinity
Infinity-emb is an async batching engine for embeddings. This is
preferable to sentence-transformers since it handles scalable usecases
without the need for external thread intervention.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-29 13:42:03 -04:00
kingbri
e8fc13a1f6 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 18:33:04 -04:00
kingbri
ea80b62e30 Sampling: Reorder aliased params and add kobold aliases
Also add dynatemp range which is an alternative way of calculating
min and max temp.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 18:32:33 -04:00
kingbri
7522b1447b Model: Add support for HuggingFace config and bad_words_ids
This is necessary for Kobold's API. Current models use bad_words_ids
in generation_config.json, but for some reason, they're also present
in the model's config.json.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 18:23:22 -04:00
kingbri
545e26608f Kobold: Move params to aliases
Some of the parameters the API provides are aliases for their OAI
equivalents. It makes more sense to move them to the common file.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 16:46:54 -04:00
kingbri
4e808cbed7 Auth: Fix disable auth when checking for key permissions
Since authentication is disabled, remove the limited permissions
for requests.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 15:04:29 -04:00
kingbri
5c082b7e8c Async: Add option to use Uvloop/Winloop
These are faster event loops for asyncio which should improve overall
performance. Gate these under an experimental flag for now to stress
test these loops.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-24 18:59:20 -04:00
kingbri
71de3060bb Downloader: Make timeout configurable
Add an API parameter to set the timeout in seconds. Keep it to None
by default for uninterrupted downloads.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 21:42:38 -04:00
kingbri
8c02fe9771 Downloader: Disable timeout
This prevents TimeoutErrors from showing up. However, a longer
timeout may be necessary since this is in the API. Turning it off
for now will help resolve immediate errors.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 21:38:46 -04:00
kingbri
64c2cc85c9 OAI: Migrate model depends into proper file
Use amongst multiple routers.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 13:59:56 -04:00
kingbri
14dfaf600a Args: Add request logging
Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 21:41:42 -04:00
kingbri
3826815edb API: Add request logging
Log all the parts of a request if the config flag is set. The logged
fields are all server side anyways, so nothing is being exposed to
clients.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 21:40:00 -04:00
kingbri
522999ebb4 Config: Change from gen_logging to logging
More accurately reflects the config.yml's sections.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 21:15:16 -04:00
kingbri
15f891b277 Args: Update to latest config.yml
Fix order of params to follow the same flow as config.yml

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 16:26:41 -04:00
kingbri
0eedc8ca14 API: Switch from request ID middleware to depends
Middleware runs on both the request and response. Therefore, streaming
responses had increased latency when processing tasks and sending
data to the client which resulted in erratic streaming behavior.

Use a depends to add request IDs since it only executes when the
request is run rather than expecting the response to be sent as well.

For the future, it would be best to think about limiting the time
between each tick of chunk data to be safe.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 12:19:46 -04:00
kingbri
cae94b920c API: Add ability to use request IDs
Identify which request is being processed to help users disambiguate
which logs correspond to which request.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-21 21:01:05 -04:00
kingbri
38185a1ff4 Auth: Fix key check coalesce
Prefer the auth-specific headers before the generic authorization
header.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-19 10:08:57 -04:00
kingbri
e20a2d504b API: Fix pydantic validation errors on disconnect poll returns
Raise a 422 exception for the disconnect. This prevents pydantic
errors when returning a "response" which doesn't contain anything
in this case.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-15 14:41:49 -04:00
kingbri
6019c93637 Networking: Gate sending tracebacks over the API
It's possible that tracebacks can give too much info about a system
when sent over the API. Gate this under a flag to send them only
when debugging since this feature is still useful.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-14 10:30:11 -04:00
kingbri
1f46a1130c OAI: Restrict list permissions for API keys
API keys are not allowed to view all the admin's models, templates,
draft models, loras, etc. Basically anything that can be viewed
on the filesystem outside of anything that's currently loaded is
not allowed to be returned unless an admin key is present.

This change helps preserve user privacy while not erroring out on
list endpoints that the OAI spec requires.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-11 14:22:50 -04:00
kingbri
10890913b8 Auth: Revert x-admin-key allowance in API key check
These kinda clash with each other. Use the correct header for the
correct endpoint.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-11 14:22:50 -04:00
kingbri
b9a58ff01b Auth: Make key permission check work on Requests
Pass a request and internally unwrap the headers. In addition, allow
X-admin-key to get checked in an API key request.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-11 14:22:49 -04:00
kingbri
c7ce97f119 Tree: Ruff lint
Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 15:06:28 -04:00
kingbri
6613e38436 Main: Make openapi export store locally
This runs faster than always making a syscall to check if the env
var is set.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 14:54:06 -04:00
kingbri
ae66e8f9ba Ruff: Lint
Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 13:44:12 -04:00
kingbri
b907421285 Main: Fix launch if EXPORT_OPENAPI is unset
A default needs to be provided with getenv. Fix that with an empty
string.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 13:41:44 -04:00
kingbri
933268f7e2 API: Integrate OpenAPI export script
Move OpenAPI export as an env var within the main function. This
allows for easy export by running main.

In addition, an env variable provides global and explicit state to
disable conditional wheel imports (ex. Exl2 and torch) which caused
errors at first.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 12:34:32 -04:00
kingbri
27d2d5f3d2 Config + Model: Allow for default fallbacks from config for model loads
Previously, the parameters under the "model" block in config.yml only
handled the loading of a model on startup. This meant that any subsequent
API request required each parameter to be filled out or use a sane default
(usually defaults to the model's config.json).

However, there are cases where admins may want an argument from the
config to apply if the parameter isn't provided in the request body.
To help alleviate this, add a mechanism that works like sampler overrides
where users can specify a flag that acts as a fallback.

Therefore, this change both preserves the source of truth of what
parameters the admin is loading and adds some convenience for users
that want customizable defaults for their requests.

This behavior may change in the future, but I think it solves the
issue for now.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-06 17:50:58 -04:00
turboderp
0eb8fa5d1e
[fix] Bring draft progress and model progress in sync with model loader (#125)
* Bring draft progress and model progress in sync with model loader

* Fix formatting
2024-06-03 19:41:02 +02:00
DocShotgun
7084081b1f Tree: Lint 2024-05-26 18:27:30 -07:00
DocShotgun
ce5e2ec8de Logging: Clarify new vs cached tokens in prompt processing 2024-05-26 18:21:17 -07:00
DocShotgun
7ab7ffd562 Tree: Format 2024-05-26 15:48:18 -07:00
DocShotgun
767e6a798a API + Model: Add support for specifying k/v cache size 2024-05-26 14:17:01 -07:00
kingbri
9fbbc5afca Tree: Swap from map to list comprehensions
List comprehensions are the more "pythonic" way to approach mapping
values to a list. They're also more flexible across different collection
types rather than the inbuilt map method. It's best to keep one convention
rather than splitting down two.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-25 21:16:14 -04:00
kingbri
43cd7f57e8 API + Model: Add blocks and checks for various load requests
Add a sequential lock and wait until jobs are completed before executing
any loading requests that directly alter the model. However, we also
need to block any new requests that come in until the load is finished,
so add a condition that triggers once the lock is free.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-25 21:16:14 -04:00
kingbri
06ff47e2b4 Model: Use true async jobs and add logprobs
The new async dynamic job allows for native async support without the
need of threading. Also add logprobs and metrics back to responses.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-25 21:16:14 -04:00
kingbri
c474076b22 Concurrency: Remove release_semaphore method
At any point for any request cancellation, the semaphore will be
decremented. This is an issue since an arbitrary request can desync
the semaphore, causing multiple tasks to be processed at once and
break generation.

Remove this from the networking handlers and therefore, remove the
release_semaphore function itself.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-19 10:42:26 -04:00
kingbri
b9fd8555fe Sampling: Copy over iterable overrides
If an override was iterable, any modifications to the returned value
would alter the reference to the global storage dict.

Therefore, copy the structure if it's an iterable so any modification
won't alter the original override. Also apply this for the function
that checks for forced overrides.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-17 21:38:28 -04:00
DocShotgun
abe411c6fb API + Model: Add support for regex pattern constraints
Adds the ability to constrain generation via regex pattern using lm-format-enforcer.
2024-05-12 19:10:43 -07:00
DocShotgun
9463ecfa40 Samplers: Minor fixes for sampler override
* Add missing settings to sample_preset.yml
* Fix override for skip_special_tokens
2024-05-12 00:31:31 -07:00
kingbri
c8ec742be9 Samplers: Expose skew sampling
Skew is an extra unused sampler in ExllamaV2. Add it in for coverage.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-12 01:41:01 -04:00
kingbri
6f4012d20d API: Add preset listing for sampler overrides
Querying the overrides list endpoint now returns the selected preset
and a list of presets to use.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-12 01:34:51 -04:00
DocShotgun
c0b631ba92 API: Add banned_strings
From exllamav2: List of strings that the generator will refuse to output. As soon as a partial match happens, a checkpoint is saved that the generator can rewind to if need be. Subsequent tokens are then held until the full string is resolved (match or no match) and either emitted or discarded, accordingly.
2024-05-10 13:53:55 -07:00
DocShotgun
a1df22668b API: Add min_tokens
Bans the EOS token until the generation reaches a minimum length. This will not prevent the model from otherwise ending the generation early by outputting other stop conditions.
2024-05-10 12:30:17 -07:00
kingbri
ab526f7278 Revert "API: Remove unncessary Optional signatures"
This reverts commit 7556dcf134.

The Optionals allowed requests to send "null" in the body for optional
parameters which should be allowed.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-02 21:23:48 -04:00
kingbri
7556dcf134 API: Remove unncessary Optional signatures
Optional isn't necessary if the function signature has a default
value.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-01 00:04:52 -04:00
kingbri
ae75db1829 Downloader: Cleanup on exception
Otherwise a file exists error will show up if any exception happens
but cancel.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-30 23:26:22 -04:00