Commit graph

472 commits

Author SHA1 Message Date
kingbri
32ae62feac Model: Add filter support to dynamic gen
Dynamic gen takes in filters differently. Adjust to set the filter list
per class rather than in the generation function.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-25 21:16:14 -04:00
kingbri
8ccd8fe5f8 Model: Initial dynamic generator support
Adds basic support for ExllamaV2's dynamic generator. Can generate
a streaming and non-streaming completion.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-25 21:16:14 -04:00
kingbri
c474076b22 Concurrency: Remove release_semaphore method
At any point for any request cancellation, the semaphore will be
decremented. This is an issue since an arbitrary request can desync
the semaphore, causing multiple tasks to be processed at once and
break generation.

Remove this from the networking handlers and therefore, remove the
release_semaphore function itself.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-19 10:42:26 -04:00
kingbri
b9fd8555fe Sampling: Copy over iterable overrides
If an override was iterable, any modifications to the returned value
would alter the reference to the global storage dict.

Therefore, copy the structure if it's an iterable so any modification
won't alter the original override. Also apply this for the function
that checks for forced overrides.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-17 21:38:28 -04:00
kingbri
0e9385e023 API: Fix usage reporting for chat completions
Resolves #106

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-17 00:03:15 -04:00
kingbri
e4bb709305 Model: Fix usage stats in non-streaming gens
The wrong key was being returned from the model to the API.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-12 22:44:50 -04:00
kingbri
213430a122 Model/Grammar: Remove lmfe checks
lmfe is a required dependency, so checks are no longer needed.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-12 22:24:28 -04:00
Brian Dashore
b255847c2a
Merge pull request #105 from DocShotgun/main
Add support for regex pattern constraints
2024-05-12 22:22:12 -04:00
DocShotgun
abe411c6fb API + Model: Add support for regex pattern constraints
Adds the ability to constrain generation via regex pattern using lm-format-enforcer.
2024-05-12 19:10:43 -07:00
Ycros
57525219d0
Fix: Properly handle banned_strings and decode_special tokens (#104)
* Fix: Actually pass banned_strings to the generation call.

* decode_special_tokens was missing as well.

* syntax
2024-05-12 20:47:45 +00:00
Brian Dashore
611f00818b
Merge pull request #103 from DocShotgun/main
Minor fixes for sampler override
2024-05-12 16:47:12 -04:00
DocShotgun
dad34237ba Samplers: Add example override for generate_window 2024-05-12 00:39:01 -07:00
DocShotgun
9463ecfa40 Samplers: Minor fixes for sampler override
* Add missing settings to sample_preset.yml
* Fix override for skip_special_tokens
2024-05-12 00:31:31 -07:00
kingbri
c8ec742be9 Samplers: Expose skew sampling
Skew is an extra unused sampler in ExllamaV2. Add it in for coverage.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-12 01:41:01 -04:00
kingbri
6f4012d20d API: Add preset listing for sampler overrides
Querying the overrides list endpoint now returns the selected preset
and a list of presets to use.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-12 01:34:51 -04:00
kingbri
b4bc941cbe Tree: Lint
Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-11 22:42:39 -04:00
kingbri
2da3fb2caf Start: Bump ROCm error version
ROCm support is for 6.0 now. Update that.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-11 21:57:51 -04:00
kingbri
7bebc085ec Model: Remove legacy checks
v0.0.21 has these features implemented.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-11 19:26:23 -04:00
kingbri
cd78728a77 Dependencies: Update ExllamaV2
v0.0.21

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-11 19:26:03 -04:00
Brian Dashore
5432f523cb
Merge pull request #102 from DocShotgun/main
Add support for min_tokens and banned_strings
2024-05-10 21:21:57 -04:00
kingbri
366d57cf45 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-10 21:20:41 -04:00
kingbri
7eee936a3f Model: Remove old code and fix API handling
skip_special_tokens is in stable exl2. Also default the parameters
if they are not present in the function signature.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-10 21:20:00 -04:00
DocShotgun
c0b631ba92 API: Add banned_strings
From exllamav2: List of strings that the generator will refuse to output. As soon as a partial match happens, a checkpoint is saved that the generator can rewind to if need be. Subsequent tokens are then held until the full string is resolved (match or no match) and either emitted or discarded, accordingly.
2024-05-10 13:53:55 -07:00
DocShotgun
a1df22668b API: Add min_tokens
Bans the EOS token until the generation reaches a minimum length. This will not prevent the model from otherwise ending the generation early by outputting other stop conditions.
2024-05-10 12:30:17 -07:00
Brian Dashore
643b53e347
Create FUNDING.yml
Add ko-fi link.

Signed-off-by: kingbri <bdashore3@gmail.com>
2024-05-09 19:00:41 +00:00
Brian Dashore
c4f7af160e
Merge pull request #101 from Bakharovsky/fix_exllamav2_cuda_version
Fix: the link to the exllamav2 build for cuda 11.8
2024-05-08 16:32:22 -04:00
Arseniy Bakharovsky
33c86be45c
Update pyproject.toml 2024-05-08 03:31:15 +04:00
kingbri
ae879a623f Main: Add await to an async function
load_loras wasn't properly updated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-02 21:24:43 -04:00
kingbri
ab526f7278 Revert "API: Remove unncessary Optional signatures"
This reverts commit 7556dcf134.

The Optionals allowed requests to send "null" in the body for optional
parameters which should be allowed.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-02 21:23:48 -04:00
kingbri
7556dcf134 API: Remove unncessary Optional signatures
Optional isn't necessary if the function signature has a default
value.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-01 00:04:52 -04:00
kingbri
ae75db1829 Downloader: Cleanup on exception
Otherwise a file exists error will show up if any exception happens
but cancel.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-30 23:26:22 -04:00
kingbri
e4084b15c1 Downloader: Format
Make a public function private.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-30 01:16:57 -04:00
kingbri
50e0b71690 Downloader: Fix handling of include pattern
If an include or exclude pattern is provided, include should include
all files by default.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-30 01:13:06 -04:00
kingbri
21a01741c9 Downloader: Add include and exclude parameters
These both take an array of glob strings to state what files or
directories to include or exclude when parsing the download list.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-30 00:58:54 -04:00
kingbri
c47869c606 Downloader: Fix fallback mechanisms
Use None-ish coalescing instead of unwrap optional handling. This means
that any value that is "empty" for python will default to the fallback.

Ex. print("" or "test") will print out "test"

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-29 23:33:37 -04:00
kingbri
55ccd1baad API: Add HuggingFace downloader
Adds an asynchronous huggingface downloader that uses HF hub to fetch
all repo files. The current HF hub package has a snapshot_download
function that does not cancel on KeyboardInterrupt.

Instead, make a downloader that uses the Rich progress bar styling
along with a cancellable interface. Finally, link this to TabbyAPI.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-29 01:15:02 -04:00
kingbri
6114bfd221 API: Fix banned_tokens string when empty
The string should not be parsed and any non-string elements should
be removed as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 12:46:28 -04:00
kingbri
72dff0b6d5 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:26:20 -04:00
kingbri
fb01b164d8 Dependencies: Update flash attention 2
v2.5.8

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:07:00 -04:00
kingbri
0e015ad58e Dependencies: Update ExllamaV2
v0.0.20

ROCm 6.0 is now required

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:59 -04:00
kingbri
3de93d7c0a Dependencies: Update torch
v2.3.0

NOTE: ROCm is updated to v6.0 wheels

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:17 -04:00
kingbri
4daa6390a5 Dependencies: Unpin lm-format-enforcer
It should be fine to use the stable version from now on.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:17 -04:00
kingbri
6f9da97114 API: Add banned_tokens
Appends the banned tokens to the generation. This is equivalent of
setting logit bias to -100 on a specific set of tokens.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:09 -04:00
kingbri
5750826120 Model: Remove extraneous print
Was printing IDs by accident.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-25 18:49:09 -04:00
kingbri
fb1d2f34c1 OAI: Add response_prefix and fix BOS token issues in chat completions
response_prefix is used to add a prefix before generating the next
message. This is used in many cases such as continuining a prompt
(see #96).

Also if a template has BOS token specified, add_bos_token will
append two BOS tokens. Add a check which strips a starting BOS token
from the prompt if it exists.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-25 00:54:43 -04:00
kingbri
ed7cd3cb59 Network: Fix socket check timeout
Make this a one second timeout to check if a socket is connected.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-22 21:33:41 -04:00
kingbri
1e56d43772 Dependencies: Update lm-format-enforcer
v0.9.8

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-22 21:33:28 -04:00
kingbri
88b0b6f4f1 Model: Cast autosplit_reserve to int
Torch errors if float values are passed (because bytes are not float
types). Therefore, overestimate and cast to an int type.

Resolves #97

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-21 23:49:01 -04:00
kingbri
cab789e685 Templates: Migrate to class
Having many utility functions for initialization doesn't make much sense.
Instead, handle anything regarding template creation inside the
class which reduces the amount of function imports.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-21 23:28:14 -04:00
kingbri
9f93505bc1 OAI: Add skip_special_tokens parameter
Allows the ability to decode special tokens if the user wishes.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-21 00:37:46 -04:00