Commit graph

464 commits

Author SHA1 Message Date
DocShotgun
abe411c6fb API + Model: Add support for regex pattern constraints
Adds the ability to constrain generation via regex pattern using lm-format-enforcer.
2024-05-12 19:10:43 -07:00
Ycros
57525219d0
Fix: Properly handle banned_strings and decode_special tokens (#104)
* Fix: Actually pass banned_strings to the generation call.

* decode_special_tokens was missing as well.

* syntax
2024-05-12 20:47:45 +00:00
Brian Dashore
611f00818b
Merge pull request #103 from DocShotgun/main
Minor fixes for sampler override
2024-05-12 16:47:12 -04:00
DocShotgun
dad34237ba Samplers: Add example override for generate_window 2024-05-12 00:39:01 -07:00
DocShotgun
9463ecfa40 Samplers: Minor fixes for sampler override
* Add missing settings to sample_preset.yml
* Fix override for skip_special_tokens
2024-05-12 00:31:31 -07:00
kingbri
c8ec742be9 Samplers: Expose skew sampling
Skew is an extra unused sampler in ExllamaV2. Add it in for coverage.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-12 01:41:01 -04:00
kingbri
6f4012d20d API: Add preset listing for sampler overrides
Querying the overrides list endpoint now returns the selected preset
and a list of presets to use.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-12 01:34:51 -04:00
kingbri
b4bc941cbe Tree: Lint
Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-11 22:42:39 -04:00
kingbri
2da3fb2caf Start: Bump ROCm error version
ROCm support is for 6.0 now. Update that.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-11 21:57:51 -04:00
kingbri
7bebc085ec Model: Remove legacy checks
v0.0.21 has these features implemented.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-11 19:26:23 -04:00
kingbri
cd78728a77 Dependencies: Update ExllamaV2
v0.0.21

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-11 19:26:03 -04:00
Brian Dashore
5432f523cb
Merge pull request #102 from DocShotgun/main
Add support for min_tokens and banned_strings
2024-05-10 21:21:57 -04:00
kingbri
366d57cf45 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-10 21:20:41 -04:00
kingbri
7eee936a3f Model: Remove old code and fix API handling
skip_special_tokens is in stable exl2. Also default the parameters
if they are not present in the function signature.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-10 21:20:00 -04:00
DocShotgun
c0b631ba92 API: Add banned_strings
From exllamav2: List of strings that the generator will refuse to output. As soon as a partial match happens, a checkpoint is saved that the generator can rewind to if need be. Subsequent tokens are then held until the full string is resolved (match or no match) and either emitted or discarded, accordingly.
2024-05-10 13:53:55 -07:00
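The partial-match check described in c0b631ba92 can be illustrated with a minimal sketch. This is not ExllamaV2's code: `classify_held_text` is a hypothetical helper, and it covers only the decision of whether held text should be emitted, held, or discarded. The checkpoint and rewind steps live inside the generator and are not shown.

```python
def classify_held_text(held_text, banned_strings):
    """Classify text against banned strings (hypothetical helper).

    Returns "match" if a banned string appears in full, "partial" if the
    text ends with the beginning of a banned string (so more tokens are
    needed to decide), and "clear" otherwise.
    """
    for banned in banned_strings:
        if banned in held_text:
            return "match"
    for banned in banned_strings:
        # A partial match means some suffix of the text is a prefix of
        # the banned string.
        for start in range(len(held_text)):
            if banned.startswith(held_text[start:]):
                return "partial"
    return "clear"
```

On "partial" a generator would keep holding tokens, on "match" it would rewind to the saved checkpoint, and on "clear" it would emit the held text.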
DocShotgun
a1df22668b API: Add min_tokens
Bans the EOS token until the generation reaches a minimum length. This will not prevent the model from otherwise ending the generation early by outputting other stop conditions.
2024-05-10 12:30:17 -07:00
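The min_tokens behavior described in a1df22668b can be sketched as a logit mask. This is an illustration under assumptions, not the backend's implementation: `mask_eos` and its parameters are hypothetical names, and logits are modeled as a plain list of floats.

```python
import math


def mask_eos(logits, eos_token_id, generated_tokens, min_tokens):
    """Return a copy of logits with EOS disabled until min_tokens is reached."""
    masked = list(logits)
    if generated_tokens < min_tokens:
        # A logit of -inf gives the EOS token zero probability after softmax.
        masked[eos_token_id] = -math.inf
    return masked
```

Only the EOS token is masked here, so other stop conditions (stop strings, for example) can still end the generation early, as the commit message notes.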
Brian Dashore
643b53e347
Create FUNDING.yml
Add ko-fi link.

Signed-off-by: kingbri <bdashore3@gmail.com>
2024-05-09 19:00:41 +00:00
Brian Dashore
c4f7af160e
Merge pull request #101 from Bakharovsky/fix_exllamav2_cuda_version
Fix: the link to the exllamav2 build for cuda 11.8
2024-05-08 16:32:22 -04:00
Arseniy Bakharovsky
33c86be45c
Update pyproject.toml 2024-05-08 03:31:15 +04:00
kingbri
ae879a623f Main: Add await to an async function
load_loras wasn't properly updated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-02 21:24:43 -04:00
kingbri
ab526f7278 Revert "API: Remove unnecessary Optional signatures"
This reverts commit 7556dcf134.

The Optionals allowed requests to send "null" in the body for optional
parameters which should be allowed.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-02 21:23:48 -04:00
kingbri
7556dcf134 API: Remove unnecessary Optional signatures
Optional isn't necessary if the function signature has a default
value.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-01 00:04:52 -04:00
kingbri
ae75db1829 Downloader: Cleanup on exception
Otherwise a file-exists error will show up if any exception other than
a cancel happens.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-30 23:26:22 -04:00
kingbri
e4084b15c1 Downloader: Format
Make a public function private.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-30 01:16:57 -04:00
kingbri
50e0b71690 Downloader: Fix handling of include pattern
If an include or exclude pattern is provided, include should include
all files by default.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-30 01:13:06 -04:00
kingbri
21a01741c9 Downloader: Add include and exclude parameters
These both take an array of glob strings to state what files or
directories to include or exclude when parsing the download list.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-30 00:58:54 -04:00
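The include and exclude parameters from 21a01741c9 can be illustrated with glob matching from the standard library. This is a sketch only: `filter_files` is a hypothetical name, and the real downloader may match patterns differently.

```python
from fnmatch import fnmatch


def filter_files(files, include=None, exclude=None):
    """Keep files matching any include glob and no exclude glob."""
    # With no include patterns given, every file is included by default.
    include = include or ["*"]
    exclude = exclude or []
    return [
        name
        for name in files
        if any(fnmatch(name, pattern) for pattern in include)
        and not any(fnmatch(name, pattern) for pattern in exclude)
    ]
```

For example, `filter_files(names, exclude=["*.bin"])` keeps everything except files ending in `.bin`.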
kingbri
c47869c606 Downloader: Fix fallback mechanisms
Use None-ish coalescing instead of unwrap optional handling. This means
that any value that is "empty" for python will default to the fallback.

Ex. print("" or "test") will print out "test"

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-29 23:33:37 -04:00
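The difference between the two fallback styles mentioned in c47869c606 is shown below. Both function names are hypothetical and exist only for illustration.

```python
def with_fallback(value, fallback):
    """Fall back for any falsy ("empty") value: None, "", 0, [], and so on."""
    return value or fallback


def with_none_fallback(value, fallback):
    """Fall back only when the value is exactly None."""
    return fallback if value is None else value


print(with_fallback("", "test"))       # the empty string is replaced
print(with_none_fallback("", "test"))  # the empty string is kept
```

The `or` form suits values such as paths or names, where an empty string is as unusable as None. It is a poor fit when 0 or an empty list is a meaningful setting.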
kingbri
55ccd1baad API: Add HuggingFace downloader
Adds an asynchronous huggingface downloader that uses HF hub to fetch
all repo files. The current HF hub package has a snapshot_download
function that does not cancel on KeyboardInterrupt.

Instead, make a downloader that uses the Rich progress bar styling
along with a cancellable interface. Finally, link this to TabbyAPI.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-29 01:15:02 -04:00
kingbri
6114bfd221 API: Fix banned_tokens string when empty
The string should not be parsed and any non-string elements should
be removed as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 12:46:28 -04:00
kingbri
72dff0b6d5 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:26:20 -04:00
kingbri
fb01b164d8 Dependencies: Update flash attention 2
v2.5.8

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:07:00 -04:00
kingbri
0e015ad58e Dependencies: Update ExllamaV2
v0.0.20

ROCm 6.0 is now required

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:59 -04:00
kingbri
3de93d7c0a Dependencies: Update torch
v2.3.0

NOTE: ROCm is updated to v6.0 wheels

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:17 -04:00
kingbri
4daa6390a5 Dependencies: Unpin lm-format-enforcer
It should be fine to use the stable version from now on.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:17 -04:00
kingbri
6f9da97114 API: Add banned_tokens
Appends the banned tokens to the generation. This is equivalent to
setting logit bias to -100 on a specific set of tokens.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:09 -04:00
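The equivalence stated in 6f9da97114 can be sketched as a conversion from a banned-token list to a logit bias map. Both helpers are hypothetical, and logits are modeled as a plain list rather than a tensor.

```python
def banned_tokens_to_logit_bias(banned_tokens, bias=-100):
    """Map each banned token ID to a strongly negative bias."""
    return {token_id: bias for token_id in banned_tokens}


def apply_logit_bias(logits, logit_bias):
    """Return a copy of logits with each bias added to its token's logit."""
    biased = list(logits)
    for token_id, bias in logit_bias.items():
        biased[token_id] += bias
    return biased
```

A bias of -100 pushes a token's probability to effectively zero after softmax, which is why the two forms behave the same in practice.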
kingbri
5750826120 Model: Remove extraneous print
Was printing IDs by accident.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-25 18:49:09 -04:00
kingbri
fb1d2f34c1 OAI: Add response_prefix and fix BOS token issues in chat completions
response_prefix is used to add a prefix before generating the next
message. This is used in many cases such as continuing a prompt
(see #96).

Also if a template has a BOS token specified, add_bos_token will
append two BOS tokens. Add a check which strips a starting BOS token
from the prompt if it exists.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-25 00:54:43 -04:00
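The BOS check described in fb1d2f34c1 amounts to removing one leading BOS token from the rendered prompt before the tokenizer adds its own. A minimal sketch, assuming the BOS token is available as a string (`strip_leading_bos` is a hypothetical name):

```python
def strip_leading_bos(prompt, bos_token):
    """Remove one leading BOS token so it is not added twice."""
    if bos_token and prompt.startswith(bos_token):
        return prompt[len(bos_token):]
    return prompt
```

Prompts that do not start with the BOS token are returned unchanged.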
kingbri
ed7cd3cb59 Network: Fix socket check timeout
Make this a one second timeout to check if a socket is connected.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-22 21:33:41 -04:00
kingbri
1e56d43772 Dependencies: Update lm-format-enforcer
v0.9.8

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-22 21:33:28 -04:00
kingbri
88b0b6f4f1 Model: Cast autosplit_reserve to int
Torch errors if float values are passed (because bytes are not float
types). Therefore, overestimate and cast to an int type.

Resolves #97

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-21 23:49:01 -04:00
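The "overestimate and cast" step from 88b0b6f4f1 can be sketched with `math.ceil`. The assumption here, not stated in the commit message, is that the reserve values are given in megabytes per GPU and may be fractional. `reserve_to_bytes` is a hypothetical name.

```python
import math


def reserve_to_bytes(reserve_mb):
    """Convert per-GPU reserve sizes in MB (assumed unit) to whole bytes."""
    # Rounding up overestimates the reserve instead of truncating it.
    return [int(math.ceil(value * 1024**2)) for value in reserve_mb]
```

Rounding up rather than down means the reserved amount is never smaller than what was requested.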
kingbri
cab789e685 Templates: Migrate to class
Having many utility functions for initialization doesn't make much sense.
Instead, handle anything regarding template creation inside the
class which reduces the amount of function imports.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-21 23:28:14 -04:00
kingbri
9f93505bc1 OAI: Add skip_special_tokens parameter
Allows special tokens to be decoded if the user wishes.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-21 00:37:46 -04:00
kingbri
67f061859d Tree: Add transformers_utils
Part of commit 8824ea0205

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-20 00:07:39 -04:00
kingbri
8824ea0205 Model: Add EOS token support from generation_config.json
GenerationConfig is meant to override various parts of the model
on generation within the transformers lib. Rather than implementing
the entire GenerationConfig framework (since it's pretty redundant),
add in multi eos_token support like VLLM.

The GenerationConfig is used only for generation, but can be used
for other uses if needed.

If more parameters become necessary in the future, add those in
as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-19 22:52:32 -04:00
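Multi-EOS support as described in 8824ea0205 depends on the fact that `eos_token_id` in generation_config.json may be a single integer or a list of integers. A minimal sketch of normalizing it to a list (the function name is hypothetical, and the project's actual parsing may differ):

```python
import json


def eos_tokens_from_generation_config(raw_json):
    """Return eos_token_id from generation_config.json contents as a list."""
    config = json.loads(raw_json)
    eos = config.get("eos_token_id")
    if eos is None:
        return []
    if isinstance(eos, int):
        return [eos]
    return list(eos)
```

An empty result would mean the file adds no EOS tokens beyond those the model already defines.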
kingbri
933c5afef0 Dependencies: Update ExllamaV2 and lm-format-enforcer
ExllamaV2: v0.0.19
lmfe: v0.9.6

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-19 21:15:50 -04:00
kingbri
65871ebc0c Docker: Add var to pull on build
When building the Docker container, try pulling from the github
repository to get the latest commit.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-19 21:06:34 -04:00
kingbri
209f0370b4 Docker: Switch image and copy config
Automatically create a config.yml on build. Also use the cuda runtime
image which is much lighter than the previous cuda devel image.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-15 18:01:56 -04:00
Brian Dashore
a2a2e4b866
Merge pull request #94 from pabl-o-ce/docker
Dockerfile work with pyproject.toml
2024-04-15 18:01:09 -04:00
kingbri
515b3c2930 OAI: Tokenize chat completion messages
Since chat completion messages are a structure, format the prompt
before checking in the tokenizer.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-15 14:17:16 -04:00
kingbri
ed05f376d9 Dependencies: Switch to LM-format-enforcer fork
LM format enforcer has some latency on token ingestion, so use an
optimized fork instead. Also add this in as a base dependency since
the size is small.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-14 11:59:49 -04:00