Commit graph

433 commits

kingbri
0e015ad58e Dependencies: Update ExllamaV2
v0.0.20

ROCm 6.0 is now required

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:59 -04:00
kingbri
3de93d7c0a Dependencies: Update torch
v2.3.0

NOTE: ROCm is updated to v6.0 wheels

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:17 -04:00
kingbri
4daa6390a5 Dependencies: Unpin lm-format-enforcer
It should be fine to use the stable version from now on.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:17 -04:00
kingbri
6f9da97114 API: Add banned_tokens
Appends the banned tokens to the generation. This is the equivalent of
setting a logit bias of -100 on a specific set of tokens.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:09 -04:00
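The logit-bias equivalence described above can be sketched as follows. This is a minimal illustration only (the dict of raw scores and the function name are hypothetical, not the project's actual implementation):

```python
def apply_banned_tokens(logits, banned_token_ids, bias=-100.0):
    """Push banned token ids far down so they are effectively never
    sampled. `logits` maps token id -> raw score; this mirrors setting
    a logit bias of -100 on each banned token."""
    for token_id in banned_token_ids:
        if token_id in logits:
            logits[token_id] += bias
    return logits

scores = {1: 2.5, 2: 0.3, 3: 1.1}
apply_banned_tokens(scores, [2])
# token 2 now sits ~100 logits below everything else
```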
kingbri
5750826120 Model: Remove extraneous print
Was printing IDs by accident.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-25 18:49:09 -04:00
kingbri
fb1d2f34c1 OAI: Add response_prefix and fix BOS token issues in chat completions
response_prefix is used to add a prefix before generating the next
message. This is used in many cases, such as continuing a prompt
(see #96).

Also, if a template has a BOS token specified, add_bos_token will
result in two BOS tokens. Add a check that strips a starting BOS
token from the prompt if it exists.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-25 00:54:43 -04:00
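The doubled-BOS fix can be sketched like this (function and parameter names are illustrative, assuming the template has already been rendered to a string):

```python
def strip_leading_bos(prompt: str, bos_token: str = "<s>") -> str:
    """If the rendered template already starts with the BOS token,
    drop it so that add_bos_token only ever contributes one BOS."""
    if bos_token and prompt.startswith(bos_token):
        return prompt[len(bos_token):]
    return prompt

print(strip_leading_bos("<s>Hello"))  # "Hello"
print(strip_leading_bos("Hello"))     # unchanged: "Hello"
```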
kingbri
ed7cd3cb59 Network: Fix socket check timeout
Make this a one second timeout to check if a socket is connected.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-22 21:33:41 -04:00
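A minimal sketch of the one-second socket check described above, using the standard library (the helper name is hypothetical):

```python
import socket

def make_check_socket(timeout: float = 1.0) -> socket.socket:
    """Create a socket with a short timeout so a connectivity check
    fails fast instead of hanging on an unreachable host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # one-second timeout for the check
    return sock

sock = make_check_socket()
```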
kingbri
1e56d43772 Dependencies: Update lm-format-enforcer
v0.9.8

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-22 21:33:28 -04:00
kingbri
88b0b6f4f1 Model: Cast autosplit_reserve to int
Torch errors if float values are passed (because bytes are not float
types). Therefore, overestimate and cast to an int type.

Resolves #97

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-21 23:49:01 -04:00
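The overestimate-and-cast approach described above can be sketched as (names are illustrative; the real values would come from config):

```python
import math

def normalize_reserve(autosplit_reserve_mb):
    """Torch expects integer byte counts, so round each per-GPU
    reserve value up (overestimate) and cast to int."""
    return [int(math.ceil(value)) for value in autosplit_reserve_mb]

print(normalize_reserve([96.5, 128.0]))  # [97, 128]
```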
kingbri
cab789e685 Templates: Migrate to class
Having many utility functions for initialization doesn't make much sense.
Instead, handle anything regarding template creation inside the
class which reduces the amount of function imports.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-21 23:28:14 -04:00
kingbri
9f93505bc1 OAI: Add skip_special_tokens parameter
Allows the user to decode special tokens if they wish.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-21 00:37:46 -04:00
kingbri
67f061859d Tree: Add transformers_utils
Part of commit 8824ea0205

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-20 00:07:39 -04:00
kingbri
8824ea0205 Model: Add EOS token support from generation_config.json
GenerationConfig is meant to override various parts of the model
on generation within the transformers lib. Rather than implementing
the entire GenerationConfig framework (since it's pretty redundant),
add in multi eos_token support, like vLLM.

The GenerationConfig is used only for generation here, but it can
serve other purposes if needed.

If more parameters become necessary in the future, they can be
added as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-19 22:52:32 -04:00
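The multi-EOS handling can be sketched like this: `eos_token_id` in a real generation_config.json may be a single id or a list of ids, so normalize to a list (the function name is hypothetical):

```python
import json

def load_eos_token_ids(config_text: str):
    """Read eos_token_id from generation_config.json content and
    normalize it to a list, since the field may be a scalar or a
    list of ids."""
    config = json.loads(config_text)
    eos = config.get("eos_token_id")
    if eos is None:
        return []
    return eos if isinstance(eos, list) else [eos]

print(load_eos_token_ids('{"eos_token_id": [2, 32000]}'))  # [2, 32000]
print(load_eos_token_ids('{"eos_token_id": 2}'))           # [2]
```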
kingbri
933c5afef0 Dependencies: Update ExllamaV2 and lm-format-enforcer
ExllamaV2: v0.0.19
lmfe: v0.9.6

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-19 21:15:50 -04:00
kingbri
65871ebc0c Docker: Add var to pull on build
When building the Docker container, try pulling from the github
repository to get the latest commit.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-19 21:06:34 -04:00
kingbri
209f0370b4 Docker: Switch image and copy config
Automatically create a config.yml on build. Also use the cuda runtime
image which is much lighter than the previous cuda devel image.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-15 18:01:56 -04:00
Brian Dashore
a2a2e4b866
Merge pull request #94 from pabl-o-ce/docker
Dockerfile work with pyproject.toml
2024-04-15 18:01:09 -04:00
kingbri
515b3c2930 OAI: Tokenize chat completion messages
Since chat completion messages are a structured object, format the
prompt before passing it to the tokenizer.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-15 14:17:16 -04:00
kingbri
ed05f376d9 Dependencies: Switch to LM-format-enforcer fork
LM format enforcer has some latency on token ingestion, so use an
optimized fork instead. Also add this in as a base dependency since
the size is small.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-14 11:59:49 -04:00
kingbri
3d14283fe0 Start: Lint
Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-13 12:25:41 -04:00
kingbri
4d158dac90 Start: Fix when reading from gpu_lib file
The wrong variable was being set, so fix that.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-13 12:24:30 -04:00
kingbri
2a0aaa2e8a OAI: Add ability to pass extra vars in jinja templates
A chat completion can now declare extra template_vars to pass when
a template is rendered, opening up the possibility of using state
outside of huggingface's parameters.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-11 09:49:25 -04:00
kingbri
b1f3baad74 OAI: Add response_format parameter
response_format allows a user to request a valid, but arbitrary JSON
object from the API. This is a new part of the OAI spec.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-09 21:33:31 -04:00
kingbri
de41e9f7e9 Start: Add gpu_lib argument
Argument to override the selected GPU library. Useful for daemonization
when running for the first time.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-08 23:33:19 -04:00
kingbri
d759a15559 Model: Fix chunk size handling
The wrong class attribute name was used for max_attention_size; this
also fixes the declaration of the draft model's chunk_size.

Also expose the parameter to the end user in both config and model
load.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-07 18:39:19 -04:00
kingbri
30c4554572 Requirements: Update Exllamav2
v0.0.18

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-07 18:00:56 -04:00
kingbri
46ac3beea9 Templates: Support list style chat_template keys
HuggingFace updated transformers to provide templates in a list for
tokenizers. Update to support this new format. Providing the name
of a template for the "prompt_template" value in config.yml will also
look inside the template list.

In addition, log if there's a template exception, but continue model
loading since it shouldn't shut down the application.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-07 11:20:25 -04:00
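The list-style lookup can be sketched as follows, assuming the HF list format of `{"name": ..., "template": ...}` entries (the lookup function itself is hypothetical):

```python
def find_chat_template(chat_template, name=None):
    """Support both the legacy string form of chat_template and the
    newer list-of-dicts form. `name` corresponds to the
    prompt_template value from config.yml."""
    if isinstance(chat_template, str):
        return chat_template
    for entry in chat_template:
        if name is None or entry.get("name") == name:
            return entry.get("template")
    return None

templates = [
    {"name": "default", "template": "{{ messages }}"},
    {"name": "tool_use", "template": "{{ tools }}"},
]
print(find_chat_template(templates, "tool_use"))  # {{ tools }}
```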
kingbri
5bb4995a7c API: Move OAI to APIRouter
This makes the API more modular for other API implementations in the
future.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-06 01:25:31 -04:00
kingbri
8bdc19124f Start: Fix gpu lib when reading from file
Readline doesn't strip out newlines or spaces.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-02 22:04:01 -04:00
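The fix boils down to stripping the value that `readline` returns, since it keeps the trailing newline and any stray spaces (a `StringIO` stands in for the gpu_lib file here):

```python
import io

# readline() keeps the trailing newline (and any stray spaces),
# so strip the value before using it as the GPU library name.
gpu_lib_file = io.StringIO("cu121 \n")
gpu_lib = gpu_lib_file.readline().strip()
print(gpu_lib)  # "cu121"
```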
Brian Dashore
cdb96e4f74
Merge pull request #93 from AlpinDale/chore/log-level
chore: make log level configurable via env variable
2024-04-02 00:52:06 -04:00
kingbri
f9f8c97c6d Templates: Fix stop_string parsing
Template modules grab all set vars, including ones that use runtime
vars. If a template var is set to a runtime var and a module is created,
an UndefinedError fires.

Use make_module instead to pass runtime vars when creating a template
module.

Resolves #92

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-02 00:44:04 -04:00
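A minimal jinja2 sketch of the fix: `template.module` evaluates all top-level `{% set %}` statements with no runtime vars available, so an expression touching `bos_token` raises UndefinedError; `make_module` accepts those vars up front (the template content is illustrative):

```python
from jinja2 import Environment

env = Environment()
# stop_strings is built from a runtime var; reading template.module
# would evaluate this without bos_token and raise UndefinedError on
# the concatenation.
template = env.from_string('{% set stop_strings = [bos_token + "", "</s>"] %}')

# make_module lets us supply the runtime vars while building the module.
module = template.make_module({"bos_token": "<s>"})
print(module.stop_strings)  # ['<s>', '</s>']
```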
PΔBLØ ᄃΞ
8a5a82baec
Update Dockerfile
remove unnecessary apt install to just use one
2024-04-01 22:27:11 -05:00
PΔBLØ ᄃΞ
85271e2b7d fix: Dockerfile work on pyproject.toml 2024-04-01 19:32:42 -05:00
AlpinDale
1650e6e640 ruff 2024-04-01 23:11:30 +00:00
AlpinDale
5e599ddbd4 typo 2024-04-01 23:08:28 +00:00
AlpinDale
6c4a1a9c70 make log level a global var 2024-04-01 23:07:30 +00:00
AlpinDale
031349133b properly order imports 2024-04-01 23:03:16 +00:00
AlpinDale
e90ead3b35 chore: make log level configurable via env variable 2024-04-01 22:57:56 +00:00
kingbri
6ecce1604b Model: Fix log if exl2 version is too low
Switch to pyproject syntax.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-31 23:11:21 -04:00
kingbri
f534930270 Dependencies: Bump Exllamav2
v0.0.17

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-31 23:10:28 -04:00
kingbri
d716527b92 Sampling: Add additive param to overrides
Additive is used to add collections together. Currently, it's used
for lists, but it can be used for dictionaries in the future.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-31 01:10:55 -04:00
kingbri
05b5700334 Dependencies: Update torch
v2.2.2

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-30 17:03:37 -04:00
kingbri
5c94894a1a Dependencies: Update Flash Attention
v2.5.6

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-30 16:58:24 -04:00
kingbri
b11aac51e2 Model: Add torch.inference_mode() to generator function
Provides a speedup to model forward.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-30 10:45:28 -04:00
kingbri
e8b6a02aa8 API: Move prompt template construction to utils
Best to move the inner workings into their own function. Also fix
an edge case where stop strings can be a string rather than an array.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-29 02:24:13 -04:00
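The string-vs-array edge case can be handled with a small normalizer like this (the function name is hypothetical):

```python
def normalize_stop(stop):
    """The API may pass stop strings as a bare string or as a list;
    normalize to a list for downstream handling."""
    if stop is None:
        return []
    return [stop] if isinstance(stop, str) else list(stop)

print(normalize_stop("###"))          # ['###']
print(normalize_stop(["###", "\n"]))  # ['###', '\n']
```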
kingbri
190a0b26c3 Model: Fix generation when stream = false
References #91. Check if the length of the generation array is > 0
after popping the finish reason.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-29 02:15:56 -04:00
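The non-streaming fix can be sketched as: the last chunk carries the finish reason, so pop it first and only join text if chunks remain (the chunk shape and function name are illustrative, not the project's actual types):

```python
def join_generation(chunks):
    """Pop the trailing finish-reason chunk, then join the remaining
    text chunks only if the array is still non-empty."""
    finish_reason = chunks.pop().get("finish_reason")
    text = "".join(c.get("text", "") for c in chunks) if len(chunks) > 0 else ""
    return text, finish_reason

chunks = [{"text": "Hello"}, {"text": " world"}, {"finish_reason": "stop"}]
print(join_generation(chunks))  # ('Hello world', 'stop')
```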
kingbri
d4280e1378 Dependencies: Add pytorch-triton-rocm
Required for AMD installs.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-28 11:02:56 -04:00
kingbri
271f5ba7a4 Templates: Modify alpaca and chatml
Add the stop_strings metadata parameter.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-27 22:28:41 -04:00
kingbri
dc456f4cc2 Templates: Add stop_strings meta param
Adding the stop_strings var to chat templates will allow for the
template creator to specify stopping strings to add onto chat completions.

These get appended to the existing stopping strings that are passed
in the API request. However, a sampler override with force: true will
override all stopping strings.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-27 22:22:07 -04:00
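The merge behavior described above can be sketched like this (parameter names are illustrative; the real override comes from the sampler override system):

```python
def resolve_stop_strings(template_stops, request_stops, override=None, force=False):
    """Template metadata stop_strings are appended to the request's
    stop strings; a sampler override with force=True replaces both."""
    if force and override is not None:
        return list(override)
    return list(request_stops) + [s for s in template_stops if s not in request_stops]

print(resolve_stop_strings(["</s>"], ["###"]))                        # ['###', '</s>']
print(resolve_stop_strings(["</s>"], ["###"], ["STOP"], force=True))  # ['STOP']
```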
kingbri
277c540c98 Colab: Update
Switch to pyproject

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-24 21:48:48 -04:00