Commit graph

641 commits

Author SHA1 Message Date
kingbri
80ef379721 Sampling: Add top-a support
Currently in exllamav2 dev, but will be in the next release.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-22 23:50:24 -05:00
AlpinDale
6a5bbd217c
feat: logging (#39)
* add logging

* simplify the logger

* formatting

* final touches

* fix format

* Model: Add log to metrics

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com>
2023-12-23 04:33:31 +00:00
Brian Dashore
f5314fcdad
Merge pull request #37 from DocShotgun/main
Colab: Expose new config arguments
2023-12-22 12:07:52 -05:00
kingbri
71f6a586f1 Templates: Add error handling for template errors
Similar to the transformers library, add an error handler when an
exception is fired. This relays the error to the user.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-22 11:59:47 -05:00
AlpinDale
fa47f51f85
feat: workflows for formatting/linting (#35)
* add github workflows for pylint and yapf

* yapf

* docstrings for auth

* fix auth.py

* fix generators.py

* fix gen_logging.py

* fix main.py

* fix model.py

* fix templating.py

* fix utils.py

* update formatting.sh to include subdirs for pylint

* fix model_test.py

* fix wheel_test.py

* rename utils to utils_oai

* fix OAI/utils_oai.py

* fix completion.py

* fix token.py

* fix lora.py

* fix common.py

* add pylintrc and fix model.py

* finish up pylint

* fix attribute error

* main.py formatting

* add formatting batch script

* Main: Remove unnecessary global

Linter suggestion.

Signed-off-by: kingbri <bdashore3@proton.me>

* switch to ruff

* Formatting + Linting: Add ruff.toml

Signed-off-by: kingbri <bdashore3@proton.me>

* Formatting + Linting: Switch scripts to use ruff

Also remove the file and recent file change functions from both
scripts.

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Format and lint

Signed-off-by: kingbri <bdashore3@proton.me>

* Scripts + Workflows: Format

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Remove pylint flags

We use ruff now

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Format

Signed-off-by: kingbri <bdashore3@proton.me>

* Formatting: Line length is 88

Use the same value as Black.

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Format

Update to new line length rules.

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com>
Co-authored-by: kingbri <bdashore3@proton.me>
2023-12-22 16:20:35 +00:00
kingbri
a14abfe21c Templates: Support bos_token and eos_token fields
These are commonly seen in huggingface provided chat templates and
aren't that difficult to add in.

For feature parity, honor the add_bos_token and ban_eos_token
parameters when constructing the prompt.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-22 10:33:11 -05:00
DocShotgun
7967607f12
Colab: Expose new config arguments 2023-12-22 01:53:13 -08:00
Brian Dashore
2bf8087de3
Merge pull request #36 from veden/dev 2023-12-22 00:34:19 -05:00
Veden
91e6823b24
fixed method invocation in get_template_from_model_json 2023-12-21 21:25:59 -08:00
kingbri
8fa764bfbe Auth: Add option to disable authentication
This creates a massive security hole, but it's gated behind a flag
for users who only use localhost.

A warning will pop up when users disable authentication.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-21 23:40:16 -05:00
kingbri
99a798e117 API: Add auth enforcement to draft list
This didn't have an API key gate.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-21 23:14:04 -05:00
kingbri
5d80a049ae Templates: Switch to common function for JSON loading
Fix redundancy in code when loading templates. However, loading
a template from config.json may be a mistake since tokenizer_config.json
is the main place where chat templates are stored.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-21 23:08:51 -05:00
kingbri
72e19dbc12 Config: Change default dirs in sample
Models and draft models default to the models directory while
loras default to the loras directory.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-21 22:35:03 -05:00
Brian Dashore
87a9dfc8c4
Merge pull request #34 from veden/dev
Templates: Added automatic detection of chat templates from tokenizer_config.json
2023-12-21 22:34:53 -05:00
kingbri
1a8afcb6ad Generator: Fix semaphore scheduling
Non-streaming tasks were not regulated by the semaphore, causing these
tasks to interfere with streaming generations. Add helper functions
to take in both sync and async functions for callbacks and sequential
blocking with the semaphore.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-21 21:39:45 -05:00
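A minimal sketch of the idea described in the commit above, assuming a single shared `asyncio.Semaphore`. The name `call_with_semaphore` and the demo callbacks are hypothetical and are not taken from the repository; the sketch only shows how one helper can accept both sync and async callbacks so that streaming and non-streaming tasks queue behind the same semaphore.

```python
import asyncio
import inspect


async def call_with_semaphore(semaphore, callback):
    """Run a callback while holding the semaphore.

    Hypothetical helper: accepts either a plain function or a coroutine
    function, so every generation task goes through the same gate.
    """
    async with semaphore:
        result = callback()
        # Coroutine functions return an awaitable that still has to be awaited.
        if inspect.isawaitable(result):
            result = await result
        return result


async def demo():
    # One slot: tasks run one after another instead of interleaving.
    semaphore = asyncio.Semaphore(1)

    async def streaming_task():
        await asyncio.sleep(0)
        return "streamed"

    def non_streaming_task():
        return "complete"

    return await asyncio.gather(
        call_with_semaphore(semaphore, streaming_task),
        call_with_semaphore(semaphore, non_streaming_task),
    )
```

Note that a sync callback still runs on the event loop thread in this sketch, so long blocking work would need to be moved to an executor.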
Aaron Veden
f53c98db94
Templates: Added automatic detection of chat templates from tokenizer_config.json 2023-12-20 22:45:55 -08:00
kingbri
bee758dae9 Config: Clarify rope parameters
Blank = automatic calculation of alpha value.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-20 21:15:06 -05:00
kingbri
5728b9fffb Model: Don't error out if a generation is empty
When stream is false, the generation can be empty, which means
that there are no chunks present in the final generation array, causing
an error.

Instead, return a dummy value if generation is falsy (empty array
or None)

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-20 00:51:33 -05:00
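The falsy check described above might look roughly like the following. The function name `join_generation` and the shape of the returned dictionary are assumptions made for illustration, not the project's actual response structure.

```python
def join_generation(chunks):
    """Combine generated text chunks into one result.

    Hypothetical sketch: returns a placeholder result when the generation
    is falsy (None or an empty list) instead of raising an error.
    """
    if not chunks:
        return {"text": "", "chunk_count": 0}
    return {"text": "".join(chunks), "chunk_count": len(chunks)}
```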
kingbri
ab10b263fd Model: Add override base seq len
Some models (such as mistral and mixtral) set their base sequence
length to 32k due to assumptions of support for sliding window
attention.

Therefore, add this parameter to override the base sequence length
of a model which helps with auto-calculation of rope alpha.

If auto-calculation of rope alpha isn't being used, the max_seq_len
parameter works fine as is.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-20 00:45:39 -05:00
Brian Dashore
5368ed7b64
Merge pull request #31 from veryamazinglystupid/main
cuda -> 12, pydantic error fixed.
2023-12-20 00:04:51 -05:00
kingbri
5fbb37405f Colab: Remove the pydantic hotfix
Requirements.txt is now pinned to install pydantic >= 2.0.0

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-20 00:01:58 -05:00
kingbri
c9e43e51aa API: Add route for draft model list
Does the same thing as model list except with draft models.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-19 23:45:53 -05:00
kingbri
ce2602df9a Model: Fix max seq len handling
Previously, the max sequence length was overridden by the user's
config and never took the model's config.json into account.

Now, set the default to 4096, but include config.prepare when
selecting the max sequence length. The yaml and API request
now serve as overrides rather than parameters.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-19 23:37:52 -05:00
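A rough sketch of the override order described above, under stated assumptions: the function and parameter names are hypothetical, and the relative precedence of the yaml value and the API request value is a guess, since the commit message only says that both act as overrides.

```python
DEFAULT_MAX_SEQ_LEN = 4096


def resolve_max_seq_len(model_config_value=None, yaml_value=None, request_value=None):
    """Pick the max sequence length from the available sources.

    Assumed order (later wins): default, model's config.json,
    yaml config, API request.
    """
    resolved = DEFAULT_MAX_SEQ_LEN
    for candidate in (model_config_value, yaml_value, request_value):
        # None means the source did not specify a value.
        if candidate is not None:
            resolved = candidate
    return resolved
```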
kingbri
d3246747c0 Templates: Attempt loading from model config
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-19 22:58:47 -05:00
kingbri
da69ad8cd3 Requirements: Pin versions for some dependencies
Pydantic and Jinja2 need pinned versions.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-19 21:48:04 -05:00
kingbri
1fd38c61de API: Remove model check dependency for lora list
This isn't needed for listing stuff.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-19 21:35:29 -05:00
veryamazinglystupid
12bf7a0174
fix the colab, pydantic error
:3
2023-12-19 19:46:57 +05:30
kingbri
0a144688c6 Templates: Add clarity statements
Lets the user know if a file-not-found error (OSError) occurs and prints
the applied template on model load.

Also fix some remaining references to fastchat.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-19 08:13:04 -05:00
kingbri
0d76ed9b8b Revert "Start: Add an argument parser to batch file"
This reverts commit 097c298c39.
2023-12-19 00:01:27 -05:00
kingbri
45e2987622 Start: Fix batch file condition
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:57:30 -05:00
kingbri
097c298c39 Start: Add an argument parser to batch file
Used for future arguments.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
c3f7898967 OAI: Add logit bias support
Use exllamav2's token bias which is the functional equivalent of
OAI's logit bias parameter.

Strings are cast to integers on request, and an error is raised if an
invalid value is passed.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
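The string-to-integer conversion mentioned above could be sketched as follows. `parse_logit_bias` is a hypothetical name, and how the resulting mapping is handed to exllamav2's token bias is deliberately left out because the commit message does not describe it.

```python
def parse_logit_bias(logit_bias):
    """Convert an OAI-style logit_bias mapping to integer token IDs.

    JSON object keys always arrive as strings, so each key is cast to int.
    Hypothetical sketch: raises ValueError on any entry that cannot be cast.
    """
    parsed = {}
    for token_id, bias in logit_bias.items():
        try:
            parsed[int(token_id)] = float(bias)
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"Invalid logit bias entry: {token_id!r}: {bias!r}"
            ) from exc
    return parsed
```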
kingbri
46f6dc824e Scripts: Add requirements update to start script
Also add an argument to skip the requirements if needed.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
1f2cc8a47b Templates: Update folder
Move README to the separate templates repo.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
bc21f0bbc0 OAI: Add field aliasing
Repetition penalty range needs field aliases to support multiple
parameter calls.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
124e39df26 Remove fschat from Dockerfile
Fastchat is removed from all dependencies

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
de9a19b5d3 Templating: Add generation prompt appending
Append generation prompts if given the flag on an OAI chat completion
request.

This appends the "assistant" message to the instruct prompt. Defaults
to true since this is intended behavior.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
041070fd6e Update gitignore
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
417cb958fa Auth: Only regenerate auth on OSError
OSError means that a file wasn't found, which means auth tokens should
be regenerated. Otherwise, fire the error and exit.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
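One way to read the behavior above is sketched below. The JSON file format, the key names, and the function name `load_auth_keys` are all assumptions made for illustration; the point is only that `OSError` triggers regeneration while any other failure (such as a parse error) propagates to the caller.

```python
import json
import secrets


def load_auth_keys(path):
    """Load auth keys from disk, regenerating them only on OSError.

    Hypothetical sketch: a malformed file raises its own error instead of
    being silently overwritten with fresh keys.
    """
    try:
        with open(path, "r", encoding="utf-8") as auth_file:
            return json.load(auth_file)
    except OSError:
        # The file is missing or unreadable: create new keys and store them.
        keys = {
            "api_key": secrets.token_hex(16),
            "admin_key": secrets.token_hex(16),
        }
        with open(path, "w", encoding="utf-8") as auth_file:
            json.dump(keys, auth_file)
        return keys
```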
kingbri
a87e474660 OAI: Fix chat completion validation
Validation wasn't properly run on older pydantic, so ChatCompletionRespChoice
was being sent instead of a ChatCompletionMessage when streaming
responses.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
7cbc08fc72 Templates: Add auto-detection from path
This replicates FastChat's model path detection.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
e895eaa4bd OAI: Clarify types in docs
Adding field descriptions shows which parameters are used solely for
OAI compliance and not actually parsed in the model code.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
51ca1ff396 Tree: Switch to Pydantic 2
Pydantic 2 has more modern methods and stability compared to Pydantic 1

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
f631dd6ff7 Templates: Switch to Jinja2
Jinja2 is a lightweight template parser that's used in Transformers
for parsing chat completions. It's much more efficient than Fastchat
and can be imported as part of requirements.

Also allows for unblocking Pydantic's version.

Users now have to provide their own template if needed. A separate
repo may be usable for common prompt template storage.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-18 23:53:47 -05:00
kingbri
95fd0f075e Model: Fix no flash attention
Was being called wrong from config.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-17 23:31:58 -05:00
kingbri
ad8807a830 Model: Add support for num_experts_by_token
New parameter that's safe to edit in exllamav2 v0.0.11. Only recommended
for people who know what they're doing.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-17 18:03:01 -05:00
kingbri
70fbee3edd OAI: Fix model parameter placement
Accidentally edited the Model Card parameters vs the model load request
ones.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-17 14:36:28 -05:00
kingbri
1d0bdfa77c Model + OAI: Fix parameter parsing
Rope alpha changes don't require removing the 1.0 default
from Rope scale.

Keep defaults when possible to avoid errors.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-17 14:28:18 -05:00
Veden
3e57125025
OAI: adding optional draft model properties for draft_rope alpha and scale (#28)
* OAI: adding optional draft model properties for draft_rope alpha and scale

* Forgot to set the properties to None
2023-12-17 19:23:45 +00:00
kingbri
528d58f841 Requirements: Fix AMD
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-17 00:45:43 -05:00