1036 commits

Author SHA1 Message Date
kingbri
c1642076c2 API: Switch unload method to POST
GET and POST can be used interchangeably in this case, but adhere
to the HTTP spec.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-04 21:11:36 -05:00
kingbri
cd4bf99598 OAI: Fix autodoc examples for model loading
Some values weren't defaulting to correct values.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-04 20:53:56 -05:00
kingbri
ceb388e8a0 Start: Override ROCm env variables
These are used for supporting GPUs that are not on the "officially
supported list".

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-02 21:01:18 -05:00
Brian Dashore
c980f35e1b
Merge pull request #47 from Baysul/patch-1
Only try to install one of the EXLv2 wheels
2024-01-02 20:58:59 -05:00
Basil
2460b2f8ef
Only try to install one of the EXLv2 wheels
...depending on Python version.
2024-01-02 16:56:39 -08:00
kingbri
451042aadf Main: Don't load if model_name/loras is blank
Previously, if model_name was commented out, a load would not occur.
Add the case if model_name or loras is blank which returns None when
parsing the YAML.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-02 13:56:25 -05:00
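A minimal sketch of the check this commit describes, assuming the parsed YAML arrives as a plain dictionary. The helper name `should_load_model` is hypothetical and not taken from the repository. A commented-out key is simply absent, while a blank key parses to `None`; both cases should skip the load.

```python
from typing import Optional


def should_load_model(model_config: Optional[dict]) -> bool:
    """Return True only when a non-empty model_name is configured."""
    # A blank "model:" section parses to None, so fall back to an empty dict.
    config = model_config or {}
    # .get() covers the commented-out case (key absent); bool() covers the
    # blank case (value is None or an empty string).
    return bool(config.get("model_name"))


commented_out = should_load_model({})
blank = should_load_model({"model_name": None})
named = should_load_model({"model_name": "my-model"})
```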
kingbri
6b04463051 API: Fix CFG reporting
The model endpoint wasn't reporting if CFG is on.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-02 13:54:16 -05:00
kingbri
bbd4ee54ca Model: Add fallback if negative prompt is empty
Fallback to the BOS token since an empty string won't do anything.
Ideally, an empty negative prompt should not be used, but it's not
the end of the world.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-02 01:46:51 -05:00
kingbri
b378773d0a Model: Add CFG support
CFG, or classifier-free guidance helps push a model in different
directions based on what the user provides.

Currently, CFG is ignored if the negative prompt is blank (it shouldn't
be used in that way anyways).

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-02 01:46:51 -05:00
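For readers unfamiliar with CFG, the commonly used mixing rule can be sketched as follows. This is the general formulation of classifier-free guidance on logits, not necessarily how this project or exllamav2 implements it; the function name `cfg_mix` and the plain-list logits are assumptions made for illustration.

```python
from typing import List


def cfg_mix(cond: List[float], neg: List[float], scale: float) -> List[float]:
    """Push logits away from the negative prompt and toward the positive one.

    scale == 1.0 returns the positive-prompt logits unchanged; larger values
    exaggerate the difference between the two prompts.
    """
    return [n + scale * (c - n) for c, n in zip(cond, neg)]


positive_logits = [2.0, 0.0]
negative_logits = [1.0, 1.0]
guided = cfg_mix(positive_logits, negative_logits, 1.5)
```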
kingbri
bb7a8e4614 Config: Add override argparser
Add an argparser that casts over to dictionaries of subgroups to
integrate with the config.

This argparser doesn't contain everything in the config due to complexity
issues with CLI args, but will eventually progress to parity. In addition,
it's used to override the config.yml rather than replace it.

A config arg is also provided if the user wants to fully override the
config yaml with another file path.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-01 14:27:12 -05:00
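The override behaviour described in this commit could look roughly like the sketch below. The flag names, the `GROUPS` mapping, and the helper functions are all hypothetical; only the idea (CLI arguments grouped into per-section dictionaries and layered over the YAML config) comes from the commit message.

```python
import argparse

# Hypothetical mapping of config subgroups to the CLI options they accept.
GROUPS = {"network": ["host", "port"], "model": ["model_name"]}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag for replacing the config file path entirely.
    parser.add_argument("--config", type=str, default=None)
    parser.add_argument("--host", type=str, default=None)
    parser.add_argument("--port", type=int, default=None)
    parser.add_argument("--model-name", type=str, default=None)
    return parser


def args_to_overrides(args: argparse.Namespace) -> dict:
    """Cast parsed args into {subgroup: {key: value}}, skipping unset options."""
    overrides = {}
    for group, keys in GROUPS.items():
        values = {k: getattr(args, k) for k in keys if getattr(args, k) is not None}
        if values:
            overrides[group] = values
    return overrides


def apply_overrides(config: dict, overrides: dict) -> dict:
    """Layer CLI overrides on top of the config without mutating the original."""
    merged = {group: dict(values or {}) for group, values in config.items()}
    for group, values in overrides.items():
        merged.setdefault(group, {}).update(values)
    return merged


config = {"network": {"host": "127.0.0.1", "port": 5000}, "model": None}
args = build_parser().parse_args(["--port", "8080"])
merged = apply_overrides(config, args_to_overrides(args))
```

Only options that were actually passed on the command line end up in the overrides, so unset flags never clobber values from the file.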
kingbri
7176fa66f0 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-31 11:25:18 -05:00
kingbri
979a9d28a3 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-31 11:22:18 -05:00
kingbri
528d20ca5b Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-31 11:21:13 -05:00
kingbri
72bc30343c Model: Fix frequency penalty fallback
The appropriate branches weren't firing when frequency penalty is
0.0. Also fix repetition penalty overriding.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-31 11:21:07 -05:00
kingbri
47744fe9f7 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-31 01:48:10 -05:00
kingbri
0dc12d82d5 Model: Add fallback for freq and presence pen
Previous behavior aliased freq pen for rep pen. Keep this behavior
when using the freq pen parameter with a legacy exllamav2 version
rather than ignoring both entirely.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-30 00:24:15 -05:00
kingbri
79a57588d5 API: Add template list endpoint
Fetches all template names that a user has in the templates directory
for chat completions.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-29 22:58:55 -05:00
kingbri
dce8c74edc API: Add clarification and cleanup autodocs
It's possible to override parts of the example JSON to give proper
examples of values.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-29 10:28:06 -05:00
kingbri
4136f19058 Config: Make the sample a drop-in solution
With the new wiki, all parameters are fully documented along with
comments in the YAML file itself. This should help new users who
pull, copy the config, and can't start the API due to subsections
being uncommented and read.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-29 01:36:21 -05:00
kingbri
ec929728d9 Model: Read scale_pos_emb from config
In newer versions of exllamav2, this value is read from the model's
config.json. This value will still default to 1.0 anyways.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-28 21:14:24 -05:00
city-unit
e70729b0c0 Update Docker
Squash commit that merges #43, #44, and #45

Create .dockerignore

Make compose marginally better

Un-scuffed the Dockerfile
2023-12-28 18:26:04 -05:00
kingbri
5dc2df68be Model: Repetition penalty range -> penalty range
All penalties can have a sustain (range) applied to them in exl2,
so clarify the parameter.

However, the default behaviors change based on if freq OR pres pen
is enabled. For the sanity of OAI users, have freq and pres pen only
apply on the output tokens when range is -1 (default).

But, repetition penalty still functions the same way where -1 means
the range is the max seq len.

Doing this prevents gibberish output when using the more modern freq
and presence penalties similar to llamacpp.

NOTE: This logic is still subject to change in the future, but I believe
it hits the happy medium for users who want defaults and users who want
to tinker around with the sampling knobs.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-28 18:16:10 -05:00
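One way to read the defaults described in this commit is sketched below. This is an interpretation of the commit message, not the project's code: the function name, the `kind` parameter, and the idea of resolving one range per penalty type are assumptions.

```python
def resolve_penalty_range(
    kind: str, penalty_range: int, max_seq_len: int, num_output_tokens: int
) -> int:
    """Resolve the number of trailing tokens a penalty should look at.

    kind is "repetition", "frequency", or "presence" (hypothetical labels).
    """
    if penalty_range != -1:
        # An explicit range always wins.
        return penalty_range
    if kind in ("frequency", "presence"):
        # Default for freq/pres pen: only the tokens generated so far.
        return num_output_tokens
    # Default for repetition penalty: the whole context window.
    return max_seq_len


freq_default = resolve_penalty_range("frequency", -1, 4096, 12)
rep_default = resolve_penalty_range("repetition", -1, 4096, 12)
explicit = resolve_penalty_range("presence", 256, 4096, 12)
```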
kingbri
c72d30918c Config: Default None -> Empty in comments
Empty makes more sense when talking about empty fields.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-28 00:32:29 -05:00
kingbri
f56221ff0c Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-28 00:31:59 -05:00
kingbri
3622710582 API: Fix num_experts_per_token reporting
This wasn't linked to the model config. This value can be 1 if
a MoE model isn't loaded.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-28 00:31:14 -05:00
kingbri
c5bbfd97b2 Entrypoint: Load loras after model
Prevents an error if the model isn't loaded on startup.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-27 23:55:02 -05:00
kingbri
ee84d892b8 Start: Add shell script
Same as the batch file. Also edit the python script to work when
a venv is clean.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-27 23:53:14 -05:00
kingbri
ac0d6f8869 Tree: Format and cleanup start
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-27 01:17:31 -05:00
kingbri
4d83d1aae4 Start: Switch to python script
Direct python can be used for requirements checking. Remove the ps1
script and create a venv purely in batch.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-27 00:37:53 -05:00
kingbri
a71b96a20c Main: Switch to entrypoint
Allows for other modules to access the startup function.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-27 00:34:50 -05:00
kingbri
e92ef8f5c7 OAI: Fix rep pen range alias
No need to unwrap because the Pydantic alias does that for us.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-25 15:37:11 -05:00
kingbri
7b74cb28e6 Model: Move unsupported sampler check
Overbloated the generation function.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-25 15:29:51 -05:00
kingbri
e256ff8182 Samplers: Add frequency and presence penalty
Un-alias repetition penalty from the frequency penalty parameter.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-25 15:27:32 -05:00
kingbri
442bb59f8f Tests: Remove logger class
The logger module could not be found when calling the test. Re-add
the color logging at a later time.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-25 15:20:39 -05:00
kingbri
162c13752a Requirements: Update to Flash Attention 2.4.1
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-25 14:40:08 -05:00
kingbri
5c08316d18 Start: Switch to Write-Host
Write-Output is equal to a return statement and breaks parts of
the script.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-25 11:59:58 -05:00
kingbri
670ccac19a Start: Add option to not install wheels
Many wheels can also be built from source, so add an option to skip
wheel upgrades/installation if the user uses the start script.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-25 11:49:56 -05:00
kingbri
09ae71aa91 OAI: Add finish to completions
OAI spec requires [DONE] to be sent over SSE to signal that a generation
is completed.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-25 11:25:38 -05:00
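To illustrate the termination signal mentioned in this commit, here is a minimal sketch of a server-sent event stream that ends with the `[DONE]` sentinel. The generator name `sse_events` and the chunk payloads are hypothetical and not taken from the repository.

```python
import json
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield each chunk as an SSE data event, then the [DONE] sentinel."""
    for chunk in chunks:
        # An SSE event is a "data:" line terminated by a blank line.
        yield f"data: {json.dumps(chunk)}\n\n"
    # Tell the client that the generation has completed.
    yield "data: [DONE]\n\n"


events = list(sse_events([{"text": "Hello"}, {"text": " world"}]))
```

A client reading the stream can stop as soon as it sees a data payload equal to `[DONE]` instead of waiting for the connection to close.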
kingbri
cc3229c109 Scripts: Make Start.bat idiotproof
Start now creates a venv, installs the correct requirements, and
starts the API.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-24 20:50:24 -05:00
kingbri
060d422e03 Config: Resolve filepath
This maps the absolute path when loading the config file, making
it safer to load and find the correct path.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-23 23:57:33 -05:00
kingbri
703a114f63 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-23 23:03:28 -05:00
kingbri
c9126c3145 Config: Isolate to a separate file
Reduce dependency of globals in main to simplify code a bit.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-23 23:02:37 -05:00
kingbri
0d2e726e82 Main: Fix import formatting
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-23 21:33:15 -05:00
kingbri
3461f8294f Logging: Clarify preferences
Preferences are preferences, not a config.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-23 21:08:10 -05:00
kingbri
98a7b951b9 Logging: Add newlines to Prompt and Response
Makes things clearer rather than adding an extra space.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-22 23:55:22 -05:00
kingbri
80ef379721 Sampling: Add top-a support
Currently in exllamav2 dev, but will be in the next release.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-22 23:50:24 -05:00
AlpinDale
6a5bbd217c
feat: logging (#39)
* add logging

* simplify the logger

* formatting

* final touches

* fix format

* Model: Add log to metrics

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com>
2023-12-23 04:33:31 +00:00
Brian Dashore
f5314fcdad
Merge pull request #37 from DocShotgun/main
Colab: Expose new config arguments
2023-12-22 12:07:52 -05:00
kingbri
71f6a586f1 Templates: Add error handling for template errors
Similar to the transformers library, add an error handler when an
exception is fired. This relays the error to the user.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-22 11:59:47 -05:00
AlpinDale
fa47f51f85
feat: workflows for formatting/linting (#35)
* add github workflows for pylint and yapf

* yapf

* docstrings for auth

* fix auth.py

* fix generators.py

* fix gen_logging.py

* fix main.py

* fix model.py

* fix templating.py

* fix utils.py

* update formatting.sh to include subdirs for pylint

* fix model_test.py

* fix wheel_test.py

* rename utils to utils_oai

* fix OAI/utils_oai.py

* fix completion.py

* fix token.py

* fix lora.py

* fix common.py

* add pylintrc and fix model.py

* finish up pylint

* fix attribute error

* main.py formatting

* add formatting batch script

* Main: Remove unnecessary global

Linter suggestion.

Signed-off-by: kingbri <bdashore3@proton.me>

* switch to ruff

* Formatting + Linting: Add ruff.toml

Signed-off-by: kingbri <bdashore3@proton.me>

* Formatting + Linting: Switch scripts to use ruff

Also remove the file and recent file change functions from both
scripts.

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Format and lint

Signed-off-by: kingbri <bdashore3@proton.me>

* Scripts + Workflows: Format

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Remove pylint flags

We use ruff now

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Format

Signed-off-by: kingbri <bdashore3@proton.me>

* Formatting: Line length is 88

Use the same value as Black.

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Format

Update to new line length rules.

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com>
Co-authored-by: kingbri <bdashore3@proton.me>
2023-12-22 16:20:35 +00:00