Previously, if model_name was commented out, a load would not occur.
Also handle the case where model_name or loras is blank, since a blank
key is parsed from the YAML as None.
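A minimal sketch of the blank-key handling described above. The helper
names and the "model"/"lora" section names are illustrative assumptions,
and a plain dict stands in for the parsed YAML:

```python
def get_section(config, key):
    """Return a config subsection as a dict, treating missing or blank keys as empty."""
    # A blank key such as `model:` parses to None in YAML, so
    # `config.get(key, {})` alone is not enough: the key exists, but its
    # value is None.
    return config.get(key) or {}


def should_load_model(config):
    """Hypothetical check: only load when model_name is present and non-blank."""
    model_config = get_section(config, "model")
    # Both a missing key and a blank (None) value mean there is nothing to load.
    return bool(model_config.get("model_name"))


# Stand-in for a parsed config where `model_name:` and `lora:` are left blank.
parsed = {"model": {"model_name": None}, "lora": None}
```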
Signed-off-by: kingbri <bdashore3@proton.me>
Fall back to the BOS token since an empty string won't do anything.
Ideally, an empty negative prompt should not be used, but it's not
the end of the world.
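A rough sketch of the fallback, assuming the BOS token is available as a
string. The function name and arguments are hypothetical and not taken
from the codebase:

```python
def resolve_negative_prompt(negative_prompt, bos_token):
    """Return the text to use as the negative prompt."""
    # An empty or missing negative prompt gives the model nothing to
    # condition on, so substitute the BOS token instead.
    if not negative_prompt:
        return bos_token
    return negative_prompt
```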
Signed-off-by: kingbri <bdashore3@proton.me>
CFG, or classifier-free guidance, helps steer a model in different
directions based on what the user provides.
Currently, CFG is ignored if the negative prompt is blank (it shouldn't
be used that way anyway).
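For reference, the usual CFG mixing step can be sketched as below. This
is the generic formulation over two lists of logits, not necessarily how
exllamav2 implements it, and the function name is an assumption:

```python
def apply_cfg(positive_logits, negative_logits, cfg_scale):
    """Push logits away from the negative prompt and toward the positive one."""
    # With cfg_scale == 1.0 the result equals the positive logits. Larger
    # scales extrapolate further away from the negative prompt.
    return [
        neg + cfg_scale * (pos - neg)
        for pos, neg in zip(positive_logits, negative_logits)
    ]
```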
Signed-off-by: kingbri <bdashore3@proton.me>
Add an argparser whose arguments are cast to dictionaries of subgroups
so they integrate with the config.
This argparser doesn't yet cover everything in the config because of
the complexity of CLI args, but it will eventually reach parity. In
addition, it's used to override values in config.yml rather than
replace the file.
A config arg is also provided if the user wants to fully override the
config YAML with another file path.
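One way the subgroup casting and override could look, as a sketch only.
The group names, option names, and helper functions are assumptions for
illustration and do not mirror the real argument list:

```python
import argparse

# Hypothetical subset of config subgroups and their options.
ARG_GROUPS = {
    "network": {"host": str, "port": int},
    "model": {"model_name": str, "max_seq_len": int},
}


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", help="Path to a config YAML to use instead of config.yml")
    for group_name, options in ARG_GROUPS.items():
        group = parser.add_argument_group(group_name)
        for dest, arg_type in options.items():
            group.add_argument("--" + dest.replace("_", "-"), dest=dest, type=arg_type)
    return parser


def args_to_overrides(args):
    """Convert parsed args to {group: {option: value}}, dropping unset options."""
    overrides = {}
    for group_name, options in ARG_GROUPS.items():
        values = {k: getattr(args, k) for k in options if getattr(args, k) is not None}
        if values:
            overrides[group_name] = values
    return overrides


def merge_overrides(config, overrides):
    """Return a copy of config with CLI overrides layered on top of each subgroup."""
    # Blank subsections parse to None, so normalize them to empty dicts first.
    merged = {key: dict(value or {}) for key, value in config.items()}
    for group_name, values in overrides.items():
        merged.setdefault(group_name, {}).update(values)
    return merged


args = build_parser().parse_args(["--port", "5001", "--model-name", "my-model"])
overrides = args_to_overrides(args)
config = merge_overrides(
    {"network": {"host": "127.0.0.1", "port": 5000}, "model": None},
    overrides,
)
```

Only options that were actually passed end up in the overrides, so
values from config.yml survive unless explicitly replaced.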
Signed-off-by: kingbri <bdashore3@proton.me>
The appropriate branches weren't firing when frequency penalty was
0.0. Also fix repetition penalty overriding.
Signed-off-by: kingbri <bdashore3@proton.me>
Previous behavior aliased frequency penalty to repetition penalty.
Keep this behavior when the freq pen parameter is used with a legacy
exllamav2 version rather than ignoring both entirely.
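A sketch of the aliasing fallback. The function name, the
supports_frequency_penalty flag, and the precedence between the two
parameters are assumptions made for illustration:

```python
def resolve_repetition_penalty(
    repetition_penalty, frequency_penalty, supports_frequency_penalty
):
    """Pick the repetition penalty value to hand to the sampler."""
    # Newer backends take frequency penalty separately, so leave the
    # repetition penalty untouched.
    if supports_frequency_penalty or frequency_penalty is None:
        return repetition_penalty
    # Legacy backend (assumed behavior): treat the frequency penalty as an
    # alias for the repetition penalty instead of dropping it.
    return frequency_penalty
```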
Signed-off-by: kingbri <bdashore3@proton.me>
With the new wiki, all parameters are fully documented along with
comments in the YAML file itself. This should help new users who
pull, copy the config, and then can't start the API because
uncommented subsections are being read.
Signed-off-by: kingbri <bdashore3@proton.me>
In newer versions of exllamav2, this value is read from the model's
config.json. This value still defaults to 1.0.
Signed-off-by: kingbri <bdashore3@proton.me>
All penalties can have a sustain (range) applied to them in exl2,
so clarify the parameter.
However, the default behavior changes depending on whether freq or
pres pen is enabled. For the sanity of OAI users, have freq and pres
pen apply only to the output tokens when range is -1 (default).
Repetition penalty still functions the same way: -1 means the range
is the max seq len.
Doing this prevents gibberish output when using the more modern freq
and presence penalties, similar to llamacpp.
NOTE: This logic is still subject to change in the future, but I believe
it hits the happy medium for users who want defaults and users who want
to tinker around with the sampling knobs.
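The default resolution described above can be sketched roughly as
follows. The function and argument names are hypothetical, and the
real logic may differ in details:

```python
def effective_penalty_range(
    penalty_range, uses_freq_or_pres, num_generated, max_seq_len
):
    """Return how many of the most recent tokens the penalties cover."""
    # An explicit range is always respected.
    if penalty_range != -1:
        return penalty_range
    # Default (-1) with freq/pres pen: only penalize generated output tokens.
    if uses_freq_or_pres:
        return num_generated
    # Default (-1) with repetition penalty only: cover the whole context.
    return max_seq_len
```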
Signed-off-by: kingbri <bdashore3@proton.me>
Python can be used directly for requirements checking. Remove the ps1
script and create a venv purely in batch.
Signed-off-by: kingbri <bdashore3@proton.me>
Building from source is a use case for many wheels, so add an option
to skip wheel upgrades/installation when the user runs the start script.
Signed-off-by: kingbri <bdashore3@proton.me>
Resolve the absolute path when loading the config file. This makes
loading safer and ensures the correct path is found.
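A minimal sketch of resolving the path with pathlib. The function name
and the default of config.yml in the working directory are assumptions:

```python
from pathlib import Path


def resolve_config_path(config_path=None):
    """Return an absolute path to the config file."""
    # Default to config.yml relative to the current working directory.
    path = Path(config_path) if config_path else Path("config.yml")
    # resolve() makes the path absolute and collapses any ".." segments.
    return path.resolve()
```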
Signed-off-by: kingbri <bdashore3@proton.me>