Commit graph

26 commits

Author SHA1 Message Date
DocShotgun
a1df22668b API: Add min_tokens
Bans the EOS token until the generation reaches a minimum length. This will not prevent the model from otherwise ending the generation early by outputting other stop conditions.
2024-05-10 12:30:17 -07:00
kingbri
ab526f7278 Revert "API: Remove unncessary Optional signatures"
This reverts commit 7556dcf134.

The Optionals allowed requests to send "null" in the body for optional
parameters which should be allowed.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-02 21:23:48 -04:00
kingbri
7556dcf134 API: Remove unncessary Optional signatures
Optional isn't necessary if the function signature has a default
value.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-01 00:04:52 -04:00
kingbri
6114bfd221 API: Fix banned_tokens string when empty
The string should not be parsed and any non-string elements should
be removed as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 12:46:28 -04:00
kingbri
6f9da97114 API: Add banned_tokens
Appends the banned tokens to the generation. This is equivalent of
setting logit bias to -100 on a specific set of tokens.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:09 -04:00
kingbri
9f93505bc1 OAI: Add skip_special_tokens parameter
Allows the ability to decode special tokens if the user wishes.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-21 00:37:46 -04:00
kingbri
d716527b92 Sampling: Add additive param to overrides
Additive is used to add collections together. Currently, it's used
for lists, but it can be used for dictionaries in the future.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-31 01:10:55 -04:00
kingbri
09a4c79847 Model: Auto-scale max_tokens by default
If max_tokens is None, it automatically scales to fill up the context.
This does not mean the generation will fill up that context since
EOS stops also exist.

Originally suggested by #86

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-18 22:54:59 -04:00
kingbri
efc01d947b API + Model: Add speculative ngram decoding
Speculative ngram decoding is like speculative decoding without the
draft model. It's not as useful because it only decodes on predictable
sequences, but it depends on the usecase.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-13 23:32:11 -04:00
kingbri
5a2de30066 Tree: Update to cleanup globals
Use the module singleton pattern to share global state. This can also
be a modified version of the Global Object Pattern. The main reason
this pattern is used is for ease of use when handling global state
rather than adding extra dependencies for a DI parameter.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00
kingbri
228c227c1e Logging: Switch to loguru
Loguru is a flexible logger that allows for easier hooking and imports
into Rich with no problems. Also makes progress bars stick to the
bottom of the terminal window.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
f6d749c771 Model: Add EBNF grammar support
Using the Outlines library, add support to supply EBNF strings and
pass them to the library for parsing.

From there, a wrapper is created and a filter is passed to generation.

Replace with an in-house solution at some point that's more flexible.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-24 23:40:11 -05:00
kingbri
57b3d69949 API + Model: Add support for JSON schema constraints
Add the ability to constrain the return value of a model to be JSON.
Built using the JSON schema standard to define the properties of what
the model should return.

This feature should be more accurate than using GBNF/EBNF to yield
the same results due to the use of lmformatenforcer.

GBNF/EBNF will be added in a different commit/branch.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-24 23:40:11 -05:00
kingbri
7def32e4de Model: Fix logit bias handling
If the token doesn't exist, gracefully warn instead of erroring out.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-18 18:30:58 -05:00
kingbri
a79c42ff4c Sampling: Make validators simpler
Injecting into Pydantic fields caused issues with serialization for
documentation rendering. Rather than reinvent the wheel again,
switch to a chain of if statements for now. This may change in the
future if subclasses from the base sampler request need to be
validated as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-11 15:28:43 -05:00
kingbri
7e730e3507 Sampling: Add universal validation system
Rather than maintaining yet another function to validate sampler
ranges/values, embed them in fields which allows for less
maintainence in the future.

Also add validation for existing samplers that can corrupt
the sampling stack if set improperly.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-10 14:59:23 -05:00
kingbri
0af6a38af3 Model: Add logprobs support
Returns token offsets, selected tokens, probabilities of tokens
post-sampling, and normalized probability of selecting a token
pre-sampling (for efficiency purposes).

Only for text completions. Chat completions in a later commit.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-08 21:26:53 -05:00
AliCat
bb48f77ca1
Neutralize samplers (#59)
* Update sample_preset.yml

Neutralized the samplers.

* Sampling: Fix dynatemp defaults

Default max temp and min temp is 1.0

* Sampling: Fix TFS defaults

Default is 1.0

---------

Co-authored-by: AliCat <86847834+alicat22@users.noreply.github.com>
Co-authored-by: kingbri <bdashore3@proton.me>
2024-02-08 00:23:09 -05:00
erinmaybe
fa2acb2828
Adds aliases for min_temp and max_temp (#58)
* Adds aliases for min_temp and max_temp

* Sampling: Add dynatemp_exponent alias
2024-02-03 21:51:29 -05:00
kingbri
b827bcbb44 Sampling: Cleanup and update
Cleanup how overrides are handled, class naming, and adopt exllamav2's
model class to enforce latest stable version methods rather than
adding multiple backwards compatability checks.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-02 23:36:17 -05:00
kingbri
634d299fd9 Sampling: Fix smoothing factor default fallback
default_factory, not default_factor

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-02 23:35:15 -05:00
Alexander Abushady
d7c18855e7
added quadratic sampling (#56)
* added quadratic sampling

* Update sample_preset.yml

* oops missed a spot

* Sampling: Fix smoothing factor semantics
2024-02-02 22:12:59 -05:00
kingbri
4a7b8b1b7a Samplers: Add dynamic temperature
Does not work if max_temp is less than or equal to min_temp. Sampler
validation will have to be refactored in the future, so the dynamic
temperature check will also be changed.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-31 01:20:59 -05:00
kingbri
fc4570220c API + Model: Add new parameters and clean up documentation
The example JSON fields were changed because of the new sampler
default strategy. Fix these by manually changing the values.

Also add support for fasttensors and expose generate_window to
the API. It's recommended to not adjust generate_window as it's
dynamically scaled based on max_seq_len by default.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-25 00:15:40 -05:00
kingbri
b14c5443fd API: Add sampler override switching
Allow users to switch the currently overriden samplers via the API
so a restart isn't required to switch the overrides.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-25 00:15:40 -05:00
kingbri
6c30f24c83 Tree: Unify sampler parameters and add override support
Unify API sampler params into a superclass which should make them
easier to manage and inherit generic functions from.

Not all frontends expose all sampling parameters due to connections
with OAI (that handles sampling themselves with the exception of
a few sliders).

Add the ability for the user to customize fallback parameters from
server-side.

In addition, parameters can be forced to a certain value server-side
in case the repo automatically sets other sampler values in the
background that the user doesn't want.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-25 00:15:40 -05:00