Commit graph

58 commits

Author SHA1 Message Date
kingbri
6379081dd8 Sampling: Make add_bos_token override concise
Also set the default to None so text completions follow the same
pattern.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-10 19:07:35 -04:00
kingbri
42346c6b39 Sampling: Remove skip_special_tokens
This parameter is way too confusing and does not make sense in
the modern LLM space.

Change approved by all maintainers.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 22:11:33 -04:00
kingbri
1afc9b983e Model: Remove generate_window
Not required since we error when exceeding max_seq_len.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-16 12:59:02 -04:00
kingbri
5697204e47 Merge branch 'main' into model-rewrite 2025-04-16 02:15:46 -04:00
kingbri
6bb5f8f599 Sampling: Rewrite mirostat_mode parameter
Apparently the "mirostat" parameter has been updated by frontends
to pass a number. ExllamaV2 expects a boolean, but most pass a number
anyway, so just alias mirostat_mode and mirostat together.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-16 02:13:55 -04:00
kingbri
3084ef9fa1 Model + API: Migrate to use BaseSamplerParams
kwargs is pretty ugly when figuring out which arguments to use. The
base request falls back to defaults anyway, so pass in the params
object as is.

However, since Python's typing isn't like TypeScript where types
can be transformed, the type hinting has a possibility of None showing
up despite there always being a value for some params.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-16 00:50:05 -04:00
kingbri
dcb36e9ab2 Model: Remove extra unwraps
The base sampler request already specifies the defaults, so don't
unwrap in this way.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-15 23:38:46 -04:00
kingbri
11ed3cf5ee Model: Cleanup logging and remove extraneous declarations
Log the parameters passed into the generate gen function rather than
the generation settings to reduce complexity.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-15 23:31:12 -04:00
kingbri
c23e406f2d Sampling: Add max_completion_tokens
Conforms with OAI's updated spec

Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
2024-12-13 01:02:37 -05:00
TerminalMan
7d18d2e2ca Refactor the sampling class (#199)
* improve validation

* remove to_gen_params functions

* update changes for all endpoint types

* OAI: Fix calls to generation

Chat completion and completion need to have prompt split out before
pushing to the backend.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Convert Top-K values of -1 to 0

Some OAI implementations use -1 as disabled instead of 0. Therefore,
add a coalesce case.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Format and space out

Make the code more readable.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Fix mirostat

Field items are nested in data within a Pydantic FieldInfo

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Format

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Fix banned_tokens and allowed_tokens conversion

If the provided string has whitespace, trim it before splitting.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Add helpful log to dry_sequence_breakers

Let the user know if the sequence errors out.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Apply validators in right order

Validators need to be applied in order from top to bottom, which is why
the after validator was not being applied properly.

Set the model to validate default params for sampler override purposes.
This can be turned off if there are unclear errors.

Signed-off-by: kingbri <bdashore3@proton.me>

* Endpoints: Format

Cleanup and semantically fix field validators

Signed-off-by: kingbri <bdashore3@proton.me>

* Kobold: Update validators and fix parameter application

Validators on parent fields cannot see child fields. Therefore,
validate using the child fields instead and alter the parent field
data from there.

Also fix badwordsids casting.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Remove validate defaults and fix mirostat

If a user sets an override to a non-default value, that's their
own fault.

Run validator on the actual mirostat_mode parameter rather than
the alternate mirostat parameter.

Signed-off-by: kingbri <bdashore3@proton.me>

* Kobold: Rework badwordsids

Currently, this serves to ban the EOS token. All other functionality
was legacy, so remove it.

Signed-off-by: kingbri <bdashore3@proton.me>

* Model: Remove HuggingfaceConfig

This was only necessary for badwordsids. All other fields are handled
by exl2. Keep the class as a stub if it's needed again.

Signed-off-by: kingbri <bdashore3@proton.me>

* Kobold: Bump kcpp impersonation

TabbyAPI supports XTC now.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Change alias to validation_alias

Reduces the probability for errors and makes the class consistent.

Signed-off-by: kingbri <bdashore3@proton.me>

* OAI: Use constraints for validation

Instead of adding a model_validator, use greater than or equal to
constraints provided by Pydantic.

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Lint

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Co-authored-by: SecretiveShell <84923604+SecretiveShell@users.noreply.github.com>
Co-authored-by: kingbri <bdashore3@proton.me>
2024-10-27 11:43:41 -04:00
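The Top-K coalescing mentioned in the commit above ("Convert Top-K values of -1 to 0") can be sketched roughly as follows. This is a minimal illustration only: the function name `coalesce_top_k` is hypothetical and is not taken from the project, which applies this through its own validators.

```python
def coalesce_top_k(top_k: int) -> int:
    """Map the -1 'disabled' convention onto 0 (hypothetical helper)."""
    # Assumption from the commit message: some OAI implementations send -1
    # to mean "disabled", while 0 is the disabled value used here.
    return 0 if top_k == -1 else top_k
```

Any other value is passed through unchanged, so a request that already uses 0 or a positive Top-K is unaffected.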
kingbri
56ce82ef77 Sampling: Add XTC support
Matches with upstream.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-24 18:10:52 -04:00
TerminalMan
f4791e7ed9 Cleanup config file loader (#208)
* fix config file loader

* prune nonetype values from config dict

fixes default values not initialising properly

* Utils: Shrink None removal function

It is more concise to use a list and dict collection if necessary
rather than iterating through and checking each value. Tested and
works with Tabby's cases.

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Signed-off-by: kingbri <bdashore3@proton.me>
Co-authored-by: kingbri <bdashore3@proton.me>
2024-09-23 21:42:01 -04:00
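As a rough sketch of the None pruning described in the commit above, nested comprehensions can drop None values from a config structure. The name `prune_none` is hypothetical, and this is not necessarily how the project's utility is written.

```python
from typing import Any


def prune_none(value: Any) -> Any:
    """Recursively drop None entries from dicts and lists (hypothetical helper)."""
    if isinstance(value, dict):
        # Keep only keys whose value is not None, pruning nested collections.
        return {k: prune_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [prune_none(v) for v in value if v is not None]
    # Scalars are returned untouched.
    return value
```

With None entries removed, a later merge with defaults can fill in the missing keys instead of being overwritten by explicit None values.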
kingbri
24ea85b3c5 Tree: Use safe loader for YAML
Loaders that read use a safe type while loaders that write use both
round-trip and safe options.

Also don't create module-level parsers where they're not needed.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-18 19:26:51 -04:00
TerminalMan
6c7542de9f migrate all yaml loaders to ruamel.yaml 2024-09-18 11:33:15 +01:00
kingbri
2c3bc71afa Tree: Switch to asynchronous file handling
Using aiofiles, there's no longer a possibility of blocking file operations
that can hang up the event loop. In addition, partially migrate
classes to use asynchronous init instead of the normal python magic method.

The only exception is config, since that's handled in the synchronous
init before the event loop starts.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-10 16:45:14 -04:00
kingbri
dffceab777 Sampling: Link dry_range
Was not linked in the gen params dict.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-08 01:55:52 -04:00
kingbri
9c4a0e650f Sampling: Fix override for DRY sequence breakers
The common type should be an array of strings.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-07 21:38:50 -04:00
kingbri
4f5ca7a4c7 Sampling: Update overrides and params
Re-order to make more sense.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-07 12:48:59 -04:00
kingbri
ae37f3f332 Sampling: Update DRY
Switch to new parameters and remove dry_max_ngram as that's not supposed
to be changed.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-07 12:39:14 -04:00
kingbri
05c3f1194f Sampling: Add rudimentary DRY support
Adds DRY support based on the current exl2 dev API. Only change for
optimization is dry_max_ngram instead of using a closed range.

Currently, DRY range is aliased to dry_max_ngram.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-07 00:48:42 -04:00
kingbri
21712578cf API: Add allowed_tokens support
This is the opposite of banned tokens. Exllama specific implementation
of #181.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-29 21:44:42 -04:00
Ben Gitter
70b9fc95de [WIP] OpenAI Tools Support/Function calling (#154)
* returning stop str if exists from gen

* added chat template for firefunctionv2

* pulling tool vars from template

* adding parsing for tool inputs/outputs

* passing tool data from endpoint to chat template, adding tool_start to the stop list

* loosened typing on the response tool call, leaning more on the user supplying a quality schema if they want a particular format

* non streaming generation prototype

* cleaning template

* Continued work with type, ingestion into template, and chat template for fire func

* Correction - streaming toolcall comes back as delta obj not inside chatcomprespchoice per chat_completion_chunk.py inside OAI lib.

* Ruff Formatting

* Moved stop string and tool updates out of prompt creation func

Updated tool pydantic to match OAI

Support for streaming

Updated generate tool calls to use flag within chat_template and insert tool reminder

* Llama 3.1 chat templates

Updated fire func template

* renamed llama3.1 to chatml_with_headers..

* update name of template

* Support for calling a tool start token rather than the string.

Simplified tool_params

Warning when gen_settings are being overridden because user set temp to 0

Corrected schema and tools to correct types for function args. Str for some reason

* draft groq tool use model template

* changed headers to vars for readability (but mostly because some models are weird about newlines after headers, so this is an easier way to change globally)

* Clean up comments and code in chat comp

* Post processed tool call to meet OAI spec rather than forcing model to write json in a string in the middle of the call.

* changes example back to args as json rather than string of json

* Standardize chat templates to each other

* cleaning/rewording

* stop elements can also be ints (tokens)

* Cleaning/formatting

* added special tokens for tools and tool_response as specified in description

* Cleaning

* removing aux templates - going to live in llm-prompt-templates repo instead

* Tree: Format

Signed-off-by: kingbri <bdashore3@proton.me>

* Chat Completions: Don't include internal tool variables in OpenAPI

Use SkipJsonSchema to suppress inclusion with the OpenAPI JSON. The
location of these variables may need to be changed in the future.

Signed-off-by: kingbri <bdashore3@proton.me>

* Templates: Deserialize metadata on template load

Since we're only looking for specific template variables that are
static in the template, it makes more sense to render when the template
is initialized.

Signed-off-by: kingbri <bdashore3@proton.me>

* Tools: Fix comments

Adhere to the format style of comments in the rest of the project.

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Co-authored-by: Ben Gitter <gitterbd@gmail.com>
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 00:16:25 -04:00
kingbri
e8fc13a1f6 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 18:33:04 -04:00
kingbri
ea80b62e30 Sampling: Reorder aliased params and add kobold aliases
Also add dynatemp range which is an alternative way of calculating
min and max temp.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 18:32:33 -04:00
kingbri
545e26608f Kobold: Move params to aliases
Some of the parameters the API provides are aliases for their OAI
equivalents. It makes more sense to move them to the common file.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 16:46:54 -04:00
kingbri
9fbbc5afca Tree: Swap from map to list comprehensions
List comprehensions are the more "pythonic" way to approach mapping
values to a list. They're also more flexible across different collection
types rather than the inbuilt map method. It's best to keep one convention
rather than splitting down two.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-25 21:16:14 -04:00
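For illustration, the convention described in the commit above looks roughly like this. The variable names are made up for the example and do not come from the project.

```python
token_strings = ["1", "2", "3"]

# Older style: map() returns an iterator that has to be wrapped in list().
via_map = list(map(int, token_strings))

# Preferred style: a list comprehension builds the list directly.
via_comprehension = [int(item) for item in token_strings]

# The same syntax extends to other collection types, e.g. a dict comprehension.
lengths = {name: len(name) for name in ["eos", "bos"]}
```

Both forms produce the same list; the comprehension form simply carries over to sets and dicts without switching idioms.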
kingbri
b9fd8555fe Sampling: Copy over iterable overrides
If an override was iterable, any modifications to the returned value
would alter the reference to the global storage dict.

Therefore, copy the structure if it's an iterable so any modification
won't alter the original override. Also apply this for the function
that checks for forced overrides.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-17 21:38:28 -04:00
DocShotgun
abe411c6fb API + Model: Add support for regex pattern constraints
Adds the ability to constrain generation via regex pattern using lm-format-enforcer.
2024-05-12 19:10:43 -07:00
DocShotgun
9463ecfa40 Samplers: Minor fixes for sampler override
* Add missing settings to sample_preset.yml
* Fix override for skip_special_tokens
2024-05-12 00:31:31 -07:00
kingbri
c8ec742be9 Samplers: Expose skew sampling
Skew is an extra unused sampler in ExllamaV2. Add it in for coverage.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-12 01:41:01 -04:00
kingbri
6f4012d20d API: Add preset listing for sampler overrides
Querying the overrides list endpoint now returns the selected preset
and a list of presets to use.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-12 01:34:51 -04:00
DocShotgun
c0b631ba92 API: Add banned_strings
From exllamav2: List of strings that the generator will refuse to output. As soon as a partial match happens, a checkpoint is saved that the generator can rewind to if need be. Subsequent tokens are then held until the full string is resolved (match or no match) and either emitted or discarded, accordingly.
2024-05-10 13:53:55 -07:00
DocShotgun
a1df22668b API: Add min_tokens
Bans the EOS token until the generation reaches a minimum length. This will not prevent the model from otherwise ending the generation early by outputting other stop conditions.
2024-05-10 12:30:17 -07:00
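Conceptually, the min_tokens behaviour in the commit above amounts to masking the EOS token until enough tokens have been produced. The sketch below works on a plain list of logits and uses a hypothetical function name; it is only an illustration of the idea, not how the backend implements it.

```python
import math


def mask_eos(
    logits: list[float], eos_token_id: int, generated: int, min_tokens: int
) -> list[float]:
    """Ban EOS while fewer than min_tokens tokens exist (hypothetical helper)."""
    if generated >= min_tokens:
        return logits
    masked = list(logits)
    # A logit of -inf gives the token zero probability after softmax.
    masked[eos_token_id] = -math.inf
    return masked
```

As the commit notes, this only blocks the EOS token; other stop conditions such as stop strings can still end the generation early.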
kingbri
ab526f7278 Revert "API: Remove unncessary Optional signatures"
This reverts commit 7556dcf134.

The Optionals allowed requests to send "null" in the body for optional
parameters which should be allowed.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-02 21:23:48 -04:00
kingbri
7556dcf134 API: Remove unncessary Optional signatures
Optional isn't necessary if the function signature has a default
value.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-01 00:04:52 -04:00
kingbri
6114bfd221 API: Fix banned_tokens string when empty
The string should not be parsed and any non-string elements should
be removed as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 12:46:28 -04:00
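A minimal sketch of tolerant parsing for a banned_tokens string, in the spirit of the commit above. It assumes the string is a comma-separated list of token IDs, which the commit message does not state; the function name is hypothetical.

```python
def parse_banned_tokens(raw: str) -> list[int]:
    """Parse a comma-separated token ID string (hypothetical helper)."""
    # An empty or whitespace-only string is not parsed at all.
    if not raw.strip():
        return []
    tokens = []
    for piece in raw.split(","):
        piece = piece.strip()
        # Skip blanks and anything that is not a plain non-negative integer.
        if piece.isdigit():
            tokens.append(int(piece))
    return tokens
```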
kingbri
6f9da97114 API: Add banned_tokens
Appends the banned tokens to the generation. This is equivalent to
setting logit bias to -100 on a specific set of tokens.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:09 -04:00
kingbri
9f93505bc1 OAI: Add skip_special_tokens parameter
Allows the ability to decode special tokens if the user wishes.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-21 00:37:46 -04:00
kingbri
d716527b92 Sampling: Add additive param to overrides
Additive is used to add collections together. Currently, it's used
for lists, but it can be used for dictionaries in the future.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-31 01:10:55 -04:00
kingbri
09a4c79847 Model: Auto-scale max_tokens by default
If max_tokens is None, it automatically scales to fill up the context.
This does not mean the generation will fill up that context since
EOS stops also exist.

Originally suggested by #86

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-18 22:54:59 -04:00
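The auto-scaling in the commit above can be pictured as filling whatever context remains after the prompt. This is a sketch under that assumption; the function and parameter names are hypothetical and the real calculation may differ.

```python
from typing import Optional


def resolve_max_tokens(
    max_tokens: Optional[int], max_seq_len: int, prompt_len: int
) -> int:
    """Fall back to the remaining context when max_tokens is None."""
    if max_tokens is None:
        # Assumption: the budget is the context length minus the prompt length.
        return max(max_seq_len - prompt_len, 0)
    return max_tokens
```

For example, with a 4096-token context and a 1000-token prompt, an unset max_tokens resolves to 3096, while an explicit value is returned unchanged.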
kingbri
efc01d947b API + Model: Add speculative ngram decoding
Speculative ngram decoding is like speculative decoding without the
draft model. It's not as useful because it only decodes on predictable
sequences, but it depends on the use case.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-13 23:32:11 -04:00
kingbri
5a2de30066 Tree: Update to cleanup globals
Use the module singleton pattern to share global state. This can also
be a modified version of the Global Object Pattern. The main reason
this pattern is used is for ease of use when handling global state
rather than adding extra dependencies for a DI parameter.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00
kingbri
228c227c1e Logging: Switch to loguru
Loguru is a flexible logger that allows for easier hooking and imports
into Rich with no problems. Also makes progress bars stick to the
bottom of the terminal window.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
f6d749c771 Model: Add EBNF grammar support
Using the Outlines library, add support to supply EBNF strings and
pass them to the library for parsing.

From there, a wrapper is created and a filter is passed to generation.

Replace with an in-house solution at some point that's more flexible.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-24 23:40:11 -05:00
kingbri
57b3d69949 API + Model: Add support for JSON schema constraints
Add the ability to constrain the return value of a model to be JSON.
Built using the JSON schema standard to define the properties of what
the model should return.

This feature should be more accurate than using GBNF/EBNF to yield
the same results due to the use of lmformatenforcer.

GBNF/EBNF will be added in a different commit/branch.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-24 23:40:11 -05:00
kingbri
7def32e4de Model: Fix logit bias handling
If the token doesn't exist, gracefully warn instead of erroring out.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-18 18:30:58 -05:00
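The "warn instead of erroring out" behaviour in the commit above might look like the sketch below. It operates on a plain list of logits and uses the standard logging module and a hypothetical function name for illustration; the project's own handling is not shown in this log.

```python
import logging

logger = logging.getLogger(__name__)


def apply_logit_bias(
    logits: list[float], logit_bias: dict[int, float]
) -> list[float]:
    """Add biases to known tokens and warn about unknown token IDs."""
    biased = list(logits)
    for token_id, bias in logit_bias.items():
        if 0 <= token_id < len(biased):
            biased[token_id] += bias
        else:
            # Unknown token: skip it rather than raising an IndexError.
            logger.warning("Logit bias token %d does not exist, skipping", token_id)
    return biased
```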
kingbri
a79c42ff4c Sampling: Make validators simpler
Injecting into Pydantic fields caused issues with serialization for
documentation rendering. Rather than reinvent the wheel again,
switch to a chain of if statements for now. This may change in the
future if subclasses from the base sampler request need to be
validated as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-11 15:28:43 -05:00
kingbri
7e730e3507 Sampling: Add universal validation system
Rather than maintaining yet another function to validate sampler
ranges/values, embed them in fields which allows for less
maintenance in the future.

Also add validation for existing samplers that can corrupt
the sampling stack if set improperly.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-10 14:59:23 -05:00
kingbri
0af6a38af3 Model: Add logprobs support
Returns token offsets, selected tokens, probabilities of tokens
post-sampling, and normalized probability of selecting a token
pre-sampling (for efficiency purposes).

Only for text completions. Chat completions in a later commit.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-08 21:26:53 -05:00
AliCat
bb48f77ca1 Neutralize samplers (#59)
* Update sample_preset.yml

Neutralized the samplers.

* Sampling: Fix dynatemp defaults

Default max temp and min temp is 1.0

* Sampling: Fix TFS defaults

Default is 1.0

---------

Co-authored-by: AliCat <86847834+alicat22@users.noreply.github.com>
Co-authored-by: kingbri <bdashore3@proton.me>
2024-02-08 00:23:09 -05:00