The vision module from the ExllamaV2 backend is used in files outside
the backends folder. Therefore, import ExllamaV2 as an optional
dependency here.
Signed-off-by: kingbri <bdashore3@proton.me>
The strings weren't being concatenated properly. Only add the combined
text if the chat completion type is a List.
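As a rough illustration of the list-only concatenation, here is a minimal sketch. It assumes OAI-style message content that is either a plain string or a list of parts shaped like `{"type": "text", "text": ...}`. The helper name `flatten_message_content` is hypothetical, and joining parts with no separator is an assumption.

```python
from typing import Union


def flatten_message_content(content: Union[str, list]) -> str:
    """Hypothetical helper: combine text parts only when content is a list."""
    if isinstance(content, list):
        # Concatenate only the text parts; image parts are handled elsewhere.
        return "".join(
            part.get("text", "") for part in content if part.get("type") == "text"
        )
    # Plain string content passes through unchanged.
    return content
```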
Signed-off-by: kingbri <bdashore3@proton.me>
If vision is enabled and the model doesn't support it, send an
error asking the user to reload. Also, add a method to unload the
vision tower.
Signed-off-by: kingbri <bdashore3@proton.me>
The model_type internal reference was changed to an enum for
a more extendable loading process. Return the current model type
when loading a new model.
Signed-off-by: kingbri <bdashore3@proton.me>
Previously, the flow for parsing chat completion messages and rendering
from the prompt template was disconnected between endpoints. Now, create
a common function to render and handle everything appropriately afterwards.
Signed-off-by: kingbri <bdashore3@proton.me>
Migrate the add method into the class itself. Also, a BaseModel isn't
needed here since this isn't a serialized class.
Signed-off-by: kingbri <bdashore3@proton.me>
Previously, the messages were a list of dicts. These are untyped
and don't provide strict hinting. Add types for chat completion
messages and reformat existing code.
Signed-off-by: kingbri <bdashore3@proton.me>
* More robust checks for OAI chat completion message lists on /v1/encode endpoint
* Added TODO to support other aspects of chat completions
* Fix oversight where embeddings were not defined in advance on the /v1/chat/completions endpoint
* Support image_url inputs containing URLs or base64 strings following OAI vision spec
* Use async lru cache for image embeddings
* Add generic wrapper class for multimodal embeddings
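The `image_url` bullet above distinguishes two input forms from the OAI vision spec: a remote URL or a base64 data URL such as `data:image/png;base64,...`. A minimal standard-library sketch of that distinction follows. The function name `classify_image_url` is hypothetical, and fetching remote URLs is left out on purpose.

```python
import base64
import re
from typing import Tuple

# Matches OAI-style data URLs such as "data:image/png;base64,iVBOR..."
DATA_URL_PATTERN = re.compile(r"^data:[\w/+.-]+;base64,(?P<data>.+)$", re.DOTALL)


def classify_image_url(url: str) -> Tuple[str, object]:
    """Return ("base64", raw_bytes) for data URLs or ("url", url) for http(s) URLs."""
    match = DATA_URL_PATTERN.match(url)
    if match:
        # validate=True rejects characters outside the base64 alphabet
        return "base64", base64.b64decode(match.group("data"), validate=True)
    if url.startswith(("http://", "https://")):
        # Remote images would be downloaded separately by an HTTP client
        return "url", url
    raise ValueError("image_url must be an http(s) URL or a base64 data URL")
```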
If a request sends a dummy model name, it shouldn't error, since the
server caters to clients that expect specific OAI model names. This
is a problem with inline model loading, since these names would error
by default. Therefore, add an exception if the provided name is in the
dummy model names (which also double as inline strict exceptions).
However, the dummy model names weren't configurable, so add a new
option to specify exception names; otherwise, the default is gpt-3.5-turbo.
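A minimal sketch of the exception described above, assuming the check only decides whether an inline load should happen. The function `needs_inline_load` and its parameters are hypothetical and are not necessarily the real config keys; only the gpt-3.5-turbo default comes from the message.

```python
from typing import Optional, Sequence

# Default taken from the commit message; configurable in the real option
DEFAULT_DUMMY_MODEL_NAMES = ("gpt-3.5-turbo",)


def needs_inline_load(
    requested: Optional[str],
    loaded: Optional[str],
    dummy_model_names: Sequence[str] = DEFAULT_DUMMY_MODEL_NAMES,
) -> bool:
    """Hypothetical check: should this request trigger an inline model load?"""
    if not requested or requested in dummy_model_names:
        # Dummy names are placeholders from OAI clients, so keep the current model
        return False
    return requested != loaded
```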
Signed-off-by: kingbri <bdashore3@proton.me>
The admin key check was running even if inline loading was disabled.
Fix this bug, but also preserve the existing permission system when
inline loading is enabled.
Signed-off-by: kingbri <bdashore3@proton.me>
Adds the ability to load vision parts of text + image models. Requires
an explicit flag in config because there isn't a way to automatically
determine whether the vision tower should be used.
Signed-off-by: kingbri <bdashore3@proton.me>
* improve validation
* remove to_gen_params functions
* update changes for all endpoint types
* OAI: Fix calls to generation
Chat completion and completion need to have prompt split out before
pushing to the backend.
Signed-off-by: kingbri <bdashore3@proton.me>
* Sampling: Convert Top-K values of -1 to 0
Some OAI implementations use -1 as disabled instead of 0. Therefore,
add a coalesce case.
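The coalesce case amounts to a one-line mapping. In the actual code it presumably lives inside a Pydantic validator; this standalone sketch uses a hypothetical `coalesce_top_k` function and passes other values through unchanged.

```python
def coalesce_top_k(top_k: int) -> int:
    """Map the OAI-style "disabled" value (-1) to 0, which the sampler treats as disabled."""
    # Any other value is returned as-is in this sketch
    return 0 if top_k == -1 else top_k
```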
Signed-off-by: kingbri <bdashore3@proton.me>
* Sampling: Format and space out
Make the code more readable.
Signed-off-by: kingbri <bdashore3@proton.me>
* Sampling: Fix mirostat
Field items are nested in data within a Pydantic FieldInfo
Signed-off-by: kingbri <bdashore3@proton.me>
* Sampling: Format
Signed-off-by: kingbri <bdashore3@proton.me>
* Sampling: Fix banned_tokens and allowed_tokens conversion
If the provided string has whitespace, trim it before splitting.
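A minimal sketch of the trim-then-split conversion, assuming the tokens arrive as a comma-separated string of integer IDs. The name `parse_token_ids` is hypothetical. Skipping empty pieces (e.g. from a trailing comma) is an extra assumption, not something the message states.

```python
from typing import List, Union


def parse_token_ids(value: Union[str, List[int]]) -> List[int]:
    """Convert a string like " 1, 2,3 " into [1, 2, 3]; lists pass through."""
    if isinstance(value, list):
        return value
    # Trim surrounding whitespace first so a blank string yields no tokens
    value = value.strip()
    if not value:
        return []
    # int() tolerates spaces around each piece; empty pieces are skipped
    return [int(token) for token in value.split(",") if token.strip()]
```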
Signed-off-by: kingbri <bdashore3@proton.me>
* Sampling: Add helpful log to dry_sequence_breakers
Let the user know if the sequence errors out.
Signed-off-by: kingbri <bdashore3@proton.me>
* Sampling: Apply validators in right order
Validators need to be applied in order from top to bottom. This is why
the after validator was not being applied properly.
Set the model to validate default params for sampler override purposes.
This can be turned off if there are unclear errors.
Signed-off-by: kingbri <bdashore3@proton.me>
* Endpoints: Format
Clean up and semantically fix field validators.
Signed-off-by: kingbri <bdashore3@proton.me>
* Kobold: Update validators and fix parameter application
Validators on parent fields cannot see child fields. Therefore,
validate using the child fields instead and alter the parent field
data from there.
Also fix badwordsids casting.
Signed-off-by: kingbri <bdashore3@proton.me>
* Sampling: Remove validate defaults and fix mirostat
If a user sets an override to a non-default value, that's their
own fault.
Run validator on the actual mirostat_mode parameter rather than
the alternate mirostat parameter.
Signed-off-by: kingbri <bdashore3@proton.me>
* Kobold: Rework badwordsids
Currently, this serves to ban the EOS token. All other functionality
was legacy, so remove it.
Signed-off-by: kingbri <bdashore3@proton.me>
* Model: Remove HuggingfaceConfig
This was only necessary for badwordsids. All other fields are handled
by exl2. Keep the class as a stub if it's needed again.
Signed-off-by: kingbri <bdashore3@proton.me>
* Kobold: Bump kcpp impersonation
TabbyAPI supports XTC now.
Signed-off-by: kingbri <bdashore3@proton.me>
* Sampling: Change alias to validation_alias
Reduces the probability of errors and makes the class consistent.
Signed-off-by: kingbri <bdashore3@proton.me>
* OAI: Use constraints for validation
Instead of adding a model_validator, use greater than or equal to
constraints provided by Pydantic.
Signed-off-by: kingbri <bdashore3@proton.me>
* Tree: Lint
Signed-off-by: kingbri <bdashore3@proton.me>
---------
Co-authored-by: SecretiveShell <84923604+SecretiveShell@users.noreply.github.com>
Co-authored-by: kingbri <bdashore3@proton.me>
* Model: Fix inline loading and draft key
There was a lack of foresight between the new config.yml and how
it was structured. The "draft" key became "draft_model" without updating
the API request and inline loading keys to match.
For the API requests, still support "draft" as legacy, but the "draft_model"
key is preferred.
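The legacy fallback can be sketched as a lookup that prefers the new key. This assumes the request body is available as a plain dict. `resolve_draft_args` is a hypothetical name; the real code may express the same preference through Pydantic aliases instead.

```python
from typing import Any, Dict, Optional


def resolve_draft_args(request: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Prefer the "draft_model" key, falling back to the legacy "draft" key."""
    draft = request.get("draft_model")
    if draft is None:
        # Legacy clients still send "draft"
        draft = request.get("draft")
    return draft
```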
Signed-off-by: kingbri <bdashore3@proton.me>
* OAI: Add draft model dir to inline load
This wasn't pushed before, which caused errors from the kwargs being None.
Signed-off-by: kingbri <bdashore3@proton.me>
* Model: Fix draft args application
Draft model args weren't applying since there was a reset due to how
the old override behavior worked.
Signed-off-by: kingbri <bdashore3@proton.me>
* OAI: Change embedding model load params
Use embedding_model_name to be in line with the config.
Signed-off-by: kingbri <bdashore3@proton.me>
* API: Fix parameter for draft model load
Alias name to draft_model_name.
Signed-off-by: kingbri <bdashore3@proton.me>
* API: Fix parameter for template switch
Add prompt_template_name to be more descriptive.
Signed-off-by: kingbri <bdashore3@proton.me>
* API: Fix parameter for model load
Alias name to model_name for config parity.
Signed-off-by: kingbri <bdashore3@proton.me>
* API: Add alias documentation
Signed-off-by: kingbri <bdashore3@proton.me>
---------
Signed-off-by: kingbri <bdashore3@proton.me>
* fix config file loader
* prune nonetype values from config dict
fixes default values not initialising properly
* Utils: Shrink None removal function
It is more concise to use list and dict comprehensions where necessary
rather than iterating through and checking each value. Tested and
works with Tabby's cases.
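A minimal sketch of comprehension-based None pruning, assuming the config is a nested structure of dicts and lists. The name `prune_none` is hypothetical. Dropping None lets unset keys fall back to their defaults instead of overriding them.

```python
from typing import Any


def prune_none(value: Any) -> Any:
    """Recursively drop None entries from dicts and lists using comprehensions."""
    if isinstance(value, dict):
        return {k: prune_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [prune_none(v) for v in value if v is not None]
    # Scalars are returned unchanged
    return value
```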
Signed-off-by: kingbri <bdashore3@proton.me>
---------
Signed-off-by: kingbri <bdashore3@proton.me>
Co-authored-by: kingbri <bdashore3@proton.me>
Make it so any message role can be parsed from a list. It's unclear
why a client would do this, since system and assistant messages shouldn't
send data other than text, but it also doesn't make much sense
to be extremely strict with roles either.
Signed-off-by: kingbri <bdashore3@proton.me>