Some packages, such as ExllamaV2 and V3, require specific versions for
the latest features. Rather than writing a repetitive check for each
package, create a package-agnostic function that checks the installed
version and tells the user to upgrade if it's too old.
This error is also returned in responses to load and unload requests,
so keep it short.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
The HFModel class coalesces all the config files that contain the
assorted keys required for model usage.
Adding this base class lets us adapt as HuggingFace changes its JSON
schemas over time, reducing the burden on backend devs when their next
model isn't supported.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
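To illustrate the idea of coalescing configs, here is a hedged sketch using plain dataclasses and json. It is not the project's HFModel; the from_directory and eos_tokens methods are hypothetical, and the eos_token_id lookup is just one example of a key that can live in more than one file or change type between models:

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class HFModel:
    """Hypothetical sketch: gather HF config files into one lookup object."""

    config: dict = field(default_factory=dict)
    generation_config: dict = field(default_factory=dict)
    tokenizer_config: dict = field(default_factory=dict)

    @classmethod
    def from_directory(cls, model_dir: Path) -> "HFModel":
        def load(name: str) -> dict:
            path = model_dir / name
            # Optional files (e.g. generation_config.json) are often absent, so default to {}
            return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}

        return cls(
            config=load("config.json"),
            generation_config=load("generation_config.json"),
            tokenizer_config=load("tokenizer_config.json"),
        )

    def eos_tokens(self) -> list:
        """eos_token_id may be an int or a list, and may live in either config."""
        value = self.generation_config.get("eos_token_id", self.config.get("eos_token_id"))
        if value is None:
            return []
        return value if isinstance(value, list) else [value]
```

When a schema changes, only the accessor on this one class needs updating rather than every backend.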
This parameter is way too confusing and does not make sense in
the modern LLM space.
Change approved by all maintainers.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
If an inference dependency isn't present, force exit the application.
This check occurs after all subcommands have been processed.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
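The ordering can be sketched as follows, assuming importlib.util.find_spec is used for detection. The dependency list, the --export-config flag, and the function names are all hypothetical; the point is that subcommands needing no inference deps return before the check runs:

```python
import importlib.util
import sys

# Hypothetical list; the real project decides which packages count as inference deps
INFERENCE_DEPS = ["torch", "exllamav2"]


def missing_inference_deps(deps: list[str]) -> list[str]:
    """Return top-level module names that can't be found (no import is performed)."""
    return [name for name in deps if importlib.util.find_spec(name) is None]


def entrypoint(args: list[str], deps: list[str] = INFERENCE_DEPS) -> None:
    # Hypothetical subcommand that doesn't need inference deps, so handle it first
    if "--export-config" in args:
        print("Exported config")
        return

    missing = missing_inference_deps(deps)
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit(f"Missing inference dependencies: {', '.join(missing)}")

    print("Starting server...")
```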
This seemed out of place in the common load function. In addition,
rename the transformers utils signature, since the function actually
takes a directory instead of a file.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
A trailing comma after the description string converts it to a tuple,
which argparse's help formatter can't render.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
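This is a general Python gotcha: a trailing comma after an expression creates a one-element tuple. The help text below is made up, and help_renders is a hypothetical helper that catches whichever error the installed Python version's argparse raises for a non-string help value:

```python
import argparse

# Bug: the trailing comma makes this a one-element tuple, not a string
broken_description = "Load a model with the given backend",

# Fix: no trailing comma (parentheses alone don't create a tuple)
fixed_description = (
    "Load a model with the given backend"
)


def help_renders(help_value) -> bool:
    """Return True if argparse can build help output with this help value."""
    parser = argparse.ArgumentParser(prog="demo")
    try:
        parser.add_argument("--backend", help=help_value)
        parser.format_help()
        return True
    except (TypeError, AttributeError, ValueError):
        return False
```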
Fixes application of sampler parameters by adding a new sampler builder
interface. Also expose the generator class-wide and add wait_for_jobs.
Finally, allow inline loading to specify the backend.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
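The builder idea can be illustrated generically. This is not the backend's sampler builder interface; SamplerBuilder and its methods are hypothetical, and the sketch only shows settings being collected and validated in one place before being applied together:

```python
from dataclasses import dataclass, field


@dataclass
class SamplerBuilder:
    """Hypothetical builder: collect sampler settings, then emit them at once."""

    _settings: dict = field(default_factory=dict)

    def temperature(self, value: float) -> "SamplerBuilder":
        self._settings["temperature"] = value
        return self

    def top_p(self, value: float) -> "SamplerBuilder":
        # Validate at the point of setting rather than deep inside the generator
        if not 0.0 < value <= 1.0:
            raise ValueError("top_p must be in (0, 1]")
        self._settings["top_p"] = value
        return self

    def top_k(self, value: int) -> "SamplerBuilder":
        self._settings["top_k"] = value
        return self

    def build(self) -> dict:
        # Return a copy so later builder calls can't mutate an already-built config
        return dict(self._settings)
```

For example, SamplerBuilder().temperature(0.7).top_p(0.9).build() yields a single dict that a backend could translate into its own settings object.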
This stub fetches the add_eos_token field from the HF tokenizer config.
Ideally, this should be in the backend rather than tabby.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
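A sketch of what such a stub might look like, assuming tokenizer_config.json sits in the model directory and that a missing or null add_eos_token means False. The function name get_add_eos_token is hypothetical:

```python
import json
from pathlib import Path


def get_add_eos_token(model_dir: Path) -> bool:
    """Hypothetical stub: read add_eos_token from tokenizer_config.json."""
    config_path = model_dir / "tokenizer_config.json"
    if not config_path.exists():
        return False  # assumed default when the file is absent

    tokenizer_config = json.loads(config_path.read_text(encoding="utf-8"))
    # The key is optional and may be null, so coerce to a plain bool
    return bool(tokenizer_config.get("add_eos_token"))
```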
Jobs should be started and immediately cleaned up when calling the
generation stream. Expose a stream_generate function and add it to the
base class, since it's more idiomatic than generate_gen.
The exl2 container's generate_gen function is now internal.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
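One way to tie job cleanup to the stream's lifetime is an async generator with try/finally. Only the name stream_generate comes from the commit; BaseContainer, active_jobs, and the token-splitting _generate_gen are hypothetical stand-ins:

```python
import asyncio
from typing import AsyncIterator


class BaseContainer:
    """Hypothetical base class sketching job lifetime around stream_generate."""

    def __init__(self) -> None:
        self.active_jobs: set[str] = set()

    async def _generate_gen(self, request_id: str, prompt: str) -> AsyncIterator[str]:
        # Stand-in for a backend's internal token stream
        for token in prompt.split():
            await asyncio.sleep(0)
            yield token

    async def stream_generate(self, request_id: str, prompt: str) -> AsyncIterator[str]:
        # The job starts when the stream is first iterated
        self.active_jobs.add(request_id)
        try:
            async for chunk in self._generate_gen(request_id, prompt):
                yield chunk
        finally:
            # Runs on completion, on errors, or when the consumer calls aclose()
            self.active_jobs.discard(request_id)
```

Note that breaking out of an async for loop does not close an async generator by itself; callers that stop early should call aclose() so the cleanup runs promptly.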
One goal is to migrate away from kwargs and use the ModelLoadRequest
instead. However, Pydantic doesn't support async validators, which makes
parsing the inline config during validation impossible, since that
parsing uses aiofiles.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
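A common workaround is to do the async file I/O before validation, so the synchronous validator only sees already-loaded data. In this sketch a dataclass stands in for the Pydantic ModelLoadRequest, asyncio.to_thread stands in for aiofiles, and the JSON inline config format and field names are assumptions:

```python
import asyncio
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ModelLoadRequest:
    """Stand-in for the real Pydantic model; field names are hypothetical."""

    model_name: str
    max_seq_len: Optional[int] = None


async def read_inline_config(path: Path) -> dict:
    # Async I/O happens outside validation; asyncio.to_thread replaces aiofiles here
    text = await asyncio.to_thread(path.read_text, encoding="utf-8")
    return json.loads(text)


async def build_load_request(
    model_name: str, inline_config_path: Optional[Path]
) -> ModelLoadRequest:
    overrides = await read_inline_config(inline_config_path) if inline_config_path else {}
    # Construction/validation is synchronous and receives plain data
    return ModelLoadRequest(model_name=model_name, **overrides)
```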
This is applied across containers, so it doesn't make sense to put
this method in the backend.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
The admin takes priority over the regular user. Therefore, if a model
is loading, ignore all incoming generation requests.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
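A minimal sketch of the priority rule, assuming an asyncio.Lock guards model loading and that "ignore" means rejecting the request immediately with an error. ModelManager and its methods are hypothetical:

```python
import asyncio


class ModelManager:
    """Hypothetical sketch: loads (admin) take priority over generations (users)."""

    def __init__(self) -> None:
        self.load_lock = asyncio.Lock()

    async def load_model(self, name: str) -> str:
        async with self.load_lock:
            await asyncio.sleep(0.01)  # stand-in for the real load
            return f"loaded {name}"

    async def generate(self, prompt: str) -> str:
        # Don't queue behind the load; refuse right away so the admin's load wins
        if self.load_lock.locked():
            raise RuntimeError("A model is loading. Please try again later.")
        return f"completion for {prompt!r}"
```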
Apparently frontends have updated the "mirostat" parameter to pass a
number. ExllamaV2 expects a boolean, but most frontends pass a number
anyway, so just alias mirostat_mode and mirostat together.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
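In Pydantic v2 this kind of aliasing is typically expressed with Field(validation_alias=AliasChoices(...)). To stay dependency-free, the sketch below does the same normalization on a plain dict. normalize_mirostat is hypothetical, and the choice to let an explicit mirostat_mode win when both keys are sent is an assumption:

```python
def normalize_mirostat(params: dict) -> dict:
    """Hypothetical helper: accept "mirostat" or "mirostat_mode", store mirostat_mode."""
    normalized = dict(params)
    if "mirostat" in normalized:
        value = normalized.pop("mirostat")
        # Frontends may send a bool or a number; int(True) == 1.
        # An explicit mirostat_mode takes precedence if both are present.
        normalized.setdefault("mirostat_mode", int(value))
    return normalized
```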
kwargs is pretty ugly when figuring out which arguments to use. The
base request falls back to defaults anyway, so pass in the params
object as is.
However, since Python's typing can't transform types the way
TypeScript's can, type hints may still show None as a possibility for
some params even though they always have a value.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
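The contrast between the two styles, as a hedged sketch: BaseSamplerRequest here is a dataclass stand-in for the real request model, and the default values are made up. The assert shows one way to narrow an Optional annotation for type checkers (it is skipped under python -O):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseSamplerRequest:
    """Hypothetical stand-in for the request model, which owns the defaults."""

    temperature: Optional[float] = 1.0
    max_tokens: Optional[int] = 150


# Before: each consumer re-declares defaults and can't see which keys exist
def generate_with_kwargs(prompt: str, **kwargs) -> str:
    temperature = kwargs.get("temperature", 1.0)
    max_tokens = kwargs.get("max_tokens", 150)
    return f"{prompt}|{temperature}|{max_tokens}"


# After: pass the params object through as-is
def generate(prompt: str, params: BaseSamplerRequest) -> str:
    # The fields are annotated Optional, so type checkers still allow None
    # even though the request model always supplies a value; narrow explicitly.
    assert params.temperature is not None and params.max_tokens is not None
    return f"{prompt}|{params.temperature}|{params.max_tokens}"
```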
The base sampler request already specifies the defaults, so don't
unwrap in this way.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Log the parameters passed into the generate_gen function rather than
the generation settings to reduce complexity.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
This shouldn't even be an exposed option since changing it always
breaks inference with the model. Let the model's config.json handle
it.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>