The HFModel class serves to coalesce all config files that contain
random keys which are required for model usage.
Adding this base class allows us to expand as HuggingFace randomly
changes their JSON schemas over time, reducing the brunt that backend
devs need to feel when their next model isn't supported.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
When fetching special tokens from the model, don't factor in the
add_bos_token and ban_eos_token parameters as switches.
In addition, change the internal handling of add_bos_token to an optional
boolean. This allows us to fallback to the model when selecting whether
or not to add the BOS token, especially for chat completions.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Jobs should be started and immediately cleaned up when calling the
generation stream. Expose a stream_generate function and append
this to the base class since it's more idiomatic than generate_gen.
The exl2 container's generate_gen function is now internal.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
The model card is a unified structure for sharing model params.
Rather than kwargs, use this instead.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>