The previous template worked with Python's Jinja2 implementation, but it
didn't meet HF's cross-platform compatibility standards.
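For illustration, the difference typically comes down to avoiding
Python-specific constructs (such as calling str methods) in favor of
built-in Jinja filters. A minimal sketch with a hypothetical template
string, not the actual template shipped here:

    from jinja2 import Environment

    env = Environment()

    # Python-only: calls a str method, which non-Python Jinja
    # implementations may not support
    python_only = env.from_string("{{ message.strip() }}")

    # Portable: uses the built-in trim filter, which other Jinja
    # runtimes also implement
    portable = env.from_string("{{ message | trim }}")

    print(portable.render(message="  hello  "))  # -> "hello"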
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
The props endpoint is a standard used by llamacpp APIs that returns
various properties of the loaded model to the client. It's still
recommended to use /v1/model to get all the parameters a TabbyAPI model has.
Also include the contents of a prompt template when fetching the current
model.
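A hypothetical client-side sketch of the two endpoints (the base URL
and response shapes are illustrative, not guaranteed):

    import requests

    base = "http://localhost:5000"

    # llamacpp-style properties endpoint
    props = requests.get(f"{base}/props").json()

    # Preferred: TabbyAPI's own endpoint, which includes the full
    # parameter set and the prompt template contents
    model = requests.get(f"{base}/v1/model").json()

    print(props)
    print(model)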
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
* Ensure that the length of the positive/negative prompt + max_tokens does not exceed max_seq_len
* Ensure that the total required pages for a CFG request do not exceed the allocated cache_size (see the sketch after this list)
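A rough sketch of the two checks, assuming hypothetical names
(prompt_len, negative_len, page_size) and that cache_size is measured
in tokens:

    def validate_cfg_request(
        prompt_len: int,
        negative_len: int,
        max_tokens: int,
        max_seq_len: int,
        cache_size: int,
        page_size: int = 256,
    ):
        # Each prompt plus the requested generation must fit inside
        # the model's context window
        for length in (prompt_len, negative_len):
            if length + max_tokens > max_seq_len:
                raise ValueError("prompt + max_tokens exceeds max_seq_len")

        # A CFG request keeps two sequences in cache at once, so the
        # combined page count must fit in the allocated cache
        def pages(tokens: int) -> int:
            return -(-tokens // page_size)  # ceiling division

        total = pages(prompt_len + max_tokens) + pages(negative_len + max_tokens)
        if total * page_size > cache_size:
            raise ValueError("CFG request exceeds allocated cache_size")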
Most software has moved to CUDA 12, and cards that aren't supported by
11.8 don't use tabby anyway.
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
The vision module from the ExllamaV2 backend is used in files outside
the backend's contained folder. Therefore, import ExllamaV2 as an
optional dependency here.
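This is the usual guarded-import pattern, shown here with a
hypothetical import path:

    try:
        from exllamav2 import ExLlamaV2VisionTower  # hypothetical path

        HAS_EXLLAMAV2 = True
    except ImportError:
        ExLlamaV2VisionTower = None
        HAS_EXLLAMAV2 = False

    # Callers check HAS_EXLLAMAV2 before touching vision features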
Signed-off-by: kingbri <bdashore3@proton.me>
The strings weren't being concatenated properly. Only add the combined
text if the chat completion type is a List.
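A simplified sketch of the corrected branch (the helper name is
hypothetical; the content shape follows the OAI message format):

    def flatten_content(content) -> str:
        # OAI content is either a plain string or a list of typed
        # parts. Only join the text parts when it's actually a list;
        # a bare string passes through untouched.
        if isinstance(content, list):
            return "".join(
                part.get("text", "")
                for part in content
                if part.get("type") == "text"
            )
        return content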
Signed-off-by: kingbri <bdashore3@proton.me>
If vision is enabled and the model doesn't support it, send an
error asking the user to reload. Also, add a method to unload the
vision tower.
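A rough sketch of both pieces, with hypothetical class and attribute
names:

    class ModelContainer:
        vision_model = None

        def check_vision(self, use_vision: bool):
            if use_vision and self.vision_model is None:
                raise ValueError(
                    "Vision is enabled, but the loaded model doesn't "
                    "support it. Please reload the model."
                )

        def unload_vision_tower(self):
            # Drop the reference so the tower's weights can be freed
            self.vision_model = None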
Signed-off-by: kingbri <bdashore3@proton.me>
The model_type internal reference was changed to an enum for
a more extensible loading process. Return the current model type
when loading a new model.
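In spirit, something like this (member names are hypothetical):

    from enum import Enum

    class ModelType(Enum):
        TEXT = "text"
        VISION = "vision"
        EMBEDDING = "embedding"

    def load_model(path: str) -> ModelType:
        # ... load weights, detect capabilities ...
        model_type = ModelType.TEXT
        # Hand the type back so callers know what was just loaded
        return model_type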
Signed-off-by: kingbri <bdashore3@proton.me>
Previously, the flow for parsing chat completion messages and rendering
from the prompt template was disconnected between endpoints. Now, use a
common function to render the prompt and handle everything appropriately
afterwards.
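A condensed sketch of the shared entry point (function and parameter
names are hypothetical):

    from jinja2 import Template

    def apply_chat_template(
        messages: list,
        template: Template,
        add_generation_prompt: bool = True,
    ) -> str:
        # Every endpoint that needs a rendered prompt goes through
        # this one function instead of rendering the template itself
        return template.render(
            messages=messages,
            add_generation_prompt=add_generation_prompt,
        )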
Signed-off-by: kingbri <bdashore3@proton.me>
Migrate the add method into the class itself. Also, a BaseModel isn't
needed here since this isn't a serialized class.
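In spirit (the class and its fields are hypothetical):

    # Before: a pydantic BaseModel plus a free-standing add function.
    # After: a plain class that owns its own add method, since nothing
    # ever serializes this object.
    class GenerationStats:
        def __init__(self):
            self.tokens = 0

        def add(self, count: int):
            self.tokens += count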
Signed-off-by: kingbri <bdashore3@proton.me>
Previously, the messages were a list of dicts. These are untyped
and don't provide strict hinting. Add types for chat completion
messages and reformat existing code.
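A minimal sketch of the typed message shape (field coverage here is
illustrative):

    from typing import List, Union

    from pydantic import BaseModel

    class ChatCompletionMessagePart(BaseModel):
        type: str
        text: str = ""

    class ChatCompletionMessage(BaseModel):
        role: str
        content: Union[str, List[ChatCompletionMessagePart]]

    # Validation now happens at parse time instead of deep in handlers
    msg = ChatCompletionMessage(role="user", content="Hello!")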
Signed-off-by: kingbri <bdashore3@proton.me>
* Added more robust checks for OAI chat completion message lists on the /v1/encode endpoint (sketched below)
* Added a TODO to support other aspects of chat completions
* Fixed an oversight where embeddings was not defined in advance on the /v1/chat/completions endpoint
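A condensed sketch of the first and third items (the helper name is
hypothetical):

    def messages_to_text(value):
        # Robust check: a chat completion style input must be a list
        # of message dicts whose content is a string
        if isinstance(value, list):
            if not all(
                isinstance(m, dict) and isinstance(m.get("content"), str)
                for m in value
            ):
                raise ValueError("Invalid chat completion message list")
            # TODO: support list-typed content parts and other aspects
            # of chat completions
            return "\n".join(m["content"] for m in value)
        return value

    # On /v1/chat/completions, define embeddings before any branches
    # so later references can't hit an unbound name
    embeddings = None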