Use the 2.3.4 wheels from tgw. However, keep the 2.3.3 wheels in
requirements for now in case the newer wheels don't work.
Signed-off-by: kingbri <bdashore3@proton.me>
Documented in previous commits. Also, for version checking, check the
value of each kwarg rather than whether the key is present, since
requests always pass default values.
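
A minimal sketch of the value check, assuming request options arrive
as a kwargs dict with defaults already filled in. The helper name
uses_option and the option names are hypothetical:

```python
def uses_option(kwargs: dict, name: str) -> bool:
    # A key-presence test ("name in kwargs") always succeeds because
    # requests fill in defaults, so test the value instead.
    return bool(kwargs.get(name))


# Hypothetical request where every option is still at its default
request_kwargs = {"example_option": False, "other_option": 0.0}
```

With this, an option left at a falsy default is treated as unused even
though its key is present.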
Signed-off-by: kingbri <bdashore3@proton.me>
A simple batch script to activate a venv and start TabbyAPI. This
can be used with nssm on Windows to run TabbyAPI as a systemd-like
background service.
Signed-off-by: kingbri <bdashore3@proton.me>
This helps clarify things when users are configuring for the first
time. For example, some users were putting the model name in the
"model" block instead of the "model_name" field.
Signed-off-by: kingbri <bdashore3@proton.me>
Draft models can be loaded via a child object called "draft" in the
POST request. As before, draft models must be located within the draft
model dir to be loaded.
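
A rough sketch of what such a request body could look like. Only the
"draft" child object comes from this commit; the field names "name"
and "draft_model_name" and the model names are assumptions for
illustration:

```python
import json

# Hypothetical load request body with a nested "draft" object
payload = {
    "name": "main-model",  # assumed field name
    "draft": {
        "draft_model_name": "small-draft-model",  # assumed field name
    },
}

# Serialize the payload as it would be sent in the POST request
body = json.dumps(payload)
```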
Signed-off-by: kingbri <bdashore3@proton.me>
Model: Add extra information to print and fix the divide by zero error.
Auth: Fix validation of API and admin keys to look for the entire key.
References #7 and #6
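
A minimal sketch of whole-key validation, assuming keys are plain
strings. key_matches is a hypothetical helper, not the function used
in the codebase:

```python
import secrets


def key_matches(provided: str, expected: str) -> bool:
    # Compare the entire key. A substring test such as
    # "provided in expected" would accept any fragment of the real key.
    return secrets.compare_digest(provided.encode(), expected.encode())
```

secrets.compare_digest is used here for a constant-time comparison; a
plain == check gives the same result.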
Signed-off-by: kingbri <bdashore3@proton.me>
Speculative decoding makes use of a smaller draft model that proposes
tokens which the main model then verifies.
Add options in the config to support this. API options will be added
in a separate commit.
Signed-off-by: kingbri <bdashore3@proton.me>
The stop conditions value was None, causing the model to error out
when trying to add the EOS token to it.
Authentication failed when the Bearer value was an empty string. To
fix this, add a condition that checks the array length.
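
A minimal sketch of the length check, assuming the header value is
split on a space. extract_bearer_token is a hypothetical helper name:

```python
from typing import Optional


def extract_bearer_token(header: str) -> Optional[str]:
    parts = header.split(" ")
    # Check the array length before indexing so that "Bearer" alone,
    # "Bearer " with an empty token, or an empty header is rejected
    # instead of raising an error.
    if len(parts) != 2 or parts[0] != "Bearer" or not parts[1]:
        return None
    return parts[1]
```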
Signed-off-by: kingbri <bdashore3@proton.me>
Add the EOS token to the stop strings after checking kwargs. If
ban_eos_token is on, don't add the EOS token, as an extra precaution.
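
A minimal sketch of this logic, assuming stop strings are a list that
may be None. build_stop_conditions and its parameters are hypothetical
names:

```python
from typing import List, Optional


def build_stop_conditions(
    stop: Optional[List[str]],
    eos_token: str,
    ban_eos_token: bool = False,
) -> List[str]:
    # Treat a missing (None) stop list as empty so appending is safe
    conditions = list(stop or [])
    # Skip the EOS token entirely when it is banned
    if not ban_eos_token and eos_token not in conditions:
        conditions.append(eos_token)
    return conditions
```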
Signed-off-by: kingbri <bdashore3@proton.me>
Responses were not being sent as proper JSON. Only run pydantic's
JSON function on stream responses. FastAPI handles serialization for
static (non-streaming) responses.
Signed-off-by: kingbri <bdashore3@proton.me>
Add safe fallbacks if any part of the config tree doesn't exist. This
prevents unexpected internal server errors from showing up.
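
A minimal sketch of one way to do this, assuming the config is a
nested dict parsed from a file. get_nested and the example keys are
hypothetical:

```python
from typing import Any


def get_nested(config: Any, *keys: str, default: Any = None) -> Any:
    node = config
    for key in keys:
        # Fall back as soon as a level is missing or is not a dict
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    # An empty block parses as None, so fall back in that case too
    return default if node is None else node
```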
Signed-off-by: kingbri <bdashore3@proton.me>
Fastchat pulls in heavy dependencies such as transformers, peft, and
accelerate. It is not useful unless a user wants to add a shim for the
chat completion endpoint.
Instead of requiring it, try importing fastchat and notify the console
if the import fails.
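
A minimal sketch of the optional import. The flag name
fastchat_available and the message text are assumptions:

```python
try:
    import fastchat  # optional dependency, only needed for the shim

    fastchat_available = True
except ImportError:
    fastchat_available = False
    # Notify the console instead of failing at startup
    print("fastchat is not installed; the chat completion shim is unavailable.")
```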
Signed-off-by: kingbri <bdashore3@proton.me>