* enables automatic calculation of NTK-aware alpha scaling for models if the rope_alpha arg is not passed in the config, using the same formula used for draft models
sse_starlette kept firing a ping response if it was taking too long
to set an event. Rather than using a hacky workaround, switch to
FastAPI's inbuilt streaming response and construct SSE requests with
a utility function.
This helps the API become more robust and removes an extra requirement.
Signed-off-by: kingbri <bdashore3@proton.me>
Some APIs require an OAI model to be sent against the models endpoint.
Fix this by adding a GPT 3.5 turbo entry as first in the list to cover
as many APIs as possible.
Signed-off-by: kingbri <bdashore3@proton.me>
Chat completions require a finish reason to be provided in the OAI
spec once the streaming is completed. This is different from a non-
streaming chat completion response.
Also fix some errors that were raised from the endpoint.
References #15
Signed-off-by: kingbri <bdashore3@proton.me>
If the generator errors, there's no proper handling to send an error
packet and close the connection.
This is especially important for unloading models if the load fails
at any stage to reclaim a user's VRAM. Raising an exception caused
the model_container object to lock and not get freed by the GC.
This made sense to propegate SSE errors across all generator functions
rather than relying on abort signals.
Signed-off-by: kingbri <bdashore3@proton.me>
The default encoding method when opening files on Windows is cp1252
which doesn't support all unicode and can cause issues.
Signed-off-by: kingbri <bdashore3@proton.me>
This reverts commit cad144126f.
Change this parameter back to repetition_decay. This is different than
rep_pen_slope used in other backends such as kobold and NAI.
Still keep the fallback condition though.
Signed-off-by: kingbri <bdashore3@proton.me>
Unlike other backends, tabby attempts to generate even if the context
is greater than the max sequence length via truncation of the given
context.
Rather than artifically erroring out, give a warning that outputted
console metrics are going to be incorrect and to make sure that
context <= max_seq_len.
Signed-off-by: kingbri <bdashore3@proton.me>
Alias repetition_penalty_range to repetition_range since that's used
as an internal variable. Perhaps in the future, there should be a function
that allows for iterating through request aliases and give a default value.
Signed-off-by: kingbri <bdashore3@proton.me>
On /v1/model/load, some internal server errors weren't being sent,
so migrate directory checking out and also add a check to make sure
the proposed model path exists.
Signed-off-by: kingbri <bdashore3@proton.me>
Use 2.3.4 from tgw. However, keep the 2.3.3 wheels in requirements
if the newer wheels don't work for now.
Signed-off-by: kingbri <bdashore3@proton.me>
Documented in previous commits. Also make sure that for version checking,
check the value of kwargs instead of if the key is present since requests
pass default values.
Signed-off-by: kingbri <bdashore3@proton.me>
A simple batch script to activate a venv and start TabbyAPI. This
can be used with nssm in Windows for a systemd-like background service.
Signed-off-by: kingbri <bdashore3@proton.me>
This helps clarify things when users are configuring for the first
time. For example, some users were putting the model name in the
"model" block instead of the "model_name" field.
Signed-off-by: kingbri <bdashore3@proton.me>
Models can be loaded with a child object called "draft" in the POST
request. Again, models need to be located within the draft model dir
to get loaded.
Signed-off-by: kingbri <bdashore3@proton.me>
Model: Add extra information to print and fix the divide by zero error.
Auth: Fix validation of API and admin keys to look for the entire key.
References #7 and #6
Signed-off-by: kingbri <bdashore3@proton.me>
Speculative decoding makes use of draft models that ingest the prompt
before forwarding it to the main model.
Add options in the config to support this. API options will occur
in a different commit.
Signed-off-by: kingbri <bdashore3@proton.me>