1051 commits

Author SHA1 Message Date
Brian Dashore
109e4223e0
Merge pull request #18 from DocShotgun/main
Add automatic NTK-aware alpha scaling to model
2023-12-03 01:06:50 -05:00
kingbri
27fc0c0069 Model: Cleanup and compartmentalize auto rope functions
Also handle an edge case if ratio <= 1 since NTK scaling is only
used for values > 1.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-03 01:05:09 -05:00
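A minimal sketch of the edge case this commit describes, assuming the ratio is the configured context length divided by the model's base context length. The fitted curve is not stated in this log, so it is passed in as a parameter; `auto_rope_alpha` and `curve` are hypothetical names, not the project's own.

```python
from typing import Callable


def auto_rope_alpha(
    base_seq_len: int,
    target_seq_len: int,
    curve: Callable[[float], float],
) -> float:
    """Return an NTK alpha for stretching base_seq_len to target_seq_len."""
    ratio = target_seq_len / base_seq_len
    # NTK scaling is only meaningful when the context is being extended,
    # so any ratio <= 1 falls back to the neutral value 1.0.
    if ratio <= 1.0:
        return 1.0
    # Assumption: the caller supplies the alpha-vs-ratio curve.
    return curve(ratio)
```

With this guard, a config context equal to the base context also yields 1.0, which matches the intent of the "Force auto-alpha to 1.0" commit below.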
DocShotgun
bd2c5d0d09
Force auto-alpha to 1.0 if config ctx == base ctx 2023-12-02 21:19:59 -08:00
DocShotgun
1c398b0be7
Add automatic NTK-aware alpha scaling to model
* enables automatic calculation of NTK-aware alpha scaling for models if the rope_alpha arg is not passed in the config, using the same formula used for draft models
2023-12-02 21:02:29 -08:00
kingbri
61f6e51fdb OAI: Add separator style fallback
Some models may return None for separator style with FastChat. Fall
back to LLAMA2 if this is the case.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-01 23:30:19 -05:00
kingbri
ae69b18583 API: Use FastAPI streaming instead of sse_starlette
sse_starlette kept firing a ping response if it was taking too long
to set an event. Rather than using a hacky workaround, switch to
FastAPI's inbuilt streaming response and construct SSE requests with
a utility function.

This helps the API become more robust and removes an extra requirement.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-01 01:54:35 -05:00
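As a rough illustration of "construct SSE requests with a utility function", the sketch below formats Server-Sent Events messages by hand. `sse_packet` and `stream_events` are hypothetical names; the project's actual helper may look different.

```python
import json


def sse_packet(payload: dict) -> str:
    """Format one Server-Sent Events message carrying a JSON payload."""
    # An SSE message is a "data:" line terminated by a blank line.
    return f"data: {json.dumps(payload)}\n\n"


def stream_events(chunks):
    """Yield one SSE message per generated text chunk."""
    for chunk in chunks:
        yield sse_packet({"text": chunk})
```

A generator like `stream_events` can be handed to FastAPI's `StreamingResponse` with `media_type="text/event-stream"`, which removes the need for a separate SSE package.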
kingbri
6493b1d2aa OAI: Add ability to send dummy models
Some APIs require an OAI model to be sent against the models endpoint.
Fix this by adding a GPT 3.5 turbo entry as first in the list to cover
as many APIs as possible.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-01 00:27:28 -05:00
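One way the dummy entry could be added, sketched under assumptions: `list_models` and the `send_dummy` flag are hypothetical, and the entry shape is reduced to the two fields most clients read.

```python
def list_models(local_models, send_dummy=True):
    """Build a models-endpoint style list, optionally led by a dummy entry."""
    entries = [{"id": name, "object": "model"} for name in local_models]
    if send_dummy:
        # The dummy OAI model goes first so clients that only inspect the
        # first entry still find a name they recognise.
        entries.insert(0, {"id": "gpt-3.5-turbo", "object": "model"})
    return entries
```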
kingbri
aef411bed5 OAI: Fix chat completion streaming
Chat completions require a finish reason to be provided in the OAI
spec once the streaming is completed. This is different from a non-
streaming chat completion response.

Also fix some errors that were raised from the endpoint.

References #15

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-01 00:14:24 -05:00
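The finish-reason requirement can be sketched as follows, assuming the usual OAI chunk shape with a `delta` and a `finish_reason` per choice. The function names are illustrative only and the chunks are trimmed to the relevant fields.

```python
def chat_stream_chunk(text=None, finish_reason=None):
    """Build a trimmed-down chat completion chunk."""
    delta = {"content": text} if text is not None else {}
    return {
        "object": "chat.completion.chunk",
        "choices": [
            {"index": 0, "delta": delta, "finish_reason": finish_reason}
        ],
    }


def stream_chat(pieces):
    """Yield content chunks, then a closing chunk with a finish reason."""
    for piece in pieces:
        yield chat_stream_chunk(text=piece)
    # The closing chunk carries an empty delta and an explicit finish reason.
    yield chat_stream_chunk(finish_reason="stop")
```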
Brian Dashore
c4d8c901e1
Merge pull request #13 from ziadloo/main
Adding the usage stat support (prompt_tokens, completion_tokens, and total_tokens)
2023-11-30 01:57:44 -05:00
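The three usage fields named in this pull request relate as in the sketch below; `usage_stats` is a hypothetical helper, not the code that was merged.

```python
def usage_stats(prompt_tokens: int, completion_tokens: int) -> dict:
    """Return an OAI-style usage block for one completion."""
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        # total_tokens is the sum of the other two fields.
        "total_tokens": prompt_tokens + completion_tokens,
    }
```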
kingbri
8a5ac5485b Model: Fix rounding
generated_tokens is always a whole number.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-30 01:55:46 -05:00
kingbri
e703c716ee Merge branch 'main' of https://github.com/ziadloo/tabbyAPI into ziadloo-main 2023-11-30 01:01:48 -05:00
kingbri
56f9b1d1a8 API: Add generator error handling
If the generator errors, there's no proper handling to send an error
packet and close the connection.

This is especially important for unloading models if the load fails
at any stage to reclaim a user's VRAM. Raising an exception caused
the model_container object to lock and not get freed by the GC.

It made sense to propagate SSE errors across all generator functions
rather than relying on abort signals.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-30 00:37:48 -05:00
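A minimal sketch of the idea, assuming errors are reported as a final SSE message rather than raised. `guarded_stream` and the error payload shape are assumptions made for illustration.

```python
import json


def guarded_stream(generator):
    """Relay items as SSE messages; on failure, send an error and stop."""
    try:
        for item in generator:
            yield f"data: {json.dumps(item)}\n\n"
    except Exception as exc:
        # Catching here lets the stream end normally, so the caller can
        # clean up (for example, unload a half-loaded model) afterwards.
        payload = {"error": {"message": str(exc)}}
        yield f"data: {json.dumps(payload)}\n\n"
```

Because the exception never leaves the generator, the connection closes cleanly after the error message instead of being torn down mid-stream.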
kingbri
2bc3da0155 YAML: Force all files to open with utf8
The default encoding method when opening files on Windows is cp1252
which doesn't support all unicode and can cause issues.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 22:04:29 -05:00
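The fix amounts to passing an explicit encoding to `open`, as in this sketch. `read_text_utf8` is a hypothetical helper; the YAML parsing step is left out so the example needs only the standard library.

```python
def read_text_utf8(path: str) -> str:
    """Read a text file as UTF-8 regardless of the platform default."""
    # Without encoding=..., Python uses the locale's default encoding,
    # which on Windows is often cp1252.
    with open(path, "r", encoding="utf8") as f:
        return f.read()
```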
kingbri
3957316b79 Revert "API: Rename repetition_decay -> repetition_slope"
This reverts commit cad144126f.

Change this parameter back to repetition_decay. This is different than
rep_pen_slope used in other backends such as kobold and NAI.

Still keep the fallback condition though.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 22:03:45 -05:00
kingbri
94696543bc Model: Warn user if context > max_seq_len
Unlike other backends, tabby attempts to generate even if the context
is greater than the max sequence length via truncation of the given
context.

Rather than artificially erroring out, give a warning that the console
metrics output will be incorrect and that the user should make sure
context <= max_seq_len.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 01:35:32 -05:00
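The warn-instead-of-error behaviour might look like the sketch below. `context_fits` and the warning text are assumptions; only the comparison against `max_seq_len` comes from the commit message.

```python
import logging

logger = logging.getLogger(__name__)


def context_fits(context_len: int, max_seq_len: int) -> bool:
    """Warn, rather than fail, when the prompt exceeds max_seq_len."""
    if context_len > max_seq_len:
        logger.warning(
            "Context length %d exceeds max_seq_len %d; the prompt will be "
            "truncated and console metrics will be inaccurate.",
            context_len,
            max_seq_len,
        )
        return False
    return True
```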
kingbri
cad144126f API: Rename repetition_decay -> repetition_slope
Also fix the fallback to use 0 for sanity checking and validation.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 01:13:05 -05:00
kingbri
5cbf7f13da OAI: Fix repetition range
Alias repetition_penalty_range to repetition_range since that's used
as an internal variable. Perhaps in the future, there should be a function
that allows for iterating through request aliases and give a default value.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-29 00:53:19 -05:00
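The alias-iterating function the message suggests for the future could be sketched like this. `resolve_alias` is a hypothetical name and the request is modelled as a plain dict.

```python
def resolve_alias(request: dict, names, default=None):
    """Return the first non-None value among several alias keys."""
    for name in names:
        value = request.get(name)
        if value is not None:
            return value
    # None of the aliases was set, so fall back to the default.
    return default
```

For example, `resolve_alias(req, ["repetition_range", "repetition_penalty_range"], -1)` would accept either spelling; the default of `-1` here is purely illustrative.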
Mehran Ziadloo
b0c42d0f05 Leveraging local variables 2023-11-27 20:56:56 -08:00
Mehran Ziadloo
ead503c75b Adding token usage support 2023-11-27 20:05:05 -08:00
kingbri
44e7f7b0ee Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-25 23:47:48 -05:00
Brian Dashore
0914bc313f
Merge pull request #12 from DocShotgun/main
Add start-up shell script for Linux
2023-11-25 00:29:47 -05:00
kingbri
d929e0c826 API: Fix error points and exceptions
On /v1/model/load, some internal server errors weren't being sent,
so migrate directory checking out and also add a check to make sure
the proposed model path exists.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-25 00:27:02 -05:00
DocShotgun
cffd20f580
Add start-up shell script for Linux
- requires user to have already installed the pre-requisites in venv
2023-11-23 19:03:52 -08:00
kingbri
d47c39da54 API: Don't include draft directory in response
The draft directory should be returned for a draft model request (TBD).

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-23 00:07:56 -05:00
kingbri
13c9c09398 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-22 00:20:21 -05:00
kingbri
d25310e55d Requirements: Update Flash Attention 2
Use 2.3.4 from tgw. However, keep the 2.3.3 wheels in requirements
if the newer wheels don't work for now.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-21 22:12:55 -05:00
kingbri
71b9a53336 API: Add temperature_last support
Documented in previous commits. Also make sure that for version checking,
check the value of kwargs instead of if the key is present since requests
pass default values.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-21 21:20:59 -05:00
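The distinction between key presence and key value can be shown with a small sketch. `uses_temperature_last` is a hypothetical helper and the `False` default is an assumption about how the request model fills in missing fields.

```python
def uses_temperature_last(kwargs: dict) -> bool:
    """Report whether the request actually enables temperature_last."""
    # Request models fill in defaults, so the key is always present;
    # only its value says whether the sampler option was requested.
    return bool(kwargs.get("temperature_last", False))
```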
turboderp
3337fe6acc Warning if unsupported samplers are used 2023-11-21 18:35:22 +01:00
turboderp
a54de11cf3 Add new samplers 2023-11-21 18:16:53 +01:00
kingbri
c92ee24bb4 Tree: Add batch script
A simple batch script to activate a venv and start TabbyAPI. This
can be used with nssm in Windows for a systemd-like background service.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-20 01:48:06 -05:00
kingbri
2aa9c145be Auth: Fix an oops with headers
I copy pasted the code wrong.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-20 00:16:12 -05:00
kingbri
39ea730be5 Auth: Allow admin keys to work with api key routes
An admin key is an administrator key, so it makes sense to allow it
for API key routes as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 23:53:07 -05:00
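In sketch form, the rule is that an API key route accepts either key while an admin route accepts only the admin key. Both function names are hypothetical and header parsing is omitted.

```python
def is_valid_api_key(provided: str, api_key: str, admin_key: str) -> bool:
    """API key routes accept the API key or the admin key."""
    return provided == api_key or provided == admin_key


def is_valid_admin_key(provided: str, admin_key: str) -> bool:
    """Admin routes accept only the admin key."""
    return provided == admin_key
```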
turboderp
8ef730f016
Merge pull request #11 from veden/patch-1
Fix incorrect ratio calculation for draft model
2023-11-20 04:23:34 +01:00
Veden
f960fac8ff
Fix incorrect ratio calculation for draft model 2023-11-19 13:12:53 -08:00
kingbri
4cddd0400c Model: Fix draft model loading
Use draft_config to find the path instead of kwargs.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 02:04:02 -05:00
kingbri
698b0b1976 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 01:19:31 -05:00
kingbri
581e1fc219 Sample config: Remove unused value
Draft models are specified in the draft sub-block.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 01:16:03 -05:00
kingbri
e0e93c103b Sample config: Uncomment all parameters
This helps clarify things when users are configuring for the first
time. For example, some users were putting the model name in the
"model" block instead of the "model_name" field.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 01:12:07 -05:00
kingbri
63762654f0 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 01:05:49 -05:00
Brian Dashore
e46676cb08
Merge pull request #9 from city-unit/main
Add basic docker support
2023-11-19 00:53:24 -05:00
kingbri
e4a8848445 Auth: Log API and admin key on startup
Helpful for users who run headless or use Docker.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 00:52:39 -05:00
kingbri
31bc418795 Model: Add context in response output
When printing to the console, give information about the context
(ingestion token count).

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 00:49:32 -05:00
city_unit
80c69939ae Remove unneeded stuffs 2023-11-19 00:34:54 -05:00
kingbri
f47919b1d3 API: Add draft model support
Models can be loaded with a child object called "draft" in the POST
request. Again, models need to be located within the draft model dir
to get loaded.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 00:32:25 -05:00
city_unit
6b22dc0119 Rename, fschat support 2023-11-19 00:32:14 -05:00
city_unit
99cf0b6d7b Add basic docker support 2023-11-19 00:01:17 -05:00
kingbri
6b9af58cc1 Tree: Fix extraneous bugs and update T/s print
Model: Add extra information to print and fix the divide by zero error.
Auth: Fix validation of API and admin keys to look for the entire key.

References #7 and #6

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-18 22:34:40 -05:00
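Two hedged sketches of the fixes named here. The first guards the tokens-per-second print against a zero elapsed time; the second shows one plausible reading of "look for the entire key", namely an exact comparison instead of a substring match. Neither is taken from the project's code, and both function names are hypothetical.

```python
def tokens_per_second(generated_tokens: int, elapsed_seconds: float) -> float:
    """Compute T/s without dividing by zero on very fast generations."""
    if elapsed_seconds <= 0:
        return 0.0
    return generated_tokens / elapsed_seconds


def key_matches(provided: str, expected: str) -> bool:
    """Accept a key only if it equals the expected key in full."""
    # A substring test such as `provided in expected` would also accept
    # fragments of the key (and the empty string), which is not wanted.
    return provided == expected
```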
kingbri
a51889bdb8 Requirements: Update Flash Attention
Bump to version 2.3.3.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-18 22:28:24 -05:00
Brian Dashore
b2410a0436
Merge pull request #4 from waldfee/config_samples
Adds draft model support to config.yml
2023-11-18 13:16:23 -05:00
kingbri
27ebec3b35 Model: Add speculative decoding support via config
Speculative decoding makes use of draft models that ingest the prompt
before forwarding it to the main model.

Add options in the config to support this. API options will occur
in a different commit.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-18 01:42:20 -05:00