Commit graph

153 commits

Author SHA1 Message Date
kingbri
a69ee976f0 API: Let the user know if a disconnect occurred
If a user disconnects from a request, log this in the console.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-09 15:48:27 -05:00
kingbri
4d09226364 Logging: Fix Uvicorn hook
The Uvicorn logging config wasn't being set. Fix that when creating
a new server.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 17:56:48 -05:00
kingbri
2295b12643 Progress: Fix bar with draft models
Show two bars and clarify which bar is which.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:48:06 -05:00
kingbri
cad72315f4 Init: Switch to display redoc endpoint
Redoc looks much better than Swagger docs, so show that by default.
Both endpoints still exist.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
228c227c1e Logging: Switch to loguru
Loguru is a flexible logger that allows for easier hooking and imports
into Rich with no problems. Also makes progress bars stick to the
bottom of the terminal window.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
fe0ff240e7 Progress: Switch to Rich
Rich is a more mature library for displaying progress bars, logging,
and console output. This should help properly align progress bars
within the terminal.

Side note: "We're Rich!"

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
9a007c4707 Model: Add support for Q4 cache
Add this in addition to 8bit cache and 16bit cache. Passing "Q4" with
the cache_mode request parameter will set this on model load.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-06 00:59:28 -05:00
kingbri
0b25c208d6 API: Fix error reporting
Consistently raise an error when a disconnect happens during a load.
It should be safer to warn the user to run unload (or re-run load) if
a model does not load correctly.

Also don't log the traceback for request errors that don't have one.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-05 18:16:02 -05:00
kingbri
165cc6fc2d API: Remove unnecessary endpoint
This used to be a shim for ooba, but it's no longer necessary.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-04 23:21:40 -05:00
kingbri
d2c6ae2d35 API: Back to async
According to FastAPI docs, if you're using a generic function, running
it in async will make it more performant (which makes sense since
running def functions for routes will automatically run the caller
through a threadpool).

Tested and everything works fine.
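
The difference between `def` and `async def` routes can be illustrated without FastAPI itself. The following is a minimal sketch that assumes a framework-style `dispatch` helper (a hypothetical name, not part of FastAPI or this repository): coroutine handlers are awaited on the event loop thread, while plain functions are handed to a worker thread.

```python
import asyncio
import inspect
import threading


def sync_handler() -> str:
    # A plain `def` handler: reports which thread it ran on.
    return threading.current_thread().name


async def async_handler() -> str:
    # An `async def` handler: runs directly on the event loop thread.
    return threading.current_thread().name


async def dispatch(handler):
    """Call a handler roughly the way an async framework would (hypothetical)."""
    if inspect.iscoroutinefunction(handler):
        return await handler()
    # Sync handlers go to a worker thread so they cannot block the loop.
    return await asyncio.to_thread(handler)


async def demo() -> tuple:
    loop_thread = await dispatch(async_handler)
    worker_thread = await dispatch(sync_handler)
    return loop_thread, worker_thread
```

Calling `asyncio.run(demo())` returns two different thread names, which shows the extra thread hop that a `def` handler pays for.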

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-04 23:21:40 -05:00
kingbri
b0c295dd2f API: Add more methods to semaphore
The semaphore/queue model for Tabby is as follows:
- Any load requests go through the semaphore by default
- Any load request can include the skip_queue parameter to bypass
the semaphore
- Any unload requests are immediately executed
- All completion requests are placed inside the semaphore by default

This model preserves the parallelism of single-user mode with extra
convenience methods for queues in multi-user. It also helps mitigate
problems that were previously present in the concurrency stack.

Also change how the program's loop runs so it exits when the API thread
dies.
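
The queue rules listed above can be sketched with a plain `asyncio.Semaphore`. This is only an illustration under stated assumptions: the names `run_queued`, `unload`, and `demo` are hypothetical, a single semaphore slot is assumed, and the sleeps stand in for real loads and completions.

```python
import asyncio


async def run_queued(semaphore, events, name, skip_queue=False):
    """Run a load or completion through the semaphore unless skip_queue is set."""

    async def work():
        events.append(f"start {name}")
        await asyncio.sleep(0.01)  # stand-in for a load or a completion
        events.append(f"end {name}")

    if skip_queue:
        await work()  # bypasses the queue entirely
    else:
        async with semaphore:
            await work()


async def unload(events):
    # Unload requests never wait on the semaphore.
    events.append("unload")


async def demo():
    # Created inside the running loop; one slot is an assumption of this sketch.
    semaphore = asyncio.Semaphore(1)
    events = []
    await asyncio.gather(
        run_queued(semaphore, events, "completion-1"),
        run_queued(semaphore, events, "completion-2"),
        run_queued(semaphore, events, "load", skip_queue=True),
        unload(events),
    )
    return events
```

In this sketch the second completion only starts after the first one ends, while the `skip_queue` load and the unload proceed without waiting.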

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-04 23:21:40 -05:00
kingbri
c82697fef2 API: Fix issues with concurrent requests and queueing
This is the first of many future commits that will overhaul the API
to be more robust and concurrent. The model is admin-first where the
admin can do anything in case something goes awry.

Previously, calls to long running synchronous background tasks would
block the entire API, making it ignore any terminal signals until
generation is completed.

To fix this, leverage FastAPI's run_in_threadpool to offload the long
running tasks to another thread. However, signals to abort the process
still kept the background thread running and made the terminal hang.

This was due to an issue with Uvicorn not propagating the SIGINT signal
across threads in its event loop. To fix this in a catch-all way, run
the API processes in a separate thread so the main thread can still
kill the process if needed.

In addition, make request error logging more robust and refer to the
console for full error logs rather than creating a long message on the
client-side.

Finally, add state checks to see if a model is fully loaded before
generating a completion.
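
The threading layout described in this commit might look roughly like the sketch below. It uses only the standard library: `asyncio.to_thread` stands in for FastAPI's run_in_threadpool, and `blocking_generate`, `handle_request`, `api_thread_main`, and `run_until_api_exits` are hypothetical names rather than functions from this repository.

```python
import asyncio
import threading
import time


def blocking_generate(prompt: str) -> str:
    # Stand-in for a long, synchronous generation call.
    time.sleep(0.05)
    return prompt.upper()


async def handle_request(prompt: str) -> str:
    # Offload the blocking call so the event loop stays responsive.
    return await asyncio.to_thread(blocking_generate, prompt)


def api_thread_main(results: list) -> None:
    # Stand-in for starting the server; here it serves one request and exits.
    results.append(asyncio.run(handle_request("hello")))


def run_until_api_exits() -> list:
    results: list = []
    # daemon=True lets the process exit even if this thread is still busy.
    api_thread = threading.Thread(
        target=api_thread_main, args=(results,), daemon=True
    )
    api_thread.start()
    # Join with a timeout so the main thread can still react to Ctrl+C.
    while api_thread.is_alive():
        api_thread.join(timeout=0.1)
    return results
```

Because the main thread only polls the API thread, a SIGINT is handled there even when the worker is stuck in a long generation.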

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-04 23:21:40 -05:00
kingbri
5a23b9ebc9 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-22 01:28:30 -05:00
kingbri
bee26a2f2c API: Auto-unload on a load request
Automatically unload the existing model when calling /load. This was
requested many times, and does make more sense in the long run.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-21 23:00:11 -05:00
kingbri
949248fb94 Config: Add experimental torch cuda malloc backend
This option saves some VRAM, but does have the chance to error out.
Add this in the experimental config section.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-14 21:45:56 -05:00
kingbri
c02fe4d1db API: Fix response creation
Change chat completion and text completion responses to be more
flexible.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-08 21:26:53 -05:00
kingbri
0af6a38af3 Model: Add logprobs support
Returns token offsets, selected tokens, probabilities of tokens
post-sampling, and normalized probability of selecting a token
pre-sampling (for efficiency purposes).

Only for text completions. Chat completions in a later commit.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-08 21:26:53 -05:00
kingbri
284f20263f API: Clean up tokenizing endpoint
Split the get tokens function into separate wrapper encode and decode
functions for overall code cleanliness.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-08 21:26:53 -05:00
kingbri
58590a6c57 Config: Add option to force streaming off
Many APIs automatically ask for request streaming without giving
the user the option to turn it off. Therefore, give the user more
freedom by giving a server-side kill switch.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-07 21:09:59 -05:00
kingbri
1919bf7705 Launch: Make exllamav2 requirement more friendly
Add the ability to use an unsafe config flag if needed and migrate
the exl2 check to a different file within the exl2 backend code.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-02 23:36:17 -05:00
kingbri
2ea063cea9 Tree: Require exllamav2 version for startup
Exllamav2 is currently supported on all GPUs and versions. Therefore,
it should be expected that users use the latest version of exllamav2 to
get the latest features.

Doing this helps reduce checks that don't really serve any purpose.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-02 23:36:17 -05:00
kingbri
d3781920b3 OAI: Split up utility functions
Just like types, put utility functions in their own separate module
based on the route.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-02 23:36:17 -05:00
kingbri
b14c5443fd API: Add sampler override switching
Allow users to switch the currently overridden samplers via the API
so a restart isn't required to switch the overrides.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-25 00:15:40 -05:00
kingbri
de0ba7214c API: Add template switching and unload endpoints
Templates can be switched and unloaded without reloading the entire
model.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-25 00:15:40 -05:00
kingbri
6c30f24c83 Tree: Unify sampler parameters and add override support
Unify API sampler params into a superclass which should make them
easier to manage and inherit generic functions from.

Not all frontends expose every sampling parameter because they are
built around OAI, which handles sampling itself apart from a few
sliders.

Add the ability for the user to customize fallback parameters from
server-side.

In addition, parameters can be forced to a certain value server-side
in case the repo automatically sets other sampler values in the
background that the user doesn't want.
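
One way to picture the fallback and forced values is the sketch below. The override table layout (`override` and `force` keys) and the helper name `apply_overrides` are assumptions made for illustration, not the repository's actual schema.

```python
# Hypothetical override table: a fallback value plus a force flag per sampler.
OVERRIDES = {
    "temperature": {"override": 0.8, "force": False},
    "top_p": {"override": 0.9, "force": True},
}


def apply_overrides(request_params: dict, overrides: dict) -> dict:
    """Fill in missing sampler values and replace forced ones."""
    resolved = dict(request_params)
    for name, rule in overrides.items():
        # Forced values always win; otherwise only fill in unset parameters.
        if rule.get("force") or resolved.get(name) is None:
            resolved[name] = rule["override"]
    return resolved
```

With this table, a request that sends `temperature=1.2` keeps its own value, while any `top_p` it sends is replaced by the forced 0.9.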

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-25 00:15:40 -05:00
kingbri
78f920eeda Tree: Refactor code organization
Move common functions into their own folder and refactor the backends
to use their own folder as well.

Also clean up imports and alphabetize the import statements themselves.

Finally, move colab and docker into their own folders as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-25 00:15:40 -05:00
kingbri
902e841c39 Main: Add logging for API routes
Helps users get started with accessing the docs.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-10 23:50:11 -05:00
kingbri
c1642076c2 API: Switch unload method to POST
GET and POST can be used interchangeably in this case, but adhere
to the HTTP spec.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-04 21:11:36 -05:00
kingbri
451042aadf Main: Don't load if model_name/loras is blank
Previously, if model_name was commented out, a load would not occur.
Add the case if model_name or loras is blank which returns None when
parsing the YAML.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-02 13:56:25 -05:00
kingbri
6b04463051 API: Fix CFG reporting
The model endpoint wasn't reporting if CFG is on.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-02 13:54:16 -05:00
kingbri
bb7a8e4614 Config: Add override argparser
Add an argparser that casts over to dictionaries of subgroups to
integrate with the config.

This argparser doesn't contain everything in the config due to complexity
issues with CLI args, but will eventually progress to parity. In addition,
it's used to override the config.yml rather than replace it.

A config arg is also provided if the user wants to fully override the
config yaml with another file path.
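
A minimal sketch of casting CLI args into per-subgroup dictionaries follows. The flag names, the `network` and `model` groups, and the helper names are all hypothetical; the actual argparser in this commit may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag for pointing at a different config file.
    parser.add_argument("--config", help="Path to an alternate config YAML")
    network = parser.add_argument_group("network")
    network.add_argument("--network-port", type=int)
    model = parser.add_argument_group("model")
    model.add_argument("--model-max-seq-len", type=int)
    return parser


def args_to_overrides(args: argparse.Namespace, groups=("network", "model")) -> dict:
    """Cast flat args such as network_port into {"network": {"port": ...}}."""
    overrides: dict = {}
    for dest, value in vars(args).items():
        if value is None:
            continue  # flag was not passed, so keep the config value
        group, _, key = dest.partition("_")
        if group in groups and key:
            overrides.setdefault(group, {})[key] = value
    return overrides


def merge_config(base: dict, overrides: dict) -> dict:
    """Layer the overrides on top of the parsed config without mutating it."""
    merged = {group: dict(values) for group, values in base.items()}
    for group, values in overrides.items():
        merged.setdefault(group, {}).update(values)
    return merged
```

Only flags that were actually passed end up in the override dictionary, so the YAML values remain the defaults.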

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-01 14:27:12 -05:00
kingbri
79a57588d5 API: Add template list endpoint
Fetches all template names that a user has in the templates directory
for chat completions.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-29 22:58:55 -05:00
kingbri
dce8c74edc API: Add clarification and cleanup autodocs
It's possible to override parts of the example JSON to give proper
examples of values.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-29 10:28:06 -05:00
kingbri
3622710582 API: Fix num_experts_per_token reporting
This wasn't linked to the model config. This value can be 1 if
a MoE model isn't loaded.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-28 00:31:14 -05:00
kingbri
c5bbfd97b2 Entrypoint: Load loras after model
Prevents an error if the model isn't loaded on startup.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-27 23:55:02 -05:00
kingbri
ac0d6f8869 Tree: Format and cleanup start
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-27 01:17:31 -05:00
kingbri
a71b96a20c Main: Switch to entrypoint
Allows for other modules to access the startup function.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-27 00:34:50 -05:00
kingbri
09ae71aa91 OAI: Add finish to completions
OAI spec requires [DONE] to be sent over SSE to signal that a generation
is completed.
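
As a rough illustration, a server-sent events generator that ends with the `[DONE]` sentinel could look like this. The function name `sse_stream` and the chunk fields are hypothetical; only the `data: ...` framing and the final `[DONE]` event come from the SSE and OAI conventions mentioned above.

```python
import json
from typing import Iterable, Iterator


def sse_stream(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield each chunk as an SSE data event, then the [DONE] sentinel."""
    for chunk in chunks:
        # Each SSE event is a "data:" line followed by a blank line.
        yield f"data: {json.dumps(chunk)}\n\n"
    # Signals to OAI-style clients that the generation has finished.
    yield "data: [DONE]\n\n"
```

A client can stop reading as soon as it sees the `[DONE]` payload instead of waiting for the connection to close.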

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-25 11:25:38 -05:00
kingbri
703a114f63 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-23 23:03:28 -05:00
kingbri
c9126c3145 Config: Isolate to a separate file
Reduce dependency of globals in main to simplify code a bit.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-23 23:02:37 -05:00
kingbri
0d2e726e82 Main: Fix import formatting
Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-23 21:33:15 -05:00
kingbri
3461f8294f Logging: Clarify preferences
Preferences are preferences, not a config.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-23 21:08:10 -05:00
AlpinDale
6a5bbd217c feat: logging (#39)
* add logging

* simplify the logger

* formatting

* final touches

* fix format

* Model: Add log to metrics

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com>
2023-12-23 04:33:31 +00:00
kingbri
71f6a586f1 Templates: Add error handling for template errors
Similar to the transformers library, add an error handler when an
exception is fired. This relays the error to the user.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-22 11:59:47 -05:00
AlpinDale
fa47f51f85 feat: workflows for formatting/linting (#35)
* add github workflows for pylint and yapf

* yapf

* docstrings for auth

* fix auth.py

* fix generators.py

* fix gen_logging.py

* fix main.py

* fix model.py

* fix templating.py

* fix utils.py

* update formatting.sh to include subdirs for pylint

* fix model_test.py

* fix wheel_test.py

* rename utils to utils_oai

* fix OAI/utils_oai.py

* fix completion.py

* fix token.py

* fix lora.py

* fix common.py

* add pylintrc and fix model.py

* finish up pylint

* fix attribute error

* main.py formatting

* add formatting batch script

* Main: Remove unnecessary global

Linter suggestion.

Signed-off-by: kingbri <bdashore3@proton.me>

* switch to ruff

* Formatting + Linting: Add ruff.toml

Signed-off-by: kingbri <bdashore3@proton.me>

* Formatting + Linting: Switch scripts to use ruff

Also remove the file and recent file change functions from both
scripts.

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Format and lint

Signed-off-by: kingbri <bdashore3@proton.me>

* Scripts + Workflows: Format

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Remove pylint flags

We use ruff now

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Format

Signed-off-by: kingbri <bdashore3@proton.me>

* Formatting: Line length is 88

Use the same value as Black.

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Format

Update to new line length rules.

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com>
Co-authored-by: kingbri <bdashore3@proton.me>
2023-12-22 16:20:35 +00:00
kingbri
a14abfe21c Templates: Support bos_token and eos_token fields
These are commonly seen in huggingface provided chat templates and
aren't that difficult to add in.

For feature parity, honor the add_bos_token and ban_eos_token
parameters when constructing the prompt.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-22 10:33:11 -05:00
kingbri
8fa764bfbe Auth: Add option to disable authentication
This creates a massive security hole, but it's gated behind a flag
for users who only use localhost.

A warning will pop up when users disable authentication.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-21 23:40:16 -05:00
kingbri
99a798e117 API: Add auth enforcement to draft list
This didn't have an API key gate.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-21 23:14:04 -05:00
kingbri
1a8afcb6ad Generator: Fix semaphore scheduling
Non-streaming tasks were not regulated by the semaphore, causing these
tasks to interfere with streaming generations. Add helper functions
to take in both sync and async functions for callbacks and sequential
blocking with the semaphore.
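
A helper of the kind described here might be sketched as follows. The name `call_with_semaphore` and the demo tasks are hypothetical, and the sync callback is simply run inline while the semaphore is held.

```python
import asyncio
import inspect
from typing import Any, Callable


async def call_with_semaphore(
    semaphore: asyncio.Semaphore, callback: Callable[..., Any], *args: Any
) -> Any:
    """Run a sync or async callback while holding the semaphore."""
    async with semaphore:
        if inspect.iscoroutinefunction(callback):
            return await callback(*args)
        # Sync callbacks run inline, so they also wait their turn.
        return callback(*args)


async def demo() -> tuple:
    semaphore = asyncio.Semaphore(1)
    active = 0
    peak = 0

    async def streaming_task(n: int) -> int:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)  # track how many tasks overlap
        await asyncio.sleep(0.01)
        active -= 1
        return n

    def non_streaming_task(n: int) -> int:
        return n * 10

    results = await asyncio.gather(
        call_with_semaphore(semaphore, streaming_task, 1),
        call_with_semaphore(semaphore, non_streaming_task, 2),
        call_with_semaphore(semaphore, streaming_task, 3),
    )
    return results, peak
```

With a single slot, the peak number of overlapping tasks stays at one regardless of whether a callback is sync or async.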

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-21 21:39:45 -05:00
kingbri
c9e43e51aa API: Add route for draft model list
Does the same thing as model list except with draft models.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-19 23:45:53 -05:00