Commit graph

275 commits

Author SHA1 Message Date
kingbri
e4084b15c1 Downloader: Format
Make a public function private.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-30 01:16:57 -04:00
kingbri
50e0b71690 Downloader: Fix handling of include pattern
If an include or exclude pattern is provided, include should include
all files by default.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-30 01:13:06 -04:00
kingbri
21a01741c9 Downloader: Add include and exclude parameters
These both take an array of glob strings to state what files or
directories to include or exclude when parsing the download list.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-30 00:58:54 -04:00
kingbri
c47869c606 Downloader: Fix fallback mechanisms
Use None-ish coalescing instead of unwrap optional handling. This means
that any value that is "empty" for python will default to the fallback.

Ex. print("" or "test") will print out "test"

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-29 23:33:37 -04:00
kingbri
55ccd1baad API: Add HuggingFace downloader
Adds an asynchronous huggingface downloader that uses HF hub to fetch
all repo files. The current HF hub package has a snapshot_download
function that does not cancel on KeyboardInterrupt.

Instead, make a downloader that uses the Rich progress bar styling
along with a cancellable interface. Finally, link this to TabbyAPI.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-29 01:15:02 -04:00
kingbri
6114bfd221 API: Fix banned_tokens string when empty
The string should not be parsed and any non-string elements should
be removed as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 12:46:28 -04:00
kingbri
6f9da97114 API: Add banned_tokens
Appends the banned tokens to the generation. This is equivalent of
setting logit bias to -100 on a specific set of tokens.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-28 11:06:09 -04:00
kingbri
ed7cd3cb59 Network: Fix socket check timeout
Make this a one second timeout to check if a socket is connected.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-22 21:33:41 -04:00
kingbri
cab789e685 Templates: Migrate to class
Having many utility functions for initialization doesn't make much sense.
Instead, handle anything regarding template creation inside the
class which reduces the amount of function imports.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-21 23:28:14 -04:00
kingbri
9f93505bc1 OAI: Add skip_special_tokens parameter
Allows the ability to decode special tokens if the user wishes.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-21 00:37:46 -04:00
kingbri
67f061859d Tree: Add transformers_utils
Part of commit 8824ea0205

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-20 00:07:39 -04:00
kingbri
46ac3beea9 Templates: Support list style chat_template keys
HuggingFace updated transformers to provide templates in a list for
tokenizers. Update to support this new format. Providing the name
of a template for the "prompt_template" value in config.yml will also
look inside the template list.

In addition, log if there's a template exception, but continue model
loading since it shouldn't shut down the application.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-07 11:20:25 -04:00
Brian Dashore
cdb96e4f74
Merge pull request #93 from AlpinDale/chore/log-level
chore: make log level configurable via env variable
2024-04-02 00:52:06 -04:00
kingbri
f9f8c97c6d Templates: Fix stop_string parsing
Template modules grab all set vars, including ones that use runtime
vars. If a template var is set to a runtime var and a module is created,
an UndefinedError fires.

Use make_module instead to pass runtime vars when creating a template
module.

Resolves #92

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-02 00:44:04 -04:00
AlpinDale
1650e6e640 ruff 2024-04-01 23:11:30 +00:00
AlpinDale
5e599ddbd4 typo 2024-04-01 23:08:28 +00:00
AlpinDale
6c4a1a9c70 make log level a global var 2024-04-01 23:07:30 +00:00
AlpinDale
031349133b properly order imports 2024-04-01 23:03:16 +00:00
AlpinDale
e90ead3b35 chore: make log level configurable via env variable 2024-04-01 22:57:56 +00:00
kingbri
d716527b92 Sampling: Add additive param to overrides
Additive is used to add collections together. Currently, it's used
for lists, but it can be used for dictionaries in the future.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-31 01:10:55 -04:00
kingbri
dc456f4cc2 Templates: Add stop_strings meta param
Adding the stop_strings var to chat templates will allow for the
template creator to specify stopping strings to add onto chat completions.

Thes get appended with existing stopping strings that are passed
in the API request. However, a sampler override with force: true will
override all stopping strings.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-27 22:22:07 -04:00
kingbri
6dfcbbd813 Common: Migrate request utils to networking
Helps organize the project better. Utils is meant to be for simple
functions like unwrap.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-21 23:21:57 -04:00
kingbri
2961c5f3f9 API: Handle request disconnect on non-streaming gens
Works the same way as streaming gens. If the request is cancelled,
it will log an error to the user and release the semaphore if it's
holding anything.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-21 23:12:59 -04:00
kingbri
09a4c79847 Model: Auto-scale max_tokens by default
If max_tokens is None, it automatically scales to fill up the context.
This does not mean the generation will fill up that context since
EOS stops also exist.

Originally suggested by #86

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-18 22:54:59 -04:00
kingbri
25f5d4a690 API: Cleanup permission endpoint
Don't return an OAI specific type from a common file.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-18 15:13:26 -04:00
kingbri
3c08f46c51 Endpoints: Add key permission checker
This is a definite way to check if an authorized key is API or admin.
The endpoint only runs if the key is valid in the first place to keep
inline with the API's security model.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-18 00:53:27 -04:00
kingbri
14d8ec2007 Signal: Fix signal handlers for uvicorn
Add the ability to override uvicorn's signal handler in addition
to using main's signal handler for any SIGINTs before the API server
starts.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
kingbri
95e44c20d6 Model: Fix load if model didn't load properly
If the model didn't load properly, the container still exists until
unload is called. However, the name check still registered as true.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
kingbri
2755fd1af0 API: Fix blocking iterator execution
Run these iterators on the background thread. On startup, the API
spawns a background thread as needed to run sync code on without blocking
the event loop.

Use asyncio's run_thread function since it allows for errors to be
propegated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
kingbri
7fded4f183 Tree: Switch to async generators
Async generation helps remove many roadblocks to managing tasks
using threads. It should allow for abortables and modern-day paradigms.

NOTE: Exllamav2 itself is not an asynchronous library. It's just
been added into tabby's async nature to allow for a fast and concurrent
API server. It's still being debated to run stream_ex in a separate
thread or manually manage it using asyncio.sleep(0)

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
kingbri
7006fa4cc8 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-13 23:33:18 -04:00
kingbri
efc01d947b API + Model: Add speculative ngram decoding
Speculative ngram decoding is like speculative decoding without the
draft model. It's not as useful because it only decodes on predictable
sequences, but it depends on the usecase.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-13 23:32:11 -04:00
kingbri
2ebefe8258 Logging: Move metrics to gen logging
This didn't have a place in the generation function.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-13 23:13:55 -04:00
kingbri
1ec8eb9620 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-13 00:02:55 -04:00
kingbri
6f03be9523 API: Split functions into their own files
Previously, generation function were bundled with the request function
causing the overall code structure and API to look ugly and unreadable.

Split these up and cleanup a lot of the methods that were previously
overlooked in the API itself.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00
kingbri
5a2de30066 Tree: Update to cleanup globals
Use the module singleton pattern to share global state. This can also
be a modified version of the Global Object Pattern. The main reason
this pattern is used is for ease of use when handling global state
rather than adding extra dependencies for a DI parameter.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00
kingbri
b373b25235 API: Move to ModelManager
This is a shared module  which manages the model container and provides
extra utility functions around it to help slim down the API.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00
kingbri
894be4a818 Startup: Check if the port is available and fallback
Similar to Gradio, fall back to port + 1 if the config port isn't
bindable. If both ports aren't available, let the user know and exit.
An infinite loop of finding a port isn't advisable.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-11 21:57:28 -04:00
kingbri
ba3da6d92f Logging: Escape rich markup sequences
Rich markup sequences inside the log string were causing issues
with printing. Fix this by using their escape function.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-11 00:28:48 -04:00
kingbri
42c0dbe795 Generation: Explicitly release semaphore on disconnect
This prevents any lockups when querying another request.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-10 17:54:48 -04:00
kingbri
045262f51f Logging: Loglevel INFO
This is the max that Tabby should log because debug and trace aren't
used within the application.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-10 17:44:19 -04:00
kingbri
d45e847c7a API: Fix disconnect handling on streaming responses
Starlette's StreamingResponse has an issue where it yields after
a request has disconnected. A bugfix to starlette will fix this
issue, but FastAPI uses starlette <= 0.36 which isn't ideal.

Therefore, switch back to sse-starlette which handles these disconnects
correctly.

Also don't try yielding after the request is disconnected. Just return
out of the generator instead.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-10 17:43:13 -04:00
kingbri
6b4f100db2 Logger: Escape tags
Angle brackets should be escaped to avoid mistaken color formatting.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-10 01:24:50 -05:00
kingbri
c77259bfbb Logger: Fix reformatting of message
Use the reformatted message when splitting lines instead of the
raw message to prevent exceptions.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-09 15:40:37 -05:00
kingbri
4d09226364 Logging: Fix Uvicorn hook
The Uvicorn logging config wasn't being set. Fix that when creating
a new server.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 17:56:48 -05:00
kingbri
c9b4b7c509 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
ef2dc326f5 Logging: Fix inconsistent formatting
Some colorization was incorrect and the separator insertion has become
more robust.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
228c227c1e Logging: Switch to loguru
Loguru is a flexible logger that allows for easier hooking and imports
into Rich with no problems. Also makes progress bars stick to the
bottom of the terminal window.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
fe0ff240e7 Progress: Switch to Rich
Rich is a more mature library for displaying progress bars, logging,
and console output. This should help properly align progress bars
within the terminal.

Side note: "We're Rich!"

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
0b25c208d6 API: Fix error reporting
Make a disconnect on load error consistently. It should be safer to
warn the user to run unload (or re-run load) if a model does not
load correctly.

Also don't log the traceback for request errors that don't have one.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-05 18:16:02 -05:00