Commit graph

133 commits

Author SHA1 Message Date
kingbri
2c3bc71afa Tree: Switch to asynchronous file handling
Using aiofiles, there's no longer a possibility of blocking file operations
that can hang up the event loop. In addition, partially migrate
classes to use asynchronous init instead of the normal python magic method.

The only exception is config, since that's handled in the synchronous
init before the event loop starts.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-10 16:45:14 -04:00
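
The asynchronous init pattern described in the commit above might look roughly like this. It is a minimal sketch, not the project's code: `TemplateStore` and `create` are hypothetical names, and `asyncio.to_thread` from the standard library stands in for aiofiles so the example has no third-party dependency.

```python
import asyncio
import tempfile
from pathlib import Path


class TemplateStore:
    """Hypothetical class whose setup needs file I/O."""

    def __init__(self, path: Path, text: str) -> None:
        # __init__ stays synchronous and performs no I/O.
        self.path = path
        self.text = text

    @classmethod
    async def create(cls, path: Path) -> "TemplateStore":
        # Async factory: the blocking read runs in a worker thread,
        # so the event loop stays free while the file loads.
        text = await asyncio.to_thread(path.read_text, encoding="utf-8")
        return cls(path, text)


async def demo_async_init() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "template.txt"
        path.write_text("hello", encoding="utf-8")
        store = await TemplateStore.create(path)
        return store.text
```

Callers write `store = await TemplateStore.create(path)` instead of `TemplateStore(path)`, which keeps file access out of `__init__`.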
kingbri
d34756dc98 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-05 18:05:59 -04:00
kingbri
1c9991f79e Config: Format and organize
Rename some methods and change comments.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-05 17:59:18 -04:00
Jake
cb91670c7a fix command line args
- move to a complete class singleton to avoid propagation errors
- remove legacy load config procedure
2024-09-05 15:33:00 +01:00
kingbri
93872b34d7 Config: Migrate to global class instead of dicts
The config categories can have defined separation, but preserve
the dynamic nature of adding new config options by making all the
internal class vars as dictionaries.

This was necessary since storing global callbacks stored a state
of the previous global_config var that wasn't populated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-04 23:18:47 -04:00
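
One way to read the commit above: a global class whose categories are dictionaries can be filled in place, so code that was set up before the config loaded still sees the populated values. The sketch below assumes that reading; `Config`, `load`, and the `model_name` key are hypothetical and not taken from the project.

```python
class Config:
    """Hypothetical global config holder; each category is a plain dict."""

    model: dict = {}
    logging: dict = {}

    @classmethod
    def load(cls, raw: dict) -> None:
        # Update the existing dicts in place instead of rebinding them,
        # so references taken before load() still see the new values.
        cls.model.update(raw.get("model", {}))
        cls.logging.update(raw.get("logging", {}))


def make_callback():
    # The callback looks the value up through the class at call time
    # rather than holding a snapshot of the config contents.
    return lambda: Config.model.get("model_name")


callback = make_callback()  # created before the config is populated
Config.load({"model": {"model_name": "example"}})
```

After `Config.load(...)` runs, calling `callback()` reads the populated value even though the callback was created first.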
Jake
fa6404a95a refactor config loading
- improve DRY
- alter logging
- allow extensibility
- add foundation for environment variables as config
2024-09-04 12:22:49 +01:00
kingbri
685e3836e9 Args: Add api-servers to parser
Also run OpenAPI export after args/config are parsed.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-08 16:32:29 -04:00
kingbri
6a0cfd731b Main: Only import psutil when the experimental function is run
Experimental options shouldn't be imported at the top level until the
testing period is over.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 22:00:15 -04:00
kingbri
bfa011e0ce Embeddings: Add model management
Embedding models are managed on a separate backend, but are run
in parallel with the model itself. Therefore, manage this in a separate
container with separate routes.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 15:19:27 -04:00
kingbri
fbf1455db1 Embeddings: Migrate and organize Infinity
Use Infinity as a separate backend and handle the model within the
common module. This separates out the embeddings model from the endpoint
which allows for model loading/unloading in core.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:00:23 -04:00
kingbri
42bc4adcfb Config: Add option to set priority to realtime
Realtime process priority directs system resources toward tabby's
processes. Running as administrator gives realtime priority,
while running as a normal user sets high priority.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-24 21:50:06 -04:00
kingbri
5c082b7e8c Async: Add option to use Uvloop/Winloop
These are faster event loops for asyncio which should improve overall
performance. Gate these under an experimental flag for now to stress
test these loops.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-24 18:59:20 -04:00
kingbri
522999ebb4 Config: Change from gen_logging to logging
More accurately reflects the config.yml's sections.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-22 21:15:16 -04:00
kingbri
6613e38436 Main: Make openapi export store locally
This runs faster than always making a syscall to check if the env
var is set.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 14:54:06 -04:00
kingbri
b907421285 Main: Fix launch if EXPORT_OPENAPI is unset
A default needs to be provided with getenv. Fix that with an empty
string.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 13:41:44 -04:00
kingbri
a59e8ef9e7 Main: Make EXPORT_OPENAPI only work if true or 1
Use truthy values instead of checking if the variable is set.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 12:51:24 -04:00
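
Taken together, the three EXPORT_OPENAPI commits above describe reading the variable with an empty-string default, accepting only `true` or `1`, and keeping the result locally. A minimal sketch of that behavior follows; the helper name `env_flag` and the case-insensitive comparison are assumptions, not the project's code.

```python
import os


def env_flag(name: str) -> bool:
    # Default to "" so a missing variable never yields None,
    # then accept only "true" (any case, by assumption) or "1".
    return os.getenv(name, "").lower() in ("true", "1")


# Read once at startup and keep the result, rather than
# querying the environment on every check.
EXPORT_OPENAPI = env_flag("EXPORT_OPENAPI")
```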
kingbri
933268f7e2 API: Integrate OpenAPI export script
Move OpenAPI export as an env var within the main function. This
allows for easy export by running main.

In addition, an env variable provides global and explicit state to
disable conditional wheel imports (ex. Exl2 and torch) which caused
errors at first.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 12:34:32 -04:00
kingbri
ae879a623f Main: Add await to an async function
load_loras wasn't properly updated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-02 21:24:43 -04:00
kingbri
5bb4995a7c API: Move OAI to APIRouter
This makes the API more modular for other API implementations in the
future.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-06 01:25:31 -04:00
kingbri
6dfcbbd813 Common: Migrate request utils to networking
Helps organize the project better. Utils is meant to be for simple
functions like unwrap.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-21 23:21:57 -04:00
kingbri
14d8ec2007 Signal: Fix signal handlers for uvicorn
Add the ability to override uvicorn's signal handler in addition
to using main's signal handler for any SIGINTs before the API server
starts.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
kingbri
7fded4f183 Tree: Switch to async generators
Async generation helps remove many roadblocks to managing tasks
using threads. It should allow for abortables and modern-day paradigms.

NOTE: Exllamav2 itself is not an asynchronous library. It's just
been added into tabby's async nature to allow for a fast and concurrent
API server. Whether to run stream_ex in a separate thread or manually
manage it using asyncio.sleep(0) is still being debated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
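
The `asyncio.sleep(0)` option mentioned in the note above can be sketched as follows. This only illustrates the idea of wrapping a synchronous loop in an async generator; `sync_tokens` is a hypothetical stand-in for the real generation step, not Exllamav2's API.

```python
import asyncio
from typing import AsyncIterator, Iterator


def sync_tokens() -> Iterator[str]:
    # Stand-in for a synchronous, step-by-step generation loop.
    yield from ("Hello", ",", " world")


async def stream_tokens() -> AsyncIterator[str]:
    for token in sync_tokens():
        yield token
        # Hand control back to the event loop between steps so other
        # tasks (other requests, cancellation) get a chance to run.
        await asyncio.sleep(0)


async def collect_tokens() -> list[str]:
    return [token async for token in stream_tokens()]
```

Each individual step still blocks the loop while it runs; `asyncio.sleep(0)` only creates a switching point between steps, which is the trade-off against using a separate thread.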
kingbri
104a6121cb API: Split into separate folder
Moving the API into its own directory helps compartmentalize it
and allows for cleaning up the main file to just contain bootstrapping
and the entry point.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00
kingbri
5a2de30066 Tree: Update to cleanup globals
Use the module singleton pattern to share global state. This can also
be a modified version of the Global Object Pattern. The main reason
this pattern is used is for ease of use when handling global state
rather than adding extra dependencies for a DI parameter.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00
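
A minimal sketch of the module singleton pattern named in the commit above, assuming the state lives in a module of its own. The names `_container`, `load_container`, and `get_container` are hypothetical.

```python
from typing import Optional

# Module-level state: Python imports a module once and caches it in
# sys.modules, so every importer shares this same variable.
_container: Optional[dict] = None


def load_container(name: str) -> None:
    global _container
    _container = {"name": name}


def get_container() -> dict:
    if _container is None:
        raise RuntimeError("Nothing is loaded.")
    return _container
```

Other modules import these functions and call them directly; no dependency-injection parameter has to be threaded through each call.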
kingbri
b373b25235 API: Move to ModelManager
This is a shared module which manages the model container and provides
extra utility functions around it to help slim down the API.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00
kingbri
894be4a818 Startup: Check if the port is available and fallback
Similar to Gradio, fall back to port + 1 if the config port isn't
bindable. If both ports aren't available, let the user know and exit.
An infinite loop of finding a port isn't advisable.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-11 21:57:28 -04:00
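
The fallback described in the commit above can be sketched with the standard `socket` module. This is an illustration under assumptions, not the project's implementation: `is_port_free` and `pick_port` are hypothetical names, and only IPv4 is checked.

```python
import socket


def is_port_free(host: str, port: int) -> bool:
    # Try to bind; failure means something else holds the port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def pick_port(host: str, port: int) -> int:
    # Check the configured port, then port + 1, and stop there
    # instead of searching indefinitely.
    for candidate in (port, port + 1):
        if is_port_free(host, candidate):
            return candidate
    raise SystemExit(f"Ports {port} and {port + 1} are both unavailable.")
```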
kingbri
7c6fd7ac60 Main: Cleanup
Remove leftover debug statements.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-11 18:10:35 -04:00
kingbri
42c0dbe795 Generation: Explicitly release semaphore on disconnect
This prevents any lockups when querying another request.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-10 17:54:48 -04:00
kingbri
bbb1a4ec20 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-10 17:45:09 -04:00
kingbri
d45e847c7a API: Fix disconnect handling on streaming responses
Starlette's StreamingResponse has an issue where it yields after
a request has disconnected. A bugfix to starlette will fix this
issue, but FastAPI uses starlette <= 0.36 which isn't ideal.

Therefore, switch back to sse-starlette which handles these disconnects
correctly.

Also don't try yielding after the request is disconnected. Just return
out of the generator instead.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-10 17:43:13 -04:00
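
The last point of the commit above, returning from the generator instead of yielding after a disconnect, might look like this in outline. The `is_disconnected` callable is a hypothetical stand-in for whatever disconnect check the framework provides.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def stream_response(
    chunks: AsyncIterator[str],
    is_disconnected: Callable[[], Awaitable[bool]],
) -> AsyncIterator[str]:
    async for chunk in chunks:
        if await is_disconnected():
            # The client is gone: stop here instead of yielding more data.
            return
        yield chunk


async def demo_disconnect() -> list[str]:
    async def chunks() -> AsyncIterator[str]:
        for text in ("a", "b", "c"):
            yield text

    checks = 0

    async def is_disconnected() -> bool:
        # Pretend the client drops before the third chunk is sent.
        nonlocal checks
        checks += 1
        return checks > 2

    return [c async for c in stream_response(chunks(), is_disconnected)]
```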
kingbri
a69ee976f0 API: Let the user know if a disconnect occurred
If a user disconnects from a request, log this in the console.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-09 15:48:27 -05:00
kingbri
4d09226364 Logging: Fix Uvicorn hook
The Uvicorn logging config wasn't being set. Fix that when creating
a new server.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 17:56:48 -05:00
kingbri
2295b12643 Progress: Fix bar with draft models
Show two bars and clarify which bar is which.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:48:06 -05:00
kingbri
cad72315f4 Init: Switch to display redoc endpoint
Redoc looks much better than Swagger docs, so show that by default.
Both endpoints still exist.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
228c227c1e Logging: Switch to loguru
Loguru is a flexible logger that allows for easier hooking and imports
into Rich with no problems. Also makes progress bars stick to the
bottom of the terminal window.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
fe0ff240e7 Progress: Switch to Rich
Rich is a more mature library for displaying progress bars, logging,
and console output. This should help properly align progress bars
within the terminal.

Side note: "We're Rich!"

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
9a007c4707 Model: Add support for Q4 cache
Add this in addition to 8bit cache and 16bit cache. Passing "Q4" with
the cache_mode request parameter will set this on model load.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-06 00:59:28 -05:00
kingbri
0b25c208d6 API: Fix error reporting
Consistently disconnect on a load error. It should be safer to
warn the user to run unload (or re-run load) if a model does not
load correctly.

Also don't log the traceback for request errors that don't have one.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-05 18:16:02 -05:00
kingbri
165cc6fc2d API: Remove unnecessary endpoint
This used to be a shim for ooba, but it's no longer necessary.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-04 23:21:40 -05:00
kingbri
d2c6ae2d35 API: Back to async
According to FastAPI docs, if you're using a generic function, running
it in async will make it more performant (which makes sense since
running def functions for routes will automatically run the caller
through a threadpool).

Tested and everything works fine.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-04 23:21:40 -05:00
kingbri
b0c295dd2f API: Add more methods to semaphore
The semaphore/queue model for Tabby is as follows:
- Any load requests go through the semaphore by default
- Any load request can include the skip_queue parameter to bypass
the semaphore
- Any unload requests are immediately executed
- All completion requests are placed inside the semaphore by default

This model preserves the parallelism of single-user mode with extra
convenience methods for queues in multi-user. It also helps mitigate
problems that were previously present in the concurrency stack.

Also change how the program's loop runs so it exits when the API thread
dies.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-04 23:21:40 -05:00
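
The queue model listed in the commit above can be approximated with a single-slot `asyncio.Semaphore`. This is a sketch under assumptions: `RequestQueue` and `run` are hypothetical names, and a real server would route unload requests around the queue entirely.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RequestQueue:
    """Hypothetical wrapper around a single-slot semaphore."""

    def __init__(self) -> None:
        self._semaphore = asyncio.Semaphore(1)

    async def run(
        self, job: Callable[[], Awaitable[T]], skip_queue: bool = False
    ) -> T:
        if skip_queue:
            # Bypass the queue, as a load request with skip_queue would.
            return await job()
        async with self._semaphore:
            # The slot is released on exit, even if the job raises
            # or is cancelled.
            return await job()


async def demo_queue() -> list[str]:
    queue = RequestQueue()
    order: list[str] = []

    async def completion(name: str) -> None:
        order.append(f"{name} start")
        await asyncio.sleep(0.01)
        order.append(f"{name} end")

    await asyncio.gather(
        queue.run(lambda: completion("a")),
        queue.run(lambda: completion("b")),
    )
    return order
```

With one slot, the second completion starts only after the first finishes, which is the queueing behavior the commit describes for completion requests.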
kingbri
c82697fef2 API: Fix issues with concurrent requests and queueing
This is the first of many future commits that will overhaul the API
to be more robust and concurrent. The model is admin-first where the
admin can do anything in case something goes awry.

Previously, calls to long running synchronous background tasks would
block the entire API, making it ignore any terminal signals until
generation is completed.

To fix this, leverage FastAPI's run_in_threadpool to offload the long
running tasks to another thread. However, signals to abort the process
still kept the background thread running and made the terminal hang.

This was due to an issue with Uvicorn not propagating the SIGINT signal
across threads in its event loop. To fix this in a catch-all way, run
the API processes in a separate thread so the main thread can still
kill the process if needed.

In addition, make request error logging more robust and refer to the
console for full error logs rather than creating a long message on the
client-side.

Finally, add state checks to see if a model is fully loaded before
generating a completion.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-04 23:21:40 -05:00
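
The offloading step in the commit above can be illustrated with the standard library. The sketch uses `asyncio.to_thread` in place of FastAPI's `run_in_threadpool` to stay dependency-free; `long_generation` and `heartbeat` are hypothetical stand-ins.

```python
import asyncio
import time


def long_generation() -> str:
    # Stand-in for a long-running synchronous task.
    time.sleep(0.05)
    return "done"


async def heartbeat(ticks: list) -> None:
    # Makes progress only while the event loop is not blocked.
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def demo_offload() -> tuple:
    ticks: list = []
    result, _ = await asyncio.gather(
        asyncio.to_thread(long_generation),  # runs in a worker thread
        heartbeat(ticks),                    # keeps running on the loop
    )
    return result, len(ticks)
```

Calling `long_generation()` directly inside a coroutine would stall the heartbeat until it returned, which mirrors the blocking behavior the commit set out to fix.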
kingbri
5a23b9ebc9 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-22 01:28:30 -05:00
kingbri
bee26a2f2c API: Auto-unload on a load request
Automatically unload the existing model when calling /load. This was
requested many times, and does make more sense in the long run.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-21 23:00:11 -05:00
kingbri
949248fb94 Config: Add experimental torch cuda malloc backend
This option saves some VRAM, but does have the chance to error out.
Add this in the experimental config section.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-14 21:45:56 -05:00
kingbri
c02fe4d1db API: Fix response creation
Change chat completion and text completion responses to be more
flexible.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-08 21:26:53 -05:00
kingbri
0af6a38af3 Model: Add logprobs support
Returns token offsets, selected tokens, probabilities of tokens
post-sampling, and normalized probability of selecting a token
pre-sampling (for efficiency purposes).

Only for text completions. Chat completions in a later commit.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-08 21:26:53 -05:00
kingbri
284f20263f API: Clean up tokenizing endpoint
Split the get tokens function into separate wrapper encode and decode
functions for overall code cleanliness.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-08 21:26:53 -05:00
kingbri
58590a6c57 Config: Add option to force streaming off
Many APIs automatically ask for request streaming without giving
the user the option to turn it off. Therefore, give the user more
freedom by giving a server-side kill switch.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-07 21:09:59 -05:00
kingbri
1919bf7705 Launch: Make exllamav2 requirement more friendly
Add the ability to use an unsafe config flag if needed and migrate
the exl2 check to a different file within the exl2 backend code.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-02 23:36:17 -05:00