Commit graph

28 commits

Author SHA1 Message Date
kingbri
d339139fb6 Config: Deep merge model overrides
Anything below the first level of kwargs was not being merged properly.
A more bulletproof solution would be to refactor the loading code
to separate draft and normal model parameters.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-07-03 12:17:09 -04:00
kingbri
3649d3bb51 Tree: Format + Lint
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-26 02:14:30 -04:00
Vhallo
1aefa01a68
Fix RoPE Ratio 2025-04-21 01:46:18 +02:00
kingbri
8e238fa8f6 Model: Move calculate_rope_alpha from backend
Makes more sense to use as a utility function. Also clarify how the
vars are set.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-20 18:20:19 -04:00
kingbri
027ffce05d Utils: Remove unused defer utils
These did not work anyways

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-20 17:59:09 -04:00
kingbri
11ed3cf5ee Model: Cleanup logging and remove extraneous declarations
Log the parameters passed into the generate gen function rather than
the generation settings to reduce complexity.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-15 23:31:12 -04:00
TerminalMan
f4791e7ed9
Cleanup config file loader (#208)
* fix config file loader

* prune nonetype values from config dict

fixes default values not initialising properly

* Utils: Shrink None removal function

It is more concise to use a list and dict collection if necessary
rather than iterating through and checking each value. Tested and
works with Tabby's cases.

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Signed-off-by: kingbri <bdashore3@proton.me>
Co-authored-by: kingbri <bdashore3@proton.me>
2024-09-23 21:42:01 -04:00
TerminalMan
948fcb7f5b migrate to ruamel.yaml 2024-09-18 01:06:34 +01:00
kingbri
ebe7f3567e Config: Alter migration error handling and cleanup
Rollback to the old config if automigration fails.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-16 18:02:18 -04:00
kingbri
81ae461eb8 Config: Allow existing values to get included in generated file
Allows for generation from an existing config file. Primarily used
for migration purposes.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-16 12:19:58 -04:00
TerminalMan
564bdcf0a8 add legacy config converter 2024-09-16 14:12:47 +01:00
kingbri
a09dd802c2 Config: Cleanup and organize functions
Remove access of private attributes and use safer functions. Also
move generalized functions into utils files.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-14 21:48:39 -04:00
kingbri
93872b34d7 Config: Migrate to global class instead of dicts
The config categories can have defined separation, but preserve
the dynamic nature of adding new config options by making all the
internal class vars as dictionaries.

This was necessary since storing global callbacks stored a state
of the previous global_config var that wasn't populated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-04 23:18:47 -04:00
Jake
e772fa2981 Switch to internal dict merge implementation
- remove deepmerge dependency
- fix ruff formatting
2024-09-04 16:27:28 +01:00
kingbri
7522b1447b Model: Add support for HuggingFace config and bad_words_ids
This is necessary for Kobold's API. Current models use bad_words_ids
in generation_config.json, but for some reason, they're also present
in the model's config.json.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-26 18:23:22 -04:00
kingbri
b9fd8555fe Sampling: Copy over iterable overrides
If an override was iterable, any modifications to the returned value
would alter the reference to the global storage dict.

Therefore, copy the structure if it's an iterable so any modification
won't alter the original override. Also apply this for the function
that checks for forced overrides.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-17 21:38:28 -04:00
kingbri
6dfcbbd813 Common: Migrate request utils to networking
Helps organize the project better. Utils is meant to be for simple
functions like unwrap.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-21 23:21:57 -04:00
kingbri
2961c5f3f9 API: Handle request disconnect on non-streaming gens
Works the same way as streaming gens. If the request is cancelled,
it will log an error to the user and release the semaphore if it's
holding anything.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-21 23:12:59 -04:00
kingbri
7fded4f183 Tree: Switch to async generators
Async generation helps remove many roadblocks to managing tasks
using threads. It should allow for abortables and modern-day paradigms.

NOTE: Exllamav2 itself is not an asynchronous library. It's just
been added into tabby's async nature to allow for a fast and concurrent
API server. It's still being debated to run stream_ex in a separate
thread or manually manage it using asyncio.sleep(0)

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
kingbri
894be4a818 Startup: Check if the port is available and fallback
Similar to Gradio, fall back to port + 1 if the config port isn't
bindable. If both ports aren't available, let the user know and exit.
An infinite loop of finding a port isn't advisable.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-11 21:57:28 -04:00
kingbri
d45e847c7a API: Fix disconnect handling on streaming responses
Starlette's StreamingResponse has an issue where it yields after
a request has disconnected. A bugfix to starlette will fix this
issue, but FastAPI uses starlette <= 0.36 which isn't ideal.

Therefore, switch back to sse-starlette which handles these disconnects
correctly.

Also don't try yielding after the request is disconnected. Just return
out of the generator instead.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-10 17:43:13 -04:00
kingbri
228c227c1e Logging: Switch to loguru
Loguru is a flexible logger that allows for easier hooking and imports
into Rich with no problems. Also makes progress bars stick to the
bottom of the terminal window.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
fe0ff240e7 Progress: Switch to Rich
Rich is a more mature library for displaying progress bars, logging,
and console output. This should help properly align progress bars
within the terminal.

Side note: "We're Rich!"

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-08 01:00:48 -05:00
kingbri
0b25c208d6 API: Fix error reporting
Make a disconnect on load error consistently. It should be safer to
warn the user to run unload (or re-run load) if a model does not
load correctly.

Also don't log the traceback for request errors that don't have one.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-05 18:16:02 -05:00
kingbri
b0c295dd2f API: Add more methods to semaphore
The semaphore/queue model for Tabby is as follows:
- Any load requests go through the semaphore by default
- Any load request can include the skip_queue parameter to bypass
the semaphore
- Any unload requests are immediately executed
- All completion requests are placed inside the semaphore by default

This model preserves the parallelism of single-user mode with extra
convenience methods for queues in multi-user. It also helps mitigate
problems that were previously present in the concurrency stack.

Also change how the program's loop runs so it exits when the API thread
dies.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-04 23:21:40 -05:00
kingbri
c82697fef2 API: Fix issues with concurrent requests and queueing
This is the first in many future commits that will overhaul the API
to be more robust and concurrent. The model is admin-first where the
admin can do anything in-case something goes awry.

Previously, calls to long running synchronous background tasks would
block the entire API, making it ignore any terminal signals until
generation is completed.

To fix this, levrage FastAPI's run_in_threadpool to offload the long
running tasks to another thread. However, signals to abort the process
still kept the background thread running and made the terminal hang.

This was due to an issue with Uvicorn not propegating the SIGINT signal
across threads in its event loop. To fix this in a catch-all way, run
the API processes in a separate thread so the main thread can still
kill the process if needed.

In addition, make request error logging more robust and refer to the
console for full error logs rather than creating a long message on the
client-side.

Finally, add state checks to see if a model is fully loaded before
generating a completion.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-04 23:21:40 -05:00
kingbri
b827bcbb44 Sampling: Cleanup and update
Cleanup how overrides are handled, class naming, and adopt exllamav2's
model class to enforce latest stable version methods rather than
adding multiple backwards compatability checks.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-02-02 23:36:17 -05:00
kingbri
78f920eeda Tree: Refactor code organization
Move common functions into their own folder and refactor the backends
to use their own folder as well.

Also cleanup imports and alphabetize import statments themselves.

Finally, move colab and docker into their own folders as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-25 00:15:40 -05:00
Renamed from utils.py (Browse further)