46 commits

Author SHA1 Message Date
kingbri
0858b6d4b2 Tree: Format
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 00:46:40 -04:00
kingbri
390daeb92f Model: Create universal HFModel class
The HFModel class serves to coalesce all config files that contain
random keys which are required for model usage.

Adding this base class allows us to expand as HuggingFace randomly
changes their JSON schemas over time, reducing the brunt that backend
devs need to feel when their next model isn't supported.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-13 18:12:38 -04:00
kingbri
d5963007f0 Model: Add backend print
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-08 23:45:04 -04:00
kingbri
cfee16905b Model: Migrate backend detection to a separate function
Seemed out of place in the common load function. In addition, rename
the transformers utils signature which actually takes a directory
instead of a file.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-08 23:42:39 -04:00
DocShotgun
f8070e7707 Model: Auto detect model backend from config
* Use exllamav3 for exl3 models, exllamav2 otherwise
2025-05-06 18:51:58 -07:00
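As a rough illustration of the rule in this commit (exllamav3 for exl3 models, exllamav2 otherwise), the sketch below picks a backend from an already-parsed config dictionary. The commit does not say which config key marks a model as exl3; the `quantization_config.quant_method` lookup and the function name `detect_backend` are assumptions made only for this example.

```python
from typing import Any, Mapping


def detect_backend(config: Mapping[str, Any]) -> str:
    """Return the backend name for a parsed model config (hypothetical helper)."""
    # ASSUMPTION: exl3 models are marked by quantization_config.quant_method.
    quant = config.get("quantization_config") or {}
    if quant.get("quant_method") == "exl3":
        return "exllamav3"
    # Everything else falls back to exllamav2, as the commit message states.
    return "exllamav2"


exl3_backend = detect_backend({"quantization_config": {"quant_method": "exl3"}})
default_backend = detect_backend({"model_type": "llama"})
```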
turboderp
ff38305145 Common: Fix exception f-string
2025-05-05 02:01:16 +02:00
kingbri
0c1d794390 Model: Add exl3 and associated load functions
Initial exl3 compat and loading functionality.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:32:39 -04:00
kingbri
7c6a053747 Model: Add option to select backend
Changing the backend switches the container that's used.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:32:39 -04:00
kingbri
f4757d31bd Model: Raise a 503 exception with model checks
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-25 00:00:15 -04:00
kingbri
f070587e9f Model: Add proper jobs cleanup and fix var calls
Jobs should be started and immediately cleaned up when calling the
generation stream. Expose a stream_generate function and append
this to the base class since it's more idiomatic than generate_gen.

The exl2 container's generate_gen function is now internal.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-24 21:30:55 -04:00
kingbri
d26260b332 Model: Add fixes for kwargs and add note for migration
One goal is to try migrating away from kwargs and use the ModelLoadRequest
instead. However, Pydantic doesn't support async validators making
parsing of the inline config impossible due to its use of aiofiles.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-21 22:39:07 -04:00
kingbri
8e238fa8f6 Model: Move calculate_rope_alpha from backend
Makes more sense to use as a utility function. Also clarify how the
vars are set.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-20 18:20:19 -04:00
kingbri
b751e0a1d5 Model: Move inline overrides to common
This is applied across containers. Doesn't make sense to put this method
in the backend.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-20 17:51:57 -04:00
kingbri
034682fcf1 Backends: Add base model container
Base class for all model containers. Used in the shared model file
for interface.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-20 17:24:10 -04:00
kingbri
552a64c723 Model: Have load take the highest priority
The admin takes priority over the regular user. Therefore, if a model
is loading, ignore all incoming generation requests.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-04-18 22:08:48 -04:00
kingbri
c49047eea1 Model: Fix load packets
The model_type internal reference was changed to an enum for
a more extendable loading process. Return the current model type
when loading a new model.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-21 18:06:47 -05:00
kingbri
69ac0eb8aa Model: Add vision loading support
Adds the ability to load vision parts of text + image models. Requires
an explicit flag in config because there isn't a way to automatically
determine whether the vision tower should be used.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-11 12:10:11 -05:00
kingbri
e0ffa90865 Dependencies: Change handling of exllamav2 checks
ExllamaV2 should check for solely exllamav2, otherwise errors don't
make sense. Migrate the combined "exl2" computed property to "inference"
since those are the required dependencies for minimal inference.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-22 12:57:28 -04:00
TerminalMan
3aeddc5255
fix issues with optional dependencies (#204)
* fix issues with optional dependencies

* format document

* Tree: Format and comment
2024-09-19 22:24:55 -04:00
kingbri
b9e5693c1b API + Model: Apply config.yml defaults for all load paths
There are two ways to load a model:
1. Via the load endpoint
2. Inline with a completion

The defaults were not applying on the inline load, so rewrite to fix
that. However, while doing this, set up a defaults dictionary rather
than comparing it at runtime and remove the pydantic default lambda
on all the model load fields.

This makes the code cleaner and establishes a clear config tree for
loading models.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-10 23:35:35 -04:00
kingbri
2c3bc71afa Tree: Switch to asynchronous file handling
Using aiofiles, there's no longer a possibility of blocking file operations
that can hang up the event loop. In addition, partially migrate
classes to use asynchronous init instead of the normal python magic method.

The only exception is config, since that's handled in the synchronous
init before the event loop starts.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-10 16:45:14 -04:00
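The "asynchronous init" idea in this commit can be sketched as an async factory method that does the file I/O before building the object, since `__init__` itself cannot be awaited. The class name `JsonResource` and its `create` method are hypothetical, and the sketch uses the standard library's `asyncio.to_thread` in place of aiofiles so that it runs without extra dependencies.

```python
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class JsonResource:
    """Hypothetical class whose construction needs file I/O."""

    def __init__(self, name: str, data: Dict[str, Any]) -> None:
        # __init__ stays synchronous and only stores already-loaded values.
        self.name = name
        self.data = data

    @classmethod
    async def create(cls, path: Path) -> "JsonResource":
        # The blocking read runs in a worker thread, so the event loop
        # stays free. (The commit uses aiofiles for this step instead.)
        text = await asyncio.to_thread(path.read_text, encoding="utf-8")
        return cls(path.stem, json.loads(text))


async def main() -> JsonResource:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "settings.json"
        path.write_text('{"a": 1}', encoding="utf-8")
        return await JsonResource.create(path)


loaded = asyncio.run(main())
```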
kingbri
93872b34d7 Config: Migrate to global class instead of dicts
The config categories can have defined separation, but preserve
the dynamic nature of adding new config options by making all the
internal class vars as dictionaries.

This was necessary since storing global callbacks stored a state
of the previous global_config var that wasn't populated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-04 23:18:47 -04:00
kingbri
4aebe8a2a5 Config: Use an explicit "auto" value for rope_alpha
Using "auto" for rope alpha removes ambiguity on how to explicitly
enable automatic rope calculation. The same behavior of None -> auto
calculate still exists, but can be overwritten if a model's tabby_config.yml
includes `rope_alpha`.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-31 22:59:56 -04:00
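One possible reading of the `rope_alpha` resolution described in this commit is sketched below. The function name, its parameters, and the `calculate` callback are hypothetical; the actual calculation is passed in rather than reproduced here.

```python
from typing import Callable, Optional, Union


def resolve_rope_alpha(
    requested: Union[float, str, None],
    model_override: Optional[float],
    calculate: Callable[[], float],
) -> float:
    """Pick a rope alpha value (hypothetical helper, one reading of the commit)."""
    # An explicit "auto" always asks for automatic calculation.
    if requested == "auto":
        return calculate()
    # None keeps the old auto behavior, unless the model's tabby_config.yml
    # supplies its own rope_alpha (passed here as model_override).
    if requested is None:
        if model_override is not None:
            return float(model_override)
        return calculate()
    # Any other value is taken as an explicit number.
    return float(requested)


auto_value = resolve_rope_alpha("auto", 3.0, lambda: 2.5)
overridden = resolve_rope_alpha(None, 3.0, lambda: 2.5)
explicit = resolve_rope_alpha(1.75, 3.0, lambda: 2.5)
```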
kingbri
a96fa5f138 API: Don't fallback to default values on model load request
It's best to pass them down the config stack.

API/User config.yml -> model config.yml -> model config.json -> fallback.

Doing this allows for seamless flow and yielding control to each
member in the stack.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-31 22:59:56 -04:00
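The config stack in this commit (API/User config.yml -> model config.yml -> model config.json -> fallback) could be sketched as a first-non-None lookup across ordered layers. The helper name `resolve` and the example keys and values are hypothetical and serve only as illustration.

```python
from typing import Any, Mapping, Optional


def resolve(key: str, *layers: Optional[Mapping[str, Any]], fallback: Any = None) -> Any:
    """Return the first non-None value for key, searching layers in order."""
    for layer in layers:
        if layer is not None and layer.get(key) is not None:
            return layer[key]
    # No layer supplied a value, so yield control to the final fallback.
    return fallback


# Hypothetical layers, highest priority first.
api_or_user = {"max_seq_len": None}
model_yml = {"max_seq_len": 8192}
model_json = {"max_seq_len": 4096}

max_seq_len = resolve("max_seq_len", api_or_user, model_yml, model_json, fallback=2048)
```

Because the request layer passes `None` down instead of substituting its own default, each lower layer still gets a chance to supply the value.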
kingbri
dd55b99af5 Model: Store directory paths
Storing a pathlib type makes it easier to manipulate the model
directory path in the long run without constantly fetching it
from the config.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-31 22:59:56 -04:00
kingbri
2a33ebbf29 Model: Bypass lock checks when shutting down
Previously, when a SIGINT was emitted while a model load was running,
the API didn't shut down until the load finished due to waiting for
the lock. However, when shutting down, the lock doesn't matter since
the process is being killed anyway.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 16:05:34 -04:00
kingbri
3e42211c3e Config: Embeddings: Make embeddings_device a default when API loading
When loading from the API, the fallback for embeddings_device will be
the same as the config.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 13:59:49 -04:00
kingbri
bfa011e0ce Embeddings: Add model management
Embedding models are managed on a separate backend, but are run
in parallel with the model itself. Therefore, manage this in a separate
container with separate routes.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 15:19:27 -04:00
kingbri
f13d0fb8b3 Embeddings: Add model load checks
Same as the normal model container.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:17:36 -04:00
kingbri
fbf1455db1 Embeddings: Migrate and organize Infinity
Use Infinity as a separate backend and handle the model within the
common module. This separates out the embeddings model from the endpoint
which allows for model loading/unloading in core.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 11:00:23 -04:00
kingbri
64c2cc85c9 OAI: Migrate model depends into proper file
Use amongst multiple routers.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-23 13:59:56 -04:00
kingbri
c7ce97f119 Tree: Ruff lint
Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 15:06:28 -04:00
kingbri
6613e38436 Main: Make openapi export store locally
This runs faster than always making a syscall to check if the env
var is set.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 14:54:06 -04:00
kingbri
ae66e8f9ba Ruff: Lint
Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 13:44:12 -04:00
kingbri
b907421285 Main: Fix launch if EXPORT_OPENAPI is unset
A default needs to be provided with getenv. Fix that with an empty
string.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 13:41:44 -04:00
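A minimal sketch of the fix in this commit: reading the variable with an empty-string default means later string operations never see `None`. `EXPORT_OPENAPI` comes from the commit title; the helper name and the set of values treated as "enabled" are assumptions.

```python
import os
from typing import Mapping


def export_openapi_requested(env: Mapping[str, str] = os.environ) -> bool:
    """Check the EXPORT_OPENAPI flag (hypothetical helper)."""
    # The "" default keeps .lower() safe when the variable is unset.
    value = env.get("EXPORT_OPENAPI", "")
    # ASSUMPTION: "1" and "true" enable the export.
    return value.lower() in ("1", "true")


unset_result = export_openapi_requested({})
set_result = export_openapi_requested({"EXPORT_OPENAPI": "1"})
```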
kingbri
933268f7e2 API: Integrate OpenAPI export script
Move OpenAPI export as an env var within the main function. This
allows for easy export by running main.

In addition, an env variable provides global and explicit state to
disable conditional wheel imports (ex. Exl2 and torch) which caused
errors at first.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-08 12:34:32 -04:00
kingbri
27d2d5f3d2 Config + Model: Allow for default fallbacks from config for model loads
Previously, the parameters under the "model" block in config.yml only
handled the loading of a model on startup. This meant that any subsequent
API request required each parameter to be filled out or use a sane default
(usually defaults to the model's config.json).

However, there are cases where admins may want an argument from the
config to apply if the parameter isn't provided in the request body.
To help alleviate this, add a mechanism that works like sampler overrides
where users can specify a flag that acts as a fallback.

Therefore, this change both preserves the source of truth of what
parameters the admin is loading and adds some convenience for users
that want customizable defaults for their requests.

This behavior may change in the future, but I think it solves the
issue for now.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-06 17:50:58 -04:00
turboderp
0eb8fa5d1e
[fix] Bring draft progress and model progress in sync with model loader (#125)
* Bring draft progress and model progress in sync with model loader

* Fix formatting
2024-06-03 19:41:02 +02:00
kingbri
43cd7f57e8 API + Model: Add blocks and checks for various load requests
Add a sequential lock and wait until jobs are completed before executing
any loading requests that directly alter the model. However, we also
need to block any new requests that come in until the load is finished,
so add a condition that triggers once the lock is free.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-25 21:16:14 -04:00
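The lock-plus-condition arrangement described in this commit might look roughly like the sketch below. The class `LoadGate` and its method names are hypothetical; the sketch only shows the general pattern of a load waiting for running jobs to drain while new jobs wait for the load lock to be released.

```python
import asyncio
from typing import Awaitable, Callable, List


class LoadGate:
    """Hypothetical gate that serializes model loads against generation jobs."""

    def __init__(self) -> None:
        self.load_lock = asyncio.Lock()
        self.load_condition = asyncio.Condition()
        self.active_jobs = 0

    async def run_job(self, work: Callable[[], Awaitable[None]]) -> None:
        # New requests block until no load holds the lock.
        async with self.load_condition:
            await self.load_condition.wait_for(lambda: not self.load_lock.locked())
            self.active_jobs += 1
        try:
            await work()
        finally:
            async with self.load_condition:
                self.active_jobs -= 1
                self.load_condition.notify_all()

    async def run_load(self, load: Callable[[], Awaitable[None]]) -> None:
        try:
            async with self.load_lock:
                # Wait until every running job has completed.
                async with self.load_condition:
                    await self.load_condition.wait_for(lambda: self.active_jobs == 0)
                await load()
        finally:
            # The lock is free again, so wake any requests that were waiting.
            async with self.load_condition:
                self.load_condition.notify_all()


async def demo() -> List[str]:
    gate = LoadGate()
    events: List[str] = []

    async def job() -> None:
        await asyncio.sleep(0.01)
        events.append("job")

    async def load() -> None:
        events.append("load")

    first = asyncio.create_task(gate.run_job(job))
    await asyncio.sleep(0)  # let the first job start
    loader = asyncio.create_task(gate.run_load(load))
    await asyncio.sleep(0)  # let the loader take the lock
    late = asyncio.create_task(gate.run_job(job))
    await asyncio.gather(first, loader, late)
    return events


events = asyncio.run(demo())
```

In the demo, the load waits for the first job to finish, and the late job does not start until the load has released the lock.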
kingbri
6dfcbbd813 Common: Migrate request utils to networking
Helps organize the project better. Utils is meant to be for simple
functions like unwrap.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-21 23:21:57 -04:00
kingbri
95e44c20d6 Model: Fix load if model didn't load properly
If the model didn't load properly, the container still exists until
unload is called. However, the name check still registered as true.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
kingbri
2755fd1af0 API: Fix blocking iterator execution
Run these iterators on the background thread. On startup, the API
spawns a background thread as needed to run sync code on without blocking
the event loop.

Use asyncio's run_thread function since it allows for errors to be
propagated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
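A sketch of driving a synchronous iterator from a worker thread so the event loop is not blocked. It assumes the standard library's `asyncio.to_thread` (Python 3.9+), which may not be the exact function the commit refers to; the helper `iterate_in_thread` and the `slow_tokens` generator are hypothetical.

```python
import asyncio
import time
from typing import AsyncIterator, Iterator, List, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking iterator exhaustion


async def iterate_in_thread(sync_iter: Iterator[T]) -> AsyncIterator[T]:
    """Yield items from a blocking iterator, one thread hop per item."""
    while True:
        # next() runs in a worker thread; exceptions raised by the
        # iterator propagate back to the awaiting coroutine.
        item = await asyncio.to_thread(next, sync_iter, _DONE)
        if item is _DONE:
            break
        yield item


def slow_tokens() -> Iterator[str]:
    for token in ["a", "b", "c"]:
        time.sleep(0.01)  # stands in for blocking generation work
        yield token


async def main() -> List[str]:
    return [token async for token in iterate_in_thread(slow_tokens())]


result = asyncio.run(main())
```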
kingbri
7fded4f183 Tree: Switch to async generators
Async generation helps remove many roadblocks to managing tasks
using threads. It should allow for abortables and modern-day paradigms.

NOTE: Exllamav2 itself is not an asynchronous library. It's just
been added into tabby's async nature to allow for a fast and concurrent
API server. It's still being debated whether to run stream_ex in a
separate thread or manually manage it using asyncio.sleep(0).

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-16 23:23:31 -04:00
kingbri
6f03be9523 API: Split functions into their own files
Previously, generation functions were bundled with the request function,
causing the overall code structure and API to look ugly and unreadable.

Split these up and cleanup a lot of the methods that were previously
overlooked in the API itself.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00
kingbri
5a2de30066 Tree: Update to cleanup globals
Use the module singleton pattern to share global state. This can also
be a modified version of the Global Object Pattern. The main reason
this pattern is used is for ease of use when handling global state
rather than adding extra dependencies for a DI parameter.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00
kingbri
b373b25235 API: Move to ModelManager
This is a shared module which manages the model container and provides
extra utility functions around it to help slim down the API.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-03-12 23:59:30 -04:00