Commit graph

56 commits

Author SHA1 Message Date
kingbri
067d63773e Config: Move sampling higher in the list
This has become a bigger priority with addition of the safe_defaults
noob proofing.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-08-18 22:55:03 -04:00
DocShotgun
6fb0c2cdbd Config: Update description for override_preset default
* We provide safe_defaults as a default in config_sample.yml but not internally
2025-08-18 12:39:52 -07:00
DocShotgun
998abe5ad1 Config: Enable safe sampler overrides by default
* Provides safe fallback samplers, intended for better out-of-the-box support for clients that do not pass sampler params
2025-08-18 12:32:28 -07:00
kingbri
43f9483bc4 Model: Add tensor_parallel_backend option
This allows for users to use nccl or native depending on the GPU setup.
NCCL is only available with Linux built wheels.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-08-17 22:35:10 -04:00
DocShotgun
102af306e5 Config: Remove developer arg cuda_malloc_backend
* cudaMallocAsync is now enabled by default on supported configurations
2025-08-01 10:59:13 -07:00
kingbri
2096c9bad2 Model: Default max_seq_len to 4096
A common problem in TabbyAPI is that users who want to get up and
running with a model always had issues with max_seq_len causing OOMs.
This is because model devs set max context values in the millions which
requires a lot of VRAM.

To idiot-proof first time setup, make the fallback default 4096 so
users can run their models. If a user still wants to use the model's
max_seq_len, set it to -1.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-06-13 14:57:24 -04:00
Brian
02a8d68e17
Merge branch 'exl3' into backend-detect 2025-05-08 23:50:33 -04:00
Brian
527afc206b
Merge pull request #329 from DocShotgun/exl3
Exllamav3 cache quantization
2025-05-08 23:11:45 -04:00
DocShotgun
f8070e7707 Model: Auto detect model backend from config
* Use exllamav3 for exl3 models, exllamav2 otherwise
2025-05-06 18:51:58 -07:00
DocShotgun
9dcde59c57 Model: Check for unsupported cache mode in exllamav2 2025-05-06 01:18:15 -07:00
kingbri
b683545d0e Config: Fix argparse help
Adding a comma in the description converts the string to a tuple,
which isn't parseable by argparse's help.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-05 21:52:30 -04:00
DocShotgun
58e34ba4c5 Model: Exl3 cache quant settings lenient with whitespace 2025-05-03 20:35:35 -07:00
DocShotgun
68a660bdb3 Model: Initial Exl3 cache quantization support 2025-05-03 20:35:35 -07:00
kingbri
303e2dde12 Model: Correct exl3 generation, add concurrency, and cleanup
Fixes application of sampler parameters by adding a new sampler builder
interface. Also expose the generator class-wide and add wait_for_jobs.

Finally, allow inline loading to specify the backend.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:33:25 -04:00
randoentity
306fc7cd15 fixup: autosplit reserve
this probably breaks v2 support
2025-05-02 21:33:25 -04:00
kingbri
7c6a053747 Model: Add option to select backend
Changing the backend switches the container that's used.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:32:39 -04:00
kingbri
529c90b93e Tree: Format and lint
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-19 11:55:02 -04:00
kingbri
d990bbc431 Args: Remove action arguments
Superseded by subcommands to perform the same action.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-19 11:53:47 -04:00
kingbri
79f9c6e854 Model: Remove num_experts_per_token
This shouldn't even be an exposed option since changing it always
breaks inference with the model. Let the model's config.json handle
it.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-19 11:52:10 -04:00
Brian
2e491472d1
Merge pull request #254 from lucyknada/main
add draft_gpu_split option for spec decoding
2025-02-11 16:48:03 -05:00
kingbri
30f02e5453 Main: Remove uvloop/winloop from experimental status
Uvloop/Winloop does provide advantages to asyncio vs the standard
Proactor loop, so remove experimental status.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-02-10 21:30:48 -05:00
kingbri
beb6d8faa5 Model: Adjust draft_gpu_split and add to config
The previous code overrode the existing gpu split and device idx
values. This now sets an independent draft_gpu_split value and
adjusts the gpu_devices check only if the draft_gpu_split array
is larger than the gpu_split array.

Draft gpu split is not Tensor Parallel, and defaults to gpu_split_auto
if a split is not provided.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-02-08 16:09:46 -05:00
kingbri
0fadb1e5e8 Merge branch 'main' into vision 2024-11-19 21:19:21 -05:00
DocShotgun
c42655336b Config: Add option to disable fetching content from URLs 2024-11-17 23:05:17 -08:00
kingbri
bd9e78e19e API: Add inline exception for dummy models
If an API key sends a dummy model, it shouldn't error as the server
is catering to clients that expect specific OAI model names. This
is a problem with inline model loading since these names would error
by default. Therefore, add an exception if the provided name is in the
dummy model names (which also doubles as inline strict exceptions).

However, the dummy model names weren't configurable, so add a new
option to specify exception names, otherwise the default is gpt-3.5-turbo.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-17 21:15:45 -05:00
kingbri
69ac0eb8aa Model: Add vision loading support
Adds the ability to load vision parts of text + image models. Requires
an explicit flag in config because there isn't a way to automatically
determine whether the vision tower should be used.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-11 12:10:11 -05:00
DocShotgun
603760cecb Model: Remove override_base_seq_len 2024-10-30 10:03:08 +08:00
kingbri
126a44483c Tree: Remove fasttensors
Now a noop in upstream.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-30 00:18:47 -04:00
kingbri
b30336c75b Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-18 21:42:01 -04:00
kingbri
edf3a00310 Config: Make API server literals case insensitive
There's no native way to handle case insensitivity in pydantic, so
add a validator which converts the API server input to be lowercase.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-18 21:39:18 -04:00
kingbri
63634beb5e Config: Clarify Rope alpha options
Leaving blank will use the model's set value or auto-calculate.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-17 23:03:28 -04:00
kingbri
a34bd9a684 Config: Alter YAML generation script for formatting adherence
Properly add comments and newlines where they need to go.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-17 22:44:42 -04:00
TerminalMan
bb4dd7200e fix defaults for api_servers 2024-09-17 15:41:32 +01:00
kingbri
63f8c46a92 Config: Make a better description for lora config
This is not ideal because users may still have trouble understanding
what a lora includes, but adding an example comment will help instead
of leaving a blank line.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-16 23:29:39 -04:00
kingbri
f6fb60a6ed Config: Inline model loading is False
This is not a True default.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-16 22:54:35 -04:00
kingbri
46f9fff210 Config: Move config file generation to tabby_config
Keep the models as a separate reference file without any extra
functions.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-16 22:22:24 -04:00
kingbri
e60c4ba5bc Config: Fix existing value check
If a sub-field exists in the model provided to the file generator,
use it. Otherwise always fallback to the default factory. This prevents
any subsequent errors from setting None.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-16 17:51:40 -04:00
kingbri
81ae461eb8 Config: Allow existing values to get included in generated file
Allows for generation from an existing config file. Primarily used
for migration purposes.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-16 12:19:58 -04:00
TerminalMan
564bdcf0a8 add legacy config converter 2024-09-16 14:12:47 +01:00
kingbri
b6dd21f737 Config: Handle default factories in config generation
Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-16 00:55:46 -04:00
kingbri
3340c3bf2f Config: Rewrite descriptions
This makes both config.yml and args more descriptive than before.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-16 00:55:14 -04:00
kingbri
4c8bb42ec1 Config: Reorder models
It makes sense for the LLM model groups to be clustered around
each other with the least used groups towards the bottom.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-16 00:55:14 -04:00
kingbri
8ff9f2c6c0 Config: Rewrite docstrings for models
Adheres to the old config.yml's descriptions and allows for newlines
in generated YAML.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-16 00:55:14 -04:00
kingbri
250d76f5c6 Config: Alter YAML generator function
These changes fix the amount and order of newlines to look pleasing
for the user. However, the changes used in here are kind of hacky
and need a proper fix that can contain the same level of efficiency.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-16 00:55:11 -04:00
TerminalMan
92af656705 improve config generation action 2024-09-15 17:50:37 +01:00
kingbri
d013729b7d Config: Add aliases for logging config
Config.yml and args take in two different values.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-14 21:56:16 -04:00
kingbri
a09dd802c2 Config: Cleanup and organize functions
Remove access of private attributes and use safer functions. Also
move generalized functions into utils files.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-14 21:48:39 -04:00
TerminalMan
0903f852db add export openAPI to config 2024-09-15 00:17:36 +01:00
TerminalMan
533e7c9119 remove unnecessary code 2024-09-14 22:49:37 +01:00
TerminalMan
dc4946b565 make pydantic do all the validation 2024-09-13 10:21:27 +01:00