Commit graph

1018 commits

Author SHA1 Message Date
turboderp
d357f100d0 Dependencies: Bump ExllamaV3 2025-06-15 19:12:45 +02:00
turboderp
a0c16bba2a Exl2: Fix banned_strings (move outside of assign_gen_params) 2025-06-15 16:51:42 +02:00
kingbri
0ea56382f0 Dependencies: Fix unsupported dependency error
Log the package name provided to the check function.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-06-13 14:57:02 -04:00
kingbri
f4ee56ba13 Update README
Include ExllamaV3

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-06-13 14:57:01 -04:00
turboderp
691a080ac7 Dependencies: Bump ExllamaV3 and ExllamaV2 2025-05-31 23:55:04 +02:00
kingbri
2d89c96879 API: Re-add BOS token stripping in template render
Matching YALS, if the model has add_bos_token enabled, then remove
an extra BOS token at the start of the prompt. This usually happens
with misconfigured templates such as Llama 3.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-24 21:11:53 -04:00
kingbri
10fbe043a4 API: Fix typing for chat templates in CC requests
Tools must be None by default. Chat completion message content can
be None, a string, or a list, so default to None. Exclude all None
values from a CC message since the template can say the variable
"exists" despite being None, causing an error.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-24 21:06:05 -04:00
kingbri
0c4cc1eba3 Model: Add prompt logging to ExllamaV3
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 22:05:18 -04:00
Brian
729caaeddc
Merge pull request #346 from gakada/main
Exl3: some models aren't functional without add_bos?
2025-05-17 22:05:15 -04:00
kingbri
0646d358a2 Main: Log auth and sampler overrides after model load
Like YALS, logging all pertinent information after model load makes
it easier to parse by the user.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 18:10:34 -04:00
kingbri
54b8a20a19 API: Fix types for chat completions
Messages were mistakenly being sent as Pydantic objects, but templates
expect dictionaries. Properly convert these before render.

In addition, initialize all Optional lists as an empty list since
this will cause the least problems when interacting with other parts
of API code, such as templates.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 18:10:34 -04:00
gakada
ba6248eec0
Exl3: fix add_bos in generator 2025-05-17 19:10:49 +09:00
Brian
81170eee00
Merge pull request #312 from davidallada/add-file-based-logging
Add file based logging in addition to the normal console logs
2025-05-17 01:24:19 -04:00
kingbri
17f3dca6fc Packaging: Add agnostic method to check version of packages
Some packages such as ExllamaV2 and V3 require specific versions for
the latest features. Rather than creating repetitive functions, create
an agnostic function to check the installed package and then report
to the user to upgrade.

This is also sent to requests for loading and unloading, so keep the
error short.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 01:04:24 -04:00
kingbri
084916c04f Model: Fix autosplit reserve crash with GPU split
ExllamaV3 does not accept autosplit_reserve and gpu_split at the same
time.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 00:51:14 -04:00
kingbri
0858b6d4b2 Tree: Format
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 00:46:40 -04:00
kingbri
fa534fe551 Dependencies: Update Ruff
v0.11.10

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 00:46:25 -04:00
kingbri
390daeb92f Model: Create universal HFModel class
The HFModel class serves to coalesce all config files that contain
random keys which are required for model usage.

Adding this base class allows us to expand as HuggingFace randomly
changes their JSON schemas over time, reducing the brunt that backend
devs need to feel when their next model isn't supported.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-13 18:12:38 -04:00
kingbri
7900b72848 API: Add chat_template_kwargs alias for template_vars
This key is used in VLLM and SGLang.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 15:48:39 -04:00
kingbri
c9dc0b2aa4 Dependencies: Bump ExllamaV3 and ExllamaV2
v0.0.2 and v0.3.0 respectively

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 15:29:31 -04:00
kingbri
bd3fec929c Tree: Format
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 11:32:27 -04:00
kingbri
a524ac3c0f Model: Fix cache mode again
If statements can be difficult to work with.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 11:30:47 -04:00
kingbri
20cad851e9 Model: Fix param call
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 09:52:28 -04:00
kingbri
d15eb55f20 Model: Fix exl2 cache mode check
FP16 was not included in the validation step.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 09:51:09 -04:00
kingbri
8996dc7b02 API: Add default for backend in model load request
Should be None so pydantic doesn't complain.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 09:51:09 -04:00
Brian
b555eeb6e7
Merge pull request #339 from Maaaxiii/fix/tool-calling-embeddings
fix: Aligned Parameter Name in chat completions generate_tool_calls
2025-05-11 20:41:58 -04:00
kingbri
f4adca1f3e API: Remove default fallback from backend param
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-11 09:56:53 -04:00
Brian
3674d7b9b5
Merge pull request #341 from theroyallab/exl3
Exl3
2025-05-10 23:43:02 -04:00
kingbri
6379081dd8 Sampling: Make add_bos_token override concise
Also set the default to None so text completions follows the same
pattern.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-10 19:07:35 -04:00
kingbri
656af41b5d Model: Always enable decode_special_tokens
The frontend should handle the special tokens if they get emitted.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 22:25:50 -04:00
kingbri
83826b56be Main: Remove unnecessary import
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 22:14:11 -04:00
kingbri
42346c6b39 Sampling: Remove skip_special_tokens
This parameter is way too confusing and does not make sense in
the modern LLM space.

Change approved by all maintainers.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 22:11:33 -04:00
kingbri
25c77ebf77 Model: Remove exllamav2-specific version check
No longer necessary thanks to the agnostic check.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 22:08:15 -04:00
kingbri
48ea1737cf Startup: Check agnostically for inference deps
If an inference dep isn't present, force exit the application. This
occurs after all subcommands have been appropriately processed.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 21:59:00 -04:00
kingbri
33ac016023 Dependencies: Add ExllamaV3
v0.0.1

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 21:42:07 -04:00
Brian
f26ca23f1a
Merge pull request #336 from DocShotgun/backend-detect
Automatically select model backend based on config.json
2025-05-09 01:56:44 -04:00
Brian
02a8d68e17
Merge branch 'exl3' into backend-detect 2025-05-08 23:50:33 -04:00
kingbri
d5963007f0 Model: Add backend print
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-08 23:45:04 -04:00
kingbri
cfee16905b Model: Migrate backend detection to a separate function
Seemed out of place in the common load function. In addition, rename
the transformers utils signature which actually takes a directory
instead of a file.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-08 23:42:39 -04:00
Brian
527afc206b
Merge pull request #329 from DocShotgun/exl3
Exllamav3 cache quantization
2025-05-08 23:11:45 -04:00
kingbri
638eef401a Model: Move cache creation to a common function
Prevents repetitiveness while also creating a Cache class.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-08 23:10:03 -04:00
Maximilian Klem
22f7f1e1ec fix: flipped parameter name with variable name 2025-05-07 21:04:30 +02:00
DocShotgun
f8070e7707 Model: Auto detect model backend from config
* Use exllamav3 for exl3 models, exllamav2 otherwise
2025-05-06 18:51:58 -07:00
DocShotgun
9dcde59c57 Model: Check for unsupported cache mode in exllamav2 2025-05-06 01:18:15 -07:00
kingbri
bc0a84241a API: Patch kobold generation call
Calling the model requires different args now.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-05 22:11:21 -04:00
kingbri
b683545d0e Config: Fix argparse help
Adding a comma in the description converts the string to a tuple,
which isn't parseable by argparse's help.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-05 21:52:30 -04:00
turboderp
ff38305145 Common: Fix exception f-string 2025-05-05 02:01:16 +02:00
DocShotgun
45b966363e Tree: Format 2025-05-03 21:01:03 -07:00
DocShotgun
a635a719d7 Model: Enable draft model q-cache in Exl3
* Remove unneeded default fp16 cache layer import
2025-05-03 20:59:36 -07:00
DocShotgun
58e34ba4c5 Model: Exl3 cache quant settings lenient with whitespace 2025-05-03 20:35:35 -07:00