Brian
729caaeddc
Merge pull request #346 from gakada/main
...
Exl3: some models aren't functional without add_bos?
2025-05-17 22:05:15 -04:00
kingbri
0646d358a2
Main: Log auth and sampler overrides after model load
...
Like YALS, logging all pertinent information after model load makes
it easier to parse by the user.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 18:10:34 -04:00
kingbri
54b8a20a19
API: Fix types for chat completions
...
Messages were mistakenly being sent as Pydantic objects, but templates
expect dictionaries. Properly convert these before render.
In addition, initialize all Optional lists as an empty list since
this will cause the least problems when interacting with other parts
of API code, such as templates.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 18:10:34 -04:00
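Note on the commit above: a minimal sketch of the two ideas it describes, converting message objects to plain dictionaries before template rendering and defaulting optional lists to an empty list. It uses a standard-library dataclass as a stand-in for the project's Pydantic models (in Pydantic v2 the analogous conversion is `model_dump()`). The `Message` fields and the `messages_for_template` helper are hypothetical names and are not taken from the repository.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Message:
    # Stand-in for the request's message model; field names are hypothetical.
    role: str
    content: str
    # Default to an empty list instead of None so templates can iterate safely.
    tool_calls: List[dict] = field(default_factory=list)


def messages_for_template(messages: List[Message]) -> List[dict]:
    """Convert message objects to plain dicts before handing them to a template."""
    return [asdict(message) for message in messages]


rendered_input = messages_for_template([Message(role="user", content="Hi")])
```

A template that loops over `tool_calls` can then do so without first checking for None.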
gakada
ba6248eec0
Exl3: fix add_bos in generator
2025-05-17 19:10:49 +09:00
Brian
81170eee00
Merge pull request #312 from davidallada/add-file-based-logging
...
Add file based logging in addition to the normal console logs
2025-05-17 01:24:19 -04:00
kingbri
17f3dca6fc
Packaging: Add agnostic method to check version of packages
...
Some packages such as ExllamaV2 and V3 require specific versions for
the latest features. Rather than creating repetitive functions, create
an agnostic function to check the installed package and then report
to the user to upgrade.
This is also sent to requests for loading and unloading, so keep the
error short.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 01:04:24 -04:00
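Note on the commit above: one way a package-agnostic version check could look, sketched with `importlib.metadata` from the standard library. The function names, the simple numeric version parsing, and the message wording are assumptions for illustration; the repository's actual helper may differ.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.3.0' into (0, 3, 0); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_older(installed: str, required: str) -> bool:
    """Compare two versions after padding them to the same length."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def check_package_version(package: str, required: str) -> Tuple[bool, str]:
    """Return (ok, message); the message stays short so it can be sent in API errors."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False, f"{package} is not installed (need >= {required})."
    if is_older(installed, required):
        return False, f"{package} {installed} is too old; upgrade to >= {required}."
    return True, ""
```

The same function can be called once per backend package instead of maintaining a dedicated check for each one.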
kingbri
084916c04f
Model: Fix autosplit reserve crash with GPU split
...
ExllamaV3 does not accept autosplit_reserve and gpu_split at the same
time.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 00:51:14 -04:00
kingbri
0858b6d4b2
Tree: Format
...
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 00:46:40 -04:00
kingbri
fa534fe551
Dependencies: Update Ruff
...
v0.11.10
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-17 00:46:25 -04:00
kingbri
390daeb92f
Model: Create universal HFModel class
...
The HFModel class serves to coalesce all config files that contain
random keys which are required for model usage.
Adding this base class allows us to expand as HuggingFace randomly
changes their JSON schemas over time, reducing the brunt that backend
devs need to feel when their next model isn't supported.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-13 18:12:38 -04:00
kingbri
7900b72848
API: Add chat_template_kwargs alias for template_vars
...
This key is used in vLLM and SGLang.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 15:48:39 -04:00
kingbri
c9dc0b2aa4
Dependencies: Bump ExllamaV3 and ExllamaV2
...
v0.0.2 and v0.3.0 respectively
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 15:29:31 -04:00
kingbri
bd3fec929c
Tree: Format
...
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 11:32:27 -04:00
kingbri
a524ac3c0f
Model: Fix cache mode again
...
If statements can be difficult to work with.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 11:30:47 -04:00
kingbri
20cad851e9
Model: Fix param call
...
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 09:52:28 -04:00
kingbri
d15eb55f20
Model: Fix exl2 cache mode check
...
FP16 was not included in the validation step.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 09:51:09 -04:00
kingbri
8996dc7b02
API: Add default for backend in model load request
...
Should be None so pydantic doesn't complain.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 09:51:09 -04:00
Brian
b555eeb6e7
Merge pull request #339 from Maaaxiii/fix/tool-calling-embeddings
...
fix: Aligned Parameter Name in chat completions generate_tool_calls
2025-05-11 20:41:58 -04:00
kingbri
f4adca1f3e
API: Remove default fallback from backend param
...
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-11 09:56:53 -04:00
Brian
3674d7b9b5
Merge pull request #341 from theroyallab/exl3
...
Exl3
2025-05-10 23:43:02 -04:00
kingbri
6379081dd8
Sampling: Make add_bos_token override concise
...
Also set the default to None so text completions follow the same
pattern.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-10 19:07:35 -04:00
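Note on the commit above: a small sketch of the "None means unset" pattern that a default of None makes possible. The helper name `resolve_add_bos` and the idea of falling back to a model-level default are assumptions for illustration, not the repository's code.

```python
from typing import Optional


def resolve_add_bos(requested: Optional[bool], model_default: bool) -> bool:
    """Use the request's value only when the client actually set one."""
    return model_default if requested is None else requested
```

With this shape, an explicit `False` from the client is respected, while an omitted field defers to the model.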
kingbri
656af41b5d
Model: Always enable decode_special_tokens
...
The frontend should handle the special tokens if they get emitted.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 22:25:50 -04:00
kingbri
83826b56be
Main: Remove unnecessary import
...
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 22:14:11 -04:00
kingbri
42346c6b39
Sampling: Remove skip_special_tokens
...
This parameter is way too confusing and does not make sense in
the modern LLM space.
Change approved by all maintainers.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 22:11:33 -04:00
kingbri
25c77ebf77
Model: Remove exllamav2-specific version check
...
No longer necessary thanks to the agnostic check.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 22:08:15 -04:00
kingbri
48ea1737cf
Startup: Check agnostically for inference deps
...
If an inference dep isn't present, force exit the application. This
occurs after all subcommands have been appropriately processed.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 21:59:00 -04:00
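Note on the commit above: a sketch of a backend-agnostic dependency check using `importlib.util.find_spec`, which reports whether a top-level module can be imported without importing it. The function names and the choice to exit when any listed module is missing are assumptions; the actual startup logic may apply a different policy.

```python
import importlib.util
import sys
from typing import Iterable, List


def find_missing(modules: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def ensure_inference_deps(modules: Iterable[str]) -> None:
    """Exit the process with a short message if any dependency is missing."""
    missing = find_missing(modules)
    if missing:
        sys.exit(f"Missing inference dependencies: {', '.join(missing)}")
```

Running the check after subcommands are processed, as the commit describes, lets utility commands work even when no inference backend is installed.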
kingbri
33ac016023
Dependencies: Add ExllamaV3
...
v0.0.1
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-09 21:42:07 -04:00
Brian
f26ca23f1a
Merge pull request #336 from DocShotgun/backend-detect
...
Automatically select model backend based on config.json
2025-05-09 01:56:44 -04:00
Brian
02a8d68e17
Merge branch 'exl3' into backend-detect
2025-05-08 23:50:33 -04:00
kingbri
d5963007f0
Model: Add backend print
...
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-08 23:45:04 -04:00
kingbri
cfee16905b
Model: Migrate backend detection to a separate function
...
Seemed out of place in the common load function. In addition, rename
the transformers utils signature which actually takes a directory
instead of a file.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-08 23:42:39 -04:00
Brian
527afc206b
Merge pull request #329 from DocShotgun/exl3
...
Exllamav3 cache quantization
2025-05-08 23:11:45 -04:00
kingbri
638eef401a
Model: Move cache creation to a common function
...
Prevents repetitiveness while also creating a Cache class.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-08 23:10:03 -04:00
Maximilian Klem
22f7f1e1ec
fix: flipped parameter name with variable name
2025-05-07 21:04:30 +02:00
DocShotgun
f8070e7707
Model: Auto detect model backend from config
...
* Use exllamav3 for exl3 models, exllamav2 otherwise
2025-05-06 18:51:58 -07:00
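Note on the commit above: a sketch of picking a backend from a model's `config.json`. It assumes that exl3 models are marked by `quantization_config.quant_method == "exl3"`; that key layout, the function name, and the returned backend strings are assumptions made for illustration.

```python
import json
from pathlib import Path


def detect_backend(model_dir: Path) -> str:
    """Return 'exllamav3' for exl3 models and 'exllamav2' otherwise."""
    config_path = model_dir / "config.json"
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Unreadable or missing config: fall back to the older backend.
        return "exllamav2"

    quant = config.get("quantization_config") if isinstance(config, dict) else None
    if isinstance(quant, dict) and quant.get("quant_method") == "exl3":
        return "exllamav3"
    return "exllamav2"
```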
DocShotgun
9dcde59c57
Model: Check for unsupported cache mode in exllamav2
2025-05-06 01:18:15 -07:00
kingbri
bc0a84241a
API: Patch kobold generation call
...
Calling the model requires different args now.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-05 22:11:21 -04:00
kingbri
b683545d0e
Config: Fix argparse help
...
Adding a comma in the description converts the string to a tuple,
which isn't parseable by argparse's help.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-05 21:52:30 -04:00
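Note on the commit above: the pitfall it describes is plain Python behaviour, shown here in isolation. A trailing comma after a string turns the expression into a one-element tuple, which is not a valid help string. The argument name and help text below are made up for the demonstration.

```python
import argparse

# The trailing comma makes this a 1-tuple, not a string.
broken_help = (
    "Path to the model directory",
)

# Without the comma, the parentheses only group the string.
fixed_help = (
    "Path to the model directory"
)

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--model-dir", help=fixed_help)
help_text = parser.format_help()
```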
turboderp
ff38305145
Common: Fix exception f-string
2025-05-05 02:01:16 +02:00
DocShotgun
45b966363e
Tree: Format
2025-05-03 21:01:03 -07:00
DocShotgun
a635a719d7
Model: Enable draft model q-cache in Exl3
...
* Remove unneeded default fp16 cache layer import
2025-05-03 20:59:36 -07:00
DocShotgun
58e34ba4c5
Model: Exl3 cache quant settings lenient with whitespace
2025-05-03 20:35:35 -07:00
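Note on the commit above: a sketch of whitespace-tolerant parsing for a cache quantization setting. It assumes the setting is written as a `k_bits,v_bits` pair such as `8,8` and that each value must lie between 2 and 8; both the format and the range, along with the function name, are assumptions for illustration.

```python
import re
from typing import Optional, Tuple

# Two integers separated by a comma, with optional whitespace anywhere around them.
_CACHE_MODE = re.compile(r"^\s*(\d+)\s*,\s*(\d+)\s*$")


def parse_cache_bits(value: str) -> Optional[Tuple[int, int]]:
    """Return (k_bits, v_bits), or None if the value is not a valid pair."""
    match = _CACHE_MODE.match(value)
    if match is None:
        return None
    k_bits, v_bits = int(match.group(1)), int(match.group(2))
    # Assumed valid range for each component.
    if not (2 <= k_bits <= 8 and 2 <= v_bits <= 8):
        return None
    return k_bits, v_bits
```

Inputs such as `"8,8"`, `"8, 8"`, and `" 4 , 6 "` all parse, while a non-pair value like `"FP16"` is rejected and can be handled separately.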
DocShotgun
68a660bdb3
Model: Initial Exl3 cache quantization support
2025-05-03 20:35:35 -07:00
turboderp
036af02bf6
Common: No default add_bos_token value for chat completion requests
2025-05-04 05:25:58 +02:00
turboderp
92ea7ee7cd
Model: Add draft model/speculative decoding
2025-05-04 01:27:42 +02:00
turboderp
1db2cb99cb
Model: Avoid initializing class variables
2025-05-04 01:26:42 +02:00
turboderp
0405a94a89
Model: Cast penalty range to int
2025-05-03 22:28:36 +02:00
turboderp
58c380b8ca
Model: Create generator on load
2025-05-03 18:33:37 +02:00
turboderp
0d949d00b9
Model: Set default max_batch_size
2025-05-03 18:33:37 +02:00
turboderp
8c75b29923
Model: Fix some warnings
2025-05-03 18:33:36 +02:00