DocShotgun
f8070e7707
Model: Auto detect model backend from config
* Use exllamav3 for exl3 models, exllamav2 otherwise
2025-05-06 18:51:58 -07:00
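Note on the commit above: a minimal sketch of how backend auto-detection from a model config could work. It assumes the model's config.json carries a `quantization_config.quant_method` field set to "exl3" for exl3 quants; that field name, the `detect_backend` helper, and the returned backend strings are illustrative assumptions, not the commit's actual code.

```python
def detect_backend(hf_config: dict) -> str:
    """Pick a backend name from a parsed config.json dict (hypothetical helper)."""
    # Assumption: exl3 models advertise themselves via quantization_config.quant_method
    quant_config = hf_config.get("quantization_config") or {}
    quant_method = str(quant_config.get("quant_method", "")).lower()

    # Per the commit message: exllamav3 for exl3 models, exllamav2 otherwise
    return "exllamav3" if quant_method == "exl3" else "exllamav2"
```

Falling back to exllamav2 whenever the field is missing or unrecognized mirrors the "exllamav2 otherwise" wording in the commit body.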
turboderp
92ea7ee7cd
Model: Add draft model/speculative decoding
2025-05-04 01:27:42 +02:00
turboderp
1db2cb99cb
Model: Avoid initializing class variables
2025-05-04 01:26:42 +02:00
turboderp
0405a94a89
Model: Cast penalty range to int
2025-05-03 22:28:36 +02:00
turboderp
58c380b8ca
Model: Create generator on load
2025-05-03 18:33:37 +02:00
turboderp
0d949d00b9
Model: Set default max_batch_size
2025-05-03 18:33:37 +02:00
turboderp
8c75b29923
Model: Fix some warnings
2025-05-03 18:33:36 +02:00
kingbri
15cc480cb0
Exl3: Simplify add_bos_token handling
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:50:42 -04:00
randoentity
d8a8ccfc2a
Model: fix add_bos_token
2025-05-02 21:33:25 -04:00
kingbri
0d02af3c81
Model: Set model_dir on init
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:33:25 -04:00
kingbri
c89bea030e
Model: Add template fetching to Exl3
Use the same functionality as exl2's loader.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:33:25 -04:00
kingbri
e8f00412f6
Model: Fetch from generation_config and tokenizer_config in Exl3
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:33:25 -04:00
kingbri
bdc5189a4b
Exl3: Add chunk size, cache size, and model info
Use the same algorithm for estimating and adjusting cache size based
on multiples of 256 and above max seq len.
Same applies for chunk size.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:33:25 -04:00
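Note on the commit above: a minimal sketch of the cache size adjustment the message describes (at least max seq len, aligned to multiples of 256). The function name, the choice to round up rather than down, and the order of the two steps are assumptions made for illustration; the commit body does not spell them out.

```python
def adjust_cache_size(requested: int, max_seq_len: int, multiple: int = 256) -> int:
    """Clamp a requested cache size to max_seq_len and align it (hypothetical helper)."""
    # The cache should not be smaller than the maximum sequence length
    size = max(requested, max_seq_len)

    # Assumption: misaligned sizes are rounded UP to the next multiple of 256
    remainder = size % multiple
    if remainder != 0:
        size += multiple - remainder

    return size


print(adjust_cache_size(1000, 4096))  # below max_seq_len, so raised to 4096
print(adjust_cache_size(5000, 4096))  # 5000 is not a multiple of 256, rounded up to 5120
```

Per the commit body, the same kind of alignment would apply to chunk size; how chunk size is bounded relative to max seq len is not stated there, so it is left out of this sketch.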
kingbri
303e2dde12
Model: Correct exl3 generation, add concurrency, and cleanup
Fixes application of sampler parameters by adding a new sampler builder
interface. Also expose the generator class-wide and add wait_for_jobs.
Finally, allow inline loading to specify the backend.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:33:25 -04:00
randoentity
c744790f14
fixup: add sampler logs
Also passing sampler to job with this, no idea if this is correct
2025-05-02 21:33:25 -04:00
randoentity
b35c48da37
fixup: some metrics
2025-05-02 21:33:25 -04:00
randoentity
c0f268f33e
fixup: autosplit, start work on metrics
2025-05-02 21:33:25 -04:00
randoentity
306fc7cd15
fixup: autosplit reserve
this probably breaks v2 support
2025-05-02 21:33:25 -04:00
randoentity
acb3adb953
fixup: auto split
2025-05-02 21:33:25 -04:00
randoentity
14fb573371
fixup: max_seq_len
Whoops
2025-05-02 21:33:25 -04:00
randoentity
daae9ec43d
Exl3: Couldn't wait
Just copied some stuff around and it ended up working for basic use.
2025-05-02 21:33:25 -04:00
kingbri
b4ff2f23cf
Exl3: Add token encode, decode, and special token fetch
Base class methods
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:32:53 -04:00
kingbri
0c1d794390
Model: Add exl3 and associated load functions
Initial exl3 compat and loading functionality.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-02 21:32:39 -04:00