Model: Default max_seq_len to 4096
A common problem in TabbyAPI is that users who want to get up and running with a model run into OOM errors caused by max_seq_len: model devs often set max context values in the millions, which requires a lot of VRAM. To idiot-proof first-time setup, make the fallback default 4096 so users can run their models. If a user still wants to use the model's native max_seq_len, they can set the value to -1.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
parent 322f9b773a
commit 2096c9bad2
5 changed files with 20 additions and 12 deletions
@@ -78,8 +78,8 @@ model:
   # Options: exllamav2, exllamav3
   backend:

-  # Max sequence length (default: Empty).
-  # Fetched from the model's base sequence length in config.json by default.
+  # Max sequence length (default: 4096).
+  # Set to -1 to fetch from the model's config.json
   max_seq_len:

   # Load model with tensor parallelism.
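A rough Python sketch of the fallback behavior described in the commit message, for illustration only. The helper name resolve_max_seq_len and the max_position_embeddings field read from config.json are assumptions made here, not TabbyAPI's actual loader code.

# Hypothetical sketch of the new max_seq_len resolution; names are illustrative.
import json
from pathlib import Path

FALLBACK_MAX_SEQ_LEN = 4096  # used when max_seq_len is left empty in config.yml


def resolve_max_seq_len(configured: int | None, model_dir: str) -> int:
    """Resolve the sequence length to load the model with."""
    if configured is None:
        # Empty setting: fall back to 4096 instead of the model's
        # (often huge) native context length, avoiding first-run OOMs.
        return FALLBACK_MAX_SEQ_LEN
    if configured == -1:
        # -1: fetch the native value from the model's config.json.
        model_config = json.loads((Path(model_dir) / "config.json").read_text())
        # Many HF model configs store the base context length in this field (assumption).
        return model_config["max_position_embeddings"]
    # Any other value: respect the user's explicit setting.
    return configured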