Model: Default max_seq_len to 4096

A common problem in TabbyAPI is that users trying to get up and
running with a model hit OOM errors caused by max_seq_len. This is
because model devs set max context values in the millions, which
requires a lot of VRAM.

To idiot-proof first-time setup, make the fallback default 4096 so
users can actually run their models. If a user still wants the model's
native max_seq_len, they can set the value to -1.
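A minimal sketch of the fallback behavior described above (the helper
name resolve_max_seq_len and its model_max parameter are illustrative,
not TabbyAPI's actual internals):

def resolve_max_seq_len(configured: int | None, model_max: int) -> int:
    """Pick the effective context length per the rules above."""
    if configured is None:
        # No value set: fall back to a safe 4096 instead of the model's
        # advertised max, which can OOM on consumer GPUs.
        return 4096
    if configured == -1:
        # Explicit opt-in to the model's full native context.
        return model_max
    return configured

# Illustrative behavior:
assert resolve_max_seq_len(None, 1_000_000) == 4096        # safe default
assert resolve_max_seq_len(-1, 1_000_000) == 1_000_000     # use model max
assert resolve_max_seq_len(8192, 1_000_000) == 8192        # explicit value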

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
kingbri 2025-06-13 14:12:03 -04:00
parent 322f9b773a
commit 2096c9bad2
5 changed files with 20 additions and 12 deletions


@@ -39,12 +39,11 @@ class GenerationConfig(BaseModel):
class HuggingFaceConfig(BaseModel):
"""
DEPRECATED: Currently a stub and doesn't do anything.
An abridged version of HuggingFace's model config.
Will be expanded as needed.
"""
max_position_embeddings: int = 4096
eos_token_id: Optional[Union[int, List[int]]] = None
quantization_config: Optional[Dict] = None
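For context, a minimal sketch of how this Pydantic stub behaves with
its defaults (assumes pydantic and the typing imports already present
in the file):

from typing import Dict, List, Optional, Union
from pydantic import BaseModel

class HuggingFaceConfig(BaseModel):
    """DEPRECATED: abridged stub of HuggingFace's model config."""

    max_position_embeddings: int = 4096
    eos_token_id: Optional[Union[int, List[int]]] = None
    quantization_config: Optional[Dict] = None

cfg = HuggingFaceConfig()
print(cfg.max_position_embeddings)  # 4096, matching the new fallback default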