Model: Dynamically scale generate_window

Allows adjustment of the reservation space at the end of the context
before rolling it. This window should scale up as a model's max_seq_len
increases.

Signed-off-by: kingbri <bdashore3@proton.me>
kingbri 2024-01-24 01:26:38 -05:00 committed by Brian Dashore
parent b14c5443fd
commit 740b0215dd

@@ -554,7 +554,9 @@ class ExllamaV2Container:
         token_healing = unwrap(kwargs.get("token_healing"), False)
         max_tokens = unwrap(kwargs.get("max_tokens"), 150)
         stream_interval = unwrap(kwargs.get("stream_interval"), 0)
-        generate_window = min(unwrap(kwargs.get("generate_window"), 512), max_tokens)
+        generate_window = max(
+            unwrap(kwargs.get("generate_window"), 512), max_tokens // 8
+        )
 
         # Sampler settings
         gen_settings = ExLlamaV2Sampler.Settings()
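
A minimal standalone sketch of how the new formula changes the reserved window
compared to the old one. The unwrap stand-in, the function names, and the sample
values below are illustrative assumptions, not code from this commit; only the
two formulas themselves are taken from the diff.

def unwrap(value, default):
    # Hypothetical stand-in for the project's unwrap helper:
    # return the value if provided, otherwise the default.
    return value if value is not None else default


def old_window(requested, max_tokens):
    # Pre-commit behavior: the window is capped at max_tokens.
    return min(unwrap(requested, 512), max_tokens)


def new_window(requested, max_tokens):
    # Post-commit behavior: the window is at least max_tokens // 8,
    # so it grows as the allowed generation length grows.
    return max(unwrap(requested, 512), max_tokens // 8)


if __name__ == "__main__":
    # Compare the two behaviors when no generate_window is requested.
    for max_tokens in (150, 512, 4096, 16384):
        print(max_tokens, old_window(None, max_tokens), new_window(None, max_tokens))

With the default of 512 and no explicit generate_window, the old formula never
reserved more than max_tokens, while the new one keeps the 512 floor and then
grows the reservation once max_tokens // 8 exceeds it (e.g. 2048 for a 16384-token
request), which is the scaling described in the commit message.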