Model: Dynamically scale generate_window
Allows adjustment of the reservation space at the end of the context before rolling it. This should scale up as a model's max_seq_len grows.

Signed-off-by: kingbri <bdashore3@proton.me>
This commit is contained in:
parent
b14c5443fd
commit
740b0215dd
1 changed file with 3 additions and 1 deletion
```diff
@@ -554,7 +554,9 @@ class ExllamaV2Container:
         token_healing = unwrap(kwargs.get("token_healing"), False)
         max_tokens = unwrap(kwargs.get("max_tokens"), 150)
         stream_interval = unwrap(kwargs.get("stream_interval"), 0)
-        generate_window = min(unwrap(kwargs.get("generate_window"), 512), max_tokens)
+        generate_window = max(
+            unwrap(kwargs.get("generate_window"), 512), max_tokens // 8
+        )

         # Sampler settings
         gen_settings = ExLlamaV2Sampler.Settings()
```
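The behavioral change can be sketched as follows. This is a minimal illustration, not the project's code: `unwrap` is a hypothetical stand-in for TabbyAPI's helper (assumed to return the value if present, else the default), and `scaled_generate_window` is an invented name for the new expression.

```python
def unwrap(value, default):
    # Hypothetical stand-in for TabbyAPI's unwrap helper:
    # return the value if present, otherwise the default.
    return value if value is not None else default


def scaled_generate_window(kwargs):
    """Sketch of the new scaling: the reservation window grows with
    max_tokens rather than being capped by it."""
    max_tokens = unwrap(kwargs.get("max_tokens"), 150)
    # Old behavior: min(window, max_tokens) shrank the window for short
    # requests. New behavior: max(window, max_tokens // 8) keeps the
    # 512-token floor and enlarges the window for very long requests,
    # matching larger max_seq_len values.
    return max(unwrap(kwargs.get("generate_window"), 512), max_tokens // 8)


# With the default window of 512:
print(scaled_generate_window({"max_tokens": 150}))    # small request: floor of 512 wins
print(scaled_generate_window({"max_tokens": 16384}))  # large request: 16384 // 8 = 2048
```

Under the old `min` formula, a request with `max_tokens=150` would have collapsed the window to 150; under the new `max` formula it stays at the 512 default, and only very large generations push it higher.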