Model: Add support for num_experts_by_token
New parameter that's safe to edit in exllamav2 v0.0.11. Only recommended for people who know what they're doing.

Signed-off-by: kingbri <bdashore3@proton.me>
parent 70fbee3edd
commit ad8807a830
3 changed files with 16 additions and 1 deletions
@@ -60,6 +60,11 @@ model:
   # NOTE: Only works with chat completion message lists!
   prompt_template:
 
+  # Number of experts to use per token. Loads from the model's config.json if not specified (default: None)
+  # WARNING: Don't set this unless you know what you're doing!
+  # NOTE: For MoE models (ex. Mixtral) only!
+  num_experts_per_token:
+
   # Options for draft models (speculative decoding). This will use more VRAM!
   draft:
     # Overrides the directory to look for draft (default: models)
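The fallback described in the new config comments (use the configured value if present, otherwise defer to the model's config.json) can be sketched roughly as follows. This is an illustration only, not the code from this commit: the helper name `resolve_num_experts_per_token` is hypothetical, and the `num_experts_per_tok` key is assumed from Mixtral-style Hugging Face configs.

```python
import json
from typing import Optional

# Assumption: Mixtral-style config.json files store the value under this key.
HF_KEY = "num_experts_per_tok"


def resolve_num_experts_per_token(
    override: Optional[int], model_config: dict
) -> Optional[int]:
    """Hypothetical helper: pick the experts-per-token value to use."""
    if override is None:
        # Nothing set in the YAML config: defer to the model's own config.
        # Returns None for non-MoE models that lack the key.
        return model_config.get(HF_KEY)
    if not isinstance(override, int) or override < 1:
        raise ValueError("num_experts_per_token must be a positive integer")
    # An explicit value in the YAML config takes precedence.
    return override


# Stand-in for the parsed contents of a model's config.json.
model_config = json.loads('{"num_local_experts": 8, "num_experts_per_tok": 2}')

print(resolve_num_experts_per_token(None, model_config))  # falls back to config.json
print(resolve_num_experts_per_token(3, model_config))     # explicit override wins
```

Leaving `num_experts_per_token:` empty in the YAML corresponds to the `None` case here, which matches the WARNING in the diff: the model's own value is the safe default.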