Model: Add support for num_experts_per_token

New parameter that's safe to set on exllamav2 v0.0.11. Only recommended
for users who know what they're doing.

Signed-off-by: kingbri <bdashore3@proton.me>
kingbri 2023-12-17 18:03:01 -05:00
parent 70fbee3edd
commit ad8807a830
3 changed files with 16 additions and 1 deletion

@@ -60,6 +60,11 @@ model:
   # NOTE: Only works with chat completion message lists!
   prompt_template:
 
+  # Number of experts to use per token. Loads from the model's config.json if not specified (default: None)
+  # WARNING: Don't set this unless you know what you're doing!
+  # NOTE: For MoE models (ex. Mixtral) only!
+  num_experts_per_token:
+
   # Options for draft models (speculative decoding). This will use more VRAM!
   draft:
     # Overrides the directory to look for draft (default: models)
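For context, a minimal sketch of how the new key might be filled in. The
model_dir/model_name values and the override value 3 are illustrative only;
leaving num_experts_per_token blank keeps whatever the model's config.json
specifies (Mixtral ships with num_experts_per_tok: 2).

model:
  # Hypothetical directory and model name for illustration
  model_dir: models
  model_name: Mixtral-8x7B-Instruct-exl2
  # Route each token through 3 experts instead of the model's default of 2
  num_experts_per_token: 3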