Model: Remove num_experts_per_token

This shouldn't be an exposed option in the first place, since changing it
always breaks inference with the model. Let the model's config.json
handle it.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
kingbri 2025-03-19 11:52:10 -04:00
parent 698d8339cb
commit 79f9c6e854
6 changed files with 0 additions and 30 deletions


@@ -75,7 +75,6 @@ Note: Most of the options here will only apply on initial model load/startup (ep
 | max_batch_size | Int (None) | The absolute maximum amount of prompts to process at one time. This value is automatically adjusted based on cache size. |
 | prompt_template | String (None) | Name of a jinja2 chat template to apply for this model. Must be located in the `templates` directory. |
 | vision | Bool (False) | Enable vision support for the provided model (if it exists). |
-| num_experts_per_token | Int (None) | Number of experts to use per-token for MoE models. Pulled from the config.json if not specified. |
 ### Draft Model Options