Model: Remove num_experts_per_token

This shouldn't even be an exposed option since changing it always breaks inference with the model. Let the model's config.json handle it.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
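
As a rough illustration of what "let the model's config.json handle it" can look like, here is a minimal sketch that reads the per-token expert count directly from a model directory. The helper name `read_experts_per_token` is hypothetical and is not part of this commit. The key `num_experts_per_tok` is the one used by Mixtral-style Hugging Face configs and is assumed here; other MoE architectures may name it differently.

```python
import json
from pathlib import Path
from typing import Optional


def read_experts_per_token(model_dir: Path) -> Optional[int]:
    """Return the per-token expert count from config.json, or None if absent.

    Hypothetical helper for illustration only.
    """
    config_path = model_dir / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)

    # Assumption: Mixtral-style configs store the value under
    # "num_experts_per_tok". Dense (non-MoE) models omit the key,
    # in which case .get() returns None.
    return config.get("num_experts_per_tok")
```

Because the value comes from the same file that describes the trained architecture, there is no separate user-facing setting that can drift out of sync with the model.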
2025-03-19 11:52:10 -04:00
commit 79f9c6e854
parent 698d8339cb
6 changed files with 0 additions and 30 deletions
--- a/colab/TabbyAPI_Colab_Example.ipynb
+++ b/colab/TabbyAPI_Colab_Example.ipynb
@@ -194,11 +194,6 @@
        "  # NOTE: Only works with chat completion message lists!\n",
        "  prompt_template: {PromptTemplate}\n",
        "\n",
-        "  # Number of experts to use per token. Loads from the model's config.json if not specified (default: None)\n",
-        "  # WARNING: Don't set this unless you know what you're doing!\n",
-        "  # NOTE: For MoE models (ex. Mixtral) only!\n",
-        "  num_experts_per_token: {NumExpertsPerToken}\n",
-        "\n",
        "  # Options for draft models (speculative decoding). This will use more VRAM!\n",
        "  draft:\n",
        "    # Overrides the directory to look for draft (default: models)\n",