diff --git a/config_sample.yml b/config_sample.yml
index 745433b..a13e64e 100644
--- a/config_sample.yml
+++ b/config_sample.yml
@@ -95,6 +95,9 @@ model:
# Used with tensor parallelism.
gpu_split: []
 
+ # NOTE: If a model has YaRN rope scaling, ExLlama will enable it automatically.
+ # rope_scale and rope_alpha settings won't apply in this case.
+
# Rope scale (default: 1.0).
# Same as compress_pos_emb.
# Use if the model was trained on long context with rope.
diff --git a/docs/02.-Server-options.md b/docs/02.-Server-options.md
index 6a4f1d9..b319f76 100644
--- a/docs/02.-Server-options.md
+++ b/docs/02.-Server-options.md
@@ -67,8 +67,8 @@ Note: Most of the options here will only apply on initial model load/startup (ep
| gpu_split_auto | Bool (True) | Automatically split the model across multiple GPUs. Manual GPU split isn't used if this is enabled. |
| autosplit_reserve | List[Int] ([96]) | Amount of empty VRAM to reserve when loading with autosplit. <br> Represented as an array of MB per GPU used. |
| gpu_split | List[Float] ([]) | Float array of GBs to split a model between GPUs. |
-| rope_scale | Float (1.0) | Adjustment for rope scale (or compress_pos_emb) |
-| rope_alpha | Float (None) | Adjustment for rope alpha. Leave blank to automatically calculate based on the max_seq_len. |
+| rope_scale | Float (1.0) | Adjustment for rope scale (or compress_pos_emb). <br> Note: If the model has YaRN support, this option will not apply. |
+| rope_alpha | Float (None) | Adjustment for rope alpha. Leave blank to automatically calculate based on the max_seq_len. <br> Note: If the model has YaRN support, this option will not apply. |
| cache_mode | String ("FP16") | Cache mode for the model. <br> Options: FP16, Q8, Q6, Q4 |
| cache_size | Int (max_seq_len) | Size of the K/V cache. <br> Note: If using CFG, the cache size should be 2 * max_seq_len. |
| chunk_size | Int (2048) | Amount of tokens per chunk with ingestion. A lower value reduces VRAM usage at the cost of ingestion speed. |
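
For reference, a minimal sketch of how the options touched by this patch might be set together under the `model:` section of `config.yml`. The values are illustrative only, not recommended defaults; on a model that ships YaRN rope scaling, the two rope settings below are ignored per the new note.

```yaml
model:
  # Manual split across two GPUs, in GB (leave as [] to rely on gpu_split_auto).
  gpu_split: [20.0, 20.0]

  # Rope settings; both are skipped entirely if the model has YaRN rope scaling.
  rope_scale: 1.0
  rope_alpha:          # blank -> auto-calculated from max_seq_len

  # K/V cache: quantized Q4 cache, sized for CFG (2 * max_seq_len).
  max_seq_len: 4096
  cache_mode: Q4
  cache_size: 8192
  chunk_size: 2048
```

Setting cache_size to 8192 here follows the CFG note in the table above; without CFG it can stay equal to max_seq_len.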