Docs: Edit inline loading for breaking changes
Add the model key for the YAML examples.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
parent ab04a6ed60
commit e77fa0b7a8
1 changed file with 6 additions and 7 deletions
@@ -94,21 +94,20 @@ To get started, set `inline_model_loading` to `true` under the model block of co
 
 Now to create a tabby config, let's say we have a model in our models directory called `Meta-Llama-3-8B-exl2`. Navigate into that model folder and create a file called `tabby_config.yml`
 
-> [!NOTE]
-> The formatting for tabby_config.yml may change in the future for consistency with config.yml. Please keep an eye out for breaking changes.
-
 Now, you can place any model load parameter from `/v1/model/load` into that file. Here's a simple example which changes the default `max_seq_len` to 8192 and sets a Q6 quantized cache:
 
 ```yml
-max_seq_len: 8192
-cache_mode: Q6
+model:
+  max_seq_len: 8192
+  cache_mode: Q6
 ```
 
 If you'd like to provide draft model options, you can add them under the `draft_model` key:
 
 ```yml
-max_seq_len: 8192
-cache_mode: Q6
+model:
+  max_seq_len: 8192
+  cache_mode: Q6
 draft_model:
   draft_model_name: TinyLlama-1B-32k-exl2
   draft_rope_scale: 1.0
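For anyone with an existing flat `tabby_config.yml`, the change in this commit amounts to moving the top-level load parameters under a `model:` key. The sketch below is a rough illustration only, not part of the commit: `nest_under_model` is a hypothetical helper, the two-space indent is an assumption, and it relies on the file containing only simple `key: value` lines plus an optional top-level `draft_model:` block, which is assumed to stay at the top level.

```python
def nest_under_model(flat_text: str, indent: str = "  ") -> str:
    """Move the top-level keys of a flat tabby_config.yml under `model:`.

    Hypothetical helper: handles only simple `key: value` lines and leaves
    an existing top-level `draft_model:` block where it is (an assumption).
    """
    lines = [line for line in flat_text.splitlines() if line.strip()]

    # Already in the nested layout: return the text unchanged.
    if any(line.startswith("model:") for line in lines):
        return flat_text

    model_lines = []   # lines that end up indented under `model:`
    other_lines = []   # the `draft_model:` block, kept as-is
    in_draft = False
    for line in lines:
        if not line[0].isspace():
            # A new top-level key decides which group the following lines join.
            in_draft = line.startswith("draft_model:")
        if in_draft:
            other_lines.append(line)
        else:
            model_lines.append(indent + line)

    return "\n".join(["model:"] + model_lines + other_lines) + "\n"


if __name__ == "__main__":
    old = (
        "max_seq_len: 8192\n"
        "cache_mode: Q6\n"
        "draft_model:\n"
        "  draft_model_name: TinyLlama-1B-32k-exl2\n"
        "  draft_rope_scale: 1.0\n"
    )
    print(nest_under_model(old), end="")
```

Plain text handling is used here to avoid depending on a YAML library; a config with comments, anchors, or multi-line values would need a real YAML parser instead.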