tabbyAPI-ollama/config_sample.yml
kingbri e0e93c103b Sample config: Uncomment all parameters
This helps clarify things when users are configuring for the first
time. For example, some users were putting the model name in the
"model" block instead of the "model_name" field.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-19 01:12:07 -05:00


# Sample YAML file for configuration.
# Comment out values as needed. Every value has a default within the application.
# Unless specified in the comments, DO NOT put these options in quotes!
# You can use https://www.yamllint.com/ if you want to check your YAML formatting.
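#
# Example layout (a minimal sketch; "MyModel-exl2" is a hypothetical placeholder, not a real model).
# The name of the model's folder goes in the model_name field inside the model block,
# not directly after "model:".
# model:
#   model_dir: models
#   model_name: MyModel-exl2
#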
# Options for networking
network:
  # The IP to host on (default: 127.0.0.1).
  # Use 0.0.0.0 to expose on all network adapters
  host: 127.0.0.1

  # The port to host on (default: 5000)
  port: 5000
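#
# Example: to accept connections from other machines on the network, the networking
# options could be set as sketched here (port 5000 is simply the default repeated).
# network:
#   host: 0.0.0.0
#   port: 5000
#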
# Options for model overrides and loading
model:
  # Overrides the directory to look for models (default: models)
  # Windows users, DO NOT put this path in quotes! This directory will be invalid otherwise.
  model_dir: your model directory path

  # An initial model to load. Make sure the model is located in the model directory!
  # A model can be loaded later via the API.
  model_name: A model name

  # Set the following to enable speculative decoding
  # draft_model_dir: your model directory path to use as draft model (path is independent from model_dir)
  # draft_rope_alpha: 1.0 (default: the draft model's alpha value is calculated automatically to scale to the size of the full model.)

  # The below parameters apply only if model_name is set

  # Maximum model context length (default: 4096)
  max_seq_len: 4096

  # Automatically allocate resources to GPUs (default: True)
  gpu_split_auto: True

  # An array of GBs of VRAM to split between GPUs (default: [])
  gpu_split: [20.6, 24]

  # Rope scaling parameters (default: 1.0)
  rope_scale: 1.0
  rope_alpha: 1.0

  # Disable Flash Attention 2. Set to True for GPUs older than Nvidia's 3000 series. (default: False)
  no_flash_attention: False

  # Enable low VRAM optimizations in exllamav2 (default: False)
  low_mem: False

  # Enable 8-bit cache mode for VRAM savings (slight performance hit). Possible values: FP16, FP8. (default: FP16)
  cache_mode: FP16

  # Options for draft models (speculative decoding). This will use more VRAM!
  draft:
    # Overrides the directory to look for draft models (default: models)
    draft_model_dir: Your draft model directory path

    # An initial draft model to load. Make sure this model is located in the draft model directory!
    # A draft model can be loaded later via the API.
    draft_model_name: A model name

    # Rope parameters for draft models (default: 1.0)
    draft_rope_alpha: 1.0
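#
# Example: a manual two-GPU split combined with speculative decoding (a sketch, not a recommendation).
# Assumptions: gpu_split only takes effect when gpu_split_auto is False, and the draft
# block sits inside the model block. The model names are hypothetical placeholders.
# model:
#   model_dir: models
#   model_name: MyModel-70B-exl2
#   gpu_split_auto: False
#   gpu_split: [20.6, 24]
#   draft:
#     draft_model_dir: models
#     draft_model_name: MyDraftModel-1B-exl2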