Model + Config: Remove low_mem option
Low_mem doesn't work in exl2 and it was an experimental option to begin with. Keep the loading code commented out in case it gets fixed in the future.

A better alternative is to use 8bit cache which works and helps save VRAM.

Signed-off-by: kingbri <bdashore3@proton.me>
parent 109e4223e0
commit c67c9f6d66
2 changed files with 4 additions and 3 deletions
@@ -45,9 +45,6 @@ model:
   # Disable Flash-attention 2. Set to True for GPUs lower than Nvidia's 3000 series. (default: False)
   no_flash_attention: False
 
-  # Enable low vram optimizations in exllamav2 (default: False)
-  low_mem: False
-
   # Enable 8 bit cache mode for VRAM savings (slight performance hit). Possible values FP16, FP8. (default: FP16)
   cache_mode: FP16
 
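The hunk above drops `low_mem` from the sample config and leaves `cache_mode` as the supported way to save VRAM. As a rough illustration of how a loader could treat an older config that still carries the removed key, here is a minimal sketch. The function name `resolve_cache_mode`, the plain-dict representation of the `model:` block, and the warning text are all assumptions made for this example; they are not taken from the project's code.

```python
import warnings

# Values listed in the sample config comment: "Possible values FP16, FP8."
VALID_CACHE_MODES = {"FP16", "FP8"}


def resolve_cache_mode(model_config: dict) -> str:
    """Return a validated cache mode from a parsed `model:` block.

    Hypothetical helper: `model_config` stands in for the parsed YAML section.
    """
    # low_mem was removed by this commit; an old config may still contain it.
    if "low_mem" in model_config:
        warnings.warn(
            "low_mem is no longer supported and is ignored; "
            "set cache_mode to FP8 to reduce VRAM usage instead.",
            stacklevel=2,
        )

    # Fall back to the documented default when the key is missing or empty.
    mode = str(model_config.get("cache_mode") or "FP16").upper()
    if mode not in VALID_CACHE_MODES:
        raise ValueError(
            f"Unsupported cache_mode {mode!r}; expected one of {sorted(VALID_CACHE_MODES)}"
        )
    return mode


if __name__ == "__main__":
    # An old-style config: the stale key triggers a warning, FP8 is selected.
    print(resolve_cache_mode({"low_mem": True, "cache_mode": "FP8"}))
```

Warning on the stale key rather than raising keeps older config files loadable, which seems in keeping with the commit's choice to leave the loading code commented out rather than deleted.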