The GPU reserve is used as a VRAM buffer to prevent GPU overflow when automatically deciding how to load a model on multiple GPUs. Make this configurable.

Signed-off-by: kingbri <bdashore3@proton.me>

| Name |
|---|
| chat_completion.py |
| common.py |
| completion.py |
| lora.py |
| model.py |
| sampler_overrides.py |
| template.py |
| token.py |
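
The commit message describes holding back a VRAM buffer on each GPU before a model is split automatically across devices. A minimal sketch of that idea follows, assuming the reserve is given as a per-GPU list in megabytes. The function name `usable_vram_mb` and its parameters are hypothetical illustrations and are not taken from this repository.

```python
from typing import List


def usable_vram_mb(total_mb: List[int], reserve_mb: List[int]) -> List[int]:
    """Return the VRAM (in MB) left for model weights on each GPU.

    Hypothetical helper: subtracts a configurable per-GPU reserve from the
    total VRAM of each device. GPUs without a reserve entry get no reserve.
    """
    if len(reserve_mb) > len(total_mb):
        raise ValueError("more reserve entries than GPUs")

    # Pad the reserve list with zeros so every GPU has an entry.
    padded = list(reserve_mb) + [0] * (len(total_mb) - len(reserve_mb))

    # Clamp at zero so an oversized reserve never yields a negative budget.
    return [max(total - reserve, 0) for total, reserve in zip(total_mb, padded)]


if __name__ == "__main__":
    # Two 24 GB GPUs, with 96 MB held back on the first one only.
    print(usable_vram_mb([24576, 24576], [96]))
```

Making the reserve a list rather than a single number lets a user hold back more memory on the GPU that also drives a display or runs other workloads.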