Main: Enable cudaMallocAsync backend by default
Works on CUDA 12.4 and up. If CUDA is not present, the backend is not enabled. This is controlled by an environment variable that must be set before startup, so it cannot be configured via config.yml. This flag used to be experimental, but it's safe to keep enabled since it only provides a benefit.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
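The gating described above (enable only when CUDA >= 12.4 is available, via an environment variable rather than config.yml) can be sketched roughly as follows. This is a hedged illustration, not the project's actual code: the function name `enable_cuda_malloc_backend` and the `env` parameter are hypothetical, while `PYTORCH_CUDA_ALLOC_CONF=backend:cudaMallocAsync` is PyTorch's real mechanism and must be set before torch initializes CUDA.

```python
from typing import Optional


def enable_cuda_malloc_backend(cuda_version: Optional[str], env: dict) -> bool:
    """Sketch: set PyTorch's allocator backend env var when CUDA >= 12.4 exists.

    cuda_version would come from something like torch.version.cuda;
    env stands in for os.environ so the logic is easy to test.
    """
    if not cuda_version:
        # No CUDA at all: leave the default allocator alone.
        return False
    major, minor = (int(x) for x in cuda_version.split(".")[:2])
    if (major, minor) >= (12, 4):
        # Must happen before CUDA is initialized, hence an env var
        # rather than a config.yml option.
        env["PYTORCH_CUDA_ALLOC_CONF"] = "backend:cudaMallocAsync"
        return True
    return False
```

Because the variable is read when the CUDA context is created, calling this after model loading would have no effect, which is why a startup-time env var is the only workable switch.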
This commit is contained in:
parent
0b4ca567f8
commit
113643c0df
2 changed files with 5 additions and 7 deletions
@@ -47,7 +47,6 @@ Note: These are experimental flags that may be removed at any point.
| Flag                      | Type (Default) | Description                                                                                                                                     |
| ------------------------- | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| unsafe_launch             | Bool (False)   | Skips dependency checks on startup. Only recommended for debugging.                                                                             |
| disable_request_streaming | Bool (False)   | Forcefully disables streaming requests.                                                                                                         |
| cuda_malloc_backend       | Bool (False)   | Uses PyTorch's CUDA malloc backend to load models. Helps save VRAM.<br><br>Safe to enable.                                                      |
| realtime_process_priority | Bool (False)   | Sets the process priority to "Realtime". Administrator/sudo access required; otherwise the priority is set to the highest available in userland. |
### Model Options