Model: Adjust draft_gpu_split and add to config
The previous code overrode the existing gpu_split and device index values. This commit sets an independent draft_gpu_split value and adjusts the gpu_devices check only if the draft_gpu_split array is larger than the gpu_split array. The draft GPU split is not tensor parallel, and defaults to gpu_split_auto if a split is not provided.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
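The behavior described in the message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function name `plan_draft_split`, its return shape, and the rule that a device counts as used when its allocation is nonzero are all assumptions made for the example.

```python
from typing import List, Tuple


def plan_draft_split(
    gpu_split: List[float], draft_gpu_split: List[float]
) -> Tuple[bool, List[int]]:
    """Hypothetical helper: decide autosplit and which device indices to check."""
    # An empty draft split means the draft model falls back to autosplit.
    draft_autosplit = len(draft_gpu_split) == 0

    # Assumption: a device is in use when its allocation is nonzero.
    devices = [idx for idx, gb in enumerate(gpu_split) if gb > 0]

    # Only adjust the device list when the draft split spans more GPUs
    # than the main split; otherwise the main split's devices are kept.
    if len(draft_gpu_split) > len(gpu_split):
        devices = [idx for idx, gb in enumerate(draft_gpu_split) if gb > 0]

    return draft_autosplit, devices
```

With this sketch, a shorter or missing draft split leaves the main model's device list untouched, which matches the intent of no longer overriding gpu_split.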
parent bd8256d168
commit beb6d8faa5
3 changed files with 22 additions and 7 deletions
@@ -351,6 +351,13 @@ class DraftModelConfig(BaseConfigModel):
             f"Possible values: {str(CACHE_SIZES)[15:-1]}."
         ),
     )
+    draft_gpu_split: List[float] = Field(
+        default_factory=list,
+        description=(
+            "An integer array of GBs of VRAM to split between GPUs (default: []).\n"
+            "If this isn't filled in, the draft model is autosplit."
+        ),
+    )
 
 
 class LoraInstanceModel(BaseConfigModel):
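The new field relies on `default_factory=list`, so every config instance gets its own empty list, and an empty list signals autosplit. A minimal sketch of that pattern, using the standard library's `dataclasses` in place of the project's config base class; the class name `DraftConfigSketch` and the `uses_autosplit` property are hypothetical and exist only for this example:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DraftConfigSketch:
    """Hypothetical stand-in for the draft model config."""

    # default_factory builds a fresh list per instance, avoiding the
    # shared-mutable-default pitfall of writing `= []` directly.
    draft_gpu_split: List[float] = field(default_factory=list)

    @property
    def uses_autosplit(self) -> bool:
        # An empty split means no manual allocation was provided.
        return not self.draft_gpu_split


default_cfg = DraftConfigSketch()
manual_cfg = DraftConfigSketch(draft_gpu_split=[4.0, 4.0])
```

Here `default_cfg` would be autosplit, while `manual_cfg` carries an explicit per-GPU allocation in GB.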