The previous code overrode the existing gpu_split and device index
values. The draft model now gets its own independent draft_gpu_split
value, and the gpu_devices check is adjusted only if the
draft_gpu_split array is longer than the gpu_split array.

The draft GPU split does not use tensor parallelism, and defaults to
gpu_split_auto if a split is not provided.
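A minimal sketch of the behavior described above, not the actual
implementation: resolve_draft_split and its return shape are
hypothetical names chosen for illustration, and only gpu_split and
draft_gpu_split come from this change.

```python
from typing import List, Optional, Tuple


def resolve_draft_split(
    gpu_split: List[float],
    draft_gpu_split: Optional[List[float]],
) -> Tuple[List[float], bool, int]:
    """Hypothetical helper: returns (draft split, use auto split, device count)."""
    # Copy the draft split so the main model's gpu_split is never overwritten.
    draft_split = list(draft_gpu_split or [])

    # With no explicit draft split, fall back to automatic splitting.
    draft_auto = len(draft_split) == 0

    # Widen the device check only when the draft split is the longer array.
    device_count = max(len(gpu_split), len(draft_split))

    return draft_split, draft_auto, device_count
```

Here a shorter draft split leaves the device count untouched, while a
longer one extends it to cover the extra devices.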
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>