* Exposed draft model args for speculative decoding
* Exposed int8 cache, dummy models, and no flash attention
* Resolved CUDA 11.8 dependency issue
:3
* Note: this uses wheels for Python 3.10 and torch 2.1.0+cu118, the current defaults in Colab