Commit graph

6 commits

Author SHA1 Message Date
DocShotgun
7380a3b79a Implement lora support (#24)
* Model: Implement basic lora support

* Add ability to load loras from config on launch
* Supports loading multiple loras and lora scaling
* Add function to unload loras

* Colab: Update for basic lora support

* Model: Test vram alloc after lora load, add docs

* Git: Add loras folder to .gitignore

* API: Add basic lora-related endpoints

* Add /loras/ endpoint for querying available loras
* Add /model/lora endpoint for querying currently loaded loras
* Add /model/lora/load endpoint for loading loras
* Add /model/lora/unload endpoint for unloading loras
* Move lora config-checking logic to main.py for better compat with API endpoints

* Revert bad CRLF line ending changes

* API: Add basic lora-related endpoints (fixed)

* Add /loras/ endpoint for querying available loras
* Add /model/lora endpoint for querying currently loaded loras
* Add /model/lora/load endpoint for loading loras
* Add /model/lora/unload endpoint for unloading loras
* Move lora config-checking logic to main.py for better compat with API endpoints

* Model: Unload loras first when unloading model

* API + Models: Cleanup lora endpoints and functions

Condenses down endpoint and model load code. Also makes the routes
behave the same way as model routes to help not confuse the end user.

Signed-off-by: kingbri <bdashore3@proton.me>

* Loras: Optimize load endpoint

Return successes and failures along with consolidating the request
to the rewritten load_loras function.

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Co-authored-by: kingbri <bdashore3@proton.me>
Co-authored-by: DocShotgun <126566557+DocShotgun@users.noreply.github.com>
2023-12-08 23:38:08 -05:00
kingbri
b83e1b704e Requirements: Split for configurations
Add self-contained requirements for cuda 11.8 and ROCm

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-06 00:00:30 -05:00
DocShotgun
39f7a2aabd Expose draft_rope_scale
2023-12-05 12:59:32 -08:00
DocShotgun
67507105d0 Update colab, expose additional args
* Exposed draft model args for speculative decoding
* Exposed int8 cache, dummy models, and no flash attention
* Resolved CUDA 11.8 dependency issue
2023-12-04 22:20:46 -08:00
veryamazinglystupid
ad1a12a0f2 make colab better, fix libcudart errors
:3
2023-12-03 14:07:52 +05:30
DocShotgun
2a9e4ca051 Add Colab example
* Note: this uses wheels for Python 3.10 and torch 2.1.0+cu118, which is the current default in Colab
2023-12-03 02:21:51 -05:00