Commit graph

10 commits

kingbri
78f920eeda Tree: Refactor code organization
Move common functions into their own folder and refactor the backends
to use their own folder as well.

Also clean up imports and alphabetize the import statements themselves.

Finally, move colab and docker into their own folders as well.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-01-25 00:15:40 -05:00
DocShotgun
7967607f12
Colab: Expose new config arguments
2023-12-22 01:53:13 -08:00
kingbri
5fbb37405f Colab: Remove the pydantic hotfix
Requirements.txt is now pinned to install pydantic >= 2.0.0

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-20 00:01:58 -05:00
veryamazinglystupid
12bf7a0174
Fix the Colab pydantic error
:3
2023-12-19 19:46:57 +05:30
DocShotgun
7380a3b79a Implement lora support (#24)
* Model: Implement basic lora support

* Add ability to load loras from config on launch
* Supports loading multiple loras and lora scaling
* Add function to unload loras

* Colab: Update for basic lora support

* Model: Test vram alloc after lora load, add docs

* Git: Add loras folder to .gitignore

* API: Add basic lora-related endpoints

* Add /loras/ endpoint for querying available loras
* Add /model/lora endpoint for querying currently loaded loras
* Add /model/lora/load endpoint for loading loras
* Add /model/lora/unload endpoint for unloading loras
* Move lora config-checking logic to main.py for better compat with API endpoints

* Revert bad CRLF line ending changes

* API: Add basic lora-related endpoints (fixed)

* Add /loras/ endpoint for querying available loras
* Add /model/lora endpoint for querying currently loaded loras
* Add /model/lora/load endpoint for loading loras
* Add /model/lora/unload endpoint for unloading loras
* Move lora config-checking logic to main.py for better compat with API endpoints

* Model: Unload loras first when unloading model

* API + Models: Cleanup lora endpoints and functions

Condenses down endpoint and model load code. Also makes the routes
behave the same way as model routes to help not confuse the end user.

Signed-off-by: kingbri <bdashore3@proton.me>

* Loras: Optimize load endpoint

Return successes and failures along with consolidating the request
to the rewritten load_loras function.

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Co-authored-by: kingbri <bdashore3@proton.me>
Co-authored-by: DocShotgun <126566557+DocShotgun@users.noreply.github.com>
2023-12-08 23:38:08 -05:00
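The lora PR above adds four endpoints: /loras for querying available loras, /model/lora for the currently loaded ones, and /model/lora/load and /model/lora/unload for managing them, with the load endpoint accepting multiple loras plus scaling and returning successes and failures. A minimal sketch of how a client might address these routes — the endpoint paths come from the commit message, while the base URL and the request payload shape (per-lora name and scaling factor) are assumptions, not the actual schema:

```python
# Hypothetical client-side sketch of the lora endpoints from this PR.
# Paths are taken from the commit message; the payload shape is assumed.

def lora_endpoints(base_url: str) -> dict:
    """Build the lora-related endpoint URLs described in the PR."""
    base = base_url.rstrip("/")
    return {
        "list": f"{base}/loras",                # query available loras
        "active": f"{base}/model/lora",         # query currently loaded loras
        "load": f"{base}/model/lora/load",      # load one or more loras
        "unload": f"{base}/model/lora/unload",  # unload loras
    }

def build_load_request(loras: list[tuple[str, float]]) -> dict:
    """Assumed request body: multiple loras, each with a scaling factor."""
    return {"loras": [{"name": name, "scaling": scale} for name, scale in loras]}

# Example request against a locally hosted instance (URL is illustrative):
endpoints = lora_endpoints("http://localhost:5000/v1")
payload = build_load_request([("style-lora", 0.8), ("task-lora", 1.0)])
```

Per the final commits in the PR, the load route consolidates its work into the rewritten load_loras function and reports per-lora successes and failures rather than failing wholesale.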
kingbri
b83e1b704e Requirements: Split for configurations
Add self-contained requirements for cuda 11.8 and ROCm

Signed-off-by: kingbri <bdashore3@proton.me>
2023-12-06 00:00:30 -05:00
DocShotgun
39f7a2aabd
Expose draft_rope_scale
2023-12-05 12:59:32 -08:00
DocShotgun
67507105d0
Update colab, expose additional args
* Exposed draft model args for speculative decoding
* Exposed int8 cache, dummy models, and no flash attention
* Resolved CUDA 11.8 dependency issue
2023-12-04 22:20:46 -08:00
veryamazinglystupid
ad1a12a0f2
Make the Colab better, fix libcudart errors
:3
2023-12-03 14:07:52 +05:30
DocShotgun
2a9e4ca051 Add Colab example
*Note: this uses wheels for Python 3.10 and torch 2.1.0+cu118, which is the current default in Colab
2023-12-03 02:21:51 -05:00