Update README
Signed-off-by: kingbri <bdashore3@proton.me>
parent 60eb076b43
commit c0525c042e
1 changed file with 21 additions and 3 deletions
README.md (24 changes)
@@ -6,6 +6,12 @@ A FastAPI based application that allows for generating text using an LLM (large language model)
+This API is still in the alpha phase. There may be bugs and changes down the line. Please be aware that you might need to reinstall dependencies.
+
+### Help Wanted
+
+Please check the issues page for issues that contributors can help on. We appreciate all contributions. Please read the contributions section for more details about issues and pull requests.
+
 If you want to add samplers, add them in the [exllamav2 library](https://github.com/turboderp/exllamav2) and then link them to tabbyAPI.
 
 ## Prerequisites
 
 To get started, make sure you have the following installed on your system:
@@ -30,15 +36,27 @@ NOTE: For Flash Attention 2 to work on Windows, CUDA 12.1 **must** be installed!
 
 4. Install torch using the instructions found [here](https://pytorch.org/get-started/locally/)
 
-5. Install an exllamav2 wheel from [here](https://github.com/turboderp/exllamav2/releases):
+5. Install exllamav2 (must be v0.0.8 or greater!)
 
-   1. Find the version that corresponds with your cuda and python version. For example, a wheel with `cu121` and `cp311` corresponds to CUDA 12.1 and python 3.11
+   1. From a [wheel/release](https://github.com/turboderp/exllamav2#method-2-install-from-release-with-prebuilt-extension) (Recommended)
+
+      1. Find the version that corresponds with your CUDA and Python version. For example, a wheel with `cu121` and `cp311` corresponds to CUDA 12.1 and Python 3.11
+
+   2. From [pip](https://github.com/turboderp/exllamav2#method-3-install-from-pypi): `pip install exllamav2`
+
+      1. This is a JIT compiled extension, which means that the initial launch of tabbyAPI will take some time. The build may also fail due to improper environment configuration.
+
+   3. From [source](https://github.com/turboderp/exllamav2#method-1-install-from-source)
 
 6. Install the other requirements via: `pip install -r requirements.txt`
 
 7. If you want the `/v1/chat/completions` endpoint to work with a list of messages, install fastchat by running `pip install fschat[model_worker]`
 
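The `cu`/`cp` naming rule from the wheel-install option above can be sanity-checked against the filename itself. The wheel name below is hypothetical and only illustrates the tag format; use the actual filename from the releases page:

```shell
# Hypothetical wheel filename -- the real name depends on the release you download.
WHEEL="exllamav2-0.0.8+cu121-cp311-cp311-linux_x86_64.whl"

# cu121 -> built against CUDA 12.1
echo "$WHEEL" | grep -o 'cu[0-9]*'

# cp311 -> built for CPython 3.11 (the filename carries two cp tags; take the first)
echo "$WHEEL" | grep -o 'cp[0-9]*' | head -n 1
```

Pick the wheel whose tags match the CUDA toolkit and Python interpreter actually on your system, or the prebuilt extension will fail to load.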
 ## Configuration
 
-Copy over `config_sample.yml` to `config.yml`. All the fields are commented, so make sure to read the descriptions and comment out or remove fields that you don't need.
+A config.yml file is required for overriding project defaults. If you are okay with the defaults, you don't need a config file!
+
+If you do want a config file, copy over `config_sample.yml` to `config.yml`. All the fields are commented, so make sure to read the descriptions and comment out or remove fields that you don't need.
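The optional config step can be sketched as below. The sample file is stubbed out here only so the snippet is self-contained; the real `config_sample.yml` ships with the repo and documents every field in comments:

```shell
# Stand-in for the config_sample.yml that ships with tabbyAPI
# (placeholder contents; the real file has many commented fields).
printf '# Sample tabbyAPI config\n' > config_sample.yml

# Only needed if you want to override project defaults:
cp config_sample.yml config.yml
```

After copying, edit `config.yml` and comment out or remove any fields you don't need.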
 
 ## Launching the Application