Update README

Signed-off-by: kingbri <bdashore3@proton.me>
This commit is contained in:
kingbri 2023-11-16 12:03:51 -05:00
parent 60eb076b43
commit c0525c042e

@@ -6,6 +6,12 @@ A FastAPI based application that allows for generating text using an LLM (large
This API is still in the alpha phase. There may be bugs and changes down the line. Please be aware that you may need to reinstall dependencies as the project evolves.
### Help Wanted
Please check the issues page for issues that contributors can help on. We appreciate all contributions. Please read the contributions section for more details about issues and pull requests.
If you want to add samplers, add them in the [exllamav2 library](https://github.com/turboderp/exllamav2) and then link them to tabbyAPI.
## Prerequisites
To get started, make sure you have the following installed on your system:
@@ -30,15 +36,27 @@ NOTE: For Flash Attention 2 to work on Windows, CUDA 12.1 **must** be installed!
4. Install torch using the instructions found [here](https://pytorch.org/get-started/locally/)
5. Install exllamav2 (must be v0.0.8 or greater!)
1. From a [wheel/release](https://github.com/turboderp/exllamav2#method-2-install-from-release-with-prebuilt-extension) (Recommended)
   1. Find the version that corresponds to your CUDA and Python versions. For example, a wheel with `cu121` and `cp311` corresponds to CUDA 12.1 and Python 3.11
2. From [pip](https://github.com/turboderp/exllamav2#method-3-install-from-pypi): `pip install exllamav2`
   1. This is a JIT-compiled extension, which means the initial launch of tabbyAPI will take some time. The build may also fail if your environment is not configured correctly.
3. From [source](https://github.com/turboderp/exllamav2#method-1-install-from-source)
6. Install the other requirements via: `pip install -r requirements.txt`
7. If you want the `/v1/chat/completions` endpoint to work with a list of messages, install fastchat by running `pip install fschat[model_worker]`
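When picking a prebuilt exllamav2 wheel, the `cpXY`/`cuXYZ` tags in the filename must match your environment. A minimal sketch for finding your Python tag (the CUDA tag comes from your installed toolkit, not from this snippet):

```shell
# Sketch: derive the "cpXY" tag for your Python so you can match it
# against the exllamav2 wheel filenames on the releases page.
py_tag="cp$(python3 -c 'import sys; print(f"{sys.version_info.major}{sys.version_info.minor}")')"
echo "Look for wheels containing: ${py_tag}"
# The CUDA tag (e.g. cu121 for CUDA 12.1) depends on your installed
# toolkit; check it with `nvcc --version` or `nvidia-smi`.
```

On a Python 3.11 install this prints `cp311`, which you would pair with the `cu` tag for your CUDA version when choosing a wheel.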
## Configuration
A `config.yml` file is only needed to override project defaults. If you are okay with the defaults, you don't need a config file!
If you do want a config file, copy over `config_sample.yml` to `config.yml`. All the fields are commented, so make sure to read the descriptions and comment out or remove fields that you don't need.
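The config workflow above can be sketched as a short shell snippet (assumes you are in the tabbyAPI repo root; skipping it entirely keeps the built-in defaults):

```shell
# Create config.yml from the sample only if you don't already have one.
# Without a config.yml, tabbyAPI falls back to its built-in defaults.
if [ ! -f config.yml ] && [ -f config_sample.yml ]; then
    cp config_sample.yml config.yml
fi
```

After copying, open `config.yml` and comment out or remove the fields you don't need, as described above.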
## Launching the Application