## Prerequisites

To get started, make sure you have the following installed on your system:

- [Python 3.x](https://www.python.org/downloads/release/python-31210/) (preferably 3.12) with pip
  - Do NOT install Python from the Microsoft Store! This will cause issues with pip.
  - Alternatively, you can use uv (preferred) or miniconda if it's present on your system.

> [!NOTE]
> Prefer a video guide? Watch the step-by-step tutorial on [YouTube](https://www.youtube.com/watch?v=03jYz0ijbUU)

> [!WARNING]
> CUDA and ROCm aren't prerequisites because torch can install them for you. However, if this doesn't work (ex. DLL load failed), install the CUDA toolkit or ROCm on your system.
>
> - [CUDA 12.x](https://developer.nvidia.com/cuda-downloads)
> - [ROCm 6.1](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.1.0/how-to/prerequisites.html)

> [!WARNING]
> On Windows, you may sometimes see an error saying that VS Build Tools needs to be installed. This means a package doesn't ship a prebuilt wheel for your Python version.
> You can install [VS Build Tools 17.8](https://aka.ms/vs/17/release.ltsc.17.8/vs_buildtools.exe) and build the wheel locally. In addition, open an issue stating that a dependency is building a wheel.

## Installing

### For Beginners

1. Clone this repository to your machine: `git clone https://github.com/theroyallab/tabbyAPI`
2. Navigate to the project directory: `cd tabbyAPI`
3. Run the appropriate start script (`start.bat` for Windows and `start.sh` for Linux).
   1. Follow the on-screen instructions and select the correct GPU library.
   2. Assuming that the prerequisites are installed and can be located, a virtual environment will be created for you and dependencies will be installed.
4. The API should start with no model loaded. Read on to see how to download a model.

### For Advanced Users

1. Follow steps 1-2 in the [For Beginners](#for-beginners) section
2. Create a python environment through venv:
   1. `python -m venv venv`
   2. Activate the venv
      1. On Windows: `.\venv\Scripts\activate`
      2. On Linux: `source venv/bin/activate`
3. Install the pyproject features based on your system:
   1. CUDA 12.x: `pip install -U .[cu12]`
   2. ROCm 6.1: `pip install -U .[amd]`
4. Start the API by either:
   1. Running `start.bat`/`start.sh`. The script will check if you're in a conda environment and skip venv checks.
   2. Running `python main.py`. This won't automatically upgrade your dependencies.

## Download a Model

TabbyAPI includes a built-in Hugging Face downloader that works via both the API and the terminal.

You can use the following command to download a repository with a specific branch revision:

`.\start.bat download <repo_id> --revision <branch>`

Example with Turboderp's Qwen2.5-VL 7B Instruct quants:

`.\start.bat download turboderp/Qwen2.5-VL-7B-Instruct-exl2 --revision 4.0bpw`

If a model is gated, you can provide a HuggingFace access token (most exl2 quants aren't private):

`.\start.bat download meta-llama/Llama-3.1-8B --token <access_token>`

Alternatively, running `main.py` directly can also trigger the downloader.

For additional options, run `.\start.bat download --help`

## Configuration

Launching the API with nothing but the defaults may not fit your use case. Therefore, a `config.yml` exists to tune initial launch parameters and other configuration options.

A `config.yml` file is only required for overriding project defaults. **If you are okay with the defaults, you don't need a config file!**

If you do want a config file, copy `config_sample.yml` to `config.yml`. All the fields are commented, so make sure to read the descriptions and comment out or remove fields that you don't need.
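For instance, a minimal `config.yml` that only overrides the network binding and the startup model could look like the sketch below. The key names follow `config_sample.yml` at the time of writing, and the values (including the model folder name) are purely illustrative:

```yml
# Illustrative config.yml -- see config_sample.yml for every option and its description
network:
  host: 127.0.0.1  # Bind address for the API server
  port: 5000       # Default port

model:
  model_dir: models          # Directory that holds your downloaded models
  model_name: my-model-exl2  # Folder name inside model_dir to load at startup
```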
> [!WARNING]
> Because frontends don't send sampler settings with every request, TabbyAPI sets a safe-defaults sampler override in `config_sample.yml`. If you are testing metrics or experimenting, remove `safe_defaults` from the `override_preset` key!

In addition, if you want to set the API keys manually, copy `api_keys_sample.yml` to `api_keys.yml` and fill in the fields. However, doing this is less secure, and autogenerated keys should be used instead.

You can also access the configuration parameters under [2. Configuration](https://github.com/theroyallab/tabbyAPI/wiki/2.-Configuration) in this wiki!

## Where next?

1. Take a look at the [usage docs](https://github.com/theroyallab/tabbyAPI/wiki/03.-Usage)
2. Get started with [community projects](https://github.com/theroyallab/tabbyAPI/wiki/09.-Community-Projects): Find loaders, UIs, and more created by the wider AI community. Any OAI-compatible client is also supported.

## Updating

There are a couple of ways to update TabbyAPI:

1. **Update scripts** - Inside the `update_scripts` folder, you can run the following scripts:
   1. `update_deps`: Updates dependencies to their latest versions.
   2. `update_deps_and_pull`: Updates dependencies and pulls the latest commit of the GitHub repository.

   These scripts exit after running their respective tasks. To start TabbyAPI, run `start.bat` or `start.sh`.
2. **Manual** - Install the pyproject features and update dependencies depending on your GPU:
   1. `pip install -U .[cu12]` = CUDA 12.x
   2. `pip install -U .[amd]` = ROCm 6.1

If you don't want to update dependencies that come from wheels (torch, exllamav2, and flash attention 2), use `pip install .` or pass the `--nowheel` flag when invoking the start scripts.

### Update ExLlamaV2

> [!WARNING]
> These instructions are meant for advanced users.

> [!IMPORTANT]
> If you're installing a custom ExLlamaV2 wheel, make sure to use `pip install .` when updating! Otherwise, each update will overwrite your custom ExLlamaV2 version.

NOTE:

- TabbyAPI enforces the latest ExLlamaV2 version for compatibility purposes.
- Any upgrade using a pyproject GPU lib feature will overwrite your installed wheel.
- To fix this, change the feature in `pyproject.toml` locally, create an issue or PR, or install your version of ExLlamaV2 after upgrades (see the sketch below).

Here are ways to install ExLlamaV2:

1. From a [wheel/release](https://github.com/turboderp/exllamav2#method-2-install-from-release-with-prebuilt-extension) (recommended)
   1. Find the version that corresponds to your CUDA and Python versions. For example, a wheel with `cu121` and `cp311` corresponds to CUDA 12.1 and Python 3.11.
2. From [pip](https://github.com/turboderp/exllamav2#method-3-install-from-pypi): `pip install exllamav2`
   1. This is a JIT-compiled extension, which means that the initial launch of TabbyAPI will take some time. The build may also fail due to improper environment configuration.
3. From [source](https://github.com/turboderp/exllamav2#method-1-install-from-source)
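As a concrete sketch of that workflow, the commands below update TabbyAPI without touching wheel-based dependencies and then reinstall a custom ExLlamaV2 build. The wheel path is a placeholder; substitute a release asset or local build that matches your CUDA and Python versions:

```sh
# Update TabbyAPI without overwriting wheel-based dependencies
# (torch, exllamav2, and flash attention 2)
pip install .

# Then put your custom ExLlamaV2 build back in place.
# The filename below is a placeholder, not a real release asset.
pip install /path/to/exllamav2-X.Y.Z+cu121-cp311-cp311-linux_x86_64.whl
```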
## Other installation methods

These are short-form instructions for other methods that users can use to install TabbyAPI.

> [!WARNING]
> Using methods other than venv may not play nice with startup scripts. Using these methods indicates that you're an advanced user and know what you're doing.

### Uv

> [!NOTE]
> Uv is the preferred way to handle python. It's recommended to use it over conda and its derivatives.

1. Install [Uv](https://docs.astral.sh/uv/getting-started/installation/#installation-methods) from astral's website
2. Continue installation steps from:
   1. [For Beginners](#for-beginners) - Step 3. The start scripts detect if uv is installed and run the appropriate commands.
   2. [For Advanced Users](#for-advanced-users) - Step 3

### Conda

1. Install [Miniconda3](https://docs.conda.io/projects/miniconda/en/latest/miniconda-other-installer-links.html) with Python 3.11 as your base Python
2. Create a new conda environment: `conda create -n tabbyAPI python=3.11`
3. Activate the conda environment: `conda activate tabbyAPI`
4. Install optional dependencies if they aren't present:
   1. CUDA 12 via `conda install -c "nvidia/label/cuda-12.4.1" cuda`
   2. Git via `conda install -k git`
5. Clone TabbyAPI via `git clone https://github.com/theroyallab/tabbyAPI`
6. Continue installation steps from:
   1. [For Beginners](#for-beginners) - Step 3. The start scripts detect if you're in a conda environment and skip the venv check.
   2. [For Advanced Users](#for-advanced-users) - Step 3

### Docker

> [!NOTE]
> If you are planning to use custom versions of dependencies such as dev ExLlamaV2, make sure to build the Docker image yourself!

1. Install Docker and Docker Compose from the [docs](https://docs.docker.com/compose/install/)
2. Install the Nvidia container compatibility layer:
   1. For Linux: [Nvidia container toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
   2. For Windows: [CUDA Toolkit on WSL](https://docs.nvidia.com/cuda/wsl-user-guide/index.html)
3. Clone TabbyAPI via `git clone https://github.com/theroyallab/tabbyAPI`
4. Enter the tabbyAPI directory by `cd tabbyAPI`.
   1. Optional: Set up a `config.yml` or `api_tokens.yml` ([configuration](#configuration))
5. Update the volume mount section in the `docker/docker-compose.yml` file:

   ```yml
   volumes:
     # - /path/to/models:/app/models                       # Change me
     # - /path/to/config.yml:/app/config.yml               # Change me
     # - /path/to/api_tokens.yml:/app/api_tokens.yml       # Change me
   ```

6. Optional: If you'd like to build the Dockerfile from source, follow the instructions below in `docker/docker-compose.yml`:

   ```yml
   # Uncomment this to build a docker image from source
   #build:
   #  context: ..
   #  dockerfile: ./docker/Dockerfile

   # Comment this to build a docker image from source
   image: ghcr.io/theroyallab/tabbyapi:latest
   ```

7. Run `docker compose -f docker/docker-compose.yml up` to build the Dockerfile and start the server.
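However you installed TabbyAPI, a quick way to confirm the server is reachable is an OpenAI-style request against the models endpoint. The port and `x-api-key` header below assume default settings; swap in the key TabbyAPI autogenerated (or the one from your `api_keys.yml`):

```sh
# Sanity check: list models on the default port (5000).
# Replace <your_api_key> with a real key if authentication is enabled.
curl http://localhost:5000/v1/models \
  -H "x-api-key: <your_api_key>"
```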