tabbyAPI-ollama/docs/01.-Getting-Started.md
kingbri 4036c70d75 Tree: Format
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-08-19 22:59:26 -04:00

192 lines
9.6 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

## Prerequisites
To get started, make sure you have the following installed on your system:
- [Python 3.x](https://www.python.org/downloads/release/python-31210/) (preferably 3.12) with pip
- Do NOT install python from the Microsoft store! This will cause issues with pip.
- Alternatively, you can use uv (preferred) or miniconda if it's present on your system.
> [!NOTE]
> Prefer a video guide? Watch the step-by-step tutorial on [YouTube](https://www.youtube.com/watch?v=03jYz0ijbUU)
> [!WARNING]
> CUDA and ROCm aren't prerequisites because torch can install them for you. However, if this doesn't work (ex. DLL load failed), install the CUDA toolkit or ROCm on your system.
>
> - [CUDA 12.x](https://developer.nvidia.com/cuda-downloads)
>
> - [ROCm 6.1](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.1.0/how-to/prerequisites.html)
>
> [!WARNING]
> Sometimes there may be an error with Windows that VS build tools needs to be installed. This means that there's a package that isn't supported for your python version.
> You can install [VS build tools 17.8](https://aka.ms/vs/17/release.ltsc.17.8/vs_buildtools.exe) and build the wheel locally. In addition, open an issue stating that a dependency is building a wheel.
## Installing
### For Beginners
1. Clone this repository to your machine: `git clone https://github.com/theroyallab/tabbyAPI`
2. Navigate to the project directory: `cd tabbyAPI`
3. Run the appropriate start script (`start.bat` for Windows and `start.sh` for linux).
1. Follow the on-screen instructions and select the correct GPU library.
2. Assuming that the prerequisites are installed and can be located, a virtual environment will be created for you and dependencies will be installed.
4. The API should start with no model loaded. Please read more to see how to download a model.
### For Advanced Users
1. Follow steps 1-2 in the [For Beginners](#for-beginners) section
2. Create a python environment through venv:
1. `python -m venv venv`
2. Activate the venv
1. On Windows: `.\venv\Scripts\activate`
2. On Linux: `source venv/bin/activate`
3. Install the pyproject features based on your system:
1. Cuda 12.x: `pip install -U .[cu12]`
2. ROCm 5.6: `pip install -U .[amd]`
4. Start the API by either
1. Run `start.bat/sh`. The script will check if you're in a conda environment and skip venv checks.
2. Run `python main.py` to start the API. This won't automatically upgrade your dependencies.
## Download a Model
TabbyAPI includes a built-in Hugging Face downloader that works via both the API and terminal. You can use the following command to download a repository with a specific branch revision:
`.\Start.bat download <repo name> --revision <branch>`
Example with Turboderp's Llama 3.1 8B quants:
`.\Start.bat download turboderp/Qwen2.5-VL-7B-Instruct-exl2 --revision 4.0bpw`
If a model is gated, you can provide a HuggingFace access token (most exl2 quants aren't private):
`.\Start.bat download meta-llama/Llama-3.1-8B --token <token>`
Alternatively, running `main.py` directly can also trigger the downloader. For additional options, run `.\Start.bat download --help`
## Configuration
Loading solely the API may not be your optimal usecase. Therefore, a config.yml exists to tune initial launch parameters and other configuration options.
A config.yml file is required for overriding project defaults. **If you are okay with the defaults, you don't need a config file!**
If you do want a config file, copy over `config_sample.yml` to `config.yml`. All the fields are commented, so make sure to read the descriptions and comment out or remove fields that you don't need.
> [!WARNING]
> Due to frontends not sending sampler settings per request, tabbyAPI sets a safe defaults sampler override in config_sample.yml. If you are testing metrics or experimenting, please remove `safe_defaults` from the `override_preset` key!
In addition, if you want to manually set the API keys, copy over `api_keys_sample.yml` to `api_keys.yml` and fill in the fields. However, doing this is less secure and autogenerated keys should be used instead.
You can also access the configuration parameters under [2. Configuration](https://github.com/theroyallab/tabbyAPI/wiki/2.-Configuration) in this wiki!
## Where next?
1. Take a look at the [usage docs](https://github.com/theroyallab/tabbyAPI/wiki/03.-Usage)
2. Get started with [community projects](https://github.com/theroyallab/tabbyAPI/wiki/09.-Community-Projects): Find loaders, UIs, and more created by the wider AI community. Any OAI compatible client is also supported.
## Updating
There are a couple ways to update TabbyAPI:
1. **Update scripts** - Inside the update_scripts folder, you can run the following scripts:
1. `update_deps`: Updates dependencies to their latest versions.
2. `update_deps_and_pull`: Updates dependencies and pulls the latest commit of the Github repository.
These scripts exit after running their respective tasks. To start TabbyAPI, run `start.bat` or `start.sh`.
2. **Manual** - Install the pyproject features and update dependencies depending on your GPU:
1. `pip install -U .[cu12]` = CUDA 12.x
2. `pip install -U .[amd]` = ROCm 6.0
If you don't want to update dependencies that come from wheels (torch, exllamav2, and flash attention 2), use `pip install .` or pass the `--nowheel` flag when invoking the start scripts.
### Update Exllamav2
> [!WARNING]
> These instructions are meant for advanced users.
> [!IMPORTANT]
> If you're installing a custom Exllamav2 wheel, make sure to use `pip install .` when updating! Otherwise, each update will overwrite your custom exllamav2 version.
NOTE:
- TabbyAPI enforces the latest Exllamav2 version for compatibility purposes.
- Any upgrades using a pyproject gpu lib feature will result in overwriting your installed wheel.
- To fix this, change the feature in `pyproject.toml` locally, create an issue or PR, or install your version of exllamav2 after upgrades.
Here are ways to install exllamav2:
1. From a [wheel/release](https://github.com/turboderp/exllamav2#method-2-install-from-release-with-prebuilt-extension) (Recommended)
1. Find the version that corresponds with your cuda and python version. For example, a wheel with `cu121` and `cp311` corresponds to CUDA 12.1 and python 3.11
2. From [pip](https://github.com/turboderp/exllamav2#method-3-install-from-pypi): `pip install exllamav2`
2. This is a JIT compiled extension, which means that the initial launch of tabbyAPI will take some time. The build may also not work due to improper environment configuration.
3. From [source](https://github.com/turboderp/exllamav2#method-1-install-from-source)
## Other installation methods
These are short-form instructions for other methods that users can use to install TabbyAPI.
> [!WARNING]
> Using methods other than venv may not play nice with startup scripts. Using these methods indicates that you're an advanced user and know what you're doing.
### Uv
> [!NOTE]
> Uv is the preferred way to handle python. It's recommended to use it over conda and its derivatives.
1. Install [Uv](https://docs.astral.sh/uv/getting-started/installation/#installation-methods) from astral's website
2. Continue installation steps from:
1. [For Beginners](#for-beginners) - Step 3. The start scripts detect if uv is installed and runs the appropriate commands.
2. [For Advanced Users](#For-advanced-users) - Step 3
### Conda
1. Install [Miniconda3](https://docs.conda.io/projects/miniconda/en/latest/miniconda-other-installer-links.html) with python 3.11 as your base python
2. Create a new conda environment `conda create -n tabbyAPI python=3.11`
3. Activate the conda environment `conda activate tabbyAPI`
4. Install optional dependencies if they aren't present
1. CUDA via
1. CUDA 12 - `conda install -c "nvidia/label/cuda-12.4.1" cuda`
2. Git via `conda install -k git`
5. Clone TabbyAPI via `git clone https://github.com/theroyallab/tabbyAPI`
6. Continue installation steps from:
1. [For Beginners](#for-beginners) - Step 3. The start scripts detect if you're in a conda environment and skips the venv check.
2. [For Advanced Users](#For-advanced-users) - Step 3
### Docker
> [!NOTE]
> If you are planning to use custom versions of dependencies such as dev ExllamaV2, make sure to build the Docker image yourself!
1. Install Docker and docker compose from the [docs](https://docs.docker.com/compose/install/
2. Install the Nvidia container compatibility layer
1. For Linux: [Nvidia container toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
2. For Windows: [Cuda Toolkit on WSL](https://docs.nvidia.com/cuda/wsl-user-guide/index.html)
3. Clone TabbyAPI via `git clone https://github.com/theroyallab/tabbyAPI`
4. Enter the tabbyAPI directory by `cd tabbyAPI`.
1. Optional: Set up a config.yml or api_tokens.yml ([configuration](#configuration))
5. Update the volume mount section in the `docker/docker-compose.yml` file
```yml
volumes:
  # - /path/to/models:/app/models                       # Change me
  # - /path/to/config.yml:/app/config.yml               # Change me
  # - /path/to/api_tokens.yml:/app/api_tokens.yml       # Change me
```
6. Optional: If you'd like to build the dockerfile from source, follow the instructions below in `docker/docker-compose.yml`:
```yml
    # Uncomment this to build a docker image from source
    #build:
    #  context: ..
    #  dockerfile: ./docker/Dockerfile
    # Comment this to build a docker image from source
    image: ghcr.io/theroyallab/tabbyapi:latest
```
7. Run `docker compose -f docker/docker-compose.yml up` to build the dockerfile and start the server.