# tabbyAPI
tabbyAPI is a FastAPI-based application that provides an API for generating text with a language model. This README explains how to launch and use tabbyAPI.
## Prerequisites
Before you get started, ensure you have the following prerequisites installed on your system:
- Python 3.x (with pip)
- The dependencies listed in `requirements.txt`
## Installation
- Clone the repository to your local machine:

  ```bash
  git clone https://github.com/Splice86/tabbyAPI.git
  ```

- Navigate to the project directory:

  ```bash
  cd tabbyAPI
  ```

- Create a virtual environment (optional but recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate
  ```

- Install the project dependencies using pip:

  ```bash
  pip install -r requirements.txt
  ```

- Install exllamav2 into your venv:

  ```bash
  git clone https://github.com/turboderp/exllamav2.git
  cd exllamav2
  pip install -r requirements.txt
  python setup.py install
  ```
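A quick way to confirm the install worked is to import the package from the same venv (a simple smoke test, nothing tabbyAPI-specific):

```bash
# Exits non-zero if exllamav2 failed to install into the active venv
python -c "import exllamav2"
```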
## Launch the tabbyAPI Application
To start the tabbyAPI application, follow these steps:
- Ensure you are in the project directory and that the virtual environment is activated (if you created one).

- Run the tabbyAPI application:

  ```bash
  python main.py
  ```

- The tabbyAPI application should now be running. You can access it by opening a web browser and navigating to `http://localhost:8000` (if running locally).
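Once the server is up, a quick smoke test is to query the model endpoint described under Usage below. This sketch assumes the same bearer-token auth as the completion example and that `/v1/model` accepts a plain GET; adjust the host and port to match your config (the example request further down uses port 8012):

```bash
# Replace <your-api-key> with the key from your config.
# Should return information about the currently loaded model.
curl -H "Authorization: Bearer <your-api-key>" http://localhost:8000/v1/model
```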
## Usage
The tabbyAPI application provides the following endpoints:

- `/v1/model`: Retrieves information about the currently loaded model.
- `/v1/model/load`: Loads a new model based on the provided data and model configuration (a hypothetical request sketch follows this list).
- `/v1/model/unload`: Unloads the currently loaded model from the system.
- `/v1/completions`: Generates text based on the provided input data.
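The exact request body for loading a model is defined in `model.py`; the `"name"` field below is a guess for illustration only, so check the source for the real schema. A hypothetical load request might look like:

```bash
# Hypothetical sketch: the "name" field is assumed, not confirmed; see model.py for the real schema
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{"name": "airoboros-mistral2.2-7b-exl2"}' \
  http://127.0.0.1:8012/v1/model/load
```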
### Example Request (using curl)
```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 2261702e8a220c6c4671a264cd1236ce" \
  -d '{
    "model": "airoboros-mistral2.2-7b-exl2",
    "prompt": ["A tabby", "is"],
    "stream": true,
    "top_p": 0.73,
    "stop": "[",
    "max_tokens": 360,
    "temperature": 0.8,
    "mirostat_mode": 2,
    "mirostat_tau": 5,
    "mirostat_eta": 0.1
  }' \
  http://127.0.0.1:8012/v1/completions
```
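Since this request sets `"stream": true`, tokens arrive incrementally rather than in one response. When testing streaming from a terminal, curl's `-N`/`--no-buffer` flag prints chunks as they arrive (a sketch; the exact chunk format depends on the server's streaming implementation):

```bash
# -N disables curl's output buffering so streamed chunks print immediately
curl -N -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{"model": "airoboros-mistral2.2-7b-exl2", "prompt": ["A tabby", "is"], "stream": true, "max_tokens": 60}' \
  http://127.0.0.1:8012/v1/completions
```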
## Parameter Guide
**Note:** This guide still needs to be expanded and updated.
{ "model": "airoboros-mistral2.2-7b-exl2", "prompt": ["A tabby","is"], "stream": true, "top_p": 0.73, "stop": "[", "max_tokens": 360, "temperature": 0.8, "mirostat_mode": 2, "mirostat_tau": 5, "mirostat_eta": 0.1 }
Model: "airoboros-mistral2.2-7b-exl2" This specifies the specific language model being used. It's essential for the API to know which model to employ for generating responses.
Prompt: ["Hello there! My name is", "Brian", "and I am", "an AI"] The prompt QUESTION why is it a list of strings instead of a single string? Stream: true Whether the response should be streamed back or not.
Top_p: 0.73 cumulative probability threshold
Stop: "[" The stop parameter defines a string that stops the generation.
Max_tokens: 360 This parameter determines the maximum number of tokens.
Temperature: 0.8 Temperature controls the randomness of the generated text.
Mirostat_mode: 2 ? Mirostat_tau: 5 ? Mirostat_eta: 0.1 ?
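To illustrate how these interact, a more deterministic variant of the example request lowers `temperature` and raises `top_p`. This uses the same request shape as the example above; the values are illustrative only, not recommendations:

```json
{
  "model": "airoboros-mistral2.2-7b-exl2",
  "prompt": ["A tabby", "is"],
  "stream": false,
  "temperature": 0.2,
  "top_p": 0.95,
  "max_tokens": 360
}
```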