diff --git a/README.md b/README.md
index 63caef6..2d8c386 100644
--- a/README.md
+++ b/README.md
@@ -1,127 +1,5 @@
-# TabbyAPI
+# Tabby API Fork
-
-Python 3.10, 3.11, and 3.12
-
-License: AGPL v3
-
-Discord Server
-
+Added extra endpoints to enable Open WebUI / LibreChat
-
-
-Developer facing API documentation
-
-
-
-
-Support on Ko-Fi
-
-
-> [!IMPORTANT]
->
-> In addition to the README, please read the [Wiki](https://github.com/theroyallab/tabbyAPI/wiki/1.-Getting-Started) page for information about getting started!
-
-> [!NOTE]
->
-> Need help? Join the [Discord Server](https://discord.gg/sYQxnuD7Fj) and get the `Tabby` role. Please be nice when asking questions.
-
-> [!NOTE]
->
-> Want to run GGUF models? Take a look at [YALS](https://github.com/theroyallab/YALS), TabbyAPI's sister project.
-
-A FastAPI based application that allows for generating text using an LLM (large language model) using the [Exllamav2](https://github.com/turboderp-org/exllamav2) and [Exllamav3](https://github.com/turboderp-org/exllamav3) backends.
-
-TabbyAPI is also the official API backend server for ExllamaV2 and V3.
-
-## Disclaimer
-
-This project is marked as rolling release. There may be bugs and changes down the line. Please be aware that you might need to reinstall dependencies if needed.
-
-TabbyAPI is a hobby project made for a small amount of users. It is not meant to run on production servers. For that, please look at other solutions that support those workloads.
-
-## Getting Started
-
-> [!IMPORTANT]
->
-> Looking for more information? Check out the Wiki.
-
-For a step-by-step guide, choose the format that works best for you:
-
-📖 Read the [Wiki](https://github.com/theroyallab/tabbyAPI/wiki/01.-Getting-Started) – Covers installation, configuration, API usage, and more.
-
-🎥 Watch the [Video Guide](https://www.youtube.com/watch?v=03jYz0ijbUU) – A hands-on walkthrough to get you up and running quickly.
-
-## Features
-
-- OpenAI compatible API
-- Loading/unloading models
-- HuggingFace model downloading
-- Embedding model support
-- JSON schema + Regex + EBNF support
-- AI Horde support
-- Speculative decoding via draft models
-- Multi-lora with independent scaling (ex. a weight of 0.9)
-- Inbuilt proxy to override client request parameters/samplers
-- Flexible Jinja2 template engine for chat completions that conforms to HuggingFace
-- Concurrent inference with asyncio
-- Utilizes modern python paradigms
-- Continuous batching engine using paged attention
-- Fast classifier-free guidance
-- OAI style tool/function calling
-
-And much more. If something is missing here, PR it in!
-
-## Supported Model Types
-
-TabbyAPI uses Exllama as a powerful and fast backend for model inference, loading, etc. Therefore, the following types of models are supported:
-
-- Exl2 (Highly recommended)
-
-- Exl3 (Highly recommended)
-
-- GPTQ
-
-- FP16 (using Exllamav2's loader)
-
-In addition, TabbyAPI supports parallel batching using paged attention for Nvidia Ampere GPUs and higher.
-
-## Contributing
-
-Use the template when creating issues or pull requests, otherwise the developers may not look at your post.
-
-If you have issues with the project:
-
-- Describe the issue in detail
-
-- If you have a feature request, please indicate it as such.
-
-If you have a Pull Request:
-
-- Describe the pull request in detail: what you are changing, and why
-
-## Acknowledgements
-
-TabbyAPI would not exist without the work of other contributors and FOSS projects:
-
-- [ExllamaV2](https://github.com/turboderp-org/exllamav2)
-- [ExllamaV3](https://github.com/turboderp-org/exllamav3)
-- [Aphrodite Engine](https://github.com/PygmalionAI/Aphrodite-engine)
-- [infinity-emb](https://github.com/michaelfeil/infinity)
-- [FastAPI](https://github.com/fastapi/fastapi)
-- [Text Generation WebUI](https://github.com/oobabooga/text-generation-webui)
-- [SillyTavern](https://github.com/SillyTavern/SillyTavern)
-
-## Developers and Permissions
-
-Creators/Developers:
-
-- [kingbri](https://github.com/bdashore3)
-
-- [Splice86](https://github.com/Splice86)
-
-- [Turboderp](https://github.com/turboderp)
+You can select the initial model, but due to Tabby/Exllama, switching models won't free up the memory.
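For context on the fork's purpose: frontends like Open WebUI and LibreChat speak the OpenAI chat API, which TabbyAPI exposes. A minimal sketch of the request body such a frontend POSTs to a chat completions endpoint (e.g. `/v1/chat/completions`); the exact path and the model name below are assumptions about your deployment, not part of this diff:

```python
import json

def build_chat_request(model: str, user_message: str, stream: bool = False) -> dict:
    """Assemble an OpenAI-style chat completion request payload."""
    return {
        "model": model,  # which loaded model to use (placeholder name below)
        "messages": [
            {"role": "user", "content": user_message},
        ],
        "stream": stream,  # frontends usually stream tokens back
    }

# Hypothetical model name for illustration only.
payload = build_chat_request("my-exl2-model", "Hello!")
print(json.dumps(payload, indent=2))
```

Because switching models does not free memory in this fork, a client would keep sending the same `model` value for the lifetime of the server process.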