Commit graph

1036 commits

Author SHA1 Message Date
kingbri
282b5b2931 API: Fix responses and some params
Responses were not being properly sent as JSON. Only run pydantic's
JSON function on stream responses; FastAPI serializes static responses
on its own.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 17:11:55 -05:00
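Here is a minimal sketch of that split. It assumes server-sent-event style streaming: each streamed chunk is serialized to a JSON string by hand, and a static response is returned as a plain object for the framework to serialize. A dataclass plus json.dumps stands in for pydantic's JSON method. CompletionChunk, stream_chunks, and static_response are hypothetical names.

```python
import json
from dataclasses import dataclass, asdict
from typing import Iterator


@dataclass
class CompletionChunk:
    # Hypothetical response model; the project itself uses pydantic models.
    id: str
    text: str


def stream_chunks(texts: list[str]) -> Iterator[str]:
    """Streaming: serialize every chunk to JSON ourselves (SSE framing assumed)."""
    for i, text in enumerate(texts):
        payload = json.dumps(asdict(CompletionChunk(id=f"chunk-{i}", text=text)))
        yield f"data: {payload}\n\n"


def static_response(text: str) -> CompletionChunk:
    """Static: return the model object and let the web framework serialize it."""
    return CompletionChunk(id="full", text=text)


if __name__ == "__main__":
    for line in stream_chunks(["Hello", " world"]):
        print(line, end="")
    print(static_response("Hello world"))
```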
kingbri
d8d61fa19b API: Add fallback if model isn't loaded
Most endpoints require the model to be loaded, so add a dependency
(FastAPI Depends) that checks that a model is loaded.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 12:20:35 -05:00
kingbri
c0525c042e Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 12:06:37 -05:00
kingbri
60eb076b43 Tree: Basic formatting and comments
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 11:48:40 -05:00
kingbri
5defb1b0b4 Config: Fix errors when stuff doesn't exist
Add safe fallbacks if any part of the config tree doesn't exist. This
prevents random internal server errors from showing up.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 11:41:03 -05:00
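One way to get safe fallbacks is sketched below. It assumes the parsed config is a nested dict, as a YAML loader would produce, and it may be None for an empty file. The lookup walks the key path and returns a default as soon as any level is missing or is not a mapping. get_config_value and the example keys are hypothetical.

```python
from typing import Any


def get_config_value(config: Any, path: list[str], default: Any = None) -> Any:
    """Walk a nested dict along `path`, returning `default` if any level is missing."""
    node = config
    for key in path:
        # A missing key, an empty section (None), or a scalar all end the walk safely.
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical config tree; the "network" section is deliberately absent.
config = {"model": {"model_dir": "models"}}
```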
kingbri
03f45cb0a3 Tree: Update documentation and configs
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 02:30:33 -05:00
kingbri
2248705c4a Requirements: Don't force fastchat installation
Fastchat pulls in many heavy dependencies, such as transformers, peft,
and accelerate. These are not useful unless a user wants to add a shim
for the chat completion endpoint.

Instead, try importing fastchat and print the error to the console if
the import fails.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 01:26:46 -05:00
kingbri
5e8419ec0c OAI: Add chat completions endpoint
Chat completions is the endpoint OAI will use going forward, so it
makes sense to support it even though the completions endpoint will
be used more often.

Also unify common parameters between the chat completion and completion
requests since they're very similar.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-16 01:06:07 -05:00
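Unifying the shared parameters amounts to a common base model that both request types extend. The project uses pydantic. This sketch substitutes standard-library dataclasses so it runs without dependencies. The class names, field names, and defaults are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CommonCompletionRequest:
    # Sampling parameters shared by both endpoints (illustrative subset).
    max_tokens: int = 150
    temperature: float = 1.0
    top_p: float = 1.0
    stream: bool = False
    stop: Optional[list[str]] = None


@dataclass
class CompletionRequest(CommonCompletionRequest):
    # The completions endpoint takes a raw prompt string.
    prompt: str = ""


@dataclass
class ChatCompletionRequest(CommonCompletionRequest):
    # The chat completions endpoint takes a list of role/content messages.
    messages: list[dict[str, str]] = field(default_factory=list)
```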
kingbri
593471a04d Auth: Fix init from YAML dict
A class can't have multiple constructors: in Python, a second __init__
definition replaces the first.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 23:00:12 -05:00
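A common Python workaround is a classmethod that acts as an alternative constructor. The sketch below assumes the YAML file has already been parsed into a dict. AuthKeys, from_dict, api_key, and admin_key are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AuthKeys:
    api_key: str
    admin_key: str

    @classmethod
    def from_dict(cls, data: dict) -> "AuthKeys":
        """Alternative constructor for a dict loaded from YAML."""
        return cls(api_key=data["api_key"], admin_key=data["admin_key"])
```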
kingbri
1f444c8fb7 Requirements: Add fastchat and override pydantic
Use an older version of pydantic to stay compatible with fastchat.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 01:00:08 -05:00
kingbri
bbb59d0747 Auth: Fix methods for writing and validation
These were not working properly. Write the YAML file correctly and
make the validator return a 401 when the bearer token is invalid.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
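Here is a minimal, framework-free sketch of the validation rule. It parses the Authorization header, compares the token in constant time, and maps the result to an HTTP status code. The function name is hypothetical. The commit only specifies a 401 for an invalid token, so returning 401 for a missing or non-bearer header is an assumption.

```python
import secrets
from typing import Optional


def check_bearer(header: Optional[str], expected_token: str) -> int:
    """Return an HTTP status: 200 if the bearer token matches, 401 otherwise."""
    if not header:
        return 401  # assumption: a missing header is treated like an invalid token
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return 401
    # compare_digest avoids leaking how many leading characters matched.
    if not secrets.compare_digest(token.strip(), expected_token):
        return 401
    return 200
```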
kingbri
cb8da7f092 Chore: Remove mistakenly committed file
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
d0b6b11068 OAI: Make freq and presence pen floats
Also rename the completions typing file.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
126afdfdc2 Model: Fix gpu split params
GPU split auto is a bool, and GPU split is an array of integers
specifying how many GB to allocate on each GPU.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
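The corrected parameter shapes might be modeled as shown below. The field names gpu_split_auto and gpu_split follow the commit's wording. The validation and the rule that automatic splitting takes precedence over a manual split are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GpuSplitConfig:
    # True: let the loader spread the model across GPUs automatically.
    gpu_split_auto: bool = False
    # Manual split: GB to allocate on each GPU, in device order.
    gpu_split: list[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not isinstance(self.gpu_split_auto, bool):
            raise TypeError("gpu_split_auto must be a bool")
        if any(not isinstance(gb, int) or gb < 0 for gb in self.gpu_split):
            raise ValueError("gpu_split must contain non-negative integers (GB per GPU)")

    def manual_split(self) -> Optional[list[int]]:
        """Per-GPU allocation, or None when auto split is on or no split is given (assumed precedence)."""
        if self.gpu_split_auto or not self.gpu_split:
            return None
        return list(self.gpu_split)
```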
kingbri
ea91d17a11 Api: Add ban_eos_token and add_bos_token support
Adds the ability for the client to specify whether to add the BOS
token and ban the EOS token.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
8fea5391a8 Api: Add token endpoints
Support for encoding and decoding with various parameters.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
kingbri
2d741653c3 Update .gitignore
Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-15 00:55:15 -05:00
Splice86
fc14046318 Updated readme 2023-11-14 21:17:03 -06:00
Splice86
4fd7da8fb6 Updated readme 2023-11-14 21:16:24 -06:00
Splice86
a0cf65e88f Updated readme 2023-11-14 21:13:36 -06:00
kingbri
4670a77c26 API: Don't use response_class
This arg in routes caused many errors and isn't even needed for
responses.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-14 22:09:26 -05:00
kingbri
b625bface9 OAI: Add API-based model loading/unloading and auth routes
Models can be loaded and unloaded via the API. Also add authentication
for regular API use and for administrator tasks.

The two types of authorization use separate keys.

Also fix the unload function to properly free all used VRAM.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-14 01:17:19 -05:00
kingbri
47343e2f1a OAI: Add models support
The models endpoint fetches all the models that OAI has to offer.
However, since this is an OAI clone, just list the models inside
the user's configured model directory instead.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-13 21:38:34 -05:00
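Listing the configured model directory could look like the sketch below. It assumes each model lives in its own subdirectory and that the response mimics OAI's list shape: an object of "list" with a data array of "model" entries. list_models and the owned_by value are hypothetical.

```python
from pathlib import Path


def list_models(model_dir: str) -> dict:
    """Return an OAI-style model list built from subdirectories of model_dir."""
    path = Path(model_dir)
    if not path.is_dir():
        # Missing directory: report an empty list rather than an error.
        return {"object": "list", "data": []}
    data = [
        {"id": entry.name, "object": "model", "owned_by": "local"}  # hypothetical owner value
        for entry in sorted(path.iterdir())
        if entry.is_dir()
    ]
    return {"object": "list", "data": data}
```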
kingbri
eee8b642bd OAI: Implement completion API endpoint
Add support for /v1/completions with the option to use streaming
if needed. Also rewrite API endpoints to use async when possible
since that improves request performance.

Model container parameter names also needed rewrites, and fallback
cases are now set to their disabled values.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-13 18:31:26 -05:00
turboderp
4fa4386275 Add new samplers 2023-11-12 08:12:08 +01:00
kingbri
a10c14d357 Config: Switch to YAML and add load progress
YAML is a more flexible format for configuration. Command-line
arguments are difficult to remember and configure, especially for
an API with complicated argument names. Rather than using half-baked
text files, implement a proper config solution.

Also add a progress bar when loading models from the command line.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-12 00:21:16 -05:00
kingbri
5d32aa02cd Tree: Update to use ModelContainer and args
Use command-line arguments to load an initial model if necessary.
API routes are currently broken, but the container should be the
primary interface with the exllamav2 library from now on.

These args should also be turned into a YAML configuration file in
the future.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-10 23:19:54 -05:00
turboderp
9d34479e3e Model container with generator logic, initial 2023-11-11 02:53:00 +01:00
turboderp
d2480bae28 Test 2023-11-10 23:57:41 +01:00
turboderp
5de2a4005f Test 2023-11-10 23:57:12 +01:00
kingbri
ef099cb55a Chore: Add gitignore and remove ignored files
Ignore any IDE-specific configurations and extra files that can break
git indexing.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-11-10 15:58:51 -05:00
Splice86
8e2671a265 Update to README and other minor changes 2023-11-10 01:37:24 -06:00
Splice86
ab84b01fdf Updated readme 2023-11-10 00:39:08 -06:00
david
ca992f483f Update README.md 2023-11-09 23:45:21 -06:00
david
f844b1ee91 Update README.md
Updated readme
2023-11-09 23:44:28 -06:00
david
b967e2e604 Initial 2023-11-09 21:27:45 -06:00