FastChat pulls in heavy dependencies such as transformers, peft, and
accelerate. They are only needed when a user wants a shim for the
chat completion endpoint.
Instead of requiring it, try importing fastchat and report any import
error to the console.
Signed-off-by: kingbri <bdashore3@proton.me>
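Note on the optional FastChat import above: a minimal sketch of the
pattern, not the code from this commit. The helper name try_import and
the flag chat_completions_available are hypothetical.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def try_import(module_name: str):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Report the problem instead of failing at startup.
        logger.warning("Could not import %s: %s", module_name, exc)
        return None


# "fastchat" is the optional package named in the commit message.
fastchat = try_import("fastchat")
# Hypothetical flag a chat completion route could check before running.
chat_completions_available = fastchat is not None
```

Callers that need the chat completion shim can check the flag and return
an error response when the package is missing.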
Chat completions is the endpoint that OAI will use in the future, so
it makes sense to support it even though the completions endpoint
will be used more often.
Also unify common parameters between the chat completion and completion
requests since they're very similar.
Signed-off-by: kingbri <bdashore3@proton.me>
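Note on unifying the request parameters above: one way to share fields
is a common base class, sketched here with dataclasses. The class names
and default values are assumptions; only the field names follow the
OpenAI request format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CommonCompletionRequest:
    """Fields shared by both request types (illustrative subset)."""
    model: Optional[str] = None
    max_tokens: int = 150      # assumed default
    temperature: float = 1.0   # assumed default
    top_p: float = 1.0         # assumed default
    stream: bool = False
    stop: List[str] = field(default_factory=list)


@dataclass
class CompletionRequest(CommonCompletionRequest):
    # Plain text prompt for the completions endpoint.
    prompt: str = ""


@dataclass
class ChatCompletionRequest(CommonCompletionRequest):
    # List of {"role": ..., "content": ...} messages for chat completions.
    messages: List[Dict[str, str]] = field(default_factory=list)
```

Sampling code can then accept a CommonCompletionRequest and work for
either endpoint.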
These were not working properly. Write to the YAML file correctly
and make the validator return a 401 when the bearer token is
invalid.
Signed-off-by: kingbri <bdashore3@proton.me>
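Note on the bearer token check above: a framework-free sketch of a
validator that yields 401 for a missing or invalid token. The function
name check_bearer is hypothetical, and a real route would raise the
web framework's own HTTP error rather than return an integer.

```python
import hmac
from typing import Optional

OK = 200
UNAUTHORIZED = 401


def check_bearer(authorization: Optional[str], expected_key: str) -> int:
    """Return an HTTP status code for an Authorization header value."""
    if not authorization:
        return UNAUTHORIZED
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return UNAUTHORIZED
    # Constant-time comparison avoids leaking key contents via timing.
    if hmac.compare_digest(token.encode(), expected_key.encode()):
        return OK
    return UNAUTHORIZED
```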
Models can be loaded and unloaded via the API. Also add authentication
for regular API use and for administrator tasks.
The two types of authorization use different keys.
Also fix the unload function to properly free all used VRAM.
Signed-off-by: kingbri <bdashore3@proton.me>
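Note on freeing VRAM during unload above: a rough sketch of the usual
approach with PyTorch, not the fix from this commit. The ModelContainer
class and its attributes are hypothetical, and the sketch assumes the
container holds the only references to the model and cache.

```python
import gc
from typing import Any, Optional


class ModelContainer:
    """Hypothetical holder for a loaded model and its cache."""

    def __init__(self) -> None:
        self.model: Optional[Any] = None
        self.cache: Optional[Any] = None

    def load(self, model: Any, cache: Any) -> None:
        self.model, self.cache = model, cache

    def unload(self) -> None:
        # Drop every reference first; memory cannot be reclaimed while
        # a Python object still points at the tensors.
        self.model = None
        self.cache = None
        gc.collect()
        try:
            import torch
        except ImportError:
            return  # Nothing GPU-side to release without PyTorch.
        if torch.cuda.is_available():
            # Hand cached, now-unused blocks back to the driver.
            torch.cuda.empty_cache()
```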
The models endpoint fetches all the models that OAI has to offer.
However, since this is an OAI clone, just list the models inside
the user's configured model directory instead.
Signed-off-by: kingbri <bdashore3@proton.me>
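Note on listing local models above: a small sketch that builds an
OpenAI-style list response from a directory. It assumes each model
lives in its own subdirectory; the function name list_models is
hypothetical.

```python
from pathlib import Path
from typing import Any, Dict


def list_models(model_dir: Path) -> Dict[str, Any]:
    """Build an OpenAI-style model list from subdirectories of model_dir."""
    # Assumption: one subdirectory per model; loose files are ignored.
    names = sorted(p.name for p in model_dir.iterdir() if p.is_dir())
    return {
        "object": "list",
        "data": [{"id": name, "object": "model"} for name in names],
    }
```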
Add support for /v1/completions with the option to use streaming
if needed. Also rewrite API endpoints to use async when possible
since that improves request performance.
Model container parameter names also needed rewrites, and fallback
cases are now set to their disabled values.
Signed-off-by: kingbri <bdashore3@proton.me>
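Note on streaming completions above: a sketch of how an async
generator can emit server-sent events for each generated chunk. The
function names are hypothetical and the token source is a plain
iterable standing in for the model.

```python
import asyncio
import json
from typing import AsyncIterator, Iterable, List


async def stream_completion(chunks: Iterable[str]) -> AsyncIterator[str]:
    """Yield one server-sent event per generated text chunk."""
    for text in chunks:
        payload = {"object": "text_completion", "choices": [{"text": text}]}
        yield f"data: {json.dumps(payload)}\n\n"
        # Let other requests run between chunks.
        await asyncio.sleep(0)
    # Terminator used by OpenAI-style streaming responses.
    yield "data: [DONE]\n\n"


async def collect(chunks: Iterable[str]) -> List[str]:
    """Gather all events; handy for non-streaming callers and tests."""
    return [event async for event in stream_completion(chunks)]
```

A web framework's streaming response type can consume the generator
directly, while non-streaming requests can join the chunks instead.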
YAML is a more flexible format when it comes to configuration.
Command-line arguments are difficult to remember and configure,
especially for an API with complicated argument names. Rather than
using half-baked text files, implement a proper config solution.
Also add a progress bar when loading models on the command line.
Signed-off-by: kingbri <bdashore3@proton.me>
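Note on the YAML configuration above: a sketch of merging a parsed
config over defaults. Every section and key name here is hypothetical;
the input dict is assumed to come from a YAML parser such as PyYAML's
yaml.safe_load, which is left out to keep the example dependency-free.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults; the real config keys may differ.
DEFAULTS: Dict[str, Dict[str, Any]] = {
    "network": {"host": "127.0.0.1", "port": 5000},
    "model": {"model_dir": "models", "model_name": None},
}


def merge_config(user: Optional[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Overlay user-supplied sections on a copy of the defaults."""
    merged = {section: dict(values) for section, values in DEFAULTS.items()}
    for section, values in (user or {}).items():
        # An empty YAML section parses to None, so treat it as {}.
        merged.setdefault(section, {}).update(values or {})
    return merged
```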
Use command-line arguments to load an initial model if necessary.
API routes are broken, but we should be using the container from
now on as a primary interface with the exllama2 library.
These args should also be turned into a YAML configuration file in
the future.
Signed-off-by: kingbri <bdashore3@proton.me>
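Note on the command-line arguments above: a sketch of parsing
arguments for an initial model with argparse. The flag names
--model-dir and --max-seq-len are hypothetical and may not match the
ones this commit added.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse hypothetical arguments for loading an initial model."""
    parser = argparse.ArgumentParser(description="Load an initial model")
    parser.add_argument("--model-dir", help="Path to the model directory")
    parser.add_argument(
        "--max-seq-len", type=int, default=None,
        help="Override the model's maximum sequence length",
    )
    return parser.parse_args(argv)
```

When --model-dir is omitted, args.model_dir is None and the server can
start without a model loaded.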