Python doesn't have proper handling of optionals. The only way to
handle them is checking via an if statement if the value is None or
by using the "or" keyword to unwrap optionals.
Previously, I used the "or" method to unwrap, but this caused issues
due to falsy values falling back to the default. This is especially
the case with booleans were "False" changed to "True".
Instead, add two new functions: unwrap and coalesce. Both function
to properly implement a functional way of "None" coalescing.
Signed-off-by: kingbri <bdashore3@proton.me>
Models do not fully unload if an exception is caught in load. Therefore,
leave it to the client to unload on cancel.
Also add handlers in the event a SSE stream is cancelled. These packets
can't be sent back to the client since the client has severed the
connection, so print them in terminal.
Signed-off-by: kingbri <bdashore3@proton.me>
sse_starlette kept firing a ping response if it was taking too long
to set an event. Rather than using a hacky workaround, switch to
FastAPI's inbuilt streaming response and construct SSE requests with
a utility function.
This helps the API become more robust and removes an extra requirement.
Signed-off-by: kingbri <bdashore3@proton.me>
If the generator errors, there's no proper handling to send an error
packet and close the connection.
This is especially important for unloading models if the load fails
at any stage to reclaim a user's VRAM. Raising an exception caused
the model_container object to lock and not get freed by the GC.
This made sense to propegate SSE errors across all generator functions
rather than relying on abort signals.
Signed-off-by: kingbri <bdashore3@proton.me>
Models can be loaded and unloaded via the API. Also add authentication
to use the API and for administrator tasks.
Both types of authorization use different keys.
Also fix the unload function to properly free all used vram.
Signed-off-by: kingbri <bdashore3@proton.me>
YAML is a more flexible format when it comes to configuration. Commandline
arguments are difficult to remember and configure especially for
an API with complicated commandline names. Rather than using half-baked
textfiles, implement a proper config solution.
Also add a progress bar when loading models in the commandline.
Signed-off-by: kingbri <bdashore3@proton.me>
Use command-line arguments to load an initial model if necessary.
API routes are broken, but we should be using the container from
now on as a primary interface with the exllama2 library.
Also these args should be turned into a YAML configuration file in
the future.
Signed-off-by: kingbri <bdashore3@proton.me>