It's useful for the client to know what the T/s and total time for
generation are per-request.
Works with both completions and chat completions.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
- add models for config options
- add function to regenerate config.yml
- replace references to config with pydantic compatible references
- remove unnecessary unwrap() statements
TODO:
- auto generate env vars
- auto generate argparse
- test loading a model
Identify which request is being processed to help users disambiguate
which logs correspond to which request.
Signed-off-by: kingbri <bdashore3@proton.me>
The new async dynamic job allows for native async support without the
need of threading. Also add logprobs and metrics back to responses.
Signed-off-by: kingbri <bdashore3@proton.me>
Use the module singleton pattern to share global state. This can also
be a modified version of the Global Object Pattern. The main reason
this pattern is used is for ease of use when handling global state
rather than adding extra dependencies for a DI parameter.
Signed-off-by: kingbri <bdashore3@proton.me>
Loguru is a flexible logger that allows for easier hooking and imports
into Rich with no problems. Also makes progress bars stick to the
bottom of the terminal window.
Signed-off-by: kingbri <bdashore3@proton.me>
Move common functions into their own folder and refactor the backends
to use their own folder as well.
Also cleanup imports and alphabetize import statments themselves.
Finally, move colab and docker into their own folders as well.
Signed-off-by: kingbri <bdashore3@proton.me>