It's useful for the client to know the tokens per second (T/s) and the total generation time for each request. This works with both completions and chat completions. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> |
||
|---|---|---|
| chat_completion.py | ||
| completion.py | ||
| embeddings.py | ||
| tools.py | ||
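
The commit message above mentions reporting T/s and total generation time per request. As a rough illustration only, the sketch below shows one way such figures could be derived from a token count and two timestamps. The names `GenerationStats` and `compute_stats` and the field names are hypothetical and are not taken from the files listed here; the actual implementation in `completion.py` and `chat_completion.py` may differ.

```python
import time
from dataclasses import dataclass


@dataclass
class GenerationStats:
    """Hypothetical per-request stats; field names are illustrative only."""

    completion_tokens: int
    total_time: float      # wall-clock generation time in seconds
    tokens_per_sec: float  # completion tokens divided by total_time


def compute_stats(completion_tokens: int, start: float, end: float) -> GenerationStats:
    """Derive total time and T/s from a token count and two timestamps."""
    elapsed = end - start
    # Guard against a zero or negative interval to avoid dividing by zero.
    tps = completion_tokens / elapsed if elapsed > 0 else 0.0
    return GenerationStats(
        completion_tokens=completion_tokens,
        total_time=round(elapsed, 2),
        tokens_per_sec=round(tps, 2),
    )


if __name__ == "__main__":
    # In a real server the timestamps would bracket the generation call.
    start = time.perf_counter()
    generated_tokens = 128  # placeholder for the number of tokens produced
    end = time.perf_counter()
    print(compute_stats(generated_tokens, start, end))
```

Computing these values on the server and attaching them to each response would save the client from timing requests itself, where network latency would distort the measurement.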