Waiting for request disconnect takes some extra time and allows generation chunks to pile up, resulting in large payloads being sent at once not making up a smooth stream. Use the polling method in non-streaming requests by creating a background task and then check if the task is done, signifying that the request has been disconnected. Signed-off-by: kingbri <bdashore3@proton.me> |
||
|---|---|---|
| .. | ||
| OAI | ||
| server.py | ||