OAI: Add response_prefix and fix BOS token issues in chat completions

response_prefix is used to add a prefix before generating the next message. This is used in many cases such as continuining a prompt (see #96). Also if a template has BOS token specified, add_bos_token will append two BOS tokens. Add a check which strips a starting BOS token from the prompt if it exists. Signed-off-by: kingbri <bdashore3@proton.me>
2024-04-25 00:54:43 -04:00 · 2024-04-25 00:54:43 -04:00 · fb1d2f34c1
commit fb1d2f34c1
parent ed7cd3cb59
4 changed files with 20 additions and 1 deletions
--- a/endpoints/OAI/utils/completion.py
+++ b/endpoints/OAI/utils/completion.py
@ -94,8 +94,8 @@ async def generate_completion(data: CompletionRequest, model_path: pathlib.Path)

    try:
        generation = await model.container.generate(data.prompt, **data.to_gen_params())
-
        response = _create_response(generation, model_path.name)
+
        return response
    except Exception as exc:
        error_message = handle_request_error(