Tool Calling in TabbyAPI
Note
Before getting started here, please look at the Custom templates page for foundational concepts.
Thanks to Storm for creating this documentation page.
TabbyAPI's tool calling implementation aligns with the OpenAI Standard, following the OpenAI Tools Implementation closely.
Features and Limitations
TabbyAPI's tool implementation supports:
- Tool calling when streaming
- Calling multiple tools per turn
Current limitations:
- No support for the tool_choice parameter (always assumed to be auto)
- strict parameter not yet supported (OAI format ensured, but dtype and argument name choices not yet enforced)
Model Support
TabbyAPI exposes controls within the prompt_template to accommodate models specifically tuned for tool calling and those that aren't. By default, TabbyAPI includes chatml_with_headers_tool_calling.jinja, a generic template built to support the Llama 3.1 family and other models following the ChatML (with headers) format.
For more templates, check out llm-prompt-templates.
Usage
In order to use tool calling in TabbyAPI, you must select a prompt_template that supports tool calling when loading your model.
For example, if you are using a Llama 3.1 Family model you can simply modify your config.yml's prompt_template: to use the default tool calling template like so:
    model:
      ...
      prompt_template: chatml_with_headers_tool_calling
If loading via /v1/model/load, you would also need to specify a tool-supporting prompt_template.
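For illustration, a request that exercises tool calling might look like the sketch below. It only builds and serializes an OpenAI-style chat completion body; the get_weather tool, its parameters, and the message text are hypothetical. Because tool_choice is not supported, the sketch leaves it out.

```python
import json

# Hypothetical tool definition in the OpenAI "tools" format.
# The name, description, and parameters are made up for illustration.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}


def build_chat_request(user_message, tools):
    """Build the JSON-serializable body of a chat completion request."""
    return {
        "messages": [{"role": "user", "content": user_message}],
        "tools": tools,
        "stream": False,
    }


payload = build_chat_request("What's the weather in Paris?", [get_weather_tool])
body = json.dumps(payload, indent=2)
print(body)
```

The resulting string is what a client would send as the JSON body of its chat completion request.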
Tool Template Variables
- tools: Tools object.
- tools_json: Tools object as a JSON string.
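The difference between the two variables can be sketched in plain Python. This assumes tools_json is simply the tools object serialized with json.dumps; the indent setting and the sample tool are illustrative assumptions, not necessarily TabbyAPI's exact behavior.

```python
import json

# Hypothetical tools object, as a template would see it via `tools`.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {"type": "object", "properties": {}},
        },
    }
]

# Assumed equivalent of `tools_json`: the same data as a JSON string.
tools_json = json.dumps(tools, indent=2)

# `tools` supports structured access, which suits templates that loop over tools.
tool_names = [tool["function"]["name"] for tool in tools]

# `tools_json` is plain text, ready to be placed directly into a prompt.
print(tool_names)
print(tools_json)
```

Templates that only need to paste the specification into the prompt can use the string form, while templates that iterate over individual tools need the object form.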
Creating a Tool Calling Prompt Template
Here's how to create a TabbyAPI tool calling prompt template:
- Define proper metadata:

  Prompt templates that support tool calling can define the following metadata fields:

  - tool_start: The string we expect the model to write when initiating a tool call. (Required)
  - tool_end: The string that marks the end of a tool call.

  Here is an example of these being defined:

      {# Metadata #}
      {% set stop_strings = ["<|im_start|>", "<|im_end|>"] %}
      {% set message_roles = ['system', 'user', 'assistant', 'tool'] %}
      {% set tool_start = "<|tool_start|>" %}
      {% set tool_end = "<|tool_end|>" %}

  tool_start and tool_end should be selected based on which model you decide to use. For example, Groq's tool calling models expect <tool_call> and </tool_call>, while Llama3 FireFunctionV2 expects only functools to start the call, without a tool_end.

- Define an initial_system_prompt:

  While the name of your initial_system_prompt can vary, its purpose does not. This initial prompt is typically a simple instruction set followed by the tools_json variable, which contains the function specification the user provided in the tools field of their chat completion request. Inside the template we can reference it like so: {{ tools_json }}.

  Note: Depending on the model you are using, your model may expect a special set of tokens to surround the function specifications. Feel free to surround tools_json with these tokens.

      {% set initial_system_prompt %}
      Your instructions here...
      Available functions:
      {{ tools_json }}
      {% endset %}

  You'll then want to make sure to provide this to the model in the first message it receives. Here is a simple example:

      {%- if loop.first -%}
      {{ bos_token }}{{ start_header }}{{ role }}{{ end_header }}
      {{ initial_system_prompt }}
      {{ content }}{{ eos_token }}
      {%- endif -%}

- Handle messages with the tool role:

  After a tool call is made, a well-behaved client will respond to the model with a new message containing the role tool. This is a response to a tool call containing the results of its execution.

  The simplest implementation is to ensure the message_roles list within your prompt template contains tool. Further customization may be required for models that expect specific tokens surrounding tool responses. An example of this customization is the Groq family of models from above. They expect special tokens surrounding their tool responses, such as:

      {% if role == 'tool' %}
      <tool_response>{{ content }}</tool_response>
      {% endif %}

- Preserve tool calls from prior messages:

  When creating a tool calling prompt_template, ensure you handle previous tool calls from the model gracefully. Each message object within messages exposed to the prompt_template may also contain tool_calls_json. This field contains tool calls made by the assistant in previous turns, and must be handled appropriately so that the model understands what previous actions it has taken (and can properly identify which tool response ID belongs to which call).

  This requires using the tool_start (and possibly tool_end) from above to wrap tool_calls_json like so:

      {% if 'tool_calls_json' in message and message['tool_calls_json'] %}
      {{ tool_start }}{{ message['tool_calls_json'] }}{{ tool_end }}
      {% endif %}

- Handle tool call generation:

      {% set tool_reminder %}
      Available Tools:
      {{ tools_json }}

      Tool Call Format Example:
      {{ tool_start }}{{ example_tool_call }}

      Prefix & Suffix: Begin tool calls with {{ tool_start }} and end with {{ tool_end }}.
      Argument Types: Use correct data types for arguments (e.g., strings in quotes, numbers without).
      {% endset %}

      {% if tool_precursor %}
      {{ start_header }}system{{ end_header }}
      {{ tool_reminder }}{{ eos_token }}
      {{ start_header }}assistant{{ end_header }}
      {{ tool_precursor }}{{ tool_start }}
      {% else %}
      {{ start_header }}assistant{{ end_header }}
      {% endif %}

  This clever bit of temporal manipulation allows us to slip in a reminder as a system message right before the model generates a tool call, but after it writes the tool_start token. This is possible because TabbyAPI revisits the prompt_template after a tool_start token is detected. Here's how it works:

  - We detect tool_precursor, which signals the model is about to generate a tool call.
  - We then inject a system message with our tool_reminder.
  - Finally, we initialize an assistant message using tool_precursor as the content.

  This creates the illusion that the model just happened to remember the available tools and proper formatting right before generating the tool call. It's like giving the model a little nudge at exactly the right moment, enhancing its performance without altering what the user sees.
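The two-pass flow behind tool_precursor can be simulated in plain Python. This is a rough sketch under stated assumptions, not TabbyAPI's implementation: the token strings, find_tool_precursor, and render_generation_prompt are hypothetical stand-ins for the template's final block and for the server detecting tool_start in the model's output.

```python
from typing import Optional

# Hypothetical tokens; real values depend on the model and template.
TOOL_START = "<|tool_start|>"
TOOL_END = "<|tool_end|>"
START_HEADER = "<|start_header|>"
END_HEADER = "<|end_header|>"
EOS = "<|eot|>"


def find_tool_precursor(generated: str) -> Optional[str]:
    """Return the text written before TOOL_START, or None if no call started."""
    index = generated.find(TOOL_START)
    if index == -1:
        return None
    return generated[:index]


def render_generation_prompt(history: str, tools_json: str,
                             tool_precursor: Optional[str] = None) -> str:
    """Mirror the template's final block in plain Python."""
    if tool_precursor:
        reminder = (
            f"Available Tools:\n{tools_json}\n"
            f"Begin tool calls with {TOOL_START} and end with {TOOL_END}."
        )
        return (
            history
            + f"{START_HEADER}system{END_HEADER}\n{reminder}{EOS}\n"
            + f"{START_HEADER}assistant{END_HEADER}\n"
            + f"{tool_precursor}{TOOL_START}"
        )
    return history + f"{START_HEADER}assistant{END_HEADER}\n"


history = f"{START_HEADER}user{END_HEADER}\nWeather in Paris?{EOS}\n"
tools_json = '[{"type": "function", "function": {"name": "get_weather"}}]'

# First pass: normal prompt; pretend the model produced this text.
first_prompt = render_generation_prompt(history, tools_json)
first_output = "Let me check that for you." + TOOL_START

# Second pass: re-render with the precursor so the reminder is injected.
precursor = find_tool_precursor(first_output)
second_prompt = render_generation_prompt(history, tools_json, precursor)
print(second_prompt)
```

The second prompt ends with the model's own words followed by the tool start string, so generation resumes inside the tool call with the reminder placed just before it.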
When creating your own tool calling prompt_template, it's best to reference the default chatml_with_headers_tool_calling.jinja template as a starting point.
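To see how the handling of tool-role messages and prior tool calls fits together when a conversation is rendered, here is a minimal Python sketch of the same logic. The tokens, the render_message helper, and the sample messages are hypothetical; a real template would express this in Jinja with the tokens its model expects.

```python
# Hypothetical tokens; pick the ones your model was trained on.
TOOL_START = "<tool_call>"
TOOL_END = "</tool_call>"
START_HEADER = "<|start_header|>"
END_HEADER = "<|end_header|>"
EOS = "<|eot|>"


def render_message(message: dict) -> str:
    """Render one message, wrapping tool results and prior tool calls."""
    role = message["role"]
    content = message.get("content") or ""

    # Tool results get the wrapper tokens the model expects.
    if role == "tool":
        content = f"<tool_response>{content}</tool_response>"

    # Earlier assistant tool calls are replayed between the start/end strings.
    tool_calls_json = message.get("tool_calls_json")
    if tool_calls_json:
        content += f"{TOOL_START}{tool_calls_json}{TOOL_END}"

    return f"{START_HEADER}{role}{END_HEADER}\n{content}{EOS}\n"


messages = [
    {"role": "user", "content": "Weather in Paris?"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls_json": '[{"id": "call_1", "function": {"name": "get_weather"}}]',
    },
    {"role": "tool", "content": '{"temp_c": 18}'},
]

prompt = "".join(render_message(m) for m in messages)
print(prompt)
```

Replaying the earlier call with its ID next to the tool response is what lets the model match each result to the call that produced it.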
Support and Bug Reporting
For bugs, please create a detailed issue with the model, prompt template, and conversation that caused it. Alternatively, join our Discord and ask for Storm.