Commit graph

9 commits

Author SHA1 Message Date
kingbri
7900b72848 API: Add chat_template_kwargs alias for template_vars
This key is used in VLLM and SGLang.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 15:48:39 -04:00
Andrew Phillips
436ce752da
Support more common tool variables in templates (tools, message.tool_calls) (#308)
* Add non-JSON version of `tools` and `functions` to `template_vars`.

Increase the compatibility with VLLM templates which use a non-JSON tools object.

* Add list of tool template variables to the documentation

* Use Jinja templates to provide `tools_json` and `functions_json`

This should be functionally equivelant, but the JSON won't be produced
unless it's needed.

* Make message.tool_calls match the JSON from ToolCallProcessor

* Log something when generating tool calls

* Add template for Qwen QwQ 32b

* Only log if tool calls have been detected

* API: Fix tool call variable assignments

Jinja functions do not run when variables are called. Use json.dumps
instead. In addition, log the request ID when stating that a tool
call was fired.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>

* Add `ToolCallProcessor.dump()` to get the list of processed dicts

* Remove qwen_qwq_32b.jinja

This will be added to the following repository at a later date:
https://github.com/theroyallab/llm-prompt-templates

---------

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-23 13:23:00 -04:00
kingbri
ccf23243c1 Docs: Update getting started with downloading from private repos
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-19 12:02:48 -04:00
kingbri
79f9c6e854 Model: Remove num_experts_per_token
This shouldn't even be an exposed option since changing it always
breaks inference with the model. Let the model's config.json handle
it.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-19 11:52:10 -04:00
kingbri
698d8339cb Config + Docs: Clarify YaRN rope scaling changes
In ExllamaV2, if a model has YaRN support, linear RoPE options are
not applied. Users can set max_seq_len and exl2 will take care of
the rest.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-19 11:47:49 -04:00
kingbri
de77955428 Docs: Update
Update getting started and server options

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-12 00:41:17 -04:00
kingbri
73688670a6 Docs: Add model and inline loading documentation
Sorely required due to the amount of questions about how does inline
loading work.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-02-25 00:09:18 -05:00
kingbri
d6b8c7db4b Docs: Update getting started guide
Add downloader options and edit some points.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-02-18 12:17:14 -05:00
kingbri
5614b342a7 Tree: Migrate docs into repository
This will auto-publish to the Github wiki via an action.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-02-17 23:39:35 -05:00