Commit graph

12 commits

kingbri
113643c0df Main: Enable cudaMallocAsync backend by default
Works on CUDA 12.4 and up. If CUDA isn't available, the backend is not enabled. The backend is selected through an environment variable that has to be set before startup, so it can't be configured via config.yml.

This used to be experimental, but it's probably fine to keep it enabled
since it only provides a benefit.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-07-27 22:31:38 -04:00
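
For illustration only: the commit doesn't name the variable, but PyTorch selects its CUDA allocator backend through the PYTORCH_CUDA_ALLOC_CONF environment variable, which must be set before CUDA is initialized. A minimal sketch of the conditional enable described above, assuming that env var is the mechanism used:

```python
import os
import shutil

def maybe_enable_cuda_malloc_async() -> None:
    # Hypothetical check: treat the presence of nvidia-smi as "CUDA exists".
    # Whether tabbyAPI uses exactly this check is an assumption.
    if shutil.which("nvidia-smi") is None:
        return  # No CUDA: leave the default allocator alone

    # Must run before `import torch` (or at least before CUDA init),
    # which is why a config.yml option would be too late.
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "backend:cudaMallocAsync")

maybe_enable_cuda_malloc_async()
```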
kingbri
e77fa0b7a8 Docs: Edit inline loading for breaking changes
Add the model key for the YAML examples.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-07-24 18:11:42 -04:00
kingbri
1c3f84151f Docs: Update tool calling
For new variables and format.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-07-05 21:43:04 -04:00
kingbri
7900b72848 API: Add chat_template_kwargs alias for template_vars
This key is used in VLLM and SGLang.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-05-12 15:48:39 -04:00
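
As a rough illustration of the alias: a client can send either key in an OpenAI-style request body. The endpoint, port, and the `enable_thinking` variable below are placeholders, not taken from the commit:

```python
import requests  # assumed HTTP client; any client works

payload = {
    "model": "my-model",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
    # Either key should be accepted after this commit; "chat_template_kwargs"
    # mirrors what vLLM/SGLang clients already send.
    "chat_template_kwargs": {"enable_thinking": False},  # hypothetical template var
    # "template_vars": {"enable_thinking": False},       # equivalent spelling
}

resp = requests.post(
    "http://localhost:5000/v1/chat/completions",  # assumed local endpoint
    json=payload,
    timeout=60,
)
print(resp.json())
```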
Andrew Phillips
436ce752da Support more common tool variables in templates (tools, message.tool_calls) (#308)
* Add non-JSON version of `tools` and `functions` to `template_vars`.

Increases compatibility with VLLM templates that use a non-JSON tools object.

* Add list of tool template variables to the documentation

* Use Jinja templates to provide `tools_json` and `functions_json`

This should be functionally equivalent, but the JSON won't be produced
unless it's needed.

* Make message.tool_calls match the JSON from ToolCallProcessor

* Log something when generating tool calls

* Add template for Qwen QwQ 32b

* Only log if tool calls have been detected

* API: Fix tool call variable assignments

Jinja functions do not run when they are referenced like plain variables,
so use json.dumps instead (see the sketch after this entry). In addition,
log the request ID when stating that a tool call was fired.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>

* Add `ToolCallProcessor.dump()` to get the list of processed dicts

* Remove qwen_qwq_32b.jinja

This will be added to the following repository at a later date:
https://github.com/theroyallab/llm-prompt-templates

---------

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-23 13:23:00 -04:00
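
To make the bullet points above concrete, here is a rough sketch of how the template variables could be assembled: plain `tools`/`functions` objects for templates that iterate over them, plus `tools_json`/`functions_json` built eagerly with json.dumps, as the fix above describes. The helper name and the way `functions` is derived are illustrative assumptions, not code from the PR:

```python
import json

def build_tool_template_vars(tools: list[dict]) -> dict:
    """Illustrative only: expose tools both as objects and as JSON strings."""
    # Assumed OpenAI-style shape: {"type": "function", "function": {...}}
    functions = [t.get("function", t) for t in tools]
    return {
        # Non-JSON versions, for VLLM-style templates that iterate over tools
        "tools": tools,
        "functions": functions,
        # JSON strings built up front, since Jinja does not call functions
        # that are referenced like plain variables
        "tools_json": json.dumps(tools),
        "functions_json": json.dumps(functions),
    }
```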
kingbri
ccf23243c1 Docs: Update getting started with downloading from private repos
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-19 12:02:48 -04:00
kingbri
79f9c6e854 Model: Remove num_experts_per_token
This shouldn't even be an exposed option since changing it always
breaks inference with the model. Let the model's config.json handle
it.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-19 11:52:10 -04:00
kingbri
698d8339cb Config + Docs: Clarify YaRN rope scaling changes
In ExllamaV2, if a model has YaRN support, linear RoPE options are
not applied. Users can set max_seq_len and exl2 will take care of
the rest.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-19 11:47:49 -04:00
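
As a hedged illustration of "set max_seq_len and exl2 will take care of the rest": for a YaRN-capable model, only the context length needs to be set, and the linear RoPE keys (names assumed here, not taken from the commit) are simply left unset. Expressed as the equivalent Python dict rather than config.yml:

```python
# Illustrative model config for a YaRN-capable model (key names are
# assumptions based on common exl2/tabbyAPI options).
model_config = {
    "max_seq_len": 32768,   # the only knob the user needs to touch
    # "rope_scale": ...,    # intentionally unset: not applied when YaRN is used
    # "rope_alpha": ...,    # intentionally unset: not applied when YaRN is used
}
```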
kingbri
de77955428 Docs: Update
Update getting started and server options

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-03-12 00:41:17 -04:00
kingbri
73688670a6 Docs: Add model and inline loading documentation
Sorely needed due to the number of questions about how inline
loading works.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-02-25 00:09:18 -05:00
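
Purely as an illustration of what "inline loading" means in practice (the exact option name and behavior are assumptions, not stated in the commit): with inline model loading enabled, a request that names an unloaded model asks the server to load it before generating.

```python
# Hypothetical chat completion payload: with inline model loading enabled in
# config.yml (option name assumed), naming an unloaded model here triggers a
# load from the models directory before the response is generated.
inline_load_request = {
    "model": "some-exl2-model",  # placeholder: folder name under the models dir
    "messages": [{"role": "user", "content": "ping"}],
}
```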
kingbri
d6b8c7db4b Docs: Update getting started guide
Add downloader options and edit some points.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-02-18 12:17:14 -05:00
kingbri
5614b342a7 Tree: Migrate docs into repository
This will auto-publish to the GitHub wiki via an action.

Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
2025-02-17 23:39:35 -05:00