jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	43f9483bc4	Model: Add tensor_parallel_backend option This allows for users to use nccl or native depending on the GPU setup. NCCL is only available with Linux built wheels. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-08-17 22:35:10 -04:00
kingbri	0b4ca567f8	API: Persist request IDs and append full_text to finish chunk Adding these to each generation chunk helps remove redundancy and unnecessary request ID operations. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-25 12:27:44 -04:00
kingbri	707d005aad	API: Default tool call ID and type Doing this helps reduce the model's burden of generating the tool call ID and type (which is always "function"). Follow mistral's spec for tool call IDs by using a 9 character alphanumeric string. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-11 01:11:09 -04:00
kingbri	5b1db3ad83	API: Don't do a second re-render when tool calling Re-rendering the template is an expensive operation when it's possible to just concatenate the prompt and current generation text together. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-06 11:32:36 -04:00
kingbri	3dfa965019	API: Add tool_call_id for role = tool If a message with role = tool is present, the tool_call_id should also be given to the template. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-05 21:52:58 -04:00
kingbri	879f4cee7e	API: Modify tool calling for wider compat When revisiting tool calls, the formats have more or less become standard. For greater compatibility with templates, primarily use the message.tools parameter and remove the extra custom metadata that is no longer required. However, unlike other backends, tabbyAPI still uses template metadata to declare what the tool start string is. This allows for template-level customization along with giving more power to the user while the server exists to consume rather than work on a case-by-case basis. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-05 14:28:12 -04:00
kingbri	b6a26da50c	API: Fix tool call serialization To render in the template, tool call start tokens needed to have less checks and remove the line to convert message.tool_calls to a dict since that breaks the rest of the chain by disconnecting the types. model_dump on the message itself already accomplishes this. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-04 15:02:49 -04:00
kingbri	d23fefbecd	API + Model: Fix application of defaults use_as_default was not being properly applied into model overrides. For compartmentalization's sake, apply all overrides in a single function to avoid clutter. In addition, fix where the traditional /v1/model/load endpoint checks for draft options. These can be applied via an inline config, so let any failures fallthrough. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-03 14:37:34 -04:00
kingbri	2913ce29fc	API: Add timings to usage stats It's useful for the client to know what the T/s and total time for generation are per-request. Works with both completions and chat completions. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-06-17 22:54:51 -04:00
kingbri	a3c780ae58	API: Core: Remove load/template aliases These added extra complexity and should be removed and replaced with a single parameter. Changes: - /v1/model/load must use model_name and draft_model_name - /v1/model/embedding/load must use embedding_model_name - /v1/template/switch must use prompt_template_name Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-06-13 14:57:24 -04:00
kingbri	2d89c96879	API: Re-add BOS token stripping in template render Matching YALS, if the model has add_bos_token enabled, then remove an extra BOS token at the start of the prompt. This usually happens with misconfigured templates such as Llama 3. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-24 21:11:53 -04:00
kingbri	10fbe043a4	API: Fix typing for chat templates in CC requests Tools must be None by default. Chat completion message content can be None, a string, or a list, so default to None. Exclude all None values from a CC message since the template can say the variable "exists" despite being None, causing an error. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-24 21:06:05 -04:00
kingbri	54b8a20a19	API: Fix types for chat completions Messages were mistakenly being sent as Pydantic objects, but templates expect dictionaries. Properly convert these before render. In addition, initialize all Optional lists as an empty list since this will cause the least problems when interacting with other parts of API code, such as templates. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 18:10:34 -04:00
kingbri	0858b6d4b2	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 00:46:40 -04:00
kingbri	7900b72848	API: Add chat_template_kwargs alias for template_vars This key is used in VLLM and SGLang. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 15:48:39 -04:00
kingbri	8996dc7b02	API: Add default for backend in model load request Should be None so pydantic doesn't complain. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 09:51:09 -04:00
Brian	b555eeb6e7	Merge pull request #339 from Maaaxiii/fix/tool-calling-embeddings fix: Aligned Parameter Name in chat completions generate_tool_calls	2025-05-11 20:41:58 -04:00
kingbri	f4adca1f3e	API: Remove default fallback from backend param Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-11 09:56:53 -04:00
kingbri	6379081dd8	Sampling: Make add_bos_token override concise Also set the default to None so text completions follows the same pattern. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-10 19:07:35 -04:00
Brian	527afc206b	Merge pull request #329 from DocShotgun/exl3 Exllamav3 cache quantization	2025-05-08 23:11:45 -04:00
Maximilian Klem	22f7f1e1ec	fix: flipped parameter name with variable name	2025-05-07 21:04:30 +02:00
kingbri	bc0a84241a	API: Patch kobold generation call Calling the model requires different args now. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-05 22:11:21 -04:00
DocShotgun	45b966363e	Tree: Format	2025-05-03 21:01:03 -07:00
turboderp	036af02bf6	Common: No default add_bos_token value for chat completion requests	2025-05-04 05:25:58 +02:00
kingbri	7c6a053747	Model: Add option to select backend Changing the backend switches the container that's used. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:32:39 -04:00
kingbri	aa657fa6e9	API: Ignore add_bos_token in chat completions When fetching special tokens from the model, don't factor in the add_bos_token and ban_eos_token parameters as switches. In addition, change the internal handling of add_bos_token to an optional boolean. This allows us to fallback to the model when selecting whether or not to add the BOS token, especially for chat completions. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-01 22:51:15 -04:00
kingbri	3960612d38	API: Format and fix message naming Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-28 22:36:30 -04:00
kingbri	9157be3e34	API: Append task index to generations with n > 1 Since jobs are tracked via request IDs now, each generation task should be uniquely identified in the event of cancellation. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-28 22:29:48 -04:00
kingbri	3649d3bb51	Tree: Format + Lint Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-26 02:14:30 -04:00
kingbri	f070587e9f	Model: Add proper jobs cleanup and fix var calls Jobs should be started and immediately cleaned up when calling the generation stream. Expose a stream_generate function and append this to the base class since it's more idiomatic than generate_gen. The exl2 container's generate_gen function is now internal. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-24 21:30:55 -04:00
kingbri	3f09fcd8c9	Model: Make model params return a model card The model card is a unified structure for sharing model params. Rather than kwargs, use this instead. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-21 23:15:46 -04:00
kingbri	3084ef9fa1	Model + API: Migrate to use BaseSamplerParams kwargs is pretty ugly when figuring out which arguments to use. The base request falls back to defaults anyways, so pass in the params object as is. However, since Python's typing isn't like TypeScript where types can be transformed, the type hinting has a possibility of None showing up despite there always being a value for some params. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-16 00:50:05 -04:00
Andrew Phillips	436ce752da	Support more common tool variables in templates (tools, message.tool_calls) (#308) * Add non-JSON version of `tools` and `functions` to `template_vars`. Increase the compatibility with VLLM templates which use a non-JSON tools object. * Add list of tool template variables to the documentation * Use Jinja templates to provide `tools_json` and `functions_json` This should be functionally equivalent, but the JSON won't be produced unless it's needed. * Make message.tool_calls match the JSON from ToolCallProcessor * Log something when generating tool calls * Add template for Qwen QwQ 32b * Only log if tool calls have been detected * API: Fix tool call variable assignments Jinja functions do not run when variables are called. Use json.dumps instead. In addition, log the request ID when stating that a tool call was fired. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> * Add `ToolCallProcessor.dump()` to get the list of processed dicts * Remove qwen_qwq_32b.jinja This will be added to the following repository at a later date: https://github.com/theroyallab/llm-prompt-templates --------- Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-23 13:23:00 -04:00
kingbri	79f9c6e854	Model: Remove num_experts_per_token This shouldn't even be an exposed option since changing it always breaks inference with the model. Let the model's config.json handle it. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-19 11:52:10 -04:00
Benjamin Oldenburg	a20abe2d33	Bugfix: Chat completion requests fail with UnboundLocalError: finish_reason variable not initialized (#307) * fix issue #306 * removed whitespaces for ruff	2025-03-15 20:31:21 -04:00
kingbri	d98c0bd3f6	API: Add tools class Was mistakenly not added in PR 302. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-14 15:07:11 -04:00
Benjamin Oldenburg	a2a14ea114	Fix Tool Call JSON Serialization Error (#302) * Fix Tool Call JSON Serialization Error * Incorporate changes from PR 292 kingbri note: Adjusts the tool JSON formation and incorporates finish reasons. Added both authors as co-authors due to edits on this commit from the original PR. Co-Authored-by: David Allada <dallada1@vt.edu> Co-Authored-by: Benjamin Oldenburg <benjamin.oldenburg@ordis.co.th> Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> * API: Cleanup tool call JSON parsing Split pre and post-processing of tool calls to its own class. This cleans up the chat_completion utility module and also fixes the JSON serialization bug. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> --------- Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> Co-authored-by: David Allada <dallada1@vt.edu> Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-14 15:01:33 -04:00
kingbri	35fe372f2b	Embeddings: Handle case if embedding input is passed as a string Infinity expects a list when embedding, so convert to a list if the input is a string. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-23 00:39:21 -05:00
kingbri	9f649647f0	Model + API: GPU split updates and fixes For the TP loader, GPU split cannot be an empty array. However, defaulting the parameter to an empty array makes it easier to calculate the device list. Therefore, cast an empty array to None using falsy comparisons at load time. Also add draft_gpu_split to the load request. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-15 21:50:14 -05:00
kingbri	e290b88568	Args: Expose api-servers to subcommands This is required for the export-openapi action. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-10 23:39:46 -05:00
kingbri	6da65a8fd3	Embeddings: Fix base64 return A base64 embedding can be a string post-encoding. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2025-01-01 16:15:12 -05:00
kingbri	7878d351a7	Endpoints: Add props endpoint and add more values to model params The props endpoint is a standard used by llamacpp APIs which returns various properties of a model to a server. It's still recommended to use /v1/model to get all the parameters a TabbyAPI model has. Also include the contents of a prompt template when fetching the current model. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-26 17:32:19 -05:00
kingbri	fa8035ef72	Dependencies: Update sse-starlette and formatron Also pin newer versions of dependencies and fix an import from sse-starlette Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-21 23:14:55 -05:00
Brian	fe44e4a524	Merge pull request #253 from randoentity/workaround-toolcall workaround for tool calling	2024-11-28 23:30:00 -05:00
kingbri	2e06fb01d3	OAI: Pass mm_embeddings to tool call generation Don't exclude the vision embeddings when regenerating for a tool call. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-28 23:27:59 -05:00
Brian	b81dcdaf66	Merge pull request #232 from AlpinDale/serviceinfo_uri feat: add serviceinfo URI	2024-11-28 23:19:52 -05:00
kingbri	5fadaa728a	API: Move serviceinfo to core Best to expose this endpoint to all APIs as it's an information endpoint. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-28 23:07:58 -05:00
randoentity	a52610fb19	workaround for tool calling	2024-11-24 13:40:33 +01:00
kingbri	388d36e6bd	OAI: Fix chat completion list parsing The strings weren't being concatenated properly. Only add the combined text if the chat completion type is a List. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-22 17:30:29 -05:00
kingbri	902045edbb	API: Fix chat completion formatting flow Previously, the flow for parsing chat completion messages and rendering from the prompt template was disconnected between endpoints. Now, create a common function to render and handle everything appropriately afterwards. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-21 17:51:14 -05:00
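
Commit 707d005aad in the log above describes defaulting the tool call ID and type so the model does not have to generate them, using a 9 character alphanumeric ID and the type "function". A minimal sketch of that idea follows, assuming the tool call is a plain dict. The names `default_tool_call_id` and `fill_tool_call_defaults` are hypothetical and are not taken from the tabbyAPI source.

```python
import secrets
import string

# Assumption: "alphanumeric" means ASCII letters plus digits.
ALPHABET = string.ascii_letters + string.digits


def default_tool_call_id(length: int = 9) -> str:
    """Return a random alphanumeric ID (9 characters by default)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def fill_tool_call_defaults(tool_call: dict) -> dict:
    """Return a copy of tool_call with "id" and "type" filled in if missing.

    Hypothetical helper: values already supplied are left untouched.
    """
    filled = dict(tool_call)
    filled.setdefault("id", default_tool_call_id())
    filled.setdefault("type", "function")
    return filled


if __name__ == "__main__":
    call = {"function": {"name": "get_weather", "arguments": "{}"}}
    print(fill_tool_call_defaults(call))
```

Filling these fields on the server side means a template or model output that omits them still yields a well-formed tool call, while an ID supplied by the caller is preserved.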

205 commits