use_as_default was not being properly applied to model overrides.
For compartmentalization's sake, apply all overrides in a single function
to avoid clutter.
In addition, fix where the traditional /v1/model/load endpoint checks
for draft options. These can be applied via an inline config, so let
any failures fall through.
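Consolidating the override logic might look like the following minimal sketch; the function and key names are illustrative assumptions, not TabbyAPI's actual API:

```python
# Hypothetical sketch: apply all model overrides in one function instead of
# scattering them across the load path. Keys listed under "use_as_default"
# only fill in values the request left unset; other overrides always win.
def apply_overrides(load_params: dict, overrides: dict) -> dict:
    merged = dict(load_params)
    defaults = overrides.get("use_as_default", [])

    for key, value in overrides.items():
        if key == "use_as_default":
            continue
        if key in defaults:
            # Default-only overrides never clobber an explicit request value
            merged.setdefault(key, value)
        else:
            merged[key] = value

    return merged
```

Keeping the merge in one place makes the use_as_default semantics auditable at a glance.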
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
It's useful for the client to know what the T/s and total time for
generation are per-request.
Works with both completions and chat completions.
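The per-request stats boil down to simple arithmetic; this sketch uses illustrative field names, not the exact response schema:

```python
def generation_stats(token_count: int, start_time: float, end_time: float) -> dict:
    """Compute per-request throughput stats (field names are illustrative)."""
    total_time = end_time - start_time
    tokens_per_second = token_count / total_time if total_time > 0 else 0.0
    return {
        "total_time": round(total_time, 2),
        "tokens_per_second": round(tokens_per_second, 2),
    }
```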
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
These added extra complexity and should be removed and replaced
with a single parameter.
Changes:
- /v1/model/load must use model_name and draft_model_name
- /v1/model/embedding/load must use embedding_model_name
- /v1/template/switch must use prompt_template_name
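Illustrative request bodies after the rename (the model and template names here are placeholders):

```python
# Each endpoint now takes a single, explicitly named parameter
load_request = {
    "model_name": "MyModel-exl2",
    "draft_model_name": "MyDraft-exl2",
}
embedding_load_request = {"embedding_model_name": "my-embedder"}
template_switch_request = {"prompt_template_name": "chatml"}
```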
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Matching YALS, if the model has add_bos_token enabled, then remove
an extra BOS token at the start of the prompt. This usually happens
with misconfigured templates such as Llama 3.
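The dedup check can be sketched roughly as follows, assuming the engine prepends BOS itself when add_bos_token is enabled (helper name is hypothetical):

```python
def strip_duplicate_bos(prompt: str, bos_token: str, add_bos_token: bool) -> str:
    """If the engine will prepend BOS, drop one BOS the template already
    baked into the prompt (common with misconfigured Llama 3 templates)."""
    if add_bos_token and bos_token and prompt.startswith(bos_token):
        prompt = prompt[len(bos_token):]
    return prompt
```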
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Tools must be None by default. Chat completion message content can
be None, a string, or a list, so default to None. Exclude all None
values from a CC message since the template can say the variable
"exists" despite being None, causing an error.
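The None-exclusion amounts to filtering the serialized message before it reaches the template; a minimal sketch with a hypothetical helper name:

```python
def message_template_vars(message: dict) -> dict:
    """Drop None-valued keys so a Jinja `is defined` check doesn't see
    variables that 'exist' but are None (illustrative helper)."""
    return {k: v for k, v in message.items() if v is not None}
```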
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Messages were mistakenly being sent as Pydantic objects, but templates
expect dictionaries. Properly convert these before render.
In addition, initialize all Optional lists as an empty list since
this will cause the least problems when interacting with other parts
of API code, such as templates.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
When fetching special tokens from the model, don't factor in the
add_bos_token and ban_eos_token parameters as switches.
In addition, change the internal handling of add_bos_token to an optional
boolean. This allows us to fallback to the model when selecting whether
or not to add the BOS token, especially for chat completions.
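The three-state switch reduces to a None check, sketched here with assumed names:

```python
def resolve_add_bos(request_value, model_default: bool) -> bool:
    """Optional[bool] switch: None means 'defer to the model's config'."""
    return model_default if request_value is None else request_value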
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Since jobs are tracked via request IDs now, each generation task should
be uniquely identified in the event of cancellation.
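A minimal sketch of tagging each generation task, assuming UUIDs are an acceptable request ID format:

```python
import uuid

def new_request_id() -> str:
    """Uniquely tag each generation task so a cancellation can target
    exactly one in-flight job."""
    return str(uuid.uuid4())
```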
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Jobs should be started and immediately cleaned up when calling the
generation stream. Expose a stream_generate function and append
this to the base class since it's more idiomatic than generate_gen.
The exl2 container's generate_gen function is now internal.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
The model card is a unified structure for sharing model params.
Rather than kwargs, use this instead.
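A unified card could look like the following dataclass sketch; the field set here is an assumption for illustration:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ModelCard:
    """Unified structure for sharing model parameters instead of loose
    kwargs (fields are illustrative)."""
    id: str
    max_seq_len: int = 4096
    prompt_template: Optional[str] = None

card = ModelCard(id="MyModel-exl2", max_seq_len=8192)
```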
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
kwargs is pretty ugly when figuring out which arguments to use. The
base request falls back to defaults anyway, so pass in the params
object as-is.
However, since Python's typing isn't like TypeScript's, where types
can be transformed, the type hinting has a possibility of None showing
up despite there always being a value for some params.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
* Add non-JSON version of `tools` and `functions` to `template_vars`.
Increase compatibility with vLLM templates, which use a non-JSON tools object.
* Add list of tool template variables to the documentation
* Use Jinja templates to provide `tools_json` and `functions_json`
This should be functionally equivalent, but the JSON won't be produced
unless it's needed.
* Make message.tool_calls match the JSON from ToolCallProcessor
* Log something when generating tool calls
* Add template for Qwen QwQ 32b
* Only log if tool calls have been detected
* API: Fix tool call variable assignments
Jinja functions do not run when the variables are referenced, so use
json.dumps instead. In addition, log the request ID when stating that
a tool call was fired.
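Eagerly serializing with json.dumps when building the template vars can be sketched as follows (variable names assumed for illustration):

```python
import json

tools = [
    {"name": "get_weather", "parameters": {"city": "string"}},
]

# Provide both forms: templates can iterate `tools` directly (vLLM style)
# or embed `tools_json`; json.dumps runs eagerly here rather than relying
# on a Jinja-side function call that never fires.
template_vars = {
    "tools": tools,
    "tools_json": json.dumps(tools),
}
```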
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
* Add `ToolCallProcessor.dump()` to get the list of processed dicts
* Remove qwen_qwq_32b.jinja
This will be added to the following repository at a later date:
https://github.com/theroyallab/llm-prompt-templates
---------
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>
This shouldn't even be an exposed option since changing it always
breaks inference with the model. Let the model's config.json handle
it.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
* Fix Tool Call JSON Serialization Error
* Incorporate changes from PR 292
kingbri note: Adjusts the tool JSON formation and incorporates finish
reasons. Added both authors as co-authors due to edits on this commit
from the original PR.
Co-authored-by: David Allada <dallada1@vt.edu>
Co-authored-by: Benjamin Oldenburg <benjamin.oldenburg@ordis.co.th>
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
* API: Cleanup tool call JSON parsing
Split pre and post-processing of tool calls to its own class. This
cleans up the chat_completion utility module and also fixes the
JSON serialization bug.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
---------
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Co-authored-by: David Allada <dallada1@vt.edu>
Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Infinity expects a list when embedding, so convert to a list if the
input is a string.
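The conversion is a one-liner, sketched with a hypothetical helper name:

```python
def normalize_embedding_input(texts):
    """Infinity's embed call takes a list, so wrap a bare string."""
    return [texts] if isinstance(texts, str) else texts
```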
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
For the TP loader, GPU split cannot be an empty array. However,
defaulting the parameter to an empty array makes it easier to calculate
the device list. Therefore, cast an empty array to None using
falsy comparisons at load time.
Also add draft_gpu_split to the load request.
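The falsy cast described above amounts to the following (function name assumed for illustration):

```python
def resolve_gpu_split(gpu_split):
    """An empty list is convenient for device-list math but invalid for
    the TP loader, so cast falsy values to None at load time."""
    return gpu_split or None
```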
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
The props endpoint is a standard used by llama.cpp-style APIs that
returns various properties of a model. It's still recommended to
use /v1/model to get all the parameters a TabbyAPI model has.
Also include the contents of a prompt template when fetching the current
model.
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
The strings weren't being concatenated properly. Only add the combined
text if the chat completion type is a List.
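The fixed concatenation can be sketched like this, assuming OAI-style content parts (helper name is hypothetical):

```python
def flatten_message_content(content):
    """Join the text fragments of a list-typed chat completion message;
    string content passes through unchanged."""
    if isinstance(content, list):
        return "".join(
            part.get("text", "") for part in content if part.get("type") == "text"
        )
    return content
```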
Signed-off-by: kingbri <bdashore3@proton.me>
Previously, the flow for parsing chat completion messages and rendering
from the prompt template was disconnected between endpoints. Now, create
a common function to render and handle everything appropriately afterwards.
Signed-off-by: kingbri <bdashore3@proton.me>
Migrate the add method into the class itself. Also, a BaseModel isn't
needed here since this isn't a serialized class.
Signed-off-by: kingbri <bdashore3@proton.me>
Previously, the messages were a list of dicts. These are untyped
and don't provide strict hinting. Add types for chat completion
messages and reformat existing code.
Signed-off-by: kingbri <bdashore3@proton.me>
* More robust checks for OAI chat completion message lists on /v1/encode endpoint
* Added TODO to support other aspects of chat completions
* Fix oversight where embeddings was not defined in advance on /v1/chat/completions endpoint
* Support image_url inputs containing URLs or base64 strings following OAI vision spec
* Use async lru cache for image embeddings
* Add generic wrapper class for multimodal embeddings
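Dispatching on the two input forms from the OAI vision spec can be sketched as follows; the helper name and return values are illustrative:

```python
def classify_image_url(image_url: str) -> str:
    """Per the OAI vision spec, image_url may be a remote URL or a
    base64 data URI; dispatch on the scheme."""
    return "base64" if image_url.startswith("data:") else "url"
```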