Change the sampling subsection to sampler overrides and add a warning
about the default preset.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
This allows users to choose nccl or native depending on their GPU setup.
NCCL is only available with the Linux-built wheels.
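As a rough sketch of the selection logic (the function and option names here are assumptions, not the actual tabbyAPI code), the backend can fall back to native whenever NCCL isn't usable:

```python
# Hedged sketch: pick a tensor-parallel backend, falling back to "native"
# when NCCL can't be used. Names are illustrative, not tabbyAPI internals.
import platform


def resolve_tp_backend(requested: str | None) -> str:
    # NCCL only ships with the Linux-built wheels
    if requested == "nccl" and platform.system() != "Linux":
        return "native"
    return requested or "native"
```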
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Works on CUDA 12.4 and up. If CUDA isn't available, don't enable
the backend. This is an environment variable that needs to be set, so it
isn't really possible to set it via config.yml.
This used to be experimental, but it's probably fine to keep it enabled
since it only provides a benefit.
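A minimal sketch of the gating, assuming PyTorch reports the CUDA version; the environment variable name below is a placeholder, not the real flag:

```python
# Placeholder sketch: only set the enabling env var when CUDA 12.4+ exists.
import os

import torch
from packaging.version import parse as parse_version


def maybe_enable_backend() -> None:
    cuda_version = torch.version.cuda  # None when torch has no CUDA build
    if cuda_version and parse_version(cuda_version) >= parse_version("12.4"):
        # Has to be an env var set before the backend initializes,
        # so it can't live in config.yml.
        os.environ.setdefault("SOME_BACKEND_FLAG", "1")
```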
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Adding these to each generation chunk helps remove redundancy and
unnecessary request ID operations.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Doing this helps reduce the model's burden of generating the tool
call ID and type (which is always "function"). Follow Mistral's spec
for tool call IDs by using a 9-character alphanumeric string.
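A minimal sketch of the ID generation, using only the standard library (the helper name is an assumption):

```python
# Generate a Mistral-style tool call ID: 9 alphanumeric characters.
import secrets
import string

_ALPHANUMERIC = string.ascii_letters + string.digits


def generate_tool_call_id(length: int = 9) -> str:
    return "".join(secrets.choice(_ALPHANUMERIC) for _ in range(length))
```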
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Re-rendering the template is an expensive operation when it's possible
to just concatenate the prompt and current generation text together.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
If a message with role = tool is present, the tool_call_id should
also be given to the template.
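For illustration, a tool-result message then carries its ID through to the template roughly like this (values are placeholders):

```python
# Placeholder example of an OpenAI-style tool result message; the
# tool_call_id matches the ID issued with the assistant's tool call.
tool_result_message = {
    "role": "tool",
    "tool_call_id": "a1B2c3D4e",
    "content": '{"temperature": 21, "unit": "C"}',
}
```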
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
When revisiting tool calls, the formats have more or less become standard.
For greater compatibility with templates, primarily use the message.tools
parameter and remove the extra custom metadata that is no longer required.
However, unlike other backends, tabbyAPI still uses template metadata
to declare what the tool start string is. This allows for template-level
customization and gives more power to the user, while the server simply
consumes the template rather than handling each format on a case-by-case basis.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
To render in the template, tool call start tokens needed fewer checks,
and the line converting message.tool_calls to a dict had to be removed
since it breaks the rest of the chain by disconnecting the types.
Calling model_dump on the message itself already accomplishes this.
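A small pydantic sketch of why the extra conversion is redundant (the model classes here are illustrative, not tabbyAPI's actual ones):

```python
# model_dump on the parent model already serializes nested tool_calls,
# so converting message.tool_calls to a dict separately only breaks typing.
from pydantic import BaseModel


class ToolCall(BaseModel):
    id: str
    type: str = "function"


class ChatMessage(BaseModel):
    role: str
    tool_calls: list[ToolCall] | None = None


message = ChatMessage(role="assistant", tool_calls=[ToolCall(id="a1B2c3D4e")])
print(message.model_dump())  # tool_calls is already a list of plain dicts
```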
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
use_as_default was not being properly applied to model overrides.
For compartmentalization's sake, apply all overrides in a single function
to avoid clutter.
In addition, fix the check for draft options in the traditional
/v1/model/load endpoint. These can be applied via an inline config, so let
any failures fall through.
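Roughly, the consolidated override application could look like the sketch below (names and merge order are assumptions, not the actual implementation):

```python
# Hedged sketch: apply stored defaults, use_as_default overrides, and
# request arguments in one place instead of scattering the logic.
def apply_load_overrides(
    defaults: dict, use_as_default_overrides: dict, request_args: dict
) -> dict:
    merged = {**defaults, **use_as_default_overrides}
    # Request arguments win, but unset (None) values don't clobber defaults
    merged.update({k: v for k, v in request_args.items() if v is not None})
    return merged
```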
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Anything below the first level of kwargs was not being merged properly.
A more bulletproof solution would be to refactor the loading code
to separate draft and normal model parameters.
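One way to merge below the first level is a recursive merge; a minimal sketch assuming plain dicts (not necessarily how the patch does it):

```python
# A flat dict.update() only touches the top level, which is why nested
# draft kwargs were being dropped. Recurse into nested dicts instead.
def deep_merge(base: dict, override: dict) -> dict:
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result
```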
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Rather than relying on Content-Length, which can be unreliable, ping
the API to get file sizes and work from there.
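A hedged sketch using huggingface_hub (not necessarily the exact call the downloader uses):

```python
# Ask the Hub API for per-file metadata instead of trusting Content-Length.
from huggingface_hub import HfApi


def fetch_file_sizes(repo_id: str) -> dict[str, int | None]:
    info = HfApi().model_info(repo_id, files_metadata=True)
    return {sibling.rfilename: sibling.size for sibling in info.siblings}
```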
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Usually, both the client and the server are aware of the file size
because a Content-Length header is sent. However, HuggingFace has changed
its headers and no longer always sends Content-Length.
In this case, show an indeterminate progress bar and mark it as complete
once the download finishes.
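Illustratively (the library choices here are assumptions, not necessarily what tabbyAPI uses), the bar simply runs without a total when the header is absent:

```python
# When Content-Length is missing, total is None and the progress bar runs
# in indeterminate mode; it's closed once the download finishes.
import httpx
from tqdm import tqdm


def download(url: str, dest: str) -> None:
    with httpx.stream("GET", url) as response, open(dest, "wb") as out:
        length = response.headers.get("Content-Length")
        total = int(length) if length else None
        with tqdm(total=total, unit="B", unit_scale=True) as bar:
            for chunk in response.iter_bytes():
                out.write(chunk)
                bar.update(len(chunk))
```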
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
It's useful for the client to know the T/s and total generation time
per request.
Works with both completions and chat completions.
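As a sketch of the bookkeeping (field names and the return shape are assumptions, not the actual response schema):

```python
# Time the generation and report tokens per second and total time.
import time


def timed_generation(generate, prompt: str) -> dict:
    start = time.perf_counter()
    completion_tokens, text = generate(prompt)  # assumed return shape
    elapsed = time.perf_counter() - start
    tps = completion_tokens / elapsed if elapsed > 0 else 0.0
    return {
        "text": text,
        "total_time_s": round(elapsed, 2),
        "tokens_per_second": round(tps, 2),
    }
```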
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
A common problem in TabbyAPI is that users who want to get up and
running with a model often hit OOMs caused by max_seq_len.
This is because model devs set max context values in the millions,
which requires a lot of VRAM.
To idiot-proof first-time setup, make the fallback default 4096 so
users can run their models. If a user still wants to use the model's
max_seq_len, they can set it to -1.
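A minimal sketch of the fallback logic, with assumed names:

```python
# Default to 4096 when max_seq_len isn't configured; -1 opts in to the
# model's full native context length.
def resolve_max_seq_len(configured: int | None, model_native_len: int) -> int:
    if configured is None:
        return 4096
    if configured == -1:
        return model_native_len
    return configured
```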
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
These added extra complexity and should be removed and replaced
with a single parameter.
Changes:
- /v1/model/load must use model_name and draft_model_name
- /v1/model/embedding/load must use embedding_model_name
- /v1/template/switch must use prompt_template_name
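For example, request bodies after this change would look roughly like this (values are placeholders):

```python
# Placeholder payloads for the renamed parameters.
load_request = {
    "model_name": "my-model",
    "draft_model_name": "my-draft-model",
}
embedding_load_request = {"embedding_model_name": "my-embedding-model"}
template_switch_request = {"prompt_template_name": "chatml"}
```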
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>