jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
turboderp	d357f100d0	Dependencies: Bump ExllamaV3	2025-06-15 19:12:45 +02:00
turboderp	a0c16bba2a	Exl2: Fix banned_strings (move outside of assign_gen_params)	2025-06-15 16:51:42 +02:00
kingbri	0ea56382f0	Dependencies: Fix unsupported dependency error Log the package name provided to the check function. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-06-13 14:57:02 -04:00
kingbri	f4ee56ba13	Update README Include ExllamaV3 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-06-13 14:57:01 -04:00
turboderp	691a080ac7	Dependencies: Bump ExllamaV3 and ExllamaV2	2025-05-31 23:55:04 +02:00
kingbri	2d89c96879	API: Re-add BOS token stripping in template render Matching YALS, if the model has add_bos_token enabled, then remove an extra BOS token at the start of the prompt. This usually happens with misconfigured templates such as Llama 3. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-24 21:11:53 -04:00
kingbri	10fbe043a4	API: Fix typing for chat templates in CC requests Tools must be None by default. Chat completion message content can be None, a string, or a list, so default to None. Exclude all None values from a CC message since the template can say the variable "exists" despite being None, causing an error. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-24 21:06:05 -04:00
kingbri	0c4cc1eba3	Model: Add prompt logging to ExllamaV3 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 22:05:18 -04:00
Brian	729caaeddc	Merge pull request #346 from gakada/main Exl3: some models aren't functional without add_bos?	2025-05-17 22:05:15 -04:00
kingbri	0646d358a2	Main: Log auth and sampler overrides after model load Like YALS, logging all pertinent information after model load makes it easier to parse by the user. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 18:10:34 -04:00
kingbri	54b8a20a19	API: Fix types for chat completions Messages were mistakenly being sent as Pydantic objects, but templates expect dictionaries. Properly convert these before render. In addition, initialize all Optional lists as an empty list since this will cause the least problems when interacting with other parts of API code, such as templates. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 18:10:34 -04:00
gakada	ba6248eec0	Exl3: fix add_bos in generator	2025-05-17 19:10:49 +09:00
Brian	81170eee00	Merge pull request #312 from davidallada/add-file-based-logging Add file based logging in addition to the normal console logs	2025-05-17 01:24:19 -04:00
kingbri	17f3dca6fc	Packaging: Add agnostic method to check version of packages Some packages such as ExllamaV2 and V3 require specific versions for the latest features. Rather than creating repetitive functions, create an agnostic function to check the installed package and then report to the user to upgrade. This is also sent to requests for loading and unloading, so keep the error short. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 01:04:24 -04:00
kingbri	084916c04f	Model: Fix autosplit reserve crash with GPU split ExllamaV3 does not accept autosplit_reserve and gpu_split at the same time. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 00:51:14 -04:00
kingbri	0858b6d4b2	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 00:46:40 -04:00
kingbri	fa534fe551	Dependencies: Update Ruff v0.11.10 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 00:46:25 -04:00
kingbri	390daeb92f	Model: Create universal HFModel class The HFModel class serves to coalesce all config files that contain random keys which are required for model usage. Adding this base class allows us to expand as HuggingFace randomly changes their JSON schemas over time, reducing the brunt that backend devs need to feel when their next model isn't supported. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-13 18:12:38 -04:00
kingbri	7900b72848	API: Add chat_template_kwargs alias for template_vars This key is used in VLLM and SGLang. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 15:48:39 -04:00
kingbri	c9dc0b2aa4	Dependencies: Bump ExllamaV3 and ExllamaV2 v0.0.2 and v0.3.0 respectively Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 15:29:31 -04:00
kingbri	bd3fec929c	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 11:32:27 -04:00
kingbri	a524ac3c0f	Model: Fix cache mode again If statements can be difficult to work with. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 11:30:47 -04:00
kingbri	20cad851e9	Model: Fix param call Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 09:52:28 -04:00
kingbri	d15eb55f20	Model: Fix exl2 cache mode check FP16 was not included in the validation step. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 09:51:09 -04:00
kingbri	8996dc7b02	API: Add default for backend in model load request Should be None so pydantic doesn't complain. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 09:51:09 -04:00
Brian	b555eeb6e7	Merge pull request #339 from Maaaxiii/fix/tool-calling-embeddings fix: Aligned Parameter Name in chat completions generate_tool_calls	2025-05-11 20:41:58 -04:00
kingbri	f4adca1f3e	API: Remove default fallback from backend param Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-11 09:56:53 -04:00
Brian	3674d7b9b5	Merge pull request #341 from theroyallab/exl3 Exl3	2025-05-10 23:43:02 -04:00
kingbri	6379081dd8	Sampling: Make add_bos_token override concise Also set the default to None so text completions follows the same pattern. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-10 19:07:35 -04:00
kingbri	656af41b5d	Model: Always enable decode_special_tokens The frontend should handle the special tokens if they get emitted. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:25:50 -04:00
kingbri	83826b56be	Main: Remove unnecessary import Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:14:11 -04:00
kingbri	42346c6b39	Sampling: Remove skip_special_tokens This parameter is way too confusing and does not make sense in the modern LLM space. Change approved by all maintainers. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:11:33 -04:00
kingbri	25c77ebf77	Model: Remove exllamav2-specific version check No longer necessary thanks to the agnostic check. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:08:15 -04:00
kingbri	48ea1737cf	Startup: Check agnostically for inference deps If an inference dep isn't present, force exit the application. This occurs after all subcommands have been appropriately processed. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 21:59:00 -04:00
kingbri	33ac016023	Dependencies: Add ExllamaV3 v0.0.1 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 21:42:07 -04:00
Brian	f26ca23f1a	Merge pull request #336 from DocShotgun/backend-detect Automatically select model backend based on config.json	2025-05-09 01:56:44 -04:00
Brian	02a8d68e17	Merge branch 'exl3' into backend-detect	2025-05-08 23:50:33 -04:00
kingbri	d5963007f0	Model: Add backend print Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-08 23:45:04 -04:00
kingbri	cfee16905b	Model: Migrate backend detection to a separate function Seemed out of place in the common load function. In addition, rename the transformers utils signature which actually takes a directory instead of a file. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-08 23:42:39 -04:00
Brian	527afc206b	Merge pull request #329 from DocShotgun/exl3 Exllamav3 cache quantization	2025-05-08 23:11:45 -04:00
kingbri	638eef401a	Model: Move cache creation to a common function Prevents repetitiveness while also creating a Cache class. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-08 23:10:03 -04:00
Maximilian Klem	22f7f1e1ec	fix: flipped parameter name with variable name	2025-05-07 21:04:30 +02:00
DocShotgun	f8070e7707	Model: Auto detect model backend from config * Use exllamav3 for exl3 models, exllamav2 otherwise	2025-05-06 18:51:58 -07:00
DocShotgun	9dcde59c57	Model: Check for unsupported cache mode in exllamav2	2025-05-06 01:18:15 -07:00
kingbri	bc0a84241a	API: Patch kobold generation call Calling the model requires different args now. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-05 22:11:21 -04:00
kingbri	b683545d0e	Config: Fix argparse help Adding a comma in the description converts the string to a tuple, which isn't parseable by argparse's help. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-05 21:52:30 -04:00
turboderp	ff38305145	Common: Fix exception f-string	2025-05-05 02:01:16 +02:00
DocShotgun	45b966363e	Tree: Format	2025-05-03 21:01:03 -07:00
DocShotgun	a635a719d7	Model: Enable draft model q-cache in Exl3 * Remove unneeded default fp16 cache layer import	2025-05-03 20:59:36 -07:00
DocShotgun	58e34ba4c5	Model: Exl3 cache quant settings lenient with whitespace	2025-05-03 20:35:35 -07:00
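
Note on b683545d0e (Config: Fix argparse help): the message attributes the bug to a stray comma turning a description string into a tuple. The sketch below reproduces that Python behavior in isolation. It is not the repository's code; `broken_help`, `fixed_help`, and `build_parser` are hypothetical names chosen for illustration, and the explicit type check is an added guard, not something the commit is known to contain.

```python
import argparse

# A trailing comma after a string literal makes the right-hand side a
# one-element tuple, not a string.
broken_help = "Path to the config file",   # -> ("Path to the config file",)
fixed_help = "Path to the config file"     # -> plain str


def build_parser(help_text):
    """Build a tiny parser, rejecting non-string help text up front.

    Hypothetical helper: argparse expects help text to be a string, so the
    check here surfaces the stray-comma mistake at construction time.
    """
    if not isinstance(help_text, str):
        raise TypeError(
            f"help text must be str, got {type(help_text).__name__}"
        )
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("--config", help=help_text)
    return parser


# The corrected string formats normally.
parser = build_parser(fixed_help)
help_output = parser.format_help()
```

Passing `broken_help` to `build_parser` raises `TypeError` immediately, which is one way to catch this class of mistake before the help text is ever rendered.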

1018 commits