jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	0858b6d4b2	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 00:46:40 -04:00
kingbri	fa534fe551	Dependencies: Update Ruff v0.11.10 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 00:46:25 -04:00
kingbri	390daeb92f	Model: Create universal HFModel class The HFModel class serves to coalesce all config files that contain random keys which are required for model usage. Adding this base class allows us to expand as HuggingFace randomly changes their JSON schemas over time, reducing the brunt that backend devs need to feel when their next model isn't supported. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-13 18:12:38 -04:00
kingbri	7900b72848	API: Add chat_template_kwargs alias for template_vars This key is used in VLLM and SGLang. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 15:48:39 -04:00
kingbri	c9dc0b2aa4	Dependencies: Bump ExllamaV3 and ExllamaV2 v0.0.2 and v0.3.0 respectively Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 15:29:31 -04:00
kingbri	bd3fec929c	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 11:32:27 -04:00
kingbri	a524ac3c0f	Model: Fix cache mode again If statements can be difficult to work with. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 11:30:47 -04:00
kingbri	20cad851e9	Model: Fix param call Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 09:52:28 -04:00
kingbri	d15eb55f20	Model: Fix exl2 cache mode check FP16 was not included in the validation step. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 09:51:09 -04:00
kingbri	8996dc7b02	API: Add default for backend in model load request Should be None so pydantic doesn't complain. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 09:51:09 -04:00
Brian	b555eeb6e7	Merge pull request #339 from Maaaxiii/fix/tool-calling-embeddings fix: Aligned Parameter Name in chat completions generate_tool_calls	2025-05-11 20:41:58 -04:00
kingbri	f4adca1f3e	API: Remove default fallback from backend param Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-11 09:56:53 -04:00
Brian	3674d7b9b5	Merge pull request #341 from theroyallab/exl3 Exl3	2025-05-10 23:43:02 -04:00
kingbri	6379081dd8	Sampling: Make add_bos_token override concise Also set the default to None so text completions follows the same pattern. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-10 19:07:35 -04:00
kingbri	656af41b5d	Model: Always enable decode_special_tokens The frontend should handle the special tokens if they get emitted. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:25:50 -04:00
kingbri	83826b56be	Main: Remove unnecessary import Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:14:11 -04:00
kingbri	42346c6b39	Sampling: Remove skip_special_tokens This parameter is way too confusing and does not make sense in the modern LLM space. Change approved by all maintainers. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:11:33 -04:00
kingbri	25c77ebf77	Model: Remove exllamav2-specific version check No longer necessary thanks to the agnostic check. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:08:15 -04:00
kingbri	48ea1737cf	Startup: Check agnostically for inference deps If an inference dep isn't present, force exit the application. This occurs after all subcommands have been appropriately processed. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 21:59:00 -04:00
kingbri	33ac016023	Dependencies: Add ExllamaV3 v0.0.1 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 21:42:07 -04:00
Brian	f26ca23f1a	Merge pull request #336 from DocShotgun/backend-detect Automatically select model backend based on config.json	2025-05-09 01:56:44 -04:00
Brian	02a8d68e17	Merge branch 'exl3' into backend-detect	2025-05-08 23:50:33 -04:00
kingbri	d5963007f0	Model: Add backend print Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-08 23:45:04 -04:00
kingbri	cfee16905b	Model: Migrate backend detection to a separate function Seemed out of place in the common load function. In addition, rename the transformers utils signature which actually takes a directory instead of a file. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-08 23:42:39 -04:00
Brian	527afc206b	Merge pull request #329 from DocShotgun/exl3 Exllamav3 cache quantization	2025-05-08 23:11:45 -04:00
kingbri	638eef401a	Model: Move cache creation to a common function Prevents repetitiveness while also creating a Cache class. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-08 23:10:03 -04:00
Maximilian Klem	22f7f1e1ec	fix: flipped parameter name with variable name	2025-05-07 21:04:30 +02:00
DocShotgun	f8070e7707	Model: Auto detect model backend from config * Use exllamav3 for exl3 models, exllamav2 otherwise	2025-05-06 18:51:58 -07:00
DocShotgun	9dcde59c57	Model: Check for unsupported cache mode in exllamav2	2025-05-06 01:18:15 -07:00
kingbri	bc0a84241a	API: Patch kobold generation call Calling the model requires different args now. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-05 22:11:21 -04:00
kingbri	b683545d0e	Config: Fix argparse help Adding a comma in the description converts the string to a tuple, which isn't parseable by argparse's help. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-05 21:52:30 -04:00
turboderp	ff38305145	Common: Fix exception f-string	2025-05-05 02:01:16 +02:00
DocShotgun	45b966363e	Tree: Format	2025-05-03 21:01:03 -07:00
DocShotgun	a635a719d7	Model: Enable draft model q-cache in Exl3 * Remove unneeded default fp16 cache layer import	2025-05-03 20:59:36 -07:00
DocShotgun	58e34ba4c5	Model: Exl3 cache quant settings lenient with whitespace	2025-05-03 20:35:35 -07:00
DocShotgun	68a660bdb3	Model: Initial Exl3 cache quantization support	2025-05-03 20:35:35 -07:00
turboderp	036af02bf6	Common: No default add_bos_token value for chat completion requests	2025-05-04 05:25:58 +02:00
turboderp	92ea7ee7cd	Model: Add draft model/speculative decoding	2025-05-04 01:27:42 +02:00
turboderp	1db2cb99cb	Model: Avoid initializing class variables	2025-05-04 01:26:42 +02:00
turboderp	0405a94a89	Model: Cast penalty range to int	2025-05-03 22:28:36 +02:00
turboderp	58c380b8ca	Model: Create generator on load	2025-05-03 18:33:37 +02:00
turboderp	0d949d00b9	Model: Set default max_batch_size	2025-05-03 18:33:37 +02:00
turboderp	8c75b29923	Model: Fix some warnings	2025-05-03 18:33:36 +02:00
kingbri	15cc480cb0	Exl3: Simplify add_bos_token handling Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:50:42 -04:00
randoentity	d8a8ccfc2a	Model: fix add_bos_token	2025-05-02 21:33:25 -04:00
kingbri	0d02af3c81	Model: Set model_dir on init Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	c89bea030e	Model: Add template fetching to Exl3 Use the same functionality as exl2's loader. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	e8f00412f6	Model: Fetch from generation_config and tokenizer_config in Exl3 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	59d081fe83	Common: Add hardware file Removed from a commit as well. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	eca403a0e4	Model: Add Exllamav3 sampler File was not included in previous commit. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
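
Commit b683545d0e above notes that a stray comma turns an argparse help description from a string into a tuple. The sketch below illustrates that pitfall in isolation. The option name `--model-dir`, the help text, and the helper `help_renders` are hypothetical and are not taken from the repository. How the failure surfaces (at `add_argument` or at `format_help`, and with which exception type) depends on the Python version, so the helper catches several.

```python
import argparse

# Hypothetical help text; the real option is not shown in the log.
broken_help = (
    "Path to the model directory",  # trailing comma: this is a 1-tuple
)
fixed_help = "Path to the model directory"  # plain string


def help_renders(help_value) -> bool:
    """Return True if argparse can build and format help for this value."""
    try:
        parser = argparse.ArgumentParser(prog="demo", add_help=False)
        parser.add_argument("--model-dir", help=help_value)
        parser.format_help()
    except (AttributeError, TypeError, ValueError):
        # The exception type and the call that raises it vary across
        # Python versions, so all three are treated as "help is unusable".
        return False
    return True


if __name__ == "__main__":
    print(type(broken_help).__name__, help_renders(broken_help))
    print(type(fixed_help).__name__, help_renders(fixed_help))
```

Removing the trailing comma (or the surrounding parentheses) restores a plain string, which is the kind of change the commit message describes.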


998 commits