jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	17f3dca6fc	Packaging: Add agnostic method to check version of packages Some packages such as ExllamaV2 and V3 require specific versions for the latest features. Rather than creating repetitive functions, create an agnostic function to check the installed package and then report to the user to upgrade. This is also sent to requests for loading and unloading, so keep the error short. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 01:04:24 -04:00
kingbri	084916c04f	Model: Fix autosplit reserve crash with GPU split ExllamaV3 does not accept autosplit_reserve and gpu_split at the same time. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 00:51:14 -04:00
kingbri	0858b6d4b2	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-17 00:46:40 -04:00
kingbri	390daeb92f	Model: Create universal HFModel class The HFModel class serves to coalesce all config files that contain random keys which are required for model usage. Adding this base class allows us to expand as HuggingFace randomly changes their JSON schemas over time, reducing the brunt that backend devs need to feel when their next model isn't supported. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-13 18:12:38 -04:00
kingbri	bd3fec929c	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 11:32:27 -04:00
kingbri	a524ac3c0f	Model: Fix cache mode again If statements can be difficult to work with. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 11:30:47 -04:00
kingbri	20cad851e9	Model: Fix param call Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 09:52:28 -04:00
kingbri	d15eb55f20	Model: Fix exl2 cache mode check FP16 was not included in the validation step. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-12 09:51:09 -04:00
kingbri	656af41b5d	Model: Always enable decode_special_tokens The frontend should handle the special tokens if they get emitted. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:25:50 -04:00
kingbri	42346c6b39	Sampling: Remove skip_special_tokens This parameter is way too confusing and does not make sense in the modern LLM space. Change approved by all maintainers. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:11:33 -04:00
kingbri	25c77ebf77	Model: Remove exllamav2-specific version check No longer necessary thanks to the agnostic check. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-09 22:08:15 -04:00
kingbri	638eef401a	Model: Move cache creation to a common function Prevents repetitiveness while also creating a Cache class. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-08 23:10:03 -04:00
DocShotgun	9dcde59c57	Model: Check for unsupported cache mode in exllamav2	2025-05-06 01:18:15 -07:00
DocShotgun	45b966363e	Tree: Format	2025-05-03 21:01:03 -07:00
DocShotgun	a635a719d7	Model: Enable draft model q-cache in Exl3 * Remove unneeded default fp16 cache layer import	2025-05-03 20:59:36 -07:00
DocShotgun	58e34ba4c5	Model: Exl3 cache quant settings lenient with whitespace	2025-05-03 20:35:35 -07:00
DocShotgun	68a660bdb3	Model: Initial Exl3 cache quantization support	2025-05-03 20:35:35 -07:00
turboderp	92ea7ee7cd	Model: Add draft model/speculative decoding	2025-05-04 01:27:42 +02:00
turboderp	1db2cb99cb	Model: Avoid initializing class variables	2025-05-04 01:26:42 +02:00
turboderp	0405a94a89	Model: Cast penalty range to int	2025-05-03 22:28:36 +02:00
turboderp	58c380b8ca	Model: Create generator on load	2025-05-03 18:33:37 +02:00
turboderp	0d949d00b9	Model: Set default max_batch_size	2025-05-03 18:33:37 +02:00
turboderp	8c75b29923	Model: Fix some warnings	2025-05-03 18:33:36 +02:00
kingbri	15cc480cb0	Exl3: Simplify add_bos_token handling Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:50:42 -04:00
randoentity	d8a8ccfc2a	Model: fix add_bos_token	2025-05-02 21:33:25 -04:00
kingbri	0d02af3c81	Model: Set model_dir on init Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	c89bea030e	Model: Add template fetching to Exl3 Use the same functionality as exl2's loader. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	e8f00412f6	Model: Fetch from generation_config and tokenizer_config in Exl3 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	eca403a0e4	Model: Add Exllamav3 sampler File was not included in previous commit. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	bdc5189a4b	Exl3: Add chunk size, cache size, and model info Use the same algorithm for estimating and adjusting cache size based on multiples of 256 and above max seq len. Same applies for chunk size. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	303e2dde12	Model: Correct exl3 generation, add concurrency, and cleanup Fixes application of sampler parameters by adding a new sampler builder interface. Also expose the generator class-wide and add wait_for_jobs. Finally, allow inline loading to specify the backend. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
randoentity	c744790f14	fixup: add sampler logs Also passing sampler to job with this, no idea if this is correct	2025-05-02 21:33:25 -04:00
randoentity	b35c48da37	fixup: some metrics	2025-05-02 21:33:25 -04:00
randoentity	c0f268f33e	fixup: autosplit, start work on metrics	2025-05-02 21:33:25 -04:00
randoentity	306fc7cd15	fixup: autosplit reserve this probably breaks v2 support	2025-05-02 21:33:25 -04:00
randoentity	acb3adb953	fixup: auto split	2025-05-02 21:33:25 -04:00
randoentity	14fb573371	fixup: max_seq_len Whoops	2025-05-02 21:33:25 -04:00
randoentity	daae9ec43d	Exl3: Couldn't wait Just copied some stuff around and it ended up working for basic use.	2025-05-02 21:33:25 -04:00
kingbri	b4ff2f23cf	Exl3: Add token encode, decode, and special token fetch Base class methods Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:32:53 -04:00
kingbri	0c1d794390	Model: Add exl3 and associated load functions Initial exl3 compat and loading functionality. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:32:39 -04:00
kingbri	242f6b7d2a	Model: Simplify add_bos_token handling Set add_bos_token to True by default in the tokenizer_config stub. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:32:28 -04:00
kingbri	4cb3e5d5b1	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 00:23:15 -04:00
kingbri	47cb2a0de9	Model: Add TokenizerConfig stub and add_eos_token fallback This stub fetches the add_eos_token field from the HF tokenizer config. Ideally, this should be in the backend rather than tabby. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 00:08:01 -04:00
kingbri	aa657fa6e9	API: Ignore add_bos_token in chat completions When fetching special tokens from the model, don't factor in the add_bos_token and ban_eos_token parameters as switches. In addition, change the internal handling of add_bos_token to an optional boolean. This allows us to fallback to the model when selecting whether or not to add the BOS token, especially for chat completions. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-01 22:51:15 -04:00
kingbri	b43f0983c8	Model: Fix max_seq_len fallbacks The rope alpha calculation caused an error if max seq len isn't provided. This is because the model's max sequence length was not stored as the target for alpha calculation. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-28 14:09:31 -04:00
kingbri	f070587e9f	Model: Add proper jobs cleanup and fix var calls Jobs should be started and immediately cleaned up when calling the generation stream. Expose a stream_generate function and append this to the base class since it's more idiomatic than generate_gen. The exl2 container's generate_gen function is now internal. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-24 21:30:55 -04:00
kingbri	7e007f0761	Model: Handle finish chunks and logprobs in separate functions Helps split up and trim the generate_gen function. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-24 21:19:03 -04:00
kingbri	f2c7da2faf	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-21 23:21:26 -04:00
kingbri	3f09fcd8c9	Model: Make model params return a model card The model card is a unified structure for sharing model params. Rather than kwargs, use this instead. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-21 23:15:46 -04:00
kingbri	13beef8021	Model: Move find_template function to templating Makes sense to extract to a utility function instead. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-20 18:27:53 -04:00
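
Note on commit 17f3dca6fc: the message describes a single package-agnostic version check that tells the user to upgrade and keeps the error short, because it is also returned on load and unload requests. The log does not show the actual implementation, so the following is only a minimal sketch of that idea. The names `parse_release` and `check_package_version`, the `installed` override, and the message wording are all hypothetical. The version comparison is a simplification that looks only at the leading numeric release segment.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(version_string: str) -> Tuple[int, ...]:
    """Extract the leading numeric release segment of a version string.

    Simplification: suffixes such as '+cu121' or '.dev1' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def check_package_version(
    package_name: str,
    required: str,
    installed: Optional[str] = None,
) -> Optional[str]:
    """Return None if the package is new enough, else a short error message.

    `installed` is a hypothetical override so the check can be exercised
    without the package being present; by default the installed version is
    looked up through importlib.metadata.
    """
    if installed is None:
        try:
            installed = version(package_name)
        except PackageNotFoundError:
            return f"{package_name} is not installed. Please install v{required} or newer."

    have = parse_release(installed)
    need = parse_release(required)

    # Pad with zeros so that '0.2' and '0.2.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))

    if have < need:
        return (
            f"{package_name} v{installed} is too old. "
            f"Please upgrade to v{required} or newer."
        )
    return None
```

A caller could run this once per backend before loading a model and pass any returned string straight back in the API error response. The version numbers used with it here are illustrative only.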


290 commits