jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
DocShotgun	9dcde59c57	Model: Check for unsupported cache mode in exllamav2	2025-05-06 01:18:15 -07:00
DocShotgun	45b966363e	Tree: Format	2025-05-03 21:01:03 -07:00
DocShotgun	a635a719d7	Model: Enable draft model q-cache in Exl3 * Remove unneeded default fp16 cache layer import	2025-05-03 20:59:36 -07:00
DocShotgun	58e34ba4c5	Model: Exl3 cache quant settings lenient with whitespace	2025-05-03 20:35:35 -07:00
DocShotgun	68a660bdb3	Model: Initial Exl3 cache quantization support	2025-05-03 20:35:35 -07:00
turboderp	92ea7ee7cd	Model: Add draft model/speculative decoding	2025-05-04 01:27:42 +02:00
turboderp	1db2cb99cb	Model: Avoid initializing class variables	2025-05-04 01:26:42 +02:00
turboderp	0405a94a89	Model: Cast penalty range to int	2025-05-03 22:28:36 +02:00
turboderp	58c380b8ca	Model: Create generator on load	2025-05-03 18:33:37 +02:00
turboderp	0d949d00b9	Model: Set default max_batch_size	2025-05-03 18:33:37 +02:00
turboderp	8c75b29923	Model: Fix some warnings	2025-05-03 18:33:36 +02:00
kingbri	15cc480cb0	Exl3: Simplify add_bos_token handling Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:50:42 -04:00
randoentity	d8a8ccfc2a	Model: fix add_bos_token	2025-05-02 21:33:25 -04:00
kingbri	0d02af3c81	Model: Set model_dir on init Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	c89bea030e	Model: Add template fetching to Exl3 Use the same functionality as exl2's loader. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	e8f00412f6	Model: Fetch from generation_config and tokenizer_config in Exl3 Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	eca403a0e4	Model: Add Exllamav3 sampler File was not included in previous commit. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	bdc5189a4b	Exl3: Add chunk size, cache size, and model info Use the same algorithm for estimating and adjusting cache size based on multiples of 256 and above max seq len. Same applies for chunk size. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
kingbri	303e2dde12	Model: Correct exl3 generation, add concurrency, and cleanup Fixes application of sampler parameters by adding a new sampler builder interface. Also expose the generator class-wide and add wait_for_jobs. Finally, allow inline loading to specify the backend. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:33:25 -04:00
randoentity	c744790f14	fixup: add sampler logs Also passing sampler to job with this, no idea if this is correct	2025-05-02 21:33:25 -04:00
randoentity	b35c48da37	fixup: some metrics	2025-05-02 21:33:25 -04:00
randoentity	c0f268f33e	fixup: autosplit, start work on metrics	2025-05-02 21:33:25 -04:00
randoentity	306fc7cd15	fixup: autosplit reserve this probably breaks v2 support	2025-05-02 21:33:25 -04:00
randoentity	acb3adb953	fixup: auto split	2025-05-02 21:33:25 -04:00
randoentity	14fb573371	fixup: max_seq_len Whoops	2025-05-02 21:33:25 -04:00
randoentity	daae9ec43d	Exl3: Couldn't wait Just copied some stuff around and it ended up working for basic use.	2025-05-02 21:33:25 -04:00
kingbri	b4ff2f23cf	Exl3: Add token encode, decode, and special token fetch Base class methods Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:32:53 -04:00
kingbri	0c1d794390	Model: Add exl3 and associated load functions Initial exl3 compat and loading functionality. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:32:39 -04:00
kingbri	242f6b7d2a	Model: Simplify add_bos_token handling Set add_bos_token to True by default in the tokenizer_config stub. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 21:32:28 -04:00
kingbri	4cb3e5d5b1	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 00:23:15 -04:00
kingbri	47cb2a0de9	Model: Add TokenizerConfig stub and add_eos_token fallback This stub fetches the add_eos_token field from the HF tokenizer config. Ideally, this should be in the backend rather than tabby. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-02 00:08:01 -04:00
kingbri	aa657fa6e9	API: Ignore add_bos_token in chat completions When fetching special tokens from the model, don't factor in the add_bos_token and ban_eos_token parameters as switches. In addition, change the internal handling of add_bos_token to an optional boolean. This allows us to fallback to the model when selecting whether or not to add the BOS token, especially for chat completions. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-05-01 22:51:15 -04:00
kingbri	b43f0983c8	Model: Fix max_seq_len fallbacks The rope alpha calculation caused an error if max seq len isn't provided. This is because the model's max sequence length was not stored as the target for alpha calculation. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-28 14:09:31 -04:00
kingbri	f070587e9f	Model: Add proper jobs cleanup and fix var calls Jobs should be started and immediately cleaned up when calling the generation stream. Expose a stream_generate function and append this to the base class since it's more idiomatic than generate_gen. The exl2 container's generate_gen function is now internal. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-24 21:30:55 -04:00
kingbri	7e007f0761	Model: Handle finish chunks and logprobs in separate functions Helps split up and trim the generate_gen function. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-24 21:19:03 -04:00
kingbri	f2c7da2faf	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-21 23:21:26 -04:00
kingbri	3f09fcd8c9	Model: Make model params return a model card The model card is a unified structure for sharing model params. Rather than kwargs, use this instead. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-21 23:15:46 -04:00
kingbri	13beef8021	Model: Move find_template function to templating Makes sense to extract to a utility function instead. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-20 18:27:53 -04:00
kingbri	8e238fa8f6	Model: Move calculate_rope_alpha from backend Makes more sense to use as a utility function. Also clarify how the vars are set. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-20 18:20:19 -04:00
kingbri	b751e0a1d5	Model: Move inline overrides to common This is applied across containers. Doesn't make sense to put this method in the backend. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-20 17:51:57 -04:00
kingbri	034682fcf1	Backends: Add base model container Base class for all model containers. Used in the shared model file for interface. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-20 17:24:10 -04:00
kingbri	f15ac1f69d	Model: Reject model requests when unloading If a model is being unloaded, that means it's being shut down and no requests should be accepted from then on. Also, remove model_is_loaded since we simply check if the container is None now. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-19 22:34:06 -04:00
kingbri	3f1d5d396e	Model: Store active jobs in tabby Rather than relying on the generator, use tabby to store the active job IDs. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-16 13:17:55 -04:00
kingbri	1afc9b983e	Model: Remove generate_window Not required since we error with exceeding the max_seq_len Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-16 12:59:02 -04:00
kingbri	2f5235e1a3	Model: Extract settings creation to a separate function Maybe move this out of the class entirely, but for now, it makes sense to encapsulate this logic. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-16 12:57:27 -04:00
kingbri	5697204e47	Merge branch 'main' into model-rewrite	2025-04-16 02:15:46 -04:00
kingbri	6bb5f8f599	Sampling: Rewrite mirostat_mode parameter Apparently the "mirostat" parameter has been updated by frontends to pass a number. ExllamaV2 expects a boolean, but most pass a number anyway, so just alias mirostat_mode and mirostat together. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-16 02:13:55 -04:00
kingbri	3084ef9fa1	Model + API: Migrate to use BaseSamplerParams kwargs is pretty ugly when figuring out which arguments to use. The base request falls back to defaults anyways, so pass in the params object as is. However, since Python's typing isn't like TypeScript where types can be transformed, the type hinting has a possibility of None showing up despite there always being a value for some params. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-16 00:50:05 -04:00
kingbri	dcb36e9ab2	Model: Remove extra unwraps The base sampler request already specifies the defaults, so don't unwrap in this way. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-15 23:38:46 -04:00
kingbri	11ed3cf5ee	Model: Cleanup logging and remove extraneous declarations Log the parameters passed into the generate gen function rather than the generation settings to reduce complexity. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-15 23:31:12 -04:00
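
Note on bdc5189a4b: the message describes adjusting cache size "based on multiples of 256 and above max seq len". The snippet below is only a rough sketch of what such an adjustment might look like, not the code from that commit. The function name `adjust_cache_size`, its parameters, and the choice to round up rather than down are all assumptions made for illustration.

```python
from typing import Optional


def adjust_cache_size(
    requested: Optional[int], max_seq_len: int, multiple: int = 256
) -> int:
    """Hypothetical helper, not taken from the repository.

    Falls back to max_seq_len when no size is requested, raises the size to
    at least max_seq_len, then rounds up to the next multiple of `multiple`.
    """
    # Assumption: an unset cache size defaults to the max sequence length
    size = max_seq_len if requested is None else requested

    # Assumption: the cache may not be smaller than the max sequence length
    if size < max_seq_len:
        size = max_seq_len

    # Round up to the next multiple (256 per the commit message)
    remainder = size % multiple
    if remainder:
        size += multiple - remainder

    return size
```

Under these assumptions, a requested size of 5000 with a max sequence length of 4096 would become 5120, and a request below 4096 would be raised to 4096.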

278 commits