It seemed out of place in the common load function. In addition, rename
the parameter in the transformers utils signature, since it actually
takes a directory instead of a file.
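
For illustration only (these names are hypothetical, not the actual function):

    # Before: the name implies a file path, but callers pass a directory.
    def load_tokenizer_config(config_file: str): ...

    # After: the parameter name matches what the function actually receives.
    def load_tokenizer_config(model_directory: str): ...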
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Adding a trailing comma to the description converts the string to a
tuple, which isn't parseable by argparse's help formatter.
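
For example (the flag name is illustrative):

    import argparse

    # Broken: the trailing comma makes `description` a one-element
    # tuple, so argparse raises a TypeError when rendering --help.
    description = "Enable inline model loading",

    # Fixed: a plain string.
    description = "Enable inline model loading"

    parser = argparse.ArgumentParser()
    parser.add_argument("--inline-model-loading", help=description)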
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Use the same algorithm for estimating and adjusting the cache size:
round up to a multiple of 256 that is at or above the max seq len.
The same applies to chunk size.
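
Roughly, the shared rounding looks like this (a sketch; the helper name is illustrative):

    def adjust_cache_size(requested: int, max_seq_len: int) -> int:
        """Round up to the nearest multiple of 256 that is at least
        max_seq_len. Chunk size uses the same rounding."""
        size = max(requested, max_seq_len)
        return -(-size // 256) * 256  # ceiling division by 256

    adjust_cache_size(10000, 8192)  # -> 10240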
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Fix the application of sampler parameters by adding a new sampler
builder interface. Also expose the generator class-wide and add
wait_for_jobs. Finally, allow inline loading to specify the backend.
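
A rough sketch of the builder idea (hypothetical names, not the actual tabby interface):

    class SamplerBuilder:
        """Collect sampler parameters in one place before applying
        them to the backend's settings object."""

        def __init__(self):
            self._params = {}

        def set(self, key, value):
            # Skip unset fields so backend defaults aren't clobbered.
            if value is not None:
                self._params[key] = value
            return self

        def build(self) -> dict:
            return dict(self._params)

    settings = SamplerBuilder().set("temperature", 0.8).set("top_p", None).build()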
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
This stub fetches the add_eos_token field from the HF tokenizer config.
Ideally, this should be in the backend rather than tabby.
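
A sketch of what the stub does, assuming the standard HF layout (the function name is illustrative):

    import json
    import pathlib

    def get_add_eos_token(model_directory: str) -> bool:
        """Read add_eos_token from tokenizer_config.json, defaulting
        to False when the file or field is missing."""
        path = pathlib.Path(model_directory) / "tokenizer_config.json"
        if not path.exists():
            return False
        with open(path, encoding="utf-8") as f:
            return json.load(f).get("add_eos_token", False)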
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
When fetching special tokens from the model, don't factor in the
add_bos_token and ban_eos_token parameters as switches.
In addition, change the internal handling of add_bos_token to an optional
boolean. This allows us to fall back to the model when deciding whether
or not to add the BOS token, especially for chat completions.
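
The fallback, in sketch form (names are illustrative):

    from typing import Optional

    def resolve_add_bos(requested: Optional[bool], model_default: bool) -> bool:
        # None means the request left it unset, so defer to the model's
        # tokenizer config. This matters for chat completions, where
        # the prompt template may already include a BOS token.
        return model_default if requested is None else requested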
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Since jobs are now tracked via request IDs, each generation task should
be uniquely identifiable in the event of cancellation.
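
Sketched out, assuming a per-task UUID (the registry is hypothetical):

    import uuid

    active_jobs = {}  # request_id -> job

    def start_generation(job) -> str:
        request_id = str(uuid.uuid4())  # unique per generation task
        active_jobs[request_id] = job
        return request_id

    def cancel_generation(request_id: str) -> None:
        # Cancels exactly one task, even with several in flight.
        job = active_jobs.pop(request_id, None)
        if job is not None:
            job.cancel()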
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
The rope alpha calculation raised an error if max seq len wasn't
provided, because the model's native max sequence length was not
stored as the target for the alpha calculation.
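
In sketch form, the fix stores the model's native length and falls back to it (the quadratic fit below is illustrative of an NTK-alpha style estimate, not a quote of the actual code):

    def calculate_rope_alpha(max_seq_len, model_native_len):
        # Fall back to the model's native max sequence length when the
        # user doesn't provide one; previously the target was None here.
        target = max_seq_len if max_seq_len is not None else model_native_len
        ratio = target / model_native_len
        # Illustrative NTK-style fit; no scaling needed at ratio <= 1.
        return 1.0 if ratio <= 1 else -0.13436 + 0.80541 * ratio + 0.28833 * ratio**2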
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Newer versions of Python don't allow installing into the system
environment unless --break-system-packages is specified. I'd like to
avoid this if possible.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
PyTorch: v2.7.0 on CUDA 12.8 + ROCm 6.3
Exllamav2: v0.2.9
FA2: v2.7.4.post1 on CUDA 12.8
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Jobs should be started and immediately cleaned up when calling the
generation stream. Expose a stream_generate function and add it to
the base class, since it's more idiomatic than generate_gen.
The exl2 container's generate_gen function is now internal.
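
The shape of stream_generate on the container, assuming an async job API (helper names are hypothetical):

    async def stream_generate(self, request_id: str, prompt: str, params: dict):
        """Start a job, stream its chunks, and always clean up,
        even when the caller cancels mid-stream."""
        job = await self._start_job(request_id, prompt, params)
        try:
            async for chunk in self._generate_gen(job):  # now internal
                yield chunk
        finally:
            # Runs on completion, errors, and cancellation alike.
            await self._cancel_job(job)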
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>