This stub fetches the add_eos_token field from the HF tokenizer config.
Ideally, this should be in the backend rather than tabby.
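A rough sketch of such a stub (the helper name is illustrative; tokenizer_config.json is the standard HF location for this field):

```python
import json
import pathlib
from typing import Optional


def get_add_eos_token(model_dir: str) -> Optional[bool]:
    """Read add_eos_token from the HF tokenizer config, if present."""
    config_path = pathlib.Path(model_dir) / "tokenizer_config.json"
    if not config_path.exists():
        return None

    with open(config_path, encoding="utf-8") as f:
        tokenizer_config = json.load(f)

    # A missing key means the tokenizer default applies; signal with None
    return tokenizer_config.get("add_eos_token")
```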
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
When fetching special tokens from the model, don't treat the
add_bos_token and ban_eos_token parameters as switches.
In addition, change the internal handling of add_bos_token to an
optional boolean. This allows us to fall back to the model when
deciding whether or not to add the BOS token, which is especially
useful for chat completions.
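A minimal sketch of the fallback (names are illustrative):

```python
from typing import Optional


def resolve_add_bos_token(
    request_value: Optional[bool], model_default: bool
) -> bool:
    """Prefer an explicit request value; otherwise defer to the model.

    None means "unspecified", so the model's own add_bos_token setting
    wins, which matters for chat completions where the prompt template
    may already insert a BOS token.
    """
    return request_value if request_value is not None else model_default
```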
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Since jobs are tracked via request IDs now, each generation task should
be uniquely identified in the event of cancellation.
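As an illustration of the idea (the class and its methods are hypothetical):

```python
import asyncio
import uuid


class JobTracker:
    """Map unique request IDs to in-flight generation tasks."""

    def __init__(self) -> None:
        self.jobs: dict[str, asyncio.Task] = {}

    def register(self, job: asyncio.Task) -> str:
        # Every generation task gets its own ID so cancellation
        # targets exactly one job
        request_id = str(uuid.uuid4())
        self.jobs[request_id] = job
        return request_id

    def cancel(self, request_id: str) -> None:
        job = self.jobs.pop(request_id, None)
        if job is not None:
            job.cancel()
```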
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
The rope alpha calculation raised an error when max_seq_len wasn't
provided, because the model's maximum sequence length was not stored
as the target for the alpha calculation.
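A sketch of the fix, with a common NTK-aware alpha heuristic standing in for tabby's actual formula:

```python
def resolve_rope_alpha(
    user_max_seq_len: int | None,
    model_max_seq_len: int,
    base_seq_len: int,
) -> float:
    """Always resolve a target length before computing alpha."""
    # The fix: fall back to the model's native max sequence length
    # when the user doesn't provide max_seq_len
    if user_max_seq_len is not None:
        target = user_max_seq_len
    else:
        target = model_max_seq_len

    ratio = target / base_seq_len
    if ratio <= 1.0:
        return 1.0

    # NTK-aware alpha heuristic; tabby's exact formula may differ
    return -0.13436 + 0.80541 * ratio + 0.28833 * ratio**2
```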
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Newer versions of Python don't allow system-wide package
installation unless --break-system-packages is specified. I'd like
to avoid this if possible.
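One way around it, sketched with the stdlib, is to install into a virtual environment instead of the system site-packages:

```python
import venv

# An isolated environment sidesteps the externally-managed check
# (PEP 668), so pip inside it never needs --break-system-packages
venv.create("venv", with_pip=True)
```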
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
PyTorch: v2.7.0 on CUDA 12.8 + ROCm 6.3
Exllamav2: v0.2.9
FA2: v2.7.4.post1 on CUDA 12.8
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Jobs should be started and immediately cleaned up when calling the
generation stream. Expose a stream_generate function and add it to
the base class, since it's more idiomatic than generate_gen.
The exl2 container's generate_gen function is now internal.
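A sketch of the shape this takes on the base class (the internal generator and cleanup hook are assumptions):

```python
from typing import Any, AsyncIterator


class BaseModelContainer:
    async def _generate_gen(
        self, request_id: str, prompt: str, params: dict
    ) -> AsyncIterator[Any]:
        """Internal generator; each backend implements its own."""
        yield {"text": ""}  # placeholder chunk

    def _cleanup_job(self, request_id: str) -> None:
        """Release any job state tied to this request (hypothetical hook)."""

    async def stream_generate(
        self, request_id: str, prompt: str, params: dict
    ) -> AsyncIterator[Any]:
        """Public streaming entrypoint wrapping the internal generator."""
        try:
            async for chunk in self._generate_gen(request_id, prompt, params):
                yield chunk
        finally:
            # The job is cleaned up as soon as the stream finishes,
            # errors out, or is cancelled
            self._cleanup_job(request_id)
```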
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
The model card is a unified structure for sharing model params.
Use it in place of kwargs.
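Sketched with Pydantic (the exact fields are illustrative):

```python
from typing import Optional

from pydantic import BaseModel


class ModelCard(BaseModel):
    """Unified structure for sharing model parameters."""

    name: str
    max_seq_len: Optional[int] = None
    rope_alpha: Optional[float] = None
    cache_size: Optional[int] = None


def load_model(card: ModelCard) -> None:
    # One typed object instead of a loose bag of kwargs
    print(f"Loading {card.name} with max_seq_len={card.max_seq_len}")
```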
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
One goal is to migrate away from kwargs and use the ModelLoadRequest
instead. However, Pydantic doesn't support async validators, which
makes parsing the inline config impossible because it relies on
aiofiles.
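The workaround is to do the async read before validation runs; a sketch (the helper and merge rule are assumptions):

```python
import json

import aiofiles
from pydantic import BaseModel


class ModelLoadRequest(BaseModel):
    name: str
    max_seq_len: int | None = None


async def parse_with_inline_config(
    raw: dict, config_path: str
) -> ModelLoadRequest:
    """Merge the inline config before Pydantic validates.

    A field validator can't await aiofiles, so the read happens here.
    """
    async with aiofiles.open(config_path) as f:
        defaults = json.loads(await f.read())

    # Request values win over the inline config's defaults
    return ModelLoadRequest(**{**defaults, **raw})
```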
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
This is applied across containers, so it doesn't make sense to put
this method in the backend.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
If a model is being unloaded, that means it's being shut down and
no requests should be accepted from then on.
Also, remove model_is_loaded since we now simply check whether the
container is None.
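A sketch of the guard, assuming FastAPI-style error handling (the unloading flag name is hypothetical):

```python
from fastapi import HTTPException


def check_model_container(container) -> None:
    """Reject requests once the container is gone or being torn down."""
    if container is None or getattr(container, "model_is_unloading", False):
        raise HTTPException(status_code=503, detail="A model is not loaded.")
```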
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
The admin takes priority over the regular user. Therefore, if a
model is loading, ignore all incoming generation requests.
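One way to express that priority, sketched with an asyncio lock held for the duration of a load:

```python
import asyncio

from fastapi import HTTPException

# Held by the admin endpoint while a model load is in progress
load_lock = asyncio.Lock()


def check_not_loading() -> None:
    """Drop generation requests while an admin load is running."""
    if load_lock.locked():
        raise HTTPException(
            status_code=503, detail="Model is loading; try again later."
        )
```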
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Maybe move this out of the class entirely, but for now, it makes
sense to encapsulate this logic.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Apparently the "mirostat" parameter has been updated by frontends
to pass a number. ExllamaV2 expects a boolean, but most frontends
pass a number anyway, so just alias mirostat_mode and mirostat
together.
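With Pydantic v2 this can be expressed via alias choices (a sketch; the boolean coercion is an assumption about what the backend wants):

```python
from pydantic import AliasChoices, BaseModel, Field


class SamplerParams(BaseModel):
    """Accept the numeric mirostat mode under either name."""

    mirostat_mode: int = Field(
        default=0,
        validation_alias=AliasChoices("mirostat_mode", "mirostat"),
    )

    @property
    def mirostat(self) -> bool:
        # ExllamaV2 wants on/off, so any nonzero mode enables it
        return self.mirostat_mode > 0
```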
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
kwargs is pretty ugly when figuring out which arguments to use. The
base request falls back to defaults anyway, so pass in the params
object as-is.
However, since Python's typing isn't like TypeScript's, where types
can be transformed, the type hints may show None as a possibility
even though some params always have a value.
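A sketch of the pass-through pattern and the typing caveat (field names are illustrative):

```python
from typing import Optional

from pydantic import BaseModel


class BaseSamplerRequest(BaseModel):
    # Defaults live on the request itself, so callers can pass
    # the object through untouched
    temperature: float = 1.0
    top_p: float = 1.0
    # Declared Optional even though a value is always resolved
    # downstream, so type checkers still see a possible None
    max_tokens: Optional[int] = None


def generate(prompt: str, params: BaseSamplerRequest) -> None:
    # Read fields directly instead of unpacking **kwargs
    print(prompt, params.temperature, params.max_tokens)
```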
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
The base sampler request already specifies the defaults, so don't
unwrap in this way.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Log the parameters passed into the generate_gen function rather
than the generation settings, to reduce complexity.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>