This allows users to choose between nccl and native depending on their GPU setup.
NCCL is only available in wheels built for Linux.
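A minimal sketch of the selection logic (function name hypothetical;
the real config plumbing differs):

    import platform

    def resolve_tp_backend(requested: str | None = None) -> str:
        """Pick the tensor-parallel backend, falling back by platform."""
        if requested in ("nccl", "native"):
            return requested
        # NCCL wheels are only built for Linux; default to native elsewhere.
        return "nccl" if platform.system() == "Linux" else "native"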
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Revisiting tool calls: the formats have more or less become standard.
For greater compatibility with templates, primarily use the message.tools
parameter and remove the extra custom metadata that is no longer required.
However, unlike other backends, tabbyAPI still uses template metadata
to declare what the tool start string is. This allows template-level
customization and gives more power to the user: the server simply
consumes the template rather than handling formats on a case-by-case basis.
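A sketch of the split using jinja2 (the template source here is
illustrative): standard data flows through the tools parameter, while
the tool start string is declared at the template level and read back
as metadata.

    from jinja2 import Environment

    # Hypothetical template: tool_start is template-level metadata.
    source = (
        "{% set tool_start = '<tool_call>' %}"
        "{% for m in messages %}{{ m.role }}: {{ m.content }}\n{% endfor %}"
    )

    template = Environment().from_string(source)
    prompt = template.render(messages=[], tools=[])

    # Top-level {% set %} variables surface as module attributes.
    module = template.make_module({"messages": [], "tools": []})
    tool_start = getattr(module, "tool_start", None)  # '<tool_call>'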
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
use_as_default was not being properly applied to model overrides.
For compartmentalization's sake, apply all overrides in a single function
to avoid clutter.
In addition, fix how the traditional /v1/model/load endpoint checks
for draft options. These can also be supplied via an inline config, so
let any failures fall through.
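A minimal sketch of the single-function approach (names hypothetical):
keys listed under use_as_default act as fallbacks for any load request
that doesn't set them explicitly.

    def apply_model_overrides(request: dict, model_config: dict) -> dict:
        """Merge config-level defaults into a load request in one place."""
        defaults = {
            key: model_config[key]
            for key in model_config.get("use_as_default", [])
            if key in model_config
        }
        # Explicit request values always win over use_as_default entries.
        return {**defaults, **request}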
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Anything below the first level of kwargs was not being merged properly.
A more bulletproof solution would be to refactor the loading code
to separate draft and normal model parameters.
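A sketch of the kind of recursive merge this needs, assuming plain
nested dicts:

    def deep_merge(base: dict, update: dict) -> dict:
        """Merge nested dicts so values below the first level survive."""
        merged = dict(base)
        for key, value in update.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = value
        return merged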
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Rather than relying on the Content-Length header, which can be
unreliable, query the API for file sizes and work from there.
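A hedged sketch using the HuggingFace tree endpoint (error handling
trimmed; helper name hypothetical):

    import requests

    def fetch_file_sizes(repo_id: str, revision: str = "main") -> dict[str, int]:
        """Map each repo file path to its size as reported by the HF API."""
        url = f"https://huggingface.co/api/models/{repo_id}/tree/{revision}"
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return {item["path"]: item.get("size", 0) for item in response.json()}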
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Usually, both the client and server know the file size because the
server sends a Content-Length header. However, HuggingFace has changed
its headers and no longer always sends Content-Length.
In that case, show an indeterminate progress bar and mark it complete
once the download finishes.
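A sketch of the fallback using requests and tqdm (the real progress UI
may differ): when Content-Length is missing, total stays None and the
bar runs indeterminately until the stream ends.

    import requests
    from tqdm import tqdm

    def download(url: str, dest: str) -> None:
        with requests.get(url, stream=True, timeout=10) as response:
            response.raise_for_status()
            length = response.headers.get("Content-Length")
            total = int(length) if length else None  # None -> indeterminate
            with open(dest, "wb") as file, tqdm(
                total=total, unit="B", unit_scale=True
            ) as bar:
                for chunk in response.iter_content(chunk_size=1 << 16):
                    file.write(chunk)
                    bar.update(len(chunk))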
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
It's useful for the client to know the T/s and total generation time
per request.
Works with both completions and chat completions.
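A minimal sketch of the per-request metrics (field names are
assumptions, not the actual response schema):

    def generation_stats(generated_tokens: int, elapsed_sec: float) -> dict:
        """Compute the timing info attached to a finished generation."""
        tps = generated_tokens / elapsed_sec if elapsed_sec > 0 else 0.0
        return {
            "total_time_sec": round(elapsed_sec, 2),
            "tokens_per_sec": round(tps, 2),
        }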
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
A common problem in TabbyAPI: users who want to get up and running
with a model often hit OOMs caused by max_seq_len, since model devs set
max context values in the millions, which requires a lot of VRAM.
To make first-time setup foolproof, fall back to a default of 4096 so
users can run their models. If a user still wants the model's native
max_seq_len, they can set it to -1.
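The fallback logic in miniature (helper name hypothetical):

    def resolve_max_seq_len(configured: int | None, model_max: int) -> int:
        if configured is None:
            return 4096        # safe default for first-time setups
        if configured == -1:
            return model_max   # explicit opt-in to the model's full context
        return configured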
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Some packages, such as ExllamaV2 and V3, require specific versions for
the latest features. Rather than writing repetitive per-package checks,
create an agnostic function that inspects the installed version and
tells the user to upgrade.
This message is also sent in responses to load and unload requests, so
keep the error short.
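A sketch of the agnostic check, assuming the packaging library is
available (function name hypothetical):

    from importlib.metadata import version
    from packaging.version import parse

    def check_package_version(package: str, required: str) -> None:
        """Raise a short, user-facing error if a package is too old."""
        installed = parse(version(package))
        if installed < parse(required):
            raise RuntimeError(
                f"{package} {installed} is installed, but >= {required} "
                f"is required. Please upgrade."
            )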
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
The HFModel class serves to coalesce all the config files that contain
the assorted keys required for model usage.
Adding this base class lets us adapt as HuggingFace changes its JSON
schemas over time, reducing the burden on backend devs when their next
model isn't supported.
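A sketch of the base class idea (attributes and file choices
illustrative):

    import json
    from pathlib import Path

    class HFModel:
        """Coalesce the assorted HF config files behind one interface."""

        def __init__(self, model_dir: Path):
            self.config = self._read_json(model_dir / "config.json")
            self.generation_config = self._read_json(
                model_dir / "generation_config.json"
            )

        @staticmethod
        def _read_json(path: Path) -> dict:
            return json.loads(path.read_text()) if path.exists() else {}

        @property
        def max_seq_len(self) -> int:
            # Subclasses can override as HF schemas drift over time.
            return self.config.get("max_position_embeddings", 4096)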
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
This parameter is way too confusing and does not make sense in
the modern LLM space.
Change approved by all maintainers.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
If an inference dependency isn't present, force-exit the application.
This occurs after all subcommands have been appropriately processed.
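A sketch of the check (the backend module name is an example): it runs
after argparse has handled subcommands, so things like --help still
work without the dependency.

    import importlib.util
    import sys

    def ensure_inference_backend() -> None:
        if importlib.util.find_spec("exllamav2") is None:  # example dep
            print("No inference backend is installed. Exiting.", file=sys.stderr)
            sys.exit(1)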
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
This seemed out of place in the common load function. In addition,
rename the transformers utils function's signature, since it actually
takes a directory instead of a file.
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
A stray trailing comma after the description converts the string to a
tuple, which argparse's help formatter can't render.
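The gotcha in miniature:

    # A trailing comma silently builds a 1-tuple instead of a string:
    bad = "Maximum sequence length",    # type(bad) is tuple
    good = "Maximum sequence length"    # type(good) is str; help renders fine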
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Fix the application of sampler parameters by adding a new sampler
builder interface. Also expose the generator class-wide and add
wait_for_jobs. Finally, allow inline loading to specify the backend.
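A hypothetical sketch of the builder interface (the real one is
backend-aware):

    class SamplerBuilder:
        """Collect sampler parameters in one place before generation."""

        def __init__(self):
            self._params: dict = {}

        def set(self, key: str, value):
            # Skip unset values so backend defaults still apply.
            if value is not None:
                self._params[key] = value
            return self

        def build(self) -> dict:
            return dict(self._params)

    # Usage: unset parameters never reach the backend.
    params = SamplerBuilder().set("temperature", 0.7).set("top_p", None).build()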
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
This stub fetches the add_eos_token field from the HF tokenizer config.
Ideally, this should be in the backend rather than tabby.
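A sketch of the stub (file name per HF convention; helper name
hypothetical):

    import json
    from pathlib import Path

    def get_add_eos_token(model_dir: Path) -> bool:
        config_path = model_dir / "tokenizer_config.json"
        if not config_path.exists():
            return False
        return json.loads(config_path.read_text()).get("add_eos_token", False)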
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Jobs should be started and immediately cleaned up when calling the
generation stream. Expose a stream_generate function and add it to the
base class, since it's more idiomatic than generate_gen. The exl2
container's generate_gen function is now internal.
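A hedged sketch of the contract (method names other than
stream_generate are illustrative): the public generator wraps the
backend-internal one and guarantees cleanup when the stream ends.

    class BaseModelContainer:
        """Sketch: the base class owns the public streaming entrypoint."""

        async def _generate_gen(self, prompt: str, **kwargs):
            # Backend-internal generator; containers override this.
            yield {"text": ""}

        async def _release_jobs(self) -> None:
            pass  # hypothetical cleanup hook

        async def stream_generate(self, prompt: str, **kwargs):
            # Jobs start here and are cleaned up as soon as the stream ends.
            try:
                async for chunk in self._generate_gen(prompt, **kwargs):
                    yield chunk
            finally:
                await self._release_jobs()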
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>