Use the same algorithm for estimating and adjusting the cache size:
round it up to a multiple of 256 that is at least the max seq len.
The same applies to chunk size.
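A minimal sketch of that shared adjustment, assuming it clamps to at
least max_seq_len and then rounds up to a multiple of 256 (the helper
name is illustrative, not the actual function):

    import math

    def adjust_size(value: int, max_seq_len: int) -> int:
        """Clamp to at least max_seq_len, then round up to a multiple of 256."""
        value = max(value, max_seq_len)
        return math.ceil(value / 256) * 256

    cache_size = adjust_size(9000, max_seq_len=8192)  # -> 9216
    chunk_size = adjust_size(2048, max_seq_len=8192)  # -> 8192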
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
Torch - 2.6.0
ExllamaV2 - 0.2.8
Flash-attn - 2.7.4.post1
CUDA wheels are now 12.4 instead of 12.1, so feature names need to be
migrated over.
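For example, assuming the wheel features are exposed as pyproject
extras named after the CUDA version, an install that previously used
pip install .[cu121] would now use pip install .[cu124].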
Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>
platform.system() was not called in some places, which broke the
ternary check on Windows.
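A minimal before/after sketch (the script names are illustrative):

    import platform

    # Broken: compares the function object itself, which never equals
    # "Windows", so the ternary always took the non-Windows branch
    script = "start.bat" if platform.system == "Windows" else "start.sh"

    # Fixed: call platform.system() so the ternary sees the actual OS name
    script = "start.bat" if platform.system() == "Windows" else "start.sh"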
Pip's --upgrade flag does not actually update already-installed
dependencies to their latest versions. That's what the
--upgrade-strategy eager flag is for.
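A sketch of what the install step might look like with the eager
strategy (the feature name is hypothetical):

    import subprocess
    import sys

    # "only-if-needed" (the default) leaves satisfied dependencies alone;
    # "eager" upgrades them to the latest versions the requirements allow.
    subprocess.run(
        [
            sys.executable, "-m", "pip", "install",
            "--upgrade", "--upgrade-strategy", "eager",
            ".[cu124]",  # hypothetical feature name
        ],
        check=True,
    )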
Tell the user where their start preferences are coming from.
Signed-off-by: kingbri <bdashore3@proton.me>
If flash attention is already turned off by ExllamaV2 itself, don't
try to create a paged generator. Also condense all the redundant
logic into one if statement.
Also check arch_compat_overrides to see if flash attention should
be disabled for a model arch (e.g. Gemma 2).
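A sketch of the condensed gate; the attribute and key names here are
assumptions, not the actual fields:

    def can_use_paged_attn(flash_attn_available: bool, config, overrides: dict) -> bool:
        """One condensed check instead of several scattered ones."""
        return (
            flash_attn_available
            and not getattr(config, "no_flash_attn", False)  # ExllamaV2 turned it off itself
            and not overrides.get("no_flash_attn", False)    # arch override, e.g. Gemma 2
        )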
Signed-off-by: kingbri <bdashore3@proton.me>
Move the large import errors into the check functions themselves.
This makes it easier to tell where errors are coming from.
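A sketch of the pattern, with a hypothetical check function:

    def check_exllama() -> None:
        """Import inside the check so a failed import is reported here,
        not as a confusing module-level traceback elsewhere."""
        try:
            import exllamav2  # noqa: F401
        except ImportError as exc:
            raise ImportError(
                "ExllamaV2 failed to import. Reinstall backend dependencies."
            ) from exc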
Signed-off-by: kingbri <bdashore3@proton.me>
* Model: Clean up paged attention checks
* Model: Move cache_size checks after paged attn checks
Cache size is only relevant in paged mode
* Model: Fix no_flash_attention
* Model: Remove no_flash_attention
The ability to use flash attention is auto-detected, so this flag is
unneeded. Uninstall flash attention to disable it on supported hardware.
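A sketch of what the auto-detection could look like; the capability
threshold is an assumption based on flash-attn's Ampere requirement:

    import torch

    def flash_attn_supported() -> bool:
        """Flash attention needs the package installed and an SM 8.0+ GPU."""
        try:
            import flash_attn  # noqa: F401
        except ImportError:
            return False
        if not torch.cuda.is_available():
            return False
        major, _minor = torch.cuda.get_device_capability()
        return major >= 8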
Loguru is a flexible logger that hooks into Rich easily and imports
with no problems. It also makes progress bars stick to the bottom of
the terminal window.
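A minimal sketch of the hookup, not TabbyAPI's exact setup: loguru
accepts a callable sink, so log output can be routed through a shared
Rich Console, which is also what lets Rich keep progress bars pinned
below the log stream.

    from loguru import logger
    from rich.console import Console

    console = Console()

    logger.remove()  # drop the default stderr handler
    # Loguru messages already end with a newline, so print with end=""
    logger.add(lambda msg: console.print(msg, end=""), colorize=False)

    logger.info("Log lines now render through Rich")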
Signed-off-by: kingbri <bdashore3@proton.me>
Add the ability to use an unsafe config flag if needed and migrate
the exl2 check to a different file within the exl2 backend code.
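A sketch of what the escape hatch might look like; the flag name,
known keys, and validation behavior are all assumptions:

    from loguru import logger

    def validate_config(config: dict, unsafe: bool = False) -> dict:
        """Reject unknown keys unless the unsafe flag is set."""
        known_keys = {"model", "network", "logging"}
        unknown = set(config) - known_keys
        if unknown:
            if not unsafe:
                raise ValueError(f"Unknown config options: {unknown}")
            logger.warning(f"Unsafe config: ignoring unknown options {unknown}")
        return config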
Signed-off-by: kingbri <bdashore3@proton.me>