tabbyAPI-ollama

kingbri 871c89063d	2024-08-22 14:15:19 -04:00
Model: Add Tensor Parallel support

Use the tensor parallel loader when the flag is enabled. The new loader has its own autosplit implementation, so gpu_split_auto isn't valid here.

Also make it easier to determine which cache type to use rather than multiple if/else statements.

Signed-off-by: kingbri <bdashore3@proton.me>

args.py	Model: Add Tensor Parallel support	2024-08-22 14:15:19 -04:00
auth.py	Auth: Fix disable auth when checking for key permissions	2024-07-26 15:04:29 -04:00
concurrency.py	API + Model: Add blocks and checks for various load requests	2024-05-25 21:16:14 -04:00
config.py	Embeddings: Update config, args, and parameter names	2024-07-30 15:32:26 -04:00
downloader.py	Downloader: Make timeout configurable	2024-07-23 21:42:38 -04:00
gen_logging.py	Model: Attach request ID to logs	2024-08-01 00:25:54 -04:00
logger.py	API: Add HuggingFace downloader	2024-04-29 01:15:02 -04:00
model.py	Model: Bypass lock checks when shutting down	2024-08-03 16:05:34 -04:00
networking.py	API: Add request logging	2024-07-22 21:40:00 -04:00
sampling.py	[WIP] OpenAI Tools Support/Function calling (#154)	2024-08-17 00:16:25 -04:00
signals.py	Model: Bypass lock checks when shutting down	2024-08-03 16:05:34 -04:00
templating.py	Templates: Switch to async jinja engine	2024-08-17 12:03:41 -04:00
transformers_utils.py	Tree: Format	2024-07-26 18:33:04 -04:00
utils.py	Model: Add support for HuggingFace config and bad_words_ids	2024-07-26 18:23:22 -04:00