jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	830301b2b4	Actions: Update and add Wiki publish Publishes the github wiki and runs these in concurrency groups to avoid spawning multiple actions at a time. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-17 23:47:38 -05:00
kingbri	5614b342a7	Tree: Migrate docs into repository This will auto-publish to the Github wiki via an action. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-17 23:39:35 -05:00
kingbri	9f649647f0	Model + API: GPU split updates and fixes For the TP loader, GPU split cannot be an empty array. However, defaulting the parameter to an empty array makes it easier to calculate the device list. Therefore, cast an empty array to None using falsy comparisons at load time. Also add draft_gpu_split to the load request. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-15 21:50:14 -05:00
Brian	304df16543	Update README.md	2025-02-15 12:14:06 -05:00
Brian	ba9fae808e	Merge pull request #281 from mefich/main Add strftime_now to Jinja2 to use with Granite3 models	2025-02-13 22:52:51 -05:00
kingbri	7f6294a96d	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-13 22:42:59 -05:00
mefich	3b482d80e4	Add strftime_now to Jinja2 to use with Granite3 models Granite3 default template uses strftime_now function. Currently Jinja2 raises an exception because strftime_now is undefined and /v1/chat/completions endpoint doesn't work with these models when a template from the model metadata is used.	2025-02-13 18:08:24 +02:00
Brian	2e491472d1	Merge pull request #254 from lucyknada/main add draft_gpu_split option for spec decoding	2025-02-11 16:48:03 -05:00
kingbri	e290b88568	Args: Expose api-servers to subcommands This is required for the export-openapi action. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-10 23:39:46 -05:00
kingbri	153dac496c	Args: Fix imports and handling of export openapi The api-servers arg is passed when running subcommands, so use that instead of replicating the arg again. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-10 23:19:44 -05:00
kingbri	30ab8e04b9	Args: Add subcommands to run actions Migrate OpenAPI and sample config export to subcommands "export-openapi" and "export-config". Also add a "download" subcommand that passes args to the TabbyAPI downloader. This allows models to be downloaded via the API and CLI args. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-10 23:14:22 -05:00
kingbri	30f02e5453	Main: Remove uvloop/winloop from experimental status Uvloop/Winloop does provide advantages to asyncio vs the standard Proactor loop, so remove experimental status. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-10 21:30:48 -05:00
kingbri	0dcbb7a722	Dependencies: Update torch, exllamav2, and flash-attn Torch - 2.6.0 ExllamaV2 - 0.2.8 Flash-attn - 2.7.4.post1 Cuda wheels are now 12.4 instead of 12.1, feature names need to be migrated over. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-09 01:27:48 -05:00
kingbri	beb6d8faa5	Model: Adjust draft_gpu_split and add to config The previous code overrode the existing gpu split and device idx values. This now sets an independent draft_gpu_split value and adjusts the gpu_devices check only if the draft_gpu_split array is larger than the gpu_split array. Draft gpu split is not Tensor Parallel, and defaults to gpu_split_auto if a split is not provided. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-08 16:09:46 -05:00
kingbri	bd8256d168	Merge branch 'main' into draft-split	2025-02-08 15:10:44 -05:00
kingbri	dcbf2de9e5	Logger: Add timestamps Was against this for a while due to the length of timestamps clogging the console, but it makes sense to know when something goes wrong. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-07 18:40:28 -05:00
kingbri	54fda0dc09	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-07 18:03:33 -05:00
kingbri	96e8375ec8	Multimodal: Fix memory leak with MMEmbeddings On a basic python class, class attributes are handled by reference, meaning that every instance of embeddings would attach to that reference and allocate more memory. Switch to a Pydantic class and factory methods when instantiating. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-02 12:21:19 -05:00
kingbri	bd16681825	Start: Mark cuda 11.8 as unsupported Temporary until existing cuda 11.8 scripts can be migrated to cuda 12. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2025-01-12 21:50:41 -05:00
Brian	566e5b5937	Merge pull request #271 from lifo9/bump-formatron Bump formatron to `0.4.11`	2025-01-07 23:19:35 -05:00
Jakub Filo	f8d9cfb5fd	Bump formatron to 0.4.11	2025-01-08 00:48:25 +01:00
kingbri	cfb439c0e6	Dependencies: Update exllamav2 and pytorch for ROCm Exllama v0.2.7, pytorch v2.5.1 across all cards. AMD now requires ROCm 6.2 Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2025-01-01 16:22:10 -05:00
kingbri	6da65a8fd3	Embeddings: Fix base64 return A base64 embedding can be a string post-encoding. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2025-01-01 16:15:12 -05:00
kingbri	245bd5c008	Templates: Alter chatml_with_headers to fit huggingface spec The previous template was compatible with Jinja2 in Python, but it was not cross-platform compatible according to HF's standards. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-30 14:00:44 -05:00
Brian	709493837b	Merge pull request #264 from DocShotgun/robust-length-checking Robust request length checking in generator	2024-12-26 23:37:53 -05:00
kingbri	b994aae995	Model: Cleanup generation length and page checks Reduce the amount of if statements and combine parts of code. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-26 23:13:08 -05:00
kingbri	ba2579ff74	Merge branch 'main' into robust-length-checks	2024-12-26 18:00:26 -05:00
kingbri	7878d351a7	Endpoints: Add props endpoint and add more values to model params The props endpoint is a standard used by llamacpp APIs which returns various properties of a model to a server. It's still recommended to use /v1/model to get all the parameters a TabbyAPI model has. Also include the contents of a prompt template when fetching the current model. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-26 17:32:19 -05:00
kingbri	fa8035ef72	Dependencies: Update sse-starlette and formatron Also pin newer versions of dependencies and fix an import from sse-starlette Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-21 23:14:55 -05:00
kingbri	b579fd46b7	Dependencies: Remove outlines from optional check Outlines is no longer a dependency that's used in TabbyAPI. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-18 11:56:40 -05:00
DocShotgun	4d11323c17	Tree: Format	2024-12-17 09:37:33 -08:00
DocShotgun	5da335eb3d	Model: Robust request length checking in generator * Ensure that length of positive/negative prompt + max_tokens does not exceed max_seq_len * Ensure that total required pages for CFG request does not exceed allocated cache_size	2024-12-17 09:34:43 -08:00
kingbri	c23e406f2d	Sampling: Add max_completion_tokens Conforms with OAI's updated spec Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-13 01:02:37 -05:00
kingbri	bc3c154c96	Dependencies: Pin tokenizers Use a version greater than 0.20.0 for newer model support. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-13 00:58:25 -05:00
Brian	1ba33bf646	Merge pull request #252 from DocShotgun/main Switch grammar backend to Formatron	2024-12-13 00:55:20 -05:00
kingbri	f25ac4b833	Dependencies: Update ExllamaV2 v0.2.6 Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-13 00:47:29 -05:00
kingbri	8df8ba3ddb	Dependencies: Update ExllamaV2 v0.2.6 Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-11 21:58:25 -05:00
DocShotgun	7f899734c0	Grammar: Cache the engine vocabulary * Avoid rebuilding the KBNF engine vocabulary on every grammar-enabled request	2024-12-05 21:36:37 -08:00
kingbri	8ccd7a12a2	Merge branch 'main' into formatron	2024-12-05 23:01:22 -05:00
kingbri	ac85e34356	Dependencies: Update Torch, FA2, and Exl2 Torch: 2.5, FA2 2.7.0.post2, Exl2 v0.2.5 Don't update torch for rocm as exl2 isn't built for rocm 6.2 Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-03 22:57:00 -05:00
kingbri	ca86ab5477	Dependencies: Remove CUDA 11.8 Most software has moved to CUDA 12 and cards that aren't supported by 11.8 don't use tabby anyways. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-03 22:37:03 -05:00
kingbri	3c4211c963	Dependencies: Ensure updated kbnf Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-02 15:10:20 -05:00
Brian	fe44e4a524	Merge pull request #253 from randoentity/workaround-toolcall workaround for tool calling	2024-11-28 23:30:00 -05:00
kingbri	2e06fb01d3	OAI: Pass mm_embeddings to tool call generation Don't exclude the vision embeddings when regenerating for a tool call. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-28 23:27:59 -05:00
Brian	b81dcdaf66	Merge pull request #232 from AlpinDale/serviceinfo_uri feat: add serviceinfo URI	2024-11-28 23:19:52 -05:00
kingbri	5fadaa728a	API: Move serviceinfo to core Best to expose this endpoint to all APIs as it's an information endpoint. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-28 23:07:58 -05:00
lucy	ab1f4b7a6a	add draft_gpu_split option	2024-11-27 02:52:19 +01:00
DocShotgun	6f2dc2ea99	Grammar: Fix syntax, lint	2024-11-24 11:35:45 -08:00
DocShotgun	8f209efb99	Grammar: Clean up KBNF implementation * Also remove empty cache clear function	2024-11-24 10:44:45 -08:00
randoentity	a52610fb19	workaround for tool calling	2024-11-24 13:40:33 +01:00


880 commits