jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	30f02e5453	Main: Remove uvloop/winloop from experimental status Uvloop/Winloop does provide advantages to asyncio vs the standard Proactor loop, so remove experimental status. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-10 21:30:48 -05:00
kingbri	0dcbb7a722	Dependencies: Update torch, exllamav2, and flash-attn Torch - 2.6.0 ExllamaV2 - 0.2.8 Flash-attn - 2.7.4.post1 Cuda wheels are now 12.4 instead of 12.1, feature names need to be migrated over. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-09 01:27:48 -05:00
kingbri	beb6d8faa5	Model: Adjust draft_gpu_split and add to config The previous code overrode the existing gpu split and device idx values. This now sets an independent draft_gpu_split value and adjusts the gpu_devices check only if the draft_gpu_split array is larger than the gpu_split array. Draft gpu split is not Tensor Parallel, and defaults to gpu_split_auto if a split is not provided. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-08 16:09:46 -05:00
kingbri	bd8256d168	Merge branch 'main' into draft-split	2025-02-08 15:10:44 -05:00
kingbri	dcbf2de9e5	Logger: Add timestamps Was against this for a while due to the length of timestamps clogging the console, but it makes sense to know when something goes wrong. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-07 18:40:28 -05:00
kingbri	54fda0dc09	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-07 18:03:33 -05:00
kingbri	96e8375ec8	Multimodal: Fix memory leak with MMEmbeddings On a basic python class, class attributes are handled by reference, meaning that every instance of embeddings would attach to that reference and allocate more memory. Switch to a Pydantic class and factory methods when instantiating. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-02 12:21:19 -05:00
kingbri	bd16681825	Start: Mark cuda 11.8 as unsupported Temporary until existing cuda 11.8 scripts can be migrated to cuda 12. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2025-01-12 21:50:41 -05:00
Brian	566e5b5937	Merge pull request #271 from lifo9/bump-formatron Bump formatron to `0.4.11`	2025-01-07 23:19:35 -05:00
Jakub Filo	f8d9cfb5fd	Bump formatron to 0.4.11	2025-01-08 00:48:25 +01:00
kingbri	cfb439c0e6	Dependencies: Update exllamav2 and pytorch for ROCm Exllama v0.2.7, pytorch v2.5.1 across all cards. AMD now requires ROCm 6.2 Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2025-01-01 16:22:10 -05:00
kingbri	6da65a8fd3	Embeddings: Fix base64 return A base64 embedding can be a string post-encoding. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2025-01-01 16:15:12 -05:00
kingbri	245bd5c008	Templates: Alter chatml_with_headers to fit huggingface spec The previous template was compatible with Jinja2 in Python, but it was not cross-platform compatible according to HF's standards. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-30 14:00:44 -05:00
Brian	709493837b	Merge pull request #264 from DocShotgun/robust-length-checking Robust request length checking in generator	2024-12-26 23:37:53 -05:00
kingbri	b994aae995	Model: Cleanup generation length and page checks Reduce the amount of if statements and combine parts of code. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-26 23:13:08 -05:00
kingbri	ba2579ff74	Merge branch 'main' into robust-length-checks	2024-12-26 18:00:26 -05:00
kingbri	7878d351a7	Endpoints: Add props endpoint and add more values to model params The props endpoint is a standard used by llamacpp APIs which returns various properties of a model to a server. It's still recommended to use /v1/model to get all the parameters a TabbyAPI model has. Also include the contents of a prompt template when fetching the current model. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-26 17:32:19 -05:00
kingbri	fa8035ef72	Dependencies: Update sse-starlette and formatron Also pin newer versions of dependencies and fix an import from sse-starlette Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-21 23:14:55 -05:00
kingbri	b579fd46b7	Dependencies: Remove outlines from optional check Outlines is no longer a dependency that's used in TabbyAPI. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-18 11:56:40 -05:00
DocShotgun	4d11323c17	Tree: Format	2024-12-17 09:37:33 -08:00
DocShotgun	5da335eb3d	Model: Robust request length checking in generator * Ensure that length of positive/negative prompt + max_tokens does not exceed max_seq_len * Ensure that total required pages for CFG request does not exceed allocated cache_size	2024-12-17 09:34:43 -08:00
kingbri	c23e406f2d	Sampling: Add max_completion_tokens Conforms with OAI's updated spec Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-13 01:02:37 -05:00
kingbri	bc3c154c96	Dependencies: Pin tokenizers Use a version greater than 0.20.0 for newer model support. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-13 00:58:25 -05:00
Brian	1ba33bf646	Merge pull request #252 from DocShotgun/main Switch grammar backend to Formatron	2024-12-13 00:55:20 -05:00
kingbri	f25ac4b833	Dependencies: Update ExllamaV2 v0.2.6 Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-13 00:47:29 -05:00
kingbri	8df8ba3ddb	Dependencies: Update ExllamaV2 v0.2.6 Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-11 21:58:25 -05:00
DocShotgun	7f899734c0	Grammar: Cache the engine vocabulary * Avoid rebuilding the KBNF engine vocabulary on every grammar-enabled request	2024-12-05 21:36:37 -08:00
kingbri	8ccd7a12a2	Merge branch 'main' into formatron	2024-12-05 23:01:22 -05:00
kingbri	ac85e34356	Depenedencies: Update Torch, FA2, and Exl2 Torch: 2.5, FA2 2.7.0.post2, Exl2 v0.2.5 Don't update torch for rocm as exl2 isn't built for rocm 6.2 Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-03 22:57:00 -05:00
kingbri	ca86ab5477	Dependencies: Remove CUDA 11.8 Most software has moved to CUDA 12 and cards that aren't supported by 11.8 don't use tabby anyways. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-03 22:37:03 -05:00
kingbri	3c4211c963	Dependencies: Ensure updated kbnf Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-02 15:10:20 -05:00
Brian	fe44e4a524	Merge pull request #253 from randoentity/workaround-toolcall workaround for tool calling	2024-11-28 23:30:00 -05:00
kingbri	2e06fb01d3	OAI: Pass mm_embeddings to tool call generation Don't exclude the vision embeddings when regenerating for a tool call. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-28 23:27:59 -05:00
Brian	b81dcdaf66	Merge pull request #232 from AlpinDale/serviceinfo_uri feat: add serviceinfo URI	2024-11-28 23:19:52 -05:00
kingbri	5fadaa728a	API: Move serviceinfo to core Best to expose this endpoint to all APIs as its an information endpoint. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-28 23:07:58 -05:00
lucy	ab1f4b7a6a	add draft_gpu_split option	2024-11-27 02:52:19 +01:00
DocShotgun	6f2dc2ea99	Grammar: Fix syntax, lint	2024-11-24 11:35:45 -08:00
DocShotgun	8f209efb99	Grammar: Clean up KBNF implementation * Also remove empty cache clear function	2024-11-24 10:44:45 -08:00
randoentity	a52610fb19	workaround for tool calling	2024-11-24 13:40:33 +01:00
DocShotgun	a9f39bcff3	Grammar: Preliminary Formatron KBNF support	2024-11-23 12:05:41 -08:00
DocShotgun	0836a9317f	Grammar: Initial Formatron regex and JSON schema implementation * Replace LMFE's regex and JSON schema filters with Formatron's * Remove Outlines EBNF filter in preparation for Formatron KBNF filter * TODO: Implement Formatron KBNF filter	2024-11-23 10:27:37 -08:00
kingbri	aa4ccd03d4	Infinity: Use a runtime type hint for engine Remove the antipattern of the conditional type for the Async engine and use string-based type inference. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-22 18:06:08 -05:00
kingbri	242ff4f892	Dependencies: Fix OpenAPI generation The vision module from the ExllamaV2 backend is used in files outside the backends contained folder. Therefore, import ExllamaV2 as an optional dependency here. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-22 17:59:20 -05:00
kingbri	9cd7fcaf99	Pyproject: Add pillow to deps Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-22 17:48:56 -05:00
Brian	9c8186c138	Merge pull request #249 from theroyallab/vision Vision	2024-11-22 17:45:49 -05:00
kingbri	388d36e6bd	OAI: Fix chat completion list parsing The strings weren't being concatenated properly. Only add the combined text if the chat completion type is a List. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-22 17:30:29 -05:00
kingbri	eadc71a4c3	Model: Add unload and error messages for vision If vision is enabled and the model doesn't support it, send an error asking the user to reload. Also, add a method to unload the vision tower. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-22 14:25:03 -05:00
kingbri	c49047eea1	Model: Fix load packets The model_type internal reference was changed to an enum for a more extendable loading process. Return the current model type when loading a new model. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-21 18:06:47 -05:00
kingbri	0ab393f09c	Model: Set vision load to False by default Mistake in unwrapping. Vision should be false to allow normal model loading when the flag isn't provided. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-21 17:54:42 -05:00
kingbri	902045edbb	API: Fix chat completion formatting flow Previously, the flow for parsing chat completion messages and rendering from the prompt template was disconnected between endpoints. Now, create a common function to render and handle everything appropriately afterwards. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-21 17:51:14 -05:00

... 3 4 5 6 7 ...

1069 commits