* Ensure that the length of the positive/negative prompt + max_tokens does not exceed max_seq_len
* Ensure that the total pages required for a CFG request do not exceed the allocated cache_size (see the sketch below)
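A minimal sketch of both checks, assuming a paged cache with a fixed page size (the names `PAGE_SIZE`, `max_seq_len`, and `cache_size` are illustrative, not tabby's actual internals):

```python
PAGE_SIZE = 256  # assumed page size for the paged KV cache


def validate_cfg_request(
    prompt_len: int,
    negative_prompt_len: int,
    max_tokens: int,
    max_seq_len: int,
    cache_size: int,
):
    """Reject CFG requests that can't fit in the context or the cache."""

    # Both the positive and negative sequences must fit the context window
    for length in (prompt_len, negative_prompt_len):
        if length + max_tokens > max_seq_len:
            raise ValueError(
                f"Prompt length {length} + max_tokens {max_tokens} "
                f"exceeds max_seq_len {max_seq_len}"
            )

    # A CFG request holds cache pages for both sequences at once
    def pages(length: int) -> int:
        return -(-(length + max_tokens) // PAGE_SIZE)  # ceiling division

    required = pages(prompt_len) + pages(negative_prompt_len)
    if required * PAGE_SIZE > cache_size:
        raise ValueError(
            f"CFG request needs {required} cache pages, exceeding the "
            f"allocated cache_size of {cache_size} tokens"
        )
```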
Most software has moved to CUDA 12, and cards that aren't supported by
11.8 don't use tabby anyway.
Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>
The vision module from the ExllamaV2 backend is used in files outside
the backends folder. Therefore, import ExllamaV2 as an optional
dependency here.
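A sketch of the optional-import pattern described above; the module path and function name are assumptions for illustration:

```python
# Import the vision helper lazily so this module still loads
# when the ExllamaV2 backend isn't installed
try:
    from backends.exllamav2.vision import get_image_embedding

    HAS_EXLLAMAV2 = True
except ImportError:
    get_image_embedding = None
    HAS_EXLLAMAV2 = False


def embed_image(url: str):
    if not HAS_EXLLAMAV2:
        raise ImportError(
            "ExllamaV2 is required for vision support but isn't installed"
        )

    return get_image_embedding(url)
```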
Signed-off-by: kingbri <bdashore3@proton.me>
The strings weren't being concatenated properly. Only add the combined
text if the chat completion message content is a list.
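A sketch of the corrected flow, assuming OAI-style messages where content is either a plain string or a list of typed parts:

```python
def flatten_content(content) -> str:
    """Collapse a chat completion message's content into one string."""

    # Plain string content needs no concatenation
    if isinstance(content, str):
        return content

    # Only combine text when the content is a list of parts
    if isinstance(content, list):
        return "".join(
            part.get("text", "")
            for part in content
            if part.get("type") == "text"
        )

    return ""
```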
Signed-off-by: kingbri <bdashore3@proton.me>
If vision is enabled and the model doesn't support it, send an
error asking the user to reload. Also, add a method to unload the
vision tower.
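Sketched below with illustrative names (`use_vision`, `vision_model`); the actual container attributes may differ:

```python
class ExllamaV2Container:
    use_vision: bool = False
    vision_model = None

    def check_vision(self):
        # Fail early if vision was requested but the loaded model
        # has no vision tower; the user must reload to fix this
        if self.use_vision and self.vision_model is None:
            raise ValueError(
                "Vision is enabled, but the current model doesn't support "
                "it. Please reload with a vision-capable model."
            )

    def unload_vision(self):
        # Free the vision tower's weights and drop the reference
        if self.vision_model is not None:
            self.vision_model.unload()
            self.vision_model = None
```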
Signed-off-by: kingbri <bdashore3@proton.me>
The internal model_type reference was changed to an enum for a more
extensible loading process. Return the current model type when loading
a new model.
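Roughly the shape of the change; the enum members are illustrative:

```python
from enum import Enum


class ModelType(Enum):
    MODEL = "model"
    DRAFT = "draft"
    EMBEDDING = "embedding"
    VISION = "vision"


def load_model(model_path: str, model_type: ModelType = ModelType.MODEL) -> ModelType:
    # Dispatch on the enum instead of comparing ad-hoc strings,
    # so new model kinds only need a new member
    ...

    # Report the type of the model that was just loaded
    return model_type
```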
Signed-off-by: kingbri <bdashore3@proton.me>
Previously, the flow for parsing chat completion messages and rendering
them through the prompt template was disconnected between endpoints. Now,
create a common function that renders the prompt and handles the result
appropriately for each endpoint.
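A sketch of the shared renderer; `flatten_content` and the template API are assumptions carried over from the sketches above:

```python
def render_chat_messages(messages: list, prompt_template) -> str:
    """One place where chat completion messages become a prompt string."""

    # Normalize each message's content before templating
    rendered = [
        {"role": message["role"], "content": flatten_content(message["content"])}
        for message in messages
    ]

    # Both /v1/chat/completions (generate) and /v1/encode (tokenize)
    # now call this instead of duplicating the parsing logic
    return prompt_template.render({"messages": rendered})
```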
Signed-off-by: kingbri <bdashore3@proton.me>
Migrate the add method into the class itself. Also, a BaseModel isn't
needed here since this class is never serialized.
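Roughly the resulting shape, using the multimodal embeddings wrapper from the commit below as a stand-in (names are guesses):

```python
class MultimodalEmbeddingWrapper:
    """Plain container: it's never serialized, so BaseModel isn't needed."""

    def __init__(self):
        self.content: list = []

    def add(self, embedding):
        # Formerly a free function mutating the container from outside;
        # the class now owns its own state changes
        self.content.append(embedding)
```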
Signed-off-by: kingbri <bdashore3@proton.me>
Previously, the messages were a list of dicts. These are untyped
and don't provide strict hinting. Add types for chat completion
messages and reformat existing code.
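A sketch of the typed messages, assuming Pydantic models and the OAI content-part shape:

```python
from pydantic import BaseModel


class ChatCompletionImageUrl(BaseModel):
    url: str


class ChatCompletionMessagePart(BaseModel):
    type: str  # "text" or "image_url"
    text: str | None = None
    image_url: ChatCompletionImageUrl | None = None


class ChatCompletionMessage(BaseModel):
    role: str = "user"
    content: str | list[ChatCompletionMessagePart] | None = None
```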
Signed-off-by: kingbri <bdashore3@proton.me>
* More robust checks for OAI chat completion message lists on the /v1/encode endpoint
* Added a TODO to support other aspects of chat completions
* Fix an oversight where embeddings were not defined in advance on the /v1/chat/completions endpoint
* Support image_url inputs containing URLs or base64 strings following the OAI vision spec (see the sketch below)
* Use async lru cache for image embeddings
* Add generic wrapper class for multimodal embeddings
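A sketch of the image_url resolution and caching, assuming the third-party async_lru package for the async-aware cache and aiohttp for fetching (neither is confirmed as tabby's actual choice):

```python
import base64
import io

import aiohttp
from async_lru import alru_cache
from PIL import Image


@alru_cache(maxsize=32)
async def fetch_image(url: str) -> Image.Image:
    """Resolve an OAI-style image_url into a PIL image, with caching."""

    if url.startswith("data:"):
        # Base64 data URI: data:image/png;base64,<payload>
        data = base64.b64decode(url.split(",", 1)[1])
    else:
        # Remote URL
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                data = await response.read()

    return Image.open(io.BytesIO(data))
```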
If a client sends a dummy model name, the server shouldn't error, since
it caters to clients that expect specific OAI model names. This is a
problem with inline model loading, where those names would error by
default. Therefore, add an exception if the provided name is in the
dummy model names (which also double as inline strict exceptions).
However, the dummy model names weren't configurable, so add a new
option to specify exception names; the default is gpt-3.5-turbo.
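A sketch of the resulting check; the config key name is hypothetical:

```python
DEFAULT_DUMMY_NAMES = ["gpt-3.5-turbo"]


def is_dummy_model(requested_model: str, config: dict) -> bool:
    """Dummy names pass through instead of triggering an inline load."""

    # Configurable exception names, falling back to the default
    dummy_names = config.get("dummy_model_names") or DEFAULT_DUMMY_NAMES

    # These names also double as strict exceptions for inline loading
    return requested_model in dummy_names
```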
Signed-off-by: kingbri <bdashore3@proton.me>
The admin key check was running even when inline loading was disabled.
Fix this bug while preserving the existing permission system when
inline loading is enabled.
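Roughly the corrected gate; the key-validation helpers are illustrative:

```python
async def check_model_permission(request, inline_loading_enabled: bool):
    # Previously the admin check ran unconditionally; only escalate
    # to an admin key when inline loading can actually swap models
    if not inline_loading_enabled:
        return await validate_api_key(request)

    # Inline loading can load a new model, so keep the admin requirement
    return await validate_admin_key(request)
```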
Signed-off-by: kingbri <bdashore3@proton.me>