jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	3f09fcd8c9	Model: Make model params return a model card The model card is a unified structure for sharing model params. Rather than kwargs, use this instead. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-21 23:15:46 -04:00
kingbri	3084ef9fa1	Model + API: Migrate to use BaseSamplerParams kwargs is pretty ugly when figuring out which arguments to use. The base requests falls back to defaults anyways, so pass in the params object as is. However, since Python's typing isn't like TypeScript where types can be transformed, the type hinting has a possiblity of None showing up despite there always being a value for some params. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-16 00:50:05 -04:00
Andrew Phillips	436ce752da	Support more common tool variables in templates (tools, message.tool_calls) (#308 ) * Add non-JSON version of `tools` and `functions` to `template_vars`. Increase the compatibility with VLLM templates which use a non-JSON tools object. * Add list of tool template variables to the documentation * Use Jinja templates to provide `tools_json` and `functions_json` This should be functionally equivelant, but the JSON won't be produced unless it's needed. * Make message.tool_calls match the JSON from ToolCallProcessor * Log something when generating tool calls * Add template for Qwen QwQ 32b * Only log if tool calls have been detected * API: Fix tool call variable assignments Jinja functions do not run when variables are called. Use json.dumps instead. In addition, log the request ID when stating that a tool call was fired. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> * Add `ToolCallProcessor.dump()` to get the list of processed dicts * Remove qwen_qwq_32b.jinja This will be added to the following repository at a later date: https://github.com/theroyallab/llm-prompt-templates --------- Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-23 13:23:00 -04:00
kingbri	79f9c6e854	Model: Remove num_experts_per_token This shouldn't even be an exposed option since changing it always breaks inference with the model. Let the model's config.json handle it. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-19 11:52:10 -04:00
Benjamin Oldenburg	a20abe2d33	Bugfix: Chat completion requests fail with UnboundLocalError: finish_reason variable not initialized (#307 ) * fix issue #306 * removed whitespaces for ruff	2025-03-15 20:31:21 -04:00
kingbri	d98c0bd3f6	API: Add tools class Was mistakenly not added in PR 302. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-14 15:07:11 -04:00
Benjamin Oldenburg	a2a14ea114	Fix Tool Call JSON Serialization Error (#302 ) * Fix Tool Call JSON Serialization Error * Incorporate changes from PR 292 kingbri note: Adjusts the tool JSON formation and incorporates finish reasons. Added both authors as co-authors due to edits on this commit from the original PR. Co-Authored-by: David Allada <dallada1@vt.edu> Co-Authored-by: Benjamin Oldenburg <benjamin.oldenburg@ordis.co.th> Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> * API: Cleanup tool call JSON parsing Split pre and post-processing of tool calls to its own class. This cleans up the chat_completion utility module and also fixes the JSON serialization bug. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> --------- Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> Co-authored-by: David Allada <dallada1@vt.edu> Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-14 15:01:33 -04:00
kingbri	35fe372f2b	Embeddings: Handle case if embedding input is passed as a string Infinity expects a list when embedding, so convert to a list if the input is a string. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-23 00:39:21 -05:00
kingbri	9f649647f0	Model + API: GPU split updates and fixes For the TP loader, GPU split cannot be an empty array. However, defaulting the parameter to an empty array makes it easier to calculate the device list. Therefore, cast an empty array to None using falsy comparisons at load time. Also add draft_gpu_split to the load request. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-15 21:50:14 -05:00
kingbri	e290b88568	Args: Expose api-servers to subcommands This is required for the export-openapi action. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-10 23:39:46 -05:00
kingbri	6da65a8fd3	Embeddings: Fix base64 return A base64 embedding can be a string post-encoding. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2025-01-01 16:15:12 -05:00
kingbri	7878d351a7	Endpoints: Add props endpoint and add more values to model params The props endpoint is a standard used by llamacpp APIs which returns various properties of a model to a server. It's still recommended to use /v1/model to get all the parameters a TabbyAPI model has. Also include the contents of a prompt template when fetching the current model. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-26 17:32:19 -05:00
kingbri	fa8035ef72	Dependencies: Update sse-starlette and formatron Also pin newer versions of dependencies and fix an import from sse-starlette Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-21 23:14:55 -05:00
Brian	fe44e4a524	Merge pull request #253 from randoentity/workaround-toolcall workaround for tool calling	2024-11-28 23:30:00 -05:00
kingbri	2e06fb01d3	OAI: Pass mm_embeddings to tool call generation Don't exclude the vision embeddings when regenerating for a tool call. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-28 23:27:59 -05:00
Brian	b81dcdaf66	Merge pull request #232 from AlpinDale/serviceinfo_uri feat: add serviceinfo URI	2024-11-28 23:19:52 -05:00
kingbri	5fadaa728a	API: Move serviceinfo to core Best to expose this endpoint to all APIs as its an information endpoint. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-28 23:07:58 -05:00
randoentity	a52610fb19	workaround for tool calling	2024-11-24 13:40:33 +01:00
kingbri	388d36e6bd	OAI: Fix chat completion list parsing The strings weren't being concatenated properly. Only add the combined text if the chat completion type is a List. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-22 17:30:29 -05:00
kingbri	902045edbb	API: Fix chat completion formatting flow Previously, the flow for parsing chat completion messages and rendering from the prompt template was disconnected between endpoints. Now, create a common function to render and handle everything appropriately afterwards. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-21 17:51:14 -05:00
kingbri	c652a6e030	API: Transform multimodal into an actual class Migrate the add method into the class itself. Also, a BaseModel isn't needed here since this isn't a serialized class. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-20 00:06:20 -05:00
kingbri	8ffc636dce	OAI: Strictly type chat completions Previously, the messages were a list of dicts. These are untyped and don't provide strict hinting. Add types for chat completion messages and reformat existing code. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-19 23:18:18 -05:00
kingbri	0fadb1e5e8	Merge branch 'main' into vision	2024-11-19 21:19:21 -05:00
DocShotgun	731a345cfc	OAI: Keep behavior consistent between chat completion and encode * When vision is not enabled, only the first text block is kept in message.content if it is a list	2024-11-19 12:40:00 -08:00
DocShotgun	27d9af50a8	API: Report whether vision is enabled	2024-11-19 12:29:25 -08:00
DocShotgun	5611365c07	OAI: Allow /v1/encode endpoint to handle vision requests * More robust checks for OAI chat completion message lists on /v1/encode endpoint * Added TODO to support other aspects of chat completions * Fix oversight where embeddings was not defined in advance on /v1/chat/completions endpoint	2024-11-19 11:14:37 -08:00
DocShotgun	dd41eec8a4	OAI: Initial vision support in OAI chat completions * Support image_url inputs containing URLs or base64 strings following OAI vision spec * Use async lru cache for image embeddings * Add generic wrapper class for multimodal embeddings	2024-11-17 21:23:09 -08:00
kingbri	bd9e78e19e	API: Add inline exception for dummy models If an API key sends a dummy model, it shouldn't error as the server is catering to clients that expect specific OAI model names. This is a problem with inline model loading since these names would error by default. Therefore, add an exception if the provided name is in the dummy model names (which also doubles as inline strict exceptions). However, the dummy model names weren't configurable, so add a new option to specify exception names, otherwise the default is gpt-3.5-turbo. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-17 21:15:45 -05:00
kingbri	b94c646210	Embeddings: Add string input as an option Used in OAI's API Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-16 23:48:31 -05:00
kingbri	f9fffd42e0	OAI: Fix inline model loading errors when disabled The admin key check was running even if inline loading was disabled. Fix this bug, but also preserve the existing permission system when inline loading is enabled. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-16 23:28:44 -05:00
kingbri	69ac0eb8aa	Model: Add vision loading support Adds the ability to load vision parts of text + image models. Requires an explicit flag in config because there isn't a way to automatically determine whether the vision tower should be used. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-11 12:10:11 -05:00
AlpinDale	c9ff8ef2c2	upgrade to v0.2	2024-11-04 13:28:04 +00:00
AlpinDale	1c9bc2d1af	feat: add serviceinfo URI	2024-11-04 12:35:08 +00:00
DocShotgun	603760cecb	Model: Remove override_base_seq_len	2024-10-30 10:03:08 +08:00
TerminalMan	7d18d2e2ca	Refactor the sampling class (#199 ) * improve validation * remove to_gen_params functions * update changes for all endpoint types * OAI: Fix calls to generation Chat completion and completion need to have prompt split out before pushing to the backend. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Convert Top-K values of -1 to 0 Some OAI implementations use -1 as disabled instead of 0. Therefore, add a coalesce case. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Format and space out Make the code more readable. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Fix mirostat Field items are nested in data within a Pydantic FieldInfo Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Format Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Fix banned_tokens and allowed_tokens conversion If the provided string has whitespace, trim it before splitting. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Add helpful log to dry_sequence_breakers Let the user know if the sequence errors out. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Apply validators in right order Validators need to be applied in order from top to bottom, this is why the after validator was not being applied properly. Set the model to validate default params for sampler override purposes. This can be turned off if there are unclear errors. Signed-off-by: kingbri <bdashore3@proton.me> * Endpoints: Format Cleanup and semantically fix field validators Signed-off-by: kingbri <bdashore3@proton.me> * Kobold: Update validators and fix parameter application Validators on parent fields cannot see child fields. Therefore, validate using the child fields instead and alter the parent field data from there. Also fix badwordsids casting. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Remove validate defaults and fix mirostat If a user sets an override to a non-default value, that's their own fault. Run validator on the actual mirostat_mode parameter rather than the alternate mirostat parameter. Signed-off-by: kingbri <bdashore3@proton.me> * Kobold: Rework badwordsids Currently, this serves to ban the EOS token. All other functionality was legacy, so remove it. Signed-off-by: kingbri <bdashore3@proton.me> * Model: Remove HuggingfaceConfig This was only necessary for badwordsids. All other fields are handled by exl2. Keep the class as a stub if it's needed again. Signed-off-by: kingbri <bdashore3@proton.me> * Kobold: Bump kcpp impersonation TabbyAPI supports XTC now. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Change alias to validation_alias Reduces the probability for errors and makes the class consistent. Signed-off-by: kingbri <bdashore3@proton.me> * OAI: Use constraints for validation Instead of adding a model_validator, use greater than or equal to constraints provided by Pydantic. Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Lint Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: SecretiveShell <84923604+SecretiveShell@users.noreply.github.com> Co-authored-by: kingbri <bdashore3@proton.me>	2024-10-27 11:43:41 -04:00
Brian Dashore	6e48bb420a	Model: Fix inline loading and draft key (#225 ) * Model: Fix inline loading and draft key There was a lack of foresight between the new config.yml and how it was structured. The "draft" key became "draft_model" without updating both the API request and inline loading keys. For the API requests, still support "draft" as legacy, but the "draft_model" key is preferred. Signed-off-by: kingbri <bdashore3@proton.me> * OAI: Add draft model dir to inline load Was not pushed before and caused errors of the kwargs being None. Signed-off-by: kingbri <bdashore3@proton.me> * Model: Fix draft args application Draft model args weren't applying since there was a reset due to how the old override behavior worked. Signed-off-by: kingbri <bdashore3@proton.me> * OAI: Change embedding model load params Use embedding_model_name to be inline with the config. Signed-off-by: kingbri <bdashore3@proton.me> * API: Fix parameter for draft model load Alias name to draft_model_name. Signed-off-by: kingbri <bdashore3@proton.me> * API: Fix parameter for template switch Add prompt_template_name to be more descriptive. Signed-off-by: kingbri <bdashore3@proton.me> * API: Fix parameter for model load Alias name to model_name for config parity. Signed-off-by: kingbri <bdashore3@proton.me> * API: Add alias documentation Signed-off-by: kingbri <bdashore3@proton.me> --------- Signed-off-by: kingbri <bdashore3@proton.me>	2024-10-24 23:35:05 -04:00
kingbri	126a44483c	Tree: Remove fasttensors Now a noop in upstream. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-30 00:18:47 -04:00
kingbri	fb903ecddf	OAI: Relax role requirement for chat completion message lists Make it so any message role can be parsed from a list. Not really sure why this is the case because system and assistant shouldn't be sending data other than text, but it also doesn't make much sense to be extremely strict with roles either. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-22 22:42:06 -04:00
TerminalMan	2cda890deb	Add health check monitoring for EXL2 errors (#206 ) * Add health check monitoring for EXL2 errors * Health: Format and change status code A status code of 503 makes more sense to use. ---------	2024-09-22 21:40:36 -04:00
TerminalMan	3aeddc5255	fix issues with optional dependencies (#204 ) * fix issues with optional dependencies * format document * Tree: Format and comment	2024-09-19 22:24:55 -04:00
TerminalMan	bb4dd7200e	fix defaults for api_servers	2024-09-17 15:41:32 +01:00
kingbri	daa57ceada	API: Upgrade config declarations Some were using the old unwrap methods. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-17 00:42:39 -04:00
kingbri	26ad0ef744	API: Fix model info reporting A deprecated preferences global var was being referenced. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-16 22:42:59 -04:00
TerminalMan	e11d80b285	fix missing rename	2024-09-12 23:32:41 +01:00
TerminalMan	c6f9806ec6	remove unused imports	2024-09-11 18:00:29 +01:00
TerminalMan	e8fcecd56a	Merge remote-tracking branch 'upstream/main' into HEAD	2024-09-11 15:57:18 +01:00
kingbri	e00eb09ef3	OAI: Add cancellation with inline load When the request is cancelled, cancel the load task. In addition, when checking if a model container exists, also check if the model is fully loaded. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-11 00:08:55 -04:00
kingbri	b9e5693c1b	API + Model: Apply config.yml defaults for all load paths There are two ways to load a model: 1. Via the load endpoint 2. Inline with a completion The defaults were not applying on the inline load, so rewrite to fix that. However, while doing this, set up a defaults dictionary rather than comparing it at runtime and remove the pydantic default lambda on all the model load fields. This makes the code cleaner and establishes a clear config tree for loading models. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-10 23:35:35 -04:00
kingbri	2c3bc71afa	Tree: Switch to asynchronous file handling Using aiofiles, there's no longer a possiblity of blocking file operations that can hang up the event loop. In addition, partially migrate classes to use asynchronous init instead of the normal python magic method. The only exception is config, since that's handled in the synchonous init before the event loop starts. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-10 16:45:14 -04:00
kingbri	54bfb770af	API: Fix template switch endpoint Forwards a Path instead of a string and adheres to the new pathfinding system. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-10 12:22:07 -04:00

1 2 3 4

175 commits