jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	e00eb09ef3	OAI: Add cancellation with inline load When the request is cancelled, cancel the load task. In addition, when checking if a model container exists, also check if the model is fully loaded. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-11 00:08:55 -04:00
kingbri	b9e5693c1b	API + Model: Apply config.yml defaults for all load paths There are two ways to load a model: 1. Via the load endpoint 2. Inline with a completion The defaults were not applying on the inline load, so rewrite to fix that. However, while doing this, set up a defaults dictionary rather than comparing it at runtime and remove the pydantic default lambda on all the model load fields. This makes the code cleaner and establishes a clear config tree for loading models. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-10 23:35:35 -04:00
kingbri	2c3bc71afa	Tree: Switch to asynchronous file handling Using aiofiles, there's no longer a possibility of blocking file operations that can hang up the event loop. In addition, partially migrate classes to use asynchronous init instead of the normal python magic method. The only exception is config, since that's handled in the synchronous init before the event loop starts. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-10 16:45:14 -04:00
kingbri	54bfb770af	API: Fix template switch endpoint Forwards a Path instead of a string and adheres to the new pathfinding system. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-10 12:22:07 -04:00
Cohee	63476041d1	Properly specify config value in the error message	2024-09-08 22:02:49 +03:00
Brian Dashore	0c74cd80ea	Merge pull request #191 from SecretiveShell/list-draft-models fix function arguments for get_model_list	2024-09-07 22:29:05 -04:00
kingbri	b576a2f116	API: Bump sent koboldcpp version Unlock DRY on lite UI. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-07 21:45:51 -04:00
TerminalMan	d57a3b459c	fix function arguments for get_model_list	2024-09-07 18:27:10 +01:00
kingbri	2f45e978c5	API: Fix merge overwrite The completions utils did not take the new imports. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-05 18:04:53 -04:00
Brian Dashore	ec7f64d530	Merge pull request #185 from SecretiveShell/refactor-config-loading Refactor config loading	2024-09-05 18:00:32 -04:00
kingbri	93872b34d7	Config: Migrate to global class instead of dicts The config categories can have defined separation, but preserve the dynamic nature of adding new config options by making all the internal class vars as dictionaries. This was necessary since storing global callbacks stored a state of the previous global_config var that wasn't populated. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-04 23:18:47 -04:00
kingbri	9c10789ca1	API: Error on invalid key permissions and cleanup format If a user requesting a model change isn't admin, error. Better to place the load function before the generate functions. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-04 21:44:14 -04:00
kingbri	21f14d4318	API: Update inline load - Add a config flag - Migrate support to /v1/completions - Unify the load function Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-03 23:37:28 -04:00
kingbri	dd30d6592a	Merge branch 'main' of https://github.com/theroyallab/tabbyapi into inline	2024-09-03 18:03:17 -04:00
kingbri	8854269121	API: Fix current model list return Check if the container actually exists in the match before returning the value of the directory. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-01 10:54:01 -04:00
kingbri	4aebe8a2a5	Config: Use an explicit "auto" value for rope_alpha Using "auto" for rope alpha removes ambiguity on how to explicitly enable automatic rope calculation. The same behavior of None -> auto calculate still exists, but can be overwritten if a model's tabby_config.yml includes `rope_alpha`. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	a96fa5f138	API: Don't fallback to default values on model load request It's best to pass them down the config stack. API/User config.yml -> model config.yml -> model config.json -> fallback. Doing this allows for seamless flow and yielding control to each member in the stack. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	dd55b99af5	Model: Store directory paths Storing a pathlib type makes it easier to manipulate the model directory path in the long run without constantly fetching it from the config. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	a00d972054	Server: Remove unused comments Leftovers from the new API server log system. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-27 21:45:51 -04:00
TerminalMan	80198ca056	API: Add /v1/health endpoint (#178) * Add healthcheck - localhost only /healthcheck endpoint - cURL healthcheck in docker compose file * Update Healthcheck Response - change endpoint to /health - remove localhost restriction - add docstring * move healthcheck definition to top of the file - make the healthcheck show up first in the openAPI spec * Tree: Format	2024-08-27 21:37:41 -04:00
Ben Gitter	045bc98333	Remove rogue print statements within chat_completion.py (#174) * rogue prompt print * remove print pt2 * Print Removal Final	2024-08-23 21:28:37 -04:00
kingbri	871c89063d	Model: Add Tensor Parallel support Use the tensor parallel loader when the flag is enabled. The new loader has its own autosplit implementation, so gpu_split_auto isn't valid here. Also make it easier to determine which cache type to use rather than multiple if/else statements. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-22 14:15:19 -04:00
kingbri	a51acb9db4	Templates: Switch to async jinja engine This prevents any possible blocking of the event loop due to template rendering. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 12:03:41 -04:00
kingbri	b4752c1e62	Templates: Revert to load metadata on runtime Metadata is generated via a template's module. This requires a single iteration through the template. If a template tries to access a passed variable that doesn't exist, it will error. Therefore, generate the metadata at runtime to prevent these errors from happening. To optimize further, cache the metadata after the first generation to prevent the expensive call of making a template module. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 11:44:42 -04:00
Ben Gitter	70b9fc95de	[WIP] OpenAI Tools Support/Function calling (#154) * returning stop str if exists from gen * added chat template for firefunctionv2 * pulling tool vars from template * adding parsing for tool inputs/outputs * passing tool data from endpoint to chat template, adding tool_start to the stop list * loosened typing on the response tool call, leaning more on the user supplying a quality schema if they want a particular format * non streaming generation prototype * cleaning template * Continued work with type, ingestion into template, and chat template for fire func * Correction - streaming toolcall comes back as delta obj not inside chatcomprespchoice per chat_completion_chunk.py inside OAI lib. * Ruff Formatting * Moved stop string and tool updates out of prompt creation func Updated tool pydantic to match OAI Support for streaming Updated generate tool calls to use flag within chat_template and insert tool reminder * Llama 3.1 chat templates Updated fire func template * renamed llama3.1 to chatml_with_headers.. * update name of template * Support for calling a tool start token rather than the string. Simplified tool_params Warning when gen_settings are being overridden because user set temp to 0 Corrected schema and tools to correct types for function args. Str for some reason * draft groq tool use model template * changed headers to vars for readability (but mostly because some models are weird about newlines after headers, so this is an easier way to change globally) * Clean up comments and code in chat comp * Post processed tool call to meet OAI spec rather than forcing model to write json in a string in the middle of the call. * changes example back to args as json rather than string of json * Standardize chat templates to each other * cleaning/rewording * stop elements can also be ints (tokens) * Cleaning/formatting * added special tokens for tools and tool_response as specified in description * Cleaning * removing aux templates - going to live in llm-promp-templates repo instead * Tree: Format Signed-off-by: kingbri <bdashore3@proton.me> * Chat Completions: Don't include internal tool variables in OpenAPI Use SkipJsonSchema to suppress inclusion with the OpenAPI JSON. The location of these variables may need to be changed in the future. Signed-off-by: kingbri <bdashore3@proton.me> * Templates: Deserialize metadata on template load Since we're only looking for specific template variables that are static in the template, it makes more sense to render when the template is initialized. Signed-off-by: kingbri <bdashore3@proton.me> * Tools: Fix comments Adhere to the format style of comments in the rest of the project. Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: Ben Gitter <gitterbd@gmail.com> Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 00:16:25 -04:00
Bartowski	c75e911f07	Merge branch 'main' into main	2024-08-14 16:16:15 -04:00
kingbri	3e42211c3e	Config: Embeddings: Make embeddings_device a default when API loading When loading from the API, the fallback for embeddings_device will be the same as the config. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 13:59:49 -04:00
kingbri	54aeebaec1	API: Fix return of current embeddings model Return a ModelCard instead of a ModelList. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 13:43:31 -04:00
Brian Dashore	1bf062559d	Merge pull request #158 from AlpinDale/embeddings feat: add embeddings support via Infinity-emb	2024-07-31 20:33:12 -04:00
kingbri	dc3dcc9c0d	Embeddings: Update config, args, and parameter names Use embeddings_device as the parameter for device to remove ambiguity. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:32:26 -04:00
kingbri	bfa011e0ce	Embeddings: Add model management Embedding models are managed on a separate backend, but are run in parallel with the model itself. Therefore, manage this in a separate container with separate routes. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:19:27 -04:00
kingbri	f13d0fb8b3	Embeddings: Add model load checks Same as the normal model container. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:17:36 -04:00
kingbri	fbf1455db1	Embeddings: Migrate and organize Infinity Use Infinity as a separate backend and handle the model within the common module. This separates out the embeddings model from the endpoint which allows for model loading/unloading in core. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:00:23 -04:00
kingbri	ac1afcc588	Embeddings: Use response classes instead of dicts Follows the existing code style. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-29 14:15:40 -04:00
kingbri	3f21d9ef96	Embeddings: Switch to Infinity Infinity-emb is an async batching engine for embeddings. This is preferable to sentence-transformers since it handles scalable use cases without the need for external thread intervention. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-29 13:42:03 -04:00
kingbri	c9a5d2c363	OAI: Refactor embeddings Move files and rewrite routes to adhere to Tabby's code style. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 14:10:51 -04:00
kingbri	7b8b3fe23d	Kobold: Fix max length type Was mistakenly a string instead of an integer. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 23:00:26 -04:00
kingbri	e3226ed930	Kobold: Add untracked file Model types weren't added. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:57:55 -04:00
kingbri	3038f668e8	Kobold: Add extra routes for horde compatibility Needed to connect to horde. Also do some reordering to clean the router file up. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:55:54 -04:00
kingbri	2773517a16	API: Add setup function to routers This helps prepare the router before exposing it to the parent app. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:24:33 -04:00
Brian Dashore	6365427d38	Merge pull request #155 from Vhallo/main Simple Typo Fix	2024-07-26 21:35:50 -04:00
kingbri	884b6f5ecd	API: Add log options for initialization Make each API log their respective URLs to help inform users. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 21:32:05 -04:00
kingbri	e8fc13a1f6	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:33:04 -04:00
kingbri	ea80b62e30	Sampling: Reorder aliased params and add kobold aliases Also add dynatemp range which is an alternative way of calculating min and max temp. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:32:33 -04:00
kingbri	7522b1447b	Model: Add support for HuggingFace config and bad_words_ids This is necessary for Kobold's API. Current models use bad_words_ids in generation_config.json, but for some reason, they're also present in the model's config.json. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:23:22 -04:00
kingbri	545e26608f	Kobold: Move params to aliases Some of the parameters the API provides are aliases for their OAI equivalents. It makes more sense to move them to the common file. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 16:46:54 -04:00
kingbri	b7cb6f0b91	API: Add KoboldAI server Used for interacting with applications that use KoboldAI's API such as horde. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 16:37:30 -04:00
AlpinDale	5adfab1cbd	ruff: formatting	2024-07-26 02:53:14 +00:00
AlpinDale	f20cd330ef	feat: add embeddings support via sentence-transformers	2024-07-26 02:45:07 +00:00
kingbri	5c082b7e8c	Async: Add option to use Uvloop/Winloop These are faster event loops for asyncio which should improve overall performance. Gate these under an experimental flag for now to stress test these loops. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 18:59:20 -04:00
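
Commit a96fa5f138 describes passing model load options down a config stack (API/User config.yml -> model config.yml -> model config.json -> fallback) instead of filling in defaults at the request level. A minimal sketch of that kind of lookup follows, assuming the arrows denote precedence and that an unset option is represented as None. The function name `resolve_option` and the sample option values are hypothetical and are not taken from the repository.

```python
from typing import Any, Mapping, Sequence


def resolve_option(
    key: str, layers: Sequence[Mapping[str, Any]], fallback: Any = None
) -> Any:
    """Return the first non-None value for `key`, searching layers in order."""
    for layer in layers:
        value = layer.get(key)
        if value is not None:
            return value
    # No layer supplied a value, so yield control to the final fallback.
    return fallback


# Hypothetical layers, highest precedence first.
request_args = {"max_seq_len": None, "cache_mode": "Q4"}  # API load request
user_config = {"max_seq_len": 8192}                       # user config.yml
model_config = {"max_seq_len": 4096, "rope_alpha": 1.0}   # per-model config

layers = [request_args, user_config, model_config]
resolved = {
    key: resolve_option(key, layers, fallback="unset")
    for key in ("max_seq_len", "cache_mode", "rope_alpha", "rope_scale")
}
print(resolved)
```

Because the request layer leaves `max_seq_len` as None, the value from the next layer is used, which is the behaviour the commit message describes as yielding control to each member in the stack.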

128 commits
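
Commit e00eb09ef3 cancels the in-progress load task when a request with an inline model load is cancelled. One way this pattern can be sketched with plain asyncio is shown below. It assumes the disconnect is signalled through an `asyncio.Event`; the names `run_with_cancellation` and `fake_load` are hypothetical stand-ins and do not reflect the project's actual code.

```python
import asyncio
from typing import Any, Coroutine, Optional


async def run_with_cancellation(
    load_coro: Coroutine[Any, Any, Any], disconnected: asyncio.Event
) -> Optional[Any]:
    """Run a load coroutine, cancelling it if the client disconnects first."""
    load_task = asyncio.create_task(load_coro)
    disconnect_task = asyncio.create_task(disconnected.wait())

    done, _ = await asyncio.wait(
        {load_task, disconnect_task}, return_when=asyncio.FIRST_COMPLETED
    )

    if load_task in done:
        # Load finished first: stop watching for a disconnect.
        disconnect_task.cancel()
        return load_task.result()

    # Client went away first: cancel the load and wait for it to unwind.
    load_task.cancel()
    try:
        await load_task
    except asyncio.CancelledError:
        pass
    return None


async def demo() -> tuple:
    state = {"loaded": False}

    async def fake_load() -> str:
        # Hypothetical stand-in for a slow model load.
        await asyncio.sleep(5)
        state["loaded"] = True
        return "model"

    disconnected = asyncio.Event()
    # Simulate the client disconnecting shortly after the request starts.
    asyncio.get_running_loop().call_later(0.01, disconnected.set)
    result = await run_with_cancellation(fake_load(), disconnected)
    return result, state["loaded"]


result, loaded = asyncio.run(demo())
print(result, loaded)
```

The `loaded` flag in the demo mirrors the second half of the commit message: a caller that only checks whether a container exists could mistake a cancelled or partial load for a usable model, so tracking the fully loaded state separately is the safer check.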