jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	24ea85b3c5	Tree: Use safe loader for YAML Loaders that read use a safe type while loaders that write use both round-trip and safe options. Also don't create module-level parsers where they're not needed. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-18 19:26:51 -04:00
TerminalMan	6c7542de9f	migrate all yaml loaders to ruamel.yaml	2024-09-18 11:33:15 +01:00
kingbri	2a41910931	Model: Remove dev wheel setting checks Removes TP and DRY sampler checks since those are in stable. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-14 22:14:43 -04:00
turboderp	318c425d84	Bump exllamav2 to 0.2.2	2024-09-14 21:43:26 +02:00
turboderp	c66fe8e947	Grammar: Add custom ExLlamaV2TokenEnforcerFilter class	2024-09-14 21:42:53 +02:00
kingbri	2c3bc71afa	Tree: Switch to asynchronous file handling Using aiofiles, there's no longer a possibility of blocking file operations that can hang up the event loop. In addition, partially migrate classes to use asynchronous init instead of the normal python magic method. The only exception is config, since that's handled in the synchronous init before the event loop starts. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-10 16:45:14 -04:00
kingbri	cf97113868	Dependencies: Update Exllamav2 v0.2.1 Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-08 21:12:31 -04:00
kingbri	acd3eb1140	Model: Add model folder template support Like tabby_config.yml in the model's folder, a custom template can also be provided via tabby_template.yml in addition to the existing templates folder. The config.yml always takes priority. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-07 22:20:38 -04:00
kingbri	4f5ca7a4c7	Sampling: Update overrides and params Re-order to make more sense. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-07 12:48:59 -04:00
kingbri	ae37f3f332	Sampling: Update DRY Switch to new parameters and remove dry_max_ngram as that's not supposed to be changed. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-07 12:39:14 -04:00
kingbri	05c3f1194f	Sampling: Add rudimentary DRY support Adds DRY support based on the current exl2 dev API. Only change for optimization is dry_max_ngram instead of using a closed range. Currently, DRY range is aliased to dry_max_ngram. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-07 00:48:42 -04:00
Brian Dashore	8524999284	Merge pull request #184 from SecretiveShell/Infinity-Embed-TODO Complete conditional infinity import TODO	2024-09-04 21:47:49 -04:00
Jake	42a42caf43	remove logging - remove logging statements - format code with ruff	2024-09-04 16:14:09 +01:00
kingbri	4bf1a71d7b	Model: Fix model override application for draft args These have to be merged beforehand and the updated version needs to be re-fetched. It's possible to prevent the fetch of draft_args in the beginning of init. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	4aebe8a2a5	Config: Use an explicit "auto" value for rope_alpha Using "auto" for rope alpha removes ambiguity on how to explicitly enable automatic rope calculation. The same behavior of None -> auto calculate still exists, but can be overwritten if a model's tabby_config.yml includes `rope_alpha`. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	a96fa5f138	API: Don't fallback to default values on model load request It's best to pass them down the config stack. API/User config.yml -> model config.yml -> model config.json -> fallback. Doing this allows for seamless flow and yielding control to each member in the stack. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	4452d6f665	Model: Add support for overridable model config.yml Like config.json in a model folder, providing a tabby_config.yml will serve as a layer between user provided kwargs and the config.json values. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	dd55b99af5	Model: Store directory paths Storing a pathlib type makes it easier to manipulate the model directory path in the long run without constantly fetching it from the config. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	523709741c	Model: Reorder how configs are set up Initialize the Exllama classes first then add user-specific params. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
TerminalMan	43104e0d19	Complete conditional infinity import TODO - add logging - change declaration order	2024-08-31 21:48:43 +01:00
kingbri	21712578cf	API: Add allowed_tokens support This is the opposite of banned tokens. Exllama specific implementation of #181. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-29 21:44:42 -04:00
kingbri	10d9419f90	Model: Add BOS token to prompt logs If add_bos_token is enabled, the BOS token gets appended to the logged prompt if logging is enabled. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-29 21:15:09 -04:00
kingbri	4958c06813	Model: Remove and format comments The comment in __init__ was outdated and all the kwargs are the config options anyways. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-27 21:43:40 -04:00
turboderp	fe3253f3a9	Model: Account for tokenizer lazy init	2024-08-23 23:51:53 +02:00
turboderp	a676c4bf38	Model: Formatting	2024-08-23 11:15:30 +02:00
turboderp	a3733caeda	Model: Fix draft model cache initialization	2024-08-23 11:08:49 +02:00
kingbri	565b0300d6	Dependencies: Update Exllamav2 v0.1.9 Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-22 14:15:19 -04:00
kingbri	078fbf1080	Model: Add quantized cache support for tensor parallel Newer versions of exl2 v1.9-dev have quantized cache implemented. Add those APIs. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-22 14:15:19 -04:00
kingbri	871c89063d	Model: Add Tensor Parallel support Use the tensor parallel loader when the flag is enabled. The new loader has its own autosplit implementation, so gpu_split_auto isn't valid here. Also make it easier to determine which cache type to use rather than multiple if/else statements. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-22 14:15:19 -04:00
kingbri	5002617eac	Model: Split cache creation into a common function Unifies the switch statement across both draft and model caches. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-22 14:15:19 -04:00
Ben Gitter	70b9fc95de	[WIP] OpenAI Tools Support/Function calling (#154) * returning stop str if exists from gen * added chat template for firefunctionv2 * pulling tool vars from template * adding parsing for tool inputs/outputs * passing tool data from endpoint to chat template, adding tool_start to the stop list * loosened typing on the response tool call, leaning more on the user supplying a quality schema if they want a particular format * non streaming generation prototype * cleaning template * Continued work with type, ingestion into template, and chat template for fire func * Correction - streaming toolcall comes back as delta obj not inside chatcomprespchoice per chat_completion_chunk.py inside OAI lib. * Ruff Formatting * Moved stop string and tool updates out of prompt creation func Updated tool pydantic to match OAI Support for streaming Updated generate tool calls to use flag within chat_template and insert tool reminder * Llama 3.1 chat templates Updated fire func template * renamed llama3.1 to chatml_with_headers.. * update name of template * Support for calling a tool start token rather than the string. Simplified tool_params Warning when gen_settings are being overridden because user set temp to 0 Corrected schema and tools to correct types for function args. Str for some reason * draft groq tool use model template * changed headers to vars for readability (but mostly because some models are weird about newlines after headers, so this is an easier way to change globally) * Clean up comments and code in chat comp * Post processed tool call to meet OAI spec rather than forcing model to write json in a string in the middle of the call. * changes example back to args as json rather than string of json * Standardize chat templates to each other * cleaning/rewording * stop elements can also be ints (tokens) * Cleaning/formatting * added special tokens for tools and tool_response as specified in description * Cleaning * removing aux templates - going to live in llm-promp-templates repo instead * Tree: Format Signed-off-by: kingbri <bdashore3@proton.me> * Chat Completions: Don't include internal tool variables in OpenAPI Use SkipJsonSchema to suppress inclusion with the OpenAPI JSON. The location of these variables may need to be changed in the future. Signed-off-by: kingbri <bdashore3@proton.me> * Templates: Deserialize metadata on template load Since we're only looking for specific template variables that are static in the template, it makes more sense to render when the template is initialized. Signed-off-by: kingbri <bdashore3@proton.me> * Tools: Fix comments Adhere to the format style of comments in the rest of the project. Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: Ben Gitter <gitterbd@gmail.com> Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 00:16:25 -04:00
kingbri	63650d2c3c	Model: Disable banned strings if grammar is used ExllamaV2 filters don't allow for rewinding which is what banned strings uses. Therefore, constrained generation via LMFE or outlines is not compatible for now. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-05 11:08:58 -04:00
kingbri	8ff2586d45	Start: Fix pip update, method calls, and logging platform.system() was not called in some places, breaking the ternary on Windows. Pip's --upgrade flag does not actually update dependencies to their latest versions. That's what the --upgrade-strategy eager flag is for. Tell the user where their start preferences are coming from. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-04 10:30:26 -04:00
kingbri	b6d2676f1c	Start: Give the user a hint when a module can't be imported If an ImportError or ModuleNotFoundError is raised, tell the user to run the update scripts. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 21:59:06 -04:00
kingbri	2a33ebbf29	Model: Bypass lock checks when shutting down Previously, when a SIGINT was emitted and a model load was running, the API didn't shut down until the load finished due to waiting for the lock. However, when shutting down, the lock doesn't matter since the process is being killed anyway. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 16:05:34 -04:00
kingbri	0bcb4e4a7d	Model: Attach request ID to logs If multiple logs come in at once, track which log corresponds to which request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 00:25:54 -04:00
kingbri	9390d362dd	Model: Log generation params and metrics after the prompt/response A user's prompt and response can be large in the console. Therefore, always log the smaller payloads (ex. gen params + metrics) after the large chunks. However, it's recommended to keep prompt logging off anyways since it'll result in console spam. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 00:19:21 -04:00
Brian Dashore	1bf062559d	Merge pull request #158 from AlpinDale/embeddings feat: add embeddings support via Infinity-emb	2024-07-31 20:33:12 -04:00
kingbri	46304ce875	Model: Properly pass in max_batch_size from config The override wasn't being passed in before. Also, the default is now none since Exl2 can automatically calculate the max batch size. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 18:42:25 -04:00
kingbri	dc3dcc9c0d	Embeddings: Update config, args, and parameter names Use embeddings_device as the parameter for device to remove ambiguity. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:32:26 -04:00
kingbri	f13d0fb8b3	Embeddings: Add model load checks Same as the normal model container. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:17:36 -04:00
kingbri	01c7702859	Signal: Fix async signal handling Run unload async functions before exiting the program. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:11:05 -04:00
kingbri	fbf1455db1	Embeddings: Migrate and organize Infinity Use Infinity as a separate backend and handle the model within the common module. This separates out the embeddings model from the endpoint which allows for model loading/unloading in core. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:00:23 -04:00
kingbri	7522b1447b	Model: Add support for HuggingFace config and bad_words_ids This is necessary for Kobold's API. Current models use bad_words_ids in generation_config.json, but for some reason, they're also present in the model's config.json. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:23:22 -04:00
kingbri	b7cb6f0b91	API: Add KoboldAI server Used for interacting with applications that use KoboldAI's API such as horde. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 16:37:30 -04:00
kingbri	3e8ffebdd3	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 14:32:50 -04:00
kingbri	9ad69e8ab6	API: Migrate universal routes to core Place OAI specific routes in the appropriate folder. This is in preparation for adding new API servers that can be optionally enabled. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 14:08:48 -04:00
kingbri	191600a150	Revert "Model: Skip empty token chunks" This reverts commit `21516bd7b5`. This skips EOS and implementing it the proper way seems more costly than necessary. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 18:34:00 -04:00
kingbri	21516bd7b5	Model: Skip empty token chunks This helps make the generation loop more efficient by skipping past chunks that aren't providing any tokens anyways. The offset isn't affected. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 12:23:49 -04:00
kingbri	cae94b920c	API: Add ability to use request IDs Identify which request is being processed to help users disambiguate which logs correspond to which request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-21 21:01:05 -04:00


184 commits