jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
TerminalMan	dc4946b565	make pydantic do all the validation	2024-09-13 10:21:27 +01:00
kingbri	d5b3fde319	Config: Fix descriptions Appending lines also requires a space between each one otherwise they'll squish together. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-12 22:43:30 -04:00
kingbri	21747bf9e4	Args: Switch to use model_field for everything Pydantic provides these helpers. Better to use these instead of the inspect lib. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-12 22:18:20 -04:00
TerminalMan	6e935c565e	remove private attributes in args	2024-09-13 00:37:17 +01:00
TerminalMan	eb5f42c845	add error message for invalid use_as_default	2024-09-12 23:48:24 +01:00
TerminalMan	8b48f00271	fix model names	2024-09-12 17:00:07 +01:00
TerminalMan	05f1c3e293	fix line lengths	2024-09-11 21:43:30 +01:00
TerminalMan	c6f9806ec6	remove unused imports	2024-09-11 18:00:29 +01:00
TerminalMan	0d7459191c	fix arg parser for dict types	2024-09-11 16:13:31 +01:00
TerminalMan	e8fcecd56a	Merge remote-tracking branch 'upstream/main' into HEAD	2024-09-11 15:57:18 +01:00
kingbri	b9e5693c1b	API + Model: Apply config.yml defaults for all load paths There are two ways to load a model: 1. Via the load endpoint 2. Inline with a completion The defaults were not applying on the inline load, so rewrite to fix that. However, while doing this, set up a defaults dictionary rather than comparing it at runtime and remove the pydantic default lambda on all the model load fields. This makes the code cleaner and establishes a clear config tree for loading models. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-10 23:35:35 -04:00
kingbri	7baef05b49	Transformers Utils: Fix file read Use asynchronous JSON reading Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-10 22:41:39 -04:00
kingbri	62beb2b1c8	Config: Fetch the correct dict for draft_model and lora Fixed fetching from the merged config instead of the sub-config Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-10 21:30:53 -04:00
kingbri	5e8ff9a004	Tree: Fix classmethod usage Instead of self, use cls which passes a type of the class. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-10 20:52:29 -04:00
kingbri	2c3bc71afa	Tree: Switch to asynchronous file handling Using aiofiles, there's no longer a possibility of blocking file operations that can hang up the event loop. In addition, partially migrate classes to use asynchronous init instead of the normal python magic method. The only exception is config, since that's handled in the synchronous init before the event loop starts. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-10 16:45:14 -04:00
Ati Sharma	a370aeb15f	Fix tabby_config.py _from_file Update tabby_config.py to fix issue #196	2024-09-09 09:19:12 +01:00
kingbri	df11890851	Templating: Add loopcontrols extension Inbuilt jinja extension to allow for break and continue in loops. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-08 12:21:42 -04:00
kingbri	dffceab777	Sampling: Link dry_range Was not linked in the gen params dict. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-08 01:55:52 -04:00
kingbri	acd3eb1140	Model: Add model folder template support Like tabby_config.yml in the model's folder, a custom template can also be provided via tabby_template.yml in addition to the existing templates folder. The config.yml always takes priority. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-07 22:20:38 -04:00
kingbri	9c4a0e650f	Sampling: Fix override for DRY sequence breakers The common type should be an array of strings. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-07 21:38:50 -04:00
kingbri	4f5ca7a4c7	Sampling: Update overrides and params Re-order to make more sense. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-07 12:48:59 -04:00
kingbri	ae37f3f332	Sampling: Update DRY Switch to new parameters and remove dry_max_ngram as that's not supposed to be changed. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-07 12:39:14 -04:00
kingbri	05c3f1194f	Sampling: Add rudimentary DRY support Adds DRY support based on the current exl2 dev API. Only change for optimization is dry_max_ngram instead of using a closed range. Currently, DRY range is aliased to dry_max_ngram. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-07 00:48:42 -04:00
TerminalMan	420fd84f6b	add env var loading automation - load config from env vars (eg. TABBY_NETWORK_HOST) - remove print statements - improve command line args automation	2024-09-06 15:05:48 +01:00
TerminalMan	8e9344642e	patch pydantic config into old config - convert pydantic to dict to avoid errors with current files - fix formatting	2024-09-06 14:31:28 +01:00
Jake	36e991c16e	automate arg parse - generate arg parser dynamically - remove legacy parser code	2024-09-06 00:27:53 +01:00
kingbri	1c9991f79e	Config: Format and organize Rename some methods and change comments. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-05 17:59:18 -04:00
Jake	362b8d5818	config is now backed by pydantic (WIP) - add models for config options - add function to regenerate config.yml - replace references to config with pydantic compatible references - remove unnecessary unwrap() statements TODO: - auto generate env vars - auto generate argparse - test loading a model	2024-09-05 18:04:56 +01:00
Jake	cb91670c7a	fix command line args - move to a complete class singleton to avoid propagation errors - remove legacy load config procedure	2024-09-05 15:33:00 +01:00
kingbri	93872b34d7	Config: Migrate to global class instead of dicts The config categories can have defined separation, but preserve the dynamic nature of adding new config options by making all the internal class vars as dictionaries. This was necessary since storing global callbacks stored a state of the previous global_config var that wasn't populated. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-04 23:18:47 -04:00
Jake	e772fa2981	Switch to internal dict merge implementation - remove deepmerge dependency - fix ruff formatting	2024-09-04 16:27:28 +01:00
Jake	ac4d9bba1c	refactor config functions - improve DRY	2024-09-04 12:49:22 +01:00
Jake	fa6404a95a	refactor config loading - improve DRY - alter logging - allow extensibility - add foundation for environment variables as config	2024-09-04 12:22:49 +01:00
kingbri	4aebe8a2a5	Config: Use an explicit "auto" value for rope_alpha Using "auto" for rope alpha removes ambiguity on how to explicitly enable automatic rope calculation. The same behavior of None -> auto calculate still exists, but can be overwritten if a model's tabby_config.yml includes `rope_alpha`. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	a96fa5f138	API: Don't fallback to default values on model load request It's best to pass them down the config stack. API/User config.yml -> model config.yml -> model config.json -> fallback. Doing this allows for seamless flow and yielding control to each member in the stack. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	dd55b99af5	Model: Store directory paths Storing a pathlib type makes it easier to manipulate the model directory path in the long run without constantly fetching it from the config. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	21712578cf	API: Add allowed_tokens support This is the opposite of banned tokens. Exllama specific implementation of #181. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-29 21:44:42 -04:00
kingbri	871c89063d	Model: Add Tensor Parallel support Use the tensor parallel loader when the flag is enabled. The new loader has its own autosplit implementation, so gpu_split_auto isn't valid here. Also make it easier to determine which cache type to use rather than multiple if/else statements. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-22 14:15:19 -04:00
kingbri	a51acb9db4	Templates: Switch to async jinja engine This prevents any possible blocking of the event loop due to template rendering. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 12:03:41 -04:00
kingbri	b4752c1e62	Templates: Revert to load metadata on runtime Metadata is generated via a template's module. This requires a single iteration through the template. If a template tries to access a passed variable that doesn't exist, it will error. Therefore, generate the metadata at runtime to prevent these errors from happening. To optimize further, cache the metadata after the first generation to prevent the expensive call of making a template module. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 11:44:42 -04:00
Ben Gitter	70b9fc95de	[WIP] OpenAI Tools Support/Function calling (#154) * returning stop str if exists from gen * added chat template for firefunctionv2 * pulling tool vars from template * adding parsing for tool inputs/outputs * passing tool data from endpoint to chat template, adding tool_start to the stop list * loosened typing on the response tool call, leaning more on the user supplying a quality schema if they want a particular format * non streaming generation prototype * cleaning template * Continued work with type, ingestion into template, and chat template for fire func * Correction - streaming toolcall comes back as delta obj not inside chatcomprespchoice per chat_completion_chunk.py inside OAI lib. * Ruff Formatting * Moved stop string and tool updates out of prompt creation func Updated tool pydantic to match OAI Support for streaming Updated generate tool calls to use flag within chat_template and insert tool reminder * Llama 3.1 chat templates Updated fire func template * renamed llama3.1 to chatml_with_headers.. * update name of template * Support for calling a tool start token rather than the string. Simplified tool_params Warning when gen_settings are being overridden because user set temp to 0 Corrected schema and tools to correct types for function args. Str for some reason * draft groq tool use model template * changed headers to vars for readability (but mostly because some models are weird about newlines after headers, so this is an easier way to change globally) * Clean up comments and code in chat comp * Post processed tool call to meet OAI spec rather than forcing model to write json in a string in the middle of the call. * changes example back to args as json rather than string of json * Standardize chat templates to each other * cleaning/rewording * stop elements can also be ints (tokens) * Cleaning/formatting * added special tokens for tools and tool_response as specified in description * Cleaning * removing aux templates - going to live in llm-promp-templates repo instead * Tree: Format Signed-off-by: kingbri <bdashore3@proton.me> * Chat Completions: Don't include internal tool variables in OpenAPI Use SkipJsonSchema to suppress inclusion with the OpenAPI JSON. The location of these variables may need to be changed in the future. Signed-off-by: kingbri <bdashore3@proton.me> * Templates: Deserialize metadata on template load Since we're only looking for specific template variables that are static in the template, it makes more sense to render when the template is initialized. Signed-off-by: kingbri <bdashore3@proton.me> * Tools: Fix comments Adhere to the format style of comments in the rest of the project. Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: Ben Gitter <gitterbd@gmail.com> Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 00:16:25 -04:00
kingbri	685e3836e9	Args: Add api-servers to parser Also run OpenAPI export after args/config are parsed. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-08 16:32:29 -04:00
kingbri	b6d2676f1c	Start: Give the user a hint when a module can't be imported If an ImportError or ModuleNotFoundError is raised, tell the user to run the update scripts. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 21:59:06 -04:00
kingbri	2a33ebbf29	Model: Bypass lock checks when shutting down Previously, when a SIGINT was emitted and a model load is running, the API didn't shut down until the load finished due to waiting for the lock. However, when shutting down, the lock doesn't matter since the process is being killed anyway. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 16:05:34 -04:00
kingbri	7bf2b07d4c	Signals: Exit on async cleanup The async signal exit function should be the internal for exiting the program. In addition, prevent the handler from being called twice by adding a boolean. May become an asyncio event later on. In addition, make sure to skip_wait when running model.unload. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-02 15:11:57 -04:00
kingbri	3e42211c3e	Config: Embeddings: Make embeddings_device a default when API loading When loading from the API, the fallback for embeddings_device will be the same as the config. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 13:59:49 -04:00
kingbri	0bcb4e4a7d	Model: Attach request ID to logs If multiple logs come in at once, track which log corresponds to which request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 00:25:54 -04:00
Brian Dashore	1bf062559d	Merge pull request #158 from AlpinDale/embeddings feat: add embeddings support via Infinity-emb	2024-07-31 20:33:12 -04:00
kingbri	dc3dcc9c0d	Embeddings: Update config, args, and parameter names Use embeddings_device as the parameter for device to remove ambiguity. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:32:26 -04:00
kingbri	bfa011e0ce	Embeddings: Add model management Embedding models are managed on a separate backend, but are run in parallel with the model itself. Therefore, manage this in a separate container with separate routes. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:19:27 -04:00

275 commits