jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
TerminalMan	4b11cabbec	debloat docker build	2024-09-08 00:02:00 +01:00
kingbri	d34756dc98	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-05 18:05:59 -04:00
kingbri	2f45e978c5	API: Fix merge overwrite The completions utils did not take the new imports. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-05 18:04:53 -04:00
Brian Dashore	ec7f64d530	Merge pull request #185 from SecretiveShell/refactor-config-loading Refactor config loading	2024-09-05 18:00:32 -04:00
kingbri	1c9991f79e	Config: Format and organize Rename some methods and change comments. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-05 17:59:18 -04:00
Jake	cb91670c7a	fix command line args - move to a complete class singleton to avoid propagation errors - remove legacy load config procedure	2024-09-05 15:33:00 +01:00
kingbri	98768bfa30	Docker: Re-add build block If a user wants to build from source, let them. But the default should fetch from the package registry. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-04 23:39:06 -04:00
kingbri	93872b34d7	Config: Migrate to global class instead of dicts The config categories can have defined separation, but preserve the dynamic nature of adding new config options by making all the internal class vars as dictionaries. This was necessary since storing global callbacks stored a state of the previous global_config var that wasn't populated. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-04 23:18:47 -04:00
Brian Dashore	3bc9bd09a0	Merge pull request #180 from SecretiveShell/main make docker-compose use prebuilt images	2024-09-04 21:48:18 -04:00
Brian Dashore	8524999284	Merge pull request #184 from SecretiveShell/Infinity-Embed-TODO Complete conditional infinity import TODO	2024-09-04 21:47:49 -04:00
Brian Dashore	03ff472149	Merge pull request #130 from bartowski1182/main WIP: Add 'model' argument to /v1/chat/completions to load a new model on the fly	2024-09-04 21:46:41 -04:00
kingbri	9c10789ca1	API: Error on invalid key permissions and cleanup format If a user requesting a model change isn't admin, error. Better to place the load function before the generate functions. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-04 21:44:14 -04:00
Jake	e772fa2981	Switch to internal dict merge implementation - remove deepmerge dependency - fix ruff formatting	2024-09-04 16:27:28 +01:00
Jake	42a42caf43	remove logging - remove logging statements - format code with ruff	2024-09-04 16:14:09 +01:00
Jake	ac4d9bba1c	refactor config functions - improve DRY	2024-09-04 12:49:22 +01:00
Jake	fa6404a95a	refactor config loading - improve DRY - alter logging - allow extensibility - add foundation for environment variables as config	2024-09-04 12:22:49 +01:00
kingbri	21f14d4318	API: Update inline load - Add a config flag - Migrate support to /v1/completions - Unify the load function Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-03 23:37:28 -04:00
kingbri	dd30d6592a	Merge branch 'main' of https://github.com/theroyallab/tabbyapi into inline	2024-09-03 18:03:17 -04:00
kingbri	8854269121	API: Fix current model list return Check if the container actually exists in the match before returning the value of the directory. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-01 10:54:01 -04:00
kingbri	4bf1a71d7b	Model: Fix model override application for draft args These have to be merged beforehand and the updated version needs to be re-fetched. It's possible to prevent the fetch of draft_args in the beginning of init. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	4aebe8a2a5	Config: Use an explicit "auto" value for rope_alpha Using "auto" for rope alpha removes ambiguity on how to explicitly enable automatic rope calculation. The same behavior of None -> auto calculate still exists, but can be overwritten if a model's tabby_config.yml includes `rope_alpha`. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	a96fa5f138	API: Don't fallback to default values on model load request It's best to pass them down the config stack. API/User config.yml -> model config.yml -> model config.json -> fallback. Doing this allows for seamless flow and yielding control to each member in the stack. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	4452d6f665	Model: Add support for overridable model config.yml Like config.json in a model folder, providing a tabby_config.yml will serve as a layer between user provided kwargs and the config.json values. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	dd55b99af5	Model: Store directory paths Storing a pathlib type makes it easier to manipulate the model directory path in the long run without constantly fetching it from the config. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	523709741c	Model: Reorder how configs are set up Initialize the Exllama classes first then add user-specific params. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
TerminalMan	43104e0d19	Complete conditional infinity import TODO - add logging - change declaration order	2024-08-31 21:48:43 +01:00
kingbri	21712578cf	API: Add allowed_tokens support This is the opposite of banned tokens. Exllama specific implementation of #181. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-29 21:44:42 -04:00
kingbri	10d9419f90	Model: Add BOS token to prompt logs If add_bos_token is enabled, the BOS token gets appended to the logged prompt if logging is enabled. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-29 21:15:09 -04:00
TerminalMan	48d7674316	make docker-compose use prebuilt images - Docker compose uses the prebuilt images produced by the GitHub action added in `872eeed581`	2024-08-29 00:50:01 +01:00
kingbri	96fce34253	Dependencies: Update ExllamaV2 v0.2.0 Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-28 18:34:00 -04:00
kingbri	a00d972054	Server: Remove unused comments Leftovers from the new API server log system. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-27 21:45:51 -04:00
kingbri	4958c06813	Model: Remove and format comments The comment in __init__ was outdated and all the kwargs are the config options anyways. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-27 21:43:40 -04:00
TerminalMan	80198ca056	API: Add /v1/health endpoint (#178) * Add healthcheck - localhost only /healthcheck endpoint - cURL healthcheck in docker compose file * Update Healthcheck Response - change endpoint to /health - remove localhost restriction - add docstring * move healthcheck definition to top of the file - make the healthcheck show up first in the openAPI spec * Tree: Format	2024-08-27 21:37:41 -04:00
Amgad Hasan	872eeed581	Build and push docker image (#171) * Create docker-image.yml * Update docker-image.yml	2024-08-26 16:18:10 -04:00
Ben Gitter	045bc98333	Remove rogue print statements within chat_completion.py (#174) * rogue prompt print * remove print pt2 * Print Removal Final	2024-08-23 21:28:37 -04:00
turboderp	fe3253f3a9	Model: Account for tokenizer lazy init	2024-08-23 23:51:53 +02:00
turboderp	a676c4bf38	Model: Formatting	2024-08-23 11:15:30 +02:00
turboderp	a3733caeda	Model: Fix draft model cache initialization	2024-08-23 11:08:49 +02:00
kingbri	364032e39e	Config: Remove development flag from tensor parallel Exists in stable ExllamaV2 version. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-22 14:15:19 -04:00
kingbri	565b0300d6	Dependencies: Update Exllamav2 v0.1.9 Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-22 14:15:19 -04:00
kingbri	078fbf1080	Model: Add quantized cache support for tensor parallel Newer versions of exl2 v1.9-dev have quantized cache implemented. Add those APIs. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-22 14:15:19 -04:00
kingbri	871c89063d	Model: Add Tensor Parallel support Use the tensor parallel loader when the flag is enabled. The new loader has its own autosplit implementation, so gpu_split_auto isn't valid here. Also make it easier to determine which cache type to use rather than multiple if/else statements. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-22 14:15:19 -04:00
kingbri	5002617eac	Model: Split cache creation into a common function Unifies the switch statement across both draft and model caches. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-22 14:15:19 -04:00
kingbri	ecaddec48a	Docker-compose: Add models to bind mounts At least one bind mount is required in the volumes YAML block otherwise the docker build fails. Models should be fine to default since it always exists. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-19 22:07:53 -04:00
Amgad Hasan	dae394050e	Improve docker deployment configuration (#163)	2024-08-18 15:19:18 -04:00
kingbri	a51acb9db4	Templates: Switch to async jinja engine This prevents any possible blocking of the event loop due to template rendering. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 12:03:41 -04:00
kingbri	b4752c1e62	Templates: Revert to load metadata on runtime Metadata is generated via a template's module. This requires a single iteration through the template. If a template tries to access a passed variable that doesn't exist, it will error. Therefore, generate the metadata at runtime to prevent these errors from happening. To optimize further, cache the metadata after the first generation to prevent the expensive call of making a template module. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 11:44:42 -04:00
kingbri	617ac12150	Update README Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 00:35:42 -04:00
Ben Gitter	70b9fc95de	[WIP] OpenAI Tools Support/Function calling (#154) * returning stop str if exists from gen * added chat template for firefunctionv2 * pulling tool vars from template * adding parsing for tool inputs/outputs * passing tool data from endpoint to chat template, adding tool_start to the stop list * loosened typing on the response tool call, leaning more on the user supplying a quality schema if they want a particular format * non streaming generation prototype * cleaning template * Continued work with type, ingestion into template, and chat template for fire func * Correction - streaming toolcall comes back as delta obj not inside chatcomprespchoice per chat_completion_chunk.py inside OAI lib. * Ruff Formatting * Moved stop string and tool updates out of prompt creation func Updated tool pydantic to match OAI Support for streaming Updated generate tool calls to use flag within chat_template and insert tool reminder * Llama 3.1 chat templates Updated fire func template * renamed llama3.1 to chatml_with_headers.. * update name of template * Support for calling a tool start token rather than the string. Simplified tool_params Warning when gen_settings are being overridden because user set temp to 0 Corrected schema and tools to correct types for function args. Str for some reason * draft groq tool use model template * changed headers to vars for readability (but mostly because some models are weird about newlines after headers, so this is an easier way to change globally) * Clean up comments and code in chat comp * Post processed tool call to meet OAI spec rather than forcing model to write json in a string in the middle of the call. * changes example back to args as json rather than string of json * Standardize chat templates to each other * cleaning/rewording * stop elements can also be ints (tokens) * Cleaning/formatting * added special tokens for tools and tool_response as specified in description * Cleaning * removing aux templates - going to live in llm-promp-templates repo instead * Tree: Format Signed-off-by: kingbri <bdashore3@proton.me> * Chat Completions: Don't include internal tool variables in OpenAPI Use SkipJsonSchema to suppress inclusion with the OpenAPI JSON. The location of these variables may need to be changed in the future. Signed-off-by: kingbri <bdashore3@proton.me> * Templates: Deserialize metadata on template load Since we're only looking for specific template variables that are static in the template, it makes more sense to render when the template is initialized. Signed-off-by: kingbri <bdashore3@proton.me> * Tools: Fix comments Adhere to the format style of comments in the rest of the project. Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: Ben Gitter <gitterbd@gmail.com> Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 00:16:25 -04:00
Bartowski	c75e911f07	Merge branch 'main' into main	2024-08-14 16:16:15 -04:00

687 commits