jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
AlpinDale	fa47f51f85	feat: workflows for formatting/linting (#35 ) * add github workflows for pylint and yapf * yapf * docstrings for auth * fix auth.py * fix generators.py * fix gen_logging.py * fix main.py * fix model.py * fix templating.py * fix utils.py * update formatting.sh to include subdirs for pylint * fix model_test.py * fix wheel_test.py * rename utils to utils_oai * fix OAI/utils_oai.py * fix completion.py * fix token.py * fix lora.py * fix common.py * add pylintrc and fix model.py * finish up pylint * fix attribute error * main.py formatting * add formatting batch script * Main: Remove unnecessary global Linter suggestion. Signed-off-by: kingbri <bdashore3@proton.me> * switch to ruff * Formatting + Linting: Add ruff.toml Signed-off-by: kingbri <bdashore3@proton.me> * Formatting + Linting: Switch scripts to use ruff Also remove the file and recent file change functions from both scripts. Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Format and lint Signed-off-by: kingbri <bdashore3@proton.me> * Scripts + Workflows: Format Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Remove pylint flags We use ruff now Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Format Signed-off-by: kingbri <bdashore3@proton.me> * Formatting: Line length is 88 Use the same value as Black. Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Format Update to new line length rules. Signed-off-by: kingbri <bdashore3@proton.me> --------- Authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com> Co-authored-by: kingbri <bdashore3@proton.me>	2023-12-22 16:20:35 +00:00
kingbri	ab10b263fd	Model: Add override base seq len Some models (such as mistral and mixtral) set their base sequence length to 32k due to assumptions of support for sliding window attention. Therefore, add this parameter to override the base sequence length of a model which helps with auto-calculation of rope alpha. If auto-calculation of rope alpha isn't being used, the max_seq_len parameter works fine as is. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-20 00:45:39 -05:00
kingbri	ce2602df9a	Model: Fix max seq len handling Previously, the max sequence length was overriden by the user's config and never took the model's config.json into account. Now, set the default to 4096, but include config.prepare when selecting the max sequence length. The yaml and API request now serve as overrides rather than parameters. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-19 23:37:52 -05:00
kingbri	e895eaa4bd	OAI: Clarify types in docs Adding field descriptions show which parameters are used solely for OAI compliance and not actually parsed in the model code. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	51ca1ff396	Tree: Switch to Pydantic 2 Pydantic 2 has more modern methods and stability compared to Pydantic 1 Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-18 23:53:47 -05:00
kingbri	ad8807a830	Model: Add support for num_experts_by_token New parameter that's safe to edit in exllamav2 v0.0.11. Only recommended for people who know what they're doing. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-17 18:03:01 -05:00
kingbri	70fbee3edd	OAI: Fix model parameter placement Accidentally edited the Model Card parameters vs the model load request ones. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-17 14:36:28 -05:00
kingbri	1d0bdfa77c	Model + OAI: Fix parameter parsing Rope alpha changes don't require removing the 1.0 default from Rope scale. Keep defaults when possible to avoid errors. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-17 14:28:18 -05:00
Veden	3e57125025	OAI: adding optional draft model properties for draft_rope alpha and scale (#28 ) * OAI: adding optional draft model properties for draft_rope alpha and scale * Forgot to set the properties to None	2023-12-17 19:23:45 +00:00
kingbri	1a331afe3a	OAI: Add cache_mode parameter to model Mistakenly forgot that the user can choose what cache mode to use when loading a model. Also add when fetching model info. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-16 02:47:50 -05:00
kingbri	ed868fd262	OAI: Remove unused parameters Seed and low_mem aren't used, so comment them out. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-15 14:56:43 -05:00
kingbri	083df7d585	Tree: Add generation logging support Generations can be logged in the console along with sampling parameters if the user enables it in config. Metrics are always logged at the end of each prompt. In addition, the model endpoint tells the user if they're being logged or not for transparancy purposes. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-12 23:43:35 -05:00
kingbri	db87efde4a	OAI: Add ability to specify fastchat prompt template Sometimes fastchat may not be able to detect the prompt template from the model path. Therefore, add the ability to set it in config.yml or via the request object itself. Also send the provided prompt template on model info request. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 15:43:58 -05:00
kingbri	fd9f3eac87	Model: Add params to current model endpoint Grabs the current model rope params, max seq len, and the draft model if applicable. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-10 00:40:56 -05:00
kingbri	f8e9e22c43	API: Fix model load endpoint with draft Draft wasn't being parsed correctly with the new changes which removed the draft_enabled bool. There's still some more work to be done with returning exceptions. Signed-off-by: kingbri <bdashore3@proton.me>	2023-12-06 18:05:55 -05:00
kingbri	f47919b1d3	API: Add draft model support Models can be loaded with a child object called "draft" in the POST request. Again, models need to be located within the draft model dir to get loaded. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-19 00:32:25 -05:00
kingbri	126afdfdc2	Model: Fix gpu split params GPU split auto is a bool and GPU split is an array of integers for GBs to allocate per GPU. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-15 00:55:15 -05:00
kingbri	4670a77c26	API: Don't use response_class This arg in routes caused many errors and isn't even needed for responses. Signed-off-by: kingbri <bdashore3@proton.me>	2023-11-14 22:09:26 -05:00

18 commits