Similar to the transformers library, add an error handler that runs when an
exception is raised. This relays the error to the user.
Signed-off-by: kingbri <bdashore3@proton.me>
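A minimal sketch of the handler, assuming a FastAPI app (the payload shape is illustrative):

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

@app.exception_handler(Exception)
async def relay_exception(request: Request, exc: Exception):
    # Relay the error message to the client instead of a bare 500 page
    return JSONResponse(
        status_code=500,
        content={"error": {"message": str(exc)}},
    )
```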
These are commonly seen in Hugging Face-provided chat templates and
aren't that difficult to add.
For feature parity, honor the add_bos_token and ban_eos_token
parameters when constructing the prompt.
Signed-off-by: kingbri <bdashore3@proton.me>
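A rough sketch of honoring both flags, assuming the tokenizer exposes bos_token/eos_token_id and the sampler settings support disallow_tokens (as exllamav2's do); the helper name is hypothetical:

```python
def apply_token_flags(prompt: str, params, tokenizer, gen_settings):
    # Prepend the BOS string only when the client asks for it
    if params.add_bos_token:
        prompt = tokenizer.bos_token + prompt

    # Banning EOS keeps the model generating until max_tokens is hit
    if params.ban_eos_token:
        gen_settings.disallow_tokens(tokenizer, [tokenizer.eos_token_id])

    return prompt
```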
This creates a massive security hole, but it's gated behind a flag
for users who only use localhost.
A warning is printed when users disable authentication.
Signed-off-by: kingbri <bdashore3@proton.me>
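Roughly, the gate looks like this (the flag name is illustrative):

```python
import logging

logger = logging.getLogger(__name__)

def check_auth_config(disable_auth: bool):
    # Loudly warn anyone who turns authentication off
    if disable_auth:
        logger.warning(
            "Authentication is disabled! This is a security risk; "
            "only run this way on localhost or a trusted network."
        )
```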
Fix redundant code when loading templates. However, loading
a template from config.json may be a mistake, since tokenizer_config.json
is the main place where chat templates are stored.
Signed-off-by: kingbri <bdashore3@proton.me>
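A sketch of the deduplicated lookup, assuming templates live under the chat_template key as in HF tokenizer configs (the helper name is hypothetical):

```python
import json
import pathlib

def read_template_from_json(model_dir: str, filename: str = "tokenizer_config.json"):
    # tokenizer_config.json is the canonical home of HF chat templates
    path = pathlib.Path(model_dir) / filename
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("chat_template")
```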
Non-streaming tasks were not regulated by the semaphore, causing them
to interfere with streaming generations. Add helper functions that
accept both sync and async callbacks and block sequentially on the
semaphore.
Signed-off-by: kingbri <bdashore3@proton.me>
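A minimal sketch of the helper (names are mine):

```python
import asyncio
import inspect

generate_semaphore = asyncio.Semaphore(1)

async def call_with_semaphore(callback):
    # Queue streaming and non-streaming generations behind the same
    # semaphore so they can't interleave on the GPU
    async with generate_semaphore:
        result = callback()

        # Accept plain functions as well as coroutine functions
        if inspect.isawaitable(result):
            result = await result

        return result
```

Callers wrap the generation in a lambda, e.g. `await call_with_semaphore(lambda: model.generate(prompt))`.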
When stream is false, the generation can be empty, which means
there are no chunks present in the final generation array, causing
an error.
Instead, return a dummy value if the generation is falsy (an empty
array or None).
Signed-off-by: kingbri <bdashore3@proton.me>
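Something like the following (field names are illustrative):

```python
def finalize_generation(generations):
    # stream=False can legitimately produce nothing; return a dummy
    # chunk instead of indexing into an empty list
    if not generations:  # covers both None and []
        return {"text": "", "prompt_tokens": 0, "generated_tokens": 0}

    return generations[-1]
```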
Some models (such as Mistral and Mixtral) set their base sequence
length to 32k because they assume support for sliding window
attention.
Therefore, add this parameter to override the base sequence length
of a model, which helps with the auto-calculation of RoPE alpha.
If auto-calculation of RoPE alpha isn't being used, the max_seq_len
parameter works fine as is.
Signed-off-by: kingbri <bdashore3@proton.me>
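For reference, a sketch of the auto-calculation; the quadratic fit below is one that circulates in the exllama community and may not match this repo's exact coefficients:

```python
def calculate_rope_alpha(base_seq_len: int, target_seq_len: int) -> float:
    # NTK-aware alpha grows with how far past the base length we stretch
    ratio = target_seq_len / base_seq_len
    if ratio <= 1:
        return 1.0
    return -0.13436 + 0.80541 * ratio + 0.28833 * ratio**2
```

Overriding the base length (e.g. 4096 instead of 32k for Mistral) changes the ratio, and therefore the alpha, without touching max_seq_len.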
Previously, the max sequence length was overridden by the user's
config and never took the model's config.json into account.
Now, set the default to 4096, but include config.prepare when
selecting the max sequence length. The YAML and API request values
now serve as overrides rather than the sole source.
Signed-off-by: kingbri <bdashore3@proton.me>
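A sketch of the fallback chain, assuming exllamav2's config API (user_max_seq_len stands in for the YAML/API value):

```python
from exllamav2 import ExLlamaV2Config

config = ExLlamaV2Config()
config.model_dir = str(model_path)
config.prepare()  # populates config.max_seq_len from the model's config.json

# User values override the model's own value; fall back to 4096 only
# if neither source provides one
config.max_seq_len = user_max_seq_len or config.max_seq_len or 4096
```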
Let the user know when a file-not-found error (OSError) occurs, and
print the applied template on model load.
Also fix some remaining references to Fastchat.
Signed-off-by: kingbri <bdashore3@proton.me>
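Roughly (the loader and logging are illustrative):

```python
try:
    template = load_template(template_path)  # hypothetical loader
    print(f"Using chat template: {template_path.name}")
except OSError:
    print(f"Chat template file not found: {template_path}")
    template = None
```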
Use exllamav2's token bias, which is the functional equivalent of
OAI's logit bias parameter.
Strings are cast to integers on request, and an error is raised if
an invalid value is passed.
Signed-off-by: kingbri <bdashore3@proton.me>
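A sketch of the cast, assuming the sampler accepts a per-token bias tensor over the vocabulary (as exllamav2's token_bias does); the helper name is mine:

```python
import torch

def build_token_bias(logit_bias: dict, vocab_size: int) -> torch.Tensor:
    # OAI clients send {"token_id": bias} with string keys
    bias = torch.zeros(vocab_size)

    for token, value in logit_bias.items():
        try:
            bias[int(token)] = value
        except ValueError as exc:
            raise ValueError(f"Invalid logit bias token id: {token}") from exc

    return bias
```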
Append a generation prompt if the flag is given on an OAI chat
completion request.
This appends the "assistant" message to the instruct prompt. Defaults
to true since this is the intended behavior.
Signed-off-by: kingbri <bdashore3@proton.me>
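In HF-style templates this is just a render variable; a sketch with illustrative names:

```python
prompt = template.render(
    messages=messages,
    bos_token=bos_token,
    eos_token=eos_token,
    # When true, the template emits the empty assistant turn that the
    # model is expected to complete
    add_generation_prompt=request.add_generation_prompt,
)
```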
OSError means that a file wasn't found, so the auth tokens should
be regenerated. Otherwise, raise the error and exit.
Signed-off-by: kingbri <bdashore3@proton.me>
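The shape of the fix (filename and token layout are illustrative):

```python
import secrets
import yaml

try:
    with open("api_tokens.yml") as auth_file:
        keys = yaml.safe_load(auth_file)
except OSError:
    # File missing or unreadable: regenerate a fresh set of tokens
    keys = {"api_key": secrets.token_hex(16), "admin_key": secrets.token_hex(16)}
    with open("api_tokens.yml", "w") as auth_file:
        yaml.safe_dump(keys, auth_file)
```

Any other exception propagates and exits as before.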
Validation wasn't properly run on older Pydantic versions, so a
ChatCompletionRespChoice was being sent instead of a
ChatCompletionMessage when streaming responses.
Signed-off-by: kingbri <bdashore3@proton.me>
Adding field descriptions shows which parameters are used solely for
OAI compliance and aren't actually parsed in the model code.
Signed-off-by: kingbri <bdashore3@proton.me>
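For example (field names are illustrative):

```python
from typing import Optional

from pydantic import BaseModel, Field

class ChatCompletionRequest(BaseModel):
    # Parsed and forwarded to the generator
    max_tokens: int = 150

    # Accepted for OAI spec compliance, but never read by the model code
    user: Optional[str] = Field(
        default=None,
        description="Unused parameter. Present for OAI compliance only.",
    )
```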
Jinja2 is a lightweight template parser that's used in Transformers
for rendering chat templates. It's much more efficient than Fastchat
and can be imported as part of the requirements.
This also allows Pydantic's version to be unpinned.
Users now have to provide their own template if needed. A separate
repo may be usable for common prompt template storage.
Signed-off-by: kingbri <bdashore3@proton.me>
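Loading a user-provided template then becomes a few lines of stock Jinja2 (the function name is mine):

```python
import pathlib

from jinja2 import Environment

def load_chat_template(path: pathlib.Path):
    # Compile the template file once so renders are cheap
    env = Environment(trim_blocks=True, lstrip_blocks=True)
    return env.from_string(path.read_text())
```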
New parameter that's safe to edit in exllamav2 v0.0.11. Only recommended
for people who know what they're doing.
Signed-off-by: kingbri <bdashore3@proton.me>
RoPE alpha changes don't require removing the 1.0 default
from RoPE scale.
Keep defaults when possible to avoid errors.
Signed-off-by: kingbri <bdashore3@proton.me>