jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
DocShotgun	abe411c6fb	API + Model: Add support for regex pattern constraints Adds the ability to constrain generation via regex pattern using lm-format-enforcer.	2024-05-12 19:10:43 -07:00
Ycros	57525219d0	Fix: Properly handle banned_strings and decode_special tokens (#104) * Fix: Actually pass banned_strings to the generation call. * decode_special_tokens was missing as well. * syntax	2024-05-12 20:47:45 +00:00
Brian Dashore	611f00818b	Merge pull request #103 from DocShotgun/main Minor fixes for sampler override	2024-05-12 16:47:12 -04:00
DocShotgun	dad34237ba	Samplers: Add example override for generate_window	2024-05-12 00:39:01 -07:00
DocShotgun	9463ecfa40	Samplers: Minor fixes for sampler override * Add missing settings to sample_preset.yml * Fix override for skip_special_tokens	2024-05-12 00:31:31 -07:00
kingbri	c8ec742be9	Samplers: Expose skew sampling Skew is an extra unused sampler in ExllamaV2. Add it in for coverage. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-12 01:41:01 -04:00
kingbri	6f4012d20d	API: Add preset listing for sampler overrides Querying the overrides list endpoint now returns the selected preset and a list of presets to use. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-12 01:34:51 -04:00
kingbri	b4bc941cbe	Tree: Lint Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-11 22:42:39 -04:00
kingbri	2da3fb2caf	Start: Bump ROCm error version ROCm support is for 6.0 now. Update that. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-11 21:57:51 -04:00
kingbri	7bebc085ec	Model: Remove legacy checks v0.0.21 has these features implemented. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-11 19:26:23 -04:00
kingbri	cd78728a77	Dependencies: Update ExllamaV2 v0.0.21 Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-11 19:26:03 -04:00
Brian Dashore	5432f523cb	Merge pull request #102 from DocShotgun/main Add support for min_tokens and banned_strings	2024-05-10 21:21:57 -04:00
kingbri	366d57cf45	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-10 21:20:41 -04:00
kingbri	7eee936a3f	Model: Remove old code and fix API handling skip_special_tokens is in stable exl2. Also default the parameters if they are not present in the function signature. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-10 21:20:00 -04:00
DocShotgun	c0b631ba92	API: Add banned_strings From exllamav2: List of strings that the generator will refuse to output. As soon as a partial match happens, a checkpoint is saved that the generator can rewind to if need be. Subsequent tokens are then held until the full string is resolved (match or no match) and either emitted or discarded, accordingly.	2024-05-10 13:53:55 -07:00
DocShotgun	a1df22668b	API: Add min_tokens Bans the EOS token until the generation reaches a minimum length. This will not prevent the model from otherwise ending the generation early by outputting other stop conditions.	2024-05-10 12:30:17 -07:00
Brian Dashore	643b53e347	Create FUNDING.yml Add ko-fi link. Signed-off-by: kingbri <bdashore3@gmail.com>	2024-05-09 19:00:41 +00:00
Brian Dashore	c4f7af160e	Merge pull request #101 from Bakharovsky/fix_exllamav2_cuda_version Fix: the link to the exllamav2 build for cuda 11.8	2024-05-08 16:32:22 -04:00
Arseniy Bakharovsky	33c86be45c	Update pyproject.toml	2024-05-08 03:31:15 +04:00
kingbri	ae879a623f	Main: Add await to an async function load_loras wasn't properly updated. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-02 21:24:43 -04:00
kingbri	ab526f7278	Revert "API: Remove unnecessary Optional signatures" This reverts commit `7556dcf134`. The Optionals allowed requests to send "null" in the body for optional parameters which should be allowed. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-02 21:23:48 -04:00
kingbri	7556dcf134	API: Remove unnecessary Optional signatures Optional isn't necessary if the function signature has a default value. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-01 00:04:52 -04:00
kingbri	ae75db1829	Downloader: Cleanup on exception Otherwise a file exists error will show up if any exception happens but cancel. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-30 23:26:22 -04:00
kingbri	e4084b15c1	Downloader: Format Make a public function private. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-30 01:16:57 -04:00
kingbri	50e0b71690	Downloader: Fix handling of include pattern If an include or exclude pattern is provided, include should include all files by default. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-30 01:13:06 -04:00
kingbri	21a01741c9	Downloader: Add include and exclude parameters These both take an array of glob strings to state what files or directories to include or exclude when parsing the download list. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-30 00:58:54 -04:00
kingbri	c47869c606	Downloader: Fix fallback mechanisms Use None-ish coalescing instead of unwrap optional handling. This means that any value that is "empty" for python will default to the fallback. Ex. print("" or "test") will print out "test" Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-29 23:33:37 -04:00
kingbri	55ccd1baad	API: Add HuggingFace downloader Adds an asynchronous huggingface downloader that uses HF hub to fetch all repo files. The current HF hub package has a snapshot_download function that does not cancel on KeyboardInterrupt. Instead, make a downloader that uses the Rich progress bar styling along with a cancellable interface. Finally, link this to TabbyAPI. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-29 01:15:02 -04:00
kingbri	6114bfd221	API: Fix banned_tokens string when empty The string should not be parsed and any non-string elements should be removed as well. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-28 12:46:28 -04:00
kingbri	72dff0b6d5	Update README Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-28 11:26:20 -04:00
kingbri	fb01b164d8	Dependencies: Update flash attention 2 v2.5.8 Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-28 11:07:00 -04:00
kingbri	0e015ad58e	Dependencies: Update ExllamaV2 v0.0.20 ROCm 6.0 is now required Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-28 11:06:59 -04:00
kingbri	3de93d7c0a	Dependencies: Update torch v2.3.0 NOTE: ROCm is updated to v6.0 wheels Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-28 11:06:17 -04:00
kingbri	4daa6390a5	Dependencies: Unpin lm-format-enforcer It should be fine to use the stable version from now on. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-28 11:06:17 -04:00
kingbri	6f9da97114	API: Add banned_tokens Appends the banned tokens to the generation. This is equivalent of setting logit bias to -100 on a specific set of tokens. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-28 11:06:09 -04:00
kingbri	5750826120	Model: Remove extraneous print Was printing IDs by accident. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-25 18:49:09 -04:00
kingbri	fb1d2f34c1	OAI: Add response_prefix and fix BOS token issues in chat completions response_prefix is used to add a prefix before generating the next message. This is used in many cases such as continuing a prompt (see #96). Also if a template has BOS token specified, add_bos_token will append two BOS tokens. Add a check which strips a starting BOS token from the prompt if it exists. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-25 00:54:43 -04:00
kingbri	ed7cd3cb59	Network: Fix socket check timeout Make this a one second timeout to check if a socket is connected. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-22 21:33:41 -04:00
kingbri	1e56d43772	Dependencies: Update lm-format-enforcer v0.9.8 Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-22 21:33:28 -04:00
kingbri	88b0b6f4f1	Model: Cast autosplit_reserve to int Torch errors if float values are passed (because bytes are not float types). Therefore, overestimate and cast to an int type. Resolves #97 Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-21 23:49:01 -04:00
kingbri	cab789e685	Templates: Migrate to class Having many utility functions for initialization doesn't make much sense. Instead, handle anything regarding template creation inside the class which reduces the amount of function imports. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-21 23:28:14 -04:00
kingbri	9f93505bc1	OAI: Add skip_special_tokens parameter Allows the ability to decode special tokens if the user wishes. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-21 00:37:46 -04:00
kingbri	67f061859d	Tree: Add transformers_utils Part of commit `8824ea0205` Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-20 00:07:39 -04:00
kingbri	8824ea0205	Model: Add EOS token support from generation_config.json GenerationConfig is meant to override various parts of the model on generation within the transformers lib. Rather than implementing the entire GenerationConfig framework (since it's pretty redundant), add in multi eos_token support like VLLM. The GenerationConfig is used only for generation, but can be used for other uses if needed. If there's more necessary parameters in the future, add those in as well. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-19 22:52:32 -04:00
kingbri	933c5afef0	Dependencies: Update ExllamaV2 and lm-format-enforcer ExllamaV2: v0.0.19 lmfe: v0.9.6 Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-19 21:15:50 -04:00
kingbri	65871ebc0c	Docker: Add var to pull on build When building the Docker container, try pulling from the github repository to get the latest commit. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-19 21:06:34 -04:00
kingbri	209f0370b4	Docker: Switch image and copy config Automatically create a config.yml on build. Also use the cuda runtime image which is much lighter than the previous cuda devel image. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-15 18:01:56 -04:00
Brian Dashore	a2a2e4b866	Merge pull request #94 from pabl-o-ce/docker Dockerfile work with pyproject.toml	2024-04-15 18:01:09 -04:00
kingbri	515b3c2930	OAI: Tokenize chat completion messages Since chat completion messages are a structure, format the prompt before checking in the tokenizer. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-15 14:17:16 -04:00
kingbri	ed05f376d9	Dependencies: Switch to LM-format-enforcer fork LM format enforcer has some latency on token ingestion, so use an optimized fork instead. Also add this in as a base dependency since the size is small. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-14 11:59:49 -04:00
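
Note on commit c47869c606 (Downloader: Fix fallback mechanisms): the message describes "None-ish coalescing", where any value Python treats as empty falls through to the fallback. The sketch below only illustrates that behavior next to a stricter None-only check; the function names are hypothetical and are not taken from the TabbyAPI source.

```python
def coalesce_falsy(value, fallback):
    """Return fallback for any falsy value: None, "", 0, [], {} and so on."""
    return value or fallback


def coalesce_none(value, fallback):
    """Stricter variant for comparison: return fallback only when value is None."""
    return fallback if value is None else value


if __name__ == "__main__":
    # Same case as the commit message: an empty string falls back.
    print(coalesce_falsy("", "test"))
    # The None-only check keeps the empty string instead.
    print(repr(coalesce_none("", "test")))
```

The trade-off is that `or` also replaces legitimate falsy inputs such as `0` or an empty list, so it fits cases where "empty" and "missing" should be treated the same.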

464 commits