jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	0e015ad58e	Dependencies: Update ExllamaV2 v0.0.20 ROCm 6.0 is now required Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-28 11:06:59 -04:00
kingbri	3de93d7c0a	Dependencies: Update torch v2.3.0 NOTE: ROCm is updated to v6.0 wheels Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-28 11:06:17 -04:00
kingbri	4daa6390a5	Dependencies: Unpin lm-format-enforcer It should be fine to use the stable version from now on. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-28 11:06:17 -04:00
kingbri	6f9da97114	API: Add banned_tokens Appends the banned tokens to the generation. This is equivalent to setting logit bias to -100 on a specific set of tokens. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-28 11:06:09 -04:00
kingbri	5750826120	Model: Remove extraneous print Was printing IDs by accident. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-25 18:49:09 -04:00
kingbri	fb1d2f34c1	OAI: Add response_prefix and fix BOS token issues in chat completions response_prefix is used to add a prefix before generating the next message. This is used in many cases such as continuing a prompt (see #96). Also if a template has BOS token specified, add_bos_token will append two BOS tokens. Add a check which strips a starting BOS token from the prompt if it exists. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-25 00:54:43 -04:00
kingbri	ed7cd3cb59	Network: Fix socket check timeout Make this a one second timeout to check if a socket is connected. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-22 21:33:41 -04:00
kingbri	1e56d43772	Dependencies: Update lm-format-enforcer v0.9.8 Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-22 21:33:28 -04:00
kingbri	88b0b6f4f1	Model: Cast autosplit_reserve to int Torch errors if float values are passed (because bytes are not float types). Therefore, overestimate and cast to an int type. Resolves #97 Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-21 23:49:01 -04:00
kingbri	cab789e685	Templates: Migrate to class Having many utility functions for initialization doesn't make much sense. Instead, handle anything regarding template creation inside the class which reduces the amount of function imports. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-21 23:28:14 -04:00
kingbri	9f93505bc1	OAI: Add skip_special_tokens parameter Allows the ability to decode special tokens if the user wishes. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-21 00:37:46 -04:00
kingbri	67f061859d	Tree: Add transformers_utils Part of commit `8824ea0205` Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-20 00:07:39 -04:00
kingbri	8824ea0205	Model: Add EOS token support from generation_config.json GenerationConfig is meant to override various parts of the model on generation within the transformers lib. Rather than implementing the entire GenerationConfig framework (since it's pretty redundant), add in multi eos_token support like VLLM. The GenerationConfig is used only for generation, but can be used for other uses if needed. If there's more necessary parameters in the future, add those in as well. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-19 22:52:32 -04:00
kingbri	933c5afef0	Dependencies: Update ExllamaV2 and lm-format-enforcer ExllamaV2: v0.0.19 lmfe: v0.9.6 Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-19 21:15:50 -04:00
kingbri	65871ebc0c	Docker: Add var to pull on build When building the Docker container, try pulling from the github repository to get the latest commit. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-19 21:06:34 -04:00
kingbri	209f0370b4	Docker: Switch image and copy config Automatically create a config.yml on build. Also use the cuda runtime image which is much lighter than the previous cuda devel image. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-15 18:01:56 -04:00
Brian Dashore	a2a2e4b866	Merge pull request #94 from pabl-o-ce/docker Dockerfile work with pyproject.toml	2024-04-15 18:01:09 -04:00
kingbri	515b3c2930	OAI: Tokenize chat completion messages Since chat completion messages are a structure, format the prompt before checking in the tokenizer. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-15 14:17:16 -04:00
kingbri	ed05f376d9	Dependencies: Switch to LM-format-enforcer fork LM format enforcer has some latency on token ingestion, so use an optimized fork instead. Also add this in as a base dependency since the size is small. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-14 11:59:49 -04:00
kingbri	3d14283fe0	Start: Lint Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-13 12:25:41 -04:00
kingbri	4d158dac90	Start: Fix when reading from gpu_lib file The wrong variable was being set, so fix that. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-13 12:24:30 -04:00
kingbri	2a0aaa2e8a	OAI: Add ability to pass extra vars in jinja templates A chat completion can now declare extra template_vars to pass when a template is rendered, opening up the possibility of using state outside of huggingface's parameters. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-11 09:49:25 -04:00
kingbri	b1f3baad74	OAI: Add response_format parameter response_format allows a user to request a valid, but arbitrary JSON object from the API. This is a new part of the OAI spec. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-09 21:33:31 -04:00
kingbri	de41e9f7e9	Start: Add gpu_lib argument Argument to override the selected GPU library. Useful for daemonization when running for the first time. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-08 23:33:19 -04:00
kingbri	d759a15559	Model: Fix chunk size handling Wrong class attribute name used for max_attention_size and fixes declaration of the draft model's chunk_size. Also expose the parameter to the end user in both config and model load. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-07 18:39:19 -04:00
kingbri	30c4554572	Requirements: Update Exllamav2 v0.0.18 Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-07 18:00:56 -04:00
kingbri	46ac3beea9	Templates: Support list style chat_template keys HuggingFace updated transformers to provide templates in a list for tokenizers. Update to support this new format. Providing the name of a template for the "prompt_template" value in config.yml will also look inside the template list. In addition, log if there's a template exception, but continue model loading since it shouldn't shut down the application. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-07 11:20:25 -04:00
kingbri	5bb4995a7c	API: Move OAI to APIRouter This makes the API more modular for other API implementations in the future. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-06 01:25:31 -04:00
kingbri	8bdc19124f	Start: Fix gpu lib when reading from file Readline doesn't strip out newlines or spaces. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-02 22:04:01 -04:00
Brian Dashore	cdb96e4f74	Merge pull request #93 from AlpinDale/chore/log-level chore: make log level configurable via env variable	2024-04-02 00:52:06 -04:00
kingbri	f9f8c97c6d	Templates: Fix stop_string parsing Template modules grab all set vars, including ones that use runtime vars. If a template var is set to a runtime var and a module is created, an UndefinedError fires. Use make_module instead to pass runtime vars when creating a template module. Resolves #92 Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-02 00:44:04 -04:00
PΔBLØ ᄃΞ	8a5a82baec	Update Dockerfile remove unnecessary apt install to just use one	2024-04-01 22:27:11 -05:00
PΔBLØ ᄃΞ	85271e2b7d	fix: Dockerfile work on pyproject.toml	2024-04-01 19:32:42 -05:00
AlpinDale	1650e6e640	ruff	2024-04-01 23:11:30 +00:00
AlpinDale	5e599ddbd4	typo	2024-04-01 23:08:28 +00:00
AlpinDale	6c4a1a9c70	make log level a global var	2024-04-01 23:07:30 +00:00
AlpinDale	031349133b	properly order imports	2024-04-01 23:03:16 +00:00
AlpinDale	e90ead3b35	chore: make log level configurable via env variable	2024-04-01 22:57:56 +00:00
kingbri	6ecce1604b	Model: Fix log if exl2 version is too low Switch to pyproject syntax. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-31 23:11:21 -04:00
kingbri	f534930270	Dependencies: Bump Exllamav2 v0.0.17 Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-31 23:10:28 -04:00
kingbri	d716527b92	Sampling: Add additive param to overrides Additive is used to add collections together. Currently, it's used for lists, but it can be used for dictionaries in the future. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-31 01:10:55 -04:00
kingbri	05b5700334	Dependencies: Update torch v2.2.2 Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-30 17:03:37 -04:00
kingbri	5c94894a1a	Dependencies: Update Flash Attention v2.5.6 Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-30 16:58:24 -04:00
kingbri	b11aac51e2	Model: Add torch.inference_mode() to generator function Provides a speedup to model forward. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-30 10:45:28 -04:00
kingbri	e8b6a02aa8	API: Move prompt template construction to utils Best to move the inner workings within its inner function. Also fix an edge case where stop strings can be a string rather than an array. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-29 02:24:13 -04:00
kingbri	190a0b26c3	Model: Fix generation when stream = false References #91. Check if the length of the generation array is > 0 after popping the finish reason. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-29 02:15:56 -04:00
kingbri	d4280e1378	Dependencies: Add pytorch-triton-rocm Required for AMD installs. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-28 11:02:56 -04:00
kingbri	271f5ba7a4	Templates: Modify alpaca and chatml Add the stop_strings metadata parameter. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-27 22:28:41 -04:00
kingbri	dc456f4cc2	Templates: Add stop_strings meta param Adding the stop_strings var to chat templates will allow for the template creator to specify stopping strings to add onto chat completions. These get appended with existing stopping strings that are passed in the API request. However, a sampler override with force: true will override all stopping strings. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-27 22:22:07 -04:00
kingbri	277c540c98	Colab: Update Switch to pyproject Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-24 21:48:48 -04:00


433 commits
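
Commit 6f9da97114 describes banned_tokens as equivalent to setting a logit bias of -100 on a set of tokens. The sketch below illustrates that equivalence only. It is not the repository's implementation: the helper name `banned_tokens_to_logit_bias`, its signature, and the choice to let a ban override an existing bias are assumptions made for this example.

```python
from typing import Dict, Iterable, Optional

# Bias value named in the commit message for a banned token.
BAN_BIAS = -100.0


def banned_tokens_to_logit_bias(
    banned_tokens: Iterable[int],
    logit_bias: Optional[Dict[int, float]] = None,
) -> Dict[int, float]:
    """Merge banned token IDs into a logit-bias mapping.

    Hypothetical helper: each banned token ID is mapped to BAN_BIAS.
    A ban replaces any bias already set for the same token (an
    assumption of this sketch, not something the commit states).
    """
    merged = dict(logit_bias or {})  # copy so the caller's dict is untouched
    for token_id in banned_tokens:
        merged[token_id] = BAN_BIAS
    return merged


if __name__ == "__main__":
    # Token IDs here are arbitrary placeholders.
    print(banned_tokens_to_logit_bias([13, 42], {42: 1.5, 7: -2.0}))
```

Expressing the ban as a plain mapping keeps the example independent of any particular sampler; a real backend may instead mask the banned tokens directly during generation.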