A chat completion request can now declare extra template_vars to pass
when a template is rendered, opening up the possibility of using state
outside of HuggingFace's parameters.
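For example (a sketch; the endpoint and the extra variable names are
illustrative assumptions):

    import requests

    payload = {
        "messages": [{"role": "user", "content": "Hello!"}],
        # Forwarded to the Jinja template when it renders
        "template_vars": {"persona": "pirate"},
    }
    requests.post("http://localhost:5000/v1/chat/completions", json=payload)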
Signed-off-by: kingbri <bdashore3@proton.me>
response_format allows a user to request a valid but otherwise arbitrary
JSON object from the API. This is a new part of the OAI spec.
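A sketch of a client using it (the URL is an assumption; the
response_format shape follows the OAI spec):

    import requests

    payload = {
        "messages": [{"role": "user", "content": "List three colors as JSON."}],
        # Asks the server to return a valid JSON object
        "response_format": {"type": "json_object"},
    }
    requests.post("http://localhost:5000/v1/chat/completions", json=payload)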
Signed-off-by: kingbri <bdashore3@proton.me>
The wrong class attribute name was used for max_attention_size. Also
fix the declaration of the draft model's chunk_size.
Also expose the parameter to the end user in both config and model
load.
Signed-off-by: kingbri <bdashore3@proton.me>
HuggingFace updated transformers to provide templates in a list for
tokenizers. Update to support this new format. Providing a template's
name as the "prompt_template" value in config.yml will now also search
the template list.
In addition, log if there's a template exception, but continue model
loading, since a bad template shouldn't shut down the application.
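A sketch of the name lookup, assuming the tokenizer config stores
chat_template either as a plain string or as a list of name/template
entries (the helper name is illustrative):

    def find_prompt_template(chat_template, name):
        """Resolve a template by name from the new list format."""
        if isinstance(chat_template, str):
            return chat_template
        if isinstance(chat_template, list):
            for entry in chat_template:
                if entry.get("name") == name:
                    return entry.get("template")
        return None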
Signed-off-by: kingbri <bdashore3@proton.me>
Template modules evaluate all set vars, including ones that reference
runtime vars. If a template var is set from a runtime var and a module
is created, an UndefinedError fires.
Use make_module instead to pass runtime vars when creating a template
module.
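A minimal repro and fix using Jinja2's make_module (the variable names
are illustrative):

    from jinja2 import Environment

    env = Environment()
    template = env.from_string("{% set greeting = 'Hi ' + user %}")

    # template.module evaluates the set block with no variables, so the
    # concatenation with the undefined `user` raises UndefinedError.
    # make_module passes the runtime vars up front instead:
    module = template.make_module(vars={"user": "kingbri"})
    print(module.greeting)  # Hi kingbri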
Resolves #92
Signed-off-by: kingbri <bdashore3@proton.me>
Additive is used to merge collections together. Currently it's used
for lists, but it can be extended to dictionaries in the future.
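A sketch of the merge behavior (the function name is illustrative):

    def merge_override(existing, override, additive=False):
        """Apply an override; with additive=True, lists are concatenated
        instead of replaced. Dictionaries could be merged the same way
        in the future."""
        if additive and isinstance(existing, list) and isinstance(override, list):
            return existing + override
        return override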
Signed-off-by: kingbri <bdashore3@proton.me>
Move the inner workings into the inner function. Also fix an edge case
where stop strings can arrive as a single string rather than an array.
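A sketch of the coercion (the function name is illustrative):

    def coerce_stop_strings(stop):
        """Normalize the stop parameter to a list of strings."""
        if isinstance(stop, str):
            return [stop]
        return stop or []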
Signed-off-by: kingbri <bdashore3@proton.me>
Adding the stop_strings var to chat templates allows the template
creator to specify stopping strings to add onto chat completions.
These get appended to any existing stopping strings passed in the API
request. However, a sampler override with force: true will override
all stopping strings.
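A sketch of reading the var from a template module (the template body
is illustrative):

    from jinja2 import Environment

    env = Environment()
    template = env.from_string("{% set stop_strings = ['</s>', '<|user|>'] %}")
    module = template.make_module()

    request_stops = ["\n\n"]  # stops passed in the API request
    # Template stops are appended to the request's stops
    all_stops = request_stops + list(module.stop_strings)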
Signed-off-by: kingbri <bdashore3@proton.me>
Previously, some request errors were only sent to the client, but some
clients don't log the full error, so also log it to the console.
Signed-off-by: kingbri <bdashore3@proton.me>
This dependency is used for some models and isn't too big (compared to
other HuggingFace dependencies), so include it by default.
Signed-off-by: kingbri <bdashore3@proton.me>
This spams warn statements on SIGINT. The message also gets printed on
a normal shutdown (one that isn't in the middle of a request).
Signed-off-by: kingbri <bdashore3@proton.me>
This works the same way as streaming gens. If the request is cancelled,
log an error to the user and release the semaphore if it's holding
anything.
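A minimal sketch of the guard, assuming an asyncio semaphore gates
generation (the inner call is hypothetical):

    import asyncio
    import logging

    logger = logging.getLogger(__name__)

    async def generate_completion(prompt, semaphore: asyncio.Semaphore):
        async with semaphore:  # released even if the task is cancelled
            try:
                return await run_generation(prompt)  # hypothetical inner call
            except asyncio.CancelledError:
                logger.error("Completion request was cancelled by the client.")
                raise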
Signed-off-by: kingbri <bdashore3@proton.me>
Some tensors were being taken out of inference mode during each
iteration of exllama's load_autosplit_gen. This caused errors since
autograd is off.
Therefore, give the shared load_gen_sync function an overarching
inference_mode context to prevent forward-pass issues. This allows the
generator to keep iterating across each thread call.
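A sketch of the idea, not the exact function (the generator argument
stands in for exllama's load_autosplit_gen):

    import torch

    def load_gen_sync(load_gen):
        """Drive the loading generator under one overarching
        inference-mode context so tensors never re-enter autograd."""
        with torch.inference_mode():
            for _ in load_gen:
                pass  # each step loads and forwards part of the model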
Signed-off-by: kingbri <bdashore3@proton.me>
finish_reason was being given an empty offset. Fix this by grabbing the
finish reason first and then handling the static generation as normal.
Signed-off-by: kingbri <bdashore3@proton.me>
There is no platform-agnostic way to fetch CUDA/ROCm's versions since
environment variables change, and users don't necessarily need CUDA or
ROCm installed to run PyTorch (PyTorch installs the necessary libs if
they don't exist).
Therefore, prompt the user for their GPU lib and store the result in a
text file so the user doesn't need to constantly enter a preference.
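A sketch of the preference cache (the filename and prompt text are
illustrative assumptions):

    from pathlib import Path

    PREF_FILE = Path("gpu_lib.txt")

    def get_gpu_lib() -> str:
        if PREF_FILE.exists():
            return PREF_FILE.read_text().strip()
        choice = input("GPU library to use (cuda/rocm): ").strip().lower()
        PREF_FILE.write_text(choice)
        return choice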
Signed-off-by: kingbri <bdashore3@proton.me>
Some tokenizer variables don't get cleaned up on init, so they can
persist. Clean them up manually before creating a new tokenizer for
now.
Signed-off-by: kingbri <bdashore3@proton.me>
When the model is processing a prompt, add the ability to abort on
request cancellation. This also acts as a catch for a SIGINT.
Signed-off-by: kingbri <bdashore3@proton.me>
This file will manage dependencies from now on since it's more flexible
and similar to the manifests used by other packaging tools like npm and
cargo.
Signed-off-by: kingbri <bdashore3@proton.me>
Yielding the finish reason before the logging causes the function to
terminate early, since the consumer stops iterating once it sees the
finish chunk. Instead, log before yielding and breaking out of the
generation loop.
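A sketch of the ordering (names are illustrative; log_response is a
hypothetical helper):

    async def stream_generation(gen):
        async for chunk in gen:
            if chunk.get("finish_reason"):
                log_response(chunk)  # log before the final yield
                yield chunk
                break
            yield chunk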
Signed-off-by: kingbri <bdashore3@proton.me>