jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
DocShotgun	767e6a798a	API + Model: Add support for specifying k/v cache size	2024-05-26 14:17:01 -07:00
kingbri	9fbbc5afca	Tree: Swap from map to list comprehensions List comprehensions are the more "pythonic" way to approach mapping values to a list. They're also more flexible across different collection types rather than the inbuilt map method. It's best to keep one convention rather than splitting down two. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-25 21:16:14 -04:00
kingbri	43cd7f57e8	API + Model: Add blocks and checks for various load requests Add a sequential lock and wait until jobs are completed before executing any loading requests that directly alter the model. However, we also need to block any new requests that come in until the load is finished, so add a condition that triggers once the lock is free. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-25 21:16:14 -04:00
kingbri	06ff47e2b4	Model: Use true async jobs and add logprobs The new async dynamic job allows for native async support without the need of threading. Also add logprobs and metrics back to responses. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-25 21:16:14 -04:00
kingbri	c474076b22	Concurrency: Remove release_semaphore method At any point for any request cancellation, the semaphore will be decremented. This is an issue since an arbitrary request can desync the semaphore, causing multiple tasks to be processed at once and break generation. Remove this from the networking handlers and therefore, remove the release_semaphore function itself. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-19 10:42:26 -04:00
kingbri	b9fd8555fe	Sampling: Copy over iterable overrides If an override was iterable, any modifications to the returned value would alter the reference to the global storage dict. Therefore, copy the structure if it's an iterable so any modification won't alter the original override. Also apply this for the function that checks for forced overrides. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-17 21:38:28 -04:00
DocShotgun	abe411c6fb	API + Model: Add support for regex pattern constraints Adds the ability to constrain generation via regex pattern using lm-format-enforcer.	2024-05-12 19:10:43 -07:00
DocShotgun	9463ecfa40	Samplers: Minor fixes for sampler override * Add missing settings to sample_preset.yml * Fix override for skip_special_tokens	2024-05-12 00:31:31 -07:00
kingbri	c8ec742be9	Samplers: Expose skew sampling Skew is an extra unused sampler in ExllamaV2. Add it in for coverage. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-12 01:41:01 -04:00
kingbri	6f4012d20d	API: Add preset listing for sampler overrides Querying the overrides list endpoint now returns the selected preset and a list of presets to use. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-12 01:34:51 -04:00
DocShotgun	c0b631ba92	API: Add banned_strings From exllamav2: List of strings that the generator will refuse to output. As soon as a partial match happens, a checkpoint is saved that the generator can rewind to if need be. Subsequent tokens are then held until the full string is resolved (match or no match) and either emitted or discarded, accordingly.	2024-05-10 13:53:55 -07:00
DocShotgun	a1df22668b	API: Add min_tokens Bans the EOS token until the generation reaches a minimum length. This will not prevent the model from otherwise ending the generation early by outputting other stop conditions.	2024-05-10 12:30:17 -07:00
kingbri	ab526f7278	Revert "API: Remove unnecessary Optional signatures" This reverts commit `7556dcf134`. The Optionals allowed requests to send "null" in the body for optional parameters which should be allowed. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-02 21:23:48 -04:00
kingbri	7556dcf134	API: Remove unnecessary Optional signatures Optional isn't necessary if the function signature has a default value. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-01 00:04:52 -04:00
kingbri	ae75db1829	Downloader: Cleanup on exception Otherwise a file exists error will show up if any exception other than a cancel happens. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-30 23:26:22 -04:00
kingbri	e4084b15c1	Downloader: Format Make a public function private. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-30 01:16:57 -04:00
kingbri	50e0b71690	Downloader: Fix handling of include pattern If an include or exclude pattern is provided, include should include all files by default. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-30 01:13:06 -04:00
kingbri	21a01741c9	Downloader: Add include and exclude parameters These both take an array of glob strings to state what files or directories to include or exclude when parsing the download list. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-30 00:58:54 -04:00
kingbri	c47869c606	Downloader: Fix fallback mechanisms Use None-ish coalescing instead of unwrap optional handling. This means that any value that is "empty" for python will default to the fallback. Ex. print("" or "test") will print out "test" Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-29 23:33:37 -04:00
kingbri	55ccd1baad	API: Add HuggingFace downloader Adds an asynchronous huggingface downloader that uses HF hub to fetch all repo files. The current HF hub package has a snapshot_download function that does not cancel on KeyboardInterrupt. Instead, make a downloader that uses the Rich progress bar styling along with a cancellable interface. Finally, link this to TabbyAPI. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-29 01:15:02 -04:00
kingbri	6114bfd221	API: Fix banned_tokens string when empty The string should not be parsed and any non-string elements should be removed as well. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-28 12:46:28 -04:00
kingbri	6f9da97114	API: Add banned_tokens Appends the banned tokens to the generation. This is equivalent to setting logit bias to -100 on a specific set of tokens. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-28 11:06:09 -04:00
kingbri	ed7cd3cb59	Network: Fix socket check timeout Make this a one second timeout to check if a socket is connected. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-22 21:33:41 -04:00
kingbri	cab789e685	Templates: Migrate to class Having many utility functions for initialization doesn't make much sense. Instead, handle anything regarding template creation inside the class which reduces the amount of function imports. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-21 23:28:14 -04:00
kingbri	9f93505bc1	OAI: Add skip_special_tokens parameter Allows the ability to decode special tokens if the user wishes. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-21 00:37:46 -04:00
kingbri	67f061859d	Tree: Add transformers_utils Part of commit `8824ea0205` Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-20 00:07:39 -04:00
kingbri	46ac3beea9	Templates: Support list style chat_template keys HuggingFace updated transformers to provide templates in a list for tokenizers. Update to support this new format. Providing the name of a template for the "prompt_template" value in config.yml will also look inside the template list. In addition, log if there's a template exception, but continue model loading since it shouldn't shut down the application. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-07 11:20:25 -04:00
Brian Dashore	cdb96e4f74	Merge pull request #93 from AlpinDale/chore/log-level chore: make log level configurable via env variable	2024-04-02 00:52:06 -04:00
kingbri	f9f8c97c6d	Templates: Fix stop_string parsing Template modules grab all set vars, including ones that use runtime vars. If a template var is set to a runtime var and a module is created, an UndefinedError fires. Use make_module instead to pass runtime vars when creating a template module. Resolves #92 Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-02 00:44:04 -04:00
AlpinDale	1650e6e640	ruff	2024-04-01 23:11:30 +00:00
AlpinDale	5e599ddbd4	typo	2024-04-01 23:08:28 +00:00
AlpinDale	6c4a1a9c70	make log level a global var	2024-04-01 23:07:30 +00:00
AlpinDale	031349133b	properly order imports	2024-04-01 23:03:16 +00:00
AlpinDale	e90ead3b35	chore: make log level configurable via env variable	2024-04-01 22:57:56 +00:00
kingbri	d716527b92	Sampling: Add additive param to overrides Additive is used to add collections together. Currently, it's used for lists, but it can be used for dictionaries in the future. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-31 01:10:55 -04:00
kingbri	dc456f4cc2	Templates: Add stop_strings meta param Adding the stop_strings var to chat templates will allow for the template creator to specify stopping strings to add onto chat completions. These get appended to the existing stopping strings that are passed in the API request. However, a sampler override with force: true will override all stopping strings. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-27 22:22:07 -04:00
kingbri	6dfcbbd813	Common: Migrate request utils to networking Helps organize the project better. Utils is meant to be for simple functions like unwrap. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-21 23:21:57 -04:00
kingbri	2961c5f3f9	API: Handle request disconnect on non-streaming gens Works the same way as streaming gens. If the request is cancelled, it will log an error to the user and release the semaphore if it's holding anything. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-21 23:12:59 -04:00
kingbri	09a4c79847	Model: Auto-scale max_tokens by default If max_tokens is None, it automatically scales to fill up the context. This does not mean the generation will fill up that context since EOS stops also exist. Originally suggested by #86 Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 22:54:59 -04:00
kingbri	25f5d4a690	API: Cleanup permission endpoint Don't return an OAI specific type from a common file. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 15:13:26 -04:00
kingbri	3c08f46c51	Endpoints: Add key permission checker This is a definite way to check if an authorized key is API or admin. The endpoint only runs if the key is valid in the first place to keep in line with the API's security model. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 00:53:27 -04:00
kingbri	14d8ec2007	Signal: Fix signal handlers for uvicorn Add the ability to override uvicorn's signal handler in addition to using main's signal handler for any SIGINTs before the API server starts. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-16 23:23:31 -04:00
kingbri	95e44c20d6	Model: Fix load if model didn't load properly If the model didn't load properly, the container still exists until unload is called. However, the name check still registered as true. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-16 23:23:31 -04:00
kingbri	2755fd1af0	API: Fix blocking iterator execution Run these iterators on the background thread. On startup, the API spawns a background thread as needed to run sync code on without blocking the event loop. Use asyncio's run_thread function since it allows for errors to be propagated. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-16 23:23:31 -04:00
kingbri	7fded4f183	Tree: Switch to async generators Async generation helps remove many roadblocks to managing tasks using threads. It should allow for abortables and modern-day paradigms. NOTE: Exllamav2 itself is not an asynchronous library. It's just been added into tabby's async nature to allow for a fast and concurrent API server. It's still being debated whether to run stream_ex in a separate thread or manually manage it using asyncio.sleep(0) Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-16 23:23:31 -04:00
kingbri	7006fa4cc8	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-13 23:33:18 -04:00
kingbri	efc01d947b	API + Model: Add speculative ngram decoding Speculative ngram decoding is like speculative decoding without the draft model. It's not as useful because it only decodes on predictable sequences, but it depends on the use case. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-13 23:32:11 -04:00
kingbri	2ebefe8258	Logging: Move metrics to gen logging This didn't have a place in the generation function. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-13 23:13:55 -04:00
kingbri	1ec8eb9620	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-13 00:02:55 -04:00
kingbri	6f03be9523	API: Split functions into their own files Previously, generation functions were bundled with the request function, causing the overall code structure and API to look ugly and unreadable. Split these up and clean up a lot of the methods that were previously overlooked in the API itself. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00


90 commits