Dynamic gen takes in filters differently. Adjust to set the filter list
per class rather than in the generation function.
Signed-off-by: kingbri <bdashore3@proton.me>
Adds basic support for ExllamaV2's dynamic generator. Can generate
a streaming and non-streaming completion.
Signed-off-by: kingbri <bdashore3@proton.me>
At any point for any request cancellation, the semaphore will be
decremented. This is an issue since an arbitrary request can desync
the semaphore, causing multiple tasks to be processed at once and
break generation.
Remove this from the networking handlers and therefore, remove the
release_semaphore function itself.
Signed-off-by: kingbri <bdashore3@proton.me>
If an override was iterable, any modifications to the returned value
would alter the reference to the global storage dict.
Therefore, copy the structure if it's an iterable so any modification
won't alter the original override. Also apply this for the function
that checks for forced overrides.
Signed-off-by: kingbri <bdashore3@proton.me>
skip_special_tokens is in stable exl2. Also default the parameters
if they are not present in the function signature.
Signed-off-by: kingbri <bdashore3@proton.me>
From exllamav2: List of strings that the generator will refuse to output. As soon as a partial match happens, a checkpoint is saved that the generator can rewind to if need be. Subsequent tokens are then held until the full string is resolved (match or no match) and either emitted or discarded, accordingly.
Bans the EOS token until the generation reaches a minimum length. This will not prevent the model from otherwise ending the generation early by outputting other stop conditions.
This reverts commit 7556dcf134.
The Optionals allowed requests to send "null" in the body for optional
parameters which should be allowed.
Signed-off-by: kingbri <bdashore3@proton.me>
These both take an array of glob strings to state what files or
directories to include or exclude when parsing the download list.
Signed-off-by: kingbri <bdashore3@proton.me>
Use None-ish coalescing instead of unwrap optional handling. This means
that any value that is "empty" for python will default to the fallback.
Ex. print("" or "test") will print out "test"
Signed-off-by: kingbri <bdashore3@proton.me>
Adds an asynchronous huggingface downloader that uses HF hub to fetch
all repo files. The current HF hub package has a snapshot_download
function that does not cancel on KeyboardInterrupt.
Instead, make a downloader that uses the Rich progress bar styling
along with a cancellable interface. Finally, link this to TabbyAPI.
Signed-off-by: kingbri <bdashore3@proton.me>
Appends the banned tokens to the generation. This is equivalent of
setting logit bias to -100 on a specific set of tokens.
Signed-off-by: kingbri <bdashore3@proton.me>
response_prefix is used to add a prefix before generating the next
message. This is used in many cases such as continuining a prompt
(see #96).
Also if a template has BOS token specified, add_bos_token will
append two BOS tokens. Add a check which strips a starting BOS token
from the prompt if it exists.
Signed-off-by: kingbri <bdashore3@proton.me>
Torch errors if float values are passed (because bytes are not float
types). Therefore, overestimate and cast to an int type.
Resolves#97
Signed-off-by: kingbri <bdashore3@proton.me>
Having many utility functions for initialization doesn't make much sense.
Instead, handle anything regarding template creation inside the
class which reduces the amount of function imports.
Signed-off-by: kingbri <bdashore3@proton.me>