Start scripts now don't update dependencies by default due to mishandling
caches from pip. Also add dedicated update scripts and save options
to a JSON file instead of a text one.
Signed-off-by: kingbri <bdashore3@proton.me>
The async signal exit function should be the internal for exiting
the program. In addition, prevent the handler from being called
twice by adding a boolean. May become an asyncio event later on.
In addition, make sure to skip_wait when running model.unload.
Signed-off-by: kingbri <bdashore3@proton.me>
A user's prompt and response can be large in the console. Therefore,
always log the smaller payloads (ex. gen params + metrics) after
the large chunks.
However, it's recommended to keep prompt logging off anyways since
it'll result in console spam.
Signed-off-by: kingbri <bdashore3@proton.me>
Installing directly from github causes pip's HTTP cache to not
recognize that the correct version of a package is already installed.
This causes a redownload.
When using the Start.bat script, it updates dependencies automatically
to keep users on the latest versions of a package for security reasons.
A simple pip cache website helps alleviate this problem and allows pip
to find the cached wheels when invoked with an upgrade argument.
Signed-off-by: kingbri <bdashore3@proton.me>
The override wasn't being passed in before. Also, the default is now
none since Exl2 can automatically calculate the max batch size.
Signed-off-by: kingbri <bdashore3@proton.me>
Embedding models are managed on a separate backend, but are run
in parallel with the model itself. Therefore, manage this in a separate
container with separate routes.
Signed-off-by: kingbri <bdashore3@proton.me>
Use Infinity as a separate backend and handle the model within the
common module. This separates out the embeddings model from the endpoint
which allows for model loading/unloading in core.
Signed-off-by: kingbri <bdashore3@proton.me>
Infinity-emb is an async batching engine for embeddings. This is
preferable to sentence-transformers since it handles scalable usecases
without the need for external thread intervention.
Signed-off-by: kingbri <bdashore3@proton.me>
This is necessary for Kobold's API. Current models use bad_words_ids
in generation_config.json, but for some reason, they're also present
in the model's config.json.
Signed-off-by: kingbri <bdashore3@proton.me>
Some of the parameters the API provides are aliases for their OAI
equivalents. It makes more sense to move them to the common file.
Signed-off-by: kingbri <bdashore3@proton.me>
Reduces dependency size since the full fastapi package isn't required.
Add httptools since it makes requests faster and it was installed
with fastapi previously.
Signed-off-by: kingbri <bdashore3@proton.me>
Realtime process priority assigns resources to point to tabby's
processes. Running as administrator will give realtime priority
while running as a normal user will set as high priority.
Signed-off-by: kingbri <bdashore3@proton.me>
These are faster event loops for asyncio which should improve overall
performance. Gate these under an experimental flag for now to stress
test these loops.
Signed-off-by: kingbri <bdashore3@proton.me>
Add an API parameter to set the timeout in seconds. Keep it to None
by default for uninterrupted downloads.
Signed-off-by: kingbri <bdashore3@proton.me>
This prevents TimeoutErrors from showing up. However, a longer
timeout may be necessary since this is in the API. Turning it off
for now will help resolve immediate errors.
Signed-off-by: kingbri <bdashore3@proton.me>