Commit graph

687 commits

Author SHA1 Message Date
TerminalMan
4b11cabbec debloat docker build 2024-09-08 00:02:00 +01:00
kingbri
d34756dc98 Tree: Format
Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-05 18:05:59 -04:00
kingbri
2f45e978c5 API: Fix merge overwrite
The completions utils did not pick up the new imports.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-05 18:04:53 -04:00
Brian Dashore
ec7f64d530
Merge pull request #185 from SecretiveShell/refactor-config-loading
Refactor config loading
2024-09-05 18:00:32 -04:00
kingbri
1c9991f79e Config: Format and organize
Rename some methods and change comments.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-05 17:59:18 -04:00
Jake
cb91670c7a fix command line args
- move to a complete class singleton to avoid propagation errors
- remove legacy config loading procedure
2024-09-05 15:33:00 +01:00
kingbri
98768bfa30 Docker: Re-add build block
If a user wants to build from source, let them. But the default
should fetch from the package registry.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-04 23:39:06 -04:00
kingbri
93872b34d7 Config: Migrate to global class instead of dicts
The config categories can have defined separation, but preserve
the dynamic nature of adding new config options by making all the
internal class vars as dictionaries.

This was necessary since stored global callbacks captured the state
of the previous global_config var, which wasn't populated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-04 23:18:47 -04:00
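A minimal sketch of the layout 93872b34d7 describes: config categories become attributes of one shared class, each backed by a plain dict so new options can still be added dynamically, and the object is mutated in place so callbacks never capture a stale, unpopulated object. Class and attribute names here are illustrative, not the repository's actual ones.

```python
# Illustrative sketch only; GlobalConfig and its category names are assumptions.
class GlobalConfig:
    """Config categories get defined separation as attributes, but each one
    stays a dict so new options can be added without touching the class."""

    def __init__(self):
        self.network: dict = {}
        self.model: dict = {}
        self.logging: dict = {}

    def load(self, unpacked_yaml: dict):
        # Mutate the existing dicts in place. Rebinding a module-level
        # `global_config = {...}` would leave earlier callbacks holding a
        # reference to the old, unpopulated object.
        for category, options in (unpacked_yaml or {}).items():
            target = getattr(self, category, None)
            if isinstance(target, dict):
                target.update(options or {})


# One module-level instance imported everywhere instead of passing dicts around
config = GlobalConfig()
```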
Brian Dashore
3bc9bd09a0
Merge pull request #180 from SecretiveShell/main
make docker-compose use prebuilt images
2024-09-04 21:48:18 -04:00
Brian Dashore
8524999284
Merge pull request #184 from SecretiveShell/Infinity-Embed-TODO
Complete conditional infinity import TODO
2024-09-04 21:47:49 -04:00
Brian Dashore
03ff472149
Merge pull request #130 from bartowski1182/main
WIP: Add 'model' argument to /v1/chat/completions to load a new model on the fly
2024-09-04 21:46:41 -04:00
kingbri
9c10789ca1 API: Error on invalid key permissions and cleanup format
If a user requesting a model change isn't admin, error.

Better to place the load function before the generate functions.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-04 21:44:14 -04:00
Jake
e772fa2981 Switch to internal dict merge implementation
- remove deepmerge dependency
- fix ruff formatting
2024-09-04 16:27:28 +01:00
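A generic recursive dict merge of the kind e772fa2981 swaps in for the deepmerge dependency; this is a sketch, not the project's exact implementation.

```python
def deep_merge(base: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` into a copy of `base`.

    Nested dicts are merged key by key; any other value in `overrides`
    simply replaces the one in `base`.
    """
    result = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


# Example: user config overriding defaults
defaults = {"network": {"host": "127.0.0.1", "port": 5000}}
user = {"network": {"port": 8080}}
assert deep_merge(defaults, user) == {"network": {"host": "127.0.0.1", "port": 8080}}
```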
Jake
42a42caf43 remove logging
- remove logging statements
- format code with ruff
2024-09-04 16:14:09 +01:00
Jake
ac4d9bba1c refactor config functions
- improve DRY
2024-09-04 12:49:22 +01:00
Jake
fa6404a95a refactor config loading
- improve DRY
- alter logging
- allow extensibility
- add foundation for environment variables as config
2024-09-04 12:22:49 +01:00
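fa6404a95a mentions a foundation for environment variables as config. One plausible shape for that, with the TABBY_ prefix and key format purely assumed:

```python
import os


def load_env_overrides(prefix: str = "TABBY_") -> dict:
    """Collect environment variables like TABBY_NETWORK_PORT=8080 into a
    nested dict such as {"network": {"port": "8080"}}.

    Values stay strings here; type coercion would happen further down the
    config stack.
    """
    overrides: dict = {}
    for name, value in os.environ.items():
        if not name.startswith(prefix):
            continue
        category, _, option = name[len(prefix):].lower().partition("_")
        if option:
            overrides.setdefault(category, {})[option] = value
    return overrides
```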
kingbri
21f14d4318 API: Update inline load
- Add a config flag
- Migrate support to /v1/completions
- Unify the load function

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-03 23:37:28 -04:00
kingbri
dd30d6592a Merge branch 'main' of https://github.com/theroyallab/tabbyapi into inline 2024-09-03 18:03:17 -04:00
kingbri
8854269121 API: Fix current model list return
Check if the container actually exists in the match before returning
the value of the directory.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-01 10:54:01 -04:00
kingbri
4bf1a71d7b Model: Fix model override application for draft args
These have to be merged beforehand and the updated version needs to be
re-fetched. It's possible to prevent the fetch of draft_args in the
beginning of init.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-31 22:59:56 -04:00
kingbri
4aebe8a2a5 Config: Use an explicit "auto" value for rope_alpha
Using "auto" for rope alpha removes ambiguity on how to explicitly
enable automatic rope calculation. The same behavior of None -> auto
calculate still exists, but can be overwritten if a model's tabby_config.yml
includes `rope_alpha`.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-31 22:59:56 -04:00
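A sketch of the resolution 4aebe8a2a5 describes: an explicit "auto" (or an unset value) triggers automatic calculation, while a concrete number, e.g. from a model's tabby_config.yml, wins. The heuristic below is a placeholder, not the project's actual formula.

```python
# Hypothetical resolution logic; the real calculation lives in the model loader.
def resolve_rope_alpha(user_value, max_seq_len: int, base_seq_len: int) -> float:
    """"auto" or an unset value means calculate automatically; a number is
    used as-is, so a model's tabby_config.yml can override the behavior."""
    if user_value is None or user_value == "auto":
        # Placeholder heuristic: scale alpha with the context extension ratio.
        ratio = max_seq_len / base_seq_len
        return max(1.0, ratio)
    return float(user_value)
```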
kingbri
a96fa5f138 API: Don't fallback to default values on model load request
It's best to pass them down the config stack.

API/User config.yml -> model config.yml -> model config.json -> fallback.

Doing this allows for seamless flow and yielding control to each
member in the stack.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-31 22:59:56 -04:00
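The precedence chain in a96fa5f138 (API/User config.yml -> model tabby_config.yml -> model config.json -> fallback) amounts to "first layer that explicitly sets a value wins". A hypothetical helper illustrating that, with assumed names:

```python
def resolve_option(key: str, *layers: dict, fallback=None):
    """Walk the config stack and return the first explicitly set value.

    Layers are passed highest-priority first, e.g.:
    resolve_option("max_seq_len", api_request, tabby_config_yml, config_json,
                   fallback=4096)
    """
    for layer in layers:
        value = layer.get(key)
        if value is not None:
            return value
    return fallback
```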
kingbri
4452d6f665 Model: Add support for overridable model config.yml
Like config.json in a model folder, providing a tabby_config.yml
will serve as a layer between user provided kwargs and the config.json
values.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-31 22:59:56 -04:00
kingbri
dd55b99af5 Model: Store directory paths
Storing a pathlib type makes it easier to manipulate the model
directory path in the long run without constantly fetching it
from the config.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-31 22:59:56 -04:00
kingbri
523709741c Model: Reorder how configs are set up
Initialize the Exllama classes first then add user-specific params.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-31 22:59:56 -04:00
TerminalMan
43104e0d19
Complete conditional infinity import TODO
- add logging
- change declaration order
2024-08-31 21:48:43 +01:00
kingbri
21712578cf API: Add allowed_tokens support
This is the opposite of banned tokens. Exllama specific implementation
of #181.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-29 21:44:42 -04:00
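Conceptually, allowed_tokens in 21712578cf is the inverse of banned tokens: mask everything outside the allow-list. The standalone function below only illustrates the idea; the real change hooks into ExLlamaV2's sampling, not a plain list of logits.

```python
def apply_allowed_tokens(logits: list[float], allowed_token_ids: list[int]) -> list[float]:
    """Keep only the allow-listed token ids viable; everything else gets -inf."""
    allowed = set(allowed_token_ids)
    return [
        value if token_id in allowed else float("-inf")
        for token_id, value in enumerate(logits)
    ]
```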
kingbri
10d9419f90 Model: Add BOS token to prompt logs
If add_bos_token is enabled, the BOS token gets appended to the logged
prompt if logging is enabled.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-29 21:15:09 -04:00
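A small sketch of the logging behavior in 10d9419f90, with hypothetical setting names; the point is simply that the logged prompt mirrors what the model actually receives when add_bos_token is on.

```python
def log_prompt(prompt: str, bos_token: str, add_bos_token: bool, log_prompts: bool):
    """Include the BOS token in the logged prompt when add_bos_token is enabled."""
    if not log_prompts:
        return
    logged = f"{bos_token}{prompt}" if add_bos_token else prompt
    print(f"Prompt: {logged}")


log_prompt("Hello", "<s>", add_bos_token=True, log_prompts=True)  # Prompt: <s>Hello
```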
TerminalMan
48d7674316
make docker-compose use prebuilt images
- Docker compose uses the prebuilt images produced by the GitHub action added in 872eeed581
2024-08-29 00:50:01 +01:00
kingbri
96fce34253 Dependencies: Update ExllamaV2
v0.2.0

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-28 18:34:00 -04:00
kingbri
a00d972054 Server: Remove unused comments
Leftovers from the new API server log system.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-27 21:45:51 -04:00
kingbri
4958c06813 Model: Remove and format comments
The comment in __init__ was outdated and all the kwargs are the
config options anyways.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-27 21:43:40 -04:00
TerminalMan
80198ca056
API: Add /v1/health endpoint (#178)
* Add healthcheck

- localhost-only /healthcheck endpoint
- cURL healthcheck in docker compose file

* Update Healthcheck Response

- change endpoint to /health
- remove localhost restriction
- add docstring

* move healthcheck definition to top of the file

- make the healthcheck show up first in the OpenAPI spec

* Tree: Format
2024-08-27 21:37:41 -04:00
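A minimal FastAPI-style sketch of the route #178 adds; the response body and port are assumptions, not the project's actual schema.

```python
from fastapi import FastAPI

app = FastAPI()


# Registered before the other routes so it shows up first in the OpenAPI spec
@app.get("/health")
async def health():
    """Liveness probe hit by the docker-compose cURL healthcheck."""
    return {"status": "healthy"}
```

The compose healthcheck would then be something like `curl -f http://localhost:5000/health` (port assumed).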
Amgad Hasan
872eeed581
Build and push docker image (#171)
* Create docker-image.yml

* Update docker-image.yml
2024-08-26 16:18:10 -04:00
Ben Gitter
045bc98333
Remove rogue print statements within chat_completion.py (#174)
* rogue prompt print

* remove print pt2

* Print Removal Final
2024-08-23 21:28:37 -04:00
turboderp
fe3253f3a9 Model: Account for tokenizer lazy init 2024-08-23 23:51:53 +02:00
turboderp
a676c4bf38 Model: Formatting 2024-08-23 11:15:30 +02:00
turboderp
a3733caeda Model: Fix draft model cache initialization 2024-08-23 11:08:49 +02:00
kingbri
364032e39e Config: Remove development flag from tensor parallel
Exists in stable ExllamaV2 version.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
kingbri
565b0300d6 Dependencies: Update Exllamav2
v0.1.9

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
kingbri
078fbf1080 Model: Add quantized cache support for tensor parallel
Newer versions of exl2 v1.9-dev have quantized cache implemented. Add
those APIs.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
kingbri
871c89063d Model: Add Tensor Parallel support
Use the tensor parallel loader when the flag is enabled. The new loader
has its own autosplit implementation, so gpu_split_auto isn't valid
here.

Also make it easier to determine which cache type to use rather than
multiple if/else statements.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
kingbri
5002617eac Model: Split cache creation into a common function
Unifies the switch statement across both draft and model caches.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
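5002617eac and 078fbf1080 both talk about picking a cache class from a mode flag. Below is a sketch of a shared factory under those assumptions; the ExLlamaV2 cache classes imported here exist in exllamav2, but the mode names, signature, and autosplit handling are guesses, and tensor-parallel caches are omitted.

```python
# Sketch only; cache mode strings and the lazy/autosplit behavior are assumptions.
from exllamav2 import ExLlamaV2Cache, ExLlamaV2Cache_8bit, ExLlamaV2Cache_Q4


def create_cache(model, cache_mode: str, max_seq_len: int, autosplit: bool):
    """Build a KV cache for either the main or the draft model, replacing the
    duplicated if/else chains in both load paths."""
    cache_classes = {
        "Q4": ExLlamaV2Cache_Q4,
        "FP8": ExLlamaV2Cache_8bit,
        "FP16": ExLlamaV2Cache,
    }
    cache_class = cache_classes.get(cache_mode, ExLlamaV2Cache)

    # With autosplit, the cache is created lazily before weights are loaded
    return cache_class(model, max_seq_len=max_seq_len, lazy=autosplit)
```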
kingbri
ecaddec48a Docker-compose: Add models to bind mounts
At least one bind mount is required in the volumes YAML block, otherwise
the docker build fails. The models directory is a safe default since it
always exists.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-19 22:07:53 -04:00
Amgad Hasan
dae394050e
Improve docker deployment configuration (#163) 2024-08-18 15:19:18 -04:00
kingbri
a51acb9db4 Templates: Switch to async jinja engine
This prevents any possible blocking of the event loop due to template
rendering.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 12:03:41 -04:00
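The async Jinja switch in a51acb9db4 is the standard jinja2 pattern: enable_async on the Environment and await render_async so a slow render can't stall the event loop. The loader and template below are placeholders.

```python
import asyncio
from jinja2 import Environment, DictLoader

# Placeholder template; the real project loads prompt templates from files
env = Environment(
    loader=DictLoader({"chat": "{{ bos_token }}{{ prompt }}"}),
    enable_async=True,
)


async def render_prompt() -> str:
    template = env.get_template("chat")
    # render_async is awaitable, so rendering doesn't block other requests
    return await template.render_async(bos_token="<s>", prompt="Hello")


print(asyncio.run(render_prompt()))  # <s>Hello
```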
kingbri
b4752c1e62 Templates: Revert to load metadata on runtime
Metadata is generated via a template's module. This requires a single
iteration through the template. If a template tries to access a passed
variable that doesn't exist, it will error.

Therefore, generate the metadata at runtime to prevent these errors
from happening. To optimize further, cache the metadata after the
first generation to prevent the expensive call of making a template
module.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 11:44:42 -04:00
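A sketch of the lazy metadata caching b4752c1e62 describes: build the template's module once, at runtime, when the passed variables actually exist, then reuse the result. The metadata key and template below are invented for illustration.

```python
from jinja2 import Environment, DictLoader

# Illustrative only: the stop_strings metadata variable is an assumption.
env = Environment(
    loader=DictLoader({"chat": '{% set stop_strings = ["</s>"] %}{{ prompt }}'})
)


class PromptTemplate:
    def __init__(self, name: str):
        self.template = env.get_template(name)
        self._metadata = None  # filled in lazily

    def metadata(self, template_vars: dict) -> dict:
        """Render the template's module on first use and cache the result.

        Building a module is the expensive step, and deferring it to runtime
        means the passed variables exist by the time the template reads them.
        """
        if self._metadata is None:
            module = self.template.make_module(template_vars)
            self._metadata = {
                "stop_strings": getattr(module, "stop_strings", []),
            }
        return self._metadata


template = PromptTemplate("chat")
print(template.metadata({"prompt": ""}))  # {'stop_strings': ['</s>']}
```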
kingbri
617ac12150 Update README
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 00:35:42 -04:00
Ben Gitter
70b9fc95de
[WIP] OpenAI Tools Support/Function calling (#154)
* returning stop str if exists from gen

* added chat template for firefunctionv2

* pulling tool vars from template

* adding parsing for tool inputs/outputs

* passing tool data from endpoint to chat template, adding tool_start to the stop list

* loosened typing on the response tool call, leaning more on the user supplying a quality schema if they want a particular format

* non streaming generation prototype

* cleaning template

* Continued work with type, ingestion into template, and chat template for fire func

* Correction - streaming tool call comes back as a delta object, not inside chatcomprespchoice, per chat_completion_chunk.py in the OAI lib.

* Ruff Formatting

* Moved stop string and tool updates out of prompt creation func

Updated tool pydantic to match OAI

Support for streaming

Updated generate tool calls to use flag within chat_template and insert tool reminder

* Llama 3.1 chat templates

Updated fire func template

* renamed llama3.1 to chatml_with_headers..

* update name of template

* Support for calling a tool start token rather than the string.

Simplified tool_params

Warning when gen_settings are being overridden because user set temp to 0

Corrected schema and tools to the correct types for function args. Str for some reason

* draft groq tool use model template

* changed headers to vars for readability (but mostly because some models are weird about newlines after headers, so this is an easier way to change globally)

* Clean up comments and code in chat comp

* Post-processed tool call to meet OAI spec rather than forcing the model to write JSON in a string in the middle of the call.

* changes example back to args as json rather than string of json

* Standardize chat templates to each other

* cleaning/rewording

* stop elements can also be ints (tokens)

* Cleaning/formatting

* added special tokens for tools and tool_response as specified in description

* Cleaning

* removing aux templates - going to live in llm-promp-templates repo instead

* Tree: Format

Signed-off-by: kingbri <bdashore3@proton.me>

* Chat Completions: Don't include internal tool variables in OpenAPI

Use SkipJsonSchema to suppress inclusion with the OpenAPI JSON. The
location of these variables may need to be changed in the future.

Signed-off-by: kingbri <bdashore3@proton.me>

* Templates: Deserialize metadata on template load

Since we're only looking for specific template variables that are
static in the template, it makes more sense to render when the template
is initialized.

Signed-off-by: kingbri <bdashore3@proton.me>

* Tools: Fix comments

Adhere to the format style of comments in the rest of the project.

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Co-authored-by: Ben Gitter <gitterbd@gmail.com>
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 00:16:25 -04:00
Bartowski
c75e911f07
Merge branch 'main' into main 2024-08-14 16:16:15 -04:00