Commit graph

275 commits

Author SHA1 Message Date
TerminalMan
dc4946b565 make pydantic do all the validation 2024-09-13 10:21:27 +01:00
kingbri
d5b3fde319 Config: Fix descriptions
Appending lines also requires a space between each one otherwise
they'll squish together.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-12 22:43:30 -04:00
kingbri
21747bf9e4 Args: Switch to use model_field for everything
Pydantic provides these helpers. Better to use these instead of
the inspect lib.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-12 22:18:20 -04:00
TerminalMan
6e935c565e remove private attributes in args 2024-09-13 00:37:17 +01:00
TerminalMan
eb5f42c845 add error message for invalid use_as_default 2024-09-12 23:48:24 +01:00
TerminalMan
8b48f00271 fix model names 2024-09-12 17:00:07 +01:00
TerminalMan
05f1c3e293 fix line lengths 2024-09-11 21:43:30 +01:00
TerminalMan
c6f9806ec6 remove unused imports 2024-09-11 18:00:29 +01:00
TerminalMan
0d7459191c fix arg parser for dict types 2024-09-11 16:13:31 +01:00
TerminalMan
e8fcecd56a Merge remote-tracking branch 'upstream/main' into HEAD 2024-09-11 15:57:18 +01:00
kingbri
b9e5693c1b API + Model: Apply config.yml defaults for all load paths
There are two ways to load a model:
1. Via the load endpoint
2. Inline with a completion

The defaults were not applying on the inline load, so rewrite to fix
that. However, while doing this, set up a defaults dictionary rather
than comparing it at runtime and remove the pydantic default lambda
on all the model load fields.

This makes the code cleaner and establishes a clear config tree for
loading models.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-10 23:35:35 -04:00
kingbri
7baef05b49 Transformers Utils: Fix file read
Use asynchronous JSON reading

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-10 22:41:39 -04:00
kingbri
62beb2b1c8 Config: Fetch the correct dict for draft_model and lora
Fixed fetching from the merged config instead of the sub-config

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-10 21:30:53 -04:00
kingbri
5e8ff9a004 Tree: Fix classmethod usage
Instead of self, use cls which passes a type of the class.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-10 20:52:29 -04:00
kingbri
2c3bc71afa Tree: Switch to asynchronous file handling
Using aiofiles, there's no longer a possiblity of blocking file operations
that can hang up the event loop. In addition, partially migrate
classes to use asynchronous init instead of the normal python magic method.

The only exception is config, since that's handled in the synchonous
init before the event loop starts.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-10 16:45:14 -04:00
Ati Sharma
a370aeb15f
Fix tabby_config.py _from_file
Update tabby_config.py to fix issue #196
2024-09-09 09:19:12 +01:00
kingbri
df11890851 Templating: Add loopcontrols extension
Inbuilt jinja extension to allow for break and continue in loops.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-08 12:21:42 -04:00
kingbri
dffceab777 Sampling: Link dry_range
Was not linked in the gen params dict.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-08 01:55:52 -04:00
kingbri
acd3eb1140 Model: Add model folder template support
Like tabby_config.yml in the model's folder, a custom template can
also be provided via tabby_template.yml in addition to the existing
templates folder. The config.yml always takes priority.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-07 22:20:38 -04:00
kingbri
9c4a0e650f Sampling: Fix override for DRY sequence breakers
The common type should be an array of strings.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-07 21:38:50 -04:00
kingbri
4f5ca7a4c7 Sampling: Update overrides and params
Re-order to make more sense.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-07 12:48:59 -04:00
kingbri
ae37f3f332 Sampling: Update DRY
Switch to new parameters and remove dry_max_ngram as that's not supposed
to be changed.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-07 12:39:14 -04:00
kingbri
05c3f1194f Sampling: Add rudimentary DRY support
Adds DRY support based on the current exl2 dev API. Only change for
optimization is dry_max_ngram instead of using a closed range.

Currently, DRY range is aliased to dry_max_ngram.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-07 00:48:42 -04:00
TerminalMan
420fd84f6b add env var loading automation
- load config from env vars (eg. TABBY_NETWORK_HOST)
- remove print statements
- improve command line args automation
2024-09-06 15:05:48 +01:00
TerminalMan
8e9344642e patch pydantic config into old config
- convert pydantic to dict to avoid errors with current files
- fix formatting
2024-09-06 14:31:28 +01:00
Jake
36e991c16e automate arg parse
- generate arg parser dynamically
- remove legavy parser code
2024-09-06 00:27:53 +01:00
kingbri
1c9991f79e Config: Format and organize
Rename some methods and change comments.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-05 17:59:18 -04:00
Jake
362b8d5818 config is now backed by pydantic (WIP)
- add models for config options
- add function to regenerate config.yml
- replace references to config with pydantic compatible references
- remove unnecessary unwrap() statements

TODO:

- auto generate env vars
- auto generate argparse
- test loading a model
2024-09-05 18:04:56 +01:00
Jake
cb91670c7a fix command line args
- move to a complet class singleton to avoid propagation errors
- remove legacy load confing precedure
2024-09-05 15:33:00 +01:00
kingbri
93872b34d7 Config: Migrate to global class instead of dicts
The config categories can have defined separation, but preserve
the dynamic nature of adding new config options by making all the
internal class vars as dictionaries.

This was necessary since storing global callbacks stored a state
of the previous global_config var that wasn't populated.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-04 23:18:47 -04:00
Jake
e772fa2981 Switch to internal dict merge implementation
- remove deepmerge dependency
- fix ruff formatting
2024-09-04 16:27:28 +01:00
Jake
ac4d9bba1c refactor config functions
- improve DRY
2024-09-04 12:49:22 +01:00
Jake
fa6404a95a refactor config loading
- improve DRY
- alter logging
- allow extensibility
- add foundation for environment variables as config
2024-09-04 12:22:49 +01:00
kingbri
4aebe8a2a5 Config: Use an explicit "auto" value for rope_alpha
Using "auto" for rope alpha removes ambiguity on how to explicitly
enable automatic rope calculation. The same behavior of None -> auto
calculate still exists, but can be overwritten if a model's tabby_config.yml
includes `rope_alpha`.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-31 22:59:56 -04:00
kingbri
a96fa5f138 API: Don't fallback to default values on model load request
It's best to pass them down the config stack.

API/User config.yml -> model config.yml -> model config.json -> fallback.

Doing this allows for seamless flow and yielding control to each
member in the stack.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-31 22:59:56 -04:00
kingbri
dd55b99af5 Model: Store directory paths
Storing a pathlib type makes it easier to manipulate the model
directory path in the long run without constantly fetching it
from the config.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-31 22:59:56 -04:00
kingbri
21712578cf API: Add allowed_tokens support
This is the opposite of banned tokens. Exllama specific implementation
of #181.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-29 21:44:42 -04:00
kingbri
871c89063d Model: Add Tensor Parallel support
Use the tensor parallel loader when the flag is enabled. The new loader
has its own autosplit implementation, so gpu_split_auto isn't valid
here.

Also make it easier to determine which cache type to use rather than
multiple if/else statements.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-22 14:15:19 -04:00
kingbri
a51acb9db4 Templates: Switch to async jinja engine
This prevents any possible blocking of the event loop due to template
rendering.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 12:03:41 -04:00
kingbri
b4752c1e62 Templates: Revert to load metadata on runtime
Metadata is generated via a template's module. This requires a single
iteration through the template. If a template tries to access a passed
variable that doesn't exist, it will error.

Therefore, generate the metadata at runtime to prevent these errors
from happening. To optimize further, cache the metadata after the
first generation to prevent the expensive call of making a template
module.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 11:44:42 -04:00
Ben Gitter
70b9fc95de
[WIP] OpenAI Tools Support/Function calling (#154)
* returning stop str if exists from gen

* added chat template for firefunctionv2

* pulling tool vars from template

* adding parsing for tool inputs/outputs

* passing tool data from endpoint to chat template, adding tool_start to the stop list

* loosened typing on the response tool call, leaning more on the user supplying a quality schema if they want a particular format

* non streaming generation prototype

* cleaning template

* Continued work with type, ingestion into template, and chat template for fire func

* Correction - streaming toolcall comes back as delta obj not inside chatcomprespchoice per chat_completion_chunk.py inside OAI lib.

* Ruff Formating

* Moved stop string and tool updates out of prompt creation func

Updated tool pydantic to match OAI

Support for streaming

Updated generate tool calls to use flag within chat_template and insert tool reminder

* Llama 3.1 chat templates

Updated fire func template

* renamed llama3.1 to chatml_with_headers..

* update name of template

* Support for calling a tool start token rather than the string.

Simplified tool_params

Warning when gen_settings are being overidden becuase user set temp to 0

Corrected schema and tools to correct types for function args. Str for some reason

* draft groq tool use model template

* changed headers to vars for readablity (but mostly because some models are weird about newlines after headers, so this is an easier way to change globally)

* Clean up comments and code in chat comp

* Post processed tool call to meet OAI spec rather than forcing model to write json in a string in the middle of the call.

* changes example back to args as json rather than string of json

* Standardize chat templates to each other

* cleaning/rewording

* stop elements can also be ints (tokens)

* Cleaning/formatting

* added special tokens for tools and tool_response as specified in description

* Cleaning

* removing aux templates - going to live in llm-promp-templates repo instead

* Tree: Format

Signed-off-by: kingbri <bdashore3@proton.me>

* Chat Completions: Don't include internal tool variables in OpenAPI

Use SkipJsonSchema to supress inclusion with the OpenAPI JSON. The
location of these variables may need to be changed in the future.

Signed-off-by: kingbri <bdashore3@proton.me>

* Templates: Deserialize metadata on template load

Since we're only looking for specific template variables that are
static in the template, it makes more sense to render when the template
is initialized.

Signed-off-by: kingbri <bdashore3@proton.me>

* Tools: Fix comments

Adhere to the format style of comments in the rest of the project.

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Co-authored-by: Ben Gitter <gitterbd@gmail.com>
Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-17 00:16:25 -04:00
kingbri
685e3836e9 Args: Add api-servers to parser
Also run OpenAPI export after args/config are parsed.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-08 16:32:29 -04:00
kingbri
b6d2676f1c Start: Give the user a hint when a module can't be imported
If an ImportError or ModuleNotFoundError is raised, tell the user
to run the update scripts.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 21:59:06 -04:00
kingbri
2a33ebbf29 Model: Bypass lock checks when shutting down
Previously, when a SIGINT was emitted and a model load is running,
the API didn't shut down until the load finished due to waitng for
the lock. However, when shutting down, the lock doesn't matter since
the process is being killed anyway.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-03 16:05:34 -04:00
kingbri
7bf2b07d4c Signals: Exit on async cleanup
The async signal exit function should be the internal for exiting
the program. In addition, prevent the handler from being called
twice by adding a boolean. May become an asyncio event later on.

In addition, make sure to skip_wait when running model.unload.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-02 15:11:57 -04:00
kingbri
3e42211c3e Config: Embeddings: Make embeddings_device a default when API loading
When loading from the API, the fallback for embeddings_device will be
the same as the config.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 13:59:49 -04:00
kingbri
0bcb4e4a7d Model: Attach request ID to logs
If multiple logs come in at once, track which log corresponds to
which request.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-08-01 00:25:54 -04:00
Brian Dashore
1bf062559d
Merge pull request #158 from AlpinDale/embeddings
feat: add embeddings support via Infinity-emb
2024-07-31 20:33:12 -04:00
kingbri
dc3dcc9c0d Embeddings: Update config, args, and parameter names
Use embeddings_device as the parameter for device to remove ambiguity.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 15:32:26 -04:00
kingbri
bfa011e0ce Embeddings: Add model management
Embedding models are managed on a separate backend, but are run
in parallel with the model itself. Therefore, manage this in a separate
container with separate routes.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-07-30 15:19:27 -04:00