Commit graph

1038 commits

Author SHA1 Message Date
Brian
fe44e4a524
Merge pull request #253 from randoentity/workaround-toolcall
workaround for tool calling
2024-11-28 23:30:00 -05:00
kingbri
2e06fb01d3 OAI: Pass mm_embeddings to tool call generation
Don't exclude the vision embeddings when regenerating for a tool call.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-28 23:27:59 -05:00
Brian
b81dcdaf66
Merge pull request #232 from AlpinDale/serviceinfo_uri
feat: add serviceinfo URI
2024-11-28 23:19:52 -05:00
kingbri
5fadaa728a API: Move serviceinfo to core
Best to expose this endpoint to all APIs since it's an information endpoint.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-28 23:07:58 -05:00
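A minimal sketch of what a shared service-info endpoint on a core router could look like, assuming a FastAPI app; the route path, field names, and `router` object are illustrative assumptions, not TabbyAPI's actual implementation.

```python
from fastapi import APIRouter

# Hypothetical core router mounted for every API namespace (name assumed).
router = APIRouter()


@router.get("/.well-known/serviceinfo")
async def get_service_info():
    # Static metadata describing the running server; fields are illustrative.
    return {
        "version": "0.0.1",
        "api": {"openai": True, "koboldai": True},
    }
```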
lucy
ab1f4b7a6a
add draft_gpu_split option 2024-11-27 02:52:19 +01:00
DocShotgun
6f2dc2ea99 Grammar: Fix syntax, lint 2024-11-24 11:35:45 -08:00
DocShotgun
8f209efb99 Grammar: Clean up KBNF implementation
* Also remove empty cache clear function
2024-11-24 10:44:45 -08:00
randoentity
a52610fb19 workaround for tool calling 2024-11-24 13:40:33 +01:00
DocShotgun
a9f39bcff3 Grammar: Preliminary Formatron KBNF support 2024-11-23 12:05:41 -08:00
DocShotgun
0836a9317f Grammar: Initial Formatron regex and JSON schema implementation
* Replace LMFE's regex and JSON schema filters with Formatron's
* Remove Outlines EBNF filter in preparation for Formatron KBNF filter
* TODO: Implement Formatron KBNF filter
2024-11-23 10:27:37 -08:00
kingbri
aa4ccd03d4 Infinity: Use a runtime type hint for engine
Remove the antipattern of the conditional type for the Async engine
and use string-based type inference.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-22 18:06:08 -05:00
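A minimal sketch of the string-based (forward-reference) annotation pattern the commit above describes, assuming the Infinity engine class; the import path and container class name are assumptions for illustration.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Only imported for static analysis; infinity_emb stays an optional
    # runtime dependency (import path assumed for illustration).
    from infinity_emb import AsyncEmbeddingEngine


class InfinityContainer:
    # The string annotation defers resolution, so no conditional type
    # definition is needed at runtime.
    engine: Optional["AsyncEmbeddingEngine"] = None
```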
kingbri
242ff4f892 Dependencies: Fix OpenAPI generation
The vision module from the ExllamaV2 backend is used in files outside
the backend's contained folder. Therefore, import ExllamaV2 as an
optional dependency here.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-22 17:59:20 -05:00
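A sketch of the optional-import pattern the commit above describes; the imported symbol name is an assumption for illustration.

```python
try:
    # ExllamaV2 is an optional extra; the symbol name here is illustrative.
    from exllamav2 import ExLlamaV2VisionTower

    HAS_EXLLAMAV2 = True
except ImportError:
    ExLlamaV2VisionTower = None
    HAS_EXLLAMAV2 = False
```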
kingbri
9cd7fcaf99 Pyproject: Add pillow to deps
Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-22 17:48:56 -05:00
Brian
9c8186c138
Merge pull request #249 from theroyallab/vision
Vision
2024-11-22 17:45:49 -05:00
kingbri
388d36e6bd OAI: Fix chat completion list parsing
The strings weren't being concatenated properly. Only add the combined
text if the chat completion type is a List.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-22 17:30:29 -05:00
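Roughly, the fix above concerns message content that arrives as an OAI-style list of typed parts instead of a plain string; a hedged sketch of joining the text parts only in the list case (helper name assumed):

```python
def flatten_content(content):
    """Collapse an OAI-style content list into a single string.

    Plain strings pass through untouched; only list content is joined.
    (Helper name and structure are illustrative.)
    """
    if isinstance(content, list):
        text_parts = [
            part["text"] for part in content if part.get("type") == "text"
        ]
        return "".join(text_parts)

    return content
```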
kingbri
eadc71a4c3 Model: Add unload and error messages for vision
If vision is enabled and the model doesn't support it, send an
error asking the user to reload. Also, add a method to unload the
vision tower.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-22 14:25:03 -05:00
kingbri
c49047eea1 Model: Fix load packets
The model_type internal reference was changed to an enum for
a more extendable loading process. Return the current model type
when loading a new model.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-21 18:06:47 -05:00
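A minimal sketch of the enum-based model type mentioned in the commit above; the member names are assumptions.

```python
from enum import Enum


class ModelType(str, Enum):
    # Member names are illustrative, not TabbyAPI's exact ones.
    MODEL = "model"
    DRAFT = "draft"
    EMBEDDING = "embedding"
    VISION = "vision"
```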
kingbri
0ab393f09c Model: Set vision load to False by default
Mistake in unwrapping. Vision should be false to allow normal model
loading when the flag isn't provided.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-21 17:54:42 -05:00
kingbri
902045edbb API: Fix chat completion formatting flow
Previously, the flow for parsing chat completion messages and rendering
from the prompt template was disconnected between endpoints. Now, create
a common function to render and handle everything appropriately afterwards.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-21 17:51:14 -05:00
kingbri
c652a6e030 API: Transform multimodal into an actual class
Migrate the add method into the class itself. Also, a BaseModel isn't
needed here since this isn't a serialized class.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-20 00:06:20 -05:00
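A sketch of a plain (non-Pydantic) wrapper with the add method folded in, as the commit above describes; attribute and method names are assumptions.

```python
class MultimodalEmbeddingWrapper:
    """Plain container for multimodal embeddings used during generation.

    A BaseModel isn't needed because this object is never serialized.
    (Names are illustrative.)
    """

    def __init__(self):
        self.content = []
        self.embeddings = []

    def add(self, text_alias: str, embedding):
        # Track the prompt alias alongside its embedding tensor.
        self.content.append(text_alias)
        self.embeddings.append(embedding)
```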
kingbri
8ffc636dce OAI: Strictly type chat completions
Previously, the messages were a list of dicts. These are untyped
and don't provide strict hinting. Add types for chat completion
messages and reformat existing code.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-19 23:18:18 -05:00
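A hedged sketch of what strictly typed chat completion messages might look like with Pydantic; class and field names are assumptions.

```python
from typing import List, Optional, Union

from pydantic import BaseModel


class ChatCompletionImageUrl(BaseModel):
    url: str


class ChatCompletionMessagePart(BaseModel):
    type: str
    text: Optional[str] = None
    image_url: Optional[ChatCompletionImageUrl] = None


class ChatCompletionMessage(BaseModel):
    role: str = "user"
    content: Union[str, List[ChatCompletionMessagePart], None] = None
```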
kingbri
0fadb1e5e8 Merge branch 'main' into vision 2024-11-19 21:19:21 -05:00
DocShotgun
731a345cfc OAI: Keep behavior consistent between chat completion and encode
* When vision is not enabled, only the first text block is kept in message.content if it is a list
2024-11-19 12:40:00 -08:00
DocShotgun
27d9af50a8 API: Report whether vision is enabled 2024-11-19 12:29:25 -08:00
DocShotgun
5611365c07 OAI: Allow /v1/encode endpoint to handle vision requests
* More robust checks for OAI chat completion message lists on /v1/encode endpoint
* Added TODO to support other aspects of chat completions
* Fix oversight where embeddings was not defined in advance on the /v1/chat/completions endpoint
2024-11-19 11:14:37 -08:00
DocShotgun
c42655336b Config: Add option to disable fetching content from URLs 2024-11-17 23:05:17 -08:00
Brian
a69f86098a
Merge pull request #243 from DocShotgun/chunk-size-fix
Enforce chunk_size as multiple of 256
2024-11-18 00:40:36 -05:00
DocShotgun
dd41eec8a4 OAI: Initial vision support in OAI chat completions
* Support image_url inputs containing URLs or base64 strings following the OAI vision spec
* Use async lru cache for image embeddings
* Add generic wrapper class for multimodal embeddings
2024-11-17 21:23:09 -08:00
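For reference, the OAI vision spec that the commit above follows sends images as image_url content parts carrying either a remote URL or a base64 data URI; a request-body sketch as a Python dict (values are placeholders):

```python
# OAI-style vision request body: image_url parts may carry an http(s) URL
# or a base64 data URI.
request_body = {
    "model": "default",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/png;base64,<...>"},
                },
            ],
        }
    ],
}
```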
kingbri
bd9e78e19e API: Add inline exception for dummy models
If a request sends a dummy model name, it shouldn't error, since the
server caters to clients that expect specific OAI model names. This
is a problem with inline model loading because these names would error
by default. Therefore, add an exception if the provided name is in the
dummy model names (which also doubles as the inline strict exceptions).

However, the dummy model names weren't configurable, so add a new
option to specify exception names; otherwise the default is gpt-3.5-turbo.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-17 21:15:45 -05:00
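A rough sketch of the dummy-model exception described above; the option and function names are assumptions, and gpt-3.5-turbo is the stated default.

```python
# Names here are illustrative; gpt-3.5-turbo is the stated default exception.
DEFAULT_DUMMY_MODEL_NAMES = ["gpt-3.5-turbo"]


def should_skip_inline_load(requested_name, configured_names=None):
    """Return True when the requested model name is a dummy/exception name."""
    exceptions = configured_names or DEFAULT_DUMMY_MODEL_NAMES
    return requested_name in exceptions
```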
DocShotgun
5fa298e601 Vision: Define basic utils for ExLlamaV2 vision 2024-11-16 23:25:22 -08:00
kingbri
b94c646210 Embeddings: Add string input as an option
Used in OAI's API

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-16 23:48:31 -05:00
kingbri
f9fffd42e0 OAI: Fix inline model loading errors when disabled
The admin key check was running even if inline loading was disabled.
Fix this bug, but also preserve the existing permission system when
inline loading is enabled.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-16 23:28:44 -05:00
Brian
dfc889952a
Merge pull request #244 from DocShotgun/draft-flash-attn-fix
Fix draft model non-FA2 fallback
2024-11-16 21:23:42 -05:00
DocShotgun
5bb46df3c3 Model: Fix draft model non-FA2 fallback 2024-11-15 21:04:25 -08:00
DocShotgun
37cc701137 Model: Enforce chunk_size as multiple of 256 2024-11-15 20:35:18 -08:00
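One plausible way to enforce the multiple-of-256 constraint from the commit above; the rounding direction isn't shown in the log, so it is an assumption.

```python
def coerce_chunk_size(chunk_size: int) -> int:
    # Round down to the nearest multiple of 256, with 256 as the floor.
    # (Rounding direction is assumed for illustration.)
    return max(256, (chunk_size // 256) * 256)
```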
kingbri
101ebd658a Docker: Add extras to dockerfile
Adds support for all features when pulling the image

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-15 18:16:48 -05:00
kingbri
69838e92ca Dependencies: Update ExllamaV2
v0.2.4

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-13 22:16:11 -05:00
kingbri
69ac0eb8aa Model: Add vision loading support
Adds the ability to load vision parts of text + image models. Requires
an explicit flag in config because there isn't a way to automatically
determine whether the vision tower should be used.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-11 12:10:11 -05:00
kingbri
cc2516790d Model: Add support for chat_template.json
HuggingFace separated the chat template in the newest transformers
versions.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-11 12:10:06 -05:00
kingbri
9530f8c8c7 Model: Add support for chat_template.json
HuggingFace separated the chat template in the newest transformers
versions.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-11-11 12:09:27 -05:00
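A minimal sketch of picking up the separated template file from the commits above; newer transformers versions write the template to chat_template.json under a "chat_template" key (function and parameter names assumed).

```python
import json
import pathlib


def load_chat_template(model_dir: str):
    """Return the chat template from chat_template.json if present."""
    template_path = pathlib.Path(model_dir) / "chat_template.json"
    if template_path.exists():
        # Newer transformers versions keep the template separate from
        # tokenizer_config.json, under a "chat_template" key.
        return json.loads(template_path.read_text())["chat_template"]
    return None
```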
AlpinDale
c9ff8ef2c2 upgrade to v0.2 2024-11-04 13:28:04 +00:00
AlpinDale
1c9bc2d1af feat: add serviceinfo URI 2024-11-04 12:35:08 +00:00
Brian Dashore
b8700fbbc3
Merge pull request #230 from DocShotgun/main
Remove override_base_seq_len
2024-11-02 12:24:18 -04:00
DocShotgun
603760cecb Model: Remove override_base_seq_len 2024-10-30 10:03:08 +08:00
TerminalMan
7d18d2e2ca
Refactor the sampling class (#199)
* improve validation

* remove to_gen_params functions

* update changes for all endpoint types

* OAI: Fix calls to generation

Chat completion and completion need to have the prompt split out before
pushing to the backend.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Convert Top-K values of -1 to 0

Some OAI implementations use -1 as disabled instead of 0. Therefore,
add a coalesce case.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Format and space out

Make the code more readable.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Fix mirostat

Field items are nested in data within a Pydantic FieldInfo

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Format

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Fix banned_tokens and allowed_tokens conversion

If the provided string has whitespace, trim it before splitting.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Add helpful log to dry_sequence_breakers

Let the user know if the sequence errors out.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Apply validators in right order

Validators need to be applied in order from top to bottom; this is why
the after validator was not being applied properly.

Set the model to validate default params for sampler override purposes.
This can be turned off if there are unclear errors.

Signed-off-by: kingbri <bdashore3@proton.me>

* Endpoints: Format

Cleanup and semantically fix field validators

Signed-off-by: kingbri <bdashore3@proton.me>

* Kobold: Update validators and fix parameter application

Validators on parent fields cannot see child fields. Therefore,
validate using the child fields instead and alter the parent field
data from there.

Also fix badwordsids casting.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Remove validate defaults and fix mirostat

If a user sets an override to a non-default value, that's their
own fault.

Run validator on the actual mirostat_mode parameter rather than
the alternate mirostat parameter.

Signed-off-by: kingbri <bdashore3@proton.me>

* Kobold: Rework badwordsids

Currently, this serves to ban the EOS token. All other functionality
was legacy, so remove it.

Signed-off-by: kingbri <bdashore3@proton.me>

* Model: Remove HuggingfaceConfig

This was only necessary for badwordsids. All other fields are handled
by exl2. Keep the class as a stub if it's needed again.

Signed-off-by: kingbri <bdashore3@proton.me>

* Kobold: Bump kcpp impersonation

TabbyAPI supports XTC now.

Signed-off-by: kingbri <bdashore3@proton.me>

* Sampling: Change alias to validation_alias

Reduces the probability of errors and makes the class consistent.

Signed-off-by: kingbri <bdashore3@proton.me>

* OAI: Use constraints for validation

Instead of adding a model_validator, use greater than or equal to
constraints provided by Pydantic.

Signed-off-by: kingbri <bdashore3@proton.me>

* Tree: Lint

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Co-authored-by: SecretiveShell <84923604+SecretiveShell@users.noreply.github.com>
Co-authored-by: kingbri <bdashore3@proton.me>
2024-10-27 11:43:41 -04:00
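The Top-K coalesce mentioned in the sampling refactor above could look roughly like this with a Pydantic field validator; the class and field names are assumptions.

```python
from pydantic import BaseModel, field_validator


class SamplerParams(BaseModel):
    # Names are illustrative; 0 means "disabled" internally.
    top_k: int = 0

    @field_validator("top_k")
    @classmethod
    def coalesce_top_k(cls, value: int) -> int:
        # Some OAI clients send -1 to mean disabled; normalize it to 0.
        return 0 if value == -1 else value
```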
Brian Dashore
6e48bb420a
Model: Fix inline loading and draft key (#225)
* Model: Fix inline loading and draft key

There was a lack of foresight in how the new config.yml was structured:
the "draft" key became "draft_model" without updating both the API
request and inline loading keys.

For the API requests, still support "draft" as legacy, but the "draft_model"
key is preferred.

Signed-off-by: kingbri <bdashore3@proton.me>

* OAI: Add draft model dir to inline load

Was not pushed before, which caused errors from the kwargs being None.

Signed-off-by: kingbri <bdashore3@proton.me>

* Model: Fix draft args application

Draft model args weren't applying since there was a reset due to how
the old override behavior worked.

Signed-off-by: kingbri <bdashore3@proton.me>

* OAI: Change embedding model load params

Use embedding_model_name to be in line with the config.

Signed-off-by: kingbri <bdashore3@proton.me>

* API: Fix parameter for draft model load

Alias name to draft_model_name.

Signed-off-by: kingbri <bdashore3@proton.me>

* API: Fix parameter for template switch

Add prompt_template_name to be more descriptive.

Signed-off-by: kingbri <bdashore3@proton.me>

* API: Fix parameter for model load

Alias name to model_name for config parity.

Signed-off-by: kingbri <bdashore3@proton.me>

* API: Add alias documentation

Signed-off-by: kingbri <bdashore3@proton.me>

---------

Signed-off-by: kingbri <bdashore3@proton.me>
2024-10-24 23:35:05 -04:00
kingbri
f20857cb34 Model: Fix override application
None values weren't being excluded on initial load when dumping.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-30 00:41:23 -04:00
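The fix above amounts to dropping None values when serializing the override model; in Pydantic v2 terms that is roughly the following (model and field names assumed).

```python
from typing import Optional

from pydantic import BaseModel


class LoadOverrides(BaseModel):
    # Illustrative fields only.
    max_seq_len: Optional[int] = None
    rope_scale: Optional[float] = None


overrides = LoadOverrides(max_seq_len=8192)

# Excluding None keeps unset fields from clobbering existing values.
applied = overrides.model_dump(exclude_none=True)  # {"max_seq_len": 8192}
```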
kingbri
126a44483c Tree: Remove fasttensors
Now a noop in upstream.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-30 00:18:47 -04:00
kingbri
6726014d35 Dependencies: Update ExllamaV2
v0.2.3

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-30 00:17:12 -04:00
kingbri
56ce82ef77 Sampling: Add XTC support
Matches with upstream.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-09-24 18:10:52 -04:00