Templates: Switch to Jinja2
Jinja2 is a lightweight template parser that's used in Transformers for parsing chat completions. It's much more efficient than Fastchat, can be installed as part of the requirements, and unblocks Pydantic's version. Users now have to provide their own template if needed. A separate repo may be used for storing common prompt templates.

Signed-off-by: kingbri <bdashore3@proton.me>
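Since templates are now user-provided, here is a rough sketch of what such a Jinja2 chat template could look like. This is not part of the commit; the ChatML-style tags, the `messages`/`add_generation_prompt` variables (borrowed from Hugging Face's chat templating convention), and the filename are illustrative assumptions:

```jinja
{# chatml.jinja — hypothetical example template; tags, filename, and variable names are illustrative #}
{%- for message in messages %}
<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{%- endfor %}
{%- if add_generation_prompt %}
<|im_start|>assistant
{%- endif %}
```

In general, a chat template like this receives the request's message list and renders it into a single prompt string before generation.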
This commit is contained in:
parent 95fd0f075e
commit f631dd6ff7
14 changed files with 115 additions and 74 deletions
@@ -54,8 +54,6 @@ NOTE: For Flash Attention 2 to work on Windows, CUDA 12.x **must** be installed!
3. ROCm 5.6: `pip install -r requirements-amd.txt`
5. If you want the `/v1/chat/completions` endpoint to work with a list of messages, install fastchat by running `pip install fschat[model_worker]`
## Configuration
A config.yml file is only required if you want to override the project defaults. If you are okay with the defaults, you don't need a config file!
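As a hedged illustration only (this snippet is not taken from the repo, and the key names are hypothetical placeholders; consult the project's sample config for the real ones), an override file might look like:

```yaml
# Hypothetical config.yml sketch — key names are placeholders, not confirmed by this commit.
# Only the settings you want to override need to appear; everything else keeps its default.
model:
  model_name: MyModel-GPTQ      # placeholder: folder of the model to load
  prompt_template: chatml       # placeholder: name of the Jinja2 template file to use
```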
@@ -126,6 +124,12 @@ All routes require an API key except for the following which require an **admin** key
- `/v1/model/unload`
## Chat Completions
`/v1/chat/completions` now uses Jinja2 for templating. Please read [Huggingface's documentation](https://huggingface.co/docs/transformers/main/chat_templating) for more information on how chat templates work.
Also make sure to set the template name in `config.yml` to the template's filename.
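For context, the endpoint still accepts an OpenAI-style message list, which the selected Jinja2 template renders into a prompt. The host, port, and auth header below are placeholders for your own setup, not values defined by this commit:

```sh
# Hypothetical request — URL, port, and header name are placeholders.
curl http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hello!"}
        ]
      }'
```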
## Common Issues
- AMD cards will error out with flash attention installed, even if the config option is set to False. Run `pip uninstall flash_attn` to remove the wheel from your system.