* Model: Clean up paged attention checks
* Model: Move cache_size checks after paged attn checks. Cache size is only relevant in paged mode.
* Model: Fix no_flash_attention
* Model: Remove no_flash_attention. The ability to use flash attention is auto-detected, so this flag is unneeded; uninstall flash attention to disable it on supported hardware.
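
A minimal sketch of the two ideas above, assuming a Python codebase: the cache_size check runs only after the paged-attention check (since cache_size matters only in paged mode), and flash attention is detected by probing for the installed package instead of a `no_flash_attention` flag. Names like `flash_attn_available`, `check_attention_config`, and `page_size` are illustrative, not the project's actual API.

```python
# Illustrative sketch only -- not the project's actual code.
import importlib.util


def flash_attn_available() -> bool:
    # Flash attention is usable exactly when the flash_attn package is
    # installed; uninstalling it disables it, so no explicit flag is needed.
    return importlib.util.find_spec("flash_attn") is not None


def check_attention_config(paged: bool, cache_size: int | None, page_size: int = 256) -> None:
    # Hypothetical validation order: paged-mode checks come first, and the
    # cache_size check only runs when paged attention is actually in use.
    if not paged:
        return
    if cache_size is not None and cache_size % page_size != 0:
        raise ValueError(f"cache_size must be a multiple of {page_size} in paged mode")
```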