tabbyAPI-ollama/endpoints
DocShotgun 156b74f3f0
Revision to paged attention checks (#133)
* Model: Clean up paged attention checks

* Model: Move cache_size checks after paged attn checks
  Cache size is only relevant in paged mode

* Model: Fix no_flash_attention

* Model: Remove no_flash_attention
  Ability to use flash attention is auto-detected, so this flag is unneeded. Uninstall flash attention to disable it on supported hardware.
2024-06-09 17:28:11 +02:00
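The commit above reorders model-load validation so cache_size is only checked once paged attention is known to be usable, and drops the no_flash_attention flag in favor of auto-detection. A minimal sketch of that ordering, assuming hypothetical names (supports_paged_attn, ModelConfig) and an assumed 256-token page size; this is illustrative, not tabbyAPI's actual code:

```python
# Illustrative only: supports_paged_attn(), ModelConfig, and the 256-token
# page size are assumptions, not tabbyAPI's real API.
import importlib.util
from dataclasses import dataclass

import torch


def supports_paged_attn() -> bool:
    """Auto-detect paged attention: flash-attn installed plus an Ampere-or-newer GPU.
    (Replaces an explicit no_flash_attention flag; the hardware floor is an assumption.)"""
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    has_supported_gpu = (
        torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8
    )
    return has_flash_attn and has_supported_gpu


@dataclass
class ModelConfig:
    cache_size: int | None = None


def validate_cache(config: ModelConfig) -> None:
    """Check cache_size only after the paged attention check passes,
    since the setting has no effect in non-paged mode."""
    if not supports_paged_attn():
        return  # non-paged mode: cache_size is ignored
    if config.cache_size is not None and config.cache_size % 256 != 0:
        raise ValueError("cache_size should be a multiple of the 256-token page size")
```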
OAI          Revision to paged attention checks (#133)   2024-06-09 17:28:11 +02:00
server.py    API: Move OAI to APIRouter                   2024-04-06 01:25:31 -04:00
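The last change touching server.py moved the OAI-compatible endpoints onto a FastAPI APIRouter that the main app includes, rather than defining them directly on the app. A minimal sketch of that pattern, assuming an illustrative /v1/models route and file layout; the actual routes are not reproduced here:

```python
# Illustrative APIRouter layout; the route and response shape are assumptions.
from fastapi import APIRouter, FastAPI

# e.g. endpoints/OAI/router.py (hypothetical path): OAI routes live on a router
router = APIRouter(prefix="/v1")


@router.get("/models")
async def list_models() -> dict:
    """Placeholder OAI-style model listing."""
    return {"object": "list", "data": []}


# server.py: the app simply mounts the router
app = FastAPI()
app.include_router(router)
```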