Revision to paged attention checks (#133)

* Model: Clean up paged attention checks

* Model: Move cache_size checks after paged attn checks
Cache size is only relevant in paged mode (see the ordering sketch below).

* Model: Fix no_flash_attention

* Model: Remove no_flash_attention
The ability to use flash attention is auto-detected, so this flag is unneeded. Uninstall flash attention to disable it on supported hardware (see the detection sketch after the diff below).
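
For the cache_size reordering described above, here is a minimal sketch of the intended check order: establish whether paged attention is usable first, and only then validate cache_size, since it has no effect outside paged mode. The function name, parameters, and the specific constraint shown are illustrative assumptions, not the project's actual API.

from typing import Optional


def resolve_cache_size(paged_attention_available: bool,
                       cache_size: Optional[int],
                       max_seq_len: int) -> Optional[int]:
    """Return the effective cache size, or None when it does not apply."""
    # Paged attention checks come first.
    if not paged_attention_available:
        if cache_size is not None:
            # cache_size has no effect outside paged mode, so it is ignored.
            print("Warning: cache_size ignored (paged attention unavailable)")
        return None

    # Only in paged mode is cache_size validated and applied.
    effective = cache_size if cache_size is not None else max_seq_len
    if effective < max_seq_len:
        raise ValueError("cache_size must be at least max_seq_len")
    return effective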
Author: DocShotgun, 2024-06-09 08:28:11 -07:00 (committed by GitHub)
Parent: 55d979b7a5
Commit: 156b74f3f0
3 changed files with 99 additions and 94 deletions

@@ -94,7 +94,6 @@ class ModelLoadRequest(BaseModel):
         default=None,
         examples=[1.0],
     )
-    no_flash_attention: Optional[bool] = False
     # low_mem: Optional[bool] = False
     cache_mode: Optional[str] = "FP16"
     chunk_size: Optional[int] = 2048
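
As context for the no_flash_attention field removed in the diff above, a rough sketch of import-based auto-detection, assuming the helper name and the compute-capability threshold as illustrative choices rather than the repository's actual detection code.

import importlib.util

import torch


def flash_attention_usable() -> bool:
    """True when flash_attn is installed and the GPU likely supports it."""
    # Uninstalling the flash_attn package is enough to disable this path.
    if importlib.util.find_spec("flash_attn") is None:
        return False
    if not torch.cuda.is_available():
        return False
    # FlashAttention 2 targets Ampere (compute capability 8.0) and newer.
    major, _ = torch.cuda.get_device_capability()
    return major >= 8

With detection along these lines, a per-request no_flash_attention flag is redundant, which is the rationale given in the commit message.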