tabbyAPI-ollama/endpoints
kingbri 408c66a1f2 Model: Change FA2 and paged attention checks
The dynamic generator requires Flash attention 2.5.7 or higher to
be installed. This is only supported on Nvidia's 30 series and higher.

If the card is AMD or older than the 30 series, switch to compatibility
mode, which functions the same way as the older generator but without
parallel batching and any features that depend on it, such as CFG.

Signed-off-by: kingbri <bdashore3@proton.me>
2024-05-25 21:16:14 -04:00
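For reference, a minimal sketch of the kind of check the commit describes, assuming torch, packaging, and flash-attn are importable; the helper name supports_paged_attention and the fallback note are illustrative, not tabbyAPI's actual API.

```python
# Sketch of the FA2 / paged attention capability check (illustrative only).
from packaging import version

import torch

MIN_FA2_VERSION = version.parse("2.5.7")


def supports_paged_attention() -> bool:
    """Return True if the dynamic generator (paged attention) can be used."""
    # Paged attention needs an NVIDIA CUDA device; ROCm/AMD builds of torch
    # report torch.version.hip, so treat them as unsupported here.
    if not torch.cuda.is_available() or torch.version.hip is not None:
        return False

    # 30-series (Ampere) and newer cards report compute capability >= 8.0.
    major, _minor = torch.cuda.get_device_capability()
    if major < 8:
        return False

    # flash-attn 2.5.7 or higher must be installed.
    try:
        import flash_attn
    except ImportError:
        return False

    return version.parse(flash_attn.__version__) >= MIN_FA2_VERSION


# Callers would pick the generator based on the result; when False, fall back
# to compatibility mode (no parallel batching, no CFG).
```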
OAI        Model: Change FA2 and paged attention checks    2024-05-25 21:16:14 -04:00
server.py  API: Move OAI to APIRouter                      2024-04-06 01:25:31 -04:00