jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	d339139fb6	Config: Deep merge model overrides Anything below the first level of kwargs was not being merged properly. A more bulletproof solution would be to refactor the loading code to separate draft and normal model parameters. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-07-03 12:17:09 -04:00
kingbri	3649d3bb51	Tree: Format + Lint Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-26 02:14:30 -04:00
Vhallo	1aefa01a68	Fix RoPE Ratio	2025-04-21 01:46:18 +02:00
kingbri	8e238fa8f6	Model: Move calculate_rope_alpha from backend Makes more sense to use as a utility function. Also clarify how the vars are set. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-20 18:20:19 -04:00
kingbri	027ffce05d	Utils: Remove unused defer utils These did not work anyways Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-20 17:59:09 -04:00
kingbri	11ed3cf5ee	Model: Cleanup logging and remove extraneous declarations Log the parameters passed into the generate gen function rather than the generation settings to reduce complexity. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-15 23:31:12 -04:00
TerminalMan	f4791e7ed9	Cleanup config file loader (#208) * fix config file loader * prune nonetype values from config dict fixes default values not initialising properly * Utils: Shrink None removal function It is more concise to use a list and dict collection if necessary rather than iterating through and checking each value. Tested and works with Tabby's cases. Signed-off-by: kingbri <bdashore3@proton.me> --------- Signed-off-by: kingbri <bdashore3@proton.me> Co-authored-by: kingbri <bdashore3@proton.me>	2024-09-23 21:42:01 -04:00
TerminalMan	948fcb7f5b	migrate to ruamel.yaml	2024-09-18 01:06:34 +01:00
kingbri	ebe7f3567e	Config: Alter migration error handling and cleanup Rollback to the old config if automigration fails. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-16 18:02:18 -04:00
kingbri	81ae461eb8	Config: Allow existing values to get included in generated file Allows for generation from an existing config file. Primarily used for migration purposes. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-16 12:19:58 -04:00
TerminalMan	564bdcf0a8	add legacy config converter	2024-09-16 14:12:47 +01:00
kingbri	a09dd802c2	Config: Cleanup and organize functions Remove access of private attributes and use safer functions. Also move generalized functions into utils files. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-14 21:48:39 -04:00
kingbri	93872b34d7	Config: Migrate to global class instead of dicts The config categories can have defined separation, but preserve the dynamic nature of adding new config options by making all the internal class vars as dictionaries. This was necessary since storing global callbacks stored a state of the previous global_config var that wasn't populated. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-04 23:18:47 -04:00
Jake	e772fa2981	Switch to internal dict merge implementation - remove deepmerge dependency - fix ruff formatting	2024-09-04 16:27:28 +01:00
kingbri	7522b1447b	Model: Add support for HuggingFace config and bad_words_ids This is necessary for Kobold's API. Current models use bad_words_ids in generation_config.json, but for some reason, they're also present in the model's config.json. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:23:22 -04:00
kingbri	b9fd8555fe	Sampling: Copy over iterable overrides If an override was iterable, any modifications to the returned value would alter the reference to the global storage dict. Therefore, copy the structure if it's an iterable so any modification won't alter the original override. Also apply this for the function that checks for forced overrides. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-17 21:38:28 -04:00
kingbri	6dfcbbd813	Common: Migrate request utils to networking Helps organize the project better. Utils is meant to be for simple functions like unwrap. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-21 23:21:57 -04:00
kingbri	2961c5f3f9	API: Handle request disconnect on non-streaming gens Works the same way as streaming gens. If the request is cancelled, it will log an error to the user and release the semaphore if it's holding anything. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-21 23:12:59 -04:00
kingbri	7fded4f183	Tree: Switch to async generators Async generation helps remove many roadblocks to managing tasks using threads. It should allow for abortables and modern-day paradigms. NOTE: Exllamav2 itself is not an asynchronous library. It's just been added into tabby's async nature to allow for a fast and concurrent API server. It's still being debated to run stream_ex in a separate thread or manually manage it using asyncio.sleep(0) Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-16 23:23:31 -04:00
kingbri	894be4a818	Startup: Check if the port is available and fallback Similar to Gradio, fall back to port + 1 if the config port isn't bindable. If both ports aren't available, let the user know and exit. An infinite loop of finding a port isn't advisable. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-11 21:57:28 -04:00
kingbri	d45e847c7a	API: Fix disconnect handling on streaming responses Starlette's StreamingResponse has an issue where it yields after a request has disconnected. A bugfix to starlette will fix this issue, but FastAPI uses starlette <= 0.36 which isn't ideal. Therefore, switch back to sse-starlette which handles these disconnects correctly. Also don't try yielding after the request is disconnected. Just return out of the generator instead. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-10 17:43:13 -04:00
kingbri	228c227c1e	Logging: Switch to loguru Loguru is a flexible logger that allows for easier hooking and imports into Rich with no problems. Also makes progress bars stick to the bottom of the terminal window. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	fe0ff240e7	Progress: Switch to Rich Rich is a more mature library for displaying progress bars, logging, and console output. This should help properly align progress bars within the terminal. Side note: "We're Rich!" Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-08 01:00:48 -05:00
kingbri	0b25c208d6	API: Fix error reporting Make a disconnect on load error consistently. It should be safer to warn the user to run unload (or re-run load) if a model does not load correctly. Also don't log the traceback for request errors that don't have one. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-05 18:16:02 -05:00
kingbri	b0c295dd2f	API: Add more methods to semaphore The semaphore/queue model for Tabby is as follows: - Any load requests go through the semaphore by default - Any load request can include the skip_queue parameter to bypass the semaphore - Any unload requests are immediately executed - All completion requests are placed inside the semaphore by default This model preserves the parallelism of single-user mode with extra convenience methods for queues in multi-user. It also helps mitigate problems that were previously present in the concurrency stack. Also change how the program's loop runs so it exits when the API thread dies. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-04 23:21:40 -05:00
kingbri	c82697fef2	API: Fix issues with concurrent requests and queueing This is the first in many future commits that will overhaul the API to be more robust and concurrent. The model is admin-first where the admin can do anything in case something goes awry. Previously, calls to long running synchronous background tasks would block the entire API, making it ignore any terminal signals until generation is completed. To fix this, leverage FastAPI's run_in_threadpool to offload the long running tasks to another thread. However, signals to abort the process still kept the background thread running and made the terminal hang. This was due to an issue with Uvicorn not propagating the SIGINT signal across threads in its event loop. To fix this in a catch-all way, run the API processes in a separate thread so the main thread can still kill the process if needed. In addition, make request error logging more robust and refer to the console for full error logs rather than creating a long message on the client-side. Finally, add state checks to see if a model is fully loaded before generating a completion. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-04 23:21:40 -05:00
kingbri	b827bcbb44	Sampling: Cleanup and update Cleanup how overrides are handled, class naming, and adopt exllamav2's model class to enforce latest stable version methods rather than adding multiple backwards compatibility checks. Signed-off-by: kingbri <bdashore3@proton.me>	2024-02-02 23:36:17 -05:00
kingbri	78f920eeda	Tree: Refactor code organization Move common functions into their own folder and refactor the backends to use their own folder as well. Also cleanup imports and alphabetize import statements themselves. Finally, move colab and docker into their own folders as well. Signed-off-by: kingbri <bdashore3@proton.me>	2024-01-25 00:15:40 -05:00

28 commits
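
Commits d339139fb6 and e772fa2981 both concern merging nested dictionaries, where a shallow merge drops anything below the first level. The sketch below illustrates that difference. It is not the repository's implementation: the function name `deep_merge` and the keys `max_seq_len`, `draft`, `rope_alpha`, and `cache_mode` are hypothetical names chosen only for illustration.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` is merged into `base` at every level.

    Hypothetical helper for illustration; neither input is mutated.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: recurse instead of replacing wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, or mismatched types: the override wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical config shapes, not taken from the project.
defaults = {
    "model": {
        "max_seq_len": 4096,
        "draft": {"rope_alpha": 1.0, "cache_mode": "FP16"},
    }
}
overrides = {"model": {"draft": {"rope_alpha": 2.5}}}

# A shallow merge replaces the whole "model" entry and loses the defaults.
shallow = {**defaults, **overrides}

# A deep merge keeps sibling keys at every nesting level.
merged = deep_merge(defaults, overrides)
```

With these inputs, `shallow["model"]` contains only the `draft` override, while `merged["model"]` still carries `max_seq_len` and the untouched `cache_mode`.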
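
Commit 894be4a818 describes falling back to port + 1 when the configured port cannot be bound, and exiting rather than searching indefinitely. A minimal sketch of that policy, using only the standard `socket` module, might look like the following. The function names `port_is_bindable` and `choose_port` are hypothetical, and raising `RuntimeError` stands in for whatever exit path the project actually uses.

```python
import socket


def port_is_bindable(host: str, port: int) -> bool:
    """Check whether a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except (OSError, OverflowError):
            # OSError: address in use or not permitted.
            # OverflowError: port number outside 0-65535.
            return False
    return True


def choose_port(host: str, port: int) -> int:
    """Return `port` if bindable, else `port + 1`, else give up.

    Only two candidates are tried, mirroring the commit's note that an
    unbounded search for a free port is not advisable.
    """
    for candidate in (port, port + 1):
        if port_is_bindable(host, candidate):
            return candidate
    raise RuntimeError(f"Ports {port} and {port + 1} are both unavailable")
```

Note that the check and the later real bind are separate steps, so another process could still take the port in between; a server would need to handle that bind failure as well.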