Template modules evaluate all set vars, including ones that reference
runtime vars. If a template var is set from a runtime var and a module
is created without those vars, an UndefinedError fires.
Use make_module instead to pass runtime vars when creating a template
module.
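A minimal sketch of the difference, using an illustrative template and vars:

    from jinja2 import Template

    template = Template(
        "{% set greeting = 'Hello, ' + username %}{{ greeting }}"
    )

    # template.module evaluates every top-level set immediately, so a var
    # built from a runtime-only name (username) raises UndefinedError.
    # make_module lets the runtime vars be passed up front instead:
    module = template.make_module(vars={"username": "kingbri"})
    print(module.greeting)  # "Hello, kingbri"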
Resolves #92
Signed-off-by: kingbri <bdashore3@proton.me>
Additive is used to add collections together. Currently, it's used
for lists, but it can be used for dictionaries in the future.
Signed-off-by: kingbri <bdashore3@proton.me>
Adding the stop_strings var to chat templates allows the template
creator to specify stopping strings to add onto chat completions.
These get appended to any existing stopping strings passed in the API
request. However, a sampler override with force: true will override
all stopping strings.
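A hedged sketch of the idea, reusing the template module approach above
(names and the merge logic are illustrative, not the exact implementation):

    from jinja2 import Template

    chat_template = Template('{% set stop_strings = ["<|im_end|>"] %}...')

    # Stops declared by the template creator
    template_stops = getattr(chat_template.module, "stop_strings", [])

    # Stops passed in the API request get the template's stops appended
    request_stops = ["###"]
    all_stops = request_stops + list(template_stops)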
Signed-off-by: kingbri <bdashore3@proton.me>
Works the same way as streaming gens. If the request is cancelled,
it will log an error to the user and release the semaphore if it's
holding anything.
Signed-off-by: kingbri <bdashore3@proton.me>
If max_tokens is None, it automatically scales to fill up the context.
This does not mean the generation will fill up that context since
EOS stops also exist.
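A rough sketch of the scaling (names are illustrative):

    def resolve_max_tokens(max_tokens, max_seq_len, prompt_token_count):
        # With no explicit max_tokens, allow generation up to whatever
        # context remains after the prompt.
        if max_tokens is None:
            return max_seq_len - prompt_token_count
        return max_tokens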
Originally suggested by #86
Signed-off-by: kingbri <bdashore3@proton.me>
This is a definitive way to check if an authorized key is API or admin.
The endpoint only runs if the key is valid in the first place, to stay
in line with the API's security model.
Signed-off-by: kingbri <bdashore3@proton.me>
Add the ability to override uvicorn's signal handler in addition
to using main's signal handler for any SIGINTs before the API server
starts.
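One common way to do this is to subclass uvicorn.Server; a hedged sketch,
not the exact implementation:

    import uvicorn

    class CustomSignalServer(uvicorn.Server):
        def install_signal_handlers(self):
            # Skip uvicorn's default SIGINT/SIGTERM handlers so the
            # handler installed in main stays in control.
            pass

    config = uvicorn.Config("main:app", host="0.0.0.0", port=5000)
    server = CustomSignalServer(config)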
Signed-off-by: kingbri <bdashore3@proton.me>
If the model didn't load properly, the container still exists until
unload is called. However, the name check still reported the model as
loaded.
Signed-off-by: kingbri <bdashore3@proton.me>
Run these iterators on the background thread. On startup, the API
spawns a background thread as needed to run sync code without blocking
the event loop.
Use asyncio's to_thread function since it allows errors to be
propagated.
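For illustration (function names are hypothetical), a blocking iterator
step can be awaited while still surfacing its exceptions:

    import asyncio

    async def iterate_in_background(blocking_next):
        while True:
            # Runs the sync call on a worker thread; any exception raised
            # there propagates to this await instead of being lost.
            item = await asyncio.to_thread(blocking_next)
            if item is None:
                break
            yield item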
Signed-off-by: kingbri <bdashore3@proton.me>
Async generation helps remove many roadblocks to managing tasks
using threads. It should allow for abortable generations and modern-day
paradigms.
NOTE: Exllamav2 itself is not an asynchronous library. It's just
been wrapped into tabby's async flow to allow for a fast and concurrent
API server. It's still being debated whether to run stream_ex in a
separate thread or to manage it manually using asyncio.sleep(0).
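The second option would look roughly like this (stream_ex usage and its
return shape are assumed/simplified here):

    import asyncio

    async def stream_generation(generator):
        while True:
            # stream_ex is exllamav2's synchronous streaming step; the
            # sleep(0) hands control back to the event loop so other
            # requests can run between chunks.
            result = generator.stream_ex()
            yield result.get("chunk", "")
            if result.get("eos"):
                break
            await asyncio.sleep(0)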
Signed-off-by: kingbri <bdashore3@proton.me>
Speculative ngram decoding is like speculative decoding without the
draft model. It's not as useful because it only helps on predictable
sequences, but that depends on the use case.
Signed-off-by: kingbri <bdashore3@proton.me>
Previously, generation functions were bundled with the request
functions, causing the overall code structure and API to look ugly and
unreadable.
Split these up and clean up a lot of the methods that were previously
overlooked in the API itself.
Signed-off-by: kingbri <bdashore3@proton.me>
Use the module singleton pattern to share global state. This can also
be a modified version of the Global Object Pattern. The main reason
this pattern is used is for ease of use when handling global state
rather than adding extra dependencies for a DI parameter.
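A minimal sketch of the pattern (module and attribute names are
illustrative):

    # model.py -- module-level singleton holding shared state
    from types import SimpleNamespace

    container = None

    def load(model_path: str):
        global container
        container = SimpleNamespace(path=model_path, loaded=True)  # stand-in

    def unload():
        global container
        container = None

    # Every "import model" gets the same module object, so all callers see
    # the same container without a dependency-injected parameter.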
Signed-off-by: kingbri <bdashore3@proton.me>
This is a shared module which manages the model container and provides
extra utility functions around it to help slim down the API.
Signed-off-by: kingbri <bdashore3@proton.me>
Similar to Gradio, fall back to port + 1 if the config port isn't
bindable. If neither port is available, let the user know and exit.
An infinite loop of finding a port isn't advisable.
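A hedged sketch of the probe (the real bind happens in the server, so
this only checks availability):

    import socket
    import sys

    def pick_port(host: str, port: int) -> int:
        for candidate in (port, port + 1):
            try:
                with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                    sock.bind((host, candidate))
                return candidate
            except OSError:
                continue
        sys.exit(f"Ports {port} and {port + 1} are both unavailable. Exiting.")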
Signed-off-by: kingbri <bdashore3@proton.me>
Rich markup sequences inside the log string were causing issues
with printing. Fix this by using their escape function.
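For example:

    from rich.console import Console
    from rich.markup import escape

    console = Console()
    raw_log = "unexpected token: [eos]"

    # Without escape(), Rich tries to parse "[eos]" as a markup tag and
    # either errors out or swallows it; escaping prints it verbatim.
    console.print(escape(raw_log))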
Signed-off-by: kingbri <bdashore3@proton.me>
Starlette's StreamingResponse has an issue where it yields after
a request has disconnected. An upstream starlette bugfix will address
this, but FastAPI pins starlette <= 0.36, which isn't ideal.
Therefore, switch back to sse-starlette which handles these disconnects
correctly.
Also don't try yielding after the request is disconnected. Just return
out of the generator instead.
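A hedged sketch of the generator-side check (route and payload shapes
are illustrative):

    from fastapi import Request
    from sse_starlette.sse import EventSourceResponse

    async def stream_completion(request: Request, chunks):
        async def event_generator():
            async for chunk in chunks:
                if await request.is_disconnected():
                    # Don't yield into a dead connection; just stop.
                    return
                yield {"data": chunk}

        return EventSourceResponse(event_generator())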
Signed-off-by: kingbri <bdashore3@proton.me>
Loguru is a flexible logger that allows for easier hooking and
integrates into Rich with no problems. It also makes progress bars
stick to the bottom of the terminal window.
Signed-off-by: kingbri <bdashore3@proton.me>
Rich is a more mature library for displaying progress bars, logging,
and console output. This should help properly align progress bars
within the terminal.
Side note: "We're Rich!"
Signed-off-by: kingbri <bdashore3@proton.me>
Make disconnects on load errors consistent. It should be safer to
warn the user to run unload (or re-run load) if a model does not
load correctly.
Also don't log the traceback for request errors that don't have one.
Signed-off-by: kingbri <bdashore3@proton.me>
According to the FastAPI docs, declaring a generic route function as
async makes it more performant (which makes sense, since plain def
route functions are automatically run through a threadpool, while
async ones stay on the event loop).
Tested and everything works fine.
Signed-off-by: kingbri <bdashore3@proton.me>
The semaphore/queue model for Tabby is as follows:
- Any load requests go through the semaphore by default
- Any load request can include the skip_queue parameter to bypass
the semaphore
- Any unload requests are immediately executed
- All completion requests are placed inside the semaphore by default
This model preserves the parallelism of single-user mode with extra
convenience methods for queues in multi-user. It also helps mitigate
problems that were previously present in the concurrency stack.
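A rough sketch of the gating described above (structure and names are
illustrative):

    import asyncio

    generate_semaphore = asyncio.Semaphore(1)

    async def gated(task, skip_queue: bool = False):
        # skip_queue lets a privileged load request bypass the queue;
        # unload requests never call this helper, so they run immediately.
        if skip_queue:
            return await task()
        async with generate_semaphore:
            return await task()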
Also change how the program's loop runs so it exits when the API thread
dies.
Signed-off-by: kingbri <bdashore3@proton.me>
This is the first in many future commits that will overhaul the API
to be more robust and concurrent. The model is admin-first where the
admin can do anything in case something goes awry.
Previously, calls to long running synchronous background tasks would
block the entire API, making it ignore any terminal signals until
generation is completed.
To fix this, leverage FastAPI's run_in_threadpool to offload the long
running tasks to another thread. However, signals to abort the process
still kept the background thread running and made the terminal hang.
This was due to an issue with Uvicorn not propagating the SIGINT signal
across threads in its event loop. To fix this in a catch-all way, run
the API processes in a separate thread so the main thread can still
kill the process if needed.
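As a minimal illustration of the threadpool offload (the wrapped loader
is hypothetical):

    from fastapi import FastAPI
    from fastapi.concurrency import run_in_threadpool

    app = FastAPI()

    def load_model_sync(name: str):
        """Placeholder for a long-running, blocking model load."""

    @app.post("/v1/model/load")
    async def load_model(name: str):
        # Offloading keeps the event loop free to handle signals and
        # other requests while the model loads.
        await run_in_threadpool(load_model_sync, name)
        return {"status": "loaded"}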
In addition, make request error logging more robust and refer to the
console for full error logs rather than creating a long message on the
client-side.
Finally, add state checks to see if a model is fully loaded before
generating a completion.
Signed-off-by: kingbri <bdashore3@proton.me>
Using the Outlines library, add support for supplying EBNF strings
and passing them to the library for parsing.
From there, a wrapper is created and a filter is passed to generation.
Replace with an in-house solution at some point that's more flexible.
Signed-off-by: kingbri <bdashore3@proton.me>
Add the ability to constrain the return value of a model to be JSON.
Built using the JSON schema standard to define the properties of what
the model should return.
This feature should be more accurate than using GBNF/EBNF to yield
the same results due to the use of lmformatenforcer.
GBNF/EBNF will be added in a different commit/branch.
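A hedged sketch of the schema-to-parser step (wiring the parser into a
generation filter is omitted here):

    from lmformatenforcer import JsonSchemaParser

    # JSON schema describing what the model is allowed to return
    schema = {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    }

    # The parser tracks which tokens keep the output valid; it then gets
    # wrapped into a token filter that constrains sampling.
    parser = JsonSchemaParser(schema)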
Signed-off-by: kingbri <bdashore3@proton.me>
This option saves some VRAM, but does have the chance to error out.
Add this in the experimental config section.
Signed-off-by: kingbri <bdashore3@proton.me>
Injecting into Pydantic fields caused issues with serialization for
documentation rendering. Rather than reinvent the wheel again,
switch to a chain of if statements for now. This may change in the
future if subclasses from the base sampler request need to be
validated as well.
Signed-off-by: kingbri <bdashore3@proton.me>
Rather than maintaining yet another function to validate sampler
ranges/values, embed the checks in the fields themselves, which allows
for less maintenance in the future.
Also add validation for existing samplers that can corrupt
the sampling stack if set improperly.
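For example, the range checks can live directly on the fields (names
and bounds are illustrative):

    from pydantic import BaseModel, Field

    class SamplerParams(BaseModel):
        # Bounds are enforced at request validation time instead of in a
        # separate validation function.
        temperature: float = Field(default=1.0, ge=0.0)
        top_p: float = Field(default=1.0, ge=0.0, le=1.0)
        top_k: int = Field(default=0, ge=0)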
Signed-off-by: kingbri <bdashore3@proton.me>
Returns token offsets, selected tokens, probabilities of tokens
post-sampling, and normalized probability of selecting a token
pre-sampling (for efficiency purposes).
Only for text completions. Chat completions in a later commit.
Signed-off-by: kingbri <bdashore3@proton.me>
Many API clients automatically request streaming without giving the
user the option to turn it off. Therefore, give the user more freedom
by providing a server-side kill switch.
Signed-off-by: kingbri <bdashore3@proton.me>