jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	2913ce29fc	API: Add timings to usage stats It's useful for the client to know what the T/s and total time for generation are per-request. Works with both completions and chat completions. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-06-17 22:54:51 -04:00
kingbri	3960612d38	API: Format and fix message naming Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-28 22:36:30 -04:00
kingbri	9157be3e34	API: Append task index to generations with n > 1 Since jobs are tracked via request IDs now, each generation task should be uniquely identified in the event of cancellation. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-28 22:29:48 -04:00
kingbri	f070587e9f	Model: Add proper jobs cleanup and fix var calls Jobs should be started and immediately cleaned up when calling the generation stream. Expose a stream_generate function and append this to the base class since it's more idiomatic than generate_gen. The exl2 container's generate_gen function is now internal. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-24 21:30:55 -04:00
kingbri	3084ef9fa1	Model + API: Migrate to use BaseSamplerParams kwargs is pretty ugly when figuring out which arguments to use. The base requests falls back to defaults anyways, so pass in the params object as is. However, since Python's typing isn't like TypeScript where types can be transformed, the type hinting has a possibility of None showing up despite there always being a value for some params. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-04-16 00:50:05 -04:00
kingbri	bd9e78e19e	API: Add inline exception for dummy models If an API key sends a dummy model, it shouldn't error as the server is catering to clients that expect specific OAI model names. This is a problem with inline model loading since these names would error by default. Therefore, add an exception if the provided name is in the dummy model names (which also doubles as inline strict exceptions). However, the dummy model names weren't configurable, so add a new option to specify exception names, otherwise the default is gpt-3.5-turbo. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-17 21:15:45 -05:00
kingbri	f9fffd42e0	OAI: Fix inline model loading errors when disabled The admin key check was running even if inline loading was disabled. Fix this bug, but also preserve the existing permission system when inline loading is enabled. Signed-off-by: kingbri <bdashore3@proton.me>	2024-11-16 23:28:44 -05:00
TerminalMan	7d18d2e2ca	Refactor the sampling class (#199) * improve validation * remove to_gen_params functions * update changes for all endpoint types * OAI: Fix calls to generation Chat completion and completion need to have prompt split out before pushing to the backend. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Convert Top-K values of -1 to 0 Some OAI implementations use -1 as disabled instead of 0. Therefore, add a coalesce case. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Format and space out Make the code more readable. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Fix mirostat Field items are nested in data within a Pydantic FieldInfo Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Format Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Fix banned_tokens and allowed_tokens conversion If the provided string has whitespace, trim it before splitting. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Add helpful log to dry_sequence_breakers Let the user know if the sequence errors out. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Apply validators in right order Validators need to be applied in order from top to bottom, this is why the after validator was not being applied properly. Set the model to validate default params for sampler override purposes. This can be turned off if there are unclear errors. Signed-off-by: kingbri <bdashore3@proton.me> * Endpoints: Format Cleanup and semantically fix field validators Signed-off-by: kingbri <bdashore3@proton.me> * Kobold: Update validators and fix parameter application Validators on parent fields cannot see child fields. Therefore, validate using the child fields instead and alter the parent field data from there. Also fix badwordsids casting. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Remove validate defaults and fix mirostat If a user sets an override to a non-default value, that's their own fault. Run validator on the actual mirostat_mode parameter rather than the alternate mirostat parameter. Signed-off-by: kingbri <bdashore3@proton.me> * Kobold: Rework badwordsids Currently, this serves to ban the EOS token. All other functionality was legacy, so remove it. Signed-off-by: kingbri <bdashore3@proton.me> * Model: Remove HuggingfaceConfig This was only necessary for badwordsids. All other fields are handled by exl2. Keep the class as a stub if it's needed again. Signed-off-by: kingbri <bdashore3@proton.me> * Kobold: Bump kcpp impersonation TabbyAPI supports XTC now. Signed-off-by: kingbri <bdashore3@proton.me> * Sampling: Change alias to validation_alias Reduces the probability for errors and makes the class consistent. Signed-off-by: kingbri <bdashore3@proton.me> * OAI: Use constraints for validation Instead of adding a model_validator, use greater than or equal to constraints provided by Pydantic. Signed-off-by: kingbri <bdashore3@proton.me> * Tree: Lint Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: SecretiveShell <84923604+SecretiveShell@users.noreply.github.com> Co-authored-by: kingbri <bdashore3@proton.me>	2024-10-27 11:43:41 -04:00
Brian Dashore	6e48bb420a	Model: Fix inline loading and draft key (#225) * Model: Fix inline loading and draft key There was a lack of foresight between the new config.yml and how it was structured. The "draft" key became "draft_model" without updating both the API request and inline loading keys. For the API requests, still support "draft" as legacy, but the "draft_model" key is preferred. Signed-off-by: kingbri <bdashore3@proton.me> * OAI: Add draft model dir to inline load Was not pushed before and caused errors of the kwargs being None. Signed-off-by: kingbri <bdashore3@proton.me> * Model: Fix draft args application Draft model args weren't applying since there was a reset due to how the old override behavior worked. Signed-off-by: kingbri <bdashore3@proton.me> * OAI: Change embedding model load params Use embedding_model_name to be inline with the config. Signed-off-by: kingbri <bdashore3@proton.me> * API: Fix parameter for draft model load Alias name to draft_model_name. Signed-off-by: kingbri <bdashore3@proton.me> * API: Fix parameter for template switch Add prompt_template_name to be more descriptive. Signed-off-by: kingbri <bdashore3@proton.me> * API: Fix parameter for model load Alias name to model_name for config parity. Signed-off-by: kingbri <bdashore3@proton.me> * API: Add alias documentation Signed-off-by: kingbri <bdashore3@proton.me> --------- Signed-off-by: kingbri <bdashore3@proton.me>	2024-10-24 23:35:05 -04:00
kingbri	daa57ceada	API: Upgrade config declarations Some were using the old unwrap methods. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-17 00:42:39 -04:00
kingbri	e00eb09ef3	OAI: Add cancellation with inline load When the request is cancelled, cancel the load task. In addition, when checking if a model container exists, also check if the model is fully loaded. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-11 00:08:55 -04:00
Cohee	63476041d1	Properly specify config value in the error message	2024-09-08 22:02:49 +03:00
kingbri	2f45e978c5	API: Fix merge overwrite The completions utils did not take the new imports. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-05 18:04:53 -04:00
kingbri	9c10789ca1	API: Error on invalid key permissions and cleanup format If a user requesting a model change isn't admin, error. Better to place the load function before the generate functions. Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-04 21:44:14 -04:00
kingbri	21f14d4318	API: Update inline load - Add a config flag - Migrate support to /v1/completions - Unify the load function Signed-off-by: kingbri <bdashore3@proton.me>	2024-09-03 23:37:28 -04:00
Vhallo	b2064bbfb4	Typo fix in completion.py	2024-07-23 23:49:43 +02:00
kingbri	d1706fb067	OAI: Remove double logging if request is cancelled Uvicorn can log in both the request disconnect handler and the CancelledError. However, these sometimes don't work and both need to be checked. But, don't log twice if one works. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:48:59 -04:00
kingbri	cae94b920c	API: Add ability to use request IDs Identify which request is being processed to help users disambiguate which logs correspond to which request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-21 21:01:05 -04:00
turboderp	1951f7521c	Forward exceptions from _stream_collector to stream_generate_(chat)_completion (#126)	2024-06-03 19:42:45 +02:00
kingbri	e2a8b6e8ae	OAI: Add "n" support for streaming generations Use a queue-based system to get choices independently and send them in the overall streaming payload. This method allows for unordered streaming of generations. The system is a bit redundant, so maybe make the code more optimized in the future. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-28 00:52:30 -04:00
kingbri	c8371e0f50	OAI: Copy gen params for "n" For multiple generations in the same request, nested arrays kept their original reference, resulting in duplications. This will occur with any collection type. For optimization purposes, a deepcopy isn't run for the first iteration since original references are created. This is not the most elegant solution, but it works for the described cases. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-28 00:52:30 -04:00
kingbri	b944f8d756	OAI: Add "n" for non-streaming generations This adds the ability to add multiple choices to a generation. This is only available for non-streaming gens for now, it requires some more work to port over to streaming. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-28 00:52:30 -04:00
kingbri	d710a1b441	OAI: Switch to background task for disconnect checks Waiting for request disconnect takes some extra time and allows generation chunks to pile up, resulting in large payloads being sent at once not making up a smooth stream. Use the polling method in non-streaming requests by creating a background task and then check if the task is done, signifying that the request has been disconnected. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-26 13:52:20 -04:00
kingbri	660f9b8432	OAI: Fix request cancellation behavior Depending on the day of the week, Starlette can work with a CancelledError or using await request.is_disconnected(). Run the same behavior for both cases and allow cancellation. Streaming requests now set an event to cancel the batched job and break out of the generation loop. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-26 13:00:33 -04:00
kingbri	06ff47e2b4	Model: Use true async jobs and add logprobs The new async dynamic job allows for native async support without the need of threading. Also add logprobs and metrics back to responses. Signed-off-by: kingbri <bdashore3@proton.me>	2024-05-25 21:16:14 -04:00
kingbri	fb1d2f34c1	OAI: Add response_prefix and fix BOS token issues in chat completions response_prefix is used to add a prefix before generating the next message. This is used in many cases such as continuing a prompt (see #96). Also if a template has BOS token specified, add_bos_token will append two BOS tokens. Add a check which strips a starting BOS token from the prompt if it exists. Signed-off-by: kingbri <bdashore3@proton.me>	2024-04-25 00:54:43 -04:00
kingbri	6dfcbbd813	Common: Migrate request utils to networking Helps organize the project better. Utils is meant to be for simple functions like unwrap. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-21 23:21:57 -04:00
kingbri	07d9b7cf7b	Model: Add abort on generation When the model is processing a prompt, add the ability to abort on request cancellation. This is also a catch for a SIGINT. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-20 15:21:37 -04:00
kingbri	2704ff8344	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 16:02:29 -04:00
kingbri	5c7fc69ded	API: Fix finish_reason returns OAI expects finish_reason to be "stop" or "length" (there are others, but they're not in the current scope of this project). Make all completions and chat completions responses return this from the model generation itself rather than putting a placeholder. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-18 15:59:28 -04:00
kingbri	2755fd1af0	API: Fix blocking iterator execution Run these iterators on the background thread. On startup, the API spawns a background thread as needed to run sync code on without blocking the event loop. Use asyncio's run_thread function since it allows for errors to be propagated. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-16 23:23:31 -04:00
kingbri	7fded4f183	Tree: Switch to async generators Async generation helps remove many roadblocks to managing tasks using threads. It should allow for abortables and modern-day paradigms. NOTE: Exllamav2 itself is not an asynchronous library. It's just been added into tabby's async nature to allow for a fast and concurrent API server. It's still being debated to run stream_ex in a separate thread or manually manage it using asyncio.sleep(0) Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-16 23:23:31 -04:00
kingbri	1ec8eb9620	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-13 00:02:55 -04:00
kingbri	6f03be9523	API: Split functions into their own files Previously, generation function were bundled with the request function causing the overall code structure and API to look ugly and unreadable. Split these up and cleanup a lot of the methods that were previously overlooked in the API itself. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00
kingbri	104a6121cb	API: Split into separate folder Moving the API into its own directory helps compartmentalize it and allows for cleaning up the main file to just contain bootstrapping and the entry point. Signed-off-by: kingbri <bdashore3@proton.me>	2024-03-12 23:59:30 -04:00

35 commits