jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
kingbri	a96fa5f138	API: Don't fallback to default values on model load request It's best to pass them down the config stack. API/User config.yml -> model config.yml -> model config.json -> fallback. Doing this allows for seamless flow and yielding control to each member in the stack. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	dd55b99af5	Model: Store directory paths Storing a pathlib type makes it easier to manipulate the model directory path in the long run without constantly fetching it from the config. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-31 22:59:56 -04:00
kingbri	21712578cf	API: Add allowed_tokens support This is the opposite of banned tokens. Exllama specific implementation of #181. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-29 21:44:42 -04:00
kingbri	871c89063d	Model: Add Tensor Parallel support Use the tensor parallel loader when the flag is enabled. The new loader has its own autosplit implementation, so gpu_split_auto isn't valid here. Also make it easier to determine which cache type to use rather than multiple if/else statements. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-22 14:15:19 -04:00
kingbri	a51acb9db4	Templates: Switch to async jinja engine This prevents any possible blocking of the event loop due to template rendering. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 12:03:41 -04:00
kingbri	b4752c1e62	Templates: Revert to load metadata on runtime Metadata is generated via a template's module. This requires a single iteration through the template. If a template tries to access a passed variable that doesn't exist, it will error. Therefore, generate the metadata at runtime to prevent these errors from happening. To optimize further, cache the metadata after the first generation to prevent the expensive call of making a template module. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 11:44:42 -04:00
Ben Gitter	70b9fc95de	[WIP] OpenAI Tools Support/Function calling (#154 ) * returning stop str if exists from gen * added chat template for firefunctionv2 * pulling tool vars from template * adding parsing for tool inputs/outputs * passing tool data from endpoint to chat template, adding tool_start to the stop list * loosened typing on the response tool call, leaning more on the user supplying a quality schema if they want a particular format * non streaming generation prototype * cleaning template * Continued work with type, ingestion into template, and chat template for fire func * Correction - streaming toolcall comes back as delta obj not inside chatcomprespchoice per chat_completion_chunk.py inside OAI lib. * Ruff Formating * Moved stop string and tool updates out of prompt creation func Updated tool pydantic to match OAI Support for streaming Updated generate tool calls to use flag within chat_template and insert tool reminder * Llama 3.1 chat templates Updated fire func template * renamed llama3.1 to chatml_with_headers.. * update name of template * Support for calling a tool start token rather than the string. Simplified tool_params Warning when gen_settings are being overidden becuase user set temp to 0 Corrected schema and tools to correct types for function args. Str for some reason * draft groq tool use model template * changed headers to vars for readablity (but mostly because some models are weird about newlines after headers, so this is an easier way to change globally) * Clean up comments and code in chat comp * Post processed tool call to meet OAI spec rather than forcing model to write json in a string in the middle of the call. * changes example back to args as json rather than string of json * Standardize chat templates to each other * cleaning/rewording * stop elements can also be ints (tokens) * Cleaning/formatting * added special tokens for tools and tool_response as specified in description * Cleaning * removing aux templates - going to live in llm-promp-templates repo instead * Tree: Format Signed-off-by: kingbri <bdashore3@proton.me> * Chat Completions: Don't include internal tool variables in OpenAPI Use SkipJsonSchema to supress inclusion with the OpenAPI JSON. The location of these variables may need to be changed in the future. Signed-off-by: kingbri <bdashore3@proton.me> * Templates: Deserialize metadata on template load Since we're only looking for specific template variables that are static in the template, it makes more sense to render when the template is initialized. Signed-off-by: kingbri <bdashore3@proton.me> * Tools: Fix comments Adhere to the format style of comments in the rest of the project. Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: Ben Gitter <gitterbd@gmail.com> Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 00:16:25 -04:00
kingbri	685e3836e9	Args: Add api-servers to parser Also run OpenAPI export after args/config are parsed. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-08 16:32:29 -04:00
kingbri	b6d2676f1c	Start: Give the user a hint when a module can't be imported If an ImportError or ModuleNotFoundError is raised, tell the user to run the update scripts. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 21:59:06 -04:00
kingbri	2a33ebbf29	Model: Bypass lock checks when shutting down Previously, when a SIGINT was emitted and a model load is running, the API didn't shut down until the load finished due to waitng for the lock. However, when shutting down, the lock doesn't matter since the process is being killed anyway. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 16:05:34 -04:00
kingbri	7bf2b07d4c	Signals: Exit on async cleanup The async signal exit function should be the internal for exiting the program. In addition, prevent the handler from being called twice by adding a boolean. May become an asyncio event later on. In addition, make sure to skip_wait when running model.unload. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-02 15:11:57 -04:00
kingbri	3e42211c3e	Config: Embeddings: Make embeddings_device a default when API loading When loading from the API, the fallback for embeddings_device will be the same as the config. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 13:59:49 -04:00
kingbri	0bcb4e4a7d	Model: Attach request ID to logs If multiple logs come in at once, track which log corresponds to which request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 00:25:54 -04:00
Brian Dashore	1bf062559d	Merge pull request #158 from AlpinDale/embeddings feat: add embeddings support via Infinity-emb	2024-07-31 20:33:12 -04:00
kingbri	dc3dcc9c0d	Embeddings: Update config, args, and parameter names Use embeddings_device as the parameter for device to remove ambiguity. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:32:26 -04:00
kingbri	bfa011e0ce	Embeddings: Add model management Embedding models are managed on a separate backend, but are run in parallel with the model itself. Therefore, manage this in a separate container with separate routes. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:19:27 -04:00
kingbri	f13d0fb8b3	Embeddings: Add model load checks Same as the normal model container. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:17:36 -04:00
kingbri	01c7702859	Signal: Fix async signal handling Run unload async functions before exiting the program. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:11:05 -04:00
kingbri	fbf1455db1	Embeddings: Migrate and organize Infinity Use Infinity as a separate backend and handle the model within the common module. This separates out the embeddings model from the endpoint which allows for model loading/unloading in core. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:00:23 -04:00
kingbri	3f21d9ef96	Embeddings: Switch to Infinity Infinity-emb is an async batching engine for embeddings. This is preferable to sentence-transformers since it handles scalable usecases without the need for external thread intervention. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-29 13:42:03 -04:00
kingbri	e8fc13a1f6	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:33:04 -04:00
kingbri	ea80b62e30	Sampling: Reorder aliased params and add kobold aliases Also add dynatemp range which is an alternative way of calculating min and max temp. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:32:33 -04:00
kingbri	7522b1447b	Model: Add support for HuggingFace config and bad_words_ids This is necessary for Kobold's API. Current models use bad_words_ids in generation_config.json, but for some reason, they're also present in the model's config.json. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 18:23:22 -04:00
kingbri	545e26608f	Kobold: Move params to aliases Some of the parameters the API provides are aliases for their OAI equivalents. It makes more sense to move them to the common file. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 16:46:54 -04:00
kingbri	4e808cbed7	Auth: Fix disable auth when checking for key permissions Since authentication is disabled, remove the limited permissions for requests. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 15:04:29 -04:00
kingbri	5c082b7e8c	Async: Add option to use Uvloop/Winloop These are faster event loops for asyncio which should improve overall performance. Gate these under an experimental flag for now to stress test these loops. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-24 18:59:20 -04:00
kingbri	71de3060bb	Downloader: Make timeout configurable Add an API parameter to set the timeout in seconds. Keep it to None by default for uninterrupted downloads. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 21:42:38 -04:00
kingbri	8c02fe9771	Downloader: Disable timeout This prevents TimeoutErrors from showing up. However, a longer timeout may be necessary since this is in the API. Turning it off for now will help resolve immediate errors. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 21:38:46 -04:00
kingbri	64c2cc85c9	OAI: Migrate model depends into proper file Use amongst multiple routers. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-23 13:59:56 -04:00
kingbri	14dfaf600a	Args: Add request logging Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:41:42 -04:00
kingbri	3826815edb	API: Add request logging Log all the parts of a request if the config flag is set. The logged fields are all server side anyways, so nothing is being exposed to clients. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:40:00 -04:00
kingbri	522999ebb4	Config: Change from gen_logging to logging More accurately reflects the config.yml's sections. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 21:15:16 -04:00
kingbri	15f891b277	Args: Update to latest config.yml Fix order of params to follow the same flow as config.yml Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 16:26:41 -04:00
kingbri	0eedc8ca14	API: Switch from request ID middleware to depends Middleware runs on both the request and response. Therefore, streaming responses had increased latency when processing tasks and sending data to the client which resulted in erratic streaming behavior. Use a depends to add request IDs since it only executes when the request is run rather than expecting the response to be sent as well. For the future, it would be best to think about limiting the time between each tick of chunk data to be safe. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-22 12:19:46 -04:00
kingbri	cae94b920c	API: Add ability to use request IDs Identify which request is being processed to help users disambiguate which logs correspond to which request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-21 21:01:05 -04:00
kingbri	38185a1ff4	Auth: Fix key check coalesce Prefer the auth-specific headers before the generic authorization header. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-19 10:08:57 -04:00
kingbri	e20a2d504b	API: Fix pydantic validation errors on disconnect poll returns Raise a 422 exception for the disconnect. This prevents pydantic errors when returning a "response" which doesn't contain anything in this case. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-15 14:41:49 -04:00
kingbri	6019c93637	Networking: Gate sending tracebacks over the API It's possible that tracebacks can give too much info about a system when sent over the API. Gate this under a flag to send them only when debugging since this feature is still useful. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-14 10:30:11 -04:00
kingbri	1f46a1130c	OAI: Restrict list permissions for API keys API keys are not allowed to view all the admin's models, templates, draft models, loras, etc. Basically anything that can be viewed on the filesystem outside of anything that's currently loaded is not allowed to be returned unless an admin key is present. This change helps preserve user privacy while not erroring out on list endpoints that the OAI spec requires. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	10890913b8	Auth: Revert x-admin-key allowance in API key check These kinda clash with each other. Use the correct header for the correct endpoint. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:50 -04:00
kingbri	b9a58ff01b	Auth: Make key permission check work on Requests Pass a request and internally unwrap the headers. In addition, allow X-admin-key to get checked in an API key request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-11 14:22:49 -04:00
kingbri	c7ce97f119	Tree: Ruff lint Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 15:06:28 -04:00
kingbri	6613e38436	Main: Make openapi export store locally This runs faster than always making a syscall to check if the env var is set. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 14:54:06 -04:00
kingbri	ae66e8f9ba	Ruff: Lint Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 13:44:12 -04:00
kingbri	b907421285	Main: Fix launch if EXPORT_OPENAPI is unset A default needs to be provided with getenv. Fix that with an empty string. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 13:41:44 -04:00
kingbri	933268f7e2	API: Integrate OpenAPI export script Move OpenAPI export as an env var within the main function. This allows for easy export by running main. In addition, an env variable provides global and explicit state to disable conditional wheel imports (ex. Exl2 and torch) which caused errors at first. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-08 12:34:32 -04:00
kingbri	27d2d5f3d2	Config + Model: Allow for default fallbacks from config for model loads Previously, the parameters under the "model" block in config.yml only handled the loading of a model on startup. This meant that any subsequent API request required each parameter to be filled out or use a sane default (usually defaults to the model's config.json). However, there are cases where admins may want an argument from the config to apply if the parameter isn't provided in the request body. To help alleviate this, add a mechanism that works like sampler overrides where users can specify a flag that acts as a fallback. Therefore, this change both preserves the source of truth of what parameters the admin is loading and adds some convenience for users that want customizable defaults for their requests. This behavior may change in the future, but I think it solves the issue for now. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-06 17:50:58 -04:00
turboderp	0eb8fa5d1e	[fix] Bring draft progress and model progress in sync with model loader (#125 ) * Bring draft progress and model progress in sync with model loader * Fix formatting	2024-06-03 19:41:02 +02:00
DocShotgun	7084081b1f	Tree: Lint	2024-05-26 18:27:30 -07:00
DocShotgun	ce5e2ec8de	Logging: Clarify new vs cached tokens in prompt processing	2024-05-26 18:21:17 -07:00

... 2 3 4 5 6

291 commits