jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
Amgad Hasan	dae394050e	Improve docker deployment configuration (#163)	2024-08-18 15:19:18 -04:00
kingbri	a51acb9db4	Templates: Switch to async jinja engine This prevents any possible blocking of the event loop due to template rendering. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 12:03:41 -04:00
kingbri	b4752c1e62	Templates: Revert to load metadata on runtime Metadata is generated via a template's module. This requires a single iteration through the template. If a template tries to access a passed variable that doesn't exist, it will error. Therefore, generate the metadata at runtime to prevent these errors from happening. To optimize further, cache the metadata after the first generation to prevent the expensive call of making a template module. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 11:44:42 -04:00
kingbri	617ac12150	Update README Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 00:35:42 -04:00
Ben Gitter	70b9fc95de	[WIP] OpenAI Tools Support/Function calling (#154) * returning stop str if exists from gen * added chat template for firefunctionv2 * pulling tool vars from template * adding parsing for tool inputs/outputs * passing tool data from endpoint to chat template, adding tool_start to the stop list * loosened typing on the response tool call, leaning more on the user supplying a quality schema if they want a particular format * non streaming generation prototype * cleaning template * Continued work with type, ingestion into template, and chat template for fire func * Correction - streaming toolcall comes back as delta obj not inside chatcomprespchoice per chat_completion_chunk.py inside OAI lib. * Ruff Formatting * Moved stop string and tool updates out of prompt creation func Updated tool pydantic to match OAI Support for streaming Updated generate tool calls to use flag within chat_template and insert tool reminder * Llama 3.1 chat templates Updated fire func template * renamed llama3.1 to chatml_with_headers.. * update name of template * Support for calling a tool start token rather than the string. Simplified tool_params Warning when gen_settings are being overridden because user set temp to 0 Corrected schema and tools to correct types for function args. Str for some reason * draft groq tool use model template * changed headers to vars for readability (but mostly because some models are weird about newlines after headers, so this is an easier way to change globally) * Clean up comments and code in chat comp * Post processed tool call to meet OAI spec rather than forcing model to write json in a string in the middle of the call. * changes example back to args as json rather than string of json * Standardize chat templates to each other * cleaning/rewording * stop elements can also be ints (tokens) * Cleaning/formatting * added special tokens for tools and tool_response as specified in description * Cleaning * removing aux templates - going to live in llm-promp-templates repo instead * Tree: Format Signed-off-by: kingbri <bdashore3@proton.me> * Chat Completions: Don't include internal tool variables in OpenAPI Use SkipJsonSchema to suppress inclusion with the OpenAPI JSON. The location of these variables may need to be changed in the future. Signed-off-by: kingbri <bdashore3@proton.me> * Templates: Deserialize metadata on template load Since we're only looking for specific template variables that are static in the template, it makes more sense to render when the template is initialized. Signed-off-by: kingbri <bdashore3@proton.me> * Tools: Fix comments Adhere to the format style of comments in the rest of the project. Signed-off-by: kingbri <bdashore3@proton.me> --------- Co-authored-by: Ben Gitter <gitterbd@gmail.com> Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-17 00:16:25 -04:00
kingbri	9cc0e70098	Actions: Build kobold docs subpage Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-08 16:40:50 -04:00
kingbri	685e3836e9	Args: Add api-servers to parser Also run OpenAPI export after args/config are parsed. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-08 16:32:29 -04:00
kingbri	63650d2c3c	Model: Disable banned strings if grammar is used ExllamaV2 filters don't allow for rewinding which is what banned strings uses. Therefore, constrained generation via LMFE or outlines is not compatible for now. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-05 11:08:58 -04:00
kingbri	34281c2e14	Start: Add --force-reinstall argument Forces a reinstall of dependencies in the event that one is corrupted or broken. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-04 11:14:38 -04:00
kingbri	ab6c3a53b9	Start: Remove eager upgrade strategy This will upgrade second-level pinned dependencies to their latest versions which is not ideal. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-04 10:50:57 -04:00
kingbri	8ff2586d45	Start: Fix pip update, method calls, and logging platform.system() was not called in some places, breaking the ternary on Windows. Pip's --upgrade flag does not actually update dependencies to their latest versions. That's what the --upgrade-strategy eager flag is for. Tell the user where their start preferences are coming from. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-04 10:30:26 -04:00
kingbri	6a0cfd731b	Main: Only import psutil when the experimental function is run Experimental options shouldn't be imported at the top level until the testing period is over. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 22:00:15 -04:00
kingbri	b6d2676f1c	Start: Give the user a hint when a module can't be imported If an ImportError or ModuleNotFoundError is raised, tell the user to run the update scripts. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 21:59:06 -04:00
kingbri	1aa934664c	Issues: Update issue templates Use forms instead of markdown templates. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 21:59:02 -04:00
kingbri	87b6a31fad	Update .gitignore Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 20:59:28 -04:00
kingbri	4868fc6b10	Update README Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 20:58:26 -04:00
kingbri	5fb9cdc2b1	Dependencies: Add Python 3.12 specific dependencies Install a prebuilt fastparquet wheel for Windows and add setuptools since torch may require it for some reason. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 17:43:14 -04:00
kingbri	2a33ebbf29	Model: Bypass lock checks when shutting down Previously, when a SIGINT was emitted and a model load is running, the API didn't shut down until the load finished due to waiting for the lock. However, when shutting down, the lock doesn't matter since the process is being killed anyway. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 16:05:34 -04:00
Brian Dashore	65c16f2a7c	Merge pull request #161 from theroyallab/new-start-scripts Fix pip index bandwidth costs and update start scripts	2024-08-03 15:21:02 -04:00
kingbri	8703b23f89	Start: Make linux scripts executable Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 15:19:31 -04:00
kingbri	b795bfc7b2	Start: Split some prints up Newlines can be helpful at times. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 15:14:40 -04:00
kingbri	65e758e134	Tree: Format Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 15:08:24 -04:00
kingbri	7ce46cc2da	Start: Rewrite start scripts Start scripts now don't update dependencies by default due to mishandling caches from pip. Also add dedicated update scripts and save options to a JSON file instead of a text one. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 13:03:24 -04:00
kingbri	e66d213aef	Revert "Dependencies: Use hosted pip index instead of Github" This reverts commit `f111052e39`. This was a bad idea since the netlify server has limited bandwidth. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-03 11:35:26 -04:00
kingbri	7bf2b07d4c	Signals: Exit on async cleanup The async signal exit function should be the internal for exiting the program. In addition, prevent the handler from being called twice by adding a boolean. May become an asyncio event later on. In addition, make sure to skip_wait when running model.unload. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-02 15:11:57 -04:00
kingbri	b124797949	Dependencies: Re-add sentence-transformers This is actually required for infinity to load a model. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-02 14:35:58 -04:00
kingbri	56619810bf	Dependencies: Switch sentence-transformers to infinity-emb Leftover before the transition. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-02 13:34:47 -04:00
kingbri	3e42211c3e	Config: Embeddings: Make embeddings_device a default when API loading When loading from the API, the fallback for embeddings_device will be the same as the config. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 13:59:49 -04:00
kingbri	54aeebaec1	API: Fix return of current embeddings model Return a ModelCard instead of a ModelList. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 13:43:31 -04:00
kingbri	0bcb4e4a7d	Model: Attach request ID to logs If multiple logs come in at once, track which log corresponds to which request. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 00:25:54 -04:00
kingbri	9390d362dd	Model: Log generation params and metrics after the prompt/response A user's prompt and response can be large in the console. Therefore, always log the smaller payloads (ex. gen params + metrics) after the large chunks. However, it's recommended to keep prompt logging off anyways since it'll result in console spam. Signed-off-by: kingbri <bdashore3@proton.me>	2024-08-01 00:19:21 -04:00
Brian Dashore	1bf062559d	Merge pull request #158 from AlpinDale/embeddings feat: add embeddings support via Infinity-emb	2024-07-31 20:33:12 -04:00
kingbri	f111052e39	Dependencies: Use hosted pip index instead of Github Installing directly from github causes pip's HTTP cache to not recognize that the correct version of a package is already installed. This causes a redownload. When using the Start.bat script, it updates dependencies automatically to keep users on the latest versions of a package for security reasons. A simple pip cache website helps alleviate this problem and allows pip to find the cached wheels when invoked with an upgrade argument. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 20:46:37 -04:00
kingbri	46304ce875	Model: Properly pass in max_batch_size from config The override wasn't being passed in before. Also, the default is now none since Exl2 can automatically calculate the max batch size. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 18:42:25 -04:00
kingbri	dc3dcc9c0d	Embeddings: Update config, args, and parameter names Use embeddings_device as the parameter for device to remove ambiguity. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:32:26 -04:00
kingbri	bfa011e0ce	Embeddings: Add model management Embedding models are managed on a separate backend, but are run in parallel with the model itself. Therefore, manage this in a separate container with separate routes. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 15:19:27 -04:00
kingbri	f13d0fb8b3	Embeddings: Add model load checks Same as the normal model container. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:17:36 -04:00
kingbri	01c7702859	Signal: Fix async signal handling Run unload async functions before exiting the program. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:11:05 -04:00
kingbri	fbf1455db1	Embeddings: Migrate and organize Infinity Use Infinity as a separate backend and handle the model within the common module. This separates out the embeddings model from the endpoint which allows for model loading/unloading in core. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-30 11:00:23 -04:00
kingbri	ac1afcc588	Embeddings: Use response classes instead of dicts Follows the existing code style. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-29 14:15:40 -04:00
kingbri	3f21d9ef96	Embeddings: Switch to Infinity Infinity-emb is an async batching engine for embeddings. This is preferable to sentence-transformers since it handles scalable usecases without the need for external thread intervention. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-29 13:42:03 -04:00
kingbri	c9a5d2c363	OAI: Refactor embeddings Move files and rewrite routes to adhere to Tabby's code style. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 14:10:51 -04:00
kingbri	d85414738d	Dependencies: Update Flash Attention 2 v2.6.3 with torch 2.3 wheels. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 13:50:15 -04:00
kingbri	c79e0832d5	Revert "Dependencies: Update pytorch and flash_attention" This reverts commit `f47d96790c`. See https://github.com/pytorch/pytorch/issues/131662 for more information. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-28 13:49:04 -04:00
kingbri	7b8b3fe23d	Kobold: Fix max length type Was mistakenly a string instead of an integer. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 23:00:26 -04:00
kingbri	e3226ed930	Kobold: Add untracked file Model types weren't added. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:57:55 -04:00
kingbri	3038f668e8	Kobold: Add extra routes for horde compatibility Needed to connect to horde. Also do some reordering to clean the router file up. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:55:54 -04:00
kingbri	2773517a16	API: Add setup function to routers This helps prepare the router before exposing it to the parent app. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 22:24:33 -04:00
Brian Dashore	6365427d38	Merge pull request #155 from Vhallo/main Simple Typo Fix	2024-07-26 21:35:50 -04:00
kingbri	884b6f5ecd	API: Add log options for initialization Make each API log their respective URLs to help inform users. Signed-off-by: kingbri <bdashore3@proton.me>	2024-07-26 21:32:05 -04:00

641 commits