jalr/tabbyAPI-ollama

Author	SHA1	Message	Date
Andrew Phillips	436ce752da	Support more common tool variables in templates (tools, message.tool_calls) (#308) * Add non-JSON version of `tools` and `functions` to `template_vars`. Increase the compatibility with VLLM templates which use a non-JSON tools object. * Add list of tool template variables to the documentation * Use Jinja templates to provide `tools_json` and `functions_json` This should be functionally equivalent, but the JSON won't be produced unless it's needed. * Make message.tool_calls match the JSON from ToolCallProcessor * Log something when generating tool calls * Add template for Qwen QwQ 32b * Only log if tool calls have been detected * API: Fix tool call variable assignments Jinja functions do not run when variables are called. Use json.dumps instead. In addition, log the request ID when stating that a tool call was fired. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> * Add `ToolCallProcessor.dump()` to get the list of processed dicts * Remove qwen_qwq_32b.jinja This will be added to the following repository at a later date: https://github.com/theroyallab/llm-prompt-templates --------- Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-23 13:23:00 -04:00
David Allada	d31d17e5a2	Trigger ruff formatting	2025-03-23 17:04:09 +00:00
David Allada	bcd3413628	Try to fix ruff format	2025-03-23 17:02:52 +00:00
David Allada	0256d3b2a2	Fix the comment from 10MB to 20MB	2025-03-23 16:51:47 +00:00
David Allada	6750c291db	Add file based logging in addition to the normal console logs	2025-03-23 16:49:58 +00:00
kingbri	ccf23243c1	Docs: Update getting started with downloading from private repos Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-19 12:02:48 -04:00
kingbri	529c90b93e	Tree: Format and lint Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-19 11:55:02 -04:00
kingbri	d990bbc431	Args: Remove action arguments Superseded by subcommands to perform the same action. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-19 11:53:47 -04:00
kingbri	79f9c6e854	Model: Remove num_experts_per_token This shouldn't even be an exposed option since changing it always breaks inference with the model. Let the model's config.json handle it. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-19 11:52:10 -04:00
kingbri	698d8339cb	Config + Docs: Clarify YaRN rope scaling changes In ExllamaV2, if a model has YaRN support, linear RoPE options are not applied. Users can set max_seq_len and exl2 will take care of the rest. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-19 11:47:49 -04:00
Benjamin Oldenburg	a20abe2d33	Bugfix: Chat completion requests fail with UnboundLocalError: finish_reason variable not initialized (#307) * fix issue #306 * removed whitespaces for ruff	2025-03-15 20:31:21 -04:00
kingbri	d98c0bd3f6	API: Add tools class Was mistakenly not added in PR 302. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-14 15:07:11 -04:00
Brian	51b32621e1	Update README.md	2025-03-14 15:04:24 -04:00
Benjamin Oldenburg	a2a14ea114	Fix Tool Call JSON Serialization Error (#302) * Fix Tool Call JSON Serialization Error * Incorporate changes from PR 292 kingbri note: Adjusts the tool JSON formation and incorporates finish reasons. Added both authors as co-authors due to edits on this commit from the original PR. Co-Authored-by: David Allada <dallada1@vt.edu> Co-Authored-by: Benjamin Oldenburg <benjamin.oldenburg@ordis.co.th> Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> * API: Cleanup tool call JSON parsing Split pre and post-processing of tool calls to its own class. This cleans up the chat_completion utility module and also fixes the JSON serialization bug. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> --------- Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> Co-authored-by: David Allada <dallada1@vt.edu> Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-14 15:01:33 -04:00
kingbri	de77955428	Docs: Update getting started and server options Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-12 00:41:17 -04:00
David Allada	4196bb6bc8	Update the behavior of start.py so that we can do a full build AND sa… (#293 ) * Update the behavior of start.py so that we can do a full build AND save the options, so we can build in a docker image * Add actual args RIP * Start: Move start_options write before dependency install message This ensures that start options are properly written before determining to exit. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> --------- Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com> Co-authored-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-03-11 23:54:34 -04:00
kingbri	73688670a6	Docs: Add model and inline loading documentation Sorely required due to the amount of questions about how does inline loading work. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-25 00:09:18 -05:00
kingbri	35fe372f2b	Embeddings: Handle case if embedding input is passed as a string Infinity expects a list when embedding, so convert to a list if the input is a string. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-23 00:39:21 -05:00
kingbri	c580893054	Downloader: log errors when downloading If an error is returned from HuggingFace, raise it to the calling function. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-19 23:16:17 -05:00
kingbri	48bb78c614	Logger: Switch to ISO timestamp formatting I thought this was previously enabled, but turns out I labeled with the wrong date format. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-19 21:48:23 -05:00
kingbri	d6b8c7db4b	Docs: Update getting started guide Add downloader options and edit some points. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-18 12:17:14 -05:00
kingbri	830301b2b4	Actions: Update and add Wiki publish Publishes the github wiki and runs these in concurrency groups to avoid spawning multiple actions at a time. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-17 23:47:38 -05:00
kingbri	5614b342a7	Tree: Migrate docs into repository This will auto-publish to the Github wiki via an action. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-17 23:39:35 -05:00
kingbri	9f649647f0	Model + API: GPU split updates and fixes For the TP loader, GPU split cannot be an empty array. However, defaulting the parameter to an empty array makes it easier to calculate the device list. Therefore, cast an empty array to None using falsy comparisons at load time. Also add draft_gpu_split to the load request. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-15 21:50:14 -05:00
Brian	304df16543	Update README.md	2025-02-15 12:14:06 -05:00
Brian	ba9fae808e	Merge pull request #281 from mefich/main Add strftime_now to Jinja2 to use with Granite3 models	2025-02-13 22:52:51 -05:00
kingbri	7f6294a96d	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-13 22:42:59 -05:00
mefich	3b482d80e4	Add strftime_now to Jinja2 to use with Granite3 models Granite3 default template uses strftime_now function. Currently Jinja2 raises an exception because strftime_now is undefined and /v1/chat/completions endpoint doesn't work with these models when a template from the model metadata is used.	2025-02-13 18:08:24 +02:00
Brian	2e491472d1	Merge pull request #254 from lucyknada/main add draft_gpu_split option for spec decoding	2025-02-11 16:48:03 -05:00
kingbri	e290b88568	Args: Expose api-servers to subcommands This is required for the export-openapi action. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-10 23:39:46 -05:00
kingbri	153dac496c	Args: Fix imports and handling of export openapi The api-servers arg is passed when running subcommands, so use that instead of replicating the arg again. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-10 23:19:44 -05:00
kingbri	30ab8e04b9	Args: Add subcommands to run actions Migrate OpenAPI and sample config export to subcommands "export-openapi" and "export-config". Also add a "download" subcommand that passes args to the TabbyAPI downloader. This allows models to be downloaded via the API and CLI args. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-10 23:14:22 -05:00
kingbri	30f02e5453	Main: Remove uvloop/winloop from experimental status Uvloop/Winloop does provide advantages to asyncio vs the standard Proactor loop, so remove experimental status. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-10 21:30:48 -05:00
kingbri	0dcbb7a722	Dependencies: Update torch, exllamav2, and flash-attn Torch - 2.6.0 ExllamaV2 - 0.2.8 Flash-attn - 2.7.4.post1 Cuda wheels are now 12.4 instead of 12.1, feature names need to be migrated over. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-09 01:27:48 -05:00
kingbri	beb6d8faa5	Model: Adjust draft_gpu_split and add to config The previous code overrode the existing gpu split and device idx values. This now sets an independent draft_gpu_split value and adjusts the gpu_devices check only if the draft_gpu_split array is larger than the gpu_split array. Draft gpu split is not Tensor Parallel, and defaults to gpu_split_auto if a split is not provided. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-08 16:09:46 -05:00
kingbri	bd8256d168	Merge branch 'main' into draft-split	2025-02-08 15:10:44 -05:00
kingbri	dcbf2de9e5	Logger: Add timestamps Was against this for a while due to the length of timestamps clogging the console, but it makes sense to know when something goes wrong. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-07 18:40:28 -05:00
kingbri	54fda0dc09	Tree: Format Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-07 18:03:33 -05:00
kingbri	96e8375ec8	Multimodal: Fix memory leak with MMEmbeddings On a basic python class, class attributes are handled by reference, meaning that every instance of embeddings would attach to that reference and allocate more memory. Switch to a Pydantic class and factory methods when instantiating. Signed-off-by: kingbri <8082010+kingbri1@users.noreply.github.com>	2025-02-02 12:21:19 -05:00
kingbri	bd16681825	Start: Mark cuda 11.8 as unsupported Temporary until existing cuda 11.8 scripts can be migrated to cuda 12. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2025-01-12 21:50:41 -05:00
Brian	566e5b5937	Merge pull request #271 from lifo9/bump-formatron Bump formatron to `0.4.11`	2025-01-07 23:19:35 -05:00
Jakub Filo	f8d9cfb5fd	Bump formatron to 0.4.11	2025-01-08 00:48:25 +01:00
kingbri	cfb439c0e6	Dependencies: Update exllamav2 and pytorch for ROCm Exllama v0.2.7, pytorch v2.5.1 across all cards. AMD now requires ROCm 6.2 Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2025-01-01 16:22:10 -05:00
kingbri	6da65a8fd3	Embeddings: Fix base64 return A base64 embedding can be a string post-encoding. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2025-01-01 16:15:12 -05:00
kingbri	245bd5c008	Templates: Alter chatml_with_headers to fit huggingface spec The previous template was compatible with Jinja2 in Python, but it was not cross-platform compatible according to HF's standards. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-30 14:00:44 -05:00
Brian	709493837b	Merge pull request #264 from DocShotgun/robust-length-checking Robust request length checking in generator	2024-12-26 23:37:53 -05:00
kingbri	b994aae995	Model: Cleanup generation length and page checks Reduce the amount of if statements and combine parts of code. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-26 23:13:08 -05:00
kingbri	ba2579ff74	Merge branch 'main' into robust-length-checks	2024-12-26 18:00:26 -05:00
kingbri	7878d351a7	Endpoints: Add props endpoint and add more values to model params The props endpoint is a standard used by llamacpp APIs which returns various properties of a model to a server. It's still recommended to use /v1/model to get all the parameters a TabbyAPI model has. Also include the contents of a prompt template when fetching the current model. Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-26 17:32:19 -05:00
kingbri	fa8035ef72	Dependencies: Update sse-starlette and formatron Also pin newer versions of dependencies and fix an import from sse-starlette Signed-off-by: kingbri <8082010+bdashore3@users.noreply.github.com>	2024-12-21 23:14:55 -05:00
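
Note on commit 96e8375ec8 (Multimodal: Fix memory leak with MMEmbeddings): the message attributes the leak to class attributes being handled by reference. The sketch below illustrates that general Python behaviour only; it is not the TabbyAPI code. The class names `SharedEmbeddings` and `IsolatedEmbeddings` and the `content` field are hypothetical, and a standard-library dataclass with `default_factory` stands in for the Pydantic class and factory methods the commit mentions.

```python
from dataclasses import dataclass, field


class SharedEmbeddings:
    # Hypothetical class: a mutable class attribute is a single list object
    # that every instance reaches through the same reference.
    content = []

    def add(self, item):
        # Mutates the shared class-level list, not a per-instance copy.
        self.content.append(item)


@dataclass
class IsolatedEmbeddings:
    # Hypothetical class: default_factory builds a fresh list per instance
    # (stand-in for the Pydantic factory approach named in the commit).
    content: list = field(default_factory=list)

    def add(self, item):
        self.content.append(item)


# Two instances of the plain class end up sharing accumulated data.
a, b = SharedEmbeddings(), SharedEmbeddings()
a.add("image-1")

# Two instances of the factory-backed class stay independent.
c, d = IsolatedEmbeddings(), IsolatedEmbeddings()
c.add("image-1")
```

After this runs, `b.content` holds the item appended through `a`, while `d.content` is still empty. With the shared list, data from every earlier instance stays reachable for as long as the class exists, which matches the growth pattern the commit message describes.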


1051 commits