Commit graph

  • f8070e7707 Model: Auto detect model backend from config * Use exllamav3 for exl3 models, exllamav2 otherwise DocShotgun 2025-05-06 18:51:58 -07:00
  • 9dcde59c57 Model: Check for unsupported cache mode in exllamav2 DocShotgun 2025-05-06 01:18:15 -07:00
  • bc0a84241a API: Patch kobold generation call kingbri 2025-05-05 22:11:21 -04:00
  • b683545d0e Config: Fix argparse help kingbri 2025-05-05 21:52:30 -04:00
  • ff38305145 Common: Fix exception f-string turboderp 2025-05-05 02:01:16 +02:00
  • 45b966363e Tree: Format DocShotgun 2025-05-03 21:01:03 -07:00
  • a635a719d7 Model: Enable draft model q-cache in Exl3 * Remove unneeded default fp16 cache layer import DocShotgun 2025-05-03 20:59:36 -07:00
  • 58e34ba4c5 Model: Exl3 cache quant settings lenient with whitespace DocShotgun 2025-05-01 23:05:41 -07:00
  • 68a660bdb3 Model: Initial Exl3 cache quantization support DocShotgun 2025-05-01 22:55:51 -07:00
  • 036af02bf6 Common: No default add_bos_token value for chat completion requests turboderp 2025-05-04 05:25:58 +02:00
  • 92ea7ee7cd Model: Add draft model/speculative decoding turboderp 2025-05-04 01:27:42 +02:00
  • 1db2cb99cb Model: Avoid initializing class variables turboderp 2025-05-04 01:26:42 +02:00
  • 0405a94a89 Model: Cast penalty range to int turboderp 2025-05-03 22:28:36 +02:00
  • 58c380b8ca Model: Create generator on load turboderp 2025-05-03 18:32:51 +02:00
  • 0d949d00b9 Model: Set default max_batch_size turboderp 2025-05-03 18:32:30 +02:00
  • 8c75b29923 Model: Fix some warnings turboderp 2025-05-03 18:31:14 +02:00
  • 15cc480cb0 Exl3: Simplify add_bos_token handling kingbri 2025-05-02 21:44:36 -04:00
  • d8a8ccfc2a Model: fix add_bos_token randoentity 2025-05-02 14:53:44 +02:00
  • 0d02af3c81 Model: Set model_dir on init kingbri 2025-05-02 00:26:40 -04:00
  • c89bea030e Model: Add template fetching to Exl3 kingbri 2025-05-02 00:22:34 -04:00
  • e8f00412f6 Model: Fetch from generation_config and tokenizer_config in Exl3 kingbri 2025-05-02 00:16:11 -04:00
  • 59d081fe83 Common: Add hardware file kingbri 2025-05-01 22:39:32 -04:00
  • eca403a0e4 Model: Add Exllamav3 sampler kingbri 2025-05-01 18:23:45 -04:00
  • bdc5189a4b Exl3: Add chunk size, cache size, and model info kingbri 2025-04-30 23:58:27 -04:00
  • 303e2dde12 Model: Correct exl3 generation, add concurrency, and cleanup kingbri 2025-04-30 22:59:25 -04:00
  • c744790f14 fixup: add sampler logs randoentity 2025-04-30 13:14:34 +02:00
  • b35c48da37 fixup: some metrics randoentity 2025-04-30 11:56:24 +02:00
  • c0f268f33e fixup: autosplit, start work on metrics randoentity 2025-04-30 11:10:03 +02:00
  • 306fc7cd15 fixup: autosplit reserve randoentity 2025-04-30 09:43:33 +02:00
  • acb3adb953 fixup: auto split randoentity 2025-04-30 08:43:26 +02:00
  • 14fb573371 fixup: max_seq_len randoentity 2025-04-30 00:23:25 +02:00
  • daae9ec43d Exl3: Couldn't wait randoentity 2025-04-29 23:57:53 +02:00
  • b4ff2f23cf Exl3: Add token encode, decode, and special token fetch kingbri 2025-05-02 21:32:53 -04:00
  • 0c1d794390 Model: Add exl3 and associated load functions kingbri 2025-04-28 23:54:55 -04:00
  • 7c6a053747 Model: Add option to select backend kingbri 2025-04-27 22:27:26 -04:00
  • 242f6b7d2a Model: Simplify add_bos_token handling kingbri 2025-05-02 21:30:18 -04:00
  • 4cb3e5d5b1 Tree: Format kingbri 2025-05-02 00:23:15 -04:00
  • 47cb2a0de9 Model: Add TokenizerConfig stub and add_eos_token fallback kingbri 2025-05-02 00:08:01 -04:00
  • aa657fa6e9 API: Ignore add_bos_token in chat completions kingbri 2025-05-01 22:51:15 -04:00
  • 3960612d38 API: Format and fix message naming kingbri 2025-04-28 22:36:30 -04:00
  • 9157be3e34 API: Append task index to generations with n > 1 kingbri 2025-04-28 22:29:48 -04:00
  • b43f0983c8 Model: Fix max_seq_len fallbacks kingbri 2025-04-28 14:07:32 -04:00
  • 755f98a338 Docker: Move to venv for running kingbri 2025-04-27 00:38:07 -04:00
  • f70eb11db3 Docker: Use python 3.12 kingbri 2025-04-27 00:24:32 -04:00
  • 09ddfa8ffb Docker: Update to Cuda 12.8 and Ubuntu 24.04 kingbri 2025-04-26 21:29:36 -04:00
  • 2b3ed3fc79 Dependencies: Switch back to official exl2 wheels kingbri 2025-04-26 21:27:28 -04:00
  • b081aa9fa3 Merge pull request #322 from theroyallab/model-rewrite Brian 2025-04-26 02:15:48 -04:00
  • 3649d3bb51 Tree: Format + Lint kingbri 2025-04-26 02:14:30 -04:00
  • eb435f79e3 Dependencies (TEMP): Use my wheels for exl2 kingbri 2025-04-26 02:11:33 -04:00
  • f4757d31bd Model: Raise a 503 exception with model checks kingbri 2025-04-25 00:00:15 -04:00
  • 136c8139f9 Dependencies: Update PyTorch, Exllamav2, and FA2 kingbri 2025-04-24 21:52:48 -04:00
  • f070587e9f Model: Add proper jobs cleanup and fix var calls kingbri 2025-04-24 21:30:55 -04:00
  • 7e007f0761 Model: Handle finish chunks and logprobs in separate functions kingbri 2025-04-24 21:19:03 -04:00
  • bc1bef3324 Fix logs path David Allada 2025-04-22 21:14:45 -04:00
  • f2c7da2faf Tree: Format kingbri 2025-04-21 23:16:14 -04:00
  • 3f09fcd8c9 Model: Make model params return a model card kingbri 2025-04-21 23:15:46 -04:00
  • 9834c7f99b Dependencies: Ungate numpy kingbri 2025-04-21 23:14:14 -04:00
  • d26260b332 Model: Add fixes for kwargs and add note for migration kingbri 2025-04-21 22:39:07 -04:00
  • 93854a3107 Merge pull request #320 from Vhallo/model-rewrite Brian 2025-04-21 10:55:55 -04:00
  • 1aefa01a68 Fix RoPE Ratio Vhallo 2025-04-21 01:46:18 +02:00
  • 13beef8021 Model: Move find_template function to templating kingbri 2025-04-20 18:27:53 -04:00
  • 8e238fa8f6 Model: Move calculate_rope_alpha from backend kingbri 2025-04-20 18:20:19 -04:00
  • 027ffce05d Utils: Remove unused defer utils kingbri 2025-04-20 17:59:09 -04:00
  • b751e0a1d5 Model: Move inline overrides to common kingbri 2025-04-20 17:51:57 -04:00
  • 034682fcf1 Backends: Add base model container kingbri 2025-04-20 17:24:10 -04:00
  • f15ac1f69d Model: Reject model requests when unloading kingbri 2025-04-19 22:34:06 -04:00
  • 552a64c723 Model: Have load take the highest priority kingbri 2025-04-18 22:08:48 -04:00
  • 3f1d5d396e Model: Store active jobs in tabby kingbri 2025-04-16 13:17:55 -04:00
  • 1afc9b983e Model: Remove generate_window kingbri 2025-04-16 12:58:23 -04:00
  • 2f5235e1a3 Model: Extract settings creation to a separate function kingbri 2025-04-16 12:57:27 -04:00
  • 5697204e47 Merge branch 'main' into model-rewrite kingbri 2025-04-16 02:15:46 -04:00
  • 6bb5f8f599 Sampling: Rewrite mirostat_mode parameter kingbri 2025-04-16 02:13:55 -04:00
  • 3084ef9fa1 Model + API: Migrate to use BaseSamplerParams kingbri 2025-04-16 00:50:05 -04:00
  • dcb36e9ab2 Model: Remove extra unwraps kingbri 2025-04-15 23:38:46 -04:00
  • 11ed3cf5ee Model: Cleanup logging and remove extraneous declarations kingbri 2025-04-15 23:31:12 -04:00
  • 436ce752da Support more common tool variables in templates (tools, message.tool_calls) (#308) Andrew Phillips 2025-03-23 14:23:00 -03:00
  • d31d17e5a2 Trigger ruff formatting David Allada 2025-03-23 17:04:09 +00:00
  • bcd3413628 Try to fix ruff format David Allada 2025-03-23 17:02:52 +00:00
  • 0256d3b2a2 Fix the comment from 10MB to 20MB David Allada 2025-03-23 16:51:47 +00:00
  • 6750c291db Add file based logging in addition to the normal console logs David Allada 2025-03-23 16:49:58 +00:00
  • ccf23243c1 Docs: Update getting started with downloading from private repos kingbri 2025-03-19 12:02:48 -04:00
  • 529c90b93e Tree: Format and lint kingbri 2025-03-19 11:55:02 -04:00
  • d990bbc431 Args: Remove action arguments kingbri 2025-03-19 11:53:47 -04:00
  • 79f9c6e854 Model: Remove num_experts_per_token kingbri 2025-03-19 11:52:10 -04:00
  • 698d8339cb Config + Docs: Clarify YaRN rope scaling changes kingbri 2025-03-19 11:47:49 -04:00
  • a20abe2d33 Bugfix: Chat completion requests fail with UnboundLocalError: finish_reason variable not initialized (#307) Benjamin Oldenburg 2025-03-16 07:31:21 +07:00
  • d98c0bd3f6 API: Add tools class kingbri 2025-03-14 15:06:03 -04:00
  • 51b32621e1 Update README.md Brian 2025-03-14 15:04:24 -04:00
  • a2a14ea114 Fix Tool Call JSON Serialization Error (#302) Benjamin Oldenburg 2025-03-15 02:01:33 +07:00
  • de77955428 Docs: Update kingbri 2025-03-12 00:40:15 -04:00
  • 4196bb6bc8 Update the behavior of start.py so that we can do a full build AND sa… (#293) David Allada 2025-03-11 23:54:34 -04:00
  • 73688670a6 Docs: Add model and inline loading documentation kingbri 2025-02-25 00:09:18 -05:00
  • 35fe372f2b Embeddings: Handle case if embedding input is passed as a string kingbri 2025-02-23 00:39:21 -05:00
  • c580893054 Downloader: log errors when downloading kingbri 2025-02-19 23:16:17 -05:00
  • 48bb78c614 Logger: Switch to ISO timestamp formatting kingbri 2025-02-19 21:48:23 -05:00
  • d6b8c7db4b Docs: Update getting started guide kingbri 2025-02-18 12:14:52 -05:00
  • 830301b2b4 Actions: Update and add Wiki publish kingbri 2025-02-17 23:47:38 -05:00
  • 5614b342a7 Tree: Migrate docs into repository kingbri 2025-02-17 23:39:35 -05:00
  • 9f649647f0 Model + API: GPU split updates and fixes kingbri 2025-02-15 21:50:14 -05:00
  • 304df16543 Update README.md Brian 2025-02-15 12:14:06 -05:00