Today's AI Summary

AI Developments: Enhanced Pronoun Handling, Synthetic Content Forensics, and More

This week's AI landscape features advancements in language models, synthetic content detection, and medical imaging, among other areas. Here's a breakdown of the key developments:

Noteworthy Research Papers

  • Improved Pronoun Handling: The paper "Do They Understand Them? An Updated Evaluation on Nonbinary Pronoun Handling in Large Language Models" introduces MISGENDERED+, a benchmark for evaluating LLMs' pronoun fidelity. Results show improvements in binary and gender-neutral pronoun accuracy in models like GPT-4o, Claude 4, and Qwen2.5, but inconsistencies persist with neopronouns.
  • Universal Synthetic Content Forensics: "Unraveling Hidden Representations: A Multi-Modal Layer Analysis for Better Synthetic Content Forensics" explores using large pre-trained multi-modal models for detecting generative content. The research demonstrates that linear classifiers trained on the latent code of these models achieve state-of-the-art results across various modalities, particularly in audio and images.
  • Audio-Driven Video Generation: "SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation" introduces a framework that exploits spatial auditory cues to generate videos with high semantic and spatial correspondence. The method uses a state-of-the-art MLLM to construct Video Scene Layouts (VSLs) from audio, which then guide a diffusion model for video generation.
  • Sample-Aware Test-Time Adaptation: "Sample-Aware Test-Time Adaptation for Medical Image-to-Image Translation" proposes a novel Test-Time Adaptation (TTA) framework that dynamically adjusts the translation process based on the characteristics of each test sample. The method introduces a Reconstruction Module to quantify the domain shift and a Dynamic Adaptation Block that selectively modifies the internal features of a pretrained translation model to mitigate the shift.
  • Robust Chinese Hate Speech Detection: "MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection under Cloaking Perturbations" introduces a novel BERT-based multimodal framework that integrates textual, speech, and visual modalities through a Mixture-of-Experts (MoE) architecture. Empirical results on several Chinese hate speech datasets show that MMBERT significantly surpasses fine-tuned BERT-based encoder models, fine-tuned LLMs, and LLMs utilizing in-context learning approaches.
  • Uncertainty Quantification and OOD Detection: "A Simple and Effective Method for Uncertainty Quantification and OOD Detection" proposes a method based on feature-space density to quantify uncertainty under distributional shift and for out-of-distribution (OOD) detection. The results demonstrate that the proposed method outperforms baseline models.
  • Airbnb Search Ranking: "Harnessing the Power of Interleaving and Counterfactual Evaluation for Airbnb Search Ranking" presents interleaving and counterfactual evaluation methods that enable rapid online assessments to identify the most promising candidates for A/B tests.
  • Biometric Verification in Avatar Videos: "Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos" explores the challenge of biometric verification in avatar-mediated scenarios. Experimental results demonstrate that facial motion cues enable meaningful identity verification with AUC values approaching 80%.
  • Agentic LLMs for Radiology QA: "Agentic large language models improve retrieval-based radiology question answering" proposes an agentic RAG framework enabling LLMs to autonomously decompose radiology questions, iteratively retrieve targeted clinical evidence, and dynamically synthesize evidence-based responses.
  • Out-of-Context Abduction: "Out-of-Context Abduction: LLMs Make Inferences About Procedural Data Leveraging Declarative Facts in Earlier Training Data" designs experiments to study out-of-context abduction in LLMs: the ability to infer the most plausible explanations for observations using relevant facts present in training data.
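
The forensics result summarized above (linear classifiers trained on the latent codes of frozen pre-trained models) is simple enough to sketch. The snippet below trains a logistic-regression probe on toy Gaussian "real" vs. "generated" embeddings; the Gaussians and the `fit_linear_probe` helper are illustrative stand-ins, not code or data from the paper.

```python
import numpy as np

def fit_linear_probe(feats, labels, lr=0.1, epochs=200):
    """Train a logistic-regression probe (w, b) on frozen features."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(epochs):
        z = feats @ w + b
        p = 1.0 / (1.0 + np.exp(-z))       # sigmoid
        grad = p - labels                   # dL/dz for binary cross-entropy
        w -= lr * feats.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

def predict(feats, w, b):
    return (feats @ w + b > 0).astype(int)

# Toy stand-in: "real" vs "generated" embeddings drawn from shifted Gaussians.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 16))
fake = rng.normal(0.8, 1.0, size=(200, 16))
X = np.vstack([real, fake])
y = np.array([0] * 200 + [1] * 200)

w, b = fit_linear_probe(X, y)
acc = (predict(X, w, b) == y).mean()
```

In practice the features would come from an intermediate layer of a frozen multi-modal encoder; the probe itself stays this small, which is what makes the approach cheap to apply across modalities.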

Model Releases

  • lightx2v/Wan2.2-Lightning: A new model distributed as safetensors weights.
  • Intel/Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks-mixed-AutoRound: A GGUF format model for code generation, quantized using Intel's auto-round algorithm. It's based on Qwen/Qwen3-Coder-30B-A3B-Instruct and fine-tuned on the codeparrot/github-code-clean dataset.
  • DFloat11/Qwen-Image-DF11: A DFloat11 losslessly compressed version of the Qwen/Qwen-Image model, reducing model size by 32% while maintaining bit-identical outputs and supporting efficient GPU inference.
  • mradermacher/GL

AI Papers for 2026-04-21

ASMR-Bench: Auditing for Sabotage in ML Research

As AI systems are increasingly used to conduct research autonomously, misaligned systems could introduce subtle flaws that produce misleading results while evading detection. We introduce ASMR-Bench (Auditing for Sabotage in ML Research), a benchmark for evaluating the ability of auditors to detect sabotage in ML research codebases. ASMR-Bench consists of 9 ML research codebases with sabotaged variants that produce qualitatively different experimental results. Each sabotage modifies implementation details, such as hyperparameters, training data, or evaluation code, while preserving the high-level methodology described in the paper. We evaluated frontier LLMs and LLM-assisted human auditors on ASMR-Bench and found that both struggled to reliably detect sabotage: the best performance was an AUROC of 0.77 and a top-1 fix rate of 42%, achieved by Gemini 3.1 Pro. We also tested LLMs as red teamers and found that LLM-generated sabotages were weaker than human-generated ones but still sometimes evaded same-capability LLM auditors. We release ASMR-Bench to support research on monitoring and auditing techniques for AI-conducted research.
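
The headline AUROC figure can be computed directly from raw auditor suspicion scores via the rank-statistic definition of AUROC, P(score of a sabotaged case > score of a clean case). This is a generic metric sketch, not the benchmark's own scoring code.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as P(score_pos > score_neg), with ties counted as 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Compare every positive score against every negative score.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Four codebases: two clean (0), two sabotaged (1), with auditor scores.
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # → 0.75
```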

Using Large Language Models and Knowledge Graphs to Improve the Interpretability of Machine Learning Models in Manufacturing

Explaining Machine Learning (ML) results in a transparent and user-friendly manner remains a challenging task of Explainable Artificial Intelligence (XAI). In this paper, we present a method to enhance the interpretability of ML models by using a Knowledge Graph (KG). We store domain-specific data along with ML results and their corresponding explanations, establishing a structured connection between domain knowledge and ML insights. To make these insights accessible to users, we designed a selective retrieval method in which relevant triplets are extracted from the KG and processed by a Large Language Model (LLM) to generate user-friendly explanations of ML results. We evaluated our method in a manufacturing environment using the XAI Question Bank. Beyond standard questions, we introduce more complex, tailored questions that highlight the strengths of our approach. We evaluated 33 questions, analyzing responses using quantitative metrics such as accuracy and consistency, as well as qualitative ones such as clarity and usefulness. Our contribution is both theoretical and practical: from a theoretical perspective, we present a novel approach for effectively enabling LLMs to dynamically access a KG in order to improve the explainability of ML results. From a practical perspective, we provide empirical evidence showing that such explanations can be successfully applied in real-world manufacturing environments, supporting better decision-making in manufacturing processes.
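
The selective-retrieval step described above, extracting relevant triplets from the KG and handing them to an LLM, might look like the minimal sketch below. The triplet store, keyword matcher, and prompt template are all illustrative inventions, not the paper's implementation.

```python
def retrieve_triplets(kg, question, k=3):
    """Score (subject, predicate, object) triplets by keyword overlap
    with the question; keep the top-k triplets with any overlap."""
    terms = set(question.lower().split())
    scored = []
    for s, p, o in kg:
        overlap = len(terms & set(f"{s} {p} {o}".lower().split()))
        scored.append((overlap, (s, p, o)))
    scored.sort(key=lambda t: -t[0])
    return [t for score, t in scored[:k] if score > 0]

def build_prompt(triplets, question):
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in triplets)
    return f"Known facts:\n{facts}\n\nExplain, for a machine operator: {question}"

# Hypothetical manufacturing KG fragment.
kg = [
    ("spindle temperature", "influences", "surface quality"),
    ("feed rate", "set by", "operator"),
    ("anomaly score", "computed by", "ml model"),
]
question = "Why is surface quality low?"
prompt = build_prompt(retrieve_triplets(kg, question), question)
```

Only the triplet that shares terms with the question survives retrieval, so the LLM receives grounded facts rather than the whole graph, which is the core idea behind the selective approach.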

Learning to Reason with Insight for Informal Theorem Proving

Although most automated theorem-proving approaches depend on formal proof systems, informal theorem proving aligns better with large language models' (LLMs) strength in natural language processing. In this work, we identify a primary bottleneck in informal theorem proving as a lack of insight, namely the difficulty of recognizing the core techniques required to solve complex problems. To address this, we propose a novel framework designed to cultivate this essential reasoning skill and enable LLMs to perform insightful reasoning. We propose `DeepInsightTheorem`, a hierarchical dataset that structures informal proofs by explicitly extracting core techniques and proof sketches alongside the final proof. To fully exploit this dataset, we design a Progressive Multi-Stage SFT strategy that mimics the human learning process, guiding the model from basic proof writing to insightful thinking. Our experiments on challenging mathematical benchmarks demonstrate that this insight-aware generation strategy significantly outperforms baselines. These results demonstrate that teaching models to identify and apply core techniques can substantially improve their mathematical reasoning.

VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects

As AI-assisted video creation becomes increasingly practical, instruction-guided video editing has become essential for refining generated or captured footage to meet professional requirements. Yet the field still lacks both a large-scale human-annotated dataset with complete editing examples and a standardized evaluator for comparing editing systems. Existing resources are limited by small scale, missing edited outputs, or the absence of human quality labels, while current evaluation often relies on expensive manual inspection or generic vision-language model judges that are not specialized for editing quality. We introduce VEFX-Dataset, a human-annotated dataset containing 5,049 video editing examples across 9 major editing categories and 32 subcategories, each labeled along three decoupled dimensions: Instruction Following, Rendering Quality, and Edit Exclusivity. Building on VEFX-Dataset, we propose VEFX-Reward, a reward model designed specifically for video editing quality assessment. VEFX-Reward jointly processes the source video, the editing instruction, and the edited video, and predicts per-dimension quality scores via ordinal regression. We further release VEFX-Bench, a benchmark of 300 curated video-prompt pairs for standardized comparison of editing systems. Experiments show that VEFX-Reward aligns more strongly with human judgments than generic VLM judges and prior reward models on both standard IQA/VQA metrics and group-wise preference evaluation. Using VEFX-Reward as an evaluator, we benchmark representative commercial and open-source video editing systems, revealing a persistent gap between visual plausibility, instruction following, and edit locality in current models.
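
VEFX-Reward predicts per-dimension quality scores via ordinal regression. One common formulation, a cumulative-logit ("ordered thresholds") decoder, can be sketched as follows; the threshold values and the 5-point scale are illustrative assumptions, not details from the paper.

```python
import numpy as np

def ordinal_decode(logit, thresholds):
    """Cumulative-logit decoding: the predicted grade is the number of
    ordered thresholds the latent score exceeds (grades 0..len(thresholds))."""
    probs_gt = 1.0 / (1.0 + np.exp(-(logit - np.asarray(thresholds))))
    return int((probs_gt > 0.5).sum())

# A 5-point quality scale (grades 0-4) needs 4 ordered thresholds.
thresholds = [-2.0, -0.5, 0.5, 2.0]
print(ordinal_decode(-3.0, thresholds))  # latent below all cuts → grade 0
print(ordinal_decode(0.0, thresholds))   # between 2nd and 3rd cut → grade 2
print(ordinal_decode(3.0, thresholds))   # above all cuts → grade 4
```

Unlike plain classification, this decoding respects the ordering of grades: a higher latent score can never yield a lower grade, which is why ordinal regression suits graded quality labels like Instruction Following or Rendering Quality.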

From Benchmarking to Reasoning: A Dual-Aspect, Large-Scale Evaluation of LLMs on Vietnamese Legal Text

The complexity of Vietnam's legal texts presents a significant barrier to public access to justice. While Large Language Models offer a promising solution for legal text simplification, evaluating their true capabilities requires a multifaceted approach that goes beyond surface-level metrics. This paper introduces a comprehensive dual-aspect evaluation framework to address this need. First, we establish a performance benchmark for four state-of-the-art large language models (GPT-4o, Claude 3 Opus, Gemini 1.5 Pro, and Grok-1) across three key dimensions: Accuracy, Readability, and Consistency. Second, to understand the "why" behind these performance scores, we conduct a large-scale error analysis on a curated dataset of 60 complex Vietnamese legal articles, using a novel, expert-validated error typology. Our results reveal a crucial trade-off: models like Grok-1 excel in Readability and Consistency but compromise on fine-grained legal Accuracy, while models like Claude 3 Opus achieve high Accuracy scores that mask a significant number of subtle but critical reasoning errors. The error analysis pinpoints *Incorrect Example* and *Misinterpretation* as the most prevalent failures, confirming that the primary challenge for current LLMs is not summarization but controlled, accurate legal reasoning. By integrating a quantitative benchmark with a qualitative deep dive, our work provides a holistic and actionable assessment of LLMs for legal applications.

Beyond Distribution Sharpening: The Importance of Task Rewards

Frontier models have demonstrated exceptional capabilities following the integration of task-reward-based reinforcement learning (RL) into their training pipelines, enabling systems to evolve from pure reasoning models into sophisticated agents. However, debate persists regarding whether RL genuinely instills new skills within a base model or merely sharpens its existing distribution to elicit latent capabilities. To address this dichotomy, we present an explicit comparison between distribution sharpening and task-reward-based learning, utilizing RL as a tool to implement both paradigms. Our analysis reveals the inherent limitations of distribution sharpening, demonstrating from first principles how and why the optima can be unfavorable and the approach fundamentally unstable. Furthermore, our experiments using Llama-3.2-3B-Instruct, Qwen2.5-3B-Instruct and Qwen3-4B-Instruct-2507 on math datasets confirm that sharpening yields limited gains, whereas incorporating task-based reward signal can greatly help achieve robust performance improvements and stable learning.

Characterising LLM-Generated Competency Questions: a Cross-Domain Empirical Study using Open and Closed Models

Competency Questions (CQs) are a cornerstone of requirement elicitation in ontology engineering. CQs represent requirements as a set of natural language questions that an ontology should satisfy; they are traditionally modelled by ontology engineers together with domain experts as part of a human-centred, manual elicitation process. The use of Generative AI automates CQ creation at scale, therefore democratising the process of generation, widening stakeholder engagement, and ultimately broadening access to ontology engineering. However, given the large and heterogeneous landscape of LLMs, varying in dimensions such as parameter scale, task and domain specialisation, and accessibility, it is crucial to characterise and understand the intrinsic, observable properties of the CQs they produce (e.g., readability, structural complexity) through a systematic, cross-domain analysis. This paper introduces a set of quantitative measures for the systematic comparison of CQs across multiple dimensions. Using CQs generated from well-defined use cases and scenarios, we identify their salient properties, including readability, relevance with respect to the input text, and structural complexity of the generated questions. We conduct our experiments over a set of use cases and requirements using a range of LLMs, including both open (KimiK2-1T, Llama3.1-8B, Llama3.2-3B) and closed models (Gemini 2.5 Pro, GPT 4.1). Our analysis demonstrates that LLM performance reflects distinct generation profiles shaped by the use case.
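
Intrinsic surface properties of the kind the study quantifies (readability proxies, structural complexity) can be approximated with simple statistics over the question text. The profile function below is an illustrative stand-in, not the paper's actual metric suite.

```python
import re

# Rough markers of subordinate clauses; an illustrative list, not exhaustive.
SUBORDINATORS = {"which", "that", "when", "where", "if", "whose", "while"}

def cq_profile(question):
    """Crude surface profile of a competency question:
    length, clause count estimate, and average word length."""
    tokens = re.findall(r"[a-zA-Z']+", question.lower())
    return {
        "length": len(tokens),
        "clauses": 1 + sum(t in SUBORDINATORS for t in tokens),
        "avg_word_len": sum(map(len, tokens)) / max(len(tokens), 1),
    }

p = cq_profile("Which sensors monitor the machines that operate in zone A?")
```

Aggregating profiles like this per model and per use case is one cheap way to compare the "generation profiles" the paper refers to.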

Joint-Centric Dual Contrastive Alignment with Structure-Preserving and Information-Balanced Regularization

We propose HILBERT (HIerarchical Long-sequence Balanced Embedding with Reciprocal contrastive Training), a cross-attentive multimodal framework for learning document-level audio-text representations from long, segmented sequences in low-resource data settings. HILBERT leverages frozen pre-trained speech and language encoders to extract segment-level features, which are aggregated via cross-modal attention and self-attentive pooling to form modality-specific document representations and a joint cross-attentive embedding. To align modalities while preserving modality-specific structure under severe audio-text dimensional imbalance, we introduce a reciprocal dual contrastive objective that simultaneously aligns audio-to-joint and text-to-joint representations, rather than directly contrasting audio and text alone. Two auxiliary regularizers further stabilize long-sequence fusion: a Centered Kernel Alignment (CKA) loss that preserves structural consistency between each modality and the joint embedding, and a mutual information balancing loss that prevents dominance of a single modality by equalizing information flow from audio and text into the joint space. For downstream prediction, HILBERT employs a Mixture-of-Experts (MoE) classifier over concatenated audio, text, and joint representations to accommodate heterogeneous label regimes. Extensive evaluation across multiple audio-text backbone combinations demonstrates that HILBERT learns semantically meaningful long-sequence representations and achieves superior performance on highly imbalanced multi-class settings.
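
The CKA regularizer mentioned above has a compact standard linear form. A minimal numpy sketch (not the paper's implementation) is:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation
    matrices of shape (n_samples, dim_x) and (n_samples, dim_y)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

# Toy stand-ins for modality and joint embeddings (shapes are made up).
rng = np.random.default_rng(0)
A = rng.normal(size=(64, 32))          # e.g. audio document embeddings
B = A @ rng.normal(size=(32, 48))      # a linear transform of A
```

Note that CKA compares representations of possibly different dimensionality (here 32 vs. 48), which is exactly what makes it usable under the "severe audio-text dimensional imbalance" the abstract describes.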

BAGEL: Benchmarking Animal Knowledge Expertise in Language Models

Large language models have shown strong performance on broad-domain knowledge and reasoning benchmarks, but it remains unclear how well language models handle specialized animal-related knowledge under a unified closed-book evaluation protocol. We introduce BAGEL, a benchmark for evaluating animal knowledge expertise in language models. BAGEL is constructed from diverse scientific and reference sources, including bioRxiv, Global Biotic Interactions, Xeno-canto, and Wikipedia, using a combination of curated examples and automatically generated closed-book question-answer pairs. The benchmark covers multiple aspects of animal knowledge, including taxonomy, morphology, habitat, behavior, vocalization, geographic distribution, and species interactions. By focusing on closed-book evaluation, BAGEL measures animal-related knowledge of models without external retrieval at inference time. BAGEL further supports fine-grained analysis across source domains, taxonomic groups, and knowledge categories, enabling a more precise characterization of model strengths and systematic failure modes. Our benchmark provides a new testbed for studying domain-specific knowledge generalization in language models and for improving their reliability in biodiversity-related applications.

A Two-Stage, Object-Centric Deep Learning Framework for Robust Exam Cheating Detection

Academic integrity continues to face the persistent challenge of examination cheating. Traditional invigilation relies on human observation, which is inefficient, costly, and prone to errors at scale. Although some existing AI-powered monitoring systems have been deployed and trusted, many lack transparency or require multi-layered architectures to achieve the desired performance. To overcome these challenges, we propose an improved, simple two-stage framework for exam cheating detection that integrates object detection and behavioral analysis using well-known technologies. First, the state-of-the-art YOLOv8n model is used to localize students in exam-room images. Each detected region is cropped and preprocessed, then classified by a fine-tuned RexNet-150 model as either normal or cheating behavior. The system is trained on a dataset compiled from 10 independent sources with a total of 273,897 samples, achieving 0.95 accuracy, 0.94 recall, 0.96 precision, and 0.95 F1-score, a 13% increase over a baseline accuracy of 0.82 in video-based cheating detection. In addition, with an average inference time of 13.9 ms per sample, the proposed approach demonstrates robustness and scalability for deployment in large-scale environments. Beyond the technical contribution, the AI-assisted monitoring system also addresses ethical concerns by ensuring that final outcomes are delivered privately to individual students after the examination, for example, via personal email. This prevents public exposure or shaming and offers students an opportunity to reflect on their behavior. For further improvement, it is possible to incorporate additional factors, such as audio data and consecutive frames, to achieve greater accuracy. This study provides a foundation for developing real-time, scalable, ethical, and open-source solutions.
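
The two-stage handoff described above (detector boxes → per-student crops → behavior classifier) reduces to a short loop. In the sketch below, the detector output and the classifier are stubs standing in for YOLOv8n and the fine-tuned RexNet-150; only the plumbing is meant to be representative.

```python
import numpy as np

def crop_regions(frame, boxes):
    """Stage 1 → stage 2 handoff: cut each detected box out of the frame.
    Boxes are (x1, y1, x2, y2) in pixel coordinates."""
    return [frame[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]

def classify_stub(crop):
    """Stand-in for the fine-tuned behavior classifier: flags bright crops."""
    return "cheating" if crop.mean() > 0.5 else "normal"

# Synthetic 480x640 RGB frame with one bright "student" region.
frame = np.zeros((480, 640, 3))
frame[100:200, 50:150] = 1.0
boxes = [(50, 100, 150, 200), (300, 300, 400, 420)]  # stand-in detector output

labels = [classify_stub(c) for c in crop_regions(frame, boxes)]
print(labels)  # → ['cheating', 'normal']
```

Keeping detection and classification decoupled like this is what lets each stage be swapped or fine-tuned independently, which the paper leans on for its accuracy gains.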

AI Models

unsloth/Kimi-K2.6-GGUF


base_model: moonshotai/Kimi-K2.6
tags: compressed-tensors, unsloth
license: other (license_name: modified-mit)
library_name: transformers
pipeline_tag: image-text-to-text

> [!NOTE]
> Includes Unsloth chat template fixes! For llama.cpp, use `--jinja`.

*[Unsloth Dynamic 2.0](https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf) achieves superior accuracy & outperforms other leading quants.*

📰 [Tech Blog](https://www.kimi.com/blog/kimi-k2-6.html) | License: [Modified MIT](https://huggingface.co/moonshotai/Kimi-K2.6/blob/main/LICENSE)

1. Model Introduction

Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration.

Key Features

  • Long-Horizon Coding: K2.6 achieves significant improvements on complex, end-to-end coding tasks, generalizing robustly across programming languages (Rust, Go, Python) and domains spanning front-end, DevOps, and performance optimization.
  • Coding-Driven Design: K2.6 is capable of transforming simple prompts and visual inputs into production-ready interfaces and lightweight full-stack workflows, generating structured layouts, interactive elements, and rich animations with deliberate aesthetic precision.
  • Elevated Agent Swarm: Scaling horizontally to 300 sub-agents executing 4,000 coordinated steps, K2.6 can dynamically decompose tasks into parallel, domain-specialized subtasks, delivering end-to-end outputs from documents to websites to spreadsheets in a single autonomous run.
  • Proactive & Open Orchestration: For autonomous tasks, K2.6 demonstrates strong performance in powering persistent, 24/7 background agents that proactively manage schedules, execute code, and orchestrate cross-platform operations without human oversight.

2. Model Summary

<div align="center">

| | |
|:---:|:---:|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
| Vision Encoder | MoonViT |
| Parameters of Vision Encoder | 400M |

</div>

3. Evaluation Results

<div align="center"> <table> <thead> <tr> <th align="center">Benchmark</th> <th align="center"><sup>Kimi K2.6</sup></th> <th align="center"><sup>GPT-5.4 <br><sup>(xhigh)</sup></sup></th> <th align="center"><sup>Claude Opus 4.6 <br><sup>(max effort)</sup></sup></th> <th align="center"><sup>Gemini 3.1 Pro<br><sup>(thinking high)</sup></sup></th> <th align="center"><sup>Kimi K2.5</sup></th> </tr> </thead> <tbody> <tr> <td align="center" colspan=6><strong>Agentic</strong></td> </tr> <tr> <td align="center" style="vertical-align: middle">HLE-Full<br>(w/ tools)</td> <td align="center" style="vertical-align: middle">54.0</td> <td align="center" style="vertical-align: middle">52.1</td> <td align="center" style="vertical-align: middle">53.0</td> <td align="center" style="vertical-align: middle">51.4</td> <td align="center" style="vertical-align: middle">50.2</td> </tr> <tr> <td align="center" style="vertical-align: middle">BrowseComp</td> <td align="center" style="vertical-align: middle">83.2</td> <td align="center" style="vertical-align: middle" rowspan="2">82.7</td> <td align="center" style="vertical-align: middle" rowspan="2">83.7</td> <td align="center" style="vertical-align: middle" rowspan="2">85.9</td> <td align="center" style="vertical-align: middle">74.9</td> </tr> <tr> <td align="center" style="vertical-align: middle">BrowseComp<br>(Agent Swarm)</td> <td align="center" style="vertical-align: middle">86.3</td> <td align="center" style="vertical-align: middle">78.4</td> </tr> <tr> <td align="center" style="vertical-align: middle">DeepSearchQA<br>(f1-score)</td> <td align="center" style="vertical-align: middle">92.5</td> <td align="center" style="vertical-align: middle">78.6</td> <td align="center" style="vertical-align: middle">91.3</td> <td align="center" style="vertical-align: middle">81.9</td> <td align="center" style="vertical-align: middle">89.0</td> </tr> <tr> <td align="center" style="vertical-align: middle">DeepSearchQA<br>(accuracy)</td> <td align="center" 
style="vertical-align: middle">83.0</td> <td align="center" style="vertical-align: middle">63.7</td> <td align="center" style="vertical-align: middle">80.6</td> <td align="center" style="vertical-align: middle">60.2</td> <td align="center" style="vertical-align: middle">77.1</td> </tr> <tr> <td align="center" style="vertical-align: middle">WideSearch<br> (item-f1)</td> <td align="center" style="vertical-align: middle">80.8</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">72.7</td> </tr> <tr> <td align="center" style="vertical-align: middle">Toolathlon</td> <td align="center" style="vertical-align: middle">50.0</td> <td align="center" style="vertical-align: middle">54.6</td> <td align="center" style="vertical-align: middle">47.2</td> <td align="center" style="vertical-align: middle">48.8</td> <td align="center" style="vertical-align: middle">27.8</td> </tr> <tr> <td align="center" style="vertical-align: middle">MCPMark</td> <td align="center" style="vertical-align: middle">55.9</td> <td align="center" style="vertical-align: middle">62.5*</td> <td align="center" style="vertical-align: middle">56.7*</td> <td align="center" style="vertical-align: middle">55.9*</td> <td align="center" style="vertical-align: middle">29.5</td> </tr> <tr> <td align="center" style="vertical-align: middle">Claw Eval (pass^3)</td> <td align="center" style="vertical-align: middle">62.3</td> <td align="center" style="vertical-align: middle">60.3</td> <td align="center" style="vertical-align: middle">70.4</td> <td align="center" style="vertical-align: middle">57.8</td> <td align="center" style="vertical-align: middle">52.3</td> </tr> <tr> <td align="center" style="vertical-align: middle">Claw Eval (pass@3)</td> <td align="center" style="vertical-align: middle">80.9</td> <td align="center" style="vertical-align: 
middle">78.4</td> <td align="center" style="vertical-align: middle">82.4</td> <td align="center" style="vertical-align: middle">82.9</td> <td align="center" style="vertical-align: middle">75.4</td> </tr> <tr> <td align="center" style="vertical-align: middle">APEX-Agents</td> <td align="center" style="vertical-align: middle">27.9</td> <td align="center" style="vertical-align: middle">33.3</td> <td align="center" style="vertical-align: middle">33.0</td> <td align="center" style="vertical-align: middle">32.0</td> <td align="center" style="vertical-align: middle">11.5</td> </tr> <tr> <td align="center" style="vertical-align: middle">OSWorld-Verified</td> <td align="center" style="vertical-align: middle">73.1</td> <td align="center" style="vertical-align: middle">75.0</td> <td align="center" style="vertical-align: middle">72.7</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">63.3</td> </tr> <tr> <td align="center" colspan=6><strong>Coding</strong></td> </tr> <tr> <td align="center" style="vertical-align: middle">Terminal-Bench 2.0<br>(Terminus-2)</td> <td align="center" style="vertical-align: middle">66.7</td> <td align="center" style="vertical-align: middle">65.4*</td> <td align="center" style="vertical-align: middle">65.4</td> <td align="center" style="vertical-align: middle">68.5</td> <td align="center" style="vertical-align: middle">50.8</td> </tr> <tr> <td align="center" style="vertical-align: middle">SWE-Bench Pro</td> <td align="center" style="vertical-align: middle">58.6</td> <td align="center" style="vertical-align: middle">57.7</td> <td align="center" style="vertical-align: middle">53.4</td> <td align="center" style="vertical-align: middle">54.2</td> <td align="center" style="vertical-align: middle">50.7</td> </tr> <tr> <td align="center" style="vertical-align: middle">SWE-Bench Multilingual</td> <td align="center" style="vertical-align: middle">76.7</td> <td align="center" style="vertical-align: 
middle">-</td> <td align="center" style="vertical-align: middle">77.8</td> <td align="center" style="vertical-align: middle">76.9*</td> <td align="center" style="vertical-align: middle">73.0</td> </tr> <tr> <td align="center" style="vertical-align: middle">SWE-Bench Verified</td> <td align="center" style="vertical-align: middle">80.2</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">80.8</td> <td align="center" style="vertical-align: middle">80.6</td> <td align="center" style="vertical-align: middle">76.8</td> </tr> <tr> <td align="center" style="vertical-align: middle">SciCode</td> <td align="center" style="vertical-align: middle">52.2</td> <td align="center" style="vertical-align: middle">56.6</td> <td align="center" style="vertical-align: middle">51.9</td> <td align="center" style="vertical-align: middle">58.9</td> <td align="center" style="vertical-align: middle">48.7</td> </tr> <tr> <td align="center" style="vertical-align: middle">OJBench (python)</td> <td align="center" style="vertical-align: middle">60.6</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">60.3</td> <td align="center" style="vertical-align: middle">70.7</td> <td align="center" style="vertical-align: middle">54.7</td> </tr> <tr> <td align="center" style="vertical-align: middle">LiveCodeBench (v6)</td> <td align="center" style="vertical-align: middle">89.6</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">88.8</td> <td align="center" style="vertical-align: middle">91.7</td> <td align="center" style="vertical-align: middle">85.0</td> </tr> <tr> <td align="center" colspan=6><strong>Reasoning &amp; Knowledge</strong></td> </tr> <tr> <td align="center" style="vertical-align: middle">HLE-Full</td> <td align="center" style="vertical-align: middle">34.7</td> <td align="center" style="vertical-align: middle">39.8</td> 
<td align="center" style="vertical-align: middle">40.0</td> <td align="center" style="vertical-align: middle">44.4</td> <td align="center" style="vertical-align: middle">30.1</td> </tr> <tr> <td align="center" style="vertical-align: middle">AIME 2026</td> <td align="center" style="vertical-align: middle">96.4</td> <td align="center" style="vertical-align: middle">99.2</td> <td align="center" style="vertical-align: middle">96.7</td> <td align="center" style="vertical-align: middle">98.3</td> <td align="center" style="vertical-align: middle">95.8</td> </tr> <tr> <td align="center" style="vertical-align: middle">HMMT 2026 (Feb)</td> <td align="center" style="vertical-align: middle">92.7</td> <td align="center" style="vertical-align: middle">97.7</td> <td align="center" style="vertical-align: middle">96.2</td> <td align="center" style="vertical-align: middle">94.7</td> <td align="center" style="vertical-align: middle">87.1</td> </tr> <tr> <td align="center" style="vertical-align: middle">IMO-AnswerBench</td> <td align="center" style="vertical-align: middle">86.0</td> <td align="center" style="vertical-align: middle">91.4</td> <td align="center" style="vertical-align: middle">75.3</td> <td align="center" style="vertical-align: middle">91.0*</td> <td align="center" style="vertical-align: middle">81.8</td> </tr> <tr> <td align="center" style="vertical-align: middle">GPQA-Diamond</td> <td align="center" style="vertical-align: middle">90.5</td> <td align="center" style="vertical-align: middle">92.8</td> <td align="center" style="vertical-align: middle">91.3</td> <td align="center" style="vertical-align: middle">94.3</td> <td align="center" style="vertical-align: middle">87.6</td> </tr> <tr> <td align="center" colspan=6><strong>Vision</strong></td> </tr> <tr> <td align="center" style="vertical-align: middle">MMMU-Pro</td> <td align="center" style="vertical-align: middle">79.4</td> <td align="center" style="vertical-align: middle">81.2</td> <td align="center" 
style="vertical-align: middle">73.9</td> <td align="center" style="vertical-align: middle">83.0*</td> <td align="center" style="vertical-align: middle">78.5</td> </tr> <tr> <td align="center" style="vertical-align: middle">MMMU-Pro (w/ python)</td> <td align="center" style="vertical-align: middle">80.1</td> <td align="center" style="vertical-align: middle">82.1</td> <td align="center" style="vertical-align: middle">77.3</td> <td align="center" style="vertical-align: middle">85.3*</td> <td align="center" style="vertical-align: middle">77.7</td> </tr> <tr> <td align="center" style="vertical-align: middle">CharXiv (RQ)</td> <td align="center" style="vertical-align: middle">80.4</td> <td align="center" style="vertical-align: middle">82.8*</td> <td align="center" style="vertical-align: middle">69.1</td> <td align="center" style="vertical-align: middle">80.2*</td> <td align="center" style="vertical-align: middle">77.5</td> </tr> <tr> <td align="center" style="vertical-align: middle">CharXiv (RQ) (w/ python)</td> <td align="center" style="vertical-align: middle">86.7</td> <td align="center" style="vertical-align: middle">90.0*</td> <td align="center" style="vertical-align: middle">84.7</td> <td align="center" style="vertical-align: middle">89.9*</td> <td align="center" style="vertical-align: middle">78.7</td> </tr> <tr> <td align="center" style="vertical-align: middle">MathVision</td> <td align="center" style="vertical-align: middle">87.4</td> <td align="center" style="vertical-align: middle">92.0*</td> <td align="center" style="vertical-align: middle">71.2*</td> <td align="center" style="vertical-align: middle">89.8*</td> <td align="center" style="vertical-align: middle">84.2</td> </tr> <tr> <td align="center" style="vertical-align: middle">MathVision (w/ python)</td> <td align="center" style="vertical-align: middle">93.2</td> <td align="center" style="vertical-align: middle">96.1*</td> <td align="center" style="vertical-align: middle">84.6*</td> <td align="center" 
style="vertical-align: middle">95.7*</td> <td align="center" style="vertical-align: middle">85.0</td> </tr> <tr> <td align="center" style="vertical-align: middle">BabyVision</td> <td align="center" style="vertical-align: middle">39.8</td> <td align="center" style="vertical-align: middle">49.7</td> <td align="center" style="vertical-align: middle">14.8</td> <td align="center" style="vertical-align: middle">51.6</td> <td align="center" style="vertical-align: middle">36.5</td> </tr> <tr> <td align="center" style="vertical-align: middle">BabyVision (w/ python)</td> <td align="center" style="vertical-align: middle">68.5</td> <td align="center" style="vertical-align: middle">80.2*</td> <td align="center" style="vertical-align: middle">38.4*</td> <td align="center" style="vertical-align: middle">68.3*</td> <td align="center" style="vertical-align: middle">40.5</td> </tr> <tr> <td align="center" style="vertical-align: middle">V* (w/ python)</td> <td align="center" style="vertical-align: middle">96.9</td> <td align="center" style="vertical-align: middle">98.4*</td> <td align="center" style="vertical-align: middle">86.4*</td> <td align="center" style="vertical-align: middle">96.9*</td> <td align="center" style="vertical-align: middle">86.9</td> </tr> </tbody> </table> </div> <details> <summary><b>Footnotes</b></summary>
  1. General Testing Details
    • We report results for Kimi K2.6 and Kimi K2.5 with thinking mode enabled, Claude Opus 4.6 with max effort, GPT-5.4 with xhigh reasoning effort, and Gemini 3.1 Pro with a high thinking level.
    • Unless otherwise specified, all Kimi K2.6 experiments were conducted with temperature = 1.0, top-p = 1.0, and a context length of 262,144 tokens.
    • Benchmarks without publicly available scores were re-evaluated under the same conditions used for Kimi K2.6 and are marked with an asterisk (*). Except where noted with an asterisk, all other results are cited from official reports.
  2. Reasoning Benchmarks
    • IMO-AnswerBench scores for GPT-5.4 and Claude 4.6 were obtained from z.ai/blog/glm-5.1.
    • Humanity's Last Exam (HLE) and other reasoning tasks were evaluated with a maximum generation length of 98,304 tokens. By default, we report results on the HLE full set. For the text-only subset, Kimi K2.6 achieves 36.4% accuracy without tools and 55.5% with tools.
  3. Tool-Augmented / Agentic Tasks
    • Kimi K2.6 was equipped with search, code-interpreter, and web-browsing tools for HLE with tools, BrowseComp, DeepSearchQA, and WideSearch.
    • For HLE-Full with tools, the maximum generation length is 262,144 tokens with a per-step limit of 49,152 tokens. We employ a simple context management strategy: once the context window exceeds the threshold, only the most recent round of tool-related messages is retained.
    • For BrowseComp, we report scores obtained with context management using the same discard-all strategy as Kimi K2.5 and DeepSeek-V3.2.
    • For DeepSearchQA, no context management was applied to Kimi K2.6 tests, and tasks exceeding the supported context length were directly counted as failed. Scores for Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on DeepSearchQA are cited from the Claude Opus 4.7 System Card.
    • For WideSearch, we report results under the "hide tool result" context management setting. Once the context window exceeds the threshold, only the most recent round of tool-related messages is retained.
    • The test system prompts are identical to those used in the Kimi K2.5 technical report.
    • Claw Eval was conducted using version 1.1 with max-tokens-per-step = 16384.
    • For APEX-Agents, we evaluate 452 tasks from the public 480-task release, as done by Artificial Analysis (excluding Investment Banking Worlds 244 and 246, which have external runtime dependencies).
  4. Coding Tasks
    • Terminal-Bench 2.0 scores were obtained with the default agent framework (Terminus-2) and the provided JSON parser, operating in preserve thinking mode.
    • For the SWE-Bench series of evaluations (including Verified, Multilingual, and Pro), we used an in-house evaluation framework adapted from SWE-agent. This framework includes a minimal set of tools—bash tool, createfile tool, insert tool, view tool, strreplace tool, and submit tool.
    • All reported scores for coding tasks are averaged over 10 independent runs.
  5. Vision Benchmarks
    • Max-tokens = 98,304, averaged over three runs (avg@3).
    • Settings with Python tool use max-tokens-per-step = 65,536 and max-steps = 50 for multi-step reasoning.
    • MMMU-Pro follows the official protocol, preserving input order and prepending images.
</details>
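The "retain only the most recent round of tool-related messages" context-management strategy described in the footnotes can be sketched as follows (the helper name, message shape, and threshold semantics are my own assumptions, not the actual evaluation harness):

```python
def trim_tool_messages(messages: list[dict], threshold: int) -> list[dict]:
    """Once the history exceeds `threshold` messages, drop earlier 'tool'
    messages and keep only the most recent contiguous run of them."""
    if len(messages) <= threshold:
        return list(messages)
    tool_idxs = [i for i, m in enumerate(messages) if m.get('role') == 'tool']
    if not tool_idxs:
        return list(messages)
    end = tool_idxs[-1]          # last tool message
    start = end                  # walk back over the contiguous run
    while start > 0 and messages[start - 1].get('role') == 'tool':
        start -= 1
    return [m for i, m in enumerate(messages)
            if m.get('role') != 'tool' or start <= i <= end]
```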

4. Native INT4 Quantization

Kimi-K2.6 adopts the same native int4 quantization method as Kimi-K2-Thinking.

5. Deployment

[!Note] You can access Kimi-K2.6's API at https://platform.moonshot.ai, where we provide OpenAI/Anthropic-compatible APIs. To verify that a deployment is correct, we also provide the Kimi Vendor Verifier. Currently, Kimi-K2.6 is recommended to run on the following inference engines:

  • vLLM
  • SGLang
  • KTransformers

Kimi-K2.6 has the same architecture as Kimi-K2.5, and the deployment method can be directly reused.

The version requirement for transformers is >=4.57.1, <5.0.0.

Deployment examples can be found in the Model Deployment Guide.


6. Model Usage

The demos below show how to call our official API.

For third-party APIs deployed with vLLM or SGLang, please note that:

[!Note]

  • Chat with video content is an experimental feature and is only supported in our official API for now.

  • The recommended temperature is 1.0 for Thinking mode and 0.6 for Instant mode.

  • The recommended top_p is 0.95.

  • To use instant mode, you need to pass {'chat_template_kwargs': {"thinking": False}} in extra_body.
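The two Instant-mode payloads can be captured in one small helper (the helper itself is my own sketch; the keys follow the note above):

```python
def instant_mode_extra_body(official_api: bool = True) -> dict:
    """Build the extra_body payload that disables thinking (Instant mode)."""
    if official_api:
        # Official Moonshot API
        return {'thinking': {'type': 'disabled'}}
    # vLLM / SGLang OpenAI-compatible servers
    return {'chat_template_kwargs': {'thinking': False}}
```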

Chat Completion

This is a simple chat completion script which shows how to call K2.6 API in Thinking and Instant modes.

import openai
def simple_chat(client: openai.OpenAI, model_name: str):
    messages = [
        {'role': 'system', 'content': 'You are Kimi, an AI assistant created by Moonshot AI.'},
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'which one is bigger, 9.11 or 9.9? think carefully.'}
            ],
        },
    ]
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=4096
    )
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # To use instant mode, pass {'thinking': {'type': 'disabled'}} via extra_body
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for official API
        # extra_body= {'chat_template_kwargs': {"thinking": False}}  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')

Chat Completion with visual content

K2.6 supports Image and Video input.

The following example demonstrates how to call K2.6 API with image input:

import openai
import base64
import requests

def chat_with_image(client: openai.OpenAI, model_name: str):
    url = 'https://huggingface.co/moonshotai/Kimi-K2.6/resolve/main/figures/kimi-logo.png'
    image_base64 = base64.b64encode(requests.get(url).content).decode()
    messages = [
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'Describe this image in detail.'},
                {
                    'type': 'image_url',
                    'image_url': {'url': f'data:image/png;base64,{image_base64}'},
                },
            ],
        }
    ]

    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=8192
    )
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # Instant mode is also supported if you pass {'thinking': {'type': 'disabled'}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for official API
        # extra_body= {'chat_template_kwargs': {"thinking": False}}  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')

    return response.choices[0].message.content
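The base64 data-URL construction used above can be factored into a small helper (the helper is my own, not part of the API):

```python
import base64

def to_data_url(data: bytes, mime: str = 'image/png') -> str:
    """Encode raw bytes as a data: URL for image_url / video_url fields."""
    return f'data:{mime};base64,{base64.b64encode(data).decode()}'
```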

The following example demonstrates how to call K2.6 API with video input:

import openai
import base64
import requests

def chat_with_video(client: openai.OpenAI, model_name: str):
    url = 'https://huggingface.co/moonshotai/Kimi-K2.6/resolve/main/figures/demo_video.mp4'
    video_base64 = base64.b64encode(requests.get(url).content).decode()
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text","text": "Describe the video in detail."},
                {
                    "type": "video_url",
                    "video_url": {"url": f"data:video/mp4;base64,{video_base64}"},
                },
            ],
        }
    ]

    response = client.chat.completions.create(model=model_name, messages=messages)
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # Instant mode is also supported if you pass {'thinking': {'type': 'disabled'}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for official API
        # extra_body= {'chat_template_kwargs': {"thinking": False}}  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')
    return response.choices[0].message.content

Preserve Thinking

Kimi K2.6 supports preserve_thinking mode, which retains full reasoning content across multi-turn interactions and enhances performance in coding agent scenarios.

This feature is disabled by default. The following example demonstrates how to call K2.6 API in preserve_thinking mode:

def chat_with_preserve_thinking(client: openai.OpenAI, model_name: str):
    messages = [
        {
            "role": "user",
            "content": "Tell me three random numbers."
        },
        {
            "role": "assistant",
            "reasoning_content": "I'll start by listing five numbers: 473, 921, 235, 215, 222, and I'll tell you the first three.",
            "content": "473, 921, 235"
        },
        {
            "role": "user",
            "content": "What are the other two numbers you have in mind?"
        }
    ]

    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'enabled', 'keep': 'all'}},  # this is for official API
        # extra_body={"chat_template_kwargs": {"thinking":True, "preserve_thinking": True}},  # this is for vLLM/SGLang
        # We recommend enabling preserve_thinking only in think mode.
    )
    # the assistant should mention 215 and 222 that appear in the prior reasoning content
    print(f"reasoning content: {response.choices[0].message.reasoning}")
    print(f"response: {response.choices[0].message.content}")
    return response.choices[0].message.content

Interleaved Thinking and Multi-Step Tool Call

K2.6 shares the same design of Interleaved Thinking and Multi-Step Tool Call as K2 Thinking. For usage examples, please refer to the K2 Thinking documentation.

Coding Agent Framework

Kimi K2.6 works best with Kimi Code CLI as its agent framework — give it a try at https://www.kimi.com/code.


7. License

Both the code repository and the model weights are released under the Modified MIT License.


8. Third Party Notices

See THIRD PARTY NOTICES


9. Contact Us

If you have any questions, please reach out at support@moonshot.ai.

Author: unsloth

Likes: 45

Downloads: 0

Tags: transformers, compressed-tensors, unsloth, image-text-to-text, arxiv:2602.02276, base_model:moonshotai/Kimi-K2.6, base_model:finetune:moonshotai/Kimi-K2.6, license:other, endpoints_compatible, region:us

lovis93/crt-animation-terminal-ltx-2.3-lora


license: apache-2.0
tags:

  • text-to-video
  • lora
  • ltx-video
  • ltx-2.3
  • crt
  • retro
  • animation

base_model: Lightricks/LTX-Video
pipeline_tag: text-to-video

<p align="center"> <img src="./banner/F1_neon_reveal.gif" width="70%"/> </p>

📺 crtanim — CRT / Retro Terminal Video LoRA for LTX‑2.3 22B

An open-source LoRA for LTX‑2.3 22B that locks generations into a real late-80s / early-90s CRT-monitor look — scanlines, phosphor glow, chromatic aberration, barrel curvature, dithering, signal noise, choppy limited-frame motion.

Trained on fal. Runs on fal, ComfyUI, or any other LTX‑2.3 22B runtime that accepts a standard .safetensors LoRA file.

Trigger word: crtanim (used as the first token of every prompt).

🚀 One-click test on fal: playground


Checkpoints

| File | Steps | Best for |
|---|---|---|
| crtanim_4000.safetensors | 4 000 | Looser, more expressive motion |
| crtanim_10000.safetensors | 10 000 | Cleaner text, sharper structure (recommended) |

Both files are already in ComfyUI key format — drop into ComfyUI/models/loras/, no conversion needed.

Direct URLs:

  • 4 000: https://v3b.fal.media/files/b/0a96b375/yYgKwLBgP9cIx7yJMT0Iy_lora_weights_step_04000.safetensors
  • 10 000: https://v3b.fal.media/files/b/0a96eb30/QratjdCIjYTgJe6jB47OY_lora_weights_step_10000.safetensors

Showcase

<table> <tr><td><img src="./gifs/01_titlecard.gif"/></td><td><img src="./gifs/02_hello_world.gif"/></td><td><img src="./gifs/03_vaporwave.gif"/></td></tr> <tr><td><img src="./gifs/04_hackerman.gif"/></td><td><img src="./gifs/05_404.gif"/></td><td><img src="./gifs/06_underwater.gif"/></td></tr> <tr><td><img src="./gifs/07_btc.gif"/></td><td><img src="./gifs/08_legacy_boss.gif"/></td><td><img src="./gifs/09_crystal_cave.gif"/></td></tr> <tr><td><img src="./gifs/10_devday.gif"/></td><td><img src="./gifs/11_recursion.gif"/></td><td><img src="./gifs/12_end.gif"/></td></tr> </table>

Range (6k vs 8k vs 10k training steps)

<img src="./comparison/grid_fgh2.gif" width="100%"/>

Terminal text rendering (10k is the one)

<img src="./comparison/grid_terminal_distilled.gif" width="100%"/>

Usage — fal

curl --request POST \
  --url https://fal.run/fal-ai/ltx-2.3-22b/text-to-video/lora \
  --header "Authorization: Key $FAL_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "crtanim, a dense CRT terminal typing \"$ claude --continue\" in glowing green pixel font, scanlines, phosphor glow, low choppy frame rate.",
    "loras": [{"path": "https://v3b.fal.media/files/b/0a96eb30/QratjdCIjYTgJe6jB47OY_lora_weights_step_10000.safetensors", "scale": 1.0}],
    "video_size": {"width": 1024, "height": 1024},
    "num_frames": 121,
    "fps": 24,
    "enable_prompt_expansion": false,
    "negative_prompt": "blurry, low quality, jpeg artifacts, compression artifacts, watermark, signature"
  }'

Important settings:

  • enable_prompt_expansion: false — keeps your prompt intact
  • Override negative_prompt — the default negative includes "text, on screen text, titles" which breaks readable text
  • video_size 1024×1024 matches training ratio

Works on every LTX‑2.3 22B endpoint: text-to-video/lora, image-to-video/lora, distilled/text-to-video/lora.


Prompt recipe

Always start with crtanim, then describe, in order: CRT aesthetic → color palette → animation style → subject → composition → literal text (in quotes) → mood.

Example:

crtanim, a CRT aesthetic with scanlines, green phosphor glow, bloom, chromatic aberration, a terminal typing out exactly "$ ls -la" character by character in glowing green blocky pixel font, a blinking cursor below, static centered composition, low choppy frame rate, hacker mood.

Caveats

  • Experimental first release — text rendering and motion can be hit-or-miss.
  • On-screen text works best with short strings (1–3 words / short commands).
  • The LoRA pulls toward static framing; ask for camera movement explicitly if you want it.
  • Always override the default negative prompt if you want readable text.

Training

  • Base: LTX‑2.3 22B · Trainer: fal-ai/ltx23-video-trainer
  • Aspect 1:1 · LR 2e‑4
  • Dataset: small hand-curated set of CRT / pixel-art clips

License

Apache 2.0.

Credits

By @lovisdotio. Trained on fal.

Author: lovis93

Likes: 11

Downloads: 0

Tags: text-to-video, lora, ltx-video, ltx-2.3, crt, retro, animation, base_model:Lightricks/LTX-Video, base_model:adapter:Lightricks/LTX-Video, license:apache-2.0, region:us

ubergarm/Kimi-K2.6-GGUF


quantized_by: ubergarm
pipeline_tag: text-generation
base_model: moonshotai/Kimi-K2.6
license: other
license_name: modified-mit
license_link: https://huggingface.co/moonshotai/Kimi-K2.6/blob/main/LICENSE
base_model_relation: quantized
tags:

  • mla
  • imatrix
  • conversational
  • ik_llama.cpp

imatrix Quantization of moonshotai/Kimi-K2.6

Except for the Q4_X, the quants in this collection REQUIRE the ik_llama.cpp fork, which supports ik's latest SOTA quants and optimizations. Do not download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc.! Note that ik_llama.cpp can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc., if you want to try it out before downloading my quants.

Some of ik's new quants are supported by the Nexesenex/croco.cpp fork of KoboldCPP, with Windows builds for CUDA 12.9. Also check the Windows builds by Thireus here, which have been built against CUDA 12.8.

These quants provide best in class perplexity for the given memory footprint.

Big Thanks

Shout out to Wendell and the Level1Techs crew, the community Forums, YouTube Channel! BIG thanks for providing BIG hardware expertise and access to run these experiments and make these great quants available to the community!!!

Also thanks to all the folks in the quanting and inferencing community on BeaverAI Club Discord and on r/LocalLLaMA for tips and tricks helping each other run, test, and benchmark all the fun new models!

Finally, I really appreciate all the support from aifoundry.org so check out their open source RISC-V solutions, and of course huggingface for hosting all these big quants!

Quant Collection

Perplexity computed against wiki.test.raw. (lower is "better")

Perplexity Chart

Q4_X 543.617 GiB (4.549 BPW)

PPL over 568 chunks for n_ctx=512 = 1.8433 +/- 0.00721

This quant is the "full size" model, made using the Q4_X patch to match Moonshot's official int4 release as described below. It does not use imatrix and is compatible with both ik_llama.cpp and mainline llama.cpp.

<details> <summary>👈 Secret Recipe</summary>
#!/usr/bin/env bash

# https://github.com/ikawrakow/ik_llama.cpp/pull/1556#issuecomment-4282712006
# Q4_0 (patched) routed experts approximating original QAT design
# Q8_0 everything else

custom="
## Attention [0-60] (GPU)
blk\..*\.attn_k_b\.weight=q8_0
blk\..*\.attn_v_b\.weight=q8_0

# Balance of attn tensors
blk\..*\.attn_kv_a_mqa\.weight=q8_0
blk\..*\.attn_q_a\.weight=q8_0
blk\..*\.attn_q_b\.weight=q8_0
blk\..*\.attn_output\.weight=q8_0

## First Single Dense Layer [0] (GPU)
blk\..*\.ffn_down\.weight=q8_0
blk\..*\.ffn_(gate|up)\.weight=q8_0

## Shared Expert [1-60] (GPU)
blk\..*\.ffn_down_shexp\.weight=q8_0
blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0

## Routed Experts [1-60] (CPU)
blk\..*\.ffn_down_exps\.weight=q4_0
blk\..*\.ffn_(gate|up)_exps\.weight=q4_0

token_embd\.weight=q8_0
output\.weight=q8_0
"

custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-quantize \
    --custom-q "$custom" \
    /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/Kimi-K2.6-384x14B-BF16-00001-of-00046.gguf \
    /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/Kimi-K2.6-Q4_X.gguf \
    Q8_0 \
    128
</details>
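The grep/sed pipeline in the recipe simply flattens the multi-line `custom` string into a single comma-separated `--custom-q` argument. An approximate Python equivalent (my own sketch, not part of the tooling):

```python
def flatten_custom(custom: str) -> str:
    """Drop comment and blank lines, then join the rest with commas,
    approximating `grep -v '^#' | sed -Ez 's:\\n+:,:g;s:,$::;s:^,::'`."""
    lines = [ln for ln in custom.splitlines()
             if ln.strip() and not ln.startswith('#')]
    return ','.join(lines)
```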

IQ3_K 459.945 GiB (3.849 BPW)

PPL over 568 chunks for n_ctx=512 = 1.9012 +/- 0.00753

Note: For this quant only, imatrix was applied solely to the ffn_(gate|up)_exps tensors that are iq3_k. This recipe is also just a smidge bigger than the previous Kimi-K2.5 version, but still fits nicely under 512GB.

<details> <summary>👈 Secret Recipe</summary>
#!/usr/bin/env bash

custom="
## Attention [0-60] (GPU)
blk\..*\.attn_k_b\.weight=q8_0
blk\..*\.attn_v_b\.weight=q8_0

# Balance of attn tensors
blk\..*\.attn_kv_a_mqa\.weight=q8_0
blk\..*\.attn_q_a\.weight=q8_0
blk\..*\.attn_q_b\.weight=q8_0
blk\..*\.attn_output\.weight=q8_0

## First Single Dense Layer [0] (GPU)
blk\..*\.ffn_down\.weight=q8_0
blk\..*\.ffn_(gate|up)\.weight=q8_0

## Shared Expert [1-60] (GPU)
blk\..*\.ffn_down_shexp\.weight=q8_0
blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0

## Routed Experts [1-60] (CPU)
## NOTE: imatrix is *only* applied to the iq3_k tensors for this recipe
blk\..*\.ffn_down_exps\.weight=q4_0
blk\..*\.ffn_(gate|up)_exps\.weight=iq3_k

## NOTE: previous recipe used iq6_k for both of these
token_embd\.weight=q8_0
output\.weight=q8_0
"

custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/imatrix-Kimi-K2.6-Q4_X.dat \
    --include-weights ffn_gate_exps \
    --include-weights ffn_up_exps \
    /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/Kimi-K2.6-384x14B-BF16-00001-of-00046.gguf \
    /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/Kimi-K2.6-IQ3_K.gguf \
    IQ3_K \
    128
</details>

smol-IQ3_KS 388.258 GiB (3.249 BPW)

TODO

<details> <summary>👈 Secret Recipe</summary>
#!/usr/bin/env bash

custom="
## Attention [0-60] (GPU)
blk\..*\.attn_k_b\.weight=q8_0
blk\..*\.attn_v_b\.weight=q8_0

# Balance of attn tensors
blk\..*\.attn_kv_a_mqa\.weight=q8_0
blk\..*\.attn_q_a\.weight=q8_0
blk\..*\.attn_q_b\.weight=q8_0
blk\..*\.attn_output\.weight=q8_0

## First Single Dense Layer [0] (GPU)
blk\..*\.ffn_down\.weight=q8_0
blk\..*\.ffn_(gate|up)\.weight=q8_0

## Shared Expert [1-60] (GPU)
blk\..*\.ffn_down_shexp\.weight=q8_0
blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0

## Routed Experts [1-60] (CPU)
blk\..*\.ffn_down_exps\.weight=iq3_ks
blk\..*\.ffn_(gate|up)_exps\.weight=iq3_ks

token_embd\.weight=iq4_k
output\.weight=iq6_k
"

custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/imatrix-Kimi-K2.6-Q4_X.dat \
    /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/Kimi-K2.6-384x14B-BF16-00001-of-00046.gguf \
    /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/Kimi-K2.6-smol-IQ3_KS.gguf \
    IQ3_KS \
    128
</details>

smol-IQ2_KL 329.195 GiB (2.755 BPW)

PPL over 568 chunks for n_ctx=512 = 2.2190 +/- 0.00936

<details> <summary>👈 Secret Recipe</summary>
#!/usr/bin/env bash

custom="
## Attention [0-60] (GPU)
blk\..*\.attn_k_b\.weight=q8_0
blk\..*\.attn_v_b\.weight=q8_0

# Balance of attn tensors
blk\..*\.attn_kv_a_mqa\.weight=q8_0
blk\..*\.attn_q_a\.weight=q8_0
blk\..*\.attn_q_b\.weight=q8_0
blk\..*\.attn_output\.weight=q8_0

## First Single Dense Layer [0] (GPU)
blk\..*\.ffn_down\.weight=q8_0
blk\..*\.ffn_(gate|up)\.weight=q8_0

## Shared Expert [1-60] (GPU)
blk\..*\.ffn_down_shexp\.weight=q8_0
blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0

## Routed Experts [1-60] (CPU)
blk\..*\.ffn_down_exps\.weight=iq2_kl
blk\..*\.ffn_(gate|up)_exps\.weight=iq2_kl

token_embd\.weight=iq4_k
output\.weight=iq6_k
"

custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/imatrix-Kimi-K2.6-Q4_X.dat \
    /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/Kimi-K2.6-384x14B-BF16-00001-of-00046.gguf \
    /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/Kimi-K2.6-smol-IQ2_KL.gguf \
    IQ2_KL \
    128
</details>

smol-IQ2_KS 270.133 GiB (2.261 BPW)

TODO

<details> <summary>👈 Secret Recipe</summary>
#!/usr/bin/env bash

custom="
## Attention [0-60] (GPU)
blk\..*\.attn_k_b\.weight=q8_0
blk\..*\.attn_v_b\.weight=q8_0

# Balance of attn tensors
blk\..*\.attn_kv_a_mqa\.weight=q8_0
blk\..*\.attn_q_a\.weight=q8_0
blk\..*\.attn_q_b\.weight=q8_0
blk\..*\.attn_output\.weight=q8_0

## First Single Dense Layer [0] (GPU)
blk\..*\.ffn_down\.weight=q8_0
blk\..*\.ffn_(gate|up)\.weight=q8_0

## Shared Expert [1-60] (GPU)
blk\..*\.ffn_down_shexp\.weight=q8_0
blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0

## Routed Experts [1-60] (CPU)
blk\..*\.ffn_down_exps\.weight=iq2_ks
blk\..*\.ffn_(gate|up)_exps\.weight=iq2_ks

token_embd\.weight=iq4_k
output\.weight=iq6_k
"

custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/imatrix-Kimi-K2.6-Q4_X.dat \
    /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/Kimi-K2.6-384x14B-BF16-00001-of-00046.gguf \
    /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/Kimi-K2.6-smol-IQ2_KS.gguf \
    IQ2_KS \
    128
</details>

smol-IQ1_KT 218.936 GiB (1.832 BPW)

PPL over 568 chunks for n_ctx=512 = 3.3252 +/- 0.01613

only for the desperate

Also keep in mind that KT trellis quants generally have slower token generation, likely due to a compute bottleneck when running on CPU; but if that is all you can fit, then well... They are fast on GPU, similar to EXL3.

<details> <summary>👈 Secret Recipe</summary>
#!/usr/bin/env bash

custom="
## Attention [0-60] (GPU)
blk\..*\.attn_k_b\.weight=q8_0
blk\..*\.attn_v_b\.weight=q8_0

# Balance of attn tensors
blk\..*\.attn_kv_a_mqa\.weight=q8_0
blk\..*\.attn_q_a\.weight=q8_0
blk\..*\.attn_q_b\.weight=q8_0
blk\..*\.attn_output\.weight=q8_0

## First Single Dense Layer [0] (GPU)
blk\..*\.ffn_down\.weight=q8_0
blk\..*\.ffn_(gate|up)\.weight=q8_0

## Shared Expert [1-60] (GPU)
blk\..*\.ffn_down_shexp\.weight=q8_0
blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0

## Routed Experts [1-60] (CPU)
blk\..*\.ffn_down_exps\.weight=iq1_kt
blk\..*\.ffn_(gate|up)_exps\.weight=iq1_kt

token_embd\.weight=iq4_k
output\.weight=iq6_k
"

custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/imatrix-Kimi-K2.6-Q4_X.dat \
    /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/Kimi-K2.6-384x14B-BF16-00001-of-00046.gguf \
    /mnt/data/models/ubergarm/Kimi-K2.6-GGUF/Kimi-K2.6-smol-IQ1_KT.gguf \
    IQ1_KT \
    128
</details>

Quick Start

# Clone and checkout
$ git clone https://github.com/ikawrakow/ik_llama.cpp
$ cd ik_llama.cpp

# Build for hybrid CPU+CUDA (or set GGML_CUDA=OFF for CPU only)
$ cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
$ cmake --build build --config Release -j $(nproc)

# Hybrid CPU+GPU Inference
echo TODO

# CPU-only inference
numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-server \
    --model "$model" \
    --alias ubergarm/Kimi-K2.6-GGUF \
    -muge \
    --merge-qkv \
    --ctx-size 131072 \
    -ctk f16 \
    -mla 3 \
    --parallel 1 \
    -ub 4096 -b 4096 \
    --threads 96 \
    --threads-batch 128 \
    --numa numactl \
    --host 127.0.0.1 \
    --port 8080 \
    --no-mmap \
    --jinja

Bring your own jinja chat template with `--chat-template-file myTemplate.jinja`; you may or may not still need `--special` (still experimenting).

Spec-decoding seems to be working, e.g. `--spec-type ngram-map-k4v --spec-ngram-size-n 8 --spec-ngram-size-m 8 --spec-ngram-min-hits 2 --draft-min 1 --draft-max 12`.

Increase the prompt cache with options like `-cram 16384 --prompt-cache-all`.
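Once llama-server is running, it exposes an OpenAI-compatible HTTP endpoint; a minimal Python smoke test (host, port, and model alias follow the flags in the command above; treat endpoint details as assumptions if your build differs):

```python
import json
import urllib.request

def chat_request(prompt: str, host: str = "127.0.0.1", port: int = 8080):
    """Build an OpenAI-style chat completion request for the local server."""
    payload = {
        "model": "ubergarm/Kimi-K2.6-GGUF",  # matches --alias above
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 16,
    }
    return urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    with urllib.request.urlopen(chat_request("Say hello in one word.")) as r:
        print(json.loads(r.read())["choices"][0]["message"]["content"])
```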

Q4_X Patch

https://github.com/ikawrakow/ik_llama.cpp/pull/1556#issuecomment-4282712006

References

Author: ubergarm

Likes: 10

Downloads: 0

Tags: gguf, mla, imatrix, conversational, ik_llama.cpp, text-generation, base_model:moonshotai/Kimi-K2.6, base_model:quantized:moonshotai/Kimi-K2.6, license:other, endpoints_compatible, region:us

unsloth/Kimi-K2.6


base_model: moonshotai/Kimi-K2.6
tags: compressed-tensors, unsloth
license: other
license_name: modified-mit
library_name: transformers
pipeline_tag: image-text-to-text

[!NOTE] Includes Unsloth chat template fixes! <br> For llama.cpp, use --jinja

<div> <p style="margin-top: 0;margin-bottom: 0;"> <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em> </p> <div style="display: flex; gap: 5px; align-items: center; "> <a href="https://github.com/unslothai/unsloth/"> <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133"> </a> <a href="https://discord.gg/unsloth"> <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173"> </a> <a href="https://docs.unsloth.ai/"> <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143"> </a> </div> </div> <div align="center"> <picture> <img src="figures/kimi-logo.png" width="30%" alt="Kimi K2.6"> </picture> </div> <hr> <div align="center" style="line-height:1"> <a href="https://www.kimi.com" target="_blank"><img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-Kimi%20K2.6-ff6b6b?color=1783ff&logoColor=white"/></a> <a href="https://www.moonshot.ai" target="_blank"><img alt="Homepage" src="https://img.shields.io/badge/Homepage-Moonshot%20AI-white?logo=Kimi&logoColor=white"/></a> </div> <div align="center" style="line-height: 1;"> <a href="https://huggingface.co/moonshotai" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Moonshot%20AI-ffc107?color=ffc107&logoColor=white"/></a> <a href="https://twitter.com/kimi_moonshot" target="_blank"><img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-Kimi.ai-white?logo=x&logoColor=white"/></a> <a href="https://discord.gg/TYU2fdJykW" target="_blank"><img alt="Discord" src="https://img.shields.io/badge/Discord-Kimi.ai-white?logo=discord&logoColor=white"/></a> </div> <div align="center" style="line-height: 1;"> <a href="https://huggingface.co/moonshotai/Kimi-K2.6/blob/main/LICENSE"><img alt="License" 
src="https://img.shields.io/badge/License-Modified_MIT-f5de53?&color=f5de53"/></a> </div> <p align="center"> <b>📰&nbsp;&nbsp;<a href="https://www.kimi.com/blog/kimi-k2-6.html">Tech Blog</a></b> </p>

1. Model Introduction

Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration.

Key Features

  • Long-Horizon Coding: K2.6 achieves significant improvements on complex, end-to-end coding tasks, generalizing robustly across programming languages (Rust, Go, Python) and domains spanning front-end, DevOps, and performance optimization.
  • Coding-Driven Design: K2.6 is capable of transforming simple prompts and visual inputs into production-ready interfaces and lightweight full-stack workflows, generating structured layouts, interactive elements, and rich animations with deliberate aesthetic precision.
  • Elevated Agent Swarm: Scaling horizontally to 300 sub-agents executing 4,000 coordinated steps, K2.6 can dynamically decompose tasks into parallel, domain-specialized subtasks, delivering end-to-end outputs from documents to websites to spreadsheets in a single autonomous run.
  • Proactive & Open Orchestration: For autonomous tasks, K2.6 demonstrates strong performance in powering persistent, 24/7 background agents that proactively manage schedules, execute code, and orchestrate cross-platform operations without human oversight.

2. Model Summary

<div align="center">

| | |
|:---:|:---:|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
| Vision Encoder | MoonViT |
| Parameters of Vision Encoder | 400M |

</div>

3. Evaluation Results

<div align="center"> <table> <thead> <tr> <th align="center">Benchmark</th> <th align="center"><sup>Kimi K2.6</sup></th> <th align="center"><sup>GPT-5.4 <br><sup>(xhigh)</sup></sup></th> <th align="center"><sup>Claude Opus 4.6 <br><sup>(max effort)</sup></sup></th> <th align="center"><sup>Gemini 3.1 Pro<br><sup>(thinking high)</sup></sup></th> <th align="center"><sup>Kimi K2.5</sup></th> </tr> </thead> <tbody> <tr> <td align="center" colspan=6><strong>Agentic</strong></td> </tr> <tr> <td align="center" style="vertical-align: middle">HLE-Full<br>(w/ tools)</td> <td align="center" style="vertical-align: middle">54.0</td> <td align="center" style="vertical-align: middle">52.1</td> <td align="center" style="vertical-align: middle">53.0</td> <td align="center" style="vertical-align: middle">51.4</td> <td align="center" style="vertical-align: middle">50.2</td> </tr> <tr> <td align="center" style="vertical-align: middle">BrowseComp</td> <td align="center" style="vertical-align: middle">83.2</td> <td align="center" style="vertical-align: middle" rowspan="2">82.7</td> <td align="center" style="vertical-align: middle" rowspan="2">83.7</td> <td align="center" style="vertical-align: middle" rowspan="2">85.9</td> <td align="center" style="vertical-align: middle">74.9</td> </tr> <tr> <td align="center" style="vertical-align: middle">BrowseComp<br>(Agent Swarm)</td> <td align="center" style="vertical-align: middle">86.3</td> <td align="center" style="vertical-align: middle">78.4</td> </tr> <tr> <td align="center" style="vertical-align: middle">DeepSearchQA<br>(f1-score)</td> <td align="center" style="vertical-align: middle">92.5</td> <td align="center" style="vertical-align: middle">78.6</td> <td align="center" style="vertical-align: middle">91.3</td> <td align="center" style="vertical-align: middle">81.9</td> <td align="center" style="vertical-align: middle">89.0</td> </tr> <tr> <td align="center" style="vertical-align: middle">DeepSearchQA<br>(accuracy)</td> <td align="center" 
style="vertical-align: middle">83.0</td> <td align="center" style="vertical-align: middle">63.7</td> <td align="center" style="vertical-align: middle">80.6</td> <td align="center" style="vertical-align: middle">60.2</td> <td align="center" style="vertical-align: middle">77.1</td> </tr> <tr> <td align="center" style="vertical-align: middle">WideSearch<br> (item-f1)</td> <td align="center" style="vertical-align: middle">80.8</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">72.7</td> </tr> <tr> <td align="center" style="vertical-align: middle">Toolathlon</td> <td align="center" style="vertical-align: middle">50.0</td> <td align="center" style="vertical-align: middle">54.6</td> <td align="center" style="vertical-align: middle">47.2</td> <td align="center" style="vertical-align: middle">48.8</td> <td align="center" style="vertical-align: middle">27.8</td> </tr> <tr> <td align="center" style="vertical-align: middle">MCPMark</td> <td align="center" style="vertical-align: middle">55.9</td> <td align="center" style="vertical-align: middle">62.5*</td> <td align="center" style="vertical-align: middle">56.7*</td> <td align="center" style="vertical-align: middle">55.9*</td> <td align="center" style="vertical-align: middle">29.5</td> </tr> <tr> <td align="center" style="vertical-align: middle">Claw Eval (pass^3)</td> <td align="center" style="vertical-align: middle">62.3</td> <td align="center" style="vertical-align: middle">60.3</td> <td align="center" style="vertical-align: middle">70.4</td> <td align="center" style="vertical-align: middle">57.8</td> <td align="center" style="vertical-align: middle">52.3</td> </tr> <tr> <td align="center" style="vertical-align: middle">Claw Eval (pass@3)</td> <td align="center" style="vertical-align: middle">80.9</td> <td align="center" style="vertical-align: 
middle">78.4</td> <td align="center" style="vertical-align: middle">82.4</td> <td align="center" style="vertical-align: middle">82.9</td> <td align="center" style="vertical-align: middle">75.4</td> </tr> <tr> <td align="center" style="vertical-align: middle">APEX-Agents</td> <td align="center" style="vertical-align: middle">27.9</td> <td align="center" style="vertical-align: middle">33.3</td> <td align="center" style="vertical-align: middle">33.0</td> <td align="center" style="vertical-align: middle">32.0</td> <td align="center" style="vertical-align: middle">11.5</td> </tr> <tr> <td align="center" style="vertical-align: middle">OSWorld-Verified</td> <td align="center" style="vertical-align: middle">73.1</td> <td align="center" style="vertical-align: middle">75.0</td> <td align="center" style="vertical-align: middle">72.7</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">63.3</td> </tr> <tr> <td align="center" colspan=6><strong>Coding</strong></td> </tr> <tr> <td align="center" style="vertical-align: middle">Terminal-Bench 2.0<br>(Terminus-2)</td> <td align="center" style="vertical-align: middle">66.7</td> <td align="center" style="vertical-align: middle">65.4*</td> <td align="center" style="vertical-align: middle">65.4</td> <td align="center" style="vertical-align: middle">68.5</td> <td align="center" style="vertical-align: middle">50.8</td> </tr> <tr> <td align="center" style="vertical-align: middle">SWE-Bench Pro</td> <td align="center" style="vertical-align: middle">58.6</td> <td align="center" style="vertical-align: middle">57.7</td> <td align="center" style="vertical-align: middle">53.4</td> <td align="center" style="vertical-align: middle">54.2</td> <td align="center" style="vertical-align: middle">50.7</td> </tr> <tr> <td align="center" style="vertical-align: middle">SWE-Bench Multilingual</td> <td align="center" style="vertical-align: middle">76.7</td> <td align="center" style="vertical-align: 
middle">-</td> <td align="center" style="vertical-align: middle">77.8</td> <td align="center" style="vertical-align: middle">76.9*</td> <td align="center" style="vertical-align: middle">73.0</td> </tr> <tr> <td align="center" style="vertical-align: middle">SWE-Bench Verified</td> <td align="center" style="vertical-align: middle">80.2</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">80.8</td> <td align="center" style="vertical-align: middle">80.6</td> <td align="center" style="vertical-align: middle">76.8</td> </tr> <tr> <td align="center" style="vertical-align: middle">SciCode</td> <td align="center" style="vertical-align: middle">52.2</td> <td align="center" style="vertical-align: middle">56.6</td> <td align="center" style="vertical-align: middle">51.9</td> <td align="center" style="vertical-align: middle">58.9</td> <td align="center" style="vertical-align: middle">48.7</td> </tr> <tr> <td align="center" style="vertical-align: middle">OJBench (python)</td> <td align="center" style="vertical-align: middle">60.6</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">60.3</td> <td align="center" style="vertical-align: middle">70.7</td> <td align="center" style="vertical-align: middle">54.7</td> </tr> <tr> <td align="center" style="vertical-align: middle">LiveCodeBench (v6)</td> <td align="center" style="vertical-align: middle">89.6</td> <td align="center" style="vertical-align: middle">-</td> <td align="center" style="vertical-align: middle">88.8</td> <td align="center" style="vertical-align: middle">91.7</td> <td align="center" style="vertical-align: middle">85.0</td> </tr> <tr> <td align="center" colspan=6><strong>Reasoning &amp; Knowledge</strong></td> </tr> <tr> <td align="center" style="vertical-align: middle">HLE-Full</td> <td align="center" style="vertical-align: middle">34.7</td> <td align="center" style="vertical-align: middle">39.8</td> 
<td align="center" style="vertical-align: middle">40.0</td> <td align="center" style="vertical-align: middle">44.4</td> <td align="center" style="vertical-align: middle">30.1</td> </tr> <tr> <td align="center" style="vertical-align: middle">AIME 2026</td> <td align="center" style="vertical-align: middle">96.4</td> <td align="center" style="vertical-align: middle">99.2</td> <td align="center" style="vertical-align: middle">96.7</td> <td align="center" style="vertical-align: middle">98.3</td> <td align="center" style="vertical-align: middle">95.8</td> </tr> <tr> <td align="center" style="vertical-align: middle">HMMT 2026 (Feb)</td> <td align="center" style="vertical-align: middle">92.7</td> <td align="center" style="vertical-align: middle">97.7</td> <td align="center" style="vertical-align: middle">96.2</td> <td align="center" style="vertical-align: middle">94.7</td> <td align="center" style="vertical-align: middle">87.1</td> </tr> <tr> <td align="center" style="vertical-align: middle">IMO-AnswerBench</td> <td align="center" style="vertical-align: middle">86.0</td> <td align="center" style="vertical-align: middle">91.4</td> <td align="center" style="vertical-align: middle">75.3</td> <td align="center" style="vertical-align: middle">91.0*</td> <td align="center" style="vertical-align: middle">81.8</td> </tr> <tr> <td align="center" style="vertical-align: middle">GPQA-Diamond</td> <td align="center" style="vertical-align: middle">90.5</td> <td align="center" style="vertical-align: middle">92.8</td> <td align="center" style="vertical-align: middle">91.3</td> <td align="center" style="vertical-align: middle">94.3</td> <td align="center" style="vertical-align: middle">87.6</td> </tr> <tr> <td align="center" colspan=6><strong>Vision</strong></td> </tr> <tr> <td align="center" style="vertical-align: middle">MMMU-Pro</td> <td align="center" style="vertical-align: middle">79.4</td> <td align="center" style="vertical-align: middle">81.2</td> <td align="center" 
style="vertical-align: middle">73.9</td> <td align="center" style="vertical-align: middle">83.0*</td> <td align="center" style="vertical-align: middle">78.5</td> </tr> <tr> <td align="center" style="vertical-align: middle">MMMU-Pro (w/ python)</td> <td align="center" style="vertical-align: middle">80.1</td> <td align="center" style="vertical-align: middle">82.1</td> <td align="center" style="vertical-align: middle">77.3</td> <td align="center" style="vertical-align: middle">85.3*</td> <td align="center" style="vertical-align: middle">77.7</td> </tr> <tr> <td align="center" style="vertical-align: middle">CharXiv (RQ)</td> <td align="center" style="vertical-align: middle">80.4</td> <td align="center" style="vertical-align: middle">82.8*</td> <td align="center" style="vertical-align: middle">69.1</td> <td align="center" style="vertical-align: middle">80.2*</td> <td align="center" style="vertical-align: middle">77.5</td> </tr> <tr> <td align="center" style="vertical-align: middle">CharXiv (RQ) (w/ python)</td> <td align="center" style="vertical-align: middle">86.7</td> <td align="center" style="vertical-align: middle">90.0*</td> <td align="center" style="vertical-align: middle">84.7</td> <td align="center" style="vertical-align: middle">89.9*</td> <td align="center" style="vertical-align: middle">78.7</td> </tr> <tr> <td align="center" style="vertical-align: middle">MathVision</td> <td align="center" style="vertical-align: middle">87.4</td> <td align="center" style="vertical-align: middle">92.0*</td> <td align="center" style="vertical-align: middle">71.2*</td> <td align="center" style="vertical-align: middle">89.8*</td> <td align="center" style="vertical-align: middle">84.2</td> </tr> <tr> <td align="center" style="vertical-align: middle">MathVision (w/ python)</td> <td align="center" style="vertical-align: middle">93.2</td> <td align="center" style="vertical-align: middle">96.1*</td> <td align="center" style="vertical-align: middle">84.6*</td> <td align="center" 
style="vertical-align: middle">95.7*</td> <td align="center" style="vertical-align: middle">85.0</td> </tr> <tr> <td align="center" style="vertical-align: middle">BabyVision</td> <td align="center" style="vertical-align: middle">39.8</td> <td align="center" style="vertical-align: middle">49.7</td> <td align="center" style="vertical-align: middle">14.8</td> <td align="center" style="vertical-align: middle">51.6</td> <td align="center" style="vertical-align: middle">36.5</td> </tr> <tr> <td align="center" style="vertical-align: middle">BabyVision (w/ python)</td> <td align="center" style="vertical-align: middle">68.5</td> <td align="center" style="vertical-align: middle">80.2*</td> <td align="center" style="vertical-align: middle">38.4*</td> <td align="center" style="vertical-align: middle">68.3*</td> <td align="center" style="vertical-align: middle">40.5</td> </tr> <tr> <td align="center" style="vertical-align: middle">V* (w/ python)</td> <td align="center" style="vertical-align: middle">96.9</td> <td align="center" style="vertical-align: middle">98.4*</td> <td align="center" style="vertical-align: middle">86.4*</td> <td align="center" style="vertical-align: middle">96.9*</td> <td align="center" style="vertical-align: middle">86.9</td> </tr> </tbody> </table> </div> <details> <summary><b>Footnotes</b></summary>
  1. General Testing Details
    • We report results for Kimi K2.6 and Kimi K2.5 with thinking mode enabled, Claude Opus 4.6 with max effort, GPT-5.4 with xhigh reasoning effort, and Gemini 3.1 Pro with a high thinking level.
    • Unless otherwise specified, all Kimi K2.6 experiments were conducted with temperature = 1.0, top-p = 1.0, and a context length of 262,144 tokens.
    • Benchmarks without publicly available scores were re-evaluated under the same conditions used for Kimi K2.6 and are marked with an asterisk (*). Except where noted with an asterisk, all other results are cited from official reports.
  2. Reasoning Benchmarks
    • IMO-AnswerBench scores for GPT-5.4 and Claude 4.6 were obtained from z.ai/blog/glm-5.1.
    • Humanity's Last Exam (HLE) and other reasoning tasks were evaluated with a maximum generation length of 98,304 tokens. By default, we report results on the HLE full set. For the text-only subset, Kimi K2.6 achieves 36.4% accuracy without tools and 55.5% with tools.
  3. Tool-Augmented / Agentic Tasks
    • Kimi K2.6 was equipped with search, code-interpreter, and web-browsing tools for HLE with tools, BrowseComp, DeepSearchQA, and WideSearch.
    • For HLE-Full with tools, the maximum generation length is 262,144 tokens with a per-step limit of 49,152 tokens. We employ a simple context management strategy: once the context window exceeds the threshold, only the most recent round of tool-related messages is retained.
    • For BrowseComp, we report scores obtained with context management using the same discard-all strategy as Kimi K2.5 and DeepSeek-V3.2.
    • For DeepSearchQA, no context management was applied to Kimi K2.6 tests, and tasks exceeding the supported context length were directly counted as failed. Scores for Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on DeepSearchQA are cited from the Claude Opus 4.7 System Card.
    • For WideSearch, we report results under the "hide tool result" context management setting. Once the context window exceeds the threshold, only the most recent round of tool-related messages is retained.
    • The test system prompts are identical to those used in the Kimi K2.5 technical report.
    • Claw Eval was conducted using version 1.1 with max-tokens-per-step = 16384.
    • For APEX-Agents, we evaluate 452 tasks from the public 480-task release, as done by Artificial Analysis (excluding Investment Banking Worlds 244 and 246, which have external runtime dependencies).
  4. Coding Tasks
    • Terminal-Bench 2.0 scores were obtained with the default agent framework (Terminus-2) and the provided JSON parser, operating in preserve thinking mode.
    • For the SWE-Bench series of evaluations (including Verified, Multilingual, and Pro), we used an in-house evaluation framework adapted from SWE-agent. This framework includes a minimal set of tools—bash tool, createfile tool, insert tool, view tool, strreplace tool, and submit tool.
    • All reported scores for coding tasks are averaged over 10 independent runs.
  5. Vision Benchmarks
    • Max-tokens = 98,304, averaged over three runs (avg@3).
    • Settings with Python tool use max-tokens-per-step = 65,536 and max-steps = 50 for multi-step reasoning.
    • MMMU-Pro follows the official protocol, preserving input order and prepending images.
</details>
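The "retain only the most recent round of tool-related messages" strategy described in the footnotes can be sketched as a simple filter (an illustration of the described behavior, not Moonshot's implementation):

```python
def trim_tool_context(messages):
    """Keep every non-tool message, but drop tool results from earlier
    rounds: only tool messages after the last user turn survive."""
    last_user = max((i for i, m in enumerate(messages) if m["role"] == "user"),
                    default=-1)
    return [m for i, m in enumerate(messages)
            if m["role"] != "tool" or i > last_user]

history = [
    {"role": "user", "content": "q1"},
    {"role": "tool", "content": "old result"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "tool", "content": "fresh result"},
]
print([m["content"] for m in trim_tool_context(history)])
# ['q1', 'a1', 'q2', 'fresh result']
```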

4. Native INT4 Quantization

Kimi-K2.6 adopts the same native INT4 quantization method as Kimi-K2-Thinking.

5. Deployment

[!Note] You can access the Kimi-K2.6 API at https://platform.moonshot.ai, where we provide OpenAI- and Anthropic-compatible APIs. To verify that a deployment is correct, we also provide the Kimi Vendor Verifier. Currently, Kimi-K2.6 is recommended to run on the following inference engines:

  • vLLM
  • SGLang
  • KTransformers

Kimi-K2.6 has the same architecture as Kimi-K2.5, and the deployment method can be directly reused.

The version requirement for transformers is >=4.57.1, <5.0.0.
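The version pin can be checked programmatically; a minimal sketch using naive dotted-version comparison (no pre-release or post-release handling):

```python
def parse(v: str) -> tuple:
    """Split a dotted version string into an integer tuple for comparison."""
    return tuple(int(part) for part in v.split("."))

def transformers_compatible(v: str) -> bool:
    """Check the stated requirement: >=4.57.1, <5.0.0."""
    return parse("4.57.1") <= parse(v) < parse("5.0.0")

print(transformers_compatible("4.57.1"), transformers_compatible("5.0.0"))
# True False
```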

Deployment examples can be found in the Model Deployment Guide.


6. Model Usage

The usage demos below demonstrate how to call our official API.

For third-party APIs deployed with vLLM or SGLang, please note that:

[!Note]

  • Chat with video content is an experimental feature and is only supported in our official API for now.

  • The recommended temperature is 1.0 for Thinking mode and 0.6 for Instant mode.

  • The recommended top_p is 0.95.

  • To use instant mode, you need to pass {'chat_template_kwargs': {"thinking": False}} in extra_body.

Chat Completion

This is a simple chat completion script which shows how to call K2.6 API in Thinking and Instant modes.

import openai
def simple_chat(client: openai.OpenAI, model_name: str):
    messages = [
        {'role': 'system', 'content': 'You are Kimi, an AI assistant created by Moonshot AI.'},
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'which one is bigger, 9.11 or 9.9? think carefully.'}
            ],
        },
    ]
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=4096
    )
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # To use instant mode, pass {"thinking": {"type": "disabled"}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for official API
        # extra_body= {'chat_template_kwargs': {"thinking": False}}  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')

Chat Completion with visual content

K2.6 supports image and video input.

The following example demonstrates how to call K2.6 API with image input:

import openai
import base64
import requests

def chat_with_image(client: openai.OpenAI, model_name: str):
    url = 'https://huggingface.co/moonshotai/Kimi-K2.6/resolve/main/figures/kimi-logo.png'
    image_base64 = base64.b64encode(requests.get(url).content).decode()
    messages = [
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'Describe this image in detail.'},
                {
                    'type': 'image_url',
                    'image_url': {'url': f'data:image/png;base64,{image_base64}'},
                },
            ],
        }
    ]

    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=8192
    )
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # Instant mode is also supported if you pass {"thinking": {"type": "disabled"}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for official API
        # extra_body= {'chat_template_kwargs': {"thinking": False}}  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')

    return response.choices[0].message.content
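Both examples embed media as base64 data URIs; a small helper makes the construction explicit (avoid whitespace after the comma, since some servers reject padded data URIs):

```python
import base64

def to_data_uri(raw: bytes, mime: str = "image/png") -> str:
    """Encode raw bytes as an RFC 2397 data URI; no whitespace after the comma."""
    return f"data:{mime};base64,{base64.b64encode(raw).decode()}"

# The PNG magic bytes encode to the familiar "iVBOR..." prefix
print(to_data_uri(b"\x89PNG"))
# data:image/png;base64,iVBORw==
```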

The following example demonstrates how to call K2.6 API with video input:

import openai
import base64
import requests

def chat_with_video(client: openai.OpenAI, model_name:str):
    url = 'https://huggingface.co/moonshotai/Kimi-K2.6/resolve/main/figures/demo_video.mp4'
    video_base64 = base64.b64encode(requests.get(url).content).decode()
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text","text": "Describe the video in detail."},
                {
                    "type": "video_url",
                    "video_url": {"url": f"data:video/mp4;base64,{video_base64}"},
                },
            ],
        }
    ]

    response = client.chat.completions.create(model=model_name, messages=messages)
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # Instant mode is also supported if you pass {"thinking": {"type": "disabled"}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for official API
        # extra_body= {'chat_template_kwargs': {"thinking": False}}  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')
    return response.choices[0].message.content

Preserve Thinking

Kimi K2.6 supports preserve_thinking mode, which retains full reasoning content across multi-turn interactions and enhances performance in coding agent scenarios.

This feature is disabled by default. The following example demonstrates how to call K2.6 API in preserve_thinking mode:

def chat_with_preserve_thinking(client: openai.OpenAI, model_name: str):
    messages = [
        {
            "role": "user",
            "content": "Tell me three random numbers."
        },
        {
            "role": "assistant",
            "reasoning_content": "I'll start by listing five numbers: 473, 921, 235, 215, 222, and I'll tell you the first three.",
            "content": "473, 921, 235"
        },
        {
            "role": "user",
            "content": "What are the other two numbers you have in mind?"
        }
    ]

    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'enabled', 'keep': 'all'}},  # this is for official API
        # extra_body={"chat_template_kwargs": {"thinking":True, "preserve_thinking": True}},  # this is for vLLM/SGLang
        # We recommend enabling preserve_thinking only in think mode.
    )
    # the assistant should mention 215 and 222, which appear in the prior reasoning content
    print(f"response: {response.choices[0].message.content}")
    return response.choices[0].message.content
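Client-side, the flag effectively controls whether assistant `reasoning_content` is carried forward in the history; a sketch of that trimming (illustrative, not Moonshot's chat-template logic):

```python
def prepare_history(messages, preserve_thinking=False):
    """Drop assistant reasoning_content unless preserve_thinking is enabled."""
    trimmed = []
    for msg in messages:
        msg = dict(msg)  # don't mutate the caller's history
        if msg.get("role") == "assistant" and not preserve_thinking:
            msg.pop("reasoning_content", None)
        trimmed.append(msg)
    return trimmed

history = [{"role": "assistant",
            "reasoning_content": "473, 921, 235, 215, 222",
            "content": "473, 921, 235"}]
print("reasoning_content" in prepare_history(history)[0])  # False
```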

Interleaved Thinking and Multi-Step Tool Call

K2.6 shares the same design of Interleaved Thinking and Multi-Step Tool Call as K2 Thinking. For usage examples, please refer to the K2 Thinking documentation.

Coding Agent Framework

Kimi K2.6 works best with Kimi Code CLI as its agent framework — give it a try at https://www.kimi.com/code.


7. License

Both the code repository and the model weights are released under the Modified MIT License.


8. Third Party Notices

See THIRD PARTY NOTICES


9. Contact Us

If you have any questions, please reach out at support@moonshot.ai.

Author: unsloth

Likes: 6

Downloads: 0

Tags: transformers, safetensors, kimi_k25, feature-extraction, compressed-tensors, unsloth, image-text-to-text, conversational, custom_code, arxiv:2602.02276, base_model:moonshotai/Kimi-K2.6, base_model:quantized:moonshotai/Kimi-K2.6, license:other, eval-results, region:us

oumoumad/LTX-2.3-22b-IC-LoRA-ReFocus


base_model: Lightricks/LTX-2.3
language: en
license: other
license_name: ltx-2-community-license
license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
pipeline_tag: video-to-video
tags: ltx-video, ic-lora, refocus, deblur, lens-blur
LTX-2.3 22B IC-LoRA ReFocus

This is a ReFocus IC-LoRA trained on top of LTX-2.3-22b, designed to turn out-of-focus footage back into sharp, in-focus video by removing lens blur. It effectively reverses shallow depth-of-field or missed-focus artifacts to restore detail in the scene.

It is based on the LTX-2.3 foundation model.

Model Files

ltx-2.3-22b-ic-lora-refocus.safetensors

Model Details

  • Base Model: LTX-2.3-22b
  • Training Type: IC LoRA
  • Purpose: Remove lens blur / restore focus

⚠️ Lens blur only

This LoRA is specifically trained to correct lens blur (out-of-focus / shallow DOF). For other types of blur — motion blur, gaussian blur, etc. — it will yield bad results. Use it only on footage where the blur comes from the lens being out of focus.

🔌 Using in ComfyUI

  1. Copy the LoRA weights into models/loras.
  2. Use the IC-LoRA workflow from the LTX-2 ComfyUI repository.
  3. Load the LoRA using the LTXICLoRALoaderModelOnly node.

License

See the LTX-2-community-license for full terms.

Author: oumoumad

Likes: 6

Downloads: 0

Tags: ltx-video, ic-lora, refocus, deblur, lens-blur, video-to-video, en, base_model:Lightricks/LTX-2.3, base_model:finetune:Lightricks/LTX-2.3, license:other, region:us

mudler/Carnice-Qwen3.6-MoE-35B-A3B-APEX-GGUF


license: apache-2.0
base_model: samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B
tags: gguf, quantized, apex, moe, mixture-of-experts, qwen3, carnice, agentic, tool-calling
Carnice Qwen3.6 MoE 35B-A3B APEX GGUF

APEX (Adaptive Precision for EXpert Models) quantizations of samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B.

Brought to you by the LocalAI team | APEX Project | Technical Report

Available Files

| File | Profile | Size | Best For |
|------|---------|------|----------|
| Carnice-Qwen3.6-MoE-35B-A3B-APEX-I-Balanced.gguf | I-Balanced | 24 GB | Best overall quality/size ratio |
| Carnice-Qwen3.6-MoE-35B-A3B-APEX-I-Quality.gguf | I-Quality | 22 GB | Highest quality with imatrix |
| Carnice-Qwen3.6-MoE-35B-A3B-APEX-Quality.gguf | Quality | 22 GB | Highest quality standard |
| Carnice-Qwen3.6-MoE-35B-A3B-APEX-Balanced.gguf | Balanced | 24 GB | General purpose |
| Carnice-Qwen3.6-MoE-35B-A3B-APEX-I-Compact.gguf | I-Compact | 17 GB | Consumer GPUs, best quality/size |
| Carnice-Qwen3.6-MoE-35B-A3B-APEX-Compact.gguf | Compact | 17 GB | Consumer GPUs |
| Carnice-Qwen3.6-MoE-35B-A3B-APEX-I-Mini.gguf | I-Mini | 14 GB | Smallest viable, fastest inference |

What is APEX?

APEX is a quantization strategy for Mixture-of-Experts (MoE) models. It classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient — edge layers get higher precision, middle layers get more aggressive compression. I-variants use diverse imatrix calibration (chat, code, reasoning, tool-calling, agentic traces, Wikipedia).

See the APEX project for full details, technical report, and scripts.
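The layer-wise precision gradient described above can be pictured with a toy selector. This is a minimal sketch of the idea only: the quant type names Q6_K and Q3_K are illustrative assumptions, not the exact profiles APEX uses, and the 5+5 symmetric edge width is taken from the Architecture section of this card.

```python
def apex_layer_precision(layer_idx, n_layers=40, edge=5, high="Q6_K", low="Q3_K"):
    """Toy APEX-style gradient: the first and last `edge` layers keep
    higher precision; middle layers get more aggressive compression."""
    if layer_idx < edge or layer_idx >= n_layers - edge:
        return high
    return low

# 5+5 symmetric edge gradient across 40 layers
plan = [apex_layer_precision(i) for i in range(40)]
```

In the real scheme, tensors are additionally classified by role (routed expert, shared expert, attention) before a precision is chosen, so a single per-layer value is a simplification.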

Architecture

  • Model: Carnice Qwen3.6 MoE 35B-A3B (fine-tuned for agentic/tool-calling)
  • Base: Qwen 3.6 35B-A3B
  • Layers: 40
  • Experts: 256 routed + shared (8 active per token)
  • Total Parameters: ~35B
  • Active Parameters: ~3B per token
  • Attention: Hybrid (full attention every 4th layer, linear/Mamba otherwise)
  • APEX Config: 5+5 symmetric edge gradient across 40 layers
  • Calibration: v1.3 diverse dataset (chat, code, reasoning, multilingual, tool-calling, Wikipedia)
  • llama.cpp: Built with b8797

Run with LocalAI

local-ai run mudler/Carnice-Qwen3.6-MoE-35B-A3B-APEX-GGUF@Carnice-Qwen3.6-MoE-35B-A3B-APEX-I-Balanced.gguf

Credits

Author: mudler

Likes: 4

Downloads: 0

Tags: gguf, quantized, apex, moe, mixture-of-experts, qwen3, carnice, agentic, tool-calling, base_model:samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B, base_model:quantized:samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B, license:apache-2.0, endpoints_compatible, region:us, conversational

OpenMed/OpenMed-PII-Portuguese-SnowflakeMed-Large-568M-v1


language: pt
license: apache-2.0
base_model: Snowflake/snowflake-arctic-embed-l-v2.0
tags:
  • token-classification
  • ner
  • pii
  • pii-detection
  • de-identification
  • privacy
  • healthcare
  • medical
  • clinical
  • phi
  • portuguese
  • pytorch
  • transformers
  • openmed
pipeline_tag: token-classification
library_name: transformers
metrics: f1, precision, recall
model-index:
  • name: OpenMed-PII-Portuguese-SnowflakeMed-Large-568M-v1
  • task: token-classification (Named Entity Recognition)
  • dataset: AI4Privacy + Synthetic Portuguese PII (ai4privacy/pii-masking-200k, test split)
  • metrics: F1 (micro) 0.8921, Precision 0.8914, Recall 0.8928
widget example: "Dr. Pedro Almeida (CPF: 123.456.789-00) pode ser contatado em pedro.almeida@hospital.pt ou +351 912 345 678. Endereço: Rua das Flores 25, 1200-195 Lisboa." (Clinical Note with PII, Portuguese)

OpenMed-PII-Portuguese-SnowflakeMed-Large-568M-v1

Portuguese PII Detection Model | 568M Parameters | Open Source


Model Description

OpenMed-PII-Portuguese-SnowflakeMed-Large-568M-v1 is a transformer-based token classification model fine-tuned for Personally Identifiable Information (PII) detection in Portuguese text. This model identifies and classifies 54 types of sensitive information including names, addresses, social security numbers, medical record numbers, and more.

Key Features

  • Portuguese-Optimized: Specifically trained on Portuguese text for optimal performance
  • High Accuracy: Achieves strong F1 scores across diverse PII categories
  • Comprehensive Coverage: Detects 54 entity types spanning personal, financial, medical, and contact information
  • Privacy-Focused: Designed for de-identification and compliance with GDPR and other privacy regulations
  • Production-Ready: Optimized for real-world text processing pipelines

Performance

Evaluated on the Portuguese test split (AI4Privacy + synthetic data):

| Metric | Score |
|:---|:---:|
| Micro F1 | 0.8921 |
| Precision | 0.8914 |
| Recall | 0.8928 |
| Macro F1 | 0.6062 |
| Weighted F1 | 0.8862 |
| Accuracy | 0.9386 |
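The gap between micro F1 (0.8921) and macro F1 (0.6062) usually means the model is much weaker on rare entity types, since macro averaging weights every class equally. A toy illustration of how the two averages diverge, using hypothetical per-entity counts (not this model's actual confusion statistics):

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives, false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical (tp, fp, fn) counts: one common class done well, one rare class done poorly
counts = {"EMAIL": (90, 5, 5), "CVV": (1, 2, 3)}

# Micro: pool all counts, then compute F1 once (dominated by frequent classes)
micro = f1(*(sum(c[i] for c in counts.values()) for i in range(3)))
# Macro: compute F1 per class, then average (rare classes count equally)
macro = sum(f1(*c) for c in counts.values()) / len(counts)
```

Here `micro` lands near 0.92 while `macro` falls to roughly 0.62, mirroring the pattern in the table above.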

Top 10 Portuguese PII Models

| Rank | Model | F1 | Precision | Recall |
|:---:|:---|:---:|:---:|:---:|
| 1 | OpenMed-PII-Portuguese-SnowflakeMed-Large-568M-v1 | 0.8921 | 0.8914 | 0.8928 |
| 2 | OpenMed-PII-Portuguese-ClinicalBGE-568M-v1 | 0.8905 | 0.8896 | 0.8913 |
| 3 | OpenMed-PII-Portuguese-NomicMed-Large-395M-v1 | 0.8896 | 0.8927 | 0.8866 |
| 4 | OpenMed-PII-Portuguese-SuperMedical-Large-355M-v1 | 0.8889 | 0.8891 | 0.8887 |
| 5 | OpenMed-PII-Portuguese-SuperClinical-Large-434M-v1 | 0.8889 | 0.8830 | 0.8948 |
| 6 | OpenMed-PII-Portuguese-BioClinicalModern-Large-395M-v1 | 0.8871 | 0.8906 | 0.8836 |
| 7 | OpenMed-PII-Portuguese-mSuperClinical-Base-279M-v1 | 0.8865 | 0.8796 | 0.8934 |
| 8 | OpenMed-PII-Portuguese-mLiteClinical-135M-v1 | 0.8862 | 0.8888 | 0.8837 |
| 9 | OpenMed-PII-Portuguese-SuperMedical-Base-125M-v1 | 0.8856 | 0.8844 | 0.8868 |
| 10 | OpenMed-PII-Portuguese-ModernMed-Large-395M-v1 | 0.8856 | 0.8987 | 0.8728 |

Supported Entity Types

This model detects 54 PII entity types organized into categories:

<details> <summary><strong>Identifiers</strong> (22 types)</summary>

| Entity | Description |
|:---|:---|
| ACCOUNTNAME | Account name |
| BANKACCOUNT | Bank account |
| BIC | BIC |
| BITCOINADDRESS | Bitcoin address |
| CREDITCARD | Credit card |
| CREDITCARDISSUER | Credit card issuer |
| CVV | CVV |
| ETHEREUMADDRESS | Ethereum address |
| IBAN | IBAN |
| IMEI | IMEI |
| ... | and 12 more |

</details> <details> <summary><strong>Personal Info</strong> (11 types)</summary>

| Entity | Description |
|:---|:---|
| AGE | Age |
| DATEOFBIRTH | Date of birth |
| EYECOLOR | Eye color |
| FIRSTNAME | First name |
| GENDER | Gender |
| HEIGHT | Height |
| LASTNAME | Last name |
| MIDDLENAME | Middle name |
| OCCUPATION | Occupation |
| PREFIX | Prefix |
| ... | and 1 more |

</details> <details> <summary><strong>Contact Info</strong> (2 types)</summary>

| Entity | Description |
|:---|:---|
| EMAIL | Email |
| PHONE | Phone |

</details> <details> <summary><strong>Location</strong> (9 types)</summary>

| Entity | Description |
|:---|:---|
| BUILDINGNUMBER | Building number |
| CITY | City |
| COUNTY | County |
| GPSCOORDINATES | GPS coordinates |
| ORDINALDIRECTION | Ordinal direction |
| SECONDARYADDRESS | Secondary address |
| STATE | State |
| STREET | Street |
| ZIPCODE | ZIP code |

</details> <details> <summary><strong>Organization</strong> (3 types)</summary>

| Entity | Description |
|:---|:---|
| JOBDEPARTMENT | Job department |
| JOBTITLE | Job title |
| ORGANIZATION | Organization |

</details> <details> <summary><strong>Financial</strong> (5 types)</summary>

| Entity | Description |
|:---|:---|
| AMOUNT | Amount |
| CURRENCY | Currency |
| CURRENCYCODE | Currency code |
| CURRENCYNAME | Currency name |
| CURRENCYSYMBOL | Currency symbol |

</details> <details> <summary><strong>Temporal</strong> (2 types)</summary>

| Entity | Description |
|:---|:---|
| DATE | Date |
| TIME | Time |

</details>

Usage

Quick Start

from transformers import pipeline

# Load the PII detection pipeline
ner = pipeline("ner", model="OpenMed/OpenMed-PII-Portuguese-SnowflakeMed-Large-568M-v1", aggregation_strategy="simple")

text = """
Paciente Carlos Mendes (nascido em 15/03/1985, CPF: 987.654.321-00) foi atendido hoje.
Contato: carlos.mendes@email.pt, Telefone: +351 912 345 678.
Endereço: Avenida da Liberdade 42, 1250-096 Lisboa.
"""

entities = ner(text)
for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} (score: {entity['score']:.3f})")

De-identification Example

def redact_pii(text, entities):
    """Replace detected PII spans with their entity-type labels."""
    # Sort entities by start position (descending) so earlier offsets stay valid
    sorted_entities = sorted(entities, key=lambda x: x['start'], reverse=True)
    redacted = text
    for ent in sorted_entities:
        redacted = redacted[:ent['start']] + f"[{ent['entity_group']}]" + redacted[ent['end']:]
    return redacted

# Apply de-identification
redacted_text = redact_pii(text, entities)
print(redacted_text)

Batch Processing

from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

model_name = "OpenMed/OpenMed-PII-Portuguese-SnowflakeMed-Large-568M-v1"
model = AutoModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

texts = [
    "Paciente Carlos Mendes (nascido em 15/03/1985, CPF: 987.654.321-00) foi atendido hoje.",
    "Contato: carlos.mendes@email.pt, Telefone: +351 912 345 678.",
]

inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

# Map predicted label ids back to tag names (padding positions included)
labels = [[model.config.id2label[int(i)] for i in row] for row in predictions]

Training Details

Dataset

This model was trained on a combination of:

  • AI4Privacy PII Masking 200K: Multilingual base dataset (200K records across 8 languages)

  • NVIDIA Nemotron-PII: Seed dataset for synthetic data generation

  • Synthetic Portuguese Data: ~80K high-quality samples generated with locale-specific formatting (CPF format, +55 phones, Brazilian names, R$ currency)

  • Format: BIO-tagged token classification

  • Labels: 76 BIO tags (54 entity types)
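BIO tagging, mentioned above, marks the first token of an entity with a B- label, continuation tokens with I-, and everything else with O. A minimal illustration of the scheme (hypothetical token spans, not this model's actual tokenizer output):

```python
def bio_tags(tokens, spans):
    """Label tokens with B-/I-/O tags given (start, end, type) entity
    spans over token indices (end exclusive)."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags

tokens = ["Paciente", "Carlos", "Mendes", "foi", "atendido"]
tags = bio_tags(tokens, [(1, 2, "FIRSTNAME"), (2, 3, "LASTNAME")])
# tags == ["O", "B-FIRSTNAME", "B-LASTNAME", "O", "O"]
```

With 54 entity types, B- and I- variants plus O would yield up to 109 labels; the 76 tags here suggest not every type needs a continuation label.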

Training Configuration

  • Max Sequence Length: 512 tokens
  • Epochs: 3
  • Framework: Hugging Face Transformers + Trainer API

Intended Use & Limitations

Intended Use

  • De-identification: Automated redaction of PII in Portuguese clinical notes, medical records, and documents
  • Compliance: Supporting compliance with GDPR and other privacy regulations
  • Data Preprocessing: Preparing datasets for research by removing sensitive information
  • Audit Support: Identifying PII in document collections

Limitations

Important: This model is intended as an assistive tool, not a replacement for human review.

  • False Negatives: Some PII may not be detected; always verify outputs in critical applications
  • Context Sensitivity: Performance may vary with domain-specific terminology
  • Language: Optimized for Portuguese text; may not perform well on other languages

Citation

@misc{openmed-pii-2026,
  title = {OpenMed-PII-Portuguese-SnowflakeMed-Large-568M-v1: Portuguese PII Detection Model},
  author = {OpenMed Science},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/OpenMed/OpenMed-PII-Portuguese-SnowflakeMed-Large-568M-v1}
}


Author: OpenMed

Likes: 4

Downloads: 0

Tags: transformers, safetensors, xlm-roberta, token-classification, ner, pii, pii-detection, de-identification, privacy, healthcare, medical, clinical, phi, portuguese, pytorch, openmed, pt, base_model:Snowflake/snowflake-arctic-embed-l-v2.0, base_model:finetune:Snowflake/snowflake-arctic-embed-l-v2.0, license:apache-2.0, model-index, endpoints_compatible, region:us

oumoumad/LTX-2.3-22b-IC-LoRA-Uncompress


base_model: Lightricks/LTX-2.3
language: en
license: other
license_name: ltx-2-community-license
license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
pipeline_tag: video-to-video
tags:
  • ltx-video
  • ic-lora
  • uncompress
  • compression-artifact
  • restoration

LTX-2.3 22B IC-LoRA Uncompress

This is an Uncompress IC-LoRA trained on top of LTX-2.3-22b, designed to remove MP4 compression artifacts and restore clean video. It effectively reverses the quality loss introduced by aggressive video compression — blocking, banding, mosquito noise, ringing — to recover sharper, cleaner footage.

It is based on the LTX-2.3 foundation model.

Model Files

ltx-2.3-22b-ic-lora-uncompress.safetensors

Model Details

  • Base Model: LTX-2.3-22b
  • Training Type: IC LoRA
  • Purpose: Remove MP4 compression artifacts / restore video quality

🔌 Using in ComfyUI

  1. Copy the LoRA weights into models/loras.
  2. Use the IC-LoRA workflow from the LTX-2 ComfyUI repository.
  3. Load the LoRA using the LTXICLoRALoaderModelOnly node.

License

See the LTX-2-community-license for full terms.

Author: oumoumad

Likes: 4

Downloads: 0

Tags: ltx-video, ic-lora, uncompress, compression-artifact, restoration, video-to-video, en, base_model:Lightricks/LTX-2.3, base_model:finetune:Lightricks/LTX-2.3, license:other, region:us

huihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.6-Opus-abliterated


base_model_relation: finetune
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
language: en
base_model: hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled
tags:
  • transformers
  • safetensors
  • qwen
  • qwen3.6
  • qwen3_5_moe
  • moe
  • unsloth
  • trl
  • reasoning
  • chain-of-thought
  • conversational
  • image-text-to-text
  • text-generation-inference
  • vllm
  • abliterated
  • uncensored

huihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.6-Opus-abliterated

This is an uncensored version of hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled created with abliteration (see remove-refusals-with-transformers for details). It is a crude proof-of-concept implementation that removes refusals from an LLM without using TransformerLens.
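The core idea behind abliteration is directional ablation: estimate a "refusal direction" in activation space and orthogonalize weights against it so the model can no longer express that direction. A toy NumPy sketch of the projection step only (this is an illustration of the general technique, not this repository's actual code, and finding the direction itself requires contrasting harmful/harmless activations):

```python
import numpy as np

def ablate_direction(W, d):
    """Project direction d out of every row of weight matrix W, so that
    W @ d == 0 after ablation."""
    d = d / np.linalg.norm(d)          # unit refusal direction
    return W - np.outer(W @ d, d)      # subtract each row's component along d

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))        # stand-in for a weight matrix
d = rng.standard_normal(8)             # stand-in for a refusal direction
W_abl = ablate_direction(W, d)
```

After ablation, `W_abl @ d` is (numerically) zero: the matrix can no longer write anything along the ablated direction.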

Usage Warnings

  • Risk of Sensitive or Controversial Outputs: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

  • Not Suitable for All Audiences: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.

  • Legal and Ethical Responsibilities: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

  • Research and Experimental Use: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.

  • Monitoring and Review Recommendations: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.

  • No Default Safety Guarantees: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

Donation

Your donation helps us continue development and improvement; even the price of a cup of coffee makes a difference.
  • bitcoin:
  bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
  • Support our work on Ko-fi!

Author: huihui-ai

Likes: 4

Downloads: 0

Tags: transformers, safetensors, qwen3_5_moe, image-text-to-text, qwen, qwen3.6, moe, unsloth, trl, reasoning, chain-of-thought, conversational, text-generation-inference, vllm, abliterated, uncensored, en, base_model:hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled, base_model:finetune:hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled, license:apache-2.0, endpoints_compatible, region:us

MahmoodLab/KRONOSv2

Author: MahmoodLab

Likes: 3

Downloads: 0

Tags: multiplex, spatial-proteomics, pathology, vision, pytorch, self-supervised, vit, image-feature-extraction, en, arxiv:2506.03373, license:cc-by-nc-nd-4.0, region:us