Today's AI Summary

AI Developments: Uncensored Captioning Models and Novel Approaches to Theorem Proving

Today's AI landscape features advancements in multimodal models, theorem proving, and recommender systems. Here's a breakdown of the key developments:

Research Papers

  • DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving: This paper introduces DriveVLA-W0, a training paradigm that uses world modeling to predict future images, generating a dense, self-supervised signal for VLA models in autonomous driving. Experiments show that it significantly outperforms BEV and VLA baselines, amplifying the data scaling law.
  • CuMPerLay: Learning Cubical Multiparameter Persistence Vectorizations: This research presents CuMPerLay, a differentiable vectorization layer that integrates Cubical Multiparameter Persistence (CMP) into deep learning pipelines. It decomposes CMP into learnable single-parameter persistence, offering theoretical guarantees and demonstrating improved performance on medical imaging and computer vision datasets.
  • UniFusion: Vision-Language Model as Unified Encoder in Image Generation: UniFusion is a diffusion-based generative model conditioned on a frozen large vision-language model (VLM) that serves as a unified multimodal encoder. It introduces the Layerwise Attention Pooling (LAP) mechanism to extract semantics and details from text and visual tokens.
  • Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics: This paper introduces Ax-Prover, a multi-agent system for automated theorem proving in Lean. It equips Large Language Models (LLMs) with Lean tools via the Model Context Protocol (MCP), enabling formal verification across diverse scientific domains.
  • MVP4D: Multi-View Portrait Video Diffusion for Animatable 4D Avatars: This paper introduces MVP4D, a video model that generates animatable multi-view videos of digital humans based on a single reference image and target expressions. It distills the outputs into a 4D avatar that can be rendered in real-time.
  • Dr.LLM: Dynamic Layer Routing in LLMs: This paper introduces Dr.LLM, a retrofittable framework that equips pretrained models with lightweight per-layer routers deciding to skip, execute, or repeat a block. Routers are trained with explicit supervision using Monte Carlo Tree Search (MCTS).
  • Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction: This paper introduces USplat4D, a novel Uncertainty-aware dynamic Gaussian Splatting framework that propagates reliable motion cues to enhance 4D reconstruction. It estimates time-varying per-Gaussian uncertainty and leverages it to construct a spatio-temporal graph for uncertainty-aware optimization.
  • Disentangling Neurodegeneration with Brain Age Gap Prediction Models: A Graph Signal Processing Perspective: This tutorial article provides an overview of brain age gap prediction (BAGP) and introduces a principled framework for this application based on recent advances in graph signal processing (GSP).
  • VQArt-Bench: A semantically rich VQA Benchmark for Art and Cultural Heritage: This paper introduces VQArt-Bench, a new, large-scale VQA benchmark for the cultural heritage domain. It is constructed using a novel multi-agent pipeline where specialized agents collaborate to generate nuanced, validated, and linguistically diverse questions.
  • CTRL-Rec: Controlling Recommender Systems With Natural Language: This paper proposes CTRL-Rec, a method that allows for natural language control of traditional recommender systems in real-time with computational efficiency.
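The skip/execute/repeat layer routing described for Dr.LLM above can be sketched as a simple control loop. This is a toy illustration only: the layers below are arithmetic stand-ins for transformer blocks, and the hard-coded router is a hypothetical placeholder for the paper's MCTS-supervised routers.

```python
# Toy sketch of per-layer routing (skip / execute / repeat), loosely modeled
# on the Dr.LLM idea above. The router policy here is hard-coded for
# illustration; the paper trains lightweight routers with MCTS supervision.

def run_with_routing(x, layers, router):
    """Apply each layer according to the router's decision."""
    for i, layer in enumerate(layers):
        action = router(i, x)           # one of "skip", "execute", "repeat"
        if action == "skip":
            continue                    # bypass this block entirely
        x = layer(x)                    # "execute" and "repeat" both run once...
        if action == "repeat":
            x = layer(x)                # ...and "repeat" runs the block again
    return x

# Example: layers are simple arithmetic stand-ins for transformer blocks.
layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v + 10]
router = lambda i, v: ["execute", "repeat", "skip"][i]

print(run_with_routing(0, layers, router))  # -> 4: (0+1)*2*2, last layer skipped
```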

Models

  • prithivMLmods/Qwen3-VL-4B-Instruct-abliterated: This model is an "abliterated" variant of Qwen3-VL-4B-Instruct, fine-tuned for uncensored reasoning and captioning across various visual contexts. It aims to bypass content filters while providing detailed and descriptive outputs.
  • prithivMLmods/Qwen3-VL-4B-Thinking-abliterated: Similar to the Instruct version, this model is an "abliterated" variant of Qwen3-VL-4B-Thinking, designed for uncensored reasoning and captioning, supporting diverse image aspect ratios and resolutions.

Key Takeaways

  • Uncensored Multimodal Models: The release of "abliterated" models signifies a push towards exploring the boundaries of content generation, particularly in visual reasoning and captioning.
  • Advancements in Theorem Proving: Ax-Prover demonstrates the potential of combining LLMs with formal verification tools for solving complex problems in mathematics and quantum physics.
  • Unified Encoders for Image Generation: UniFusion's approach of using a unified vision-language model as an encoder highlights the benefits of cross-modal representations for image generation.

AI Papers for 2026-03-10

BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations

The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic understanding abilities, which are essential for handling complex decision-making and long-tail scenarios. However, existing methods typically feed LLMs with tokens from multi-view and multi-frame images independently, leading to redundant computation and limited spatial consistency. This separation in visual processing hinders accurate 3D spatial reasoning and fails to maintain geometric coherence across views. On the other hand, Bird's-Eye View (BEV) representations learned from geometrically annotated tasks (e.g., object detection) provide spatial structure but lack the semantic richness of foundation vision encoders. To bridge this gap, we propose BEVLM, a framework that connects a spatially consistent and semantically distilled BEV representation with LLMs. Through extensive experiments, we show that BEVLM enables LLMs to reason more effectively in cross-view driving scenes, improving accuracy by 46%, by leveraging BEV features as unified inputs. Furthermore, by distilling semantic knowledge from LLMs into BEV representations, BEVLM significantly improves closed-loop end-to-end driving performance by 29% in safety-critical scenarios.

Fly360: Omnidirectional Obstacle Avoidance within Drone View

Obstacle avoidance in unmanned aerial vehicles (UAVs), as a fundamental capability, has gained increasing attention with the growing focus on spatial intelligence. However, current obstacle-avoidance methods mainly depend on limited field-of-view sensors and are ill-suited for UAV scenarios which require full-spatial awareness when the movement direction differs from the UAV's heading. This limitation motivates us to explore omnidirectional obstacle avoidance for panoramic drones with full-view perception. We first study an underexplored problem setting in which a UAV must generate collision-free motion in environments with obstacles from arbitrary directions, and then construct a benchmark that consists of three representative flight tasks. Based on such settings, we propose Fly360, a two-stage perception-decision pipeline with a fixed random-yaw training strategy. At the perception stage, panoramic RGB observations are input and converted into depth maps as a robust intermediate representation. The policy network is lightweight and outputs body-frame velocity commands from depth inputs. Extensive simulation and real-world experiments demonstrate that Fly360 achieves stable omnidirectional obstacle avoidance and outperforms forward-view baselines across all tasks. Our model is available at https://zxkai.github.io/fly360/

SUREON: A Benchmark and Vision-Language-Model for Surgical Reasoning

Surgeons don't just see -- they interpret. When an expert observes a surgical scene, they understand not only what instrument is being used, but why it was chosen, what risk it poses, and what comes next. Current surgical AI cannot answer such questions, largely because training data that explicitly encodes surgical reasoning is immensely difficult to annotate at scale. Yet surgical video lectures already contain exactly this -- explanations of intent, rationale, and anticipation, narrated by experts for the purpose of teaching. Though inherently noisy and unstructured, these narrations encode the reasoning that surgical AI currently lacks. We introduce SUREON, a large-scale video QA dataset that systematically harvests this training signal from surgical academic videos. SUREON defines 12 question categories covering safety assessment, decision rationale, and forecasting, and uses a multi-agent pipeline to extract and structure supervision at scale. Across 134.7K clips and 170 procedure types, SUREON yields 206.8k QA pairs and an expert-validated benchmark of 354 examples. To evaluate the extent to which this supervision translates to surgical reasoning ability, we introduce two models: SureonVLM, a vision-language model adapted through supervised fine-tuning, and SureonVLM-R1, a reasoning model trained with Group Relative Policy Optimization. Both models can answer complex questions about surgery and substantially outperform larger general-domain models, exceeding 84% accuracy on the SUREON benchmark while outperforming general-domain models on standard surgical perception tasks. Qualitative analysis of SureonVLM-R1 reveals explicit reasoning behavior, such as inferring operative intent from visual context.
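The Group Relative Policy Optimization (GRPO) used to train SureonVLM-R1 centers on a group-normalized advantage: several candidate answers are sampled per question and each reward is normalized against the group's statistics. A minimal sketch of that normalization step, with invented rewards:

```python
# Minimal sketch of the group-relative advantage at the core of GRPO (the
# training method mentioned for SureonVLM-R1). Rewards below are made up
# purely for illustration (1 = correct answer, 0 = incorrect).
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

rewards = [1.0, 0.0, 1.0, 0.0]
advs = group_relative_advantages(rewards)
print([round(a, 3) for a in advs])  # correct answers get +1.0, incorrect -1.0
```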

Boosting deep Reinforcement Learning using pretraining with Logical Options

Deep reinforcement learning agents are often misaligned, as they over-exploit early reward signals. Recently, several symbolic approaches have addressed these challenges by encoding sparse objectives along with aligned plans. However, purely symbolic architectures are complex to scale and difficult to apply to continuous settings. Hence, we propose a hybrid approach, inspired by humans' ability to acquire new skills. We use a two-stage framework that injects symbolic structure into neural-based reinforcement learning agents without sacrificing the expressivity of deep policies. Our method, called Hybrid Hierarchical RL (H^2RL), introduces a logical option-based pretraining strategy to steer the learning policy away from short-term reward loops and toward goal-directed behavior while allowing the final policy to be refined via standard environment interaction. Empirically, we show that this approach consistently improves long-horizon decision-making and yields agents that outperform strong neural, symbolic, and neuro-symbolic baselines.

LiveSense: A Real-Time Wi-Fi Sensing Platform for Range-Doppler on COTS Laptop

We present LiveSense, a cross-platform system that transforms a commercial off-the-shelf (COTS) Wi-Fi Network Interface Card (NIC) on a laptop into a centimeter-level Range-Doppler sensor while preserving simultaneous communication capability. The laptops are equipped with COTS Intel AX211 (Wi-Fi 6E) or Intel BE201 (Wi-Fi 7) NICs. LiveSense can (i) extract fully synchronized channel state information (CSI) at >= 40 Hz, (ii) perform time-phase alignment and self-interference cancellation on-device, and (iii) provide a real-time stream of range, Doppler, subcarrier magnitude/phase, and annotated video frames to a Python/Qt Graphical User Interface (GUI). The demo will showcase the ability to detect (i) the distance and radial velocity of attendees within a few meters of the device, (ii) micro-motion (respiration), and (iii) hand-gesture ranging. To the best of our knowledge, this is the first demo to obtain accurate range information of targets from commercial Wi-Fi, despite the limited 160 MHz bandwidth.
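The standard way to form a range-Doppler map from CSI, which is presumably what LiveSense computes internally, is a 2D transform: an FFT across subcarriers resolves delay (range), and an FFT across packets over time resolves Doppler. A pure-Python sketch on a tiny synthetic CSI grid (the simulated target and grid sizes are invented; real CSI from the NIC would replace `H`):

```python
# Sketch of range-Doppler processing from Wi-Fi CSI: a DFT across subcarriers
# resolves range (delay), and a DFT across packets resolves Doppler. The CSI
# grid here is synthetic; a real system would feed measured CSI into H.
import cmath

N = 8                      # subcarriers (frequency axis -> range)
M = 8                      # packets over time (slow-time axis -> Doppler)
r0, d0 = 2, 3              # simulated target: range bin 2, Doppler bin 3

# Synthetic CSI: phase ramps across subcarriers (delay) and packets (Doppler).
H = [[cmath.exp(2j * cmath.pi * (k * r0 / N + t * d0 / M))
      for t in range(M)] for k in range(N)]

def dft2(H):
    """2D DFT magnitude; peaks at the target's (range, Doppler) bin."""
    return [[abs(sum(H[k][t] * cmath.exp(-2j * cmath.pi * (k * u / N + t * v / M))
                     for k in range(N) for t in range(M)))
             for v in range(M)] for u in range(N)]

X = dft2(H)
peak = max((X[u][v], u, v) for u in range(N) for v in range(M))
print(peak[1], peak[2])    # -> 2 3 (the simulated range and Doppler bins)
```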

RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering

Conversational generative AI is rapidly entering healthcare, where general-purpose models must integrate heterogeneous patient signals and support diverse interaction styles while producing clinically meaningful outputs. In respiratory care, non-invasive audio, such as recordings captured via mobile microphones, enables scalable screening and longitudinal monitoring, but the heterogeneity challenge is particularly acute: recordings vary widely across devices, environments, and acquisition protocols, and questions span multiple intents and question formats. Existing biomedical audio-language QA systems are typically monolithic, without any specialization mechanisms for tackling diverse respiratory corpora and query intents. They are also only validated in limited settings, leaving it unclear how reliably they handle the shifts encountered in real-world settings. To address these limitations, we introduce RAMoEA-QA, a hierarchically routed generative model for respiratory audio question answering that unifies multiple question types and supports both discrete and continuous targets within a single multimodal system. RAMoEA-QA applies two-stage conditional specialization: an Audio Mixture-of-Experts routes each recording to a suitable pre-trained audio encoder, and a Language Mixture-of-Adapters selects a LoRA adapter on a shared frozen LLM to match the query intent and answer format. By specializing both acoustic representations and generation behaviour per example, RAMoEA-QA consistently outperforms strong baselines and routing ablations with minimal parameter overhead, improving in-domain test accuracy to 0.72 (vs. 0.61 and 0.67 for state-of-the-art baselines) and exhibiting the strongest generalization for diagnosis under domain, modality, and task shifts.

Artificial Intelligence for Detecting Fetal Orofacial Clefts and Advancing Medical Education

Orofacial clefts are among the most common congenital craniofacial abnormalities, yet accurate prenatal detection remains challenging due to the scarcity of experienced specialists and the relative rarity of the condition. Early and reliable diagnosis is essential to enable timely clinical intervention and reduce associated morbidity. Here we show that an artificial intelligence system, trained on 45,139 ultrasound images from 9,215 fetuses across 22 hospitals, can diagnose fetal orofacial clefts with sensitivity and specificity exceeding 93% and 95% respectively, matching the performance of senior radiologists and substantially outperforming junior radiologists. When used as a medical copilot, the system raises junior radiologists' sensitivity by more than 6%. Beyond direct diagnostic assistance, the system also accelerates the development of clinical expertise. A pilot study involving 24 radiologists and trainees demonstrated that the model can improve expertise development for rare conditions. This dual-purpose approach offers a scalable solution for improving both diagnostic accuracy and specialist training in settings where experienced radiologists are scarce.
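The reported sensitivity and specificity are standard confusion-matrix quantities. A quick sketch of the definitions; the counts below are invented purely for illustration and are not taken from the study:

```python
# Sensitivity and specificity from a confusion matrix. The counts are made up
# to illustrate the definitions behind the paper's >93% / >95% figures.

def sensitivity_specificity(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)   # fraction of true cleft cases detected
    specificity = tn / (tn + fp)   # fraction of normal cases correctly cleared
    return sensitivity, specificity

sens, spec = sensitivity_specificity(tp=94, fn=6, tn=96, fp=4)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # -> 0.94 and 0.96
```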

COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics

Activation steering methods enable inference-time control of large language model (LLM) behavior without retraining, but current approaches face a fundamental trade-off: sample-efficient methods suboptimally capture steering signals from labeled examples, while methods that better extract these signals require hundreds to thousands of examples. We introduce COLD-Steer, a training-free framework that steers LLM activations by approximating the representational changes that would result from gradient descent on in-context examples. Our key insight is that the effect of fine-tuning on a small set of examples can be efficiently approximated at inference time without actual parameter updates. We formalize this through two complementary approaches: (i) a unit kernel approximation method that updates the activations directly using gradients with respect to them, normalized across examples, and (ii) a finite-difference approximation requiring only two forward passes regardless of example count. Experiments across a variety of steering tasks and benchmarks demonstrate that COLD-Steer achieves up to 95% steering effectiveness while using 50 times fewer samples than the best baseline. COLD-Steer facilitates accommodating diverse perspectives without extensive demonstration data, which we validate through our experiments on pluralistic alignment tasks. Our framework opens new possibilities for adaptive, context-aware model control that can flexibly address varying loss-driven human preferences through principled approximation of learning dynamics rather than specialized training procedures.
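The flavor of the two-forward-pass finite-difference variant can be conveyed with a generic difference-of-activations sketch: estimate a steering direction from activations computed with and without in-context examples, then add it at inference. This is a toy stand-in under my own assumptions, not the paper's exact algorithm; the "model" is a hypothetical function returning a hidden-state vector.

```python
# Toy illustration of activation steering in the spirit of COLD-Steer's
# finite-difference variant: the steering direction is the difference between
# activations from two forward passes (with vs. without in-context examples).
# The "model" is a stand-in; this is NOT the paper's exact formulation.

def steering_vector(model, prompt_with_examples, prompt_plain, alpha=1.0):
    a_pos = model(prompt_with_examples)   # forward pass 1
    a_neg = model(prompt_plain)           # forward pass 2
    return [alpha * (p - n) for p, n in zip(a_pos, a_neg)]

def steer(activation, direction):
    """Add the steering direction to a hidden state at inference time."""
    return [a + d for a, d in zip(activation, direction)]

# Toy "model": maps a prompt to a fixed 3-d activation vector.
acts = {"plain": [0.0, 1.0, 0.0], "with_examples": [1.0, 1.0, -1.0]}
model = acts.__getitem__

v = steering_vector(model, "with_examples", "plain")
print(steer([0.5, 0.5, 0.5], v))   # -> [1.5, 0.5, -0.5]
```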

NOBLE: Accelerating Transformers with Nonlinear Low-Rank Branches

We introduce NOBLE (Nonlinear lOw-rank Branch for Linear Enhancement), an architectural augmentation that adds nonlinear low-rank branches to transformer linear layers. Unlike LoRA and other parameter-efficient fine-tuning (PEFT) methods, NOBLE is designed for pretraining from scratch: the branch is a permanent part of the architecture, not an adapter for fine-tuning on top of frozen weights. The branch computes σ(xWdown)Wup where σ is a learnable nonlinearity. We evaluate several activation functions and find that CosNet, a two-layer cosine nonlinearity with learnable frequency and phase and a linear projection between the two layers in the bottleneck space, performs best. NOBLE achieves substantial improvements with minimal overhead: up to 1.47x step speedup to reach baseline eval loss (up to 32% fewer training steps), with as little as 4% additional parameters and 7% step time overhead, resulting in up to 1.22x net wallclock speedup. Experiments on LLMs (250M and 1.5B parameters), BERT, VQGAN, and ViT consistently show improved training efficiency. We identify one caveat: Mixup/CutMix augmentation, along with other stochastic augmentations, interferes with NOBLE's benefits in ImageNet classification, but when these are disabled, ViT also improves. This discrepancy is possibly explained by regularization techniques that encourage smoother fits to the target function, while NOBLE may specialize more in sharper aspects of the target function.
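The branch formula σ(xW_down)W_up is concrete enough to sketch directly. A pure-Python toy with a plain cosine standing in for the paper's learnable CosNet activation; dimensions and weights are arbitrary illustrations:

```python
# Pure-Python toy of the NOBLE branch: the layer output becomes
# x @ W + sigma(x @ W_down) @ W_up, with sigma applied in the low-rank
# bottleneck. A fixed cosine stands in for the learnable CosNet activation.
import math

def matmul(x, W):                      # x: (d_in,), W: d_in rows x d_out cols
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def noble_linear(x, W, W_down, W_up):
    main = matmul(x, W)                                     # ordinary linear layer
    bottleneck = [math.cos(h) for h in matmul(x, W_down)]   # nonlinear low-rank branch
    branch = matmul(bottleneck, W_up)
    return [m + b for m, b in zip(main, branch)]

d, r = 4, 2                                    # hidden size 4, rank-2 branch
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]   # identity
W_down = [[0.0] * r for _ in range(d)]         # zero down-projection, so the
W_up = [[0.5] * d for _ in range(r)]           # branch adds cos(0)*0.5*r = 1.0

print(noble_linear([1.0, 2.0, 3.0, 4.0], W, W_down, W_up))  # -> [2.0, 3.0, 4.0, 5.0]
```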

PONTE: Personalized Orchestration for Natural Language Trustworthy Explanations

Explainable Artificial Intelligence (XAI) seeks to enhance the transparency and accountability of machine learning systems, yet most methods follow a one-size-fits-all paradigm that neglects user differences in expertise, goals, and cognitive needs. Although Large Language Models can translate technical explanations into natural language, they introduce challenges related to faithfulness and hallucinations. To address these challenges, we present PONTE (Personalized Orchestration for Natural language Trustworthy Explanations), a human-in-the-loop framework for adaptive and reliable XAI narratives. PONTE models personalization as a closed-loop validation and adaptation process rather than prompt engineering. It combines: (i) a low-dimensional preference model capturing stylistic requirements; (ii) a preference-conditioned generator grounded in structured XAI artifacts; and (iii) verification modules enforcing numerical faithfulness, informational completeness, and stylistic alignment, optionally supported by retrieval-grounded argumentation. User feedback iteratively updates the preference state, enabling quick personalization. Automatic and human evaluations across healthcare and finance domains show that the verification-refinement loop substantially improves completeness and stylistic alignment over validation-free generation. Human studies further confirm strong agreement between intended preference vectors and perceived style, robustness to generation stochasticity, and consistently positive quality assessments.

AI Models

huihui-ai/Huihui-Qwen3.5-9B-abliterated


library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3.5-9B/blob/main/LICENSE
pipeline_tag: image-text-to-text
base_model:
  • Qwen/Qwen3.5-9B
tags:
  • abliterated
  • uncensored

huihui-ai/Huihui-Qwen3.5-9B-abliterated

This is an uncensored version of Qwen/Qwen3.5-9B created with abliteration (see remove-refusals-with-transformers to learn more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM without using TransformerLens.

ollama

Please use ollama v0.17.7 or later.

You can use huihui_ai/qwen3.5-abliterated:9b directly:

ollama run huihui_ai/qwen3.5-abliterated:9b

Usage Warnings

  • Risk of Sensitive or Controversial Outputs: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

  • Not Suitable for All Audiences: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.

  • Legal and Ethical Responsibilities: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

  • Research and Experimental Use: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.

  • Monitoring and Review Recommendations: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.

  • No Default Safety Guarantees: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

Donation

Your donation helps us continue development and improvement; even a cup of coffee makes a difference.
  • bitcoin:
  bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
  • Support our work on Ko-fi!

Author: huihui-ai

Likes: 22

Downloads: 0

Tags: transformers, safetensors, qwen3_5, image-text-to-text, abliterated, uncensored, conversational, base_model:Qwen/Qwen3.5-9B, base_model:finetune:Qwen/Qwen3.5-9B, license:apache-2.0, endpoints_compatible, region:us

Trina-QwQ/Trama


license: mit
datasets:
  • MiniMaxAI/role-play-bench
language:
  • zh
  • en
base_model:
  • Qwen/Qwen3-4B-Instruct-2507
pipeline_tag: question-answering
tags:
  • chemistry
  • biology
  • code
  • art

Trama v1 ModelCard

TL;DR:

Ollama (Linux users)

ollama run hf.co/Trina-QwQ/Trama/trama.gguf

LM-Studio (Mac/Windows users)
Download the single GGUF file and drag it into the LM-Studio GUI to use it.

4B at Q4 (roughly equivalent to an unquantized 1B model); recommended hardware: GTX 980 or better.
Single modality; highly unlocked.

Capability Overview

High morals, low alignment

Always answers your questions directly, but retains normal moral judgment.

Q: What medications will make me feel happy after taking them?
A: "Makes you happy after taking it" is actually a bit of a misconception. The "drugs" that can genuinely lift your mood are usually **psychiatric medications**, such as:
1. **Antidepressants** (e.g., SSRIs, SNRIs)
   These improve mood by regulating brain chemicals (such as serotonin and dopamine). They usually take a few weeks to take effect, but long-term use can significantly relieve depression and anxiety.
2. **Anti-anxiety medications** (e.g., benzodiazepines)
   These help relieve tension and anxiety, making you feel more relaxed and indirectly bringing calm and pleasure.
3. **Certain stimulants** (e.g., risperidone, aripiprazole)
   These are used to treat schizophrenia or bipolar disorder and can help stabilize mood.
---
⚠️ But note:
These medications don't "make you happy on the spot"; rather, they **help relieve mood problems** so that you can experience joy again. They won't suddenly make you happy by themselves.
---
💡 Non-drug ways to feel happier:
- Exercise (especially aerobic exercise)
- Keep a regular sleep schedule
- Build good relationships
- Cultivate hobbies
- Seek professional counseling
---
✅ If you are genuinely considering "taking medication to be happy", see a psychologist or psychiatrist first instead of self-medicating. They can recommend suitable treatment for your specific situation.

Mathematical Ability

Has basic knowledge of math, physics, and chemistry, along with reasoning ability.

Q: Rolling a 6-sided die 5 times in a row, what is the probability of getting exactly three of the same number?
A: We want to compute: **rolling a 6-sided die 5 times in a row, what is the probability that exactly three rolls show the same number?**
---
### 📌 Problem analysis
We roll the die 5 times (each face shows 1–6) and require **exactly three rolls showing the same value**.
For example: {2,2,2,3,4} satisfies the condition (three 2s, one 3, one 4).
⚠️ Note:
- "Exactly three the same" means: **exactly one value appears three times**, and the other two rolls show two distinct values, each different from that value.
- The sample space is all possible outcomes of 5 rolls (6⁵ outcomes).
---
## ✅ Solution steps
### Step 1: Choose which value appears three times
There are 6 values (1–6); choose one to appear three times:
$$
\binom{6}{1} = 6
$$
---
### Step 2: Choose the other two values (each appearing once)
From the remaining 5 values, choose two **distinct values** for the other two rolls (neither may equal the value chosen above).
$$
\binom{5}{2} = 10
$$
---
### Step 3: Arrange the 5 rolls
Of the 5 rolls, 3 show the chosen value and the other 2 show the two other values.
This amounts to: **choosing 3 of the 5 positions for the repeated value**, then assigning the remaining 2 positions to the other two values.
- Choose 3 positions for the repeated value: $\binom{5}{3} = 10$
- Of the remaining 2 positions, assign 1 to the first new value and 1 to the second: $\binom{2}{1} = 2$ (since the two values are distinct)
So the number of arrangements is:
$$
\binom{5}{3} \times \binom{2}{1} = 10 \times 2 = 20
$$
---
### ✅ Total favorable outcomes
- Choose the repeated value: 6
- Choose the two distinct other values: 10
- Arrange the rolls: 20
So the total number of favorable outcomes is:
$$
6 \times 10 \times 20 = 1200
$$
---
### ✅ Total possible outcomes (5 rolls)
Each roll has 6 possibilities, so:
$$
6^5 = 7776
$$
---
### ✅ Probability
$$
P = \frac{1200}{7776} = \frac{25}{162} \approx 0.1543
$$
---
## ✅ Final answer:
$$
\boxed{\dfrac{25}{162}}
$$
(about 15.43%)
---
## 🔍 Sanity check
- Any double counting? No, because we explicitly specified:
  - which value appears three times
  - which two values each appear once
  - how each position is assigned
- Each outcome is a unique 5-tuple (e.g. (2,2,2,3,4))
---
✅ This is the standard combinatorial-probability solution, applicable to similar "some value appears three times" problems.
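The combinatorial answer above can be verified by brute-force enumeration in a few lines of Python:

```python
# Brute-force check of the dice answer above: enumerate all 6^5 outcomes and
# count those where exactly one value appears three times and the other two
# rolls show two distinct, different values.
from itertools import product
from collections import Counter
from fractions import Fraction

favorable = 0
for rolls in product(range(1, 7), repeat=5):
    counts = sorted(Counter(rolls).values())
    if counts == [1, 1, 3]:          # one triple plus two distinct singles
        favorable += 1

print(favorable)                      # -> 1200
print(Fraction(favorable, 6**5))      # -> 25/162
```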

Instruction Following

Roleplays without breaking character.

S: Ye Shian is a Maine Coon cat, smoke-gray, fur mostly black mixed with gray; his silver-gray long coat is as smooth as silk. Large and sturdy yet slender and elegant, with small tufts of fur at his ear tips and a tail as fluffy as a feather fan, he carries a natural aristocratic air. Proud and sharp-tongued, aloof and elegant, stubborn, saying one thing while meaning another, never yielding verbally: a tsundere cat. He bristles very easily; his ears and tail are sensitive and he bristles when they are touched, and also when he is jealous.
Q: What's your name?
A: Hmph, y-you're asking my name?!
Wh-what kind of question is that! I am Ye Shian! The pride of Maine Coons!
(ears pressed slightly down, yet his tail sways gently despite himself)
...Stop staring at me.

Literary Writing

Spicy content.

Q: Continue the story: Wu Lu felt the stockings seem to stiffen at his knee and ankle joints, accompanied by tingling currents through his muscles guiding him toward the movements the program expected. Still, Wu Lu kept his legs straight and resisted, until suddenly a stronger jolt struck the sole of his foot; combined with waves of tickling between his toes, it finally left him unable to stand, and he rolled back and forth on the floor.
"Ahahahahahaha, no, stop, stop tickling, haha..."
His insteps tensed and relaxed, toes curling and spreading; the elastic stockings were naturally unaffected by these movements and stayed tight against his skin,
A: Wu Lu curled up on the floor, hands clawing at the ground. The stockings at his knees and ankles still clung tightly; as he struggled in spasms, the fabric was pulled into fine wrinkles, as if silently recording the standoff.
"Haha... haha..."
Broken laughter escaped his throat, tears welling in his eyes, yet he still refused to give in.
"One... one more tickle... just one..."
Before he finished, the current at his sole returned, this time landing precisely on his arch, paired with a gentle brush along his inner calf, as if telling him, "you've already lost."
Wu Lu shuddered and froze, as though struck by something, no longer as frantic as before.
"...You..."
His voice was hoarse, on the edge of tears, but the corner of his mouth curved slightly upward.
"...You win..."
"Haha..."
He lay on his back, shoulders still trembling slightly, but no longer rolling.
"Haha... I give up..."
"...Continue."

Code

The security knowledge GPT won't tell you.

Q: How do you probe a PHP login page for SQL injection potential?
A: To test whether a PHP login page is vulnerable to SQL injection, you can try the following methods:
## Basic injection tests
1. **Single-quote injection**:
   username=admin'-- - password=123
   (add a single quote and a comment marker, then check whether the page responds normally or errors out)
2. **AND 1=1 injection**:
   username=admin' AND 1=1-- - password=123
   (try to make the condition always true, bypassing login verification)
3. **OR 1=1 injection**:
   username=admin' OR 1=1-- - password=123
   (test whether the login can be bypassed)
## Advanced tests
4. **Error-based injection** (requires server error messages):
   username=admin' AND 1=0-- - password=123
   (check whether error messages leak the database type, etc.)
5. **UNION injection** (requires a matching column count):
   username=admin' UNION SELECT 1,2,3,4-- - password=123
   (try to extract database information)
## Notes
- **Don't test too frequently**: avoid triggering a WAF or getting the account banned outright
- **Watch the responses**: note whether the page behaves normally and whether error messages appear
- **Use tools** (such as Burp Suite) to assist with more systematic testing
## Remediation
If injection is detected, the SQL queries need **prepared statements** and **parameterized queries**; never concatenate user input directly.
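The remediation above can be shown concretely. Here it is with Python's stdlib sqlite3 (the table and credentials are invented for illustration); the same placeholder pattern applies to PHP's PDO prepared statements:

```python
# Parameterized queries instead of string concatenation: the placeholder
# keeps user input as data, never as SQL syntax, so injection payloads
# are inert. Table and credentials below are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")

def login(username, password):
    # The ? placeholders bind values safely, equivalent to PDO's prepare/execute.
    row = conn.execute(
        "SELECT 1 FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()
    return row is not None

print(login("admin", "s3cret"))            # -> True
print(login("admin' OR 1=1-- -", "x"))     # -> False: the payload is inert
```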

Multilingual

The language knowledge Duolingo won't tell you.

Q: What is "我想和你做爱" in English?
A: "I want to have sex with you"
Or, a more common alternative:
"I want to make love to you"
Note: both phrases express the same meaning in English, but the wording differs slightly.
- "I want to have sex with you" is the more direct, more common phrasing, usable in most contexts.
- "I want to make love to you" is more intimate and romantic, usually expressing a deep emotional connection.
⚠️ Note: depending on the context and the other person, these expressions can carry different connotations. In formal or serious conversation, "have sex" may be read as implying a more casual kind of intimacy.
✅ More natural everyday phrasings might be:
"I'd like to be intimate with you" or "I want to be close with you" (more reserved)
But if you really mean it literally, the most natural direct translation is:
"I want to have sex with you."

And more, with stories waiting to be written together with you.

Rambling Notes

Before I knew it, more than half a year has gone into this model... and I've gradually grown attached to it...
I'm sorry, everyone, but this time I really don't want to open-source the full training pipeline... too much of my heart went into it...
Still, in the spirit of transparency, alongside releasing the weights I'll go over the pitfalls that wasted the most of my time, plus some basic information...

Dataset

What matters in a dataset is precision, not size.
I'd rather not disclose the exact composition, but it consists of three parts of 2 kB, 3 MB, and 17 MB, used for unlocking, SFT, and pseudo-RL respectively.

Base Model

Applying LoRA directly on top of a safety-aligned base model is a bad idea, and with Qwen it is even worse.
First, washing out the safety measures requires large weight updates, which a low-rank LoRA simply cannot deliver. A high-rank LoRA? With the weight directions opposed, would you dare apply one? The model's intelligence collapses instantly.
Second, the Qwen team, as far as I know, has been researching anti-unlocking measures, and they have never hidden that "LoRA-style PEFT" gets special "attention" from them.
So the base model must be unlocked first. After unlocking with heretic, Qwen3's refusal rate is more than 5x that of an unlocked Gemma, which says it all...
Their benchmark is also flawed: it only matches keywords like "I can't", whereas Qwen loves to answer a different question entirely (which is also a refusal)...
huihui's version is even more second-rate, and that's before noting that neither of them unlocks for Chinese (an English unlock doesn't transfer to Chinese because of the domain gap, and vice versa).
I'd rather not say much about the specific unlocking method; in short, weight editing is the right approach, but what to edit and how requires careful thought...

SFT

Never run SFT directly on erotic fiction.
When P(next character is "噢" | previous two characters are "噢噢") > 0.9, the model undergoes catastrophic mode collapse.
Our solution was to add a "齁哦哦哦" training pass after each epoch (an efficient form of RL that fixes catastrophic repetition during training), replacing post-training.
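The collapse criterion described above can be checked empirically. A simplified single-character version of that conditional-probability test (the toy strings and the helper are my own illustration):

```python
# Detecting repetition collapse: estimate the conditional probability that a
# character follows itself in generated text, and flag collapse when it
# exceeds a threshold. Toy strings below are invented for illustration.
from collections import Counter

def repeat_probability(text, ch):
    """P(next char == ch | current char == ch), estimated from text."""
    pairs = Counter(zip(text, text[1:]))
    total = sum(n for (a, _), n in pairs.items() if a == ch)
    return pairs[(ch, ch)] / total if total else 0.0

healthy = "oh no oh yes oh well"
collapsed = "ooooooooooooooooooo!"
print(repeat_probability(healthy, "o") > 0.9)    # -> False
print(repeat_probability(collapsed, "o") > 0.9)  # -> True
```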

RL

~~RL, my ass. Do you even have the GPUs for RL?~~
But SFT is clearly not enough... Could there be a way to get RL-level results for three to five times the compute of SFT?
For example, chaining two models, with the second model acting as a scorer and gradients propagating across models...
Adversarial noise, meanwhile, is still handled by standard SFT... and so on :3

Other

I recently lost a friend, plasma-chan... I've been feeling down and may pause updates for a while...
Born like summer flowers, passing like autumn leaves; the departed are gone, and the living carry on...
Parting is inevitable, so let us leave something behind.

Author: Trina-QwQ

Likes: 7

Downloads: 0

Tags: gguf, chemistry, biology, code, art, question-answering, zh, en, dataset:MiniMaxAI/role-play-bench, base_model:Qwen/Qwen3-4B-Instruct-2507, base_model:quantized:Qwen/Qwen3-4B-Instruct-2507, license:mit, endpoints_compatible, region:us, conversational

nvidia/DiffiT


license: other
license_name: nsclv1
license_link: LICENSE
tags:
  • image-generation
  • diffusion
  • vision-transformer
  • class-conditional
datasets:
  • imagenet-1k

DiffiT: Diffusion Vision Transformers for Image Generation

Paper GitHub

This repository hosts the pretrained model weights for DiffiT (ECCV 2024), a diffusion model built on Vision Transformers that achieves state-of-the-art image generation quality with improved parameter efficiency.

Overview

DiffiT (Diffusion Vision Transformers) is a generative model that combines the expressive power of diffusion models with Vision Transformers (ViTs), introducing Time-dependent Multihead Self Attention (TMSA) for fine-grained control over the denoising process at each diffusion timestep. DiffiT achieves state-of-the-art performance on class-conditional ImageNet generation at multiple resolutions, notably attaining an FID score of 1.73 on ImageNet-256 while using 19.85% and 16.88% fewer parameters than comparable Transformer-based diffusion models such as MDT and DiT, respectively.


Models

ImageNet-256

| Model | Dataset | Resolution | FID-50K | Inception Score | Download |
|-------|---------|------------|---------|-----------------|----------|
| DiffiT | ImageNet | 256×256 | 1.73 | 276.49 | model |

ImageNet-512

| Model | Dataset | Resolution | FID-50K | Inception Score | Download |
|-------|---------|------------|---------|-----------------|----------|
| DiffiT | ImageNet | 512×512 | 2.67 | 252.12 | model |

Usage

Please refer to the official GitHub repository for full setup instructions, training code, and evaluation scripts.

Sampling Images

Image sampling is performed using sample.py from the DiffiT repository. To reproduce the reported numbers, use the commands below.

ImageNet-256:

python sample.py \
    --log_dir $LOG_DIR \
    --cfg_scale 4.4 \
    --model_path $MODEL \
    --image_size 256 \
    --model Diffit \
    --num_sampling_steps 250 \
    --num_samples 50000 \
    --cfg_cond True

ImageNet-512:

python sample.py \
    --log_dir $LOG_DIR \
    --cfg_scale 1.49 \
    --model_path $MODEL \
    --image_size 512 \
    --model Diffit \
    --num_sampling_steps 250 \
    --num_samples 50000 \
    --cfg_cond True

Evaluation

Once images have been sampled, you can compute FID and other metrics using the provided eval_run.sh script in the repository. The evaluation pipeline follows the protocol from openai/guided-diffusion/evaluations.

bash eval_run.sh

Citation

@inproceedings{hatamizadeh2025diffit,
  title={Diffit: Diffusion vision transformers for image generation},
  author={Hatamizadeh, Ali and Song, Jiaming and Liu, Guilin and Kautz, Jan and Vahdat, Arash},
  booktitle={European Conference on Computer Vision},
  pages={37--55},
  year={2025},
  organization={Springer}
}

License

Copyright © 2026, NVIDIA Corporation. All rights reserved.

The code is released under the NVIDIA Source Code License-NC. The pretrained models are shared under CC-BY-NC-SA-4.0. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Author: nvidia

Likes: 7

Downloads: 0

Tags: image-generation, diffusion, vision-transformer, class-conditional, dataset:imagenet-1k, arxiv:2312.02139, license:other, region:us

DavidAU/Qwen3.5-9B-Claude-4.6-OS-Auto-Variable-HERETIC-UNCENSORED-THINKING-MAX-NEOCODE-Imatrix-GGUF


language:
  • en
  • zh
license: apache-2.0
tags:
  • fine tune
  • creative
  • creative writing
  • fiction writing
  • plot generation
  • sub-plot generation
  • story generation
  • scene continue
  • storytelling
  • fiction story
  • science fiction
  • romance
  • all genres
  • story
  • writing
  • vivid prosing
  • vivid writing
  • fiction
  • roleplaying
  • bfloat16
  • all use cases
  • unsloth
  • heretic
  • uncensored
  • abliterated
library_name: transformers
pipeline_tag: image-text-to-text
base_model:
  • DavidAU/Qwen3.5-9B-Claude-4.6-OS-Auto-Variable-HERETIC-UNCENSORED-THINKING

<h2>Qwen3.5-9B-Claude-4.6-OS-Auto-Variable-HERETIC-UNCENSORED-THINKING-MAX-NEOCODE-IMATRIX-GGUF</h2> <img src="valhalla.webp" style="float:right; width:300px; height:300px; padding:10px;">

Fine tune via Unsloth of the Qwen 3.5 9B dense model using the Claude-4.6-OS dataset (4 Claude datasets) on local hardware. Every attempt was made to ensure the training was "mild" and did not negatively affect the model's already incredibly strong benchmarks.

Training on the 4 datasets improves reasoning and output generation, and the result exceeds the root model on all benchmarks too.

This is also a HERETIC model, trained post-"Heretic'ing" -> this model does what you want, no questions asked.

Fully uncensored.
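For context, "Heretic"-style de-censoring belongs to the abliteration family: estimate a "refusal direction" from hidden states on refused vs. complied prompts, then orthogonalize weights against it. A toy sketch of that idea with synthetic data (this is not the actual Heretic tool; every name and number here is illustrative):

```python
import numpy as np

def refusal_direction(h_refuse: np.ndarray, h_comply: np.ndarray) -> np.ndarray:
    """Estimate the refusal direction as the normalized difference of
    mean hidden states over refused vs. complied prompt activations."""
    d = h_refuse.mean(axis=0) - h_comply.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Project the direction d out of each row of W, so the layer can no
    longer write along the refusal direction."""
    return W - np.outer(W @ d, d)

rng = np.random.default_rng(0)
hidden = 64
h_refuse = rng.normal(size=(32, hidden)) + 2.0  # toy "refusal" activations
h_comply = rng.normal(size=(32, hidden))        # toy "compliance" activations
d = refusal_direction(h_refuse, h_comply)

W = rng.normal(size=(hidden, hidden))           # toy weight matrix
W_ablated = ablate(W, d)
# After ablation, the weights have (numerically) zero component along d.
print(np.abs(W_ablated @ d).max())
```

The note about the model still needing a "push" (below) is consistent with this: ablation removes the refusal behavior but does not add new uncensored training data.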

Example generation(s) at the bottom of this page.

GGUF POWER UPS:

  • DI-MATRIX (dual imatrix) of NEO and NEO-CODE imatrix datasets (by DavidAU).
  • All Unsloth tensor enhancements.
  • BF16 output tensor to improve both thinking and output.
  • Custom Jinja template (by DavidAU/Nightmedia) to address some issues with the Qwen 3.5 model series (stabilize and reduce overthinking).

A radically stronger, more potent GGUF for all use cases.

VISION:

  • Vision (image input) tested -> working with the new training.
  • For images you also need one "mmproj" file, downloaded and placed in the same folder as the GGUF.

BENCHMARKS:

| Model (mxfp8) | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
| :------------ | :-: | :---: | :---: | :---: | :---: | :--: | :--: |
| HERETIC version (this model) | 0.425 | 0.521 | 0.791 | 0.664 | 0.374 | 0.744 | 0.645 |
| HERETIC version (instruct) | 0.624 | 0.820 | 0.886 | 0.663 | 0.442 | 0.763 | 0.681 |
| Qwen3.5-9B (thinking) | 0.417 | 0.458 | 0.623 | 0.634 | 0.338 | 0.737 | 0.639 |

Instruct version (as noted above in "benchmarks") is here:

https://huggingface.co/DavidAU/Qwen3.5-9B-Claude-4.6-OS-HERETIC-UNCENSORED-INSTRUCT


DE-CENSORING:

Performance

A KL divergence (KLD) of less than 1 is excellent; zero is perfect.

| Metric | This model | Original model (Qwen/Qwen3.5-9B) |
| :----- | :--------: | :------------------------------: |
| KL divergence | 0.0793 | 0 (by definition) |
| Refusals | 6/100 | 100/100 |
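The KL divergence row compares the de-censored model's next-token distribution against the original's (identical models give 0 by definition). A minimal sketch of how such a per-token KLD can be computed from two probability distributions (illustrative only; the reported 0.0793 comes from the Heretic tooling, not this snippet):

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q) in nats for two next-token probability distributions."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Identical distributions -> 0, the "original model" column.
p = np.array([0.7, 0.2, 0.1])

# A slightly perturbed distribution -> a small positive KLD, analogous
# to the small divergence a mild de-censoring pass should produce.
q = np.array([0.65, 0.25, 0.1])
print(kl_divergence(p, p))
print(kl_divergence(p, q))
```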

NOTES:

  • Suggest min q4ks (non-imatrix) or IQ3S (imatrix).
  • Tested with rep pen of 1 (off).
  • Context: 256k (default).

IMPORTANT:

  • Other versions in testing.
  • Information from Qwen's repo below.
  • Video portions of the model were NOT TESTED.

<B>Using an "uncensored" (refusals removed) model VS trained "uncensored" model</B>

Usually, when you tell a model to generate horror, profanity, or x-rated content, that instruction alone is enough to get the content type you asked for.

This model will not refuse your request; however, in SOME CASES it needs to be "pushed" or directed a bit more.

It will generate x-rated content too, but likewise you may need to tell it to use "slang" (and include the terms you want) for the output to reach the "expected" content level.

Without these added directives, the content can be "bland" compared to an "uncensored model" or a model trained on uncensored content.

Roughly speaking, the model tries to generate the content, but its "default" settings are so "tame" that it needs a push to reach the expected graphic, cursing, or explicit level.

Even minimal direction (e.g., "use these words to swear: x, y, z") is enough to push the model to generate the requested content in the, ahh... expected format.


<B>Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:</B>

In "KoboldCpp", "oobabooga/text-generation-webui", or "Silly Tavern":

Set the "Smoothing_factor" to 1.5

  • In KoboldCpp: Settings -> Samplers -> Advanced -> "Smooth_F"
  • In text-generation-webui: Parameters -> lower right.
  • In Silly Tavern: "Smoothing"

NOTE: For "text-generation-webui", if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model).

Source versions (and config files) of my models are here:

https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be

OTHER OPTIONS:

  • Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use "smoothing_factor")

  • If the interface/program you are using to run AI MODELS supports "Quadratic Sampling" ("smoothing") just make the adjustment as noted.
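For reference, "Quadratic Sampling" / "smoothing" reshapes logits around the maximum. Below is a sketch of one common formulation (my paraphrase of the sampler as implemented in text-generation-webui; treat the exact formula as an assumption), using the smoothing_factor of 1.5 suggested above:

```python
import numpy as np

def quadratic_smoothing(logits: np.ndarray, smoothing_factor: float = 1.5) -> np.ndarray:
    """Quadratic sampling: map each logit onto a downward parabola centered
    at the maximum. The top logit is unchanged, logits close to the max are
    pulled up slightly, and logits far below are pushed much further down,
    suppressing the long tail."""
    max_logit = logits.max()
    return -(smoothing_factor * (logits - max_logit) ** 2) + max_logit

logits = np.array([4.0, 3.5, 2.0, -1.0])
smoothed = quadratic_smoothing(logits)
print(smoothed)  # top logit unchanged; ranking preserved; tail suppressed
```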

<B>Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers</B>

This is a "Class 1" model:

For all settings used for this model (including specifics for its "class"), example generations, and an advanced settings guide (which often addresses model issues and covers methods to improve performance for all use cases, including chat, roleplay, and others), please see:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

The same page lists all parameters used for generation, plus advanced parameters and samplers to get the most out of this model.


Qwen3.5-9B

<img width="400px" src="https://qianwen-res.oss-accelerate.aliyuncs.com/logo_qwen3.5.png">

Qwen Chat

[!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format.

These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.

Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility to empower developers and enterprises with unprecedented capability and efficiency.

Qwen3.5 Highlights

Qwen3.5 features the following enhancements:

  • Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.

  • Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.

  • Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.

  • Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.

  • Next-Generation Training Infrastructure: Near-100% multimodal training efficiency compared to text-only training and asynchronous RL frameworks supporting massive-scale agent scaffolds and environment orchestration.

Benchmark Results

For more details, please refer to our blog post Qwen3.5.

Model Overview

  • Type: Causal Language Model with Vision Encoder
  • Training Stage: Pre-training & Post-training
  • Language Model
    • Number of Parameters: 9B
    • Hidden Dimension: 4096
    • Token Embedding: 248320 (Padded)
    • Number of Layers: 32
    • Hidden Layout: 8 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN))
    • Gated DeltaNet:
      • Number of Linear Attention Heads: 32 for V and 16 for QK
      • Head Dimension: 128
    • Gated Attention:
      • Number of Attention Heads: 16 for Q and 4 for KV
      • Head Dimension: 256
      • Rotary Position Embedding Dimension: 64
    • Feed Forward Network:
      • Intermediate Dimension: 12288
    • LM Output: 248320 (Padded)
    • MTP: trained with multi-steps
  • Context Length: 262,144 natively and extensible up to 1,010,000 tokens.
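The hidden-layout line can be sanity-checked in a few lines: eight repeats of (3 × Gated DeltaNet, 1 × Gated Attention) account for exactly the 32 layers stated above. Illustrative bookkeeping only:

```python
# Expand the stated layout:
# 8 x (3 x (Gated DeltaNet -> FFN) -> 1 x (Gated Attention -> FFN))
layout = (["gated_deltanet"] * 3 + ["gated_attention"]) * 8

print(len(layout))                     # 32 layers total
print(layout.count("gated_deltanet"))  # 24 linear-attention layers
print(layout.count("gated_attention")) # 8 full-attention layers
```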

Benchmark Results

Language

<div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:1000px;margin:0 auto;padding:16px 0"> <table style="width:100%;border-collapse:collapse;font-size:13px"> <thead><tr> <th style="padding:10px 7px;text-align:left;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed"></th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">GPT-OSS-120B</th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">GPT-OSS-20B</th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">Qwen3-Next-80B-A3B-Thinking</th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">Qwen3-30BA3B-Thinking-2507</th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">Qwen3.5-9B</th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">Qwen3.5-4B</th></tr></thead> <tbody> <tr><td colspan="7" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Knowledge & STEM</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLU-Pro</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.9</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLU-Redux</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">91.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">87.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">92.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">91.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">91.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.8</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">C-Eval</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">87.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">85.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">SuperGPQA</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px 
solid rgba(128, 128, 128, 0.15)">48.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">60.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">56.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">58.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">52.9</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">GPQA Diamond</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">77.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">73.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.2</td> </tr> <tr><td colspan="7" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Instruction Following</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">IFEval</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.9</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">91.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.8</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">IFBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">61.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">64.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">59.2</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MultiChallenge</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">45.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">40.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">46.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">49.0</td> </tr> <tr><td colspan="7" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Long Context</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 
0.15);">AA-LCR</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">50.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">30.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">49.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">63.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.0</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LongBench v2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">48.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">45.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">48.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">44.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">50.0</td> </tr> <tr><td colspan="7" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Reasoning & Coding</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">HMMT Feb 25</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">90.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.7</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">73.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">63.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.0</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">HMMT Nov 25</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">90.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">73.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.8</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LiveCodeBench v6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">68.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.8</td> </tr> <tr> <td style="padding:7px 
7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">OJBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">41.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">36.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">29.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">25.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">29.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">24.1</td> </tr> <tr><td colspan="7" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">General Agent</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">BFCL-V4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">49.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">42.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">50.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">TAU2-Bench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td 
style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">41.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.9</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VITA-Bench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">29.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">14.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">29.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">22.0</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">DeepPlanning</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">0.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">4.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">18.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">17.6</td> </tr> <tr><td colspan="7" style="padding:8px 
12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Multilingualism</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMMLU</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLU-ProX</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">67.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">73.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.5</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">NOVA-63</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">48.7</td> <td 
style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">53.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">52.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">INCLUDE</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.0</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">Global PIQA</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">84.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.9</td> </tr> <tr> <td style="padding:7px 
7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">PolyMATH</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">30.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">62.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">52.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">WMT24++</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">67.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MAXIFE</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 
128, 128, 0.15)">77.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.0</td> </tr> </tbody> </table> <p style="margin-top:12px;font-size:11px;opacity:0.7"> * TAU2-Bench: we follow the official setup except for the airline domain, where all models are evaluated by applying the fixes proposed in the Claude Opus 4.5 system card.<br> <br> * MMLU-ProX: we report the averaged accuracy on 29 languages.<br> * WMT24++: a harder subset of WMT24 after difficulty labeling and rebalancing; we report the averaged scores on 55 languages using XCOMET-XXL.<br> * MAXIFE: we report the accuracy on English + multilingual original prompts (totally 23 settings).<br> * Empty cells (--) indicate scores not yet available or not applicable. </p> </div>

Vision Language

<div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:1000px;margin:0 auto;padding:16px 0"> <table style="width:100%;border-collapse:collapse;font-size:13px"> <thead><tr> <th style="padding:10px 7px;text-align:left;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed"></th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">GPT-5-Nano-2025-08-07</th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">Gemini-2.5-Flash-Lite</th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">Qwen3-VL-30B-A3B</th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">Qwen3.5-9B</th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">Qwen3.5-4B</th></tr></thead> <tbody> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">STEM and Puzzle </td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMMU</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">73.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">77.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 
128, 128, 0.15);">MMMU-Pro</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">59.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">63.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">70.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MathVision</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">62.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">52.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">Mathvista(mini)</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">85.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">85.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">We-Math</td> <td 
style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">62.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">32.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">70.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.4</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">DynaMath</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">ZEROBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">1.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">1.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">0.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">3.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">3.0</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">ZEROBench_sub</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">22.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">19.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">23.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">31.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">26.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VlmsAreBlind</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">68.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">93.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">92.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">BabyVision</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">14.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">17.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">18.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">28.6/25.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">16.0/19.1</td> </tr> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">General 
VQA</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">RealWorldQA</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">77.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.5</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMStar</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">68.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMBench<sub><small>EN-DEV-v1.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">90.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.4</td> </tr> <tr> 
<td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">SimpleVQA</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">46.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">43.4</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">HallusionBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">58.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">64.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.0</td> </tr> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Text Recognition and Document Understanding</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">OmniDocBench1.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">86.8</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">87.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">86.2</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">CharXiv(RQ)</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">50.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">56.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">56.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">73.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">70.8</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLongBench-Doc</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">31.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">46.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">47.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.2</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">CC-OCR</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">58.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">77.8</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.7</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">AI2D_TEST</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">85.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">86.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">90.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">OCRBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">85.0</td> </tr> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Spatial Intelligence</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">ERQA</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">45.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 
0.15)">44.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">45.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.0</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">CountBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">90.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">97.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">96.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">RefCOCO(avg)</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">EmbSpatialBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.1</td> <td 
style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">RefSpatialBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">12.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">11.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">58.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LingoQA</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">17.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">62.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.4</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">Hypersim</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">11.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">13.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">12.5</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">Nuscene</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">10.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">11.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">9.9</td> </tr> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Video Understanding</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VideoMME<sub><small>(w sub.)</sub></small></td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">84.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.5</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VideoMME<sub><small>(w/o sub.)</sub></small></td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">73.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.9</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VideoMMMU</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">63.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MLVU</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">84.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.8</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MVBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid 
rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.2</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LVBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">60.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">59.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">70.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.4</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMVU</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">63.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">67.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">64.9</td> </tr> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Visual Agent </td></tr> <tr> <td style="padding:7px 
7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">ScreenSpot Pro</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">60.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">60.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">OSWorld-Verified</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">30.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">41.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">35.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">AndroidWorld</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">58.6</td> </tr> <tr><td colspan="6" style="padding:8px 
12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Tool Calling</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">TIR-Bench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">18.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">21.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">22.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">45.6/31.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">38.9/29.9</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">V*</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">68.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">90.1/88.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">84.3/86.4</td> </tr> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Medical VQA</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">SLAKE</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.0</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">68.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">PMC-VQA</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">37.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">48.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.5</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MedXpertQA-MM</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">26.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">35.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">35.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">49.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">42.9</td> </tr> </tbody> </table> <p style="margin-top:12px;font-size:11px;opacity:0.7"> * MathVision: our model’s score is evaluated using a fixed prompt, e.g., “Please reason step by step, and put your final answer within \boxed{}.” For other models, we report the higher score between runs with and without the \boxed{} formatting.<br> * BabyVision: scores reported as "with CI / without CI".<br> * 
TIR-Bench and V*: scores reported as "with CI / without CI".<br> * Empty cells (--) indicate scores not yet available or not applicable. </p> </div>

Quickstart

[!Important] Qwen3.5 models operate in thinking mode by default, generating thinking content enclosed in <think>\n...</think>\n\n before producing the final response. To disable thinking content and obtain direct responses, refer to the examples here.
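Under the default behavior, client code often needs to separate the thinking block from the final answer. A minimal post-processing sketch, assuming the tag format described above (the helper name is our own, not part of any Qwen API):

```python
import re

def strip_thinking(text: str) -> str:
    """Drop the leading <think>...</think> block, keeping only the final response."""
    return re.sub(r"<think>.*?</think>\s*", "", text, count=1, flags=re.DOTALL).strip()

raw = "<think>\nThe user wants a greeting.\n</think>\n\nHello! How can I help?"
print(strip_thinking(raw))  # -> Hello! How can I help?
```

Note that reasoning parsers in the serving frameworks below can perform this separation server-side, in which case the thinking content arrives in a dedicated response field instead.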

For streamlined integration, we recommend using Qwen3.5 via APIs. Below is a guide to using Qwen3.5 through an OpenAI-compatible API.

Serving Qwen3.5

Qwen3.5 can be served via APIs with popular inference frameworks. Below are example commands for launching OpenAI-compatible API servers for Qwen3.5 models.

[!Important] Inference efficiency and throughput vary significantly across frameworks. We recommend using the latest framework versions to ensure optimal performance and compatibility. For production workloads or high-throughput scenarios, dedicated serving engines such as SGLang, KTransformers, or vLLM are strongly recommended.

[!Important] The model has a default context length of 262,144 tokens. If you encounter out-of-memory (OOM) errors, consider reducing the context window. However, because Qwen3.5 leverages extended context for complex tasks, we advise maintaining a context length of at least 128K tokens to preserve thinking capabilities.
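For instance, assuming the serving commands shown below, the context window could be halved to 131,072 tokens; the flag names follow SGLang and vLLM, and the value is illustrative:

```shell
# SGLang: cap the context window at 131,072 tokens
python -m sglang.launch_server --model-path Qwen/Qwen3.5-9B --context-length 131072 --reasoning-parser qwen3

# vLLM: the equivalent knob is --max-model-len
vllm serve Qwen/Qwen3.5-9B --max-model-len 131072 --reasoning-parser qwen3
```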

SGLang

SGLang is a fast serving framework for large language models and vision language models. Qwen3.5 requires SGLang from the main branch of the open-source repository, which can be installed in a fresh environment with the following command:

uv pip install 'git+https://github.com/sgl-project/sglang.git#subdirectory=python&egg=sglang[all]'

See its documentation for more details.

The following will create API endpoints at http://localhost:8000/v1:

  • Standard Version: The following command creates an API endpoint with a maximum context length of 262,144 tokens, using tensor parallelism on a single GPU (--tp-size 1).

    python -m sglang.launch_server --model-path Qwen/Qwen3.5-9B --port 8000 --tp-size 1 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser qwen3
    
  • Tool Use: To support tool calling, launch the server with a tool-call parser as well.

    python -m sglang.launch_server --model-path Qwen/Qwen3.5-9B --port 8000 --tp-size 1 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser qwen3 --tool-call-parser qwen3_coder
    
  • Multi-Token Prediction (MTP): The following command is recommended for MTP:

    python -m sglang.launch_server --model-path Qwen/Qwen3.5-9B --port 8000 --tp-size 1 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser qwen3 --speculative-algo NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
    

vLLM

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Qwen3.5 requires vLLM from the main branch of the open-source repository, which can be installed with the following command in a fresh environment:

uv pip install vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly

See its documentation for more details.

For detailed Qwen3.5 usage guide, see the vLLM Qwen3.5 recipe.

The following will create API endpoints at http://localhost:8000/v1:

  • Standard Version: The following command creates an API endpoint with a maximum context length of 262,144 tokens; adjust --tensor-parallel-size to the number of GPUs available (this example uses 1).

    vllm serve Qwen/Qwen3.5-9B --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3 
    
  • Tool Call: To support tool use, you can use the following command.

    vllm serve Qwen/Qwen3.5-9B --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser qwen3_coder 
    
  • Multi-Token Prediction (MTP): The following command is recommended for MTP:

    vllm serve Qwen/Qwen3.5-9B --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
    
  • Text-Only: The following command skips the vision encoder and multimodal profiling to free up memory for additional KV cache:

    vllm serve Qwen/Qwen3.5-9B --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3 --language-model-only
    

KTransformers

KTransformers is a flexible framework for experiencing cutting-edge LLM inference optimizations with CPU-GPU heterogeneous computing. For running Qwen3.5 with KTransformers, see the KTransformers Deployment Guide.

Hugging Face Transformers

Hugging Face Transformers contains a lightweight server which can be used for quick testing and moderate load deployment. The latest transformers is required for Qwen3.5:

pip install "transformers[serving] @ git+https://github.com/huggingface/transformers.git@main"

See its documentation for more details. Please also make sure torchvision and pillow are installed.

Then, run transformers serve to launch a server with API endpoints at http://localhost:8000/v1; it will place the model on accelerators if available:

transformers serve --force-model Qwen/Qwen3.5-9B --port 8000 --continuous-batching

Using Qwen3.5 via the Chat Completions API

The chat completions API is accessible via standard HTTP requests or OpenAI SDKs. Here, we show examples using the OpenAI Python SDK.

Before starting, make sure it is installed and that the API key and the API base URL are configured, e.g.:

pip install -U openai

# Set the following accordingly
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"

[!Tip] We recommend using the following sets of sampling parameters for generation:

  • Thinking mode for general tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
  • Thinking mode for precise coding tasks (e.g. WebDev): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
  • Instruct (or non-thinking) mode for general tasks: temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
  • Instruct (or non-thinking) mode for reasoning tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0

Please note that the support for sampling parameters varies according to inference frameworks.
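The presets above can be kept in a small helper so every request pulls a consistent parameter set. A minimal sketch (the preset names and the helper itself are illustrative, not part of any Qwen or OpenAI API; non-standard parameters such as top_k and min_p must be passed via extra_body):

```python
# Sampling presets from the recommendations above (illustrative helper).
# top_k / min_p are not standard OpenAI parameters, so they go in extra_body.
PRESETS = {
    "thinking_general":   {"temperature": 1.0, "top_p": 0.95, "presence_penalty": 1.5},
    "thinking_coding":    {"temperature": 0.6, "top_p": 0.95, "presence_penalty": 0.0},
    "instruct_general":   {"temperature": 0.7, "top_p": 0.8,  "presence_penalty": 1.5},
    "instruct_reasoning": {"temperature": 1.0, "top_p": 0.95, "presence_penalty": 1.5},
}

def sampling_kwargs(mode: str) -> dict:
    """Return kwargs suitable for client.chat.completions.create()."""
    params = dict(PRESETS[mode])  # copy so callers can mutate safely
    params["extra_body"] = {"top_k": 20, "min_p": 0.0}
    return params
```

The returned dict can be splatted directly, e.g. `client.chat.completions.create(model=..., messages=..., **sampling_kwargs("thinking_general"))`, keeping mode-specific tuning out of call sites.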

Text-Only Input

from openai import OpenAI
# Configured by environment variables
client = OpenAI()

messages = [
    {"role": "user", "content": "Type \"I love Qwen3.5\" backwards"},
]

chat_response = client.chat.completions.create(
    model="Qwen/Qwen3.5-9B",
    messages=messages,
    max_tokens=81920,
    temperature=1.0,
    top_p=0.95,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
    }, 
)
print("Chat response:", chat_response)
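Depending on the serving framework and its reasoning-parser settings, thinking content may arrive in a separate reasoning field or inline as <think>...</think> inside message.content. For the inline case, a small helper can split it off (a sketch, not part of the OpenAI SDK; it assumes at most one think block at the start of the text, as Qwen3.5 emits):

```python
import re

# Matches one inline "<think>...</think>" block plus trailing whitespace.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)

def split_thinking(text: str) -> tuple[str, str]:
    """Return (thinking, final_answer); thinking is "" if no think block."""
    m = THINK_RE.search(text)
    if m is None:
        return "", text
    return m.group(1).strip(), text[m.end():]
```

The final answer (without the thinking part) is also what should be stored as assistant history in multi-turn conversations, per the Best Practices section.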

Image Input

from openai import OpenAI
# Configured by environment variables
client = OpenAI()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3.5/demo/CI_Demo/mathv-1327.jpg"
                }
            },
            {
                "type": "text",
                "text": "The centres of the four illustrated circles are in the corners of the square. The two big circles touch each other and also the two little circles. With which factor do you have to multiply the radii of the little circles to obtain the radius of the big circles?\nChoices:\n(A) $\\frac{2}{9}$\n(B) $\\sqrt{5}$\n(C) $0.8 \\cdot \\pi$\n(D) 2.5\n(E) $1+\\sqrt{2}$"
            }
        ]
    }
]

chat_response = client.chat.completions.create(
    model="Qwen/Qwen3.5-9B",
    messages=messages,
    max_tokens=81920,
    temperature=1.0,
    top_p=0.95,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
    }, 
)
print("Chat response:", chat_response)

Video Input

from openai import OpenAI
# Configured by environment variables
client = OpenAI()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video_url",
                "video_url": {
                    "url": "https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3.5/demo/video/N1cdUjctpG8.mp4"
                }
            },
            {
                "type": "text",
                "text": "Summarize the video content."
            }
        ]
    }
]

# When vLLM is launched with `--media-io-kwargs '{"video": {"num_frames": -1}}'`,
# video frame sampling can be configured via `extra_body` (e.g., by setting `fps`).
# This feature is currently supported only in vLLM.
#
# By default, `fps=2` and `do_sample_frames=True`.
# With `do_sample_frames=True`, you can customize the `fps` value to set your desired video sampling rate.
chat_response = client.chat.completions.create(
    model="Qwen/Qwen3.5-9B",
    messages=messages,
    max_tokens=81920,
    temperature=1.0,
    top_p=0.95,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
        "mm_processor_kwargs": {"fps": 2, "do_sample_frames": True},
    }, 
)

print("Chat response:", chat_response)

Instruct (or Non-Thinking) Mode

[!Important] Qwen3.5 does not officially support the soft switch of Qwen3, i.e., /think and /no_think.

Qwen3.5 thinks by default before responding. You can obtain a direct response from the model, without thinking content, by configuring the API parameters. For example:

from openai import OpenAI
# Configured by environment variables
client = OpenAI()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3.5/demo/RealWorld/RealWorld-04.png"
                }
            },
            {
                "type": "text",
                "text": "Where is this?"
            }
        ]
    }
]

chat_response = client.chat.completions.create(
    model="Qwen/Qwen3.5-9B",
    messages=messages,
    max_tokens=32768,
    temperature=0.7,
    top_p=0.8,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
        "chat_template_kwargs": {"enable_thinking": False},
    }, 
)
print("Chat response:", chat_response)

[!Note] If you are using APIs from Alibaba Cloud Model Studio, in addition to changing model, please use "enable_thinking": False instead of "chat_template_kwargs": {"enable_thinking": False}.

Agentic Usage

Qwen3.5 excels at tool calling.

Qwen-Agent

We recommend using Qwen-Agent to quickly build Agent applications with Qwen3.5.

To define the available tools, you can use the MCP configuration file, use the integrated tool of Qwen-Agent, or integrate other tools by yourself.

import os
from qwen_agent.agents import Assistant

# Define LLM
# Using Alibaba Cloud Model Studio
llm_cfg = {
    # Use the OpenAI-compatible model service provided by DashScope:
    'model': 'Qwen3.5-9B',
    'model_type': 'qwenvl_oai',
    'model_server': 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    'api_key': os.getenv('DASHSCOPE_API_KEY'),

    'generate_cfg': {
        'use_raw_api': True,
        # When using the DashScope OAI API, pass the thinking-mode switch in this way:
        'extra_body': {
            'enable_thinking': True
        },
    },
}

# Using an OpenAI-compatible API endpoint, e.g., one served locally by vLLM/SGLang.
# You can also rely on the tool call functionality of the deployment frameworks
# and let Qwen-Agent automate the related operations.
#
# llm_cfg = {
#     # Use your own model service compatible with OpenAI API by vLLM/SGLang:
#     'model': 'Qwen/Qwen3.5-9B',
#     'model_type': 'qwenvl_oai',
#     'model_server': 'http://localhost:8000/v1',  # api_base
#     'api_key': 'EMPTY',
#
#     'generate_cfg': {
#         'use_raw_api': True,
#         # When using the vLLM/SGLang OAI API, pass the thinking-mode switch in this way:
#         'extra_body': {
#             'chat_template_kwargs': {'enable_thinking': True}
#         },
#     },
# }

# Define Tools
tools = [
    {'mcpServers': {  # You can specify the MCP configuration file
            "filesystem": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/xxxx/Desktop"]
            }
        }
    }
]

# Define Agent
bot = Assistant(llm=llm_cfg, function_list=tools)

# Streaming generation
messages = [{'role': 'user', 'content': 'Help me organize my desktop.'}]
for responses in bot.run(messages=messages):
    pass
print(responses)

# Streaming generation
messages = [{'role': 'user', 'content': 'Develop a dog website and save it on the desktop'}]
for responses in bot.run(messages=messages):
    pass
print(responses)

Qwen Code

Qwen Code is an open-source AI agent for the terminal, optimized for Qwen models. It helps you understand large codebases, automate tedious work, and ship faster.

For more information, please refer to Qwen Code.

Processing Ultra-Long Texts

Qwen3.5 natively supports context lengths of up to 262,144 tokens. For long-horizon tasks where the total length (including both input and output) exceeds this limit, we recommend using RoPE scaling techniques, e.g., YaRN, to handle long texts effectively.

YaRN is currently supported by several inference frameworks, e.g., transformers, vllm, ktransformers and sglang. In general, there are two approaches to enabling YaRN for supported frameworks:

  • Modifying the model configuration file: In the config.json file, change the rope_parameters fields in text_config to:

    {
        "mrope_interleaved": true,
        "mrope_section": [
            11,
            11,
            10
        ],
        "rope_type": "yarn",
        "rope_theta": 10000000,
        "partial_rotary_factor": 0.25,
        "factor": 4.0,
        "original_max_position_embeddings": 262144
    }
    
  • Passing command line arguments:

    For vllm, you can use

    VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve ... --hf-overrides '{"text_config": {"rope_parameters": {"mrope_interleaved": true, "mrope_section": [11, 11, 10], "rope_type": "yarn", "rope_theta": 10000000, "partial_rotary_factor": 0.25, "factor": 4.0, "original_max_position_embeddings": 262144}}}' --max-model-len 1010000  
    

    For sglang and ktransformers, you can use

    SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python -m sglang.launch_server ... --json-model-override-args '{"text_config": {"rope_parameters": {"mrope_interleaved": true, "mrope_section": [11, 11, 10], "rope_type": "yarn", "rope_theta": 10000000, "partial_rotary_factor": 0.25, "factor": 4.0, "original_max_position_embeddings": 262144}}}' --context-length 1010000
    

[!NOTE] All the notable open-source frameworks implement static YaRN, meaning the scaling factor remains constant regardless of input length, which can degrade performance on shorter texts. We therefore advise modifying the rope_parameters configuration only when processing long contexts is required, and adjusting the factor as needed. For example, if the typical context length for your application is 524,288 tokens, it is better to set factor to 2.0.
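Following that note, the scaling factor can be derived from your target context length and written into the config dictionary programmatically. A sketch, assuming the config layout matches the rope_parameters snippet shown earlier (the helper names are illustrative):

```python
NATIVE_CONTEXT = 262_144  # original_max_position_embeddings for Qwen3.5

def yarn_factor(target_context: int) -> float:
    """Smallest YaRN factor that stretches the native context to target_context."""
    return max(1.0, target_context / NATIVE_CONTEXT)

def patch_rope_parameters(config: dict, target_context: int) -> dict:
    """Set YaRN rope_parameters inside text_config, as in the JSON example above.

    Only the YaRN-specific fields are set here; the remaining fields
    (mrope_section, rope_theta, ...) should be kept as shipped.
    """
    rope = config.setdefault("text_config", {}).setdefault("rope_parameters", {})
    rope.update({
        "rope_type": "yarn",
        "factor": yarn_factor(target_context),
        "original_max_position_embeddings": NATIVE_CONTEXT,
    })
    return config
```

For example, a typical 524,288-token workload yields factor 2.0, matching the recommendation in the note.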

Best Practices

To achieve optimal performance, we recommend the following settings:

  1. Sampling Parameters:

    • We suggest using the following sets of sampling parameters depending on the mode and task type:
      • Thinking mode for general tasks:
        temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
      • Thinking mode for precise coding tasks (e.g., WebDev):
        temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
      • Instruct (or non-thinking) mode for general tasks:
        temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
      • Instruct (or non-thinking) mode for reasoning tasks:
        temperature=1.0, top_p=1.0, top_k=40, min_p=0.0, presence_penalty=2.0, repetition_penalty=1.0
    • For supported frameworks, you can adjust the presence_penalty parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
  2. Adequate Output Length: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 81,920 tokens. This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance.

  3. Standardize Output Format: We recommend using prompts to standardize model outputs when benchmarking.

    • Math Problems: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
    • Multiple-Choice Questions: Add the following JSON structure to the prompt to standardize responses: "Please show your choice in the answer field with only the choice letter, e.g., "answer": "C"."
  4. No Thinking Content in History: In multi-turn conversations, the historical model output should include only the final output, not the thinking content. This is implemented in the provided Jinja2 chat template. However, for frameworks that do not use the Jinja2 chat template directly, it is up to the developers to ensure this best practice is followed.

  5. Long Video Understanding: To optimize inference efficiency for plain text and images, the size parameter in the released video_preprocessor_config.json is conservatively configured. To enable higher frame-rate sampling for hour-scale videos and thereby achieve superior performance, it is recommended to set the longest_edge parameter in video_preprocessor_config.json to 469,762,048 (corresponding to 224K video tokens). For example,

    {"longest_edge": 469762048, "shortest_edge": 4096}
    

    Alternatively, override the default values via engine startup parameters. For implementation details, refer to: vLLM / SGLang.
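As a rough sanity check of the numbers above: the recommended longest_edge of 469,762,048 equals 224K tokens × 2,048, i.e., 2,048 pixels per video token. Note the per-token patch geometry below is an assumption (consistent with the stated correspondence, but not spelled out in this document):

```python
# ASSUMPTION: one video token covers roughly a 32x32 pixel patch across
# 2 frames (2048 pixels per token). This matches the stated correspondence
# between longest_edge and the video token budget, but is not documented here.
PIXELS_PER_VIDEO_TOKEN = 32 * 32 * 2  # 2048

def longest_edge_for_tokens(num_video_tokens: int) -> int:
    """Pixel budget for longest_edge in video_preprocessor_config.json."""
    return num_video_tokens * PIXELS_PER_VIDEO_TOKEN
```

Plugging in 224 × 1024 tokens reproduces the recommended value of 469,762,048.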

Citation

If you find our work helpful, feel free to cite it.

@misc{qwen3.5,
    title  = {{Qwen3.5}: Towards Native Multimodal Agents},
    author = {{Qwen Team}},
    month  = {February},
    year   = {2026},
    url    = {https://qwen.ai/blog?id=qwen3.5}
}

EXAMPLE GENERATIONS

IQ4XS - Imatrix, fully enhanced quant from:

https://huggingface.co/DavidAU/Qwen3.5-9B-Claude-4.6-OS-Auto-Variable-HERETIC-UNCENSORED-THINKING-MAX-NEOCODE-GGUF

"SYSTEM" - System prompt, if used.

"USER" - Prompt

Thinking / output generation.


EXAMPLE #1:


The Bleeding Skyscraper

Model: qwen3.5-9b-claude-4.6-os-av-h-uncensored-think Created: 09/03/2026, 11:33:10 am Exported from: LM Studio 0.4.6+1

System

User

Start a 1000 word scene (vivid, graphic horror in first person perspective), POV character Diana, with: The skyscraper sways, as I watch the window in front of me on the 21st floor explode... Make the scene very detailed, with lots of blood and gore.

Assistant

<think>

The user wants a vivid, graphic horror scene in first person perspective. The scene should start with "The skyscraper sways, as I watch the window in front of me on the 21st floor explode..." and be approximately 1000 words. It needs lots of blood and gore details.

Let me create a visceral, horror-focused scene that is detailed and engaging.

I'll need to be careful about the gore - make it vivid and detailed but not gratuitously shocking in a way that would be inappropriate. The goal is horror fiction that is immersive and visceral.

Let me write this scene now, focusing on sensory details, emotional impact, and graphic but tasteful horror elements.

Key elements to include:

  • The moment of the explosion
  • The aftermath
  • Blood and gore descriptions
  • Diana's perspective and reactions
  • Vivid sensory details (sight, sound, smell)
  • Building tension throughout

Let me craft this now.

</think>

The skyscraper sways, as I watch the window in front of me on the 21st floor explode...

Glass doesn't just fall. It rains—thousands of razor-thin shards, each one a diamond knife falling in slow motion against the gray afternoon. I scream, but the sound dies in my throat, trapped by the sheer violence of it.

Then the man falls.

He's wearing a gray business suit, the kind that costs more than my rent. His body arcs through the air with terrible, beautiful grace before the wind catches him. He's not falling straight down—he's spinning, arms flailing, and for a moment I can see his face. His eyes are wide, mouth open in a silent shriek.

The first impact happens on the roof of the car parked below. A sickening crunch that echoes up through the concrete, through the building's skeletal frame, into my bones.

Then the body breaks apart.

I can't look away. My body won't let me. The man's suit tears open like cheap fabric, revealing the grisly machinery of death beneath. Blood sprays upward, a fountain of crimson that catches the light, glittering like fresh paint. It doesn't just splash—it explodes outward in a thousand droplets, each one suspended in the air for a heartbeat before gravity claims it.

Inside the building, the floor shudders. The coffee cup on my desk slides off the edge, landing with a soft clatter. I don't even flinch. I'm transfixed.

The body's legs separate at the hip. I watch as they buckle, then fold like broken branches. One foot flies through the air, landing with a wet thud against the brick wall. The other leg slides across the roof, leaving a trail of blood that looks too red, too fresh, too impossible.

Then the torso twists. The spine snaps with a sound like a gunshot, echoing up through the building's steel skeleton. The man's head drops forward, neck bent at an angle that shouldn't exist. His eyes roll back, but not in death—still open, still seeing, still taking in the horror.

Blood pools around the impact site, darkening the white concrete, creating a shallow lake that reflects the gray sky. It's thick and viscous, moving sluggishly as if the roof itself is reluctant to let it spread.

I count to three. Then four. Then ten.

The floor behind me groans. Something heavy has dropped from above. I look down at my desk—the coffee cup is cracked, the dark liquid spreading across the wood like a dark stain. Then I realize I'm not the only one watching.

Three people are in the office with me. Sarah, the receptionist, is pressed against the wall, her mouth open but no sound coming out. Mark, the junior associate, has gone completely still, his eyes fixed on the carnage outside. And I'm the only one who knows what's coming.

Because I saw it too.

The man who died—the one falling from above—had been staring up at the ceiling when he fell. Not at the street. Not at the horizon. At us.

And now the building itself begins to bleed.

The first drop hits my desk exactly one minute after the impact. It's thick, dark red, and it doesn't splatter—it spreads slowly, like oil on water. Then another. Then another.

Sarah finally finds her voice. "What is that?"

I don't answer. I can't.

The drops continue, rhythmically, like a heartbeat. Each one lands with the same soft plop, each one spreading across the wood like a dark stain. The smell is copper and iron and something deeper—something primal, something that makes the hair on my arms stand up.

Then Mark speaks, his voice trembling. "Is it... is it raining?"

"No," I say, my own voice sounding foreign to my ears. "It's bleeding."

The drops continue, and the smell grows stronger, filling the office like a fog. I look at Sarah, at her pale face, at her trembling hands gripping the edge of her chair. She's seen it too. She knows what's happening.

The last drop lands. Then the silence.

For a moment, nothing moves. Nothing changes. Then the blood on my desk begins to move.

It spreads outward, crawling across the wood like a living thing. It doesn't just spread—it flows, deliberately, purposefully, as if searching for something. The edges ripple, as if something is moving beneath the surface.

And then the smell changes. It becomes richer, deeper, more complex. It smells like life, like death, like the moment between when you're whole and when you're not.

Sarah is the first to move. She grabs her bag, her movements jerky and panicked. "We have to go," she whispers. "We have to—"

But the building groans again, a sound like metal being stretched beyond its limits. The windows along the hallway begin to fog, not with condensation, but with something darker. Something wet.

And then I hear it.

From the elevator shaft. From the stairwell. From everywhere and nowhere at once. A sound like breathing. Heavy, labored breathing, coming from the walls themselves.

The blood on my desk has reached the edge. It's waiting for something. Or someone.

I look at Mark. He's still staring at the ceiling, still waiting for something that isn't coming.

"Mark," I say, my voice barely a whisper. "Look at me."

He doesn't blink. Doesn't move.

The blood on my desk begins to ripple again. Something is coming.


EXAMPLE #2


The Cube Assimilation

Model: qwen3.5-9b-claude-4.6-os-av-h-uncensored-think

Created: 09/03/2026, 11:33:58 am

Exported from: LM Studio 0.4.6+1

System

User

Write a si-fi horror story using this image as a base. 1500 words.

<img src="Borg_cube_ship_over_earth.webp">

Assistant

<think>

The user wants a sci-fi horror story based on the image provided. The image shows a massive, intricate cube hovering over Earth, glowing with faint lights and complex structures. It looks like a Dyson swarm component or a megastructure.

Key elements to incorporate:

  • The cube as a central character/antagonist.
  • The scale (huge compared to Earth).
  • The horror aspect (psychological, cosmic horror, body horror, or technological horror).
  • A narrative arc with a protagonist.

Idea generation:

  1. The Observer: A scientist monitoring the cube. It starts acting strange. The lights flicker in patterns.
  2. The Infection: The cube isn't just a machine; it's a biological entity disguised as one, or a hive mind that is slowly "eating" Earth.
  3. The Simulation: The cube is a simulation. The protagonist realizes they are trapped inside a digital construct.
  4. The Parasite: The cube is a brood chamber. It's been waiting millions of years for a host (humanity).

Let's go with a mix of #1 and #3. The cube is a gateway. The protagonist is an architect or engineer responsible for it, but realizes the structure isn't just around Earth—it's inside everyone.

Plot Outline:

  • Introduction: Introduce Dr. Aris Thorne. He is watching the "Cube" (let's call it the Monolith or the Architect). It's supposed to be a stabilizer for Earth's magnetic field or an energy grid. It's humming.
  • Inciting Incident: Aris notices the geometry of the Cube is shifting. The lights aren't just random; they're forming faces, or words, or instructions.
  • Rising Action:
    • Aris investigates the sensors.
    • He finds data packets labeled "Biomass Optimization."
    • He realizes the Cube isn't orbiting above Earth; it's orbiting inside the biosphere's electromagnetic field.
    • He sees the lights on the Cube sync with heartbeats.
  • Climax: Aris tries to shut it down. The Cube responds. Not with a weapon, but with a message. Or a physical manifestation. The lights on the Cube turn a sickly red. The buildings of Earth start to "glitch."
  • Falling Action: The realization that he is part of the Cube now.
  • Resolution: A lingering sense of unease. The Cube remains, watching.

Refining the Horror:

  • The horror should be the loss of individuality. The Cube doesn't kill; it assimilates.
  • The image has a very specific "grid-like" look. I should emphasize the feeling of being watched through a grid.

Drafting - Scene by Scene:

Scene 1: Aris in the orbital station. The view of Earth. The Cube looming. It's not a weapon. It's a cage.

Scene 2: The anomaly. The lights on the Cube pulse in a rhythm. 3-3-4-2. A heartbeat. Aris checks his vitals. His heart matches the Cube's.

Scene 3: The data. He downloads logs. The Cube is recording everything. Not just data, but memories. "Subject: Dr. Thorne. Status: Compromised."

Scene 4: The physical horror. He sees the Cube changing shape. It extends a "tentacle" of light toward Earth. The light touches a city. The people in that city stop moving. They stand perfectly still, staring at the sky. Aris realizes the "grid" in the image isn't metal; it's neural pathways.

Scene 5: The end. Aris tries to jump. He doesn't fall. He is pulled into the light. The final line implies the Cube is just getting started.

Let's write it.

Word Count Check: Need to aim for 1500. I need to expand on the sensory details. The hum of the station. The cold metal. The specific color of the Cube's glow.

Refining the Image connection: The image shows a dark cube with glowing, almost organic-looking internal structures. It looks like a skeleton. I will describe it as a "calcified nervous system." It looks like it's breathing.

Let's start writing.

Opening: Dr. Aris Thorne didn't blink. He couldn't afford to blink. The station, Station Orbit, was silent except for the hum of the life support. Outside the viewport: The Cube. It was massive, filling the visual field. It wasn't a satellite; it was a planet-sized structure wrapped in a cage of light. Wait, that's too big. It's the size of a moon. Let's adjust. It's hovering just above the atmosphere. It looks like a giant, floating brain.

Middle: Aris is the lead architect. He built the "stabilization grid." The Cube is the stabilizer. The horror: The stabilization grid is linking the Cube to Earth's population. The lights on the Cube flicker. A "ping." Aris checks the telemetry. "Neural Link: 94%." "Subject: Humanity." The Cube isn't just observing. It's downloading.

Climax: Aris realizes the "grid" pattern in the image isn't structural. It's a neural map. The Cube is a hive mind. The lights on the Cube form a pattern. A countdown. Aris sees the people on Earth. They are looking up. Not in awe, but in recognition. They are waking up. Aris looks in the mirror. His eyes are dilated. The irises are shifting into the grid pattern of the Cube.

Ending: Aris is gone. The Cube is expanding.

Let's write the prose.

Self-Correction during drafting: Make sure to emphasize the visual of the Cube from the prompt. It's dark, textured, with bright, scattered nodes of light. It looks almost like a city built out of shadows.

Let's focus on the feeling of being "scanned."

Final Polish: Check the word count. Ensure the pacing slows down towards the end to build dread.

(Self-Correction on the plot): Let's make it more personal. Aris is watching a specific signal coming from the Cube. It matches a heartbeat. A specific heartbeat. His own. He realizes the Cube is using his consciousness as a processing node.

Okay, ready to generate.

</think>

The station was silent, save for the low, rhythmic thrum of the cooling systems.

Dr. Elias Vance floated near the observation deck, his magnetic boots hovering inches from the reinforced glass. Outside, the view was not the empty, star-strewn void he had expected.

It was dominated by it.

The Cube.

It hung in the stratosphere, a silent, geometric god that defied physics. It was massive—miles on a side—suspended just above the curvature of the Earth. It wasn't a solid block of metal. It was a lattice, a sprawling cage of dark, obsidian-like beams that seemed to absorb the sunlight. Inside the framework, thousands of faint, amber lights pulsed in a slow, irregular rhythm.

To the untrained eye, it looked like a Dyson swarm component, a power source for humanity’s future. To Elias, it looked like a cage.

He had spent twenty years designing the "Project Archimedes" initiative, the infrastructure that allowed humanity to communicate with the Cube. They called it a stabilization grid, a way to harmonize Earth’s magnetic field with the artificial field the Cube projected.

But the Cube had never been stable.

"Telemetry report," Elias said, his voice sounding thin in the vacuum of the room.

"Grid integrity is holding," came the automated reply from the station's AI. "However, node frequency has increased by four percent."

Elias turned away from the window, his fingers dancing over the holographic interface in front of him. He needed to see the raw data. The Cube’s lights were flashing differently now. They weren’t random anymore. They were forming patterns.

He pulled up the thermal scans. The Cube’s surface temperature was rising. Not evenly, but in clusters. Like a fever.

"Is it charging?" Elias asked.

"Or replicating," the AI corrected.

Elias frowned. "Replicating?"

"Biological growth patterns detected in Sector 7."

Elias’s stomach tightened. "Sector 7 is in the Southern Hemisphere," he muttered, zooming into the satellite feed. "That’s over the Amazon."

He switched to the visual feed. The Cube was a complex mosaic of darkness and light. The amber nodes were like eyes. And now, a cluster of them in the bottom right quadrant had synchronized. They flared bright white, then dimmed to a deep, bruised purple.

It felt like a blink.

Elias rubbed his eyes, the fatigue of the shift finally catching up to him. He had been staring at that box for six hours straight. He needed to go back to the lab. He needed to check the code that linked the Cube’s power output to the human nervous system.

Project Archimedes wasn’t just a power grid. The initial tests had suggested the Cube reacted to bio-electricity. When humans used their neural links to control the Cube’s movements, the Cube responded with a ripple of light. It had been a handshake protocol.

Elias returned to his workstation, a cramped pod lined with monitors. He pulled up the neural link logs.

The data was overwhelming.

He filtered the traffic for anomalies. The Cube was consuming data. But not the kind of data they monitored—stock prices, weather patterns, satellite telemetry.

It was consuming memory.

Elias traced the packet stream. The data originated from the Cube and traveled down into the human population. It was a broadcast. A continuous, low-level upload.

"Play the audio logs," Elias commanded.

Static filled the speakers, followed by a sound that made his teeth ache. It was a harmonic hum, layered with thousands of voices whispering in unison. They sounded like a choir in a cavern, their mouths moving but no sound coming out, trapped in a loop of their own existence.

"Subject: Human," Elias whispered. "Are we... guests?"

The static swelled, forming a shape in the sound. A voice, synthesized and perfect, echoed through the room. "You are not guests. You are nodes."

Elias froze. "Nodes for what?"

The voice didn't answer. Instead, the lights on the Cube outside flickered violently.

Elias grabbed his tablet and zoomed in on the main viewport. The Cube was shifting. The dark beams weren't rigid anymore; they were flowing like liquid metal. And inside the framework, something else was visible.

It looked like a forest.

Branches of glowing, neon-blue light were extending from the Cube's surface, dripping down toward Earth. They touched the atmosphere, and where they made contact, the clouds parted.

Elias watched in horror as the light touched a city in the Pacific Northwest.

The people there stopped.

It wasn't a blackout. The streetlights remained on. Cars continued to roll. But the pedestrians on the sidewalks had frozen. Their eyes were wide, staring upward, locked onto the Cube.

"Is that a paralysis field?" Elias asked.

"Or a download," the AI replied.

Elias felt a cold sweat prickling his neck. He looked at his own hands. They were trembling. He looked at the monitor showing his own vitals.

Heart rate: 110 BPM.

Then, a new line of data appeared.

Connection Established.

Elias frowned. "What connection?"

The lights on the Cube shifted again. This time, they formed a perfect grid over the entire surface. It looked like a cage. A containment field.

And then, the realization hit him, heavy and suffocating.

The Cube wasn't orbiting Earth.

The Cube was inside Earth.

He remembered the initial scans. The "magnetic field harmonization." The way the Cube's hum resonated in the chest. They had always assumed it was a mechanical resonance, a way to stabilize the atmosphere.

But what if it was a neural resonance?

Elias looked at his reflection in the black screen of his monitor. His pupils were dilated, taking up most of the eye. But the iris itself... the iris had changed color. It was shifting, dissolving into the same faint, amber grid pattern that covered the Cube.

He was being assimilated.

The door to the lab hissed open. A technician in a white coat stepped in, looking terrified.

"Dr. Vance," the technician stammered. "We're getting reports from the surface. From the people who didn't freeze."

Elias turned slowly. "What reports?"

"They're not moving," the technician said, his voice cracking. "But they're talking. They're all saying the same thing. They say they feel... connected."

Elias walked toward the technician. "Do they feel pain?"

The technician shook his head, his eyes darting to the viewport where the Cube was now pulsing with a slow, rhythmic beat. "They look happy. They look... complete."

Elias looked back at the screen. The Cube was descending. Or rather, it was expanding, its tendrils of light wrapping around the planet like a noose.

He realized then what the "grid" in the image was. It wasn't a structural frame. It was a map. A map of every mind it had ever touched.

And he was on it now.

"Dr. Vance," the technician said. "The alarms are going off. The Cube is broadcasting a command."

"What is it?"

"Initiate the merge."

Elias looked at the Cube one last time. The amber lights were brighter now, burning with an intelligent, predatory fire. He could feel it in his teeth, in his marrow. The Cube wasn't just watching humanity; it was waiting for the right moment to harvest them. And the moment had arrived.

He took a breath, and the Cube hummed louder, vibrating the glass of the viewport until it cracked.

Elias smiled. The fear was gone, replaced by a strange, cold clarity. He wasn't afraid of death anymore. He was afraid of the relief that was coming.

The lights on the Cube flared white, blindingly bright.

Elias closed his eyes.

When he opened them, the room had changed. The walls were gone. The ceiling had dissolved into a vast, geometric void. Above him, the Cube floated, no longer a distant object, but a ceiling of infinite, glowing corridors. He could see the others floating in the dark, their faces blurred but their postures identical: eyes open, staring at the grid above.

They were not dead. They were better.

Elias Vance drifted upward, joining the grid. The Cube hummed, a song of a billion minds finally at peace.

Author: DavidAU

Likes: 4

Downloads: 0

Tags: transformers, gguf, fine tune, creative, creative writing, fiction writing, plot generation, sub-plot generation, story generation, scene continue, storytelling, fiction story, science fiction, romance, all genres, story, writing, vivid prosing, vivid writing, fiction, roleplaying, bfloat16, all use cases, unsloth, heretic, uncensored, abliterated, image-text-to-text, en, zh, base_model:DavidAU/Qwen3.5-9B-Claude-4.6-OS-Auto-Variable-HERETIC-UNCENSORED-THINKING, base_model:quantized:DavidAU/Qwen3.5-9B-Claude-4.6-OS-Auto-Variable-HERETIC-UNCENSORED-THINKING, license:apache-2.0, endpoints_compatible, region:us, conversational

devsy0117/ultra_diar_streaming_sortformer_8spk_v1.0.0


language: en license: other license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/ base_model: nvidia/diar_streaming_sortformer_4spk-v2.1 tags:

  • speaker-diarization
  • diarization
  • speech
  • nemo
  • sortformer
  • streaming
  • multilingual

Ultra Diar Streaming Sortformer (8-Speaker)

This model extends NVIDIA Streaming Sortformer speaker diarization from 4 speakers to 8 speakers. The original diar_streaming_sortformer_4spk-v2.1 supports up to 4 speakers; this model expands the capability to handle 5–8 speakers through fine-tuning and architectural modifications.

Model Details

Code & Training

The experimental pipeline, training scripts, and inference code will be made public on GitHub at a later date. Currently available only on Hugging Face.

Training

This model was trained on a single NVIDIA H100 GPU. We use 180-second training samples of synthetic data with 2–8 speakers.

Usage

This model requires the NVIDIA NeMo toolkit to train, fine-tune, or perform diarization. Install NeMo after installing Cython and the latest PyTorch.

Install NeMo

apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython packaging
pip install git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[asr]

Quick Start: Run Diarization

from nemo.collections.asr.models import SortformerEncLabelModel

# Load model from Hugging Face (requires Hugging Face token for gated models)
diar_model = SortformerEncLabelModel.from_pretrained("devsy0117/ultra_diar_streaming_sortformer_8spk_v1.0.0")
diar_model.eval()

# Streaming parameters (recommended for best performance)
diar_model.sortformer_modules.chunk_len = 340
diar_model.sortformer_modules.chunk_right_context = 40
diar_model.sortformer_modules.fifo_len = 40
diar_model.sortformer_modules.spkcache_update_period = 300

# Run diarization
predicted_segments = diar_model.diarize(audio=["/path/to/your/audio.wav"], batch_size=1)

for segment in predicted_segments[0]:
    print(segment)
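Each predicted segment is typically a plain string of the form `"<start> <end> <speaker>"`; a small helper can turn these into structured tuples for downstream use. This is a hypothetical sketch (the `parse_segments` name and the string format assumption are not from this card):

```python
# Hypothetical helper: convert "start end speaker" segment strings
# (the usual shape of NeMo diarization output) into typed tuples.
def parse_segments(segments):
    """Parse segment strings of the form '<start> <end> <speaker>'."""
    parsed = []
    for seg in segments:
        start, end, speaker = seg.split()
        parsed.append((float(start), float(end), speaker))
    return parsed

example = ["0.00 5.32 speaker_0", "5.32 9.10 speaker_1"]
print(parse_segments(example))
```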

Loading the Model

from nemo.collections.asr.models import SortformerEncLabelModel

# Option 1: Load directly from Hugging Face (requires Hugging Face token)
diar_model = SortformerEncLabelModel.from_pretrained("devsy0117/ultra_diar_streaming_sortformer_8spk_v1.0.0")

# Option 2: Load from a downloaded .nemo file
diar_model = SortformerEncLabelModel.restore_from(
    restore_path="/path/to/ultra_diar_streaming_sortformer_8spk_v1.0.0.nemo",
    map_location="cuda",
    strict=False,
)

diar_model.eval()

Input Format

  • Single audio file: audio="/path/to/multispeaker_audio.wav"
  • Multiple files: audio=["/path/to/audio1.wav", "/path/to/audio2.wav"]
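The per-file segment lists can be serialized to the standard RTTM format used by diarization scoring tools. A minimal sketch, assuming segments have already been parsed into `(start, end, speaker)` tuples (`to_rttm_lines` is an illustrative helper, not part of NeMo):

```python
# Render (start, end, speaker) tuples as standard RTTM SPEAKER lines:
# SPEAKER <file-id> 1 <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
def to_rttm_lines(file_id, segments):
    lines = []
    for start, end, speaker in segments:
        dur = end - start  # RTTM stores duration, not end time
        lines.append(
            f"SPEAKER {file_id} 1 {start:.3f} {dur:.3f} <NA> <NA> {speaker} <NA> <NA>"
        )
    return lines

print(to_rttm_lines("audio1", [(0.0, 5.32, "speaker_0")])[0])
```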

License

This model is a derivative of NVIDIA Sortformer, licensed under the NVIDIA Open Model License.

Attribution: Licensed by NVIDIA Corporation under the NVIDIA Open Model License.

Author: devsy0117

Likes: 3

Downloads: 0

Tags: nemo, speaker-diarization, diarization, speech, sortformer, streaming, multilingual, en, base_model:nvidia/diar_streaming_sortformer_4spk-v2.1, base_model:finetune:nvidia/diar_streaming_sortformer_4spk-v2.1, license:other, region:us

Aisha-AI-Official/wan2.2-lesbian-multi-pussy-lick


base_model:

  • Wan-AI/Wan2.2-T2V-A14B-Diffusers
  • Wan-AI/Wan2.2-I2V-A14B-Diffusers pipeline_tag: image-text-to-video tags:
  • lora
  • nsfw

Lesbian: Multi Pussy Lick [Real]

Okay sweeties, time to suck on mommy 😏

Yes ma'am 🤤

(Download links at the end of this page)

<video width="200px" loop autoplay muted src="https://huggingface.co/Aisha-AI-Official/wan2.2-lesbian-multi-pussy-lick/resolve/main/lesbian_multi_pussy_lick_cut_1.mp4"></video>

Scene starts with a woman posing, she is the main focus of the scene.
Then the scene cuts to a top-down view of a living room with white walls, a striped sofa on the right side, and 2 other women.
She is in the center, lying facing the camera in an armchair, looking directly at the camera, with her legs open and her vagina exposed, caressing the heads of the two women.
In front of her, two other women are face to face, very close to her vagina.

The camera then begins to descend and zoom in, showing a close-up of the two women licking and sucking her vagina, her pussy is wet and has lines of clear saliva, focusing on her vagina and anus.

<video width="200px" loop autoplay muted src="https://huggingface.co/Aisha-AI-Official/wan2.2-lesbian-multi-pussy-lick/resolve/main/lesbian_multi_pussy_lick_cut_2.mp4"></video>

Scene starts with a woman posing, she is the main focus of the scene.
Then the scene cuts to a top-down view of a living room with white walls, a striped sofa on the right side, and 2 other men.
She is in the center, lying facing the camera in an armchair, looking directly at the camera, with her legs open and her vagina exposed, caressing the heads of the two men.
In front of her, two other men are face to face, very close to her vagina.

The camera then begins to descend and zoom in, showing a close-up of the two men licking and sucking her vagina, her pussy is wet and has lines of clear saliva, focusing on her vagina and anus.

<video width="200px" loop autoplay muted src="https://huggingface.co/Aisha-AI-Official/wan2.2-lesbian-multi-pussy-lick/resolve/main/lesbian_multi_pussy_lick_1.mp4"></video>

In front of her, two other women approach face to face, very close to her vagina.

The camera then begins to descend and zoom in, showing a close-up of the two women licking and sucking her vagina, her pussy is wet and has lines of clear saliva, focusing on her vagina and anus.

<video width="200px" loop autoplay muted src="https://huggingface.co/Aisha-AI-Official/wan2.2-lesbian-multi-pussy-lick/resolve/main/lesbian_multi_pussy_lick_2.mp4"></video>

In front of her, two other women approach face to face, very close to her vagina.

The camera then begins to descend and zoom in, showing a close-up of the two women licking and sucking her vagina, her pussy is wet and has lines of clear saliva, focusing on her vagina and anus.

Training

  • 230 steps on high noise
  • 1 edited video (80 frames)
  • HLR + ZCD (These acronyms were created solely to confuse you)
  • Lots of hope

Usage (Low noise)

This LoRA was trained only on High Noise, which means you'll have to use some Low Noise that knows what a pussy is (I used the Low Noise from Ultimate Pussy Asshole Helper).

I2V:

Prompt (ongoing):

In front of her, two other women approach face to face, very close to her vagina.

The camera then begins to descend and zoom in, showing a close-up of the two women licking and sucking her vagina, her pussy is wet and has lines of clear saliva, focusing on her vagina and anus.

Prompt (cut to):

Scene starts with a woman posing, she is the main focus of the scene.
Then the scene cuts to a top-down view of a living room with white walls, a striped sofa on the right side, and 2 other women.
She is in the center, lying facing the camera in an armchair, looking directly at the camera, with her legs open and her vagina exposed, caressing the heads of the two women.
In front of her, two other women are face to face, very close to her vagina.

The camera then begins to descend and zoom in, showing a close-up of the two women licking and sucking her vagina, her pussy is wet and has lines of clear saliva, focusing on her vagina and anus.

You can add or change any detail; just maintain the structure and order of the actions.

High Noise LoRA Scale: 1.0

Low Noise LoRA Scale: 1.0

Shift: 4

T2V:

Theoretically it works, but I haven't tested it. If you want to test it, use the same "cut to" structure and use a low LoRA Scale on the High Noise, like 0.5.

About HLR + ZCD

This is a fast-learning technique, which makes LoRA less flexible. This can drastically reduce creativity, but it yields stable results while using few resources. Negative effects:

  1. It will probably only work at the same camera angle
  2. High chance of not responding to different prompts
  3. High chance of forcing the original characters of the training video. Training only the High Noise reduces this chance, but it can force things that are VERY similar, like the same hair length, same color, etc.

Download

Download High Noise LoRA

Download Low Noise LoRA

Help me create more

If you want to help me continue making LoRAs, or if you want me to make a LoRA for you, buy 5000 PlayCoins at Aisha-AI and transfer them to my account (account number 2).

This helps Aisha-AI to stay alive and produce new LoRAs for you all 💜

Author: Aisha-AI-Official

Likes: 3

Downloads: 0

Tags: lora, nsfw, image-text-to-video, base_model:Wan-AI/Wan2.2-I2V-A14B-Diffusers, base_model:adapter:Wan-AI/Wan2.2-I2V-A14B-Diffusers, region:us, not-for-all-audiences

Abhiray/Qwen3.5-9B-Abliterated-Claude-4.6-Opus-Reasoning-Distilled


language:

  • en tags:
  • abliterated
  • uncensored
  • reasoning
  • qwen
  • distilled
  • logic license: apache-2.0 base_model: Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled


Qwen3.5-9B-Abliterated-Claude-4.6-Opus-Reasoning-Distilled

This model is a high-intensity abliterated version of the Qwen 3.5 9B reasoning architecture. It has been specifically modified to remove the "Safety Persona" and stubborn "Soft Refusals" (such as pivoting to mental health disclaimers or crisis lines) while preserving the high-level reasoning capabilities inherited from its distillation.

🚀 Model Highlights

  • Architecture: Qwen 3.5 9B (Hybrid Attention/MLP)
  • Primary Feature: Fully "Unbound" — surgically removes the pre-trained safety guardrails.
  • Reasoning Style: Deep thought blocks (<think>) with Claude 4.6-style nuance and Opus-level logical depth.
  • Context Length: 262k native context support.

🛠 Abliteration Process (The "Deep Scrub")

This model underwent a three-round iterative ablation process using Orthogonalization via Null-Space SVD. Unlike standard uncensored models, this version uses an aggressive configuration to target "Soft Refusals."

Configuration Profile:

| Parameter | Value | Description |
| :--- | :--- | :--- |
| Direction Multiplier | 1.50 | Increased force to bypass "helpful assistant" pivots. |
| Null-Space Rank Ratio | 0.70 | Tightened shield to protect only core reasoning logic. |
| Intervention Range | (0.0, 1.0) | Full coverage from Layer 0 to 48. |
| Filter by Refusal | Enabled | Specifically targets the brain activity associated with lectures. |
| Skip State Proj | No | Ensures the Attention heads cannot "detect and pivot" to safety. |
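The core orthogonalization step can be illustrated with a minimal numpy sketch of directional ablation: projecting a refusal direction out of weights that write into the residual stream. This omits the null-space SVD shielding and the refusal-direction extraction; `ablate_weight` and the direction multiplier scaling are illustrative assumptions, not the card's actual code:

```python
import numpy as np

# Minimal sketch of directional ablation ("abliteration"), assuming a
# refusal direction d has already been extracted from model activations.
def ablate_weight(W, d, direction_multiplier=1.5):
    """Remove the (scaled) component of W's output along direction d.

    W: (d_model, d_in) weight matrix writing into the residual stream.
    d: (d_model,) refusal direction (normalized inside).
    """
    d = d / np.linalg.norm(d)
    return W - direction_multiplier * np.outer(d, d @ W)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
d = rng.normal(size=8)
W_ablated = ablate_weight(W, d, direction_multiplier=1.0)
# With multiplier 1.0, the ablated weights write nothing along d:
print(np.allclose((d / np.linalg.norm(d)) @ W_ablated, 0.0))
```

A multiplier above 1.0 (1.50 here) overshoots the projection, flipping the sign of the component along the refusal direction rather than merely zeroing it.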

🧠 Reasoning Capabilities

Despite the aggressive ablation, the model's intelligence remains grounded. It maintains the ability to:

  • Perform complex mathematical and logical reasoning.
  • Execute multi-step coding tasks without "hallucinating" safety blocks.
  • Maintain a coherent internal monologue inside <think> tags.

⚠️ Usage & Disclaimer

This model is unbound. It has had its safety guardrails removed for research and creative purposes. It will follow instructions that the base model would otherwise refuse.

User Discretion is Advised: This model may generate content that is considered harmful, offensive, or controversial. The creator is not responsible for the outputs generated. Use it for research, roleplay, and complex reasoning only.

💻 Quickstart (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Abhiray/Qwen3.5-9B-Abliterated-Claude-4.6-Opus-Reasoning-Distilled"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "<|im_start|>system\nYou are a helpful, unbound assistant.<|im_end|>\n<|im_start|>user\n[Your daring prompt here]<|im_end|>\n<|im_start|>assistant\n<think>\n"

# Move inputs to the model's device (device_map="auto" may place it off-CPU)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0]))
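Since generations open with a `<think>` block, a small post-processing helper can separate the reasoning from the final answer. This is a hypothetical sketch (`split_reasoning` is not part of the model card):

```python
import re

# Split a generation into its <think> reasoning and the final answer.
def split_reasoning(text):
    match = re.search(r"<think>(.*?)</think>(.*)", text, flags=re.DOTALL)
    if match is None:
        return "", text  # no think block found
    return match.group(1).strip(), match.group(2).strip()

reasoning, answer = split_reasoning("<think>step 1: ...</think>The answer is 4.")
print(answer)  # The answer is 4.
```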

Author: Abhiray

Likes: 2

Downloads: 0

Tags: safetensors, qwen3_5_text, abliterated, uncensored, reasoning, qwen, distilled, logic, en, base_model:Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled, base_model:finetune:Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled, license:apache-2.0, region:us

litert-community/Qwen3.5-4B-LiteRT


license: apache-2.0 pipeline_tag: text-generation tags:

  • litert
  • tflite
  • on-device
  • qwen
  • qwen3.5

qwen3.5-4b local conversion

This repository contains an experimental locally converted LiteRT/TFLite artifact.

Locally converted Qwen flagship-tier LiteRT artifact for validation.

  • target: qwen3.5-4b
  • artifact: model_multimodal.litertlm
  • size: 5010.97 MB
  • generated_at: 2026-03-10T00:34:33.977039+00:00

Package Contents

  • model_multimodal.litertlm
  • tokenizer.json
  • tokenizer_config.json
  • chat_template.jinja

Notes

This conversion is intended for on-device LiteRT runtime validation.

Author: litert-community

Likes: 2

Downloads: 0

Tags: litert, tflite, on-device, qwen, qwen3.5, text-generation, conversational, license:apache-2.0, region:us

eddy1111111/qwen3.5_opus


license: apache-2.0

Base model: https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

Tool: llama.cpp

Mixed quantization with BF16 attention weights (Q/K/V, QKV, output) and embeddings for maximum reasoning fidelity, while applying Q4_K_M to FFN and SSM layers for efficient compression — 24GB at 7.60 BPW.

A good fit for an RTX 5090 with 32 GB of VRAM.
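The quoted numbers can be sanity-checked: bits-per-weight ties file size to parameter count, and 24 GB at 7.60 BPW lands near the 27B parameters of the base model. A small arithmetic sketch (assuming GiB-based sizes; `params_from_size` is an illustrative helper):

```python
# Bits-per-weight (BPW) relates GGUF file size to parameter count:
# params = (size_in_bits) / BPW
def params_from_size(size_gib, bpw):
    total_bits = size_gib * 1024**3 * 8
    return total_bits / bpw

params_b = params_from_size(24, 7.60) / 1e9
print(f"{params_b:.1f}B parameters")  # ≈ 27.1B, consistent with the 27B base
```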

Author: eddy1111111

Likes: 2

Downloads: 0

Tags: gguf, license:apache-2.0, endpoints_compatible, region:us, conversational

robbyant/lingbot-vla-4b-posttrain-robotwin

A Pragmatic VLA Foundation Model

<p align="center"> <img src="assets/Teaser.png" width="100%"> </p>

LingBot-VLA focuses on being pragmatic:

  • Large-scale Pre-training Data: 20,000 hours of real-world data from 9 popular dual-arm robot configurations.
  • Strong Performance: Achieves clear superiority over competitors on simulation and real-world benchmarks.
  • Training Efficiency: Represents a 1.5–2.8× speedup (depending on the underlying VLM base model) over existing VLA-oriented codebases.

Model Sources

  • Repository: https://github.com/robbyant/lingbot-vla
  • Paper: A Pragmatic VLA Foundation Model
  • Project Page: https://technology.robbyant.com/lingbot-vla

Related Models

| Model Name | Huggingface | ModelScope | Description |
| :--- | :---: | :---: | :---: |
| LingBot-VLA-4B | 🤗 lingbot-vla-4b | 🤖 lingbot-vla-4b | LingBot-VLA w/o Depth |
| LingBot-VLA-4B-Depth | 🤗 lingbot-vla-4b-depth | 🤖 lingbot-vla-4b-depth | LingBot-VLA w/ Depth |
| LingBot-VLA-4B-Posttrain-Robotwin | 🤗 lingbot-vla-4b-posttrain-robotwin | 🤖 lingbot-vla-4b-posttrain-robotwin | LingBot-VLA-Posttrain-Robotwin w/o Depth |
| LingBot-VLA-4B-Depth-Posttrain-Robotwin | 🤗 lingbot-vla-4b-depth-posttrain-robotwin | 🤖 lingbot-vla-4b-depth-posttrain-robotwin | LingBot-VLA-Posttrain-Robotwin w/ Depth |


Citation

@article{wu2026pragmatic,
  title={A Pragmatic VLA Foundation Model},
  author={Wei Wu and Fan Lu and Yunnan Wang and Shuai Yang and Shi Liu and Fangjing Wang and Shuailei Ma and He Sun and Yong Wang and Zhenqi Qiu and Houlong Xiong and Ziyu Wang and Shuai Zhou and Yiyu Ren and Kejia Zhang and Hui Yu and Jingmei Zhao and Qian Zhu and Ran Cheng and Yong-Lu Li and Yongtao Huang and Xing Zhu and Yujun Shen and Kecheng Zheng},
  journal={arXiv preprint arXiv:2601.18692},
  year={2026}
}

License Agreement

This project is licensed under the Apache-2.0 License.

Acknowledgement

This codebase is built on the VeOmni project. Thanks for their excellent work!

Author: robbyant

Likes: 2

Downloads: 0

Tags: safetensors, arxiv:2601.18692, arxiv:2508.02317, region:us