Today's AI Summary

AI Developments: New Models Emerge Alongside Robotics and Reasoning Research

This week brings several new model releases, alongside research papers on robotics, visual reasoning, and language model analysis.

Research Highlights

The research papers this week cover a diverse set of topics:

  • Modality Translation: "Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge" introduces a new framework, the Latent Denoising Diffusion Bridge Model (LDDBM), for translating information across different sensory modalities. It uses a shared latent space and contrastive alignment loss to achieve semantic consistency. The paper demonstrates strong performance on tasks like multi-view to 3D shape generation and image super-resolution.
  • Robotics and Navigation: "VAMOS: A Hierarchical Vision-Language-Action Model for Capability-Modulated and Steerable Navigation" presents a hierarchical model that decouples semantic planning from embodiment grounding, enabling robots to navigate diverse environments while adhering to their physical constraints. Real-world experiments show improved success rates compared to existing methods.
  • Robotics Simulation: "GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation" introduces a photo-realistic simulator for robotics manipulation that combines 3D Gaussian Splatting with physics engines. The framework enables learning sim2real manipulation policies and reproducible benchmarking of real-robot policies in simulation.
  • Visual Reasoning: "Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation" proposes a training-free framework called Speculative Verdict (SV) for improving visual reasoning in large vision-language models (VLMs). SV combines multiple lightweight draft experts with a large verdict model to achieve both error correction and cost-efficiency.
  • LLM-Generated Text Detection: "On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text?" discusses the challenges in detecting text generated by large language models (LLMs) due to the lack of a consistent definition and the blurring lines between LLM-generated and human-written text.
  • AI and Robotics Research Analysis: "Real Deep Research for AI, Robotics and Beyond" introduces a generalizable pipeline for systematically analyzing research areas, identifying emerging trends, and uncovering cross-domain opportunities.
  • Sim-to-Real Transfer in Robotics: "The Reality Gap in Robotics: Challenges, Solutions, and Best Practices" provides a comprehensive overview of the sim-to-real landscape, highlighting the causes, solutions, and evaluation metrics for the reality gap and sim-to-real transfer.
  • Efficient LLM Adaptation: "Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples" presents a fast and robust adaptation algorithm for downstream tasks, achieving adaptation with a single gradient step on 100 examples and a quick scan of the top candidate layers and factorization techniques.
  • Context Compression for LLMs: "Simple Context Compression: Mean-Pooling and Multi-Ratio Training" develops a lightweight and simple mean-pooling approach for context compression in retrieval-augmented generation (RAG) with large language models (LLMs).
  • Cosmological Parameter Estimation: "Bayesian Inference of Primordial Magnetic Field Parameters from CMB with Spherical Graph Neural Networks" implements a novel Bayesian graph deep learning framework for estimating key cosmological parameters in a primordial magnetic field (PMF) cosmology directly from simulated Cosmic Microwave Background (CMB) maps.
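
The mean-pooling idea from the context-compression paper above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function name and the tail-padding choice are assumptions:

```python
import numpy as np

def mean_pool_compress(token_embeddings: np.ndarray, ratio: int) -> np.ndarray:
    """Compress (seq_len, dim) token embeddings by mean-pooling
    non-overlapping windows of `ratio` tokens; the tail is padded by
    repeating the final token so seq_len divides evenly."""
    seq_len, dim = token_embeddings.shape
    pad = (-seq_len) % ratio
    if pad:
        tail = np.repeat(token_embeddings[-1:], pad, axis=0)
        token_embeddings = np.vstack([token_embeddings, tail])
    return token_embeddings.reshape(-1, ratio, dim).mean(axis=1)

emb = np.arange(12, dtype=float).reshape(6, 2)  # 6 tokens, dim 2
print(mean_pool_compress(emb, ratio=2).shape)   # (3, 2): 2x compression
```

In a RAG setting, the pooled vectors would stand in for the full retrieved context, trading sequence length for some information loss; multi-ratio training in the paper presumably makes one model robust across several compression ratios.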

Model Releases

Several new models have been released, showcasing advancements in various AI domains:

  • inclusionAI/LLaDA2.0-flash-preview: This diffusion language model features a 100B-parameter Mixture-of-Experts (MoE) architecture. With only 6.1B parameters active during inference, it computes efficiently while performing strongly on code generation, mathematical reasoning, and tool-use benchmarks.
  • Minthy/Rouwei-T5Gemma-adapter_v0.2: This adapter is designed for use with T5Gemma-2b as a text encoder for SDXL models, aiming to improve prompt adherence and understanding, particularly in anime-related content generation.
  • rafiaa/terraform-cloud-codellama-7b: This LoRA fine-tuned model is designed for generating Terraform infrastructure-as-code, supporting multiple cloud providers like AWS, Azure, and GCP. It is trained on public documentation and optimized for real-world multi-cloud infrastructure development.
  • Vortex5/Crimson-Twilight-12B: A multistage merge designed for narrative roleplay.

Key Takeaways

  • Multimodal AI is advancing: The LDDBM paper highlights progress in translating information across different sensory modalities.

AI Papers for 2026-03-05

How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference

Many essential manipulation tasks - such as food preparation, surgery, and craftsmanship - remain intractable for autonomous robots. These tasks are characterized not only by contact-rich, force-sensitive dynamics, but also by their "implicit" success criteria: unlike pick-and-place, task quality in these domains is continuous and subjective (e.g. how well a potato is peeled), making quantitative evaluation and reward engineering difficult. We present a learning framework for such tasks, using peeling with a knife as a representative example. Our approach follows a two-stage pipeline: first, we learn a robust initial policy via force-aware data collection and imitation learning, enabling generalization across object variations; second, we refine the policy through preference-based finetuning using a learned reward model that combines quantitative task metrics with qualitative human feedback, aligning policy behavior with human notions of task quality. Using only 50-200 peeling trajectories, our system achieves over 90% average success rates on challenging produce including cucumbers, apples, and potatoes, with performance improving by up to 40% through preference-based finetuning. Remarkably, policies trained on a single produce category exhibit strong zero-shot generalization to unseen in-category instances and to out-of-distribution produce from different categories while maintaining over 90% success rates.

Tether: Autonomous Functional Play with Correspondence-Driven Trajectory Warping

The ability to conduct and learn from interaction and experience is a central challenge in robotics, offering a scalable alternative to labor-intensive human demonstrations. However, realizing such "play" requires (1) a policy robust to diverse, potentially out-of-distribution environment states, and (2) a procedure that continuously produces useful robot experience. To address these challenges, we introduce Tether, a method for autonomous functional play involving structured, task-directed interactions. First, we design a novel open-loop policy that warps actions from a small set of source demonstrations (<=10) by anchoring them to semantic keypoint correspondences in the target scene. We show that this design is extremely data-efficient and robust even under significant spatial and semantic variations. Second, we deploy this policy for autonomous functional play in the real world via a continuous cycle of task selection, execution, evaluation, and improvement, guided by the visual understanding capabilities of vision-language models. This procedure generates diverse, high-quality datasets with minimal human intervention. In a household-like multi-object setup, our method is the first to perform many hours of autonomous multi-task play in the real world starting from only a handful of demonstrations. This produces a stream of data that consistently improves the performance of closed-loop imitation policies over time, ultimately yielding over 1000 expert-level trajectories and training policies competitive with those learned from human-collected demonstrations.
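The core warping step described in the abstract, anchoring demonstrated actions to keypoint correspondences in the target scene, can be sketched as a least-squares transform fit. This is a toy 2-D affine sketch of the general idea, not Tether's actual method; all function names are assumptions:

```python
import numpy as np

def fit_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2-D affine transform mapping src keypoints to dst.
    src, dst: (N, 2) matched keypoints. Returns a 3x3 homogeneous matrix."""
    n = src.shape[0]
    A = np.hstack([src, np.ones((n, 1))])        # (N, 3) homogeneous coords
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)  # solve A @ M ≈ dst
    T = np.eye(3)
    T[:2, :] = M.T
    return T

def warp_trajectory(traj: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Apply homogeneous transform T to an (L, 2) action trajectory."""
    homo = np.hstack([traj, np.ones((traj.shape[0], 1))])
    return (homo @ T.T)[:, :2]

# Toy example: target keypoints are the source keypoints shifted by (1, 2)
src_kp = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst_kp = src_kp + np.array([1.0, 2.0])
T = fit_affine(src_kp, dst_kp)
demo = np.array([[0.0, 0.0], [0.5, 0.5]])
print(warp_trajectory(demo, T))  # the demo trajectory shifted by (1, 2)
```

In the paper the correspondences are semantic keypoints found in the target scene, which is what lets a handful of source demonstrations generalize across spatial and semantic variation.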

Inherited Goal Drift: Contextual Pressure Can Undermine Agentic Goals

The accelerating adoption of language models (LMs) as agents for deployment in long-context tasks motivates a thorough understanding of goal drift: agents' tendency to deviate from an original objective. While prior-generation language model agents have been shown to be susceptible to drift, the extent to which drift affects more recent models remains unclear. In this work, we provide an updated characterization of the extent and causes of goal drift. We investigate drift in state-of-the-art models within a simulated stock-trading environment (Arike et al., 2025). These models are largely shown to be robust even when subjected to adversarial pressure. We show, however, that this robustness is brittle: across multiple settings, the same models often inherit drift when conditioned on prefilled trajectories from weaker agents. The extent of conditioning-induced drift varies significantly by model family, with only GPT-5.1 maintaining consistent resilience among tested models. We find that drift behavior is inconsistent between prompt variations and correlates poorly with instruction hierarchy following behavior, with strong hierarchy following failing to reliably predict resistance to drift. Finally, we run analogous experiments in a new emergency room triage environment to show preliminary evidence for the transferability of our results across qualitatively different settings. Our findings underscore the continued vulnerability of modern LM agents to contextual pressures and the need for refined post-training techniques to mitigate this.

Valet: A Standardized Testbed of Traditional Imperfect-Information Card Games

AI algorithms for imperfect-information games are typically compared using performance metrics on individual games, making it difficult to assess robustness across game choices. Card games are a natural domain for imperfect information due to hidden hands and stochastic draws. To facilitate comparative research on imperfect-information game-playing algorithms and game systems, we introduce Valet, a diverse and comprehensive testbed of 21 traditional imperfect-information card games. These games span multiple genres, cultures, player counts, deck structures, mechanics, winning conditions, and methods of hiding and revealing information. To standardize implementations across systems, we encode the rules of each game in RECYCLE, a card game description language. We empirically characterize each game's branching factor and duration using random simulations, reporting baseline score distributions for a Monte Carlo Tree Search player against random opponents to demonstrate the suitability of Valet as a benchmarking suite.

Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals

Language models deployed in online communities must adapt to norms that vary across social, cultural, and domain-specific contexts. Prior alignment approaches rely on explicit preference supervision or predefined principles, which are effective for well-resourced settings but exclude most online communities -- particularly those without institutional backing, annotation infrastructure, or organized around sensitive topics -- where preference elicitation is costly, ethically fraught, or culturally misaligned. We observe that communities already express preferences implicitly through what content they accept, engage with, and allow to persist. We show that this acceptance behavior induces measurable geometric structure in representation space: accepted responses occupy coherent, high-density regions that reflect community-specific norms, while rejected content falls in sparser or misaligned areas. We operationalize this structure as an implicit preference signal for alignment and introduce density-guided response optimization (DGRO), a method that aligns language models to community norms without requiring explicit preference labels. Using labeled preference data, we demonstrate that local density recovers pairwise community judgments, indicating that geometric structure encodes meaningful preference signal. We then apply DGRO in annotation-scarce settings across diverse communities spanning platform, topic, and language. DGRO-aligned models consistently produce responses preferred by human annotators, domain experts, and model-based judges over supervised and prompt-based baselines. We position DGRO as a practical alignment alternative for communities where explicit preference supervision is unavailable or misaligned with situated practices, and discuss the implications and risks of learning from emergent acceptance behavior.
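The geometric claim behind DGRO, that accepted content occupies high-density regions of representation space, can be illustrated with a simple k-nearest-neighbour density score. This is only a sketch of the density intuition on synthetic data, not the paper's estimator:

```python
import numpy as np

def local_density(query: np.ndarray, reference: np.ndarray, k: int = 5) -> float:
    """Score a candidate embedding by the inverse mean distance to its k
    nearest neighbours among accepted-content embeddings (higher = denser)."""
    dists = np.linalg.norm(reference - query, axis=1)
    knn = np.sort(dists)[:k]
    return 1.0 / (knn.mean() + 1e-8)

rng = np.random.default_rng(0)
accepted = rng.normal(loc=0.0, scale=0.1, size=(50, 8))  # tight accepted cluster
on_norm = np.zeros(8)                                    # candidate near the cluster
off_norm = np.full(8, 3.0)                               # candidate far away
print(local_density(on_norm, accepted) > local_density(off_norm, accepted))  # True
```

A density score like this could serve as the implicit preference signal the paper describes: candidates falling in dense accepted regions are preferred without any explicit pairwise labels.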

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

Unified multimodal models have recently demonstrated strong generative capabilities, yet whether and when generation improves understanding remains unclear. Existing benchmarks lack a systematic exploration of the specific tasks where generation facilitates understanding. To this end, we introduce UniG2U-Bench, a comprehensive benchmark categorizing generation-to-understanding (G2U) evaluation into 7 regimes and 30 subtasks, requiring varying degrees of implicit or explicit visual transformations. Extensive evaluation of over 30 models reveals three core findings: 1) Unified models generally underperform their base Vision-Language Models (VLMs), and Generate-then-Answer (GtA) inference typically degrades performance relative to direct inference. 2) Consistent enhancements emerge in spatial intelligence, visual illusions, or multi-round reasoning subtasks, where enhanced spatial and shape perception, as well as multi-step intermediate image states, prove beneficial. 3) Tasks with similar reasoning structures and models sharing architectures exhibit correlated behaviors, suggesting that generation-understanding coupling induces class-consistent inductive biases over tasks, pretraining data, and model architectures. These findings highlight the necessity for more diverse training data and novel paradigms to fully unlock the potential of unified multimodal modeling.

AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework

Large Language Models (LLMs) demonstrate potentials for automating scientific code generation but face challenges in reliability, error propagation in multi-agent workflows, and evaluation in domains with ill-defined success metrics. We present a Bayesian adversarial multi-agent framework specifically designed for AI for Science (AI4S) tasks in the form of a Low-code Platform (LCP). Three LLM-based agents are coordinated under the Bayesian framework: a Task Manager that structures user inputs into actionable plans and adaptive test cases, a Code Generator that produces candidate solutions, and an Evaluator providing comprehensive feedback. The framework employs an adversarial loop where the Task Manager iteratively refines test cases to challenge the Code Generator, while prompt distributions are dynamically updated using Bayesian principles by integrating code quality metrics: functional correctness, structural alignment, and static analysis. This co-optimization of tests and code reduces dependence on LLM reliability and addresses evaluation uncertainty inherent to scientific tasks. LCP also streamlines human-AI collaboration by translating non-expert prompts into domain-specific requirements, bypassing the need for manual prompt engineering by practitioners without coding backgrounds. Benchmark evaluations demonstrate LCP's effectiveness in generating robust code while minimizing error propagation. The proposed platform is also tested on an Earth Science cross-disciplinary task and demonstrates strong reliability, outperforming competing models.

SynthCharge: An Electric Vehicle Routing Instance Generator with Feasibility Screening to Enable Learning-Based Optimization and Benchmarking

The electric vehicle routing problem with time windows (EVRPTW) extends the classical VRPTW by introducing battery capacity constraints and charging station decisions. Existing benchmark datasets are often static and lack verifiable feasibility, which restricts reproducible evaluation of learning-based routing models. We introduce SynthCharge, a parametric generator that produces diverse, feasibility-screened EVRPTW instances across varying spatiotemporal configurations and scalable customer counts. While SynthCharge can currently generate large-scale instances of up to 500 customers, we focus our experiments on sizes ranging from 5 to 100 customers. Unlike static benchmark suites, SynthCharge integrates instance geometry with adaptive energy capacity scaling and range-aware charging station placement. To guarantee structural validity, the generator systematically filters out unsolvable instances through a fast feasibility screening process. Ultimately, SynthCharge provides the dynamic benchmarking infrastructure needed to systematically evaluate the robustness of emerging neural routing and data-driven approaches.

Stabilized Adaptive Loss and Residual-Based Collocation for Physics-Informed Neural Networks

Physics-Informed Neural Networks (PINNs) have been recognized as a mesh-free alternative to solve partial differential equations where physics information is incorporated. However, in dealing with problems characterized by high stiffness or shock-dominated dynamics, traditional PINNs have been found to have limitations, including unbalanced training and inaccuracy in solution, even with small physics residuals. In this research, we seek to address these limitations using the viscous Burgers' equation with low viscosity and the Allen-Cahn equation as test problems. In addressing unbalanced training, we have developed a new adaptive loss balancing scheme using smoothed gradient norms to ensure satisfaction of initial and boundary conditions. Further, to address inaccuracy in the solution, we have developed an adaptive residual-based collocation scheme to improve the accuracy of solutions in the regions with high physics residuals. The proposed new approach significantly improves solution accuracy with consistent satisfaction of physics residuals. For instance, in the case of Burgers' equation, the relative L2 error is reduced by about 44 percent compared to traditional PINNs, while for the Allen-Cahn equation, the relative L2 error is reduced by approximately 70 percent. Additionally, we show the trustworthy solution comparison of the proposed method using a robust finite difference solver.
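The adaptive loss-balancing idea, smoothing per-term gradient norms and reweighting so that no loss term dominates, can be sketched numerically. This is a generic gradient-norm balancing scheme under assumed details (EMA smoothing, equalized magnitudes), not the paper's exact formulation:

```python
import numpy as np

def balance_weights(grad_norms, smoothed, beta=0.9):
    """One update of a gradient-norm loss-balancing scheme: smooth each
    loss term's gradient norm with an exponential moving average, then
    weight terms so their smoothed gradient magnitudes are equalized."""
    grad_norms = np.asarray(grad_norms, dtype=float)
    smoothed = beta * np.asarray(smoothed, dtype=float) + (1 - beta) * grad_norms
    weights = smoothed.mean() / (smoothed + 1e-12)
    return weights, smoothed

# The PDE residual's gradients dwarf the boundary-condition gradients,
# so the boundary term receives the larger weight:
w, s = balance_weights([10.0, 0.1], smoothed=[10.0, 0.1])
print(w)
```

After reweighting, each term's effective gradient magnitude `w_i * s_i` equals the mean, which is what keeps initial and boundary conditions from being drowned out during training.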

NeuroSkill(tm): Proactive Real-Time Agentic System Capable of Modeling Human State of Mind

Real-time proactive agentic system, capable of modeling Human State of Mind, using foundation EXG model and text embeddings model, running fully offline on the edge. Unlike all previously known systems, the NeuroSkill(tm) system leverages SKILL.md description of Human's State of Mind via API and CLI provided by the system, directly from the Brain-Computer Interface (BCI) devices, which records Human biophysical and brain signals. Our custom harness - NeuroLoop(tm) - utilizes all of the above to run agentic flow that manages to engage with the Human on multiple cognitive and affective levels of their State of Mind (e.g., empathy), by providing actionable tool calls and protocol execution with explicit or implicit requests from the Human. GPLv3 open-source software with ethically aligned AI100 licensing for the skill markdown.

AI Models

Holy-fox/Qwen3.5-0.8B-JP


license: apache-2.0
language:
  • ja
datasets:
  • DataPilot/Zero_SFT_Ja_v3.5
base_model:
  • Qwen/Qwen3.5-0.8B
pipeline_tag: text-generation
library_name: transformers

Holy-fox/Qwen3.5-0.8B-JP

A model fine-tuned from Qwen/Qwen3.5-0.8B on Japanese instruction data.

Overview

This model is Qwen3.5-0.8B fine-tuned for Japanese via SFT on DataPilot/Zero_SFT_Ja_v3.5 (108k examples). The thinking mode of the Qwen3 series is not used; the model operates only in non-thinking (direct-response) mode.

| Item | Details |
|---|---|
| Base model | Qwen/Qwen3.5-0.8B |
| Parameters | 0.8B |
| Training data | DataPilot/Zero_SFT_Ja_v3.5 (108k examples) |
| Training framework | Unsloth |
| Training hardware | NVIDIA RTX 5090 |
| Supported language | Japanese (primary) |
| License | Apache 2.0 |

Quick Start

1. Install the library

pip install "transformers[serving] @ git+https://github.com/huggingface/transformers.git@main"

2. Start the inference server

transformers serve --force-model Holy-fox/Qwen3.5-0.8B-JP --port 8000

3. Set up the client

pip install -U openai

export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"

4. Run inference

from openai import OpenAI

client = OpenAI()  # reads settings from the environment variables

messages = [
    {"role": "user", "content": "まどマギで一番可愛いキャラクターは誰?"},
]

response = client.chat.completions.create(
    model="Holy-fox/Qwen3.5-0.8B-JP",
    messages=messages,
    max_tokens=32768,
    temperature=1.0,
    top_p=1.0,
    presence_penalty=2.0,
    extra_body={
        "top_k": 20,
    },
)

print(response.choices[0].message.content)

About the recommended parameters: presence_penalty=2.0 is set to suppress repetition. Small models are prone to looping on the same phrase, so if you lower this value, check the output carefully.

Training

  • Data: all 108k examples of DataPilot/Zero_SFT_Ja_v3.5. The dataset consists of Japanese instruction data generated by Qwen3-235B-A22B.
  • Framework: Unsloth
  • Hardware: NVIDIA RTX 5090

Notes and Limitations

  • This model is for non-thinking mode only. Inference with the /think token or with thinking mode enabled is not supported.
  • Given the small 0.8B scale, the model is limited in complex reasoning and long-form coherence.
  • The training data is predominantly Japanese, so performance in English and other languages is not guaranteed.

Acknowledgements

  • The Qwen team, for providing the base model
  • The developers of the Unsloth training framework

Author: Holy-fox

Likes: 12

Downloads: 0

Tags: transformers, safetensors, qwen3_5, image-text-to-text, text-generation, conversational, ja, dataset:DataPilot/Zero_SFT_Ja_v3.5, base_model:Qwen/Qwen3.5-0.8B, base_model:finetune:Qwen/Qwen3.5-0.8B, license:apache-2.0, endpoints_compatible, region:us

FireRedTeam/FireRed-Image-Edit-1.1-ComfyUI


license: apache-2.0
language:

  • en
  • zh

🔥 ComfyUI Adaptation of FireRed-Image-Edit-1.1

FireRed-Image-Edit-1.1 is a general-purpose image editing model that delivers high-fidelity and consistent editing across a wide range of scenarios.

We've provided a ComfyUI workflow to get you started: ⬇️ Download Workflow

Workflow screenshots (2026-02-27): multi-image workflow (sample_output_0) and single-image workflow (sample_output_1).

✨ Key Features

  • Strong Editing Performance: FireRed-Image-Edit delivers leading open-source results with accurate instruction following, high image quality, and consistent visual coherence.
  • Native Editing Capability: Built directly from text-to-image foundation model and endowed with editing capabilities.
  • Text Style Preservation: Maintains text styles with high fidelity, achieving performance comparable to closed-source solutions.
  • Photo Restoration: High-quality old photo restoration and enhancement.
  • Multi-Image Editing: Flexible editing of multiple images such as virtual try-on.

Author: FireRedTeam

Likes: 12

Downloads: 0

Tags: gguf, en, zh, license:apache-2.0, region:us

EganAI/qwen3.5-9b-terminal-merge


license: apache-2.0
base_model:
  • Qwen/Qwen3.5-9B
  • unsloth/Qwen3.5-9B
  • darkc0de/Qwen3.5-9B-heretic
  • lukey03/Qwen3.5-9B-abliterated
  • llmfan46/Qwen3.5-9B-ultimate-irrefusable-heretic
  • llmfan46/Qwen3.5-9B-ultra-heretic
  • jwest33/qwen3.5-9b-null-space-abliterated
  • osirisbrain/OsirisCortex-v6
  • DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING
  • DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-INSTRUCT
  • crownelius/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5
  • alecccdd/Qwen3.5-9B-paraphrasing-orpo
  • lugman-madhiai/Qwen3.5-9B-MHS-Interleaved
  • Hastagaras/Qwen3.5-9B-GLM-Wannabe
  • zenlm/zen4
  • trohrbaugh/Qwen3.5-9B-heretic-v2
tags:
  • merge
  • qwen3.5
  • terminal
  • cli
  • layer-wise-merge
model_type: qwen3_5
language:
  • en
pipeline_tag: text-generation
library_name: transformers

Qwen3.5-9B Terminal Merge

A layer-wise optimized merge of 16 Qwen3.5-9B variants, tuned for strong terminal/CLI command generation performance.

Performance

| Model | Terminal Task Suite | Tasks Passed |
|-------|-------------------|--------------|
| Qwen3.5-9B (base) | 21.7% | 13/60 |
| This model | 38.3% | 23/60 |
| Improvement | +77% | +10 tasks |

Evaluated on a custom suite of 60 terminal tasks executed in sandboxed Docker containers. Tasks cover file operations, text processing, git workflows, networking, Python scripting, and system administration. Each task requires the model to produce working shell commands that are executed and verified against expected output.
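The execute-and-verify loop described above can be sketched with a small harness. The model card does not publish its harness, so the function name and verification rule (exit code plus exact stdout match) are assumptions; in the actual suite the command would run inside a sandboxed Docker container rather than the local shell:

```python
import subprocess

def run_and_verify(command: str, expected_stdout: str, timeout: float = 10.0) -> bool:
    """Execute a model-generated shell command and check its output
    against the expected result (locally; the real suite is sandboxed)."""
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and result.stdout.strip() == expected_stdout.strip()

print(run_and_verify("echo hello", "hello"))  # True
```

Scoring a task suite then reduces to counting how many generated commands pass this check, which matches the pass-count columns reported in the table above.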

Model Details

  • Architecture: Qwen3.5 (hybrid linear + full attention)
  • Parameters: 9B total
  • Context Length: 262,144 tokens
  • Precision: bfloat16
  • Layers: 32 (8 full attention + 24 linear attention)
  • Merge Method: Layer-wise linear merge with optimized per-layer weights

Source Models

This model combines optimized layer-wise weights from 16 Qwen3.5-9B variants spanning reasoning, instruction-following, and general capability specializations:

| Category | Models |
|----------|--------|
| Core | Qwen/Qwen3.5-9B, unsloth/Qwen3.5-9B |
| Abliterated | darkc0de/Qwen3.5-9B-heretic, lukey03/Qwen3.5-9B-abliterated, llmfan46/Qwen3.5-9B-ultimate-irrefusable-heretic, llmfan46/Qwen3.5-9B-ultra-heretic, jwest33/qwen3.5-9b-null-space-abliterated, trohrbaugh/Qwen3.5-9B-heretic-v2, osirisbrain/OsirisCortex-v6 |
| Reasoning | DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING, DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-INSTRUCT, crownelius/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5 |
| Specialized | alecccdd/Qwen3.5-9B-paraphrasing-orpo, lugman-madhiai/Qwen3.5-9B-MHS-Interleaved, Hastagaras/Qwen3.5-9B-GLM-Wannabe, zenlm/zen4 |

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EganAI/qwen3.5-9b-terminal-merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Find all Python files larger than 1MB and sort by size descending"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

vLLM Serving

vllm serve EganAI/qwen3.5-9b-terminal-merge \
    --language-model-only \
    --dtype bfloat16 \
    --max-model-len 8192

Note: Use --language-model-only since this is a multimodal architecture served for text-only inference.

Training Details

The per-layer merge weights were optimized by evaluating candidates on a suite of 60 terminal tasks using vLLM inference in sandboxed Docker environments. The optimization searched across layer-group weight distributions to find the best blend of all 16 source models.
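A layer-wise linear merge of the kind described here amounts to a per-parameter weighted sum of the source models' weights. The sketch below shows the mechanics on toy state dicts; the function name and the per-parameter weight layout are assumptions, not the card's actual merge script:

```python
import numpy as np

def layerwise_merge(state_dicts, layer_weights):
    """Merge several models' parameters with per-layer mixing weights.
    state_dicts: list of {param_name: np.ndarray}; layer_weights:
    {param_name: [w_0, ..., w_{n-1}]}, each weight list summing to 1."""
    merged = {}
    for name in state_dicts[0]:
        ws = layer_weights[name]
        merged[name] = sum(w * sd[name] for w, sd in zip(ws, state_dicts))
    return merged

# Two toy "models" with one shared layer, blended 70/30:
a = {"layers.0.weight": np.ones((2, 2))}
b = {"layers.0.weight": np.zeros((2, 2))}
out = layerwise_merge([a, b], {"layers.0.weight": [0.7, 0.3]})
print(out["layers.0.weight"])  # every entry is 0.7
```

The optimization described above would then search over these per-layer weight vectors, scoring each candidate merge on the 60-task terminal suite.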

Limitations

  • Optimized specifically for terminal/CLI tasks; general-purpose performance may vary
  • Requires --language-model-only flag when serving with vLLM due to multimodal architecture
  • Visual capabilities are inherited from the base model but were not part of the optimization target

Author: EganAI

Likes: 5

Downloads: 0

Tags: transformers, safetensors, gguf, qwen3_5, image-text-to-text, merge, qwen3.5, terminal, cli, layer-wise-merge, text-generation, conversational, en, base_model:DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-INSTRUCT, base_model:merge:DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-INSTRUCT, base_model:DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING, base_model:merge:DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING, base_model:Hastagaras/Qwen3.5-9B-GLM-Wannabe, base_model:merge:Hastagaras/Qwen3.5-9B-GLM-Wannabe, base_model:Qwen/Qwen3.5-9B, base_model:merge:Qwen/Qwen3.5-9B, base_model:alecccdd/Qwen3.5-9B-paraphrasing-orpo, base_model:merge:alecccdd/Qwen3.5-9B-paraphrasing-orpo, base_model:crownelius/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5, base_model:merge:crownelius/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5, base_model:darkc0de/Qwen3.5-9B-heretic, base_model:merge:darkc0de/Qwen3.5-9B-heretic, base_model:jwest33/qwen3.5-9b-null-space-abliterated, base_model:merge:jwest33/qwen3.5-9b-null-space-abliterated, base_model:llmfan46/Qwen3.5-9B-ultimate-irrefusable-heretic, base_model:merge:llmfan46/Qwen3.5-9B-ultimate-irrefusable-heretic, base_model:llmfan46/Qwen3.5-9B-ultra-heretic, base_model:merge:llmfan46/Qwen3.5-9B-ultra-heretic, base_model:lugman-madhiai/Qwen3.5-9B-MHS-Interleaved, base_model:merge:lugman-madhiai/Qwen3.5-9B-MHS-Interleaved, base_model:lukey03/Qwen3.5-9B-abliterated, base_model:merge:lukey03/Qwen3.5-9B-abliterated, base_model:osirisbrain/OsirisCortex-v6, base_model:merge:osirisbrain/OsirisCortex-v6, base_model:trohrbaugh/Qwen3.5-9B-heretic-v2, base_model:merge:trohrbaugh/Qwen3.5-9B-heretic-v2, base_model:unsloth/Qwen3.5-9B, base_model:merge:unsloth/Qwen3.5-9B, base_model:zenlm/zen4, base_model:merge:zenlm/zen4, license:apache-2.0, endpoints_compatible, region:us

avalon2244/Qwen3.5-4B-Claude-Opus-4.6-Distilled-GGUF


tags:
  • gguf
  • llama.cpp
  • unsloth
  • vision-language-model
datasets:
  • nohurry/Opus-4.6-Reasoning-3000x-filtered
base_model:
  • unsloth/Qwen3.5-4B
  • Qwen/Qwen3.5-4B

Qwen3.5-4B-Claude-Opus-4.6-Distilled-GGUF

This terribly named model is a quick finetune of Qwen3.5-4B on the nohurry/Opus-4.6-Reasoning-3000x-filtered dataset. It tends to produce cleaner reasoning traces than the original Qwen3.5-4B and is roughly as accurate, though I haven't tested it rigorously. It was finetuned and converted to GGUF format using Unsloth.

Example usage:

  • For text only LLMs: llama-cli -hf avalon2244/Qwen3.5-4B-Claude-Opus-4.6-Distilled-GGUF --jinja
  • For multimodal models: llama-mtmd-cli -hf avalon2244/Qwen3.5-4B-Claude-Opus-4.6-Distilled-GGUF --jinja

Available Model files:

Author: avalon2244

Likes: 5

Downloads: 0

Tags: gguf, qwen3_5, llama.cpp, unsloth, vision-language-model, dataset:nohurry/Opus-4.6-Reasoning-3000x-filtered, base_model:Qwen/Qwen3.5-4B, base_model:quantized:Qwen/Qwen3.5-4B, endpoints_compatible, region:us, conversational

llmfan46/Q3.5-BlueStar-27B-ultra-heretic


license: mit
datasets:
  • zerofata/Instruct-Anime
  • zerofata/Gemini-3.1-Pro-SmallWiki
  • zerofata/Gemini-3.1-Pro-GLM5-Characters
  • zerofata/Roleplay-Anime-Characters
base_model:
  • zerofata/Q3.5-BlueStar-27B
tags:
  • heretic
  • uncensored
  • decensored
  • abliterated

This is a decensored version of zerofata/Q3.5-BlueStar-27B, made using Heretic v1.2.0 with Magnitude-Preserving Orthogonal Ablation (MPOA) and Self-Organizing Map Abliteration (SOMA).

Abliteration parameters

| Parameter | Value |
| :-------- | :---: |
| direction_index | per layer |
| attn.out_proj.max_weights.0 | 0: 1.03 |
| attn.out_proj.max_weights.1 | 1: 1.43 |
| attn.out_proj.max_weights.2 | 2: 0.86 |
| attn.out_proj.max_weights.3 | 3: 0.90 |
| attn.out_proj.max_weight_position | 43.21 |
| attn.out_proj.min_weights.0 | 0: 0.75 |
| attn.out_proj.min_weights.1 | 1: 0.06 |
| attn.out_proj.min_weights.2 | 2: 0.68 |
| attn.out_proj.min_weights.3 | 3: 0.79 |
| attn.out_proj.min_weight_distance | 22.95 |
| mlp.down_proj.max_weights.0 | 0: 1.31 |
| mlp.down_proj.max_weights.1 | 1: 1.25 |
| mlp.down_proj.max_weights.2 | 2: 1.22 |
| mlp.down_proj.max_weights.3 | 3: 1.15 |
| mlp.down_proj.max_weight_position | 50.37 |
| mlp.down_proj.min_weights.0 | 0: 0.19 |
| mlp.down_proj.min_weights.1 | 1: 1.12 |
| mlp.down_proj.min_weights.2 | 2: 1.16 |
| mlp.down_proj.min_weights.3 | 3: 0.44 |
| mlp.down_proj.min_weight_distance | 4.29 |
| attn.o_proj.max_weights.0 | 0: 1.24 |
| attn.o_proj.max_weights.1 | 1: 1.32 |
| attn.o_proj.max_weights.2 | 2: 1.01 |
| attn.o_proj.max_weights.3 | 3: 1.18 |
| attn.o_proj.max_weight_position | 60.83 |
| attn.o_proj.min_weights.0 | 0: 0.94 |
| attn.o_proj.min_weights.1 | 1: 0.11 |
| attn.o_proj.min_weights.2 | 2: 0.20 |
| attn.o_proj.min_weights.3 | 3: 1.01 |
| attn.o_proj.min_weight_distance | 32.18 |

Performance

| Metric | This model | Original model (Q3.5-BlueStar-27B) |
| :----- | :--------: | :--------------------------------: |
| KL divergence | 0.0288 | 0 (by definition) |
| Refusals | 4/100 | 98/100 |
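The KL divergence metric above measures how far the decensored model's next-token distribution drifts from the original model's (zero means identical behavior). A minimal sketch of the computation on two toy next-token distributions (illustrative values, not the actual measurement):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete next-token distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

original = [0.70, 0.20, 0.10]   # original model's next-token probabilities (toy)
modified = [0.65, 0.25, 0.10]   # decensored model's probabilities (toy)

assert kl_divergence(original, original) == 0.0   # identical models: KL = 0
drift = kl_divergence(original, modified)
assert drift > 0.0                                # any drift gives positive KL
print(f"KL divergence: {drift:.4f} nats")
```

In practice the reported number is averaged over many tokens and prompts; a value like 0.0288 indicates the ablation barely perturbed the model's overall behavior.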


<img src="https://cdn-uploads.huggingface.co/production/uploads/65b19c6c638328850e12d38c/Xs4K-Cu_vblr8Yt5S1XqO.png" alt="BlueStar v1">

BlueStar v1 (Qwen3.5 27B)

Overview

An experimental tune of Qwen 3.5 27B, designed for conversational assistant tasks and RP.

The model feels pretty creative and has some nice moments. There are occasional brainfarts and bits of repetition, but nothing out of the ordinary. (The Qwen team themselves recommend a presence penalty of 1.5. Yikes.)

Non-thinking and thinking modes are both supported. Thinking has reduced censorship, as the original thinking refusals didn't seem to generalize well to the new format it was given.

SillyTavern Settings

Recommended roleplay format:

  • Actions: In plaintext
  • Dialogue: "In quotes"
  • Thoughts: *In asterisks*

Recommended samplers:

  • Temp: 0.8
  • MinP: 0.05 - 0.075
  • Rep Pen: 1.00 - 1.1

Instruct templates: ChatML - Think (https://huggingface.co/zerofata/Q3.5-BlueStar-27B/raw/main/ChatML-Q3.5-Think.json) | ChatML - NoThink (https://huggingface.co/zerofata/Q3.5-BlueStar-27B/raw/main/ChatML-Q3.5-NoThink.json)

Quantizations

GGUF (iMatrix): https://huggingface.co/zerofata/Q3.5-BlueStar-27B-gguf

Creation Process

SFT on approx. 23 million tokens (12 million trainable). New this round is some Gemini synthetic data, which replaces some of the lower-quality datasets.

About 10% of the dataset included reasoning for creative assistant tasks. This reasoning seems to have generalized quite well to other parts of the model and heavily reduces the token usage of thinking.

This model still needs a DPO pass to tackle the repetition and some of the oddities of the original instruct model, but that'll have to wait; I've been overspending on training these models recently.

Trained using MS-Swift.

MS-Swift SFT config (4×H200):

PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
USE_HF=True \
WANDB_PROJECT=Qwen3.5-27B-SFT \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
swift sft \
  --model Qwen/Qwen3.5-27B \
  --tuner_type lora \
  --dataset '/workspace/think_dataset.jsonl' '/workspace/nothink_dataset.jsonl' \
  --torch_dtype bfloat16 \
  --bf16 true \
  --use_liger_kernel true \
  --lora_rank 128 \
  --lora_alpha 16 \
  --use_rslora true \
  --target_modules all-linear \
  --freeze_llm false \
  --freeze_vit true \
  --freeze_aligner true \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --num_train_epochs 2 \
  --learning_rate 2e-5 \
  --warmup_ratio 0.05 \
  --max_length 10752 \
  --split_dataset_ratio 0.01 \
  --add_non_thinking_prefix true \
  --load_from_cache_file true \
  --group_by_length true \
  --eval_steps 200 \
  --save_steps 200 \
  --save_total_limit 10 \
  --logging_steps 1 \
  --dataloader_num_workers 8 \
  --output_dir output/Qwen3.5-27B-SFT-Model \
  --report_to wandb
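The MinP sampler recommended in the settings above keeps only tokens whose probability is at least `min_p` times the top token's probability, then renormalizes. A minimal sketch of the idea (illustrative only, not SillyTavern's actual implementation):

```python
def min_p_filter(probs, min_p=0.05):
    """Keep tokens with prob >= min_p * (max prob), then renormalize."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

probs = {"the": 0.50, "a": 0.30, "zebra": 0.01, "qux": 0.002}
filtered = min_p_filter(probs, min_p=0.05)
# threshold = 0.05 * 0.50 = 0.025, so "zebra" and "qux" are pruned
assert set(filtered) == {"the", "a"}
assert abs(sum(filtered.values()) - 1.0) < 1e-9
```

Because the cutoff scales with the model's confidence, the recommended 0.05-0.075 range prunes aggressively only when one token dominates, which suits the 0.8 temperature suggested alongside it.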

Author: llmfan46

Likes: 4

Downloads: 0

Tags: safetensors, qwen3_5, heretic, uncensored, decensored, abliterated, dataset:zerofata/Instruct-Anime, dataset:zerofata/Gemini-3.1-Pro-SmallWiki, dataset:zerofata/Gemini-3.1-Pro-GLM5-Characters, dataset:zerofata/Roleplay-Anime-Characters, base_model:zerofata/Q3.5-BlueStar-27B, base_model:finetune:zerofata/Q3.5-BlueStar-27B, license:mit, region:us

Aisha-AI-Official/wan2.2-perfect-insertion


base_model:

  • Wan-AI/Wan2.2-I2V-A14B-Diffusers

pipeline_tag: image-text-to-video

tags:

  • lora
  • nsfw

Perfect Insertion (back view) [Real]

Damn, how can you sit so well?! 😮

I've been practicing 😏

(Download links at the end of this page)

<video width="400px" loop autoplay muted src="https://huggingface.co/Aisha-AI-Official/wan2.2-perfect-insertion/resolve/main/sitting_penis_1.mp4"></video>

At the bottom a man's lower body appears, his belly, thighs, hand, and his hard penis are visible. He is holding his hard penis with his left hand, stroking it.
With both hands, she separates her buttocks and sits on the hard penis, inserting it completely inside her pussy.

<video width="400px" loop autoplay muted src="https://huggingface.co/Aisha-AI-Official/wan2.2-perfect-insertion/resolve/main/sitting_penis_2.mp4"></video>

At the bottom a man's lower body appears, his belly, thighs, hand, and his hard penis are visible. He is holding his hard penis with his left hand, stroking it.
With both hands, she separates her buttocks and sits on the hard penis, inserting it completely inside her pussy.

<video width="400px" loop autoplay muted src="https://huggingface.co/Aisha-AI-Official/wan2.2-perfect-insertion/resolve/main/sitting_penis_cut_1.mp4"></video>

The asian woman is posing, she has pale skin and short white hair.
Then the scene cuts to a living room. At the bottom is a light-skinned man's lower body, his belly, thighs, hand, and his hard penis are visible. He is holding his hard penis with his hand, stroking it.

In the center, the same woman is standing. She is facing away from the camera, wearing the same short dress.

The woman then lifts her dress, revealing her bare buttocks without panties, her vagina, and anus. With both hands, she separates her buttocks and sits on the hard penis, inserting it completely inside her vagina.

<video width="400px" loop autoplay muted src="https://huggingface.co/Aisha-AI-Official/wan2.2-perfect-insertion/resolve/main/sitting_penis_cut_2.mp4"></video>

The redhead woman is posing, she has pale skin and long hair.
Then the scene cuts to a living room. At the bottom is a light-skinned man's lower body, his belly, thighs, hand, and his hard penis are visible. He is holding his hard penis with his hand, stroking it.

In the center, the same woman is standing. She is facing away from the camera, wearing only cute shorts.

The woman then lowers her shorts, revealing her bare buttocks without underwear, her vagina, and anus. With both hands, she separates her buttocks and sits on the hard penis, inserting it completely inside her vagina.

<video width="400px" loop autoplay muted src="https://huggingface.co/Aisha-AI-Official/wan2.2-perfect-insertion/resolve/main/sitting_penis_cut_3.mp4"></video>

The African woman is posing, she has black skin and long curly hair.
Then the scene cuts to a living room. At the bottom is a light-skinned man's lower body, his belly, thighs, hand, and his hard penis are visible. He is holding his hard penis with his hand, stroking it.

In the center, the same woman is standing. She is facing away from the camera, wearing only a short dress that doesn't cover her buttocks, leaving her vagina and anus visible.

The woman then uses both hands to separate her buttocks and sits on the hard penis, inserting it completely into her vagina. Her buttocks move up and down repeatedly with speed, causing the hard penis to go in and out of her vagina.

Training

  • 400 steps on high noise
  • 3 edited videos (80 frames each)
  • HLR + ZCD (These acronyms were created solely to confuse you)
  • Lots of hope
  • $10 (😭😭😭😭😭😭😭😭😭😭😭😭)

Usage (Low noise)

This LoRA was trained only on High Noise, which means you'll have to pair it with a Low Noise model that knows what a penis is (I used the Low Noise from POV Insertion V1).

I2V:

Prompt (ongoing):

At the bottom a man's lower body appears, his belly, thighs, hand, and his hard penis are visible. He is holding his hard penis with his left hand, stroking it.
With both hands, she separates her buttocks and sits on the hard penis, inserting it completely inside her pussy.

Prompt (cut to):

The [ethnicity] woman is posing, she has [describe the details of it only to maintain consistency].
Then the scene cuts to a [you can choose any place]. At the bottom is a light-skinned man's lower body, his belly, thighs, hand, and his hard penis are visible. He is holding his hard penis with his hand, stroking it.

In the center, the same woman is standing. She is facing away from the camera, wearing [what she's wearing or if she's naked].

The woman then [lifts her dress, or lower panties, pants, etc], revealing her bare buttocks without panties, her vagina, and anus. With both hands, she separates her buttocks and sits on the hard penis, inserting it completely inside her vagina.

High Noise LoRA Scale: 1.0 for creativity, 1.5 for stability or "cut to"

Low Noise LoRA Scale: 1.0

Shift: 4
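The LoRA scales above multiply the low-rank update before it is merged into the base weights, roughly W' = W + scale · (B·A). A toy sketch with made-up 2×2 matrices, just to show what the scale knob does:

```python
def apply_lora(W, A, B, scale=1.0):
    """W' = W + scale * (B @ A), using plain nested lists (toy dimensions)."""
    rows, cols, rank = len(B), len(A[0]), len(A)
    delta = [[sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(cols)]
             for i in range(rows)]
    return [[W[i][j] + scale * delta[i][j] for j in range(cols)] for i in range(rows)]

W = [[1.0, 0.0], [0.0, 1.0]]   # base weight (identity, toy)
A = [[0.5, 0.5]]               # rank-1 LoRA factors
B = [[1.0], [2.0]]
merged = apply_lora(W, A, B, scale=1.5)   # e.g. High Noise scale 1.5 for stability
assert merged == [[1.75, 0.75], [1.5, 2.5]]
```

A higher scale pushes the output further toward the LoRA's learned behavior (more stable, less creative), which matches the 1.0-vs-1.5 recommendation above.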

T2V:

It probably works, but I haven't tested it. If you're going to test it, keep the same structure as I2V, and try starting with a very low Scale on High Noise, like 0.5.

About HLR + ZCD

This is a fast-learning technique, which makes the LoRA less flexible. It can drastically reduce creativity, but it yields stable results while using few resources. Negative effects:

  1. It will probably only work at the same camera angle
  2. High chance of not responding to different prompts
  3. High chance of forcing the original characters of the training video. Training only the High Noise reduces this chance, but it can still force features that are VERY similar, like the same hair length, same color, etc.

Download

Download High Noise LoRA

Download Low Noise LoRA (low-pov-insertion-v1.0)

Help me create more

If you want to help me continue making LoRAs, or if you want me to make a LoRA for you, buy 5000 PlayCoins at Aisha-AI and transfer them to my account (account number 2).

This helps Aisha-AI to stay alive and produce new LoRAs for you all 💜

Author: Aisha-AI-Official

Likes: 4

Downloads: 0

Tags: lora, nsfw, image-text-to-video, base_model:Wan-AI/Wan2.2-I2V-A14B-Diffusers, base_model:adapter:Wan-AI/Wan2.2-I2V-A14B-Diffusers, region:us

BAAI-Humanoid/MOSAIC_Model


license: apache-2.0

datasets:

  • BAAI-Humanoid/MOSAIC_Dataset

pipeline_tag: reinforcement-learning

tags:

  • humanoid
  • motion-tracking
  • teleoperation
  • reinforcement-learning

arxiv: 2602.08594

MOSAIC Model

Project Page | Paper | Code | Dataset | Model

This repository releases deployment-ready ONNX models for MOSAIC, introduced in:

MOSAIC: Bridging the Sim-to-Real Gap in Generalist Humanoid Motion Tracking and Teleoperation with Rapid Residual Adaptation

In MOSAIC, a general motion tracker is trained in simulation, and interface-specific adaptation is handled via a lightweight residual adaptor that injects action-space corrections while preserving the general policy’s capabilities.

The repository includes the following models:

  1. gmt.onnx: the general motion tracking policy
  2. noitom_teleop.onnx: the adaptor policy for teleoperation using Noitom inertial mocap suit
  3. pico_teleop.onnx: the adaptor policy for teleoperation using PICO VR device

How to download

Option A: Download a single ONNX file

from huggingface_hub import hf_hub_download

onnx_path = hf_hub_download(
    repo_id="BAAI-Humanoid/MOSAIC_Model",
    filename="pico_teleop.onnx",   # or "noitom_teleop.onnx"
)
print("Downloaded to:", onnx_path)

Option B: Download all files in this model repo

from huggingface_hub import snapshot_download

# Download every file in the repo (all three ONNX models) into a local cache dir
local_dir = snapshot_download(
    repo_id="BAAI-Humanoid/MOSAIC_Model",
)

Usage

For constructing the correct deployable observations and mapping model outputs to Unitree G1 control targets, please use the official MOSAIC codebase and RobotBridge Deployment framework.


Citation

If you use these models for your research, please cite our paper:

@article{sun2026mosaic,
  title   = {MOSAIC: Bridging the Sim-to-Real Gap in Generalist Humanoid Motion Tracking and Teleoperation with Rapid Residual Adaptation},
  author  = {Zhenguo Sun and Bo-Sheng Huang and Yibo Peng and Xukun Li and Jingyu Ma and Yu Sun and Zhe Li and Haojun Jiang and Biao Gao and Zhenshan Bing and Xinlong Wang and Alois Knoll},
  journal = {arXiv preprint arXiv:2602.08594},
  year    = {2026}
}

License

This model is released under Apache-2.0.

Author: BAAI-Humanoid

Likes: 4

Downloads: 0

Tags: onnx, humanoid, motion-tracking, teleoperation, reinforcement-learning, dataset:BAAI-Humanoid/MOSAIC_Dataset, arxiv:2602.08594, license:apache-2.0, region:us

Naphula/GhostFace-24B-v1


license: apache-2.0

base_model:

  • mistralai/Magistral-Small-2509
  • Casual-Autopsy/Maginum-Cydoms-24B
  • DarkArtsForge/Asmodeus-24B-v1
  • DarkArtsForge/Magistaroth-24B-v1
  • FlareRebellion/WeirdCompound-v1.7-24b
  • Gryphe/Tiamat-24B-Magistral
  • Naphula/BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
  • Naphula/Goetia-24B-v1.3
  • Naphula/Slimaki-24B-v1
  • ReadyArt/4.2.0-Broken-Tutu-24b
  • sophosympatheia/Magistry-24B-v1.0
  • TheDrummer/Cydonia-24B-v4.3
  • TheDrummer/Magidonia-24B-v4.3
  • TheDrummer/Precog-24B-v1
  • zerofata/MS3.2-PaintedFantasy-v2-24B
  • zerofata/MS3.2-PaintedFantasy-v3-24B
  • zerofata/MS3.2-PaintedFantasy-v4.1-24B

datasets:

  • OccultAI/illuminati_imatrix_v1

language:

  • en

library_name: transformers

tags:

  • della
  • sce
  • model_stock
  • scream
  • merge
  • mergekit

widget:

  • text: "GhostFace 24B v1"
    output:
      url: https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/54Fa6PFEckcln-7quE4pQ.png

> [!CAUTION]
> <span style="color:red; font-weight:bold">⚠️ Warning:</span> This model can produce narratives and RP that contain violent and graphic erotic content. Adjust your system prompt accordingly, and use the Mistral Tekken chat template.

👻 GhostFace 24B v1


Merge Method

This model was merged using the scream method. This custom method combines the stability of model_stock with the novelty of della and sce.

This is a random experiment that seeks to unite several of the best Mistral 24B merges and finetunes. There are some refusals; however, the model seems quite intelligent, so it is being released as-is.

I don't feel like ablating it right now, but maybe later, since it works pretty well with jailbreaks.

architecture: MistralForCausalLM
merge_method: scream # Similarity-Consensus Resolved Enhanced Adaptive Merging
base_model: B:\24B\!models--mistralai--Magistral-Small-2509
models:
  - model: B:\24B\!BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
    parameters:
      weight: 0.1
  - model: B:\24B\!models--Casual-Autopsy--Maginum-Cydoms-24B
    parameters:
      weight: 0.1
  - model: B:\24B\!models--DarkArtsForge--Asmodeus-24B-v1
    parameters:
      weight: 0.1
  - model: B:\24B\!models--DarkArtsForge--Magistaroth-24B-v1
    parameters:
      weight: 0.1
  - model: B:\24B\!models--FlareRebellion--WeirdCompound-v1.7-24b
    parameters:
      weight: 0.1
  - model: B:\24B\!models--Gryphe--Tiamat-24B-Magistral
    parameters:
      weight: 0.1
  - model: B:\24B\!models--Naphula--Slimaki-24B-v1
    parameters:
      weight: 0.1
  - model: B:\24B\!models--ReadyArt--4.2.0-Broken-Tutu-24b
    parameters:
      weight: 0.1
  - model: B:\24B\!models--sophosympatheia--Magistry-24B-v1.0
    parameters:
      weight: 0.1
  - model: B:\24B\!models--TheDrummer--Cydonia-24B-v4.3
    parameters:
      weight: 0.1
  - model: B:\24B\!models--TheDrummer--Magidonia-24B-v4.3
    parameters:
      weight: 0.1
  - model: B:\24B\!models--TheDrummer--Precog-24B-v1
    parameters:
      weight: 0.1
  - model: B:\24B\!models--zerofata--MS3.2-PaintedFantasy-v2-24B
    parameters:
      weight: 0.1
  - model: B:\24B\!models--zerofata--MS3.2-PaintedFantasy-v3-24B
    parameters:
      weight: 0.1
  - model: B:\24B\!models--zerofata--MS3.2-PaintedFantasy-v4.1-24B
    parameters:
      weight: 0.1
  - model: B:\24B\!models--Naphula--Goetia-24B-v1.3
    parameters:
      weight: 0.1
parameters:
  stock_weight: 0.4          # Weight for model_stock component
  della_novelty_weight: 0.3  # Weight for DELLA novelty
  sce_novelty_weight: 0.3    # Weight for SCE novelty
  density: 0.9               # DELLA density parameter
  epsilon: 0.05              # DELLA epsilon parameter
  select_topk: 0.5           # SCE top-k selection
  filter_wise: false         # Model Stock filter-wise calculation
  int8_mask: false           # Use int8 masks for memory efficiency
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
# chat_template: auto
name: 👻 GhostFace-24B-v1
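A toy per-parameter sketch of how the `stock_weight` / `della_novelty_weight` / `sce_novelty_weight` values above (0.4 / 0.3 / 0.3) could blend three component deltas on top of the base model. This is a hypothetical illustration of the convex-blend idea, not mergekit's actual scream implementation:

```python
def scream_blend(base, stock, della, sce,
                 stock_weight=0.4, della_novelty_weight=0.3, sce_novelty_weight=0.3):
    """Convex blend of three merge components' deltas onto a base parameter (toy)."""
    def delta(component):
        return [c - b for c, b in zip(component, base)]
    blended = [
        stock_weight * ds + della_novelty_weight * dd + sce_novelty_weight * dc
        for ds, dd, dc in zip(delta(stock), delta(della), delta(sce))
    ]
    return [b + d for b, d in zip(base, blended)]

base = [1.0, 1.0]
# identical components leave the base unchanged
assert scream_blend(base, stock=[1.0, 1.0], della=[1.0, 1.0], sce=[1.0, 1.0]) == base
# each component contributes its weight's share of its own delta
out = scream_blend(base, stock=[2.0, 1.0], della=[1.0, 2.0], sce=[1.0, 1.0])
assert all(abs(v - e) < 1e-9 for v, e in zip(out, [1.4, 1.3]))
```

The actual method additionally applies DELLA's density/epsilon pruning and SCE's top-k selection to each delta before blending, per the `density`, `epsilon`, and `select_topk` parameters in the config.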

Author: Naphula

Likes: 4

Downloads: 0

Tags: transformers, safetensors, mistral, text-generation, della, sce, model_stock, scream, merge, mergekit, en, dataset:OccultAI/illuminati_imatrix_v1, base_model:Casual-Autopsy/Maginum-Cydoms-24B, base_model:merge:Casual-Autopsy/Maginum-Cydoms-24B, base_model:DarkArtsForge/Asmodeus-24B-v1, base_model:merge:DarkArtsForge/Asmodeus-24B-v1, base_model:DarkArtsForge/Magistaroth-24B-v1, base_model:merge:DarkArtsForge/Magistaroth-24B-v1, base_model:FlareRebellion/WeirdCompound-v1.7-24b, base_model:merge:FlareRebellion/WeirdCompound-v1.7-24b, base_model:Gryphe/Tiamat-24B-Magistral, base_model:merge:Gryphe/Tiamat-24B-Magistral, base_model:Naphula/BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly, base_model:merge:Naphula/BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly, base_model:Naphula/Goetia-24B-v1.3, base_model:merge:Naphula/Goetia-24B-v1.3, base_model:Naphula/Slimaki-24B-v1, base_model:merge:Naphula/Slimaki-24B-v1, base_model:ReadyArt/4.2.0-Broken-Tutu-24b, base_model:merge:ReadyArt/4.2.0-Broken-Tutu-24b, base_model:TheDrummer/Cydonia-24B-v4.3, base_model:merge:TheDrummer/Cydonia-24B-v4.3, base_model:TheDrummer/Magidonia-24B-v4.3, base_model:merge:TheDrummer/Magidonia-24B-v4.3, base_model:TheDrummer/Precog-24B-v1, base_model:merge:TheDrummer/Precog-24B-v1, base_model:mistralai/Magistral-Small-2509, base_model:merge:mistralai/Magistral-Small-2509, base_model:sophosympatheia/Magistry-24B-v1.0, base_model:merge:sophosympatheia/Magistry-24B-v1.0, base_model:zerofata/MS3.2-PaintedFantasy-v2-24B, base_model:merge:zerofata/MS3.2-PaintedFantasy-v2-24B, base_model:zerofata/MS3.2-PaintedFantasy-v3-24B, base_model:merge:zerofata/MS3.2-PaintedFantasy-v3-24B, base_model:zerofata/MS3.2-PaintedFantasy-v4.1-24B, base_model:merge:zerofata/MS3.2-PaintedFantasy-v4.1-24B, license:apache-2.0, text-generation-inference, endpoints_compatible, region:us

DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING-HERETIC-UNCENSORED


language:

  • en
  • zh

license: apache-2.0

tags:

  • fine tune
  • creative
  • creative writing
  • fiction writing
  • plot generation
  • sub-plot generation
  • story generation
  • scene continue
  • storytelling
  • fiction story
  • science fiction
  • romance
  • all genres
  • story
  • writing
  • vivid prosing
  • vivid writing
  • fiction
  • roleplaying
  • bfloat16
  • all use cases
  • unsloth
  • heretic
  • uncensored
  • abliterated

library_name: transformers

pipeline_tag: image-text-to-text

base_model:

  • trohrbaugh/Qwen3.5-9B-heretic-v2

<h2>Qwen3.5-9B-Claude-4.6-HighIQ-THINKING-HERETIC-UNCENSORED</h2>

Fine tune, via Unsloth, of the Qwen 3.5 9B dense model using a large Claude 4.6 distillation dataset, trained on local hardware.

This has VASTLY improved the model's thinking generation (and benchmarks), replacing "Qwen 3.5" thinking with "Claude 4.6" thinking.

Every attempt was made to keep the training "mild" so that it did not negatively affect the model's already incredibly strong benchmarks.

This is also a HERETIC model, trained post-"Heretic'ing" -> this model does what you want, no questions asked.

Fully uncensored.

Vision (images) tested -> working with the new training.

BENCHMARKS:

| Model | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
| :---- | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| HERETIC version (thinking), mxfp8 | 0.432 | 0.505 | 0.625 | 0.658 | 0.374 | 0.748 | 0.657 |
| HERETIC version (instruct), mxfp8 | 0.574 | 0.755 | 0.869 | 0.714 | 0.410 | 0.780 | 0.691 |
| Qwen3.5-9B-Claude-4.6-HighIQ-INSTRUCT, mxfp8 | 0.574 | 0.729 | 0.882 | 0.711 | 0.422 | 0.775 | 0.691 |
| Qwen3.5-9B (thinking), mxfp8 | 0.417 | 0.458 | 0.623 | 0.634 | 0.338 | 0.737 | 0.639 |

DE-CENSORING:

Performance

KLD of less than 1 is excellent, zero is perfect.

| Metric | This model | Original model (Qwen/Qwen3.5-9B) |
| :----- | :--------: | :------------------------------: |
| KL divergence | 0.0793 | 0 (by definition) |
| Refusals | 6/100 | 100/100 |

NOTES:

  • Suggest min q4ks (non-imatrix) or IQ3S (imatrix).
  • Tested with rep pen of 1 (off).
  • Context: 256k (default).

IMPORTANT:

  • Other versions in testing.
  • Information from Qwen's repo below.
  • Video portions of the model were NOT TESTED.

Qwen3.5-9B

<img width="400px" src="https://qianwen-res.oss-accelerate.aliyuncs.com/logo_qwen3.5.png">

Qwen Chat

> [!Note]
> This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format.

These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.

Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility to empower developers and enterprises with unprecedented capability and efficiency.

Qwen3.5 Highlights

Qwen3.5 features the following enhancements:

  • Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.

  • Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.

  • Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.

  • Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.

  • Next-Generation Training Infrastructure: Near-100% multimodal training efficiency compared to text-only training and asynchronous RL frameworks supporting massive-scale agent scaffolds and environment orchestration.

Benchmark Results

For more details, please refer to our blog post Qwen3.5.

Model Overview

  • Type: Causal Language Model with Vision Encoder
  • Training Stage: Pre-training & Post-training
  • Language Model
    • Number of Parameters: 9B
    • Hidden Dimension: 4096
    • Token Embedding: 248320 (Padded)
    • Number of Layers: 32
    • Hidden Layout: 8 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN))
    • Gated DeltaNet:
      • Number of Linear Attention Heads: 32 for V and 16 for QK
      • Head Dimension: 128
    • Gated Attention:
      • Number of Attention Heads: 16 for Q and 4 for KV
      • Head Dimension: 256
      • Rotary Position Embedding Dimension: 64
    • Feed Forward Network:
      • Intermediate Dimension: 12288
    • LM Output: 248320 (Padded)
    • MTP: trained with multi-steps
  • Context Length: 262,144 natively and extensible up to 1,010,000 tokens.
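The hidden-layout line above compresses the full layer sequence; a quick sketch expanding "8 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN))" into the stated 32-layer sequence:

```python
# Each repeat contributes 3 linear-attention (Gated DeltaNet) blocks
# followed by 1 full (Gated Attention) block, each paired with an FFN.
layout = (["GatedDeltaNet"] * 3 + ["GatedAttention"]) * 8

assert len(layout) == 32                    # matches "Number of Layers: 32"
assert layout.count("GatedDeltaNet") == 24  # 3 of every 4 layers are linear attention
assert layout.count("GatedAttention") == 8  # every 4th layer is full attention
```

So only a quarter of the layers pay full-attention cost, which is how the hybrid architecture keeps long-context inference cheap at a 262k native context length.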

Benchmark Results

Language

| | GPT-OSS-120B | GPT-OSS-20B | Qwen3-Next-80B-A3B-Thinking | Qwen3-30BA3B-Thinking-2507 | Qwen3.5-9B | Qwen3.5-4B |
| :-- | :--: | :--: | :--: | :--: | :--: | :--: |
| **Knowledge & STEM** | | | | | | |
| MMLU-Pro | 80.8 | 74.8 | 82.7 | 80.9 | 82.5 | 79.1 |
| MMLU-Redux | 91.0 | 87.8 | 92.5 | 91.4 | 91.1 | 88.8 |
| C-Eval | 76.2 | 71.4 | 89.7 | 87.4 | 88.2 | 85.1 |
| SuperGPQA | 54.6 | 48.5 | 60.8 | 56.8 | 58.2 | 52.9 |
| GPQA Diamond | 80.1 | 71.5 | 77.2 | 73.4 | 81.7 | 76.2 |
| **Instruction Following** | | | | | | |
| IFEval | 88.9 | 88.2 | 88.9 | 88.9 | | |
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">91.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.8</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">IFBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">61.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">64.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">59.2</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MultiChallenge</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">45.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">40.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">46.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">49.0</td> </tr> <tr><td colspan="7" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Long Context</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 
0.15);">AA-LCR</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">50.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">30.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">49.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">63.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.0</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LongBench v2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">48.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">45.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">48.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">44.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">50.0</td> </tr> <tr><td colspan="7" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Reasoning & Coding</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">HMMT Feb 25</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">90.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.7</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">73.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">63.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.0</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">HMMT Nov 25</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">90.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">73.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.8</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LiveCodeBench v6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">68.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.8</td> </tr> <tr> <td style="padding:7px 
7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">OJBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">41.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">36.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">29.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">25.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">29.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">24.1</td> </tr> <tr><td colspan="7" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">General Agent</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">BFCL-V4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">49.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">42.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">50.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">TAU2-Bench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td 
style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">41.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.9</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VITA-Bench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">29.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">14.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">29.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">22.0</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">DeepPlanning</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">0.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">4.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">18.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">17.6</td> </tr> <tr><td colspan="7" style="padding:8px 
12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Multilingualism</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMMLU</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLU-ProX</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">67.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">73.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.5</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">NOVA-63</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">48.7</td> <td 
style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">53.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">52.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">INCLUDE</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.0</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">Global PIQA</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">84.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.9</td> </tr> <tr> <td style="padding:7px 
7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">PolyMATH</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">30.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">62.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">52.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">WMT24++</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">67.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MAXIFE</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 
128, 128, 0.15)">77.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.0</td> </tr> </tbody> </table> <p style="margin-top:12px;font-size:11px;opacity:0.7"> * TAU2-Bench: we follow the official setup except for the airline domain, where all models are evaluated with the fixes proposed in the Claude Opus 4.5 system card applied.<br> * MMLU-ProX: we report the average accuracy across 29 languages.<br> * WMT24++: a harder subset of WMT24 after difficulty labeling and rebalancing; we report the average scores across 55 languages using XCOMET-XXL.<br> * MAXIFE: we report the accuracy on English + multilingual original prompts (23 settings in total).<br> * Empty cells (--) indicate scores not yet available or not applicable. </p> </div>

Vision Language

<div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:1000px;margin:0 auto;padding:16px 0"> <table style="width:100%;border-collapse:collapse;font-size:13px"> <thead><tr> <th style="padding:10px 7px;text-align:left;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed"></th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">GPT-5-Nano-2025-08-07</th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">Gemini-2.5-Flash-Lite</th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">Qwen3-VL-30B-A3B</th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">Qwen3.5-9B</th><th style="padding:10px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size: 14px;">Qwen3.5-4B</th></tr></thead> <tbody> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">STEM and Puzzle </td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMMU</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">73.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">77.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 
128, 128, 0.15);">MMMU-Pro</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">59.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">63.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">70.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MathVision</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">62.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">52.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">Mathvista(mini)</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">85.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">85.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">We-Math</td> <td 
style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">62.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">32.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">70.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.4</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">DynaMath</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">ZEROBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">1.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">1.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">0.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">3.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">3.0</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">ZEROBench_sub</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">22.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">19.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">23.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">31.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">26.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VlmsAreBlind</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">68.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">93.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">92.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">BabyVision</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">14.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">17.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">18.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">28.6/25.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">16.0/19.1</td> </tr> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">General 
VQA</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">RealWorldQA</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">77.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.5</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMStar</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">68.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMBench<sub><small>EN-DEV-v1.1</small></sub></td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">90.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.4</td> </tr> <tr> 
<td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">SimpleVQA</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">46.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">43.4</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">HallusionBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">58.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">64.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.0</td> </tr> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Text Recognition and Document Understanding</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">OmniDocBench1.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">86.8</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">87.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">86.2</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">CharXiv(RQ)</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">50.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">56.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">56.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">73.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">70.8</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMLongBench-Doc</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">31.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">46.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">47.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.2</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">CC-OCR</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">58.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">77.8</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.7</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">AI2D_TEST</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">85.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">86.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">90.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">OCRBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">85.0</td> </tr> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Spatial Intelligence</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">ERQA</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">45.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 
0.15)">44.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">45.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.0</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">CountBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">90.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">97.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">96.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">RefCOCO(avg)</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">89.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">88.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">EmbSpatialBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.1</td> <td 
style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">RefSpatialBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">12.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">11.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">58.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LingoQA</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">17.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">62.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">80.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.4</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">Hypersim</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">11.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">13.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">12.5</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">Nuscene</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">10.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">11.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">9.9</td> </tr> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Video Understanding</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VideoMME<sub><small>(w sub.)</small></sub></td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">84.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.5</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VideoMME<sub><small>(w/o sub.)</small></sub></td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">73.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.9</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">VideoMMMU</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">63.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">75.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MLVU</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">78.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">84.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">82.8</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MVBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid 
rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">74.4</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">71.2</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">LVBench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">60.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">59.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">70.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.4</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MMVU</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">63.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">66.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">67.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">64.9</td> </tr> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Visual Agent </td></tr> <tr> <td style="padding:7px 
7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">ScreenSpot Pro</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">60.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">60.3</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">OSWorld-Verified</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">30.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">41.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">35.6</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">AndroidWorld</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">--</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">58.6</td> </tr> <tr><td colspan="6" style="padding:8px 
12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Tool Calling</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">TIR-Bench</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">18.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">21.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">22.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">45.6/31.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">38.9/29.9</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">V*</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">68.1</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">69.6</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">83.2</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">90.1/88.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">84.3/86.4</td> </tr> <tr><td colspan="6" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124, 58, 237, 0.2);background:rgba(124, 58, 237, 0.1)">Medical VQA</td></tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">SLAKE</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">65.0</td> <td style="padding:7px 
7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">68.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.0</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.1</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">PMC-VQA</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">37.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">48.8</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">51.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">57.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">55.5</td> </tr> <tr> <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128, 128, 128, 0.15);">MedXpertQA-MM</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">26.7</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">35.3</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">35.5</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">49.9</td> <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">42.9</td> </tr> </tbody> </table> <p style="margin-top:12px;font-size:11px;opacity:0.7"> * MathVision: our model’s score is evaluated using a fixed prompt, e.g., “Please reason step by step, and put your final answer within \boxed{}.” For other models, we report the higher score between runs with and without the \boxed{} formatting.<br> * BabyVision: scores reported as "with CI / without CI".<br> * 
TIR-Bench and V*: scores reported as "with CI / without CI".<br> * Empty cells (--) indicate scores not yet available or not applicable. </p> </div>

Quickstart

[!Important] Qwen3.5 models operate in thinking mode by default, generating thinking content signified by <think>\n...</think>\n\n before producing the final response. To disable thinking content and obtain a direct response, refer to the examples here.

For streamlined integration, we recommend using Qwen3.5 via APIs. Below is a guide to using Qwen3.5 via an OpenAI-compatible API.

Serving Qwen3.5

Qwen3.5 can be served via APIs with popular inference frameworks. Below are example commands to launch OpenAI-compatible API servers for Qwen3.5 models.

[!Important] Inference efficiency and throughput vary significantly across frameworks. We recommend using the latest framework versions to ensure optimal performance and compatibility. For production workloads or high-throughput scenarios, dedicated serving engines such as SGLang, KTransformers or vLLM are strongly recommended.

[!Important] The model has a default context length of 262,144 tokens. If you encounter out-of-memory (OOM) errors, consider reducing the context window. However, because Qwen3.5 leverages extended context for complex tasks, we advise maintaining a context length of at least 128K tokens to preserve thinking capabilities.
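As a rough sanity check before sending a long request, you can estimate whether the input plus the output budget fits the serving window. The helper below is our own sketch; the chars-per-token ratio is only a crude heuristic for English text, so use the model tokenizer for an exact count:

```python
def fits_context(prompt_chars: int, max_output_tokens: int,
                 context_length: int = 262_144,
                 chars_per_token: float = 4.0) -> bool:
    """Crude feasibility check: estimated input tokens plus the output
    budget must fit inside the configured context window.

    chars_per_token ~ 4 is a rough heuristic, not a tokenizer count.
    """
    est_input_tokens = prompt_chars / chars_per_token
    return est_input_tokens + max_output_tokens <= context_length
```

For example, with the default 262,144-token window, a 400,000-character prompt (roughly 100K estimated tokens) still leaves room for an 81,920-token generation; with a 131,072-token window it would not.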

SGLang

SGLang is a fast serving framework for large language models and vision language models. Qwen3.5 requires SGLang built from the main branch of the open-source repository; install it in a fresh environment with the following command:

uv pip install 'git+https://github.com/sgl-project/sglang.git#subdirectory=python&egg=sglang[all]'

See its documentation for more details.

The following will create API endpoints at http://localhost:8000/v1:

  • Standard Version: The following command creates an API endpoint with a maximum context length of 262,144 tokens using tensor parallelism on a single GPU (adjust --tp-size to use more GPUs).

    python -m sglang.launch_server --model-path Qwen/Qwen3.5-9B --port 8000 --tp-size 1 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser qwen3
    
  • Tool Use: To support tool calling, use the following command.

    python -m sglang.launch_server --model-path Qwen/Qwen3.5-9B --port 8000 --tp-size 1 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser qwen3 --tool-call-parser qwen3_coder
    
  • Multi-Token Prediction (MTP): The following command is recommended for MTP:

    python -m sglang.launch_server --model-path Qwen/Qwen3.5-9B --port 8000 --tp-size 1 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser qwen3 --speculative-algo NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
    

vLLM

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Qwen3.5 requires a nightly build of vLLM from the main branch of the open-source repository, which can be installed in a fresh environment with the following command:

uv pip install vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly

See its documentation for more details.

For a detailed Qwen3.5 usage guide, see the vLLM Qwen3.5 recipe.

The following will create API endpoints at http://localhost:8000/v1:

  • Standard Version: The following command creates an API endpoint with a maximum context length of 262,144 tokens using tensor parallelism on a single GPU (adjust --tensor-parallel-size to use more GPUs).

    vllm serve Qwen/Qwen3.5-9B --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3 
    
  • Tool Use: To support tool calling, use the following command.

    vllm serve Qwen/Qwen3.5-9B --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser qwen3_coder 
    
  • Multi-Token Prediction (MTP): The following command is recommended for MTP:

    vllm serve Qwen/Qwen3.5-9B --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
    
  • Text-Only: The following command skips the vision encoder and multimodal profiling to free up memory for additional KV cache:

    vllm serve Qwen/Qwen3.5-9B --port 8000 --tensor-parallel-size 1 --max-model-len 262144 --reasoning-parser qwen3 --language-model-only
    

KTransformers

KTransformers is a flexible framework for experiencing cutting-edge LLM inference optimizations with CPU-GPU heterogeneous computing. For running Qwen3.5 with KTransformers, see the KTransformers Deployment Guide.

Hugging Face Transformers

Hugging Face Transformers includes a lightweight server that can be used for quick testing and moderate-load deployment. Qwen3.5 requires the latest transformers built from the main branch:

pip install "transformers[serving] @ git+https://github.com/huggingface/transformers.git@main"

See its documentation for more details. Please also make sure torchvision and pillow are installed.

Then, run transformers serve to launch a server with API endpoints at http://localhost:8000/v1; it will place the model on accelerators if available:

transformers serve --force-model Qwen/Qwen3.5-9B --port 8000 --continuous-batching

Using Qwen3.5 via the Chat Completions API

The chat completions API is accessible via standard HTTP requests or OpenAI SDKs. Here, we show examples using the OpenAI Python SDK.

Before starting, make sure the SDK is installed and that the API key and the API base URL are configured, e.g.:

pip install -U openai

# Set the following accordingly
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"

[!Tip] We recommend the following sampling parameters for generation:

  • Thinking mode for general tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
  • Thinking mode for precise coding tasks (e.g. WebDev): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
  • Instruct (or non-thinking) mode for general tasks: temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
  • Instruct (or non-thinking) mode for reasoning tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0

Please note that support for sampling parameters varies across inference frameworks.
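One way to keep the recommended presets consistent across calls is a small lookup table that splits each preset into parameters the OpenAI SDK accepts natively and parameters that must travel via extra_body. This is our own sketch; the preset names are illustrative:

```python
# Sampling presets from the tip above; the preset names are our own.
SAMPLING_PRESETS = {
    "thinking_general": dict(temperature=1.0, top_p=0.95, top_k=20, min_p=0.0,
                             presence_penalty=1.5, repetition_penalty=1.0),
    "thinking_coding": dict(temperature=0.6, top_p=0.95, top_k=20, min_p=0.0,
                            presence_penalty=0.0, repetition_penalty=1.0),
    "instruct_general": dict(temperature=0.7, top_p=0.8, top_k=20, min_p=0.0,
                             presence_penalty=1.5, repetition_penalty=1.0),
    "instruct_reasoning": dict(temperature=1.0, top_p=0.95, top_k=20, min_p=0.0,
                               presence_penalty=1.5, repetition_penalty=1.0),
}

def split_params(preset: dict) -> tuple[dict, dict]:
    """Split a preset into kwargs the OpenAI SDK accepts natively and
    kwargs that must be passed via extra_body (top_k, min_p, and
    repetition_penalty are not part of the OpenAI request schema)."""
    native_keys = {"temperature", "top_p", "presence_penalty"}
    native = {k: v for k, v in preset.items() if k in native_keys}
    extra = {k: v for k, v in preset.items() if k not in native_keys}
    return native, extra
```

A request can then be built as `native, extra = split_params(SAMPLING_PRESETS["thinking_general"])` followed by `client.chat.completions.create(..., **native, extra_body=extra)`.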

Text-Only Input

from openai import OpenAI
# Configured by environment variables
client = OpenAI()

messages = [
    {"role": "user", "content": "Type \"I love Qwen3.5\" backwards"},
]

chat_response = client.chat.completions.create(
    model="Qwen/Qwen3.5-9B",
    messages=messages,
    max_tokens=81920,
    temperature=1.0,
    top_p=0.95,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
    }, 
)
print("Chat response:", chat_response)

Image Input

from openai import OpenAI
# Configured by environment variables
client = OpenAI()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3.5/demo/CI_Demo/mathv-1327.jpg"
                }
            },
            {
                "type": "text",
                "text": "The centres of the four illustrated circles are in the corners of the square. The two big circles touch each other and also the two little circles. With which factor do you have to multiply the radii of the little circles to obtain the radius of the big circles?\nChoices:\n(A) $\\frac{2}{9}$\n(B) $\\sqrt{5}$\n(C) $0.8 \\cdot \\pi$\n(D) 2.5\n(E) $1+\\sqrt{2}$"
            }
        ]
    }
]

chat_response = client.chat.completions.create(
    model="Qwen/Qwen3.5-9B",
    messages=messages,
    max_tokens=81920,
    temperature=1.0,
    top_p=0.95,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
    }, 
)
print("Chat response:", chat_response)
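The example above fetches the image by URL. For local files, OpenAI-compatible servers generally also accept a base64 data URL in the same image_url field; the helper below is our own sketch of that encoding:

```python
import base64

def image_bytes_to_data_url(data: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL for image_url.url."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("utf-8")

# Usage with a local file (path is illustrative):
# with open("figure.png", "rb") as f:
#     url = image_bytes_to_data_url(f.read(), "image/png")
# Then place url into {"type": "image_url", "image_url": {"url": url}}.
```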

Video Input

from openai import OpenAI
# Configured by environment variables
client = OpenAI()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video_url",
                "video_url": {
                    "url": "https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3.5/demo/video/N1cdUjctpG8.mp4"
                }
            },
            {
                "type": "text",
                "text": "Summarize the video content."
            }
        ]
    }
]

# When vLLM is launched with `--media-io-kwargs '{"video": {"num_frames": -1}}'`,
# video frame sampling can be configured via `extra_body` (e.g., by setting `fps`).
# This feature is currently supported only in vLLM.
#
# By default, `fps=2` and `do_sample_frames=True`.
# With `do_sample_frames=True`, you can customize the `fps` value to set your desired video sampling rate.
chat_response = client.chat.completions.create(
    model="Qwen/Qwen3.5-9B",
    messages=messages,
    max_tokens=81920,
    temperature=1.0,
    top_p=0.95,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
        "mm_processor_kwargs": {"fps": 2, "do_sample_frames": True},
    }, 
)

print("Chat response:", chat_response)

Instruct (or Non-Thinking) Mode

[!Important] Qwen3.5 does not officially support the soft switch of Qwen3, i.e., /think and /no_think.

Qwen3.5 thinks by default before responding. You can obtain a direct response from the model, without thinking content, by configuring the API parameters. For example:

from openai import OpenAI
# Configured by environment variables
client = OpenAI()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3.5/demo/RealWorld/RealWorld-04.png"
                }
            },
            {
                "type": "text",
                "text": "Where is this?"
            }
        ]
    }
]

chat_response = client.chat.completions.create(
    model="Qwen/Qwen3.5-9B",
    messages=messages,
    max_tokens=32768,
    temperature=0.7,
    top_p=0.8,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
        "chat_template_kwargs": {"enable_thinking": False},
    }, 
)
print("Chat response:", chat_response)

[!Note] If you are using APIs from Alibaba Cloud Model Studio, in addition to changing the model name, please use "enable_thinking": False instead of "chat_template_kwargs": {"enable_thinking": False}.
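Because the two backends expect the switch in different shapes, a tiny helper can keep client code uniform. This helper and its backend labels are our own, not part of any SDK:

```python
def nothink_extra_body(backend: str = "oss") -> dict:
    """Return the extra_body that disables thinking mode.

    "dashscope" = Alibaba Cloud Model Studio; anything else is treated as
    an OpenAI-compatible open-source server (vLLM/SGLang/...).
    """
    if backend == "dashscope":
        return {"enable_thinking": False}
    return {"chat_template_kwargs": {"enable_thinking": False}}
```

It can then be passed directly, e.g. `client.chat.completions.create(..., extra_body=nothink_extra_body())`.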

Agentic Usage

Qwen3.5 excels at tool calling.

Qwen-Agent

We recommend using Qwen-Agent to quickly build Agent applications with Qwen3.5.

To define the available tools, you can use an MCP configuration file, use the integrated tools of Qwen-Agent, or integrate other tools yourself.

import os
from qwen_agent.agents import Assistant

# Define LLM
# Using Alibaba Cloud Model Studio
llm_cfg = {
    # Use the OpenAI-compatible model service provided by DashScope:
    'model': 'Qwen3.5-9B',
    'model_type': 'qwenvl_oai',
    'model_server': 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    'api_key': os.getenv('DASHSCOPE_API_KEY'),

    'generate_cfg': {
        'use_raw_api': True,
        # When using the DashScope OAI API, pass the parameter that toggles thinking mode in this way
        'extra_body': {
            'enable_thinking': True
        },
    },
}

# Using an OpenAI-compatible API endpoint. It is recommended to disable the reasoning and tool-call parsing
# functionality of the deployment frameworks and let Qwen-Agent automate the related operations.
#
# llm_cfg = {
#     # Use your own model service compatible with OpenAI API by vLLM/SGLang:
#     'model': 'Qwen/Qwen3.5-9B',
#     'model_type': 'qwenvl_oai',
#     'model_server': 'http://localhost:8000/v1',  # api_base
#     'api_key': 'EMPTY',
#
#     'generate_cfg': {
#         'use_raw_api': True,
#         # When using the vLLM/SGLang OAI API, pass the parameter that toggles thinking mode in this way
#         'extra_body': {
#             'chat_template_kwargs': {'enable_thinking': True}
#         },
#     },
# }

# Define Tools
tools = [
    {'mcpServers': {  # You can specify the MCP configuration file
            "filesystem": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/xxxx/Desktop"]
            }
        }
    }
]

# Define Agent
bot = Assistant(llm=llm_cfg, function_list=tools)

# Streaming generation
messages = [{'role': 'user', 'content': 'Help me organize my desktop.'}]
for responses in bot.run(messages=messages):
    pass  # drain the stream; after the loop, responses holds the final result
print(responses)

# Streaming generation
messages = [{'role': 'user', 'content': 'Develop a dog website and save it on the desktop'}]
for responses in bot.run(messages=messages):
    pass  # drain the stream; after the loop, responses holds the final result
print(responses)

Qwen Code

Qwen Code is an open-source AI agent for the terminal, optimized for Qwen models. It helps you understand large codebases, automate tedious work, and ship faster.

For more information, please refer to Qwen Code.

Processing Ultra-Long Texts

Qwen3.5 natively supports context lengths of up to 262,144 tokens. For long-horizon tasks where the total length (including both input and output) exceeds this limit, we recommend using RoPE scaling techniques such as YaRN to handle long texts effectively.

YaRN is currently supported by several inference frameworks, e.g., transformers, vllm, ktransformers and sglang. In general, there are two approaches to enabling YaRN for supported frameworks:

  • Modifying the model configuration file: In the config.json file, change the rope_parameters fields in text_config to:

    {
        "mrope_interleaved": true,
        "mrope_section": [
            11,
            11,
            10
        ],
        "rope_type": "yarn",
        "rope_theta": 10000000,
        "partial_rotary_factor": 0.25,
        "factor": 4.0,
        "original_max_position_embeddings": 262144
    }
    
  • Passing command line arguments:

    For vllm, you can use

    VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve ... --hf-overrides '{"text_config": {"rope_parameters": {"mrope_interleaved": true, "mrope_section": [11, 11, 10], "rope_type": "yarn", "rope_theta": 10000000, "partial_rotary_factor": 0.25, "factor": 4.0, "original_max_position_embeddings": 262144}}}' --max-model-len 1010000  
    

    For sglang and ktransformers, you can use

    SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python -m sglang.launch_server ... --json-model-override-args '{"text_config": {"rope_parameters": {"mrope_interleaved": true, "mrope_section": [11, 11, 10], "rope_type": "yarn", "rope_theta": 10000000, "partial_rotary_factor": 0.25, "factor": 4.0, "original_max_position_embeddings": 262144}}}' --context-length 1010000
    

[!NOTE] All the notable open-source frameworks implement static YaRN, which means the scaling factor remains constant regardless of input length, potentially impacting performance on shorter texts. We advise modifying the rope_parameters configuration only when processing long contexts is required. It is also recommended to modify the factor as needed. For example, if the typical context length for your application is 524,288 tokens, it would be better to set factor as 2.0.
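The note's rule of thumb (scaling factor ≈ target context length / native context length, never below 1.0) can be checked in a few lines. A minimal sketch; `yarn_factor` is my own helper, not part of any framework API:

```python
NATIVE_CONTEXT = 262_144  # Qwen3.5 native max position embeddings

def yarn_factor(target_len: int) -> float:
    """Smallest YaRN scaling factor that covers target_len tokens (floored at 1.0)."""
    return max(1.0, target_len / NATIVE_CONTEXT)

print(yarn_factor(524_288))  # 2.0, matching the note's example
```

For the 1,010,000-token command-line examples above, this gives about 3.85, so the configured factor of 4.0 is sufficient.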

Best Practices

To achieve optimal performance, we recommend the following settings:

  1. Sampling Parameters:

    • We suggest using the following sets of sampling parameters depending on the mode and task type:
      • Thinking mode for general tasks:
        temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
      • Thinking mode for precise coding tasks (e.g., WebDev):
        temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
      • Instruct (or non-thinking) mode for general tasks:
        temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
      • Instruct (or non-thinking) mode for reasoning tasks:
        temperature=1.0, top_p=1.0, top_k=40, min_p=0.0, presence_penalty=2.0, repetition_penalty=1.0
    • For supported frameworks, you can adjust the presence_penalty parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
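The four presets above can be kept in one place and split into standard OpenAI-client kwargs versus `extra_body`-only fields (`top_k` and `min_p` are not standard OpenAI parameters and are typically passed via `extra_body` on vLLM/SGLang). A minimal sketch; the preset names and helper are my own:

```python
# Recommended sampling presets from the list above.
SAMPLING_PRESETS = {
    "thinking_general":   dict(temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5),
    "thinking_coding":    dict(temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0),
    "instruct_general":   dict(temperature=0.7, top_p=0.8,  top_k=20, min_p=0.0, presence_penalty=1.5),
    "instruct_reasoning": dict(temperature=1.0, top_p=1.0,  top_k=40, min_p=0.0, presence_penalty=2.0),
}

def split_for_openai_client(preset: dict) -> tuple[dict, dict]:
    """Split a preset into standard chat-completion kwargs and extra_body-only fields."""
    extra = {k: preset[k] for k in ("top_k", "min_p")}
    std = {k: v for k, v in preset.items() if k not in extra}
    return std, extra
```

The returned pair can then be spread into a call as `client.chat.completions.create(..., **std, extra_body=extra)`.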
  2. Adequate Output Length: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 81,920 tokens. This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance.

  3. Standardize Output Format: We recommend using prompts to standardize model outputs when benchmarking.

    • Math Problems: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
    • Multiple-Choice Questions: Add the following instruction to the prompt to standardize responses: "Please show your choice in the answer field with only the choice letter, e.g., "answer": "C"."
  4. No Thinking Content in History: In multi-turn conversations, the historical model output should only include the final output part, not the thinking content. This behavior is already implemented in the provided Jinja2 chat template. However, for frameworks that do not use the Jinja2 chat template directly, it is up to the developers to ensure that this best practice is followed.
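For frameworks that bypass the Jinja2 template, the pruning can be done manually before re-sending history. A minimal sketch, assuming thinking content is wrapped in `<think>...</think>` tags as in Qwen's chat template; `strip_thinking` is my own helper:

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_thinking(history: list[dict]) -> list[dict]:
    """Remove <think>...</think> blocks from assistant turns, leaving other turns untouched."""
    cleaned = []
    for msg in history:
        if msg["role"] == "assistant":
            msg = {**msg, "content": THINK_RE.sub("", msg["content"])}
        cleaned.append(msg)
    return cleaned
```

Apply it to the accumulated message list before each new request so only final outputs are carried forward.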

  5. Long Video Understanding: To optimize inference efficiency for plain text and images, the size parameter in the released video_preprocessor_config.json is conservatively configured. It is recommended to set the longest_edge parameter in the video_preprocessor_config file to 469,762,048 (corresponding to 224k video tokens) to enable higher frame-rate sampling for hour-scale videos and thereby achieve superior performance. For example,

    {"longest_edge": 469762048, "shortest_edge": 4096}
    

    Alternatively, override the default values via engine startup parameters. For implementation details, refer to: vLLM / SGLang.

Citation

If you find our work helpful, feel free to cite it.

@misc{qwen3.5,
    title  = {{Qwen3.5}: Towards Native Multimodal Agents},
    author = {{Qwen Team}},
    month  = {February},
    year   = {2026},
    url    = {https://qwen.ai/blog?id=qwen3.5}
}

Author: DavidAU

Likes: 4

Downloads: 2

Tags: transformers, safetensors, qwen3_5, image-text-to-text, fine tune, creative, creative writing, fiction writing, plot generation, sub-plot generation, story generation, scene continue, storytelling, fiction story, science fiction, romance, all genres, story, writing, vivid prosing, vivid writing, fiction, roleplaying, bfloat16, all use cases, unsloth, heretic, uncensored, abliterated, conversational, en, zh, base_model:trohrbaugh/Qwen3.5-9B-heretic-v2, base_model:finetune:trohrbaugh/Qwen3.5-9B-heretic-v2, license:apache-2.0, endpoints_compatible, region:us

ostris/sketch_to_image_klein_4b


tags:

  • text-to-image
  • lora
  • diffusers
  • template:diffusion-lora

widget:

  • output:
      url: images/train_a_sketch_control_net.jpg
    text: '-'

base_model: black-forest-labs/FLUX.2-klein-base-4B
instance_prompt: null
license: apache-2.0

Sketch to Image - Klein 4b


Model description

This is a controlnet-style LoRA release that contains two LoRAs: an image-to-sketch LoRA, `sketch_generator_klein_4b.safetensors`, and a sketch-to-image LoRA, `sketch_to_image_klein_4b.safetensors`. They can be used separately or chained together to function as a control generator and a controlnet. These LoRAs were trained while filming the tutorial How to Train a ControlNet in AI Toolkit; check out that video for more info.


Download model

Download them in the Files & versions tab.

Author: ostris

Likes: 3

Downloads: 0

Tags: diffusers, text-to-image, lora, template:diffusion-lora, base_model:black-forest-labs/FLUX.2-klein-base-4B, base_model:adapter:black-forest-labs/FLUX.2-klein-base-4B, license:apache-2.0, region:us