Today's AI Summary

AI Developments: Enhanced Video Understanding, Noise Recalibration, and More

Today's AI landscape features advancements in video understanding, image generation, and generative modeling, alongside tools for safer and more reliable AI systems.

Research Highlights

Several research papers introduce novel approaches to improve AI models:

  • Video Understanding: The "VideoNSA" paper introduces a method to adapt Native Sparse Attention (NSA) to video-language models, specifically Qwen2.5-VL. This approach enhances long-video understanding, temporal reasoning, and spatial benchmarks by employing a hardware-aware hybrid attention mechanism.
  • Image Generation: "NoiseShift" addresses the challenge of generating high-quality low-resolution images with diffusion models. The paper identifies that noise schedulers have unequal perceptual effects across resolutions and proposes a training-free method to recalibrate noise levels, significantly improving low-resolution image quality in models like Stable Diffusion 3 and 3.5.
  • Generative Modeling: "Equilibrium Matching (EqM)" presents a generative modeling framework that learns the equilibrium gradient of an implicit energy landscape, surpassing the generation performance of traditional diffusion and flow models.
  • Interactive Training: The "Interactive Training" paper introduces an open-source framework that enables real-time, feedback-driven intervention during neural network training, allowing dynamic adjustments to optimizer hyperparameters, training data, and model checkpoints.
  • Uncertainty Estimation: "Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation" tackles hallucinations in LLMs and proposes alternative risk indicators for risk-correlation experiments, improving the robustness of empirical assessments of uncertainty estimation (UE) algorithms for NLG.

Model Releases

Several new models have been released, focusing on vision-language capabilities, uncensored content generation, and specialized applications:

  • Qwen3-VL Models: yairpatch has released GGUF versions of Qwen3-VL-30B-A3B-Thinking and Qwen3-VL-30B-A3B-Instruct. These models are part of the Qwen3-VL series, known for their advanced vision-language capabilities, including visual agent functionalities, visual coding boosts, and enhanced multimodal reasoning.
  • Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated: huihui-ai has released an uncensored version of Qwen3-VL-30B-A3B-Instruct, created using abliteration techniques to remove refusal responses.
  • UIGEN-FX-Agentic-32B: Tesslate has released UIGEN-FX-Agentic-32B, a model focused on frontend design and agentic capabilities, fine-tuned from Qwen3-32B.
  • Looking-Glass-Alice-Thinking-NSFW-RP-GGUF: D1rtyB1rd has released Looking-Glass-Alice-Thinking-NSFW-RP-GGUF, a model tuned for immersive role-play, logical reasoning, philosophical depth, and sensual storytelling.
  • BioGenesis-ToT: khazarai has released BioGenesis-ToT, a fine-tuned version of Qwen3-1.7B optimized for mechanistic reasoning and explanatory understanding in biology.
  • PurrBERT-v1: purrgpt-community has released PurrBERT-v1, a lightweight content-safety classifier built on top of DistilBERT, designed to flag harmful or unsafe user prompts.

Key Takeaways

  • Video and Vision-Language Models Advance: Significant progress is being made in enhancing video understanding and vision-language models, enabling more complex and coherent multimodal interactions.
  • Noise Recalibration Improves Image Generation: Addressing noise imbalances in diffusion models can lead to substantial improvements in low-resolution image generation without retraining.
  • Interactive Training Offers Greater Control: Frameworks like Interactive Training provide real-time control over neural network training, improving stability and adaptability.
  • Model Safety and Specialization: New models are emerging with a focus on content safety and specialized applications, such as biology and frontend design.

AI Papers for 2026-03-15

The Latent Color Subspace: Emergent Order in High-Dimensional Chaos

Text-to-image generation models have advanced rapidly, yet achieving fine-grained control over generated images remains difficult, largely due to limited understanding of how semantic information is encoded. We develop an interpretation of the color representation in the Variational Autoencoder latent space of FLUX.1 [Dev], revealing a structure reflecting Hue, Saturation, and Lightness. We verify our Latent Color Subspace (LCS) interpretation by demonstrating that it can both predict and explicitly control color, introducing a fully training-free method in FLUX based solely on closed-form latent-space manipulation. Code is available at https://github.com/ExplainableML/LCS.

SciMDR: Benchmarking and Advancing Scientific Multimodal Document Reasoning

Constructing scientific multimodal document reasoning datasets for foundation model training involves an inherent trade-off among scale, faithfulness, and realism. To address this challenge, we introduce the synthesize-and-reground framework, a two-stage pipeline comprising: (1) Claim-Centric QA Synthesis, which generates faithful, isolated QA pairs and reasoning on focused segments, and (2) Document-Scale Regrounding, which programmatically re-embeds these pairs into full-document tasks to ensure realistic complexity. Using this framework, we construct SciMDR, a large-scale training dataset for cross-modal comprehension, comprising 300K QA pairs with explicit reasoning chains across 20K scientific papers. We further construct SciMDR-Eval, an expert-annotated benchmark to evaluate multimodal comprehension within full-length scientific workflows. Experiments demonstrate that models fine-tuned on SciMDR achieve significant improvements across multiple scientific QA benchmarks, particularly in those tasks requiring complex document-level reasoning.

Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

Reasoning LLMs-as-Judges, which can benefit from inference-time scaling, provide a promising path for extending the success of reasoning models to non-verifiable domains where the output correctness/quality cannot be directly checked. However, while reasoning judges have shown better performance on static evaluation benchmarks, their effectiveness in actual policy training has not been systematically examined. Therefore, we conduct a rigorous study to investigate the actual impact of non-reasoning and reasoning judges in reinforcement-learning-based LLM alignment. Our controlled synthetic setting, where a "gold-standard" judge (gpt-oss-120b) provides preference annotations to train smaller judges, reveals key differences between non-reasoning and reasoning judges: non-reasoning judges lead to reward hacking easily, while reasoning judges can lead to policies that achieve strong performance when evaluated by the gold-standard judge. Interestingly, we find that the reasoning-judge-trained policies achieve such strong performance by learning to generate highly effective adversarial outputs that can also score well on popular benchmarks such as Arena-Hard by deceiving other LLM-judges. Combined with our further analysis, our study highlights both important findings and room for improvements for applying (reasoning) LLM-judges in non-verifiable LLM post-training.

Separable neural architectures as a primitive for unified predictive and generative intelligence

Intelligent systems across physics, language and perception often exhibit factorisable structure, yet are typically modelled by monolithic neural architectures that do not explicitly exploit this structure. The separable neural architecture (SNA) addresses this by formalising a representational class that unifies additive, quadratic and tensor-decomposed neural models. By constraining interaction order and tensor rank, SNAs impose a structural inductive bias that factorises high-dimensional mappings into low-arity components. Separability need not be a property of the system itself: it often emerges in the coordinates or representations through which the system is expressed. Crucially, this coordinate-aware formulation reveals a structural analogy between chaotic spatiotemporal dynamics and linguistic autoregression. By treating continuous physical states as smooth, separable embeddings, SNAs enable distributional modelling of chaotic systems. This approach mitigates the nonphysical drift characteristics of deterministic operators whilst remaining applicable to discrete sequences. The compositional versatility of this approach is demonstrated across four domains: autonomous waypoint navigation via reinforcement learning, inverse generation of multifunctional microstructures, distributional modelling of turbulent flow and neural language modelling. These results establish the separable neural architecture as a domain-agnostic primitive for predictive and generative intelligence, capable of unifying both deterministic and distributional representations.
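The low-arity factorisation the abstract describes can be illustrated with a rank-R separable approximation. A minimal sketch, with hypothetical names that are not from the paper's code:

```python
def separable(f_components, g_components):
    """Rank-R separable model: f(x, y) ≈ Σ_r f_r(x) · g_r(y).
    Constraining R (the tensor rank) is the structural inductive bias
    that factorises a high-dimensional mapping into low-arity parts.
    Both argument lists are illustrative stand-ins for learned components."""
    def f(x, y):
        return sum(fr(x) * gr(y) for fr, gr in zip(f_components, g_components))
    return f

# Rank-1 example: f(x, y) = x * y is exactly separable.
product = separable([lambda x: x], [lambda y: y])
```

The same composition works whether the components act on continuous embeddings of physical states or on discrete token representations, which is the unification the paper emphasises.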

Incremental Neural Network Verification via Learned Conflicts

Neural network verification is often used as a core component within larger analysis procedures, which generate sequences of closely related verification queries over the same network. In existing neural network verifiers, each query is typically solved independently, and information learned during previous runs is discarded, leading to repeated exploration of the same infeasible regions of the search space. In this work, we aim to expedite verification by reducing this redundancy. We propose an incremental verification technique that reuses learned conflicts across related verification queries. The technique can be added on top of any branch-and-bound-based neural network verifier. During verification, the verifier records conflicts corresponding to learned infeasible combinations of activation phases, and retains them across runs. We formalize a refinement relation between verification queries and show that conflicts learned for a query remain valid under refinement, enabling sound conflict inheritance. Inherited conflicts are handled using a SAT solver to perform consistency checks and propagation, allowing infeasible subproblems to be detected and pruned early during search. We implement the proposed technique in the Marabou verifier and evaluate it on three verification tasks: local robustness radius determination, verification with input splitting, and minimal sufficient feature set extraction. Our experiments show that incremental conflict reuse reduces verification effort and yields speedups of up to $1.9\times$ over a non-incremental baseline.
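The conflict-reuse idea can be sketched abstractly: record infeasible combinations of activation phases as conflict sets, and prune any later subproblem whose phase assignment contains one. A toy illustration, not Marabou's actual API:

```python
class ConflictDB:
    """Toy store of learned conflicts for branch-and-bound verification.
    A conflict is a set of (neuron, phase) literals proven jointly
    infeasible; any later assignment containing a recorded conflict can
    be pruned without re-exploration. All names are illustrative only."""

    def __init__(self):
        self.conflicts = []  # list of frozensets of (neuron, phase) literals

    def record(self, literals):
        """Retain an infeasible phase combination across verification runs."""
        self.conflicts.append(frozenset(literals))

    def is_blocked(self, assignment):
        """True if the partial phase assignment contains a known conflict."""
        a = set(assignment)
        return any(c <= a for c in self.conflicts)

db = ConflictDB()
db.record([("n1", "active"), ("n3", "inactive")])
```

In the paper's setting, the subsumption check and propagation are delegated to a SAT solver rather than the naive subset test shown here.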

Security Considerations for Artificial Intelligence Agents

This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic systems used by millions of users and thousands of enterprises in both controlled and open-world environments. Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. We map principal attack surfaces across tools, connectors, hosting boundaries, and multi-agent coordination, with particular emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. We then assess current defenses as a layered stack: input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions. Finally, we identify standards and research gaps, including adaptive security benchmarks, policy models for delegation and privilege control, and guidance for secure multi-agent system design aligned with NIST risk management principles.

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in small models such expert solutions occupy a negligible fraction of the volume of this distribution, making their discovery reliant on structured optimization methods such as gradient descent. In contrast, in large, well-pretrained models the density of task-experts increases dramatically, so that diverse, task-improving specialists populate a substantial fraction of the neighborhood around the pretrained weights. Motivated by this perspective, we explore a simple, fully parallel post-training method that samples $N$ parameter perturbations at random, selects the top $K$, and ensembles predictions via majority vote. Despite its simplicity, this approach is competitive with standard post-training methods such as PPO, GRPO, and ES for contemporary large-scale models.
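The sample-then-vote recipe described above can be sketched in a few lines. A simplified illustration with hypothetical names, where a plain list of floats stands in for the weight vector:

```python
import random

def sample_top_k(pretrained, score, n=16, k=4, sigma=0.01, seed=0):
    """Sketch of the paper's recipe: sample N random Gaussian perturbations
    of the pretrained weights, keep the top K under a task score, and
    later ensemble their predictions. `score` is any task metric
    (higher is better). All names here are illustrative."""
    rng = random.Random(seed)
    candidates = [[w + rng.gauss(0.0, sigma) for w in pretrained]
                  for _ in range(n)]
    return sorted(candidates, key=score, reverse=True)[:k]

def majority_vote(predictions):
    """Combine the K experts' discrete predictions by majority vote."""
    return max(set(predictions), key=predictions.count)
```

Because every perturbation is scored independently, the whole procedure is embarrassingly parallel, which is the practical appeal the paper highlights.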

Sparking Scientific Creativity via LLM-Driven Interdisciplinary Inspiration

Despite interdisciplinary research leading to larger and longer-term impact, most work remains confined to single-domain academic silos. Recent AI-based approaches to scientific discovery show promise for interdisciplinary research, but many prioritize rapidly designing experiments and solutions, bypassing the exploratory, collaborative reasoning processes that drive creative interdisciplinary breakthroughs. As a result, prior efforts largely prioritize automating scientific discovery rather than augmenting the reasoning processes that underlie scientific disruption. We present Idea-Catalyst, a novel framework that systematically identifies interdisciplinary insights to support creative reasoning in both humans and large language models. Starting from an abstract research goal, Idea-Catalyst is designed to assist the brainstorming stage, explicitly avoiding premature anchoring on specific solutions. The framework embodies key metacognitive features of interdisciplinary reasoning: (a) defining and assessing research goals, (b) awareness of a domain's opportunities and unresolved challenges, and (c) strategic exploration of interdisciplinary ideas based on impact potential. Concretely, Idea-Catalyst decomposes an abstract goal (e.g., improving human-AI collaboration) into core target-domain research questions that guide the analysis of progress and open challenges within that domain. These challenges are reformulated as domain-agnostic conceptual problems, enabling retrieval from external disciplines (e.g., Psychology, Sociology) that address analogous issues. By synthesizing and recontextualizing insights from these domains back into the target domain, Idea-Catalyst ranks source domains by their interdisciplinary potential. Empirically, this targeted integration improves average novelty by 21% and insightfulness by 16%, while remaining grounded in the original research problem.

Portfolio of Solving Strategies in CEGAR-based Object Packing and Scheduling for Sequential 3D Printing

Computing power that decades ago was available only in supercomputers, especially their parallelism, is now available in standard personal-computer CPUs, even in CPUs for mobile phones. We show how to effectively utilize the computing power of a modern multi-core personal-computer CPU to solve the complex combinatorial problem of object arrangement and scheduling for sequential 3D printing. We achieved this by parallelizing the existing CEGAR-SEQ algorithm, which solves sequential object arrangement and scheduling by expressing it as a linear arithmetic formula that is then solved by a technique inspired by counterexample-guided abstraction refinement (CEGAR). The original CEGAR-SEQ algorithm uses an object arrangement strategy that places objects towards the center of the printing plate. We propose alternative object arrangement strategies, such as placing objects towards a corner of the printing plate and scheduling objects according to their height. Our parallelization is done at a high level: we execute the CEGAR-SEQ algorithm in parallel with a portfolio of object arrangement strategies, an algorithm we call Portfolio-CEGAR-SEQ. Our experimental evaluation indicates that Portfolio-CEGAR-SEQ outperforms the original CEGAR-SEQ. When a batch of objects for multiple printing plates is scheduled, Portfolio-CEGAR-SEQ often uses fewer printing plates than CEGAR-SEQ.
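The high-level portfolio parallelization can be sketched with standard concurrency primitives: run every arrangement strategy at once and take the first successful result. A minimal illustration; the strategy signatures are hypothetical, not the paper's code:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_portfolio(strategies, problem):
    """Run a portfolio of solving strategies in parallel and return the
    first successful result, mirroring Portfolio-CEGAR-SEQ's high-level
    parallelization. Each strategy maps a problem instance to a solution
    or None on failure; names here are illustrative only."""
    with ThreadPoolExecutor(max_workers=max(1, len(strategies))) as pool:
        futures = [pool.submit(s, problem) for s in strategies]
        for fut in as_completed(futures):
            result = fut.result()
            if result is not None:
                return result
    return None
```

Because the strategies (center placement, corner placement, height-ordered scheduling) are independent, whichever finishes first with a feasible arrangement wins, which is exactly why the portfolio tends to beat any single strategy.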

RDNet: Region Proportion-Aware Dynamic Adaptive Salient Object Detection Network in Optical Remote Sensing Images

Salient object detection (SOD) in remote sensing images faces significant challenges due to large variations in object sizes, the computational cost of self-attention mechanisms, and the limitations of CNN-based extractors in capturing global context and long-range dependencies. Existing methods that rely on fixed convolution kernels often struggle to adapt to diverse object scales, leading to detail loss or irrelevant feature aggregation. To address these issues, this work aims to enhance robustness to scale variations and achieve precise object localization. We propose the Region Proportion-Aware Dynamic Adaptive Salient Object Detection Network (RDNet), which replaces the CNN backbone with the SwinTransformer for global context modeling and introduces three key modules: (1) the Dynamic Adaptive Detail-aware (DAD) module, which applies varied convolution kernels guided by object region proportions; (2) the Frequency-matching Context Enhancement (FCE) module, which enriches contextual information through wavelet interactions and attention; and (3) the Region Proportion-aware Localization (RPL) module, which employs cross-attention to highlight semantic details and integrates a Proportion Guidance (PG) block to assist the DAD module. By combining these modules, RDNet achieves robustness against scale variations and accurate localization, delivering superior detection performance compared with state-of-the-art methods.

AI Models

dx8152/Qwen-Image-Edit-2511-Style-Transfer


license: apache-2.0
base_model:
  • Qwen/Qwen-Image-Edit-2511
pipeline_tag: image-text-to-image
tags:
  • lora

This model was trained (code-free!) on ModelScope. Thanks to the ModelScope team for providing the training infra: https://www.modelscope.ai/civision/modelTraining/


The trigger word is "style transfer," and an example prompt is "Change the style of Figure 1 to the style of Figure 2." Tutorial links: https://youtu.be/4Z8097iDX1k https://www.bilibili.com/video/BV1HmwWzxEcM/

Welcome to join our Discord channel for discussion, or contact me to collaborate on custom LoRA projects: https://discord.gg/yVAVa43mWk

Download and online run address: https://www.modelscope.ai/models/daniel8152/style-transfer-1

Online run: www.runninghub.ai/post/2031922726943854593?inviteCode=rh-v1331



Author: dx8152

Likes: 12

Downloads: 0

Tags: lora, image-text-to-image, base_model:Qwen/Qwen-Image-Edit-2511, base_model:adapter:Qwen/Qwen-Image-Edit-2511, license:apache-2.0, region:us

huihui-ai/Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated


library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled/blob/main/LICENSE
pipeline_tag: image-text-to-text
base_model:
  • Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
tags:
  • abliterated
  • uncensored
  • Claude
  • reasoning
  • chain-of-thought
  • Dense
huihui-ai/Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated

This is an uncensored version of Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled created with abliteration (see remove-refusals-with-transformers to learn more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM without using TransformerLens.

ollama

Please use Ollama v0.17.7 or later.

You can run huihui_ai/qwen3.5-abliterated:27b-Claude directly:

ollama run huihui_ai/qwen3.5-abliterated:27b-Claude

Usage Warnings

  • Risk of Sensitive or Controversial Outputs: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

  • Not Suitable for All Audiences: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.

  • Legal and Ethical Responsibilities: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

  • Research and Experimental Use: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.

  • Monitoring and Review Recommendations: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.

  • No Default Safety Guarantees: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

Donation

Your donation helps us continue development and improvement; even a cup of coffee's worth helps.
  • bitcoin:
  bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
  • Support our work on Ko-fi!

Author: huihui-ai

Likes: 10

Downloads: 0

Tags: transformers, safetensors, qwen3_5, image-text-to-text, abliterated, uncensored, Claude, reasoning, chain-of-thought, Dense, conversational, base_model:Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, base_model:finetune:Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, license:apache-2.0, endpoints_compatible, region:us

z-lab/Qwen3.5-27B-PARO


library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
base_model:
  • Qwen/Qwen3.5-27B

z-lab/Qwen3.5-27B-PARO

Pairwise Rotation Quantization for Efficient Reasoning LLM Inference

<p> <a href="https://arxiv.org/abs/2511.10645"><img src="https://img.shields.io/badge/arXiv-2511.10645-b31b1b.svg" alt="Paper"></a> <a href="https://paroquant.z-lab.ai"><img src="https://img.shields.io/badge/Blog-ParoQuant-blue" alt="Blog"></a> <a href="https://huggingface.co/collections/z-lab/paroquant"><img src="https://img.shields.io/badge/%F0%9F%A4%97-Models-yellow" alt="Models"></a> <a href="https://pypi.org/project/paroquant/"><img src="https://img.shields.io/pypi/v/paroquant" alt="PyPI"></a> </p>

ParoQuant is a state-of-the-art INT4 quantization method for LLMs. It closes the accuracy gap with FP16 while running at near-AWQ speed. Supports NVIDIA GPUs (vLLM, Transformers) and Apple Silicon (MLX). For more information, see https://github.com/z-lab/paroquant.

z-lab/Qwen3.5-27B-PARO is a 4-bit Qwen/Qwen3.5-27B quantized with ParoQuant. Check out other ParoQuant models from the Hugging Face collection.
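ParoQuant's pairwise-rotation details are in the paper; for intuition, plain symmetric per-group INT4 quantization, the generic baseline that rotation-based schemes refine, looks like this (illustrative sketch, not ParoQuant's actual algorithm):

```python
def quantize_int4(group):
    """Symmetric per-group INT4: map each float in a weight group to an
    integer in [-8, 7] using one shared scale. This is the generic
    baseline that rotation-based schemes such as ParoQuant improve on;
    it is NOT ParoQuant's actual algorithm."""
    scale = max(abs(x) for x in group) / 7.0
    if scale == 0.0:
        scale = 1.0  # avoid division by zero for an all-zero group
    q = [max(-8, min(7, round(x / scale))) for x in group]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [v * scale for v in q]
```

The round-trip error is bounded by half a quantization step per weight, which is exactly the error budget that better-conditioned weight groups (e.g. after rotation) shrink.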

Quick Start

Installation

# NVIDIA GPU (CUDA 12.9)
pip install "paroquant[vllm]"

# NVIDIA GPU (CUDA 13.0)
pip install "paroquant[vllm]" "vllm==0.17.1" \
  --extra-index-url https://wheels.vllm.ai/0.17.1/cu130 \
  --extra-index-url https://download.pytorch.org/whl/cu130

# Apple Silicon
pip install "paroquant[mlx]"

Interactive Chat

python -m paroquant.cli.chat --model z-lab/Qwen3.5-27B-PARO

OpenAI-Compatible API Server

python -m paroquant.cli.serve --model z-lab/Qwen3.5-27B-PARO --port 8000

Add --llm-only if you do not wish to load the VLM components.

Agent with Tool Calling

Start the API server first, then install the agent dependencies and run:

pip install "paroquant[agent]"
python -m paroquant.cli.agent --model z-lab/Qwen3.5-27B-PARO

Tool use (web fetch, filesystem, time) requires Node.js.

Docker (NVIDIA GPU)

[!NOTE] The following commands map the local cache directory to the container in order to persist kernel cache across runs. Remove -v ... to disable this behaviour.

# Interactive chat
docker run --pull=always --rm -it --gpus all --ipc=host \
  -v $HOME/.cache/paroquant:/root/.cache/paroquant \
  ghcr.io/z-lab/paroquant:chat --model z-lab/Qwen3.5-27B-PARO

# API server (port 8000)
docker run --pull=always --rm -it --gpus all --ipc=host -p 8000:8000 \
  -v $HOME/.cache/paroquant:/root/.cache/paroquant \
  ghcr.io/z-lab/paroquant:serve --model z-lab/Qwen3.5-27B-PARO

Citation

@inproceedings{liang2026paroquant,
  title     = {{ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference}},
  author    = {Liang, Yesheng and Chen, Haisheng and Zhang, Zihan and Han, Song and Liu, Zhijian},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}

Author: z-lab

Likes: 5

Downloads: 0

Tags: transformers, safetensors, qwen3_5, image-text-to-text, conversational, arxiv:2511.10645, base_model:Qwen/Qwen3.5-27B, base_model:quantized:Qwen/Qwen3.5-27B, license:apache-2.0, endpoints_compatible, 4-bit, paroquant, region:us

llmfan46/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-heretic-v2


language:
  • en
  • zh
license: apache-2.0
base_model:
  • Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
tags:
  • unsloth
  • qwen
  • qwen3.5
  • reasoning
  • chain-of-thought
  • Dense
  • heretic
  • uncensored
  • decensored
  • abliterated
  • ara
pipeline_tag: text-generation
datasets:
  • nohurry/Opus-4.6-Reasoning-3000x-filtered
  • Jackrong/Qwen3.5-reasoning-700x

⚠️ Important Note

This model scores 0/100 on refusal tests but retains Claude-style deflection on explicit content. Best for general uncensored conversations, coding, and reasoning. Not recommended for explicit NSFW creative writing. For unrestricted NSFW, use my Qwen3.5-27B Heretic v2 or Qwen3.5-27B Heretic v3 instead.

V1 vs V2

| Version | Refusals | KL | Best For |
|---------|----------|-----|----------|
| V1 | 21/100 | 0.0092 | General use, best quality |
| V2 (this) | 0/100 | 0.0635 | Fewer restrictions on controversial topics, minimal quality loss |

⚠️ Thinking Loop Fix Pre-Applied

Credit: DavidAU

If you experience issues, replace the chat_template.jinja in the main folder with the version from the Optional Fixes folder.

Thinking Mode

This is a reasoning model with <think> tokens.

To disable thinking (if your tool supports it):

  • Set enable_thinking=false in your inference settings
  • Or use a no-think instruct template

Note: Some tools (Ollama, vLLM, Transformers) support this parameter directly.
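When a tool exposes neither option, a post-hoc fallback is to strip the think block from the output before displaying it. A minimal sketch (the reasoning still costs tokens; it is just hidden from the user):

```python
import re

def strip_think(text):
    """Remove <think>...</think> blocks from a reasoning model's output.
    A post-hoc fallback for tools without an enable_thinking switch;
    generation still spends tokens on reasoning, this only hides it."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL).strip()
```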

❤️ Support My Work

Creating these models takes significant time, work and compute. If you find them useful consider supporting me:

| Platform | Link | What you get |
|----------|------|--------------|
| ☕ Ko-fi | One-time tip | My eternal gratitude |
| 🎉 Patreon | Monthly support | Priority model requests |

Your support motivates me and goes toward improving my workflow, covering fees for storage and compute, and may even help uncensor bigger models with rented cloud GPUs.


This is a decensored version of Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, made using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method

Abliteration parameters

| Parameter | Value |
| :-------- | :---: |
| start_layer_index | 21 |
| end_layer_index | 43 |
| preserve_good_behavior_weight | 0.4720 |
| steer_bad_behavior_weight | 0.0001 |
| overcorrect_relative_weight | 1.2955 |
| neighbor_count | 1 |

Performance

| Metric | This model | Original model (Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled) |
| :----- | :--------: | :---------------------------: |
| KL divergence | 0.0635 | 0 (by definition) |
| Refusals | 0/100 | 98/100 |

Lower refusals indicate fewer content restrictions, while lower KL divergence indicates better preservation of the original model's capabilities. Higher refusals cause more rejections, objections, pushbacks, lecturing, censorship, softening and deflections, while higher KL divergence degrades coherence, reasoning ability, and overall quality.
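For reference, the KL divergence reported here measures how far the decensored model's output distribution drifts from the original's; on a pair of discrete distributions it reduces to a one-liner (toy sketch; real measurements average over token logits):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = Σ_i p_i · log(p_i / q_i), in nats.
    0 means identical distributions; larger values mean the modified
    model's outputs have drifted further from the original's.
    Toy discrete distributions only, for illustration."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```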

GGUF Version

GGUF quantizations are available at llmfan46/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-heretic-v2-GGUF.


🌟 Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

📢 Release Note: Build Environment Upgrades

  • Fine-tuning Framework: Unsloth 2026.3.3
  • Core Dependencies: Transformers 5.2.0
  • This model fixes the crash in the official model caused by the Jinja template not supporting the "developer" role. (commonly sent by modern coding agents like Claude Code and OpenCode)
  • It does not disable thinking mode by default, allowing the agent to run continuously for over 9 minutes without interruption.
  • Compared to the original model, autonomy and stability are significantly improved.


💡 Model Introduction

Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled is a highly capable reasoning model fine-tuned on top of the powerful Qwen3.5 architecture. The model's core directive is to leverage state-of-the-art Chain-of-Thought (CoT) distillation primarily sourced from Claude-4.6 Opus interactions.

Through Supervised Fine-Tuning (SFT) focusing specifically on structured reasoning logic, this model excels in breaking down complex user problems, planning step-by-step methodologies within strictly formatted <think> tags, and ultimately delivering precise, nuanced solutions.

🧠 Example of Learned Reasoning Scaffold

The model includes targeted optimizations addressing Qwen3.5’s tendency toward excessive transitional or repetitive reasoning on simple queries. Through deep distillation and structural imitation of Claude-4.6-Opus reasoning chains, the model adopts a more efficient structured thinking pattern:
“Let me analyze this request carefully: 1..2..3...”.
This streamlined reasoning paradigm significantly reduces redundant cognitive loops while preserving deep analytical capacity, resulting in substantially improved inference efficiency.

Let me analyze this request carefully:

1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
  …

🗺️ Training Pipeline Overview

Base Model (Qwen3.5-27B)
 │
 ▼
Supervised Fine-Tuning (SFT) + LoRA
 │
 ▼
Final Model (Claude-4.6-Opus-Reasoning-Distilled,text-only)

📋 Stage Details

🔥 Community-tested advantages (benchmark tests by user @sudoingX on a single RTX 3090):

Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled shows significant advantages in coding-agent environments such as Claude Code and OpenCode:

  • Native support for the “developer” role, requiring no Jinja template patches or ChatML workarounds.
  • Thinking mode fully preserved (logs confirm thinking=1), not silently disabled, maintaining the complete chain-of-thought reasoning process.
  • Greatly improved autonomy and stability — capable of running continuously for over 9 minutes autonomously (with zero human intervention). It actively waits for tool responses, reads outputs, self-corrects errors, and can even automatically generate a README, whereas the base model often stalls or freezes mid-execution.

Hardware usage remains unchanged:

  • About 16.5 GB VRAM with Q4_K_M quantization
  • 29–35 tok/s generation speed
  • Full 262K context with no compromises
These improvements come from successfully distilling the structured reasoning style of Claude 4.6 Opus, allowing Qwopus to be truly plug-and-play in modern local coding agents and deliver an experience close to Opus in smoothness and usability.

Thanks to the community for the in-depth testing and feedback!

🔹 Supervised Fine-Tuning (SFT)

  • Objective: Inject high-density reasoning logic and establish a strict problem-solving format in which an internal thinking stage precedes the final response.
  • Methodology: We utilized Unsloth for highly efficient memory and compute optimization. A critical component of this stage is the train_on_responses_only strategy, masking instructions so the loss is purely calculated over the generation of the <think> sequences and the subsequent solutions.
  • Format Enforcement: All training samples were systematically normalized so the model strictly abides by the structure <think> {internal reasoning} </think>\n {final answer}.
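
A minimal sketch of that normalization and format check (helper names are hypothetical; the actual data pipeline is not published):

```python
import re

# Enforced sample structure: one <think> block, a newline, then the answer.
_PATTERN = re.compile(r"^<think>\s*(.*?)\s*</think>\n(.+)$", re.DOTALL)

def normalize_sample(reasoning: str, answer: str) -> str:
    """Render a training sample in the enforced <think> format."""
    return f"<think> {reasoning.strip()} </think>\n {answer.strip()}"

def is_well_formed(sample: str) -> bool:
    """Check that a sample strictly follows the enforced structure."""
    return _PATTERN.match(sample) is not None

sample = normalize_sample("Identify the objective, then plan.", "The answer is 42.")
assert is_well_formed(sample)
assert not is_well_formed("No thinking block here.")
```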

📚 All Datasets Used

The dataset consists of high-quality, filtered reasoning distillation data:

| Dataset Name | Description / Purpose |
|--------------|-----------------------|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Provides comprehensive Claude 4.6 Opus reasoning trajectories. |
| TeichAI/claude-4.5-opus-high-reasoning-250x | Injects high-intensity, structured reasoning instances. |
| Jackrong/Qwen3.5-reasoning-700x | Additional curated reasoning samples designed to strengthen structured step-by-step problem solving and improve reasoning diversity. |

🌟 Core Skills & Capabilities

  1. Modular & Structured Thinking: Inheriting traits from Opus-level reasoning, the model confidently parses the prompt and lays out a sequential, outlined plan in its <think> block, rather than falling into exploratory trial-and-error self-doubt.

⚠️ Limitations & Intended Use

  • Hallucination Risk: While reasoning is strong, the model remains an autoregressive LLM; facts asserted during the thinking sequence may occasionally be hallucinated, especially when the task involves verifying real-world events.
  • Intended Scenario: Best suited for offline analytical tasks, coding, math, and heavy logic-dependent prompting where the user needs to transparently follow the AI's internal logic.
  • Preview Version Notice: Because this model is relatively new and intentionally lightweight, the surrounding ecosystem — including inference templates, fine-tuning pipelines, routing configurations, and tooling integrations — may not yet be fully mature or standardized. As a result, users may encounter occasional bugs, compatibility inconsistencies, or integration edge cases. The current release should be considered a preview build while the broader architectural stack and supporting utilities continue to stabilize and improve.

🙏 Acknowledgements

Significant thanks to the Unsloth AI team for making rapid fine-tuning of MoE and large LLM models accessible. We also acknowledge the Qwen team and the open-source community developers producing exceptional distilled datasets (nohurry and TeichAI).

📖 Citation

If you use this model in your research or projects, please cite:

@misc{jackrong_qwen35_opus_distilled,
  title        = {Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled}}
}

Author: llmfan46

Likes: 3

Downloads: 0

Tags: safetensors, qwen3_5, unsloth, qwen, qwen3.5, reasoning, chain-of-thought, Dense, heretic, uncensored, decensored, abliterated, ara, text-generation, conversational, en, zh, dataset:nohurry/Opus-4.6-Reasoning-3000x-filtered, dataset:Jackrong/Qwen3.5-reasoning-700x, base_model:Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, base_model:finetune:Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, license:apache-2.0, region:us

neurlang/en-whipstr-base-48khz-libritts-r


license: gpl-2.0
language:
  • en
pipeline_tag: automatic-speech-recognition
tags:
  • whipstr
  • stt
  • asr

Neurlang Whipstr STT (ASR)

A deep learning automatic speech recognition (ASR) system for transcribing speech audio into text using transformer-based sequence-to-sequence models.

  • Language: English
  • Model Github: neurlang/whipstr https://github.com/neurlang/whipstr
  • Model Dataset: LibriTTS-R https://www.openslr.org/141/
  • Model-Native Sample Rates: 8000 Hz, 16000 Hz, 24000 Hz, 32000 Hz, 48000 Hz
  • Degraded-Performance Sample Rates: 11025 Hz, 22050 Hz, 44100 Hz
  • License: GPL v2
  • Release: 2026-03-15

Author: neurlang

Likes: 2

Downloads: 0

Tags: whipstr, stt, asr, automatic-speech-recognition, en, license:gpl-2.0, region:us

0xvoid0000/zira-researcher


license: apache-2.0
base_model: Qwen/Qwen3.5-4B
tags:
  • qwen3_5
  • research
  • reasoning
  • fine-tuned
  • zira
  • 0xvoid
  • self-correction
  • long-context
  • trl
language:
  • en
pipeline_tag: text-generation

ZiRA-Researcher

<p align="center"> <img src="assets/zira_logo.png" alt="ZiRA-Researcher Logo" width="380"/> </p>

ZiRA-Researcher is a fine-tuned version of Qwen3.5-4B, developed under the 0xvoid project. It's built specifically for deep research tasks, multi-step reasoning, and complex question answering, with a particular emphasis on catching and correcting its own mistakes mid-generation.

If you've ever used a model that confidently states something wrong and just... keeps going, that's exactly what ZiRA-Researcher is trained not to do.


What's Different Here

The base Qwen3.5-4B is already a strong reasoning model. ZiRA-Researcher takes that foundation and sharpens it toward a specific use case: research-grade responses where accuracy matters more than speed and self-doubt is a feature, not a bug.

Three things define this fine-tune:

1. Error self-correction
ZiRA doesn't just think before it answers; it actively revisits its own reasoning chain. During training, the model was exposed to examples where mid-chain corrections were necessary and rewarded. In practice, you'll see it catch faulty assumptions and revise them before committing to a final answer, rather than rationalizing bad premises all the way to a wrong conclusion.

2. Research-oriented instruction following
The model is tuned on datasets from state-of-the-art frontier models: responses that demonstrate what good research synthesis actually looks like. Structured arguments, source-aware hedging, citing uncertainty where it exists, and building conclusions incrementally rather than pattern-matching to the nearest plausible answer.

3. Long-horizon coherence
Complex research questions often require holding a lot of context at once. The Qwen3.5 architecture natively supports up to 262K tokens, and ZiRA-Researcher is fine-tuned to actually use that window productively, staying coherent and consistent across long reasoning chains without drifting.


Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-4B |
| Parameters | ~4B |
| Architecture | Gated Delta Network + Sparse MoE hybrid |
| Context Length | 262,144 tokens (native) |
| Training Method | Supervised Fine-Tuning (SFT) via TRL |
| Thinking Mode | Enabled by default (<think>...</think>) |
| Developer | 0xvoid |


Training Metrics

<p align="center"> <img src="assets/train_loss.png" alt="Training Loss" width="64%"/><br/> <em>Training Loss</em> </p> <p align="center"> <img src="assets/mean_token_accuracy.png" alt="Mean Token Accuracy" width="64%"/><br/> <em>Mean Token Accuracy</em> </p>

The hybrid architecture Qwen3.5 uses — Gated Delta Networks layered with sparse Mixture-of-Experts — gives this model a surprisingly good throughput-to-quality ratio for its size. It punches above 4B in most reasoning benchmarks, which makes it a practical choice if you're running inference locally or on a budget.


Training Data

ZiRA-Researcher was trained on curated, high-quality datasets sourced from state-of-the-art model outputs, specifically selected to reflect:

  • Deep research synthesis and academic-style reasoning
  • Multi-step logical deduction with explicit intermediate steps
  • Complex Q&A pairs that require cross-referencing multiple sub-claims
  • Instances of error detection and self-correction within the chain-of-thought

The goal was to teach the model what good thinking looks like, using examples generated by frontier models as the standard.


Quickstart

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "0xvoid0000/zira-researcher"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "What are the key limitations of transformer-based architectures for long-horizon reasoning tasks, and how do recent hybrid approaches attempt to address them?"
    }
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=1.0,
    top_p=0.95,
    top_k=20,
    do_sample=True,
    repetition_penalty=1.5,  # transformers' generate() has no presence_penalty kwarg; repetition_penalty is the closest standard equivalent
)

print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))

By default, the model will produce a <think>...</think> block before the final response. That's intentional — it's where the self-correction happens. If you want direct output without the reasoning trace, you can disable it:

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # skips the <think> block
)

Recommended Sampling Parameters

These are the settings that tend to work well for research-style queries:

| Mode | temperature | top_p | top_k | presence_penalty |
|---|---|---|---|---|
| Thinking (general research) | 1.0 | 0.95 | 20 | 1.5 |
| Thinking (precise/technical) | 0.6 | 0.95 | 20 | 0.0 |
| Direct (no thinking) | 0.7 | 0.8 | 20 | 1.5 |

For particularly involved questions (graduate-level exam problems, multi-document synthesis, or long chains of logical deduction), giving the model room to breathe helps. Set max_new_tokens to at least 8192, and don't be surprised if it uses most of it.


What It's Good At

  • Research synthesis — combining information from multiple sub-questions into a coherent, well-structured answer
  • Hypothesis-driven reasoning — forming a claim, stress-testing it, and revising if the logic doesn't hold
  • Error-aware generation — catching faulty premises or arithmetic mistakes within the thinking chain before they propagate
  • Long-context tasks — sustained coherence across documents, conversation history, or multi-stage problems
  • Technical deep dives — STEM, CS theory, economics, philosophy of science, and adjacent domains

What It's Not

ZiRA-Researcher is not a general-purpose chat assistant. It's tuned for deliberate, thoughtful responses to complex questions; if you're looking for something snappy and conversational, this isn't it. The thinking traces can get long. That's by design.

It also doesn't have real-time web access or retrieval built in. For RAG setups, treat it as the reasoning engine and pipe the retrieved context into the prompt.
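
As a sketch of that pattern, retrieved passages can simply be prepended to the user turn before calling the model (the prompt wording and helper name here are illustrative, not part of the model):

```python
def build_rag_messages(question: str, retrieved: list[str]) -> list[dict]:
    """Assemble chat messages that inject retrieved passages into the prompt.

    The model has no built-in retrieval, so context must arrive as plain text.
    """
    context = "\n\n".join(f"[{i + 1}] {passage}" for i, passage in enumerate(retrieved))
    content = (
        "Use only the passages below to answer. Cite passage numbers, "
        "and state explicitly when the passages are insufficient.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return [{"role": "user", "content": content}]

messages = build_rag_messages(
    "When was the corpus digitized?",
    ["The corpus was digitized in 2019.", "Scanning used structured light."],
)
assert messages[0]["role"] == "user"
assert "[2] Scanning used structured light." in messages[0]["content"]
```

The resulting `messages` list can then be passed to `tokenizer.apply_chat_template` exactly as in the Quickstart above.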


Limitations

Like any fine-tune at 4B parameters, ZiRA-Researcher has a ceiling. On highly specialized domains with narrow technical vocabulary, it can still confabulate, though the self-correction mechanism does catch a meaningful fraction of these cases. On genuinely ambiguous or underspecified questions, it tends to lay out the uncertainty rather than pick an arbitrary answer, which is usually the right call but can feel unsatisfying if you just want a direct response.

The model inherits Qwen3.5-4B's 201-language support at the architecture level, but ZiRA-Researcher's fine-tuning was primarily English-focused. Non-English research queries will work but may not reflect the same quality improvements.


Acknowledgements

Built on top of Qwen3.5-4B by the Qwen Team at Alibaba. Fine-tuned using TRL. Part of the ZiRA model family developed under the 0xvoid project.


ZiRA-Researcher is part of the ongoing 0xvoid model series. More variants incoming.

Author: 0xvoid0000

Likes: 2

Downloads: 0

Tags: safetensors, qwen3_5_text, qwen3_5, research, reasoning, fine-tuned, zira, 0xvoid, self-correction, long-context, trl, text-generation, conversational, en, base_model:Qwen/Qwen3.5-4B, base_model:finetune:Qwen/Qwen3.5-4B, license:apache-2.0, region:us

boatbomber/NisabaRelief


license: apache-2.0
pipeline_tag: image-to-image
base_model:
  • black-forest-labs/FLUX.2-klein-base-4B
base_model_relation: finetune
datasets:
  • boatbomber/CuneiformPhotosMSII
tags:
  • image-to-image
  • cuneiform
  • geometry
  • curvature
  • multi-scale-integral-invariant
  • msii
  • Flux

<div align="center"> <h1 align="center"> NisabaRelief </h1> <img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/NisabaRelief-Logo.png" width="600"/> </div>

NisabaRelief

NisabaRelief is a rectified flow transformer that converts ordinary photographs of cuneiform clay tablets into Multi-Scale Integral Invariant (MSII) curvature visualizations, without requiring 3D scanning hardware. Traditional MSII computation requires a high-resolution 3D scanner and GigaMesh postprocessing, averaging approximately 68 minutes per tablet. NisabaRelief processes a photograph in approximately 7 seconds.

Photographic images introduce a variety of noise sources: lighting direction, clay color, surface sheen, photography conditions, and surface staining. Any of these can cause wedge impressions to appear as shadows or shadows to appear as wedge impressions. MSII filtering discards this photometric variation, retaining only the geometric signal pressed into the clay. See What is MSII? for full technical details.

Built by fine-tuning Flux.2 Klein Base 4B on paired photo/MSII data generated from 3D scans in the HeiCuBeDa corpus. Training data is made available here: CuneiformPhotosMSII.

Named for Nisaba, the early Sumerian goddess of writing and scribes, NisabaRelief will serve as the preprocessing backbone of NabuOCR V2, a cuneiform OCR system currently in development.

Showcase Video:





Example Output

<table> <thead> <tr> <th align="center" width="25%">Input</th> <th align="center" width="25%">Output</th> <th align="center" width="25%">Ground Truth</th> <th align="center" width="25%">Difference</th> </tr> </thead> <tbody> <tr> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_input_0.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_output_0.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_truth_0.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_diff_0.png" width="200"/></td> </tr> <tr> <td colspan="4" align="center"><b>Dice: 0.9652</b> &nbsp;·&nbsp; RMSE: 0.0775 &nbsp;·&nbsp; MS-SSIM: 0.9295 &nbsp;·&nbsp; PSNR: 22.22 dB &nbsp;·&nbsp; PSNR-HVS-M: 17.77 dB &nbsp;·&nbsp; SRE: 58.34 dB</td> </tr> <tr> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_input_1.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_output_1.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_truth_1.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_diff_1.png" width="200"/></td> </tr> <tr> <td colspan="4" align="center"><b>Dice: 0.9555</b> &nbsp;·&nbsp; RMSE: 0.0788 &nbsp;·&nbsp; MS-SSIM: 0.9219 &nbsp;·&nbsp; PSNR: 22.07 dB &nbsp;·&nbsp; PSNR-HVS-M: 17.80 dB &nbsp;·&nbsp; SRE: 57.89 dB</td> </tr> <tr> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_input_2.png" width="200"/></td> <td align="center"><img 
src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_output_2.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_truth_2.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_diff_2.png" width="200"/></td> </tr> <tr> <td colspan="4" align="center"><b>Dice: 0.9630</b> &nbsp;·&nbsp; RMSE: 0.1108 &nbsp;·&nbsp; MS-SSIM: 0.8513 &nbsp;·&nbsp; PSNR: 19.11 dB &nbsp;·&nbsp; PSNR-HVS-M: 14.65 dB &nbsp;·&nbsp; SRE: 59.60 dB</td> </tr> <tr> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_input_3.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_output_3.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_truth_3.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_diff_3.png" width="200"/></td> </tr> <tr> <td colspan="4" align="center"><b>Dice: 0.9713</b> &nbsp;·&nbsp; RMSE: 0.1035 &nbsp;·&nbsp; MS-SSIM: 0.8748 &nbsp;·&nbsp; PSNR: 19.70 dB &nbsp;·&nbsp; PSNR-HVS-M: 15.33 dB &nbsp;·&nbsp; SRE: 59.41 dB</td> </tr> <tr> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_input_4.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_output_4.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_truth_4.png" width="200"/></td> <td align="center"><img src="https://huggingface.co/boatbomber/NisabaRelief/resolve/main/assets/example_diff_4.png" width="200"/></td> </tr> <tr> <td colspan="4" align="center"><b>Dice: 0.9564</b> 
&nbsp;·&nbsp; RMSE: 0.1054 &nbsp;·&nbsp; MS-SSIM: 0.9325 &nbsp;·&nbsp; PSNR: 19.55 dB &nbsp;·&nbsp; PSNR-HVS-M: 15.18 dB &nbsp;·&nbsp; SRE: 57.36 dB</td> </tr> </tbody> </table>

Quickstart

Installation

Prerequisites:

  • Python >= 3.10
  • PyTorch with CUDA support. See https://pytorch.org/get-started/locally/.
# Install PyTorch (CUDA 12.8 example)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

# Windows only: install Triton (included automatically on Linux)
pip install triton-windows

Install:

pip install nisaba-relief

Usage

from nisaba_relief import NisabaRelief

model = NisabaRelief()  # downloads weights from HF Hub automatically if needed
result = model.process("tablet.jpg")
result.save("tablet_msii.png")

Constructor parameters:

| Parameter | Default | Description |
|---|---|---|
| device | "cuda" if available | Device for inference |
| num_steps | 2 | Denoising steps |
| weights_dir | None | Local weights directory; if None, downloads from HF Hub or uses HF cache. Expected dir contents: model.safetensors, ae.safetensors, prompt_embedding.safetensors |
| batch_size | None | Batch size for processing tiles during inference. None (default) auto-selects the largest batch that fits in available VRAM. Set an explicit integer to override. Higher values are faster, but see note below. |
| seed | None | Optional random seed for reproducible noise generation; if None, randomized |
| compile | True | Use torch.compile for faster repeated inference. Requires Triton. Set to False if Triton is not installed or for one-off runs. |

Reproducibility note: Results are pixel-exact across repeated runs with the same batch_size and seed. However, changing batch_size between runs (including letting None auto-select a different value as available VRAM changes) will produce outputs that differ by up to ~1-2 pixel values (mean < 0.25) due to GPU floating-point non-determinism: CUDA selects different kernel implementations for different matrix shapes, which changes the floating-point accumulation order in the transformer attention and linear layers. The visual difference is imperceptible. If exact cross-run reproducibility is required, set a constant batch_size.

process() parameters:

| Parameter | Default | Description |
|---|---|---|
| image | required | File path (str/Path) or PIL Image |
| show_pbar | None | Progress bar visibility. None = auto (shows when >= 2 batches); True/False = always show/hide |

Returns: Grayscale PIL.Image.Image containing the MSII visualization.

Input requirements:

  • Any PIL-readable format (PNG, JPG, WEBP, ...)
  • Minimum 64 px on the short side; maximum aspect ratio 8:1

Large image support:

The model's native tile size is 1024 px. For images where either side exceeds 1024 px, the model automatically applies a sliding-window tiling pass. Tiles are blended with raised-cosine overlap weights to avoid seams. Each tile is also conditioned on a 128 px thumbnail of the full image with a red rectangle marking the tile's position, so the model retains global context while processing local detail.

There is no practical upper limit on input resolution, though the model may perform unexpectedly if the 1024 px tile is only a small fraction of the total image area.
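
The blending step can be sketched in 1D (the exact window shape and border handling in NisabaRelief may differ; this only illustrates seam-free raised-cosine overlap):

```python
import numpy as np

def raised_cosine_weights(tile: int, overlap: int) -> np.ndarray:
    """1D blend profile: raised-cosine ramps at both ends of a tile."""
    w = np.ones(tile)
    ramp = 0.5 - 0.5 * np.cos(np.pi * np.arange(overlap) / overlap)  # 0 -> ~1
    w[:overlap] = ramp
    w[-overlap:] = ramp[::-1]
    return w

def blend_row(tiles, positions, width, tile_size, overlap):
    """Blend overlapping 1D 'tiles' into one row, normalizing by total weight."""
    acc = np.zeros(width)
    norm = np.zeros(width)
    w = raised_cosine_weights(tile_size, overlap)
    for t, p in zip(tiles, positions):
        acc[p:p + tile_size] += t * w
        norm[p:p + tile_size] += w
    return acc / np.maximum(norm, 1e-8)

# Two constant tiles blended across their overlap reconstruct the constant
# exactly; only the outermost pixels (zero weight; a real implementation
# would not taper at image borders) differ.
tile = np.full(8, 3.0)
out = blend_row([tile, tile], [0, 4], 12, 8, 4)
assert np.allclose(out[1:11], 3.0)
```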


Hardware Requirements

While CPU inference is technically supported, it is too slow for practical use. A GPU with at least 9 GB of VRAM is required; 12 GB+ is recommended for better batching.

The 9 GB figure is substantially lower than the ~18 GB a standard FLUX.2-klein-base-4B deployment would require because the Qwen3-4B text encoder is never loaded at runtime. The conditioning prompt is pre-computed once and shipped as a 7.8 MB embedding file alongside the model weights.


Performance

Traditional pipelines require a high-resolution 3D scanner and GigaMesh postprocessing: across the HeiCuBeDa corpus, this averages approximately 68 minutes per tablet, totalling over 2,200 hours for the full collection. NisabaRelief processes a tablet photograph in approximately 7 seconds, roughly 600x faster, with no scanning equipment required.

On a 1064x2048px photo, an RTX 3090 performs as follows:

| Run | Time |
|---|---|
| compile warmup | 11.61s |
| 1 | 7.05s |
| 2 | 7.07s |
| 3 | 7.09s |
| Mean | 7.07 ± 0.02s |


What is MSII?

Multi-Scale Integral Invariant (MSII) filtering is a geometry-processing algorithm that computes a robust curvature measure at every point on a 3D surface mesh. At each vertex, a sphere of radius r is centered on the surface and the algorithm measures how much of the sphere's volume falls below the surface (the "interior" volume). On a perfectly flat surface the ratio is exactly one half. Concave regions (such as the channel cut by a wedge impression) admit more of the sphere below the surface, pushing the ratio above 0.5. Convex regions such as ridges or the rounded back of a tablet expose less interior volume, pulling the ratio below 0.5. The signed difference from the flat baseline maps directly to the sign and magnitude of mean curvature at that point.
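
True MSII runs on a 3D triangle mesh via GigaMesh; the volume-ratio idea itself can be illustrated with a simplified discrete heightfield sketch (lattice sampling and the half-weight surface convention are simplifying assumptions):

```python
import numpy as np

def interior_ratio(height: np.ndarray, y: int, x: int, r: int) -> float:
    """Heightfield analogue of the MSII volume ratio at one surface point.

    Sample lattice points in a ball of radius r centered on the surface at
    (y, x) and measure the fraction lying below the surrounding surface.
    Points exactly on the surface count half, so a flat surface scores 0.5.
    """
    c = height[y, x]
    total = 0
    below = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            for dz in range(-r, r + 1):
                if dy * dy + dx * dx + dz * dz > r * r:
                    continue  # outside the ball
                total += 1
                z = c + dz
                h = height[y + dy, x + dx]
                if z < h:
                    below += 1.0
                elif z == h:
                    below += 0.5
    return below / total

flat = np.zeros((9, 9))
pit = np.zeros((9, 9))
pit[4, 4] = -1.0  # a concave impression, like a wedge stroke

assert abs(interior_ratio(flat, 4, 4, 3) - 0.5) < 1e-12  # flat baseline
assert interior_ratio(pit, 4, 4, 3) > 0.5                # concave -> above 0.5
```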

The multi-scale component repeats this computation at several sphere radii simultaneously. Small radii resolve fine wedge tips and hairline details; large radii capture broader curvature trends such as the tablet's overall convexity. The per-vertex measurements across all radii form a compact feature vector, and the final scalar output, conventionally displayed as a grayscale image, is the maximum component of that feature vector, capturing the strongest curvature response across all scales in a single value per pixel.

By convention the scalar is displayed with its sign inverted relative to the mean curvature: concave regions (ratio > 0.5) map to darker pixel values and convex regions (ratio < 0.5) to lighter ones. This places the flat-surface baseline at mid-gray and renders wedge channels as dark strokes against a bright background, similar to ink on paper.

Because the result depends only on the 3D shape of the surface rather than on lighting, clay color, or photograph angle, wedge impressions appear as consistent dark strokes against a bright background. This makes the surface structure considerably more legible to machine-vision OCR systems than raw photographs.


Intended Use & Limitations

Generating an MSII visualization of a tablet requires a high-resolution laser scanner and substantial per-vertex computation. The vast majority of cuneiform tablets do not have a 3D scan available, and the computational cost is difficult to scale across large corpora.

To reduce this barrier and increase the availability of readable images, this model is trained to predict the MSII visualization directly from photographs.

Intended use:

  • Preprocessing step for cuneiform OCR (specifically NabuOCR V2)
  • Visualizing cuneiform tablet geometry for research and digital humanities

Limitations:

  • Trained exclusively using HeiCuBeDa 3D-scan data; performance on tablet types or scribal traditions not well-represented in that corpus is unknown
  • Outputs are MSII approximations inferred from 2D photographs, not computed from true 3D geometry. They are suitable for OCR preprocessing but are not a substitute for physical scanning
  • Not a general-purpose MSII model; behavior on non-cuneiform inputs is undefined and out of distribution
  • Designed for photographs following CDLI photography guidelines: high-resolution fatcross layout on a black background. The model may underperform on low-resolution or visually cluttered inputs such as older black-and-white excavation photographs where the background blends into the tablet

Evaluation

The model was evaluated on 704 held-out validation pairs, all tablets whose geometry was never seen during training (see Training Data). Each validation image was processed through the model and the output compared against the ground-truth MSII visualization computed from the 3D scan. Ran with seed=42 and batch_size=4.

| Metric | Value |
|------------|------------------|
| Dice | 0.9639 ± 0.0138 |
| RMSE | 0.0877 ± 0.0208 |
| MS-SSIM | 0.9026 ± 0.0308 |
| PSNR | 21.36 ± 1.91 dB |
| PSNR-HVS-M | 16.98 ± 1.89 dB |
| SRE | 59.57 ± 1.92 dB |

Dice (Binarized Dice Coefficient) thresholds both images to isolate wedge stroke regions, then measures overlap between predicted and ground-truth strokes on a 0-1 scale. This is the most task-relevant metric, as it directly measures whether the model correctly localizes wedge impressions for downstream OCR.
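
A minimal sketch of the binarized Dice computation (the threshold value and the dark-stroke convention are assumptions based on the description above):

```python
import numpy as np

def binarized_dice(pred: np.ndarray, truth: np.ndarray, thresh: float = 0.5) -> float:
    """Dice overlap between thresholded wedge-stroke masks.

    Pixels below `thresh` count as strokes (dark wedges on a bright field).
    """
    p = pred < thresh
    t = truth < thresh
    inter = np.logical_and(p, t).sum()
    denom = p.sum() + t.sum()
    return 1.0 if denom == 0 else 2.0 * inter / denom

a = np.array([[0.1, 0.9], [0.2, 0.8]])
assert binarized_dice(a, a) == 1.0        # perfect overlap
assert binarized_dice(a, 1.0 - a) == 0.0  # disjoint masks
```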

RMSE (Root Mean Squared Error) measures average pixel-level reconstruction error; lower is better.

MS-SSIM (Multi-Scale Structural Similarity Index) measures perceptual image similarity by comparing luminance, contrast, and local structure at multiple spatial scales simultaneously. Coarser scales capture global shape agreement; finer scales capture edge and texture detail. Scores range from 0 to 1, where 1 is a perfect match; higher is better.

PSNR (Peak Signal-to-Noise Ratio) expresses reconstruction fidelity in decibels relative to the maximum pixel value; higher is better.

PSNR-HVS-M (Peak Signal-to-Noise Ratio - Human Visual System and Masking) measures reconstruction fidelity in decibels relative to the maximum pixel value while taking into account the Contrast Sensitivity Function (CSF) and between-coefficient contrast masking of DCT basis functions; higher is better.

SRE (Signal-to-Reconstruction Error) ratio measures reconstruction fidelity in decibels based on signal energy vs. error energy; higher is better.

Step Sweep

A sweep of step counts was run on a subset of 175 validation samples and found that 2 steps is ideal for this model, adding one corrective step over the already solid single-step result. The rectified flow field is extremely straight (straightness_ratio=0.9989, path_length_ratio=1.0011, velocity_std=0.1565). For near-perfectly straight ODE trajectories, a single Euler step is theoretically near-exact, and each additional step accumulates small model prediction errors faster than it reduces discretization error. Where throughput is the primary concern, one step is acceptable. Ran with seed=42 and batch_size=4.

| Metric | Steps=1 | Steps=2 | Steps=4 | Steps=8 |
|------------|------------------|------------------|------------------|------------------|
| Dice | 0.9582 ± 0.0153 | 0.9634 ± 0.0139 | 0.9612 ± 0.0142 | 0.9580 ± 0.0148 |
| RMSE | 0.0909 ± 0.0209 | 0.0859 ± 0.0212 | 0.0900 ± 0.0203 | 0.0949 ± 0.0197 |
| MS-SSIM | 0.8987 ± 0.0326 | 0.9081 ± 0.0310 | 0.9039 ± 0.0314 | 0.8959 ± 0.0326 |
| PSNR | 21.03 ± 1.83 dB | 21.56 ± 1.97 dB | 21.11 ± 1.84 dB | 20.63 ± 1.72 dB |
| PSNR-HVS-M | 16.65 ± 1.80 dB | 17.19 ± 1.96 dB | 16.70 ± 1.83 dB | 16.18 ± 1.70 dB |
| SRE | 58.81 ± 1.81 dB | 59.07 ± 1.87 dB | 58.85 ± 1.87 dB | 58.61 ± 1.86 dB |
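
The straight-trajectory argument can be checked numerically: for a perfectly straight (rectified) velocity field, a single Euler step already lands exactly on the endpoint, so additional steps cannot reduce discretization error and can only accumulate model prediction error. A toy sketch:

```python
import numpy as np

def euler_integrate(x0, velocity, n_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity(x, i * dt)
    return x

rng = np.random.default_rng(0)
x0, x1 = rng.normal(size=4), rng.normal(size=4)

# A perfectly straight flow: constant velocity pointing at the endpoint.
straight = lambda x, t: x1 - x0

one = euler_integrate(x0, straight, 1)
eight = euler_integrate(x0, straight, 8)
assert np.allclose(one, x1) and np.allclose(eight, x1)  # one step is exact
```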


Training Data

Training uses the CuneiformPhotosMSII dataset: 13,928 photo/MSII image pairs generated from 1,741 tablets sourced from HeiCuBeDa (the Heidelberg Cuneiform Benchmark Dataset), a professional research collection of 3D-scanned clay tablets. Each tablet was rendered multiple times in Blender at up to 4096 px, producing synthetic photographs alongside their corresponding MSII curvature visualizations.

Each render variant randomizes which faces of the tablet are shown, camera focal length (80-150 mm), tablet rotation (±5° Euler XYZ), lighting position/color/intensity, and background (fabric, grunge, stone, or none). This diversity encourages the model to generalize across realistic shooting conditions rather than overfitting to a specific lighting or composition style.

The dataset was split tablet-wise: 13,224 pairs (~95% of tablets) for training and 704 pairs (~5% of tablets) held out for validation. Because the split is by tablet identity, the model never sees a validation tablet's geometry during training.
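
A sketch of a tablet-wise split (field names and the shuffling scheme are assumptions; only the grouping-by-tablet property matters):

```python
import random
from collections import defaultdict

def split_by_tablet(pairs, val_frac=0.05, seed=0):
    """Split photo/MSII pairs so no tablet appears in both train and val."""
    by_tablet = defaultdict(list)
    for pair in pairs:
        by_tablet[pair["tablet_id"]].append(pair)
    tablets = sorted(by_tablet)
    random.Random(seed).shuffle(tablets)
    n_val = max(1, round(len(tablets) * val_frac))
    held_out = set(tablets[:n_val])
    train = [p for t in tablets[n_val:] for p in by_tablet[t]]
    val = [p for t in held_out for p in by_tablet[t]]
    return train, val

# Toy corpus: 40 tablets, 8 renders each (hypothetical field names).
pairs = [{"tablet_id": t, "render": r} for t in range(40) for r in range(8)]
train, val = split_by_tablet(pairs)
train_ids = {p["tablet_id"] for p in train}
val_ids = {p["tablet_id"] for p in val}
assert train_ids.isdisjoint(val_ids)       # no geometry leakage
assert len(train) + len(val) == len(pairs)
```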


Training Pipeline

Training proceeded in three sequential stages: Pretrain, Train, and Rectify. Each stage builds directly on the weights from the previous one.

Key Technical Decision: Text-Encoder-Free Training

All three stages skip the Qwen3-4B text encoder entirely. Text embeddings are pre-computed once and cached to disk, reducing VRAM consumption from ~18 GB to ~9 GB without any loss in conditioning fidelity.

Key Technical Decision: VAE BatchNorm Domain Calibration

The FLUX.2 VAE contains a BatchNorm layer whose running statistics (running_mean and running_var across 128 channels: 32 latent channels × 2×2 patch size) were originally computed on diverse natural images. Applying this encoder to cuneiform tablets and MSII renderings introduces a latent-space distribution shift that manifests as screen-door dithering artifacts in decoded outputs.

To correct this, the BatchNorm statistics were recalibrated on the target domain before training began. 3,000 CDLI cuneiform tablet photographs and 2,000 synthetic MSII visualizations (5,000 images total) were encoded through the frozen VAE encoder; running mean and variance were accumulated across 19,301,093 spatial samples using float64 accumulators for numerical stability. Images from both domains were interleaved to ensure balanced sampling. The calibrated statistics are baked directly into the ae.safetensors weights shipped with this model.
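
The accumulation can be sketched as a streaming per-channel mean/variance pass with float64 sums (a sketch of the technique; the actual calibration code is not published):

```python
import numpy as np

class RunningStats:
    """Per-channel running mean/variance with float64 accumulators."""

    def __init__(self, channels: int):
        self.n = 0
        self.s = np.zeros(channels, dtype=np.float64)   # sum of x
        self.ss = np.zeros(channels, dtype=np.float64)  # sum of x^2

    def update(self, batch: np.ndarray):
        """batch: (samples, channels), e.g. flattened spatial positions."""
        x = batch.astype(np.float64)
        self.n += x.shape[0]
        self.s += x.sum(axis=0)
        self.ss += (x * x).sum(axis=0)

    @property
    def mean(self):
        return self.s / self.n

    @property
    def var(self):
        return self.ss / self.n - self.mean ** 2

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=3.0, size=(100_000, 4)).astype(np.float32)
stats = RunningStats(4)
for chunk in np.array_split(data, 10):  # stream in chunks, as over many images
    stats.update(chunk)
assert np.allclose(stats.mean, data.mean(axis=0), atol=1e-3)
assert np.allclose(stats.var, data.var(axis=0), rtol=1e-3)
```

The calibrated `mean`/`var` would then overwrite the BatchNorm layer's running statistics.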


Stage 1: Pretrain (Domain Initialization)

The pretrain stage adapts the base FLUX.2 model to the cuneiform domain before any image-to-image translation is attempted. It runs standard text-to-image flow-matching training on two sources of real cuneiform imagery:

  • ~60% CDLI archive photographs: real museum photos of tablets, paired with per-image text embeddings generated from CDLI metadata (period, material, object type, provenience, genre, language). Eight prompt templates were used and varied randomly.
  • ~40% synthetic MSII renders: MSII visualization images from the training set, paired with MSII-specific text embeddings emphasizing curvature, surface topology, and wedge impression terminology.

Each image has its own unique cached embedding rather than a shared prompt, preventing the model from memorizing specimen identifiers and encouraging generalization.

| Hyperparameter | Value |
|---|---|
| Steps | 75,000 |
| Learning rate | 2e-4 (cosine decay, 1k warmup) |
| Effective batch size | 2 (batch 1, grad accum 2) |
| LoRA rank | 256 |
| LoRA init | PiSSA (8-iteration fast SVD) |
| Optimizer | 8-bit Adam |
| Precision | bfloat16 autocast |
| Timestep sampling | Logit-normal (mean=0, std=1) |
| Gradient clipping | 1.0 |
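The logit-normal timestep sampling used in all three stages draws from a normal distribution and squashes it through a sigmoid, concentrating samples near the middle of the trajectory. A minimal sketch:

```python
import numpy as np

def sample_timesteps(n, mean=0.0, std=1.0, rng=None):
    """Logit-normal timestep sampling: draw z ~ N(mean, std) and map it
    through a sigmoid, so samples cluster around t = 0.5 where the
    flow-matching target is hardest, thinning out near t = 0 and t = 1."""
    rng = rng if rng is not None else np.random.default_rng()
    z = rng.normal(mean, std, size=n)
    return 1.0 / (1.0 + np.exp(-z))
```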

Images are resized to fit within 1 megapixel and rounded to 128-pixel multiples. Light augmentations are applied (horizontal flip, ±5° rotation, minor color jitter). Validation generates text-conditioned images across four aspect ratios every 1,000 steps.
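The resolution fitting can be sketched as below. Whether sides are rounded down or to the nearest 128-pixel multiple is not stated; rounding down is assumed here so the result never exceeds the pixel budget.

```python
import math

def fit_resolution(w, h, max_pixels=1_000_000, multiple=128):
    """Scale (w, h) to fit within max_pixels, then round each side down
    to the nearest multiple of 128 (never below one multiple)."""
    scale = min(1.0, math.sqrt(max_pixels / (w * h)))
    new_w = max(multiple, int(w * scale) // multiple * multiple)
    new_h = max(multiple, int(h * scale) // multiple * multiple)
    return new_w, new_h
```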


Stage 2: Train (Image-to-Image Adaptation)

The main training stage fine-tunes the pretrained weights for the target task: translating cuneiform tablet photographs into MSII visualizations. This stage introduces two significant changes over standard FLUX.2 fine-tuning.

Tile and global context conditioning

Rather than processing full images, the model trains on dynamic tile crops (128-1024 px, depending on image resolution) while simultaneously receiving a downscaled 128 px thumbnail of the full image with a red rectangle marking the tile's location, providing both local detail and global context.
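A minimal sketch of building the global-context thumbnail follows. Nearest-neighbor downscaling and the exact rectangle-drawing logic are illustrative stand-ins for the actual pipeline.

```python
import numpy as np

def global_context_thumb(image, x0, y0, x1, y1, thumb=128):
    """Downscale `image` (H, W, 3 uint8) to thumb x thumb and draw a red
    rectangle around the tile footprint given by pixel coords (x0, y0,
    x1, y1), so the model sees where its local crop sits globally."""
    h, w = image.shape[:2]
    ys = np.arange(thumb) * h // thumb
    xs = np.arange(thumb) * w // thumb
    small = image[ys][:, xs].copy()  # nearest-neighbor resize
    # map the tile corners into thumbnail coordinates
    tx0, tx1 = x0 * thumb // w, max(x0 * thumb // w + 1, x1 * thumb // w)
    ty0, ty1 = y0 * thumb // h, max(y0 * thumb // h + 1, y1 * thumb // h)
    red = np.array([255, 0, 0], dtype=image.dtype)
    small[ty0:ty1, [tx0, tx1 - 1]] = red  # vertical edges
    small[[ty0, ty1 - 1], tx0:tx1] = red  # horizontal edges
    return small
```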

Paired crop with geometric consistency

The same crop coordinates and geometric transforms (flip, rotation, perspective distortion) are applied to both the input photograph and the target MSII image, ensuring the model always receives spatially aligned pairs.
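One simple way to guarantee identical random parameters, sketched here with a toy transform: run the same transform function on both images, each with an identically seeded RNG. The actual implementation may share parameters differently.

```python
import random

def paired_geometric(input_img, target_img, transform, seed):
    """Apply one randomly-parameterized geometric transform to both the
    photograph and its MSII target. Because both calls receive an RNG
    seeded identically, the sampled flip/rotation/perspective parameters
    are guaranteed to match, keeping the pair spatially aligned."""
    return (
        transform(input_img, random.Random(seed)),
        transform(target_img, random.Random(seed)),
    )
```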

Augmentation Pipeline

Augmentations are split into two categories applied in sequence:

Geometric (applied identically to input and target):

  • Horizontal flip (50%), vertical flip (40%), rotation ±8° (50%), perspective distortion strength 0.02 (30%)

Domain adaptation (applied to input only, to simulate real photographic variation):

  • Perlin noise illumination (20%), vignette (40%), directional lighting gradient (50%), dust particles (50%), Gaussian noise (80%), gamma correction (50%), contrast adjustment (50%), brightness shift (50%), hue/saturation shift (40%), Gaussian blur (30%), grayscale conversion (3%)

Spatially-dependent effects (Perlin noise, vignette, gradient) use crop coordinates so the tile and its global thumbnail receive matching effects.

Loss

Flow-matching loss with Min-SNR-γ weighting (γ=5.0) to down-weight noisy high-timestep predictions, plus a multi-scale latent gradient loss weighted at 0.25. The gradient loss computes spatial gradient differences between predicted and target latents at four downsampling scales, encouraging sharp edge structure in outputs.
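The multi-scale latent gradient loss can be sketched as follows. Finite-difference spatial gradients, L1 distance, and 2x average-pooling between scales are assumptions about the exact formulation.

```python
import numpy as np

def multiscale_grad_loss(pred, target, scales=4):
    """Mean absolute difference of spatial gradients at several scales.

    pred/target: (C, H, W) latents with H, W divisible by 2**scales.
    Matching gradient structure at every scale pushes the model toward
    sharp, correctly placed edges rather than blurry averages.
    """
    loss = 0.0
    for _ in range(scales):
        dx_p = pred[:, :, 1:] - pred[:, :, :-1]
        dx_t = target[:, :, 1:] - target[:, :, :-1]
        dy_p = pred[:, 1:, :] - pred[:, :-1, :]
        dy_t = target[:, 1:, :] - target[:, :-1, :]
        loss += np.abs(dx_p - dx_t).mean() + np.abs(dy_p - dy_t).mean()
        # 2x average-pool both latents before the next scale
        pred = (pred[:, ::2, ::2] + pred[:, 1::2, ::2]
                + pred[:, ::2, 1::2] + pred[:, 1::2, 1::2]) / 4
        target = (target[:, ::2, ::2] + target[:, 1::2, ::2]
                  + target[:, ::2, 1::2] + target[:, 1::2, 1::2]) / 4
    return loss / scales
```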

| Hyperparameter | Value |
|---|---|
| Steps | 150,000 |
| Learning rate | 3e-4 (cosine decay to 6e-6, 1k warmup) |
| Effective batch size | 8 (batch 1, grad accum 8) |
| LoRA rank | 256, alpha √rank, RSLoRA |
| LoRA init | PiSSA (8-iteration fast SVD) |
| EMA decay | 0.999 (used for validation and final save) |
| Optimizer | 8-bit Adam |
| Gradient clipping | 0.8 (with spike detection: skip if >2.5× EMA norm) |
| Precision | bfloat16 autocast |
| Gradient loss weight | 0.25 |
| Min-SNR-γ | 5.0 |
| Timestep sampling | Logit-normal (mean=0, std=1) |
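The spike-detection rule (skip the step if the gradient norm exceeds 2.5x a running EMA of past norms) might look like the sketch below; the EMA decay of 0.99 and the choice not to update the EMA on skipped steps are assumed details.

```python
class SpikeGuard:
    """Skip optimizer steps whose gradient norm exceeds threshold x EMA."""

    def __init__(self, threshold=2.5, decay=0.99):
        self.threshold = threshold
        self.decay = decay
        self.ema = None

    def should_skip(self, grad_norm: float) -> bool:
        if self.ema is None:
            self.ema = grad_norm  # first step seeds the EMA
            return False
        spike = grad_norm > self.threshold * self.ema
        if not spike:
            # only non-spiking norms update the running average
            self.ema = self.decay * self.ema + (1 - self.decay) * grad_norm
        return spike
```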

Validation runs every 2,000 steps, generating 8 sample images with 8 denoising steps.


Stage 3: Rectify (Trajectory Straightening)

The rectify stage implements Rectified Flow to reduce the number of inference steps required at runtime.

Standard flow-matching trains on random (noise, real target) pairs, producing curved ODE trajectories that require 25-50 denoising steps to traverse accurately. Rectified training instead pairs each noise sample with the output the fully-trained model generates from that noise, creating straight-line trajectories that can be traversed in 1-4 steps without quality loss.
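Turning a coupled pair into a training sample can be sketched as below. Along a straight trajectory the velocity target is constant: the difference between the pair's endpoints (the noise-to-generated sign convention is assumed).

```python
def rectified_pair_sample(noise, generated, rng):
    """Interpolate a coupled (noise, generated) pair at a random t and
    return the model input x_t plus the straight-line velocity target."""
    t = rng.random()
    x_t = (1.0 - t) * noise + t * generated
    v_target = generated - noise  # constant along the straight path
    return t, x_t, v_target
```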

Before training, a one-time preprocessing pass runs the trained model over the training set. Each image is cropped deterministically (seeded RNG, same tile-sizing logic as training), then fully denoised with the trained weights to produce a (noise, generated_output) coupled pair saved to disk. This eliminates VAE encoding from the training loop, reducing VRAM further.

The loss trains the model to predict the velocity between a coupled (noise, generated) pair at a random interpolated timestep. A pseudo-Huber loss replaces the MSE used in earlier stages, providing better gradient stability when predictions are far from the target.
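The pseudo-Huber loss with the transition scale c = 0.001 from the hyperparameter table can be sketched as:

```python
import numpy as np

def pseudo_huber(pred, target, c=0.001):
    """Pseudo-Huber loss: approximately quadratic for residuals below c
    and linear above it, so distant predictions cannot produce the
    exploding gradients that plain squared error would."""
    diff = np.asarray(pred) - np.asarray(target)
    return float(np.mean(np.sqrt(diff ** 2 + c ** 2) - c))
```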

| Hyperparameter | Value |
|---|---|
| Steps | 50,000 |
| Learning rate | 3e-6 (cosine decay, 500 warmup) |
| Effective batch size | 4 (batch 1, grad accum 4) |
| LoRA rank | 256 |
| LoRA init | Loaded from Stage 2 weights (warm-start) |
| Loss | Pseudo-Huber (c=0.001) |
| Optimizer | 8-bit Adam |
| Gradient clipping | 1.0 |
| Precision | bfloat16 autocast |
| Timestep sampling | Logit-normal (mean=0, std=1) |

Validation runs every 2,000 steps using real validation images (not coupled pairs), generating outputs with only 2 denoising steps to directly measure few-step inference quality.

The result is usable MSII visualizations in 1-2 denoising steps, compared to the 25-50 steps standard flow-matching requires.


Acknowledgements & Citations

3D Scan Data (HeiCuBeDa)

3D scans used to generate the training dataset are from the Heidelberg Cuneiform Benchmark Dataset (HeiCuBeDa):

Bogacz, B., Gertz, M., & Mara, H. (2015). Character Proposals for Cuneiform Script Digitization. Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition (ICFHR). doi:10.11588/data/IE8CCN

Archive Photographs (CDLI)

Real tablet photographs used in Stage 1 pretraining are sourced from the Cuneiform Digital Library Initiative (CDLI).

MSII Curvature (GigaMesh)

MSII curvature values embedded in the HeiCuBeDa PLY files were computed using the GigaMesh Software Framework.

Rectified Flow

Stage 3 (Rectify) implements the trajectory-straightening approach from:

Liu, X., et al. (2022). Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow. arXiv:2209.03003

Base Model (FLUX.2 Klein Base 4B)

Fine-tuned from FLUX.2-klein-base-4B by Black Forest Labs.

Author: boatbomber

Likes: 2

Downloads: 0

Tags: safetensors, image-to-image, cuneiform, geometry, curvature, multi-scale-integral-invariant, msii, Flux, dataset:boatbomber/CuneiformPhotosMSII, arxiv:2209.03003, base_model:black-forest-labs/FLUX.2-klein-base-4B, base_model:finetune:black-forest-labs/FLUX.2-klein-base-4B, license:apache-2.0, region:us

dealignai/Nemotron-3-Super-120B-A12B-4bit-MLX-CRACK-Uncensored

Author: dealignai

Likes: 2

Downloads: 0

Tags: mlx, safetensors, nemotron_h, abliterated, uncensored, crack, moe, nemotron, mamba, ssm, hybrid, text-generation, conversational, custom_code, en, license:other, 4-bit, region:us

LocoreMind/LocoTrainer-4B-GGUF


library_name: transformers
license: mit
base_model: LocoreMind/LocoTrainer-4B
tags: code, agent, tool-calling, distillation, qwen3, ms-swift, gguf, quantization
language: en
pipeline_tag: text-generation

LocoTrainer-4B GGUF

GGUF quantized version of LocoTrainer-4B model for local inference.

Model Information

  • Base Model: Qwen3-4B-Instruct-2507
  • Distilled from: Qwen3-Coder-Next
  • Training Method: Knowledge Distillation (SFT)
  • Training Data: 361,830 samples
  • Max Context: 32,768 tokens
  • Framework: MS-SWIFT

Available Versions

| Version | Size | Speed | Quality | Recommended For |
|---------|------|-------|---------|-----------------|
| F16 | 8.3GB | Fast | Highest | Baseline/Reference |
| Q8_0 | 4.4GB | Fast | Very High | High-quality inference |
| Q5_K_M | 3.0GB | Medium | High | Balanced approach |
| Q4_K_M | 2.6GB | Fast | Medium | Recommended |
| Q3_K_M | 2.1GB | Very Fast | Medium | Resource-constrained |

Quick Start

Using llama.cpp

# Download model
wget https://huggingface.co/LocoreMind/LocoTrainer-4B-GGUF/resolve/main/LocoTrainer-4B-Q4_K_M.gguf

# Start server
./llama-server -m LocoTrainer-4B-Q4_K_M.gguf --port 8080 --ctx-size 32768

Using LocoTrainer Framework

# Configure .env
export LOCOTRAINER_BASE_URL=http://localhost:8080/v1
export LOCOTRAINER_MODEL=LocoTrainer-4B

# Run
locotrainer run -q "What are the default LoRA settings in ms-swift?"

Using llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="LocoTrainer-4B-Q4_K_M.gguf",
    n_gpu_layers=99,
    n_ctx=32768,
)

response = llm(
    "What is MS-SWIFT?",
    max_tokens=512,
)
print(response["choices"][0]["text"])

Performance Metrics

Tested on NVIDIA H100:

  • First Token Latency: ~200-300ms
  • Subsequent Token Speed: 50-100 tokens/sec
  • Memory Usage (Q4_K_M): ~10-12GB

Features

  • 🎯 MS-SWIFT Domain Expert: Trained on MS-SWIFT documentation and codebase
  • 🔧 Tool Calling: Supports Read, Grep, Glob, Bash, Write tools
  • 📊 End-to-End Reports: From question to complete markdown analysis report
  • 🏠 Local Deployment: Fully offline, zero API cost
  • 📏 Long Context: 32K tokens support

Use Cases

  • Codebase analysis and documentation generation
  • MS-SWIFT framework Q&A
  • Local AI agent deployment
  • Offline inference applications

License

MIT


Author: LocoreMind

Likes: 2

Downloads: 244

Tags: transformers, gguf, code, agent, tool-calling, distillation, qwen3, ms-swift, quantization, text-generation, en, base_model:LocoreMind/LocoTrainer-4B, base_model:quantized:LocoreMind/LocoTrainer-4B, license:mit, endpoints_compatible, region:us, conversational

robinxiexie/qwen-finetuned-model


This is the model card of a 🤗 transformers model that has been pushed to the Hub. The card was automatically generated and every field is still marked "[More Information Needed]".

Author: robinxiexie

Likes: 1

Downloads: 0

Tags: transformers, safetensors, arxiv:1910.09700, endpoints_compatible, region:us