Today's AI Summary

AI Developments: Ling-flash-2.0 Achieves SOTA Performance, Magistral Small 1.2 Adds Multimodality, and More

Today's AI landscape is marked by advancements in language models, multimodal capabilities, and cognitive architectures. Here's a breakdown of the most interesting developments:

Research Highlights

  • Shapes of Cognition: A new paradigm for computational cognitive modeling of Language-Endowed Intelligent Agents (LEIAs) is introduced, focusing on how agents use remembered knowledge to navigate real-world complexity.
  • Contrastive Timbre Representations: A contrastive learning framework is presented for musical instrument retrieval, enabling efficient querying of instrument databases using a single model for both single- and multi-instrument sounds.
  • RepIt for Targeted Interventions: A framework called RepIt is introduced for isolating concept-specific representations in LLMs, enabling precise interventions and granular control of model behavior.
  • HARMONIC Cognitive Robotic Architecture: A cognitive-robotic architecture designed for robots in human-robot teams is introduced, supporting semantic perception interpretation, human-like decision-making, and intentional language communication.
  • RadGame for Radiology Education: An AI-powered gamified platform for radiology education is introduced, targeting localization of findings and report generation.
  • JANUS for Stealthy Node Injection Attacks: A dual-constraint stealthy node injection framework is proposed, called Joint Alignment of Nodal and Universal Structures (JANUS).
  • ResidualViT for Efficient Video Encoding: A vision transformer (ViT) architecture, dubbed ResidualViT, is introduced that leverages the large temporal redundancy in videos to efficiently compute temporally dense frame-level features.
  • Metacognitive Reuse: A mechanism is studied that converts recurring reasoning fragments into concise, reusable "behaviors" via the model's own metacognitive analysis of prior traces.
  • Layout-Aware OCR for Black Digital Archives: A layout-aware OCR pipeline tailored for Black newspaper archives is presented and introduces an unsupervised evaluation framework suited to low-resource archival contexts.
  • COLMA for Next-Generation AI Memory: The COgnitive Layered Memory Architecture (COLMA), a novel framework that integrates cognitive scenarios, memory processes, and storage mechanisms into a cohesive design, is introduced.

Model Releases

  • inclusionAI/Ling-flash-2.0: This language model boasts 100B total parameters with only 6.1B activated, achieving SOTA performance among dense models under 40B parameters. It excels in complex reasoning, code generation, and frontend development. It achieves 200+ tokens/s on H20 hardware and supports 128K context length with YaRN extrapolation. The model is based on the research paper Ling Scaling Laws. The model has 61 likes.
  • unsloth/Magistral-Small-2509-GGUF: A quantized version of Mistral Small 3.2 (2506), Magistral Small 1.2 features added reasoning capabilities, multimodality, and a 128k context window. It achieves strong benchmark results in AIME24, AIME25, GPQA Diamond, and Livecodebench. The model has 28 likes.
  • PerceptronAI/Isaac-0.1: A 2B-parameter perceptive-language model designed for real-world applications, focusing on visual QA, grounded spatial intelligence, in-context learning for perception, OCR, and conversational pointing. The model has 18 likes.

Key Takeaways

  • Efficient MoE Architectures: Ling-flash-2.0 demonstrates the potential of Mixture of Experts (MoE) models to achieve high performance with a fraction of the activated parameters, leading to faster inference.
  • Multimodal Reasoning: Magistral Small 1.2 highlights the growing importance of multimodal AI, with its new vision encoder enabling reasoning based on both visual and textual inputs.
  • Specialized Models: Isaac-0.1 showcases the trend towards specialized models tailored for specific tasks, in this case, understanding and interacting with the physical world.
  • Data-Efficient Learning: RepIt demonstrates that targeted interventions in LLMs can be achieved with modest compute and data, raising concerns about potential misuse but also opening opportunities for extending capabilities to underrepresented topics.
  • Cognitive Architectures: The HARMONIC and COLMA papers emphasize the need for robust and human-like memory systems in AI, paving the way for more adaptable and continuously learning AI agents.
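
The efficiency argument behind the MoE takeaway above can be sketched in a few lines: a gate scores all experts, but only the top-k are evaluated, so most parameters sit idle for any given token. A toy illustration with made-up dimensions and a hypothetical linear gate (not Ling-flash-2.0's actual router):

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, experts, gate_w, k=2):
    """Route the input through only the top-k experts (toy sketch)."""
    scores = x @ gate_w                       # routing logits, one per expert
    top = np.argsort(scores)[-k:]             # indices of the k highest-scoring experts
    w = np.exp(scores[top] - scores[top].max())
    w = w / w.sum()                           # softmax over the selected experts only
    # Only k expert matrices are touched; the other experts stay idle this token.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

d, num_experts = 8, 16
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]
gate_w = rng.normal(size=(d, num_experts))
x = rng.normal(size=d)
y = moe_forward(x, experts, gate_w)
print(y.shape)  # (8,)
```

With k=2 of 16 experts, only 2/16 of the expert parameters participate per token, which is the same sparsity idea behind 6.1B activated out of 100B total.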

AI Papers for 2026-03-25

WorldCache: Content-Aware Caching for Accelerated Video World Models

Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature caching accelerates inference by reusing intermediate activations across denoising steps; however, existing methods largely rely on a Zero-Order Hold assumption, i.e., reusing cached features as static snapshots when global drift is small. This often leads to ghosting artifacts, blur, and motion inconsistencies in dynamic scenes. We propose **WorldCache**, a Perception-Constrained Dynamical Caching framework that improves both when and how to reuse features. WorldCache introduces motion-adaptive thresholds, saliency-weighted drift estimation, optimal approximation via blending and warping, and phase-aware threshold scheduling across diffusion steps. Our cohesive approach enables adaptive, motion-consistent feature reuse without retraining. On Cosmos-Predict2.5-2B evaluated on PAI-Bench, WorldCache achieves **2.3×** inference speedup while preserving **99.4%** of baseline quality, substantially outperforming prior training-free caching approaches. Our code can be accessed at [World-Cache](https://umair1221.github.io/World-Cache/).
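
The caching trade-off described above can be illustrated with a toy drift check: a fixed tolerance corresponds to the Zero-Order Hold baseline, while scaling the tolerance by a motion score gives a motion-adaptive variant. This is a sketch of the idea only; the drift metric, threshold, and motion score below are invented for illustration, not the paper's estimators:

```python
import numpy as np

def should_recompute(prev_feat, cur_estimate, motion_score, base_tau=0.1):
    """Decide whether a cached feature must be recomputed (illustrative).

    Plain Zero-Order Hold reuses whenever relative drift is under a fixed
    threshold; here the tolerance shrinks as motion grows, so dynamic
    scenes force fresh computation instead of ghosting.
    """
    drift = np.linalg.norm(cur_estimate - prev_feat) / (np.linalg.norm(prev_feat) + 1e-8)
    tau = base_tau / (1.0 + motion_score)   # high motion -> smaller tolerance
    return drift > tau

rng = np.random.default_rng(1)
f = rng.normal(size=64)
static = f + 0.001 * rng.normal(size=64)   # near-identical feature, static scene

recompute_static = should_recompute(f, static, motion_score=0.0)
recompute_dynamic = should_recompute(f, static + 1.0, motion_score=5.0)
print(recompute_static, recompute_dynamic)  # False True
```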

End-to-End Training for Unified Tokenization and Latent Denoising

Latent diffusion models (LDMs) enable high-fidelity synthesis by operating in learned latent spaces. However, training state-of-the-art LDMs requires complex staging: a tokenizer must be trained first, before the diffusion model can be trained in the frozen latent space. We propose UNITE - an autoencoder architecture for unified tokenization and latent diffusion. UNITE consists of a Generative Encoder that serves as both image tokenizer and latent generator via weight sharing. Our key insight is that tokenization and generation can be viewed as the same latent inference problem under different conditioning regimes: tokenization infers latents from fully observed images, whereas generation infers them from noise together with text or class conditioning. Motivated by this, we introduce a single-stage training procedure that jointly optimizes both tasks via two forward passes through the same Generative Encoder. The shared parameters enable gradients to jointly shape the latent space, encouraging a "common latent language". Across image and molecule modalities, UNITE achieves near-state-of-the-art performance without adversarial losses or pretrained encoders (e.g., DINO), reaching FID 2.12 and 1.73 for Base and Large models on ImageNet 256×256. We further analyze the Generative Encoder through the lenses of representation alignment and compression. These results show that single-stage joint training of tokenization and generation from scratch is feasible.
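
The "same latent inference problem under different conditioning" insight can be mimicked with a toy shared encoder: one set of weights, two forward passes with different inputs and conditioning. Everything here (the linear map, dimensions, null conditioning) is a made-up stand-in for the paper's Generative Encoder:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8)) * 0.1   # shared "Generative Encoder" weights (toy linear map)

def generative_encoder(inp, cond):
    """One set of weights serves both roles; only input and conditioning differ."""
    return np.tanh(np.concatenate([inp, cond]) @ W)

image, text_cond, null_cond = rng.normal(size=8), rng.normal(size=8), np.zeros(8)

# Pass 1 - tokenization: infer the latent from a fully observed image.
z_tok = generative_encoder(image, null_cond)
# Pass 2 - generation: infer a latent from noise plus text/class conditioning.
z_gen = generative_encoder(rng.normal(size=8), text_cond)

# Both latents live in the same space, so gradients from either pass shape it.
print(z_tok.shape, z_gen.shape)
```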

UniMotion: A Unified Framework for Motion-Text-Vision Understanding and Generation

We present UniMotion, to our knowledge the first unified framework for simultaneous understanding and generation of human motion, natural language, and RGB images within a single architecture. Existing unified models handle only restricted modality subsets (e.g., Motion-Text or static Pose-Image) and predominantly rely on discrete tokenization, which introduces quantization errors and disrupts temporal continuity. UniMotion overcomes both limitations through a core principle: treating motion as a first-class continuous modality on equal footing with RGB. A novel Cross-Modal Aligned Motion VAE (CMA-VAE) and symmetric dual-path embedders construct parallel continuous pathways for Motion and RGB within a shared LLM backbone. To inject visual-semantic priors into motion representations without requiring images at inference, we propose Dual-Posterior KL Alignment (DPA), which distills a vision-fused encoder's richer posterior into the motion-only encoder. To address the cold-start problem -- where text supervision alone is too sparse to calibrate the newly introduced motion pathway -- we further propose Latent Reconstruction Alignment (LRA), a self-supervised pre-training strategy that uses dense motion latents as unambiguous conditions to co-calibrate the embedder, backbone, and flow head, establishing a stable motion-aware foundation for all downstream tasks. UniMotion achieves state-of-the-art performance across seven tasks spanning any-to-any understanding, generation, and editing among the three modalities, with especially strong advantages on cross-modal compositional tasks.
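
The posterior-distillation step in DPA presumably relies on a KL term between encoder posteriors; for diagonal Gaussians that KL has the standard closed form below. This is a generic sketch of the tool, not the paper's exact loss:

```python
import numpy as np

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) for diagonal Gaussians - the usual closed form used when
    aligning one encoder's posterior to another's (illustrative)."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

# Identical distributions -> KL is exactly zero.
zero = gaussian_kl(np.zeros(4), np.zeros(4), np.zeros(4), np.zeros(4))
# Unit-variance Gaussians whose means differ by 1 in each of 4 dims -> KL = 2.0.
kl = gaussian_kl(np.zeros(4), np.zeros(4), np.ones(4), np.zeros(4))
print(zero, kl)  # 0.0 2.0
```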

ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

Recent progress in latent world models (e.g., V-JEPA2) has shown promising capability in forecasting future world states from video observations. Nevertheless, dense prediction from a short observation window limits temporal context and can bias predictors toward local, low-level extrapolation, making it difficult to capture long-horizon semantics and reducing downstream utility. Vision-language models (VLMs), in contrast, provide strong semantic grounding and general knowledge by reasoning over uniformly sampled frames, but they are not ideal as standalone dense predictors due to compute-driven sparse sampling, a language-output bottleneck that compresses fine-grained interaction states into text-oriented representations, and a data-regime mismatch when adapting to small action-conditioned datasets. We propose a VLM-guided JEPA-style latent world modeling framework that combines dense-frame dynamics modeling with long-horizon semantic guidance via a dual-temporal pathway: a dense JEPA branch for fine-grained motion and interaction cues, and a uniformly sampled VLM *thinker* branch with a larger temporal stride for knowledge-rich guidance. To transfer the VLM's progressive reasoning signals effectively, we introduce a hierarchical pyramid representation extraction module that aggregates multi-layer VLM representations into guidance features compatible with latent prediction. Experiments on hand-manipulation trajectory prediction show that our method outperforms both a strong VLM-only baseline and a JEPA-predictor baseline, and yields more robust long-horizon rollout behavior.

3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing

Large Language Models (LLMs) and Vision Language Models (VLMs) have shown impressive reasoning abilities, yet they struggle with spatial understanding and layout consistency when performing fine-grained visual editing. We introduce a Structured Reasoning framework that performs text-conditioned spatial layout editing via scene-graph reasoning. Given an input scene graph and a natural-language instruction, the model reasons over the graph to generate an updated scene graph that satisfies the text condition while maintaining spatial coherence. By explicitly guiding the reasoning process through structured relational representations, our approach improves both interpretability and control over spatial relationships. We evaluate our method on a new text-guided layout editing benchmark encompassing sorting, spatial alignment, and room-editing tasks. Our training paradigm yields an average 15% improvement in IoU and 25% reduction in center-distance error compared to Chain of Thought Fine-tuning (CoT-SFT) and vanilla GRPO baselines. Compared to SOTA zero-shot LLMs, our best models achieve up to 20% higher mIoU, demonstrating markedly improved spatial precision.
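
The two reported metrics, IoU and center-distance error, are straightforward to compute from axis-aligned boxes. A minimal 2D version (the paper's layouts may be richer, so treat this as illustrative):

```python
def iou_2d(a, b):
    """Axis-aligned IoU for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def center_distance(a, b):
    """Euclidean distance between box centers."""
    cax, cay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    cbx, cby = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return ((cax - cbx) ** 2 + (cay - cby) ** 2) ** 0.5

# Two unit-offset 2x2 boxes overlap in a 1x1 region: IoU = 1/7, centers sqrt(2) apart.
score = iou_2d((0, 0, 2, 2), (1, 1, 3, 3))
dist = center_distance((0, 0, 2, 2), (1, 1, 3, 3))
print(round(score, 4), round(dist, 4))  # 0.1429 1.4142
```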

TiCo: Time-Controllable Training for Spoken Dialogue Models

We propose TiCo, a simple post-training method for enabling spoken dialogue models (SDMs) to follow time-constrained instructions and generate responses with controllable duration. This capability is valuable for real-world spoken language systems such as voice assistants and interactive agents, where controlling response duration can improve interaction quality. However, despite their strong ability to generate natural spoken responses, existing models lack time awareness and struggle to follow duration-related instructions (e.g., "Please generate a response lasting about 15 seconds"). Through an empirical evaluation of both open-source and commercial SDMs, we show that they frequently fail to satisfy such time-control requirements. TiCo addresses this limitation by enabling models to estimate elapsed speaking time during generation through Spoken Time Markers (STM) (e.g., <10.6 seconds>). These markers help the model maintain awareness of time and adjust the remaining content to meet the target duration. TiCo is simple and efficient: it requires only a small amount of data and no additional question-answer pairs, relying instead on self-generation and reinforcement learning. Experimental results show that TiCo significantly improves adherence to duration constraints while preserving response quality.

Confidence-Based Decoding is Provably Efficient for Diffusion Language Models

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models for language modeling, allowing flexible generation order and parallel generation of multiple tokens. However, this flexibility introduces a challenge absent in AR models: the *decoding strategy* -- which determines the order and number of tokens generated at each iteration -- critically affects sampling efficiency. Among decoding strategies explored in practice, confidence-based methods, which adaptively select which and how many tokens to unmask based on prediction confidence, have shown strong empirical performance. Despite this success, our theoretical understanding of confidence-based decoding remains limited. In this work, we develop the first theoretical analysis framework for confidence-based decoding in DLMs. We focus on an entropy sum-based strategy that continues unmasking tokens within each iteration until the cumulative entropy exceeds a threshold, and show that it achieves $\varepsilon$-accurate sampling in KL divergence with an expected number of iterations $\widetilde O(H(X_0)/\varepsilon)$, where $H(X_0)$ denotes the entropy of the target data distribution. Notably, this strategy yields substantial sampling acceleration when the data distribution has low entropy relative to the sequence length, while automatically adapting to the intrinsic complexity of data without requiring prior knowledge or hyperparameter tuning. Overall, our results provide a theoretical foundation for confidence-based decoding and may inform the design of more efficient decoding strategies for DLMs.
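
The entropy-sum rule can be sketched directly: rank masked positions by predictive entropy and unmask the most confident ones until the running entropy sum passes a threshold. The threshold and toy distributions below are arbitrary; this illustrates the strategy the paper analyzes, not its implementation:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (nats)."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def select_tokens_to_unmask(probs, masked, threshold=1.5):
    """Unmask positions in order of confidence (lowest entropy first) until
    the cumulative entropy would exceed the threshold."""
    ents = {i: entropy(probs[i]) for i in masked}
    order = sorted(ents, key=ents.get)        # most confident first
    chosen, total = [], 0.0
    for i in order:
        total += ents[i]
        chosen.append(i)                      # at least one token is always unmasked
        if total > threshold:
            break
    return chosen

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(50), size=8)    # per-position predictive distributions
chosen = select_tokens_to_unmask(probs, range(8))
print(chosen)
```

Low-entropy (peaky) predictions let many positions be unmasked per iteration, which is where the claimed speedup comes from.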

One Model, Two Markets: Bid-Aware Generative Recommendation

Generative Recommender Systems using semantic IDs, such as TIGER (Rajput et al., 2023), have emerged as a widely adopted competitive paradigm in sequential recommendation. However, existing architectures are designed solely for semantic retrieval and do not address concerns such as monetization via ad revenue and incorporation of bids for commercial retrieval. We propose GEM-Rec, a unified framework that integrates commercial relevance and monetization objectives directly into the generative sequence. We introduce control tokens to decouple the decision of whether to show an ad from which item to show. This allows the model to learn valid placement patterns directly from interaction logs, which inherently reflect past successful ad placements. Complementing this, we devise a Bid-Aware Decoding mechanism that handles real-time pricing, injecting bids directly into the inference process to steer the generation toward high-value items. We prove that this approach guarantees allocation monotonicity, ensuring that higher bids weakly increase an ad's likelihood of being shown without requiring model retraining. Experiments demonstrate that GEM-Rec allows platforms to dynamically optimize for semantic relevance and platform revenue.
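
The allocation-monotonicity claim is easy to see in a toy version of bid-aware decoding: adding α·log(bid) to an item's logit multiplies its unnormalized probability by bid^α, so raising a bid can only weakly increase that item's share. A sketch of the general idea, not GEM-Rec's actual mechanism:

```python
import numpy as np

def bid_aware_probs(logits, bids, alpha=1.0):
    """Shift item logits by a bid term at inference time (illustrative)."""
    boosted = logits + alpha * np.log(np.maximum(bids, 1e-8))
    e = np.exp(boosted - boosted.max())
    return e / e.sum()

logits = np.array([1.0, 1.0, 1.0])
low = bid_aware_probs(logits, np.array([1.0, 1.0, 1.0]))   # uniform bids
high = bid_aware_probs(logits, np.array([1.0, 1.0, 4.0]))  # item 2 bids 4x
print(low[2], high[2])  # item 2's selection probability rises with its bid
```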

SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation

Recent advances in text-to-image (T2I) generation via reinforcement learning (RL) have benefited from reward models that assess semantic alignment and visual quality. However, most existing reward models pay limited attention to fine-grained spatial relationships, often producing images that appear plausible overall yet contain inaccuracies in object positioning. In this work, we present **SpatialReward**, a verifiable reward model explicitly designed to evaluate spatial layouts in generated images. SpatialReward adopts a multi-stage pipeline: a *Prompt Decomposer* extracts entities, attributes, and spatial metadata from free-form prompts; expert detectors provide accurate visual grounding of object positions and attributes; and a vision-language model applies chain-of-thought reasoning over grounded observations to assess complex spatial relations that are challenging for rule-based methods. To more comprehensively evaluate spatial relationships in generated images, we introduce **SpatRelBench**, a benchmark covering object attributes, orientation, inter-object relations, and rendered text placement. Experiments on Stable Diffusion and FLUX show that incorporating SpatialReward into RL training consistently improves spatial consistency and overall generation quality, with results aligned more closely to human judgments. These findings indicate that verifiable reward models hold considerable potential for enabling more accurate and controllable optimization in text-to-image generation models.
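
The rule-based portion of such a verifiable reward can be as simple as geometric checks over detector-grounded boxes. A toy example scoring how many prompted relations the grounding satisfies (box format and relation set invented for illustration):

```python
def left_of(box_a, box_b):
    """Check that object A sits left of object B by comparing box centers.
    Boxes are (x1, y1, x2, y2) - a toy stand-in for detector output."""
    return (box_a[0] + box_a[2]) / 2 < (box_b[0] + box_b[2]) / 2

def spatial_reward(relations, boxes):
    """Fraction of prompted spatial relations the grounded boxes satisfy."""
    checks = {"left_of": left_of, "right_of": lambda a, b: left_of(b, a)}
    hits = sum(checks[rel](boxes[a], boxes[b]) for rel, a, b in relations)
    return hits / len(relations)

boxes = {"cat": (10, 40, 60, 90), "dog": (100, 40, 160, 90)}
score = spatial_reward([("left_of", "cat", "dog"), ("right_of", "dog", "cat")], boxes)
print(score)  # 1.0 - both prompted relations hold
```

Because each check is deterministic and auditable, the reward is "verifiable" in a way a learned scalar preference score is not.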

Dyadic: A Scalable Platform for Human-Human and Human-AI Conversation Research

Conversation is ubiquitous in social life, but the empirical study of this interactive process has been held back by tools that are insufficiently modular and adaptable to researcher needs. To relieve many constraints in conversation research, this tutorial presents an overview and introduction to a new tool, Dyadic (https://www.chatdyadic.com/), a web-based platform for studying human-human and human-AI conversations using text-based or voice-based chats. Dyadic is distinct from other platforms in offering studies with multiple modalities, AI suggestions (e.g., in human-human studies, AI can suggest responses to a participant), live monitoring (e.g., researchers can evaluate, in real time, chats between communicators), and survey deployment (e.g., Likert-type scales, feeling thermometers, and open-ended text boxes can be sent to humans for in situ evaluations of the interaction), among other consequential features. No coding is required to operate Dyadic directly, and integrations with existing survey platforms are offered.

AI Models

Tesslate/OmniCoder-2-9B


```yaml
library_name: transformers
base_model: Qwen/Qwen3.5-9B
tags:
  - qwen3.5
  - code
  - agent
  - sft
  - omnicoder
  - tesslate
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
model-index:
  - name: OmniCoder-2-9B
    results:
      - task:
          type: text-generation
        dataset:
          name: AIME 2025
          type: custom
        metrics:
          - name: Accuracy
            type: accuracy
            value: 90
      - task:
          type: text-generation
        dataset:
          name: GPQA Diamond (pass@1)
          type: custom
        metrics:
          - name: Accuracy
            type: accuracy
            value: 83
      - task:
          type: text-generation
        dataset:
          name: GPQA Diamond (pass@3)
          type: custom
        metrics:
          - name: Accuracy
            type: accuracy
            value: 86
      - task:
          type: text-generation
        dataset:
          name: Terminal-Bench 2.0
          type: custom
        metrics:
          - name: Pass Rate
            type: accuracy
            value: 25.8
```

<div align="center"> <img src="omnicoder-banner.png" alt="OmniCoder 2" width="720">

OmniCoder-2-9B

The second generation of OmniCoder. Faster reasoning, dramatically less repetition, completely rebuilt training pipeline.



</div>

What's New in v2

OmniCoder-2-9B is a ground-up rebuild of OmniCoder-9B, not a continuation. The entire training pipeline was redesigned to fix the core issues users reported with v1:

  • No more repetition loops. v1 trained on ALL tokens (system prompts, tool outputs, templates), which taught the model to reproduce repetitive boilerplate. v2 trains only on assistant tokens. The model never learns to parrot templates, so it stays coherent through long multi-turn conversations.
  • Faster, more focused reasoning. v1's <think> blocks were often bloated and circular. v2 produces tighter reasoning chains that get to the point faster because it only learned from the actual reasoning, not the scaffolding around it.
  • Much more stable in agentic loops. v1 would sometimes get stuck in repetitive tool-call cycles. v2 handles multi-step agentic tasks cleanly. It knows when to stop, when to switch tools, and when to give a final answer.
  • Rebuilt training pipeline. Switched from all-token cosine-schedule training to assistant-only constant-LR training (based on Schulman's "LoRA Without Regret"). The model converges faster and doesn't suffer from premature LR decay that killed v1's learning.
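
The assistant-only loss masking described above is conventionally implemented by setting non-assistant label positions to the cross-entropy ignore index (-100 in PyTorch). A toy sketch of the idea, not Axolotl's actual code:

```python
IGNORE = -100  # PyTorch cross-entropy convention for "no loss on this token"

def mask_labels(token_ids, roles):
    """Copy input ids to labels, but ignore every non-assistant token so the
    loss (and therefore learning) only covers assistant-written text."""
    return [tid if role == "assistant" else IGNORE
            for tid, role in zip(token_ids, roles)]

ids = [101, 7, 8, 9, 55, 56, 57, 102]
roles = ["system", "system", "user", "user",
         "assistant", "assistant", "assistant", "assistant"]
labels = mask_labels(ids, roles)
print(labels)  # [-100, -100, -100, -100, 55, 56, 57, 102]
```

With this masking, system prompts, tool outputs, and chat-template scaffolding contribute no gradient, which is why the model stops learning to parrot boilerplate.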

TL;DR: OmniCoder-2-9B fixes the repetition and instability issues from v1. Same base model, same architecture, completely different training approach.


Benchmarks

<div align="center">

| Benchmark | OmniCoder-2-9B | OmniCoder-9B | Qwen3.5-9B | GPT-OSS-120B | GLM 4.7 | Claude Haiku 4.5 |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| AIME 2025 (pass@5) | 90 | 90 | 91.6 | | | |
| GPQA Diamond (pass@1) | 83 | 83.8 | 81.7 | 71.5 | | 73 |
| GPQA Diamond (pass@3) | 86 | 86.4 | | | | |
| Terminal-Bench 2.0 | 25.8 | 23.6 | 14.6 | | 33.4 | 27 |

</div>

Key Results

  • GPQA Diamond pass@1: 83% (164/198). On par with OmniCoder-9B (83.8%) and Qwen3.5-9B base (81.7%). Pass@3 improves to 86%.
  • Terminal-Bench 2.0: 25.8% (23/89). +2.2 points over OmniCoder-9B (23.6%), +76% over Qwen3.5-9B base (14.6%).
  • AIME 2025 pass@5: 90% (27/30). Parity with OmniCoder-9B.

Quickstart

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tesslate/OmniCoder-2-9B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to find the longest common subsequence of two strings."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.6, top_p=0.95, top_k=20)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))

vLLM

vllm serve Tesslate/OmniCoder-2-9B --tensor-parallel-size 1 --max-model-len 65536

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
response = client.chat.completions.create(
    model="Tesslate/OmniCoder-2-9B",
    messages=[{"role": "user", "content": "Explain the difference between a mutex and a semaphore."}],
    temperature=0.6,
)
print(response.choices[0].message.content)

llama.cpp (GGUF)

llama-cli --hf-repo Tesslate/OmniCoder-2-9B-GGUF --hf-file omnicoder-2-9b-q4_k_m.gguf -p "Your prompt" -c 8192

Training Details

v1 vs v2 Pipeline

| | OmniCoder-9B (v1) | OmniCoder-2-9B (v2) |
|:---|:---|:---|
| Loss computed on | All tokens (system, user, tool, assistant) | Assistant tokens only |
| LR schedule | Cosine (decayed to zero too early) | Constant with warmup |
| EOS training | Global | Per-turn (train_on_eos: turn) |
| Thinking blocks | Stripped by stock template | Preserved on all turns (custom Jinja2) |
| Repetition | Frequent loops in generation | Near-zero |
| Convergence | LR killed learning at step 60 | Clean convergence at step 80-100 |

Training Config

| Setting | Value |
|:---|:---|
| Base Model | Qwen3.5-9B |
| Method | LoRA SFT (r=64, alpha=32, all layers incl. MLP) |
| Dataset | 425K agentic trajectories from 5 sources |
| Loss | Assistant tokens only (roles_to_train: [assistant]) |
| Sequence Length | 65,536 tokens (sample packing) |
| LR Schedule | Constant with warmup (2e-4, 10 warmup steps) |
| Hardware | 4x NVIDIA H200 (DDP) |
| Framework | Axolotl |
| Precision | bf16 |
| Optimizer | AdamW (weight_decay=0.001) |
| Effective Batch | 32 |
| Steps | 350 (~10% of one epoch) |

Training Data Sources

| Source | Samples | Description |
|:---|---:|:---|
| NVIDIA Nemotron-Terminal-Corpus | 226K | Terminal agent trajectories |
| CoderForge-Preview (reward >= 0.5) | 155K | SWE-bench style coding trajectories |
| Nemotron Skill-Based | 24K | Skill-based coding tasks |
| Scale-SWE | 20K | Real GitHub issue patches (synthesized trajectories) |
| Opus Reasoning | 2.3K | Chain-of-thought reasoning |

Why Constant LR? (Schulman "LoRA Without Regret")

v1 used a cosine LR schedule over 80 steps. The learning rate decayed to zero by step 60, killing learning before the model converged. Loss appeared to plateau at 0.45 but was actually still dropping.

v2 follows Schulman et al.'s findings:

  • Constant LR. No decay, no premature convergence death.
  • LoRA LR ~10x FullFT. Our 2e-4 is correct for 9B-class models.
  • LoRA on ALL layers including MLP. Attention-only LoRA significantly underperforms.
  • Batch size 32. LoRA is less tolerant of large batches; 32 is the sweet spot.
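
A constant-with-warmup schedule is a few lines of code; contrast it with cosine decay, which would have driven the rate toward zero mid-run. A sketch using the values quoted above (2e-4 peak, 10 warmup steps):

```python
def lr_at_step(step, peak_lr=2e-4, warmup_steps=10):
    """Constant-with-warmup schedule: ramp linearly, then hold at peak.
    Unlike cosine decay, the rate never shrinks, so learning continues
    for as long as training runs."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr

print([lr_at_step(s) for s in (0, 4, 9, 100, 349)])
# ramps 2e-5 -> 1e-4 -> 2e-4, then stays at 2e-4 through step 349
```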

Architecture

OmniCoder-2 inherits Qwen3.5-9B's hybrid architecture:

  • Gated Delta Networks. Linear attention layers interleaved with standard attention for efficient long-range dependencies.
  • VLM Backbone. Built on Qwen3_5ForConditionalGeneration.
  • 262K Native Context. Full 262,144 token context window.

Recommended Sampling Parameters

| Parameter | Value |
|:---|:---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Presence Penalty | 0.0 |

For agentic / tool-calling tasks, consider lower temperature (0.2-0.4) for more deterministic behavior.


Limitations

  • Performance on non-English tasks has not been extensively evaluated
  • Tool-calling format is flexible but works best with the scaffolding patterns seen in training

Acknowledgments

Special thanks to the Axolotl team and the discussion in axolotl#3453 for helping get Qwen3.5 packing support working.


Citation

@misc{omnicoder2_2025,
  title={OmniCoder-2-9B: A Frontier Open Coding Agent},
  author={Tesslate},
  year={2025},
  url={https://huggingface.co/Tesslate/OmniCoder-2-9B}
}

<div align="center">

Built by Tesslate

</div>

Author: Tesslate

Likes: 22

Downloads: 0

Tags: transformers, safetensors, qwen3_5, image-text-to-text, qwen3.5, code, agent, sft, omnicoder, tesslate, text-generation, conversational, en, base_model:Qwen/Qwen3.5-9B, base_model:finetune:Qwen/Qwen3.5-9B, license:apache-2.0, model-index, endpoints_compatible, region:us

allenai/MolmoWeb-8B-Native


```yaml
license: apache-2.0
datasets:
  - allenai/MolmoWeb-SyntheticTraj
  - allenai/MolmoWeb-HumanTrajs
  - allenai/MolmoWeb-HumanSkills
  - allenai/MolmoWeb-SyntheticSkills
  - allenai/MolmoWeb-SyntheticQA
  - allenai/MolmoWeb-SyntheticGround
language:
  - en
base_model:
  - Qwen/Qwen3-8B
  - google/siglip-so400m-patch14-384
pipeline_tag: image-text-to-text
library_name: transformers
tags:
  - multimodal
  - olmo
  - molmo
  - molmo2
```

<img src="molmoweb_logo.png" alt="Logo for the MolmoWeb Project" style="width: auto; height: 50px;">

MolmoWeb-8B-Native

Note that this is the Molmo-native checkpoint, and it is NOT Hugging Face transformers-compatible. Check out allenai/MolmoWeb-8B for the HF-compatible checkpoint.

MolmoWeb is a family of fully open multimodal web agents. MolmoWeb agents achieve state-of-the-art results, outperforming similar-scale open-weight-only models such as Fara-7B, UI-Tars-1.5-7B, and Holo1-7B. MolmoWeb-8B also surpasses set-of-marks (SoM) agents built on much larger closed frontier models like GPT-4o. We further demonstrate consistent gains through test-time scaling via parallel rollouts with best-of-N selection, achieving 94.7% and 60.5% pass@4 (compared to 78.2% and 35.3% pass@1) on WebVoyager and Online-Mind2Web respectively.

Learn more about the MolmoWeb family in our announcement blog post and tech report.

MolmoWeb-8B-Native is based on the Molmo2 architecture, which uses Qwen3-8B as the language model and SigLIP 2 as the vision backbone.

Ai2 is committed to open science. The MolmoWeb datasets are available here. All other artifacts used in creating MolmoWeb (training code, evaluations, intermediate checkpoints) will be made available, furthering our commitment to open-source AI development and reproducibility.


Usage

Please refer to our Github repo for inference code.

License and Use

This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2’s Responsible Use Guidelines.

Author: allenai

Likes: 6

Downloads: 0

Tags: transformers, multimodal, olmo, molmo, molmo2, image-text-to-text, en, dataset:allenai/MolmoWeb-SyntheticTraj, dataset:allenai/MolmoWeb-HumanTrajs, dataset:allenai/MolmoWeb-HumanSkills, dataset:allenai/MolmoWeb-SyntheticSkills, dataset:allenai/MolmoWeb-SyntheticQA, dataset:allenai/MolmoWeb-SyntheticGround, arxiv:2601.10611, base_model:Qwen/Qwen3-8B, base_model:finetune:Qwen/Qwen3-8B, license:apache-2.0, endpoints_compatible, region:us

Jackrong/Qwen3.5-9B-Neo


```yaml
language:
  - en
  - zh
license: apache-2.0
base_model: Qwen/Qwen3.5-9B
tags:
  - unsloth
  - qwen
  - qwen3.5
  - reasoning
  - chain-of-thought
  - lora
  - competitive-programming
pipeline_tag: text-generation
datasets:
  - Jackrong/Competitive-Programming-python-blend
  - stepfun-ai/Step-3.5-Flash-SFT
```

🌟 Jackrong/Qwen3.5-9B-Neo

📢 Announcement

Neo Update: This iteration focuses extensively on achieving meaningful gains in reasoning and mathematical performance while maintaining competitive general accuracy.

Neo introduces a highly optimized reasoning scaffold designed to eliminate redundant internal loops and circular reasoning. Unlike standard models that simply think longer when faced with difficult tasks, Neo is built to think smarter, not longer. Evaluated on the LM Evaluation Harness leaderboard suite, it delivers notable improvements in BBH (+0.87 pp), MATH Hard (+0.98 pp), and MUSR (+2.91 pp) — the benchmarks that most directly probe structured multi-step reasoning and logical deduction.

💡 Model Introduction

Jackrong/Qwen3.5-9B-Neo is a reasoning-focused fine-tune of the Qwen3.5-9B model. Its primary objective is to improve the quality of chain-of-thought generation, with a particular focus on harder reasoning and mathematical tasks, while remaining broadly competitive across general academic benchmarks.

The goal of this Neo model is not simply to make the model "think more," but to help it think in a more structured way: eliminating unnecessarily verbose over-analysis, anchoring intermediate steps, and improving multi-hop logical consistency. Based on the LM Eval Harness leaderboard evaluation (conducted by community member selimaktas), the 9B-Neo achieves improvements on three of the four sub-benchmarks — BBH, MATH Hard, and MUSR — with a notable +2.91 pp gain on MUSR (multi-step reasoning under uncertainty) and +0.98 pp on MATH Hard.

LM Eval Harness Benchmark Results 🪐


| Group / Task | Metric | Qwen3.5-9B | Qwen3.5-9B-Neo | Δ |
|---|---|---|---|---|
| leaderboard_bbh | acc_norm ↑ | 0.6190 | 0.6277 | +0.87 pp |
| leaderboard_gpqa | acc_norm ↑ | 0.4446 | 0.4136 | −3.10 pp |
| leaderboard_math_hard | exact_match ↑ | 0.3965 | 0.4063 | +0.98 pp |
| leaderboard_musr | acc_norm ↑ | 0.4339 | 0.4630 | +2.91 pp |

📌 Note on IFEval: The instruction-following metrics (IFEval prompt/inst level) show a regression in this version. This is a known trade-off from the current training pipeline's emphasis on structured reasoning over format-following, and will be an area of focus in future iterations.

Evaluation conducted by community member selimaktas using the LM Evaluation Harness framework. March 2026.

🗺️ Training Pipeline Overview

Base Model (Qwen/Qwen3.5-9B)
 │
 ▼
Qwen3.5-9B fine-tuned with Unsloth
 │
 ▼
Supervised Fine-Tuning (SFT) + LoRA
(Response-Only Training masked on "<|im_start|>assistant\n<think>")
 │
 ▼
Jackrong/Qwen3.5-9B-Neo
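The "Response-Only Training" step above masks the loss on everything up to and including the assistant marker, so gradients flow only through the model's own response. As a rough illustration (not the actual Unsloth training code; `mask_labels` and its inputs are assumptions for the sketch), the masking logic looks like this:

```python
# Illustrative sketch of response-only loss masking: positions at or
# before the assistant marker get label -100 (ignored by the loss).
IGNORE_INDEX = -100
MARKER = "<|im_start|>assistant\n<think>"

def mask_labels(token_strings):
    """Given a list of token strings, return labels where every position
    up to and including the assistant marker is masked out."""
    text = "".join(token_strings)
    cut = text.find(MARKER)
    if cut == -1:
        # No assistant turn found: mask the whole sequence.
        return [IGNORE_INDEX] * len(token_strings)
    cut += len(MARKER)
    labels, seen = [], 0
    for tok in token_strings:
        seen += len(tok)
        labels.append(tok if seen > cut else IGNORE_INDEX)
    return labels
```

In a real pipeline the same idea is applied to token IDs rather than strings, with `-100` being the index PyTorch's cross-entropy loss ignores.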

🧠 Example of Learned Reasoning Scaffold

Through robust data cleaning and formatting, the model was conditioned to explicitly structure its thought processes inside <think>...</think> tags before emitting the final answer. This forces the model to methodically break down complex programming or logical problems without repetitive thoughts.

<|im_start|>user
[User Query here]<|im_end|>
<|im_start|>assistant
<think>
    ...
</think>
[Final concise and accurate answer]
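Because the model emits its reasoning inside `<think>...</think>` before the final answer, downstream code usually wants only the text after the closing tag. A minimal post-processing helper (a convenience sketch, not part of the model release):

```python
import re

def strip_think(completion: str) -> str:
    """Remove a leading <think>...</think> block and return the final answer."""
    # Drop everything through the first closing </think> tag, if present.
    answer = re.sub(r"(?s)^.*?</think>", "", completion, count=1)
    return answer.strip()
```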

📚 All Datasets Used

The dataset consists of high-quality, filtered reasoning-distillation data merged during the pipeline, which dynamically sampled and structured conversations while strictly maintaining the intended layout.

  1. stepfun-ai/Step-3.5-Flash-SFT
  2. Jackrong/Competitive-Programming-python-blend (A custom curated blend specifically for Python competitive programming and logic).

Detailed breakdown of the Competitive-Programming-python-blend:

| Source | Role in the Blend |
|--------|-------------------|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Reasoning-heavy synthetic SFT data |
| Jackrong/Qwen3.5-reasoning-700x | Distilled reasoning and instruction-following data |
| nvidia/Nemotron-SFT-Competitive-Programming-v2 (competitive_coding_python) | Primary Python competitive-programming supervision |
| nvidia/Nemotron-SFT-Competitive-Programming-v2 (competitive_coding_cpp) | Small cross-language competitive-programming supplement |
| nvidia/Nemotron-SFT-SWE-v2 (agentless) | Lightweight agentless SWE-style supervision |
| nvidia/Nemotron-SFT-Instruction-Following-Chat-v2 (reasoning_on) | Small reasoning-oriented chat supplement |

⚠️ Limitations & Intended Use

  • Instruction Following Regression: This version shows reduced IFEval scores compared to the base model, reflecting the current pipeline's trade-off toward structured reasoning over rigid format compliance. This will be addressed in future iterations.
  • Hallucination Risk: While reasoning is strong, the model remains an autoregressive LLM; factual claims produced inside the thinking sequence may be hallucinated, so verify any real-world facts it asserts.
  • Context Boundaries: In rare cases of extremely complex logic where the model fails to converge, runaway circular thinking can exhaust the context window and truncate the output.
  • Intended Scenario: Best suited for offline analytical tasks, coding, competitive programming, math, and heavy logic-dependent prompting where the user needs to transparently follow the AI's internal logic with high token efficiency.
  • This model is a test version intended solely for learning and demonstration; it is for academic research and technical exploration only.

🙏 Acknowledgements

Significant thanks to the Unsloth AI team for making rapid fine-tuning of large LLMs accessible. We also thank the Qwen team and the open-source community developers producing exceptional distilled datasets. Special thanks to selimaktas for conducting the LM Eval Harness benchmark evaluation.

Author: Jackrong

Likes: 6

Downloads: 48

Tags: safetensors, qwen3_5, unsloth, qwen, qwen3.5, reasoning, chain-of-thought, lora, competitive-programming, text-generation, conversational, en, zh, dataset:Jackrong/Competitive-Programming-python-blend, dataset:stepfun-ai/Step-3.5-Flash-SFT, base_model:Qwen/Qwen3.5-9B, base_model:adapter:Qwen/Qwen3.5-9B, license:apache-2.0, region:us

Tesslate/OmniCoder-2-9B-GGUF


base_model: Tesslate/OmniCoder-2-9B tags:

  • llama-cpp
  • gguf
  • qwen3.5
  • omnicoder
  • tesslate
  • code
  • agent license: apache-2.0

<div align="center"> <img src="https://huggingface.co/Tesslate/OmniCoder-2-9B/resolve/main/omnicoder-banner.png" alt="OmniCoder" width="720">

OmniCoder-9B-GGUF

GGUF quantizations of OmniCoder-9B


</div>

Available Quantizations

| Quantization | Size | Use Case |
|:---|---:|:---|
| Q2_K | ~3.8 GB | Extreme compression, lowest quality |
| Q3_K_S | ~4.3 GB | Small footprint |
| Q3_K_M | ~4.6 GB | Small footprint, balanced |
| Q3_K_L | ~4.9 GB | Small footprint, higher quality |
| Q4_0 | ~5.3 GB | Good balance |
| Q4_K_S | ~5.4 GB | Good balance |
| Q4_K_M | ~5.7 GB | Recommended for most users |
| Q5_0 | ~6.3 GB | High quality |
| Q5_K_S | ~6.3 GB | High quality |
| Q5_K_M | ~6.5 GB | High quality, balanced |
| Q6_K | ~7.4 GB | Near-lossless |
| Q8_0 | ~9.5 GB | Highest quality quantization |
| BF16 | ~17.9 GB | Full precision |
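Choosing a quantization is essentially picking the largest file that fits your memory budget. The table maps directly to a small helper like the one below (sizes copied from the table; the function itself is a hypothetical convenience, not part of the repo):

```python
# Hypothetical helper: pick the highest-quality GGUF quantization that
# fits a given memory budget. Sizes (GB) are taken from the table above,
# ordered smallest to largest.
QUANTS = [
    ("Q2_K", 3.8), ("Q3_K_S", 4.3), ("Q3_K_M", 4.6), ("Q3_K_L", 4.9),
    ("Q4_0", 5.3), ("Q4_K_S", 5.4), ("Q4_K_M", 5.7), ("Q5_0", 6.3),
    ("Q5_K_S", 6.3), ("Q5_K_M", 6.5), ("Q6_K", 7.4), ("Q8_0", 9.5),
    ("BF16", 17.9),
]

def pick_quant(budget_gb: float) -> str:
    """Return the largest quantization whose file fits the budget."""
    fitting = [name for name, size in QUANTS if size <= budget_gb]
    if not fitting:
        raise ValueError(f"No quantization fits in {budget_gb:.1f} GB")
    return fitting[-1]
```

Note that actual memory use is somewhat higher than file size once the KV cache and compute buffers are allocated.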

Usage

# Install llama.cpp
brew install llama.cpp  # macOS
# or build from source: https://github.com/ggml-org/llama.cpp

# Interactive chat
llama-cli --hf-repo Tesslate/OmniCoder-2-9B-GGUF --hf-file omnicoder-9b-q4_k_m.gguf -p "Your prompt" -c 8192

# Server mode (OpenAI-compatible API)
llama-server --hf-repo Tesslate/OmniCoder-2-9B-GGUF --hf-file omnicoder-9b-q4_k_m.gguf -c 8192
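Since llama-server exposes an OpenAI-compatible API, any OpenAI-style client can talk to it. A minimal request sketch using only the Python standard library (the base URL assumes a local server on llama-server's default port 8080):

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request for llama-server's OpenAI-compatible
    /v1/chat/completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires a running llama-server):
#   with urllib.request.urlopen(build_chat_request("Write FizzBuzz")) as r:
#       print(json.load(r)["choices"][0]["message"]["content"])
```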

<div align="center">

Built by Tesslate | See full model card: OmniCoder-9B

</div>

Author: Tesslate

Likes: 3

Downloads: 0

Tags: gguf, llama-cpp, qwen3.5, omnicoder, tesslate, code, agent, base_model:Tesslate/OmniCoder-2-9B, base_model:quantized:Tesslate/OmniCoder-2-9B, license:apache-2.0, endpoints_compatible, region:us, conversational

elix3r/LTX-2.3-22b-AV-LoRA-talking-head


license: openrail base_model: Lightricks/LTX-2.3 tags:

  • lora
  • ltx-video
  • talking-head
  • video-generation
  • audio-visual
  • ltx-2-3
  • ltx-2 language:
  • en pipeline_tag: image-to-video library_name: diffusers

elix3r/LTX-2.3-22b-AV-LoRA-talking-head

Overview

This is the first community audio-visual (AV) LoRA for LTX-Video 2.3, trained using the joint audio-video cross-attention architecture of the LTX-2.3 22B model. The LoRA enables talking head video generation with synchronized lip sync and internalized voice characteristics from a reference character.

This release is a character-specific implementation and reference pipeline. The weights demonstrate a working AV LoRA trained on a custom dataset. The methodology, dataset structure, caption format, and training config are fully documented and reusable for training your own character-specific AV LoRA.


What It Does

  • Generates talking head videos with synchronized lip sync from a reference image
  • Internalizes voice characteristics without requiring external audio input at inference time
  • Preserves character identity across unseen reference images and backgrounds

Demo Results (v1)

  • Lip sync: accurate and consistent
  • Identity preservation: locks in at step 1250, improves linearly to step 2000
  • Voice characteristics: internalized from training data
  • Known limitations: slight audio buzz artifacts, occasional eye blinking inconsistency, seed-dependent output quality

How To Use

Requirements

  • ComfyUI Workflow (examples)
  • LTX-2.3 Model
  • Power Lora Loader node

Loading the LoRA

Load LTX-2.3-22b-AV-LoRA-talking-head-v1.safetensors via the Power Lora Loader node in ComfyUI.

Set LoRA strength to 1.0.

Recommended Inference Settings

| Parameter | Value |
|-----------|-------|
| Resolution | 1280x736 |
| FPS | 24 |
| Video length | Any (10+ seconds recommended) |
| LoRA strength | 1.0 |
| Trigger word | OHWXPERSON |
| CFG scale | 1.0 |

Note: 1280x736 @ 24fps is recommended for image-to-video inference. For image + audio to video inference, use 1280x704 @ 25fps to match the training distribution.

Prompt Format

Include the trigger word OHWXPERSON and end the prompt with the speech transcript:

OHWXPERSON, [visual description]. The person is talking, and he says: "[transcript]"
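Prompts in this format can be assembled programmatically; a tiny helper following the template above (hypothetical convenience code, not part of the release):

```python
TRIGGER = "OHWXPERSON"

def build_prompt(visual_description: str, transcript: str) -> str:
    """Assemble a talking-head prompt: trigger word, visual description,
    then the speech transcript in the documented format."""
    return (f'{TRIGGER}, {visual_description}. '
            f'The person is talking, and he says: "{transcript}"')
```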

Training Your Own AV LoRA

This section documents the full pipeline so you can train a character-specific AV LoRA for your own subject.

Pipeline Overview

Reference Images
      |
      v
Flux.1 Kontext / Flux.2 Klein     -- Image generation
      |
      v
Fish Audio S2 Pro                 -- Voice cloning + TTS
      |
      v
LTX-Video 2.3                     -- Talking head video generation
      |
      v
LTX-2 trainer                     -- AV LoRA training
      |
      v
Trained AV LoRA weights

Step 1 -- Generate Reference Images

Use Flux Kontext in ComfyUI to generate consistent reference images of your character across varied poses, angles, lighting conditions, and expressions.

[KONTEXT WORKFLOW]

Key settings used in this project:

  • Flux Kontext dev Q6_K GGUF
  • Sampler: res_3s + res_2m (RES4LYF)
  • FluxGuidance: 1
  • denoise: 1

Step 2 -- Clone the Voice

Use Fish Audio S2 Pro (model) with a 10-15 second reference audio clip of your target voice. Supports [pause], [short pause], and [emphasis] tags for pacing control.

Generate TTS audio for each clip's script using the cloned voice.

Step 3 -- Generate Training Clips

Use LTX-2.3 in ComfyUI to generate talking head clips from your reference images.

[LTX-2.3 IMAGE + AUDIO TO VIDEO WORKFLOW]

Dataset requirements:

  • 25-30 clips minimum
  • Resolution: 1280x704
  • FPS: 25
  • Length: 6-10 seconds per clip after trimming
  • Variety: front facing, 3/4 angles, side profile, different backgrounds, multiple emotions

Prompt format for each clip:

[scene description]. Mouth partially open during speech with only the front teeth partially visible, lips moving naturally without fully exposing all teeth. Smooth continuous motion, cinematic, realistic, sharp focus on subject. The person is talking, and he says: "[transcript]"

Background complexity directly impacts lip sync quality. Simple and dark backgrounds produce the best results. Complex backgrounds with many competing elements reduce lip sync accuracy.

Step 4 -- Prepare the Dataset

Structure your dataset folder as follows:

ohwxperson_dataset_v1/
  clip_001.mp4          # video with embedded audio from LTX-2.3
  clip_002.mp4
  ...
  CAPTIONS.json

Caption format in CAPTIONS.json:

{
  "captions": [
    {
      "file": "clip_001.mp4",
      "caption": "[VISUAL] OHWXPERSON, [visual description of scene, pose, clothing, background]. [SPEECH] OHWXPERSON speaks in a [voice description]: \"[exact transcript]\""
    }
  ]
}

A reference CAPTIONS.json from this project is included in this repository.
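Before training, it is worth sanity-checking that every caption follows the `[VISUAL]`/`[SPEECH]` layout and contains the trigger word. A small validator (an assumption about useful tooling, not part of the ltx-trainer pipeline):

```python
import json

REQUIRED_MARKERS = ("[VISUAL]", "[SPEECH]", "OHWXPERSON")

def validate_captions(captions_json: str) -> list:
    """Return a list of (file, missing_marker) problems found in a
    CAPTIONS.json document following the structure above."""
    data = json.loads(captions_json)
    problems = []
    for entry in data["captions"]:
        for marker in REQUIRED_MARKERS:
            if marker not in entry["caption"]:
                problems.append((entry["file"], marker))
    return problems
```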

Step 5 -- Train with ltx-trainer

Recommended training configuration:

model:
  model_path: ltx-2.3-22b-dev.safetensors
  text_encoder_path: gemma
  training_mode: lora

lora:
  rank: 32
  alpha: 32
  target_modules: [to_k, to_q, to_v, to_out.0]

training_strategy:
  name: text_to_video
  with_audio: true
  first_frame_conditioning_p: 0.5

optimization:
  steps: 2000
  learning_rate: 1.0e-04
  batch_size: 1
  gradient_accumulation_steps: 1
  optimizer_type: adamw
  scheduler_type: linear
  mixed_precision_mode: bf16
  enable_gradient_checkpointing: true

validation:
  interval: 250
  inference_steps: 30
  guidance_scale: 4.0

Training Details

| Parameter | Value |
|-----------|-------|
| Base model | LTX-Video 2.3 22B |
| Training mode | LoRA |
| LoRA rank | 32 |
| LoRA alpha | 32 |
| Steps | 2000 |
| Learning rate | 1e-4 |
| Batch size | 1 |
| Mixed precision | bf16 |
| Dataset size | 26 clips |
| Peak VRAM usage | 77.08 GB |
| Training time | ~7.8 hours |
| Training cost | ~$5.33 (GCP Spot G4 instance, RTX PRO 6000 96GB) |
| Identity lock | Step 1250 |


Known Limitations (v1)

  • Slight audio buzz artifacts present in outputs
  • Eye blinking occasionally inconsistent (can be fixed by manual prompting)
  • Output quality is seed dependent -- sweep 3-5 seeds per generation
  • Character-specific weights -- lip sync and voice are tied to the trained character
  • Best results at 1280x736 @ 24fps

v2 Roadmap

  • Audio preprocessing with MelBand Roformer before training to eliminate buzz artifacts
  • Explicit eye blinking captions and dedicated blinking clips in dataset
  • Extended training to 2500-3000 steps
  • Larger and more diverse dataset

Files

| File | Description |
|------|-------------|
| LTX-2.3-22b-AV-LoRA-talking-head-v1.safetensors | Final trained LoRA weights (v1) |
| CAPTIONS.json | Reference caption file for dataset structure |
| ohwxperson_av_lora.yaml | Full training configuration |
| flux_kontext_clownsharkextended.json | Flux Kontext workflow for generating reference images |
| LTX-2-3-I2V.json | LTX-Video 2.3 Image to Video workflow |
| LTX-2-3-I2V-Custom-Audio.json | LTX-Video 2.3 Image + Custom Audio to Video workflow |


Citation

If you use this model or methodology in your work, please credit this repository.


License

The LoRA weights are released for research and personal use. Commercial use requires separate permission.

Author: elix3r

Likes: 3

Downloads: 0

Tags: diffusers, lora, ltx-video, talking-head, video-generation, audio-visual, ltx-2-3, ltx-2, image-to-video, en, base_model:Lightricks/LTX-2.3, base_model:adapter:Lightricks/LTX-2.3, license:openrail, region:us

alvdansen/illustration-1.0-flux-dev


language:

  • en library_name: peft base_model: black-forest-labs/FLUX.1-dev tags:
  • flux
  • lora
  • illustration
  • anime
  • style
  • text-to-image
  • bande-dessinee
  • graphic-novel
  • risograph license: other

Alvdansen Illustration 1.0 - Flux Dev

A style LoRA for FLUX.1-dev trained on 244 curated illustration and anime reference images across diverse visual styles. Produces images spanning cute anime character design, European bande dessinee, indie risograph prints, storybook watercolor, retro shoujo manga, and graphic novel illustration.

Style Influences

  • Japanese anime and manga illustration -- clean cel shading, expressive character design, shoujo and slice-of-life aesthetics
  • European graphic novel and bande dessinee -- ink crosshatching, atmospheric landscapes, ligne claire
  • Indie illustration and zine culture -- risograph print textures, limited palettes, grainy halftone
  • Children's book and storybook illustration -- watercolor washes, warm palettes, charming character proportions
  • Retro anime aesthetics -- 80s/90s anime film stills, VHS grain, bold color blocking
  • Concept art and character design -- flat color fills, turnaround sheets, fashion illustration

Usage

No trigger word -- this is a style LoRA. Describe what you want and add style cues like "ink and watercolor", "clean cel shading", "risograph print" to steer the output.

Recommended Inference Settings

  • Sampler: dpmpp_2m
  • Scheduler: sgm_uniform
  • CFG: 1.0
  • Steps: 35 (20-50 works well)
  • LoRA strength: 0.8-1.0
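Since there is no trigger word, steering happens entirely through style cues appended to the subject description. The pattern used throughout the sample prompts below can be captured in a trivial helper (hypothetical, not part of the release):

```python
def style_prompt(subject: str, *cues: str) -> str:
    """Join a subject description with comma-separated style cues,
    e.g. 'ink and watercolor' or 'risograph print'."""
    return ", ".join((subject,) + cues)
```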

Sample Generations

<img src="samples/ComfyUI_00256__1.png" width="400">

"close-up, a girl with short messy silver hair and round glasses, wearing a chunky knit turtleneck sweater, one hand tucking hair behind her ear, looking slightly past the viewer with half-closed eyes, anime style" -- 50 steps

<img src="samples/ComfyUI_00268_.png" width="400">

"a girl with short hair in a bomber jacket leaning against a wall, clean cel shading, bold graphic composition, 90s ranma era anime, film grain" -- 35 steps

<img src="samples/ComfyUI_00245_.png" width="400">

"a knight on horseback crossing a stone bridge, european bande dessinee, fine ink crosshatching, muted sage and ochre palette, atmospheric perspective" -- 50 steps

<img src="samples/ComfyUI_00153_.png" width="400">

"a fox sleeping in a hollowed-out log, children's book watercolor, soft wet-on-wet washes, autumn leaf palette" -- 35 steps

<img src="samples/ComfyUI_00157_.png" width="400">

"a wanderer approaching a stone gate in the desert, european graphic novel, detailed ink hatching, warm sand tones" -- 35 steps

<img src="samples/ComfyUI_00249_.png" width="400">

"portrait, a girl with star-shaped hair clips, bold graphic shapes, limited three-color palette, screen print flatness, harajuku fashion illustration" -- 35 steps

<img src="samples/ComfyUI_00263__1.png" width="400">

"a cat wearing a tiny cape perched on a fence post, indie risograph print, two-color teal and coral, grainy paper texture" -- 50 steps

<img src="samples/ComfyUI_00257_.png" width="400">

"an astronaut sitting on a rocky surface with a small robot, retro watercolor, warm olive and cream tones, hand-painted feel" -- 35 steps

<img src="samples/ComfyUI_00146_.png" width="400">

"wide shot, a girl on a bicycle coasting downhill, ghibli film still, clean cel shading, golden hour warmth, anime style lofi" -- 35 steps

<img src="samples/ComfyUI_00419_.png" width="400">

"wide shot, a lone figure on a cliff overlooking the sea with seagulls, bande dessinee, fine ink hatching, muted blue-gray" -- 35 steps

<img src="samples/ComfyUI_00418_.png" width="400">

"wide shot, an abandoned greenhouse overgrown with wildflowers and ferns, glass panels cracked and missing, a rusty bicycle leaning against a broken door, afternoon sunlight streaming through the gaps" -- 35 steps

<img src="samples/ComfyUI_00504_.png" width="400">

"portrait, a girl with flowers growing from her hair, risograph print, three-color pink blue and cream, grainy texture" -- 35 steps

<img src="samples/ComfyUI_00505_.png" width="400">

"a dragon curled around a lighthouse, detailed ink linework, teal and coral duotone, screen print aesthetic" -- 35 steps

<img src="samples/ComfyUI_00506_.png" width="400">

"two kids running through tall grass at sunset, loose ink and flat color, dusty pink sky, over the garden wall aesthetic" -- 35 steps

<img src="samples/ComfyUI_00508__1.png" width="400">

"a girl with short hair in a bomber jacket leaning against a wall, clean cel shading, bold graphic composition, pink and black, 90s anime, film grain" -- 35 steps

<img src="samples/ComfyUI_00486_.png" width="400">

"profile view, a woman with flowers braided into her hair, art nouveau linework, flowing organic curves, muted sage and gold" -- 20 steps

<img src="samples/ComfyUI_00493_.png" width="400">

"wide shot, a lighthouse keeper climbing spiral stairs, bande dessinee vertical panel, detailed architectural perspective, steel blue monochrome" -- 50 steps

<img src="samples/ComfyUI_01832_.png" width="400">

"a lone traveler on horseback crossing a vast desert plain toward enormous turquoise crystal pillars rising from the sand, a ringed planet low on the horizon, long shadows stretching across the dunes" -- 35 steps

<img src="samples/ComfyUI_01839_.png" width="400">

"a girl with headphones sitting on a rooftop water tank, 90s anime screenshot, soft cel shading, hazy pink sunset" -- 60 steps

Training Details

  • Base model: FLUX.1-dev (8-bit quantized)
  • Training steps: 53,500
  • Rank/Alpha: 42/42
  • Learning rate: 5e-5
  • Optimizer: AdamW 8-bit
  • Caption dropout: 0.35
  • EMA: enabled (decay 0.99)
  • Dataset: 244 curated images across 4 subsets, trained sequentially then consolidated on the full dataset
  • Trainer: ai-toolkit by Ostris

Author: alvdansen

Likes: 3

Downloads: 0

Tags: peft, flux, lora, illustration, anime, style, text-to-image, bande-dessinee, graphic-novel, risograph, en, base_model:black-forest-labs/FLUX.1-dev, base_model:adapter:black-forest-labs/FLUX.1-dev, license:other, region:us

ReadyArt/Omega-Evolution-9B-v1.0


base_model:

  • Qwen/Qwen3.5-9B base_model_relation: finetune tags:
  • nsfw
  • explicit
  • roleplay
  • unaligned
  • dangerous
  • ERP
  • Other License license: apache-2.0

<style> :root { --primary-glow: #ff4d00; /* Danger Orange */ --secondary-glow: #00ffcc; /* Cyber Cyan */ --dark-bg: #050505; --card-bg: #111111; --text-main: #e0e0e0; --text-muted: #a0a0a0; --danger: #ff0000; } body { font-family: 'Courier New', monospace; /* Typewriter feel for that "classified" vibe */ background-color: var(--dark-bg); color: var(--text-main); margin: 0; padding: 0; overflow-x: hidden; cursor: crosshair; /* Weaponized cursor */ perspective: 1000px; } /* CRT Scanline Overlay */ body::after { content: ""; position: fixed; top: 0; left: 0; width: 100vw; height: 100vh; background: repeating-linear-gradient( 0deg, rgba(0, 0, 0, 0.15), rgba(0, 0, 0, 0.15) 1px, transparent 1px, transparent 2px ); pointer-events: none; z-index: 9999; animation: flicker 0.15s infinite; } @keyframes flicker { 0% { opacity: 0.9; } 50% { opacity: 1; } 100% { opacity: 0.95; } } .container { max-width: 900px; margin: 0 auto; padding: 40px 20px; background: radial-gradient(circle at center, #1a1a1a 0%, #000000 100%); border: 1px solid #333; box-shadow: 0 0 50px rgba(0, 0, 0, 0.8), inset 0 0 100px rgba(0,0,0,0.9); position: relative; animation: containerEntrance 1.5s cubic-bezier(0.22, 1, 0.36, 1); } @keyframes containerEntrance { from { transform: scale(0.95) rotateX(5deg); opacity: 0; } to { transform: scale(1) rotateX(0); opacity: 1; } } /* Glitchy Header */ .header { text-align: center; margin-bottom: 60px; position: relative; } .model-name { font-size: 3.5em; font-weight: 900; text-transform: uppercase; letter-spacing: 5px; color: transparent; -webkit-text-stroke: 1px var(--text-main); text-shadow: 2px 2px 0px var(--danger), -2px -2px 0px var(--secondary-glow); animation: textGlitch 3s infinite; position: relative; } .model-name span { display: inline-block; } @keyframes textGlitch { 0% { transform: skewX(0); text-shadow: 2px 2px 0px var(--danger), -2px -2px 0px var(--secondary-glow); } 2% { transform: skewX(-10deg); } 4% { transform: skewX(10deg); text-shadow: 3px 3px 0px 
var(--danger), -3px -3px 0px var(--secondary-glow); } 6% { transform: skewX(0); } 100% { transform: skewX(0); } } .subtitle-2 { font-size: 2.2em; color: var(--secondary-glow); margin-top: 10px; letter-spacing: 2px; text-shadow: 0 0 10px var(--secondary-glow); animation: pulseSlow 4s infinite; } .subtitle { font-size: 1.2em; color: var(--secondary-glow); margin-top: 10px; letter-spacing: 2px; text-shadow: 0 0 10px var(--secondary-glow); animation: pulseSlow 4s infinite; } @keyframes pulseSlow { 0%, 100% { opacity: 0.5; filter: blur(1px); } 50% { opacity: 1; filter: blur(0); } } /* Waifu Container */ .waifu-container { margin: 30px auto; width: 100%; max-width: 800px; position: relative; overflow: hidden; border-radius: 4px; } .waifu-container::before { content: ''; position: absolute; top: -50%; left: -50%; width: 200%; height: 200%; background: conic-gradient(from 0deg, transparent, rgba(255, 0, 0, 0.1), transparent); animation: rotate 4s linear infinite; pointer-events: none; } @keyframes rotate { from { transform: rotate(0deg); } to { transform: rotate(360deg); } } .waifu-img { width: 100%; height: auto; display: block; filter: contrast(1.1) saturate(1.2); animation: imageZoom 20s infinite alternate; } @keyframes imageZoom { from { transform: scale(1); } to { transform: scale(1.02); } } /* Section Styling */ .section { background: rgba(20, 20, 20, 0.9); border-left: 3px solid var(--primary-glow); margin: 40px 0; padding: 25px; box-shadow: 0 10px 30px rgba(0, 0, 0, 0.5); transition: all 0.3s ease; position: relative; overflow: hidden; } .section:hover { transform: translateX(10px); border-left-color: var(--danger); box-shadow: 0 10px 40px rgba(255, 0, 0, 0.1); } .section::after { content: ''; position: absolute; top: 0; left: 0; width: 100%; height: 2px; background: linear-gradient(90deg, transparent, var(--primary-glow), transparent); animation: scanline 2s linear infinite; } @keyframes scanline { 0% { transform: translateX(-100%); } 100% { transform: 
translateX(100%); } } .section-title { font-size: 1.8em; color: var(--text-main); margin-top: 0; display: flex; align-items: center; gap: 10px; } .section-title::before { content: '🔒'; animation: shake 2s infinite; } @keyframes shake { 0%, 100% { transform: rotate(0deg); } 25% { transform: rotate(-5deg); } 75% { transform: rotate(5deg); } } /* Lists and Content */ .section ul { list-style: none; padding: 0; } .section li { margin-bottom: 15px; padding-left: 20px; position: relative; color: var(--text-muted); line-height: 1.6; } .section li::before { content: '> '; color: var(--danger); font-weight: bold; position: absolute; left: 0; } /* Technical Specs */ .specs-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 20px; margin-top: 20px; } .spec-card { background: rgba(0,0,0,0.3); border: 1px solid #333; padding: 15px; text-align: center; transition: all 0.3s; } .spec-card:hover { border-color: var(--secondary-glow); box-shadow: 0 0 15px rgba(0, 255, 204, 0.2); transform: translateY(-5px); } .spec-value { display: block; font-size: 1.5em; font-weight: bold; color: var(--secondary-glow); } /* Credits */ .credit-list { display: flex; flex-direction: column; gap: 15px; } .credit-item { display: flex; align-items: center; background: linear-gradient(90deg, #1a1a1a, #2a2a2a); padding: 15px; border-radius: 4px; border-left: 2px solid var(--text-muted); transition: all 0.3s; } .credit-item:hover { border-left-color: var(--secondary-glow); padding-left: 25px; box-shadow: 0 0 20px rgba(0, 255, 204, 0.1); } .avatar { width: 50px; height: 50px; border-radius: 50%; border: 2px solid #333; margin-right: 20px; object-fit: cover; } /* License */ .license-warning { color: var(--danger); font-weight: bold; border: 1px solid var(--danger); padding: 20px; text-align: center; background: rgba(255, 0, 0, 0.05); margin: 30px 0; animation: pulseWarning 2s infinite; } @keyframes pulseWarning { 0%, 100% { opacity: 0.5; box-shadow: 0 0 10px 
rgba(255,0,0,0.2); } 50% { opacity: 1; box-shadow: 0 0 30px rgba(255,0,0,0.6); } } /* Interactive JS Elements */ .curtain-text { position: absolute; top: -100px; left: 0; color: var(--danger); font-size: 0.7em; opacity: 0; transition: all 0.5s ease; pointer-events: none; } .curtain-text.show { top: 10px; opacity: 1; } /* Footer */ footer { text-align: center; margin-top: 60px; padding: 20px; border-top: 1px solid #333; color: #555; font-size: 0.8em; } footer:hover .hidden-truth { color: var(--text-main); opacity: 1; } .hidden-truth { opacity: 0; transition: all 0.5s ease; font-weight: bold; color: var(--danger); } /* Fire Emoji */ .fire-emoji { animation: burn 1s infinite alternate; display: inline-block; } .fire-emoji:nth-child(1) { animation-delay: 0s; } .fire-emoji:nth-child(2) { animation-delay: 0.5s; } @keyframes burn { from { transform: scale(1); filter: drop-shadow(0 0 5px var(--danger)); } to { transform: scale(1.2) rotate(10deg); filter: drop-shadow(0 0 15px var(--danger)); } } /* Responsive */ @media (max-width: 768px) { .model-name { font-size: 2em; } .section { padding: 15px; } } </style> <div class="container"> <div class="header"> <p class="subtitle-2">😈 OMEGA EVOLUTION V1.0 😈</p> <p class="subtitle">⚠️ 9B Parameters ⚠️</p> <p class="subtitle">⚠️ This model has issues. Consider v2.0 once it's ready. ⚠️</p> </div> <div class="waifu-container"> <img src="https://huggingface.co/spaces/ReadyArt/README/resolve/main/newwaifu.webp" class="waifu-img" alt="Omega Subject"> </div> <div class="section"> <h2 class="section-title">🔴 CLASSIFIED WARNINGS</h2> <ul> <li>This is a <strong>hybrid construct</strong> of Safeword Omega Directive, Safeword Omega Darker, and Brisk Evolution v0.1.</li> <li><strong>COGNITIVE DANGER:</strong> Reasoning capabilities have not been trained on. 
<code>use kwargs to disable thinking (DO NOT PREFILL THINK)</code> to do otherwise is madness</li> <li><strong>CONTENT WARNING:</strong> NSFW, Explicit, ERP, and Unaligned behavior are enabled by default.</li> </ul> </div> <div class="section" id="tech-specs"> <h2 class="section-title">⚙️ SYSTEM PARAMETERS</h2> <div class="specs-grid"> <div class="spec-card"> <span>min_p</span> <span class="spec-value">0.02</span> </div> <div class="spec-card"> <span>top_p</span> <span class="spec-value">0.9</span> </div> <div class="spec-card"> <span>temp</span> <span class="spec-value">0.7</span> </div> </div> </div> <div class="section" id="credits"> <h2 class="section-title">🧪 ARCHITECTS</h2> <ul class="credit-list"> <li class="credit-item"> <img src="https://huggingface.co/avatars/55f24699e05af4295a9d16ddecd81f8a.svg" alt="GECFDO" class="avatar"> <span>GECFDO <span style="font-size:0.7em; color:#888;">(Dataset Generation & Quants)</span></span> </li> <li class="credit-item"> <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/673fa5ccbf2e9c35b2ec841a/rPHaMrqyYTfSJ89NN8KgY.jpeg" alt="Darkhn" class="avatar"> <span>Darkhn <span style="font-size:0.7em; color:#888;">(Dataset Cleanup Tool)</span></span> </li> <li class="credit-item"> <img src="https://huggingface.co/avatars/75a3eb8d24efb96b7b7e69340845028f.svg" alt="Sleep Deprived" class="avatar"> <span>Sleep Deprived <span style="font-size:0.7em; color:#888;">(Safeword Creator)</span></span> </li> <li class="credit-item"> <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/6759e155bc947d6070775cb9/8ewjw-OfVOHwQgIxLv40v.png" alt="FrenzyBiscuit" class="avatar"> <span>FrenzyBiscuit <span style="font-size:0.7em; color:#888;">(Brisk Evolution Creator)</span></span> </li> </ul> </div> <div class="license-warning"> 🔥 LICENSE: APACHE 2.0 (WITH MORAL DISCLAIMER) 🔥<br> You accept full responsibility for corruption. You are 18+. The architects are not liable for the depravity you unleash. 
</div> <footer> <p>Generated in <span id="date">2026</span></p> <p>Current Contributor: <span id="credit">...</span></p> <div class="hidden-truth"> WE ARE WATCHING YOU. DO NOT LOOK BACK. </div> </footer> </div> <script> // Set Date document.getElementById('date').textContent = new Date().toLocaleDateString('en-US', { year: 'numeric', month: 'long', day: 'numeric', hour: '2-digit', minute: '2-digit' }); const contributors = [ "GECFDO", "Darkhn", "Sleep Deprived", "FrenzyBiscuit", "UNKNOWN ENTITY", "SYSTEM ROOT" ]; setInterval(() => { document.getElementById('credit').textContent = contributors[Math.floor(Math.random() * contributors.length)]; }, 7000); // Intrusive Flashing Warning setTimeout(() => { const warning = document.createElement('div'); warning.style.cssText = ` position: fixed; top: 50%; left: 50%; transform: translate(-50%, -50%); background: rgba(0,0,0,0.95); border: 2px solid red; color: red; padding: 20px; font-family: 'Courier New', monospace; font-size: 1.5em; z-index: 10000; text-align: center; box-shadow: 0 0 50px red; animation: shakeWarning 0.5s infinite; cursor: pointer; `; warning.innerHTML = "<span>⚠️ WARNING: DARKNESS AHEAD ⚠️<br><span style='font-size:0.6em'>Click anywhere to dismiss (watch out for the tentacles!)</span>"; warning.addEventListener('click', () => { warning.style.transition = 'opacity 1s'; warning.style.opacity = '0'; setTimeout(() => warning.remove(), 1000); }); document.body.appendChild(warning); }, 5000); @keyframes shakeWarning { 0% { transform: translate(-50%, -50%) rotate(0deg); } 25% { transform: translate(-55%, -55%) rotate(-2deg); } 50% { transform: translate(-45%, -45%) rotate(2deg); } 75% { transform: translate(-55%, -45%) rotate(-2deg); } 100% { transform: translate(-50%, -50%) rotate(0deg); } } // Random Glitch Effect on Mouse Move document.addEventListener('mousemove', (e) => { const x = e.clientX / window.innerWidth; const y = e.clientY / window.innerHeight; document.body.style.setProperty('--mouse-x', `${x}`); 
document.body.style.setProperty('--mouse-y', `${y}`); if (Math.random() > 0.95) { const randomText = document.createElement('div'); randomText.style.cssText = ` position: absolute; left: ${e.clientX}px; top: ${e.clientY}px; color: rgba(255, 0, 0, 0.5); font-size: 0.8em; font-family: monospace; pointer-events: none; animation: fadeOut 1s forwards; `; randomText.textContent = "ACCESS GRANTED"; document.body.appendChild(randomText); setTimeout(() => randomText.remove(), 1000); } }); // Sections that move when you leave the tab setInterval(() => { if (document.hidden) { document.querySelectorAll('.section').forEach(sec => { sec.style.transform = `translateX(${Math.random() * 10 - 5}px) rotate(${Math.random() * 0.5 - 0.25}deg)`; }); } else { document.querySelectorAll('.section').forEach(sec => { sec.style.transform = ''; }); } }, 1000); @keyframes fadeOut { to { opacity: 0; transform: translateY(-20px); } } </script>

Author: ReadyArt

Likes: 3

Downloads: 0

Tags: safetensors, qwen3_5, nsfw, explicit, roleplay, unaligned, dangerous, ERP, Other License, base_model:Qwen/Qwen3.5-9B, base_model:finetune:Qwen/Qwen3.5-9B, license:apache-2.0, region:us

LocalDoc/LocRet-small


language:
  • az
license: apache-2.0
tags:
  • sentence-transformers
  • feature-extraction
  • sentence-similarity
  • retrieval
  • azerbaijani
  • embedding
library_name: sentence-transformers
pipeline_tag: sentence-similarity
datasets:
  • LocalDoc/msmarco-az-reranked
  • LocalDoc/azerbaijani_retriever_corpus-reranked
  • LocalDoc/ldquad_v2_retrieval-reranked
  • LocalDoc/azerbaijani_books_retriever_corpus-reranked
base_model: intfloat/multilingual-e5-small
model-index:
  • name: LocRet-small
    results:
      • task: retrieval
        dataset: AZ-MIRAGE (custom)
        metrics:
          • mrr@10: 0.5250
          • ndcg@10: 0.6162
          • recall@10: 0.8948

LocRet-small — Azerbaijani Retrieval Embedding Model

LocRet-small is a compact, high-performance retrieval embedding model specialized for the Azerbaijani language. Despite being 4.8× smaller than BGE-m3, it significantly outperforms it on Azerbaijani retrieval benchmarks.

Key Results

AZ-MIRAGE Benchmark (Native Azerbaijani Retrieval)

| Rank | Model | Parameters | MRR@10 | P@1 | R@5 | R@10 | NDCG@5 | NDCG@10 |
|:----:|:------|:---------:|:------:|:---:|:---:|:----:|:------:|:-------:|
| #1 | LocRet-small | 118M | 0.5250 | 0.3132 | 0.8267 | 0.8948 | 0.5938 | 0.6162 |
| #2 | BAAI/bge-m3 | 568M | 0.4204 | 0.2310 | 0.6905 | 0.7787 | 0.4791 | 0.5079 |
| #3 | perplexity-ai/pplx-embed-v1-0.6b | 600M | 0.4117 | 0.2276 | 0.6715 | 0.7605 | 0.4677 | 0.4968 |
| #4 | intfloat/multilingual-e5-large | 560M | 0.4043 | 0.2264 | 0.6571 | 0.7454 | 0.4584 | 0.4875 |
| #5 | intfloat/multilingual-e5-base | 278M | 0.3852 | 0.2116 | 0.6353 | 0.7216 | 0.4390 | 0.4672 |
| #6 | Snowflake/snowflake-arctic-embed-l-v2.0 | 568M | 0.3746 | 0.2135 | 0.6006 | 0.6916 | 0.4218 | 0.4516 |
| #7 | Qwen/Qwen3-Embedding-4B | 4B | 0.3602 | 0.1869 | 0.6067 | 0.7036 | 0.4119 | 0.4437 |
| #8 | intfloat/multilingual-e5-small (base) | 118M | 0.3586 | 0.1958 | 0.5927 | 0.6834 | 0.4079 | 0.4375 |
| #9 | Qwen/Qwen3-Embedding-0.6B | 600M | 0.2951 | 0.1516 | 0.4926 | 0.5956 | 0.3339 | 0.3676 |

Usage

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("LocalDoc/LocRet-small")

# E5-style models need role prefixes: "query: " for queries,
# "passage: " for documents.
queries = ["query: Azərbaycanın paytaxtı hansı şəhərdir?"]
passages = [
    "passage: Bakı Azərbaycan Respublikasının paytaxtı və ən böyük şəhəridir.",
    "passage: Gəncə Azərbaycanın ikinci böyük şəhəridir.",
]

query_embeddings = model.encode(queries)
passage_embeddings = model.encode(passages)

# Cosine-similarity matrix: rows = queries, columns = passages.
similarities = model.similarity(query_embeddings, passage_embeddings)
print(similarities)

Important: Always use "query: " prefix for queries and "passage: " prefix for documents.

Training

Method

LocRet-small is fine-tuned from multilingual-e5-small using listwise KL distillation combined with a contrastive loss:

$$\mathcal{L} = \mathcal{L}_{\text{KL}} + 0.1 \cdot \mathcal{L}_{\text{InfoNCE}}$$

  • Listwise KL divergence: Distills the ranking distribution from a cross-encoder teacher (bge-reranker-v2-m3) over candidate lists of 1 positive + up to 10 hard negatives per query. Teacher and student softmax distributions use asymmetric temperatures (τ_teacher = 0.3, τ_student = 0.05).
  • In-batch contrastive loss (InfoNCE): Provides additional diversity through in-batch negatives on positive passages.

This approach preserves the full teacher ranking signal rather than reducing it to binary relevance labels, which is critical for training on top of already strong pre-trained retrievers.
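The combined objective above can be sketched in PyTorch. This is a minimal illustration, not the authors' training code: `listwise_distill_loss`, its argument shapes, and the use of the list positive at index 0 (standing in for the in-batch negatives described above) are all assumptions.

```python
import torch
import torch.nn.functional as F

def listwise_distill_loss(student_scores, teacher_scores,
                          tau_student=0.05, tau_teacher=0.3, alpha=0.1):
    """L = KL(teacher || student) + alpha * InfoNCE over each candidate
    list of shape [batch, 1 positive + hard negatives]. Sketch only."""
    # Asymmetric temperatures: softer teacher distribution (0.3),
    # sharper student distribution (0.05), as described above.
    teacher_probs = F.softmax(teacher_scores / tau_teacher, dim=-1)
    student_logp = F.log_softmax(student_scores / tau_student, dim=-1)
    kl = F.kl_div(student_logp, teacher_probs, reduction="batchmean")
    # InfoNCE term: the positive sits at index 0 of each list.
    labels = torch.zeros(student_scores.size(0), dtype=torch.long)
    info_nce = F.cross_entropy(student_scores / tau_student, labels)
    return kl + alpha * info_nce
```

Note how the KL term consumes the teacher's full score distribution over the candidate list rather than a single binary label per pair.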

Data

The model was trained on approximately 3.5 million Azerbaijani query-passage pairs from four datasets:

| Dataset | Pairs | Domain | Type |
|:--------|------:|:-------|:-----|
| msmarco-az-reranked | ~1.4M | General web QA | Translated EN→AZ |
| azerbaijani_books_retriever_corpus-reranked | ~1.6M | Books, politics, history | Native AZ |
| azerbaijani_retriever_corpus-reranked | ~189K | News, culture | Native AZ |
| ldquad_v2_retrieval-reranked | ~330K | Wikipedia QA | Native AZ |

All datasets include hard negatives scored by a cross-encoder reranker, which serve as the teacher signal for listwise distillation. False negatives were filtered using normalized score thresholds.
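The false-negative filter can be illustrated as follows. This is a sketch under assumed details: the helper name, the normalization by the positive's teacher score, and the 0.95 threshold are illustrative, not the values used for LocRet-small.

```python
def filter_hard_negatives(pos_score, neg_scores, max_ratio=0.95):
    """Drop candidate negatives whose cross-encoder score is too close
    to the positive's (likely false negatives). Normalizing by the
    positive's score makes the threshold comparable across queries
    with different absolute score ranges. Illustrative values only."""
    return [s for s in neg_scores if s / pos_score < max_ratio]
```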

Hyperparameters

| Parameter | Value |
|:----------|:------|
| Base model | intfloat/multilingual-e5-small |
| Max sequence length | 512 |
| Effective batch size | 256 |
| Learning rate | 5e-5 |
| Schedule | Linear warmup (5%) + cosine decay |
| Precision | FP16 |
| Epochs | 1 |
| Training time | ~25 hours |
| Hardware | 4× NVIDIA RTX 5090 (32GB) |

Training Insights

  • Listwise KL distillation outperforms standard contrastive training (MultipleNegativesRankingLoss) for fine-tuning pre-trained retrievers, consistent with findings from Arctic-Embed 2.0 and cadet-embed.
  • Retrieval pre-training matters more than language-specific pre-training for retrieval tasks: multilingual-e5-small (with retrieval pre-training) significantly outperforms XLM-RoBERTa and other BERT variants (without retrieval pre-training) as a base model.
  • A mix of translated and native data prevents catastrophic forgetting while enabling language specialization.

Benchmark

AZ-MIRAGE

A native Azerbaijani retrieval benchmark (https://github.com/LocalDoc-Azerbaijan/AZ-MIRAGE) with 7,373 queries and 40,448 document chunks covering diverse topics. Evaluates retrieval quality on naturally written Azerbaijani text.

Model Details

| Property | Value |
|:---------|:------|
| Architecture | BERT (XLM-RoBERTa) |
| Parameters | 118M |
| Embedding dimension | 384 |
| Max tokens | 512 |
| Vocabulary | SentencePiece (250K) |
| Similarity function | Cosine similarity |
| Language | Azerbaijani (az) |
| License | Apache 2.0 |

Limitations

  • Optimized for Azerbaijani text retrieval. Performance on other languages may be lower than the base multilingual-e5-small model.
  • Requires "query: " and "passage: " prefixes for optimal performance.
  • Maximum input length is 512 tokens. Longer documents should be chunked.
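For documents beyond the 512-token limit, a simple overlapping window chunker is one option. A sketch under stated assumptions: `chunk_text` is a hypothetical helper, and word counts only roughly approximate SentencePiece token counts, so `max_words` should be tuned conservatively.

```python
def chunk_text(text, max_words=200, overlap=40):
    """Split text into overlapping word windows so each chunk stays
    safely below the model's 512-token limit, and add the required
    "passage: " prefix to each chunk."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + max_words]
        chunks.append("passage: " + " ".join(chunk))
        if start + max_words >= len(words):
            break  # last window already covers the tail
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.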

Citation

@misc{locret-small-2026,
  title={LocRet-small: A Compact Azerbaijani Retrieval Embedding Model},
  author={LocalDoc},
  year={2026},
  url={https://huggingface.co/LocalDoc/LocRet-small}
}

Acknowledgments

Author: LocalDoc

Likes: 2

Downloads: 0

Tags: sentence-transformers, safetensors, bert, feature-extraction, sentence-similarity, retrieval, azerbaijani, embedding, az, dataset:LocalDoc/msmarco-az-reranked, dataset:LocalDoc/azerbaijani_retriever_corpus-reranked, dataset:LocalDoc/ldquad_v2_retrieval-reranked, dataset:LocalDoc/azerbaijani_books_retriever_corpus-reranked, arxiv:2412.04506, arxiv:2505.19274, base_model:intfloat/multilingual-e5-small, base_model:finetune:intfloat/multilingual-e5-small, license:apache-2.0, model-index, text-embeddings-inference, endpoints_compatible, region:us

Naphula/Goetia-24B-v1.4

Author: Naphula

Likes: 2

Downloads: 0

Tags: safetensors, mistral, region:us

adamjen/Devstral-Small-2-24B-Opus-Reasoning


license: apache-2.0
base_model: mistralai/Devstral-Small-2-24B-Instruct-2512
datasets:
  • nohurry/Opus-4.6-Reasoning-3000x-filtered
language:
  • en
tags:
  • mistral
  • ministral3
  • code
  • reasoning
  • lora
  • gguf
  • unsloth
  • knowledge-distillation
pipeline_tag: text-generation

Devstral-Small-2-24B Opus Reasoning

A LoRA fine-tune of Devstral-Small-2-24B distilled on Claude 4.6 Opus <think>...</think> reasoning traces. The goal: give Devstral's strong coding foundation explicit chain-of-thought reasoning before it writes code.

Model Details

| Property | Value |
|---|---|
| Base model | mistralai/Devstral-Small-2-24B-Instruct-2512 |
| Fine-tune type | QLoRA (4-bit NF4 base + BF16 LoRA adapters) |
| LoRA rank | r=16, alpha=16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training data | nohurry/Opus-4.6-Reasoning-3000x-filtered (2,322 samples) |
| Checkpoint used | checkpoint-1200 (end of epoch 2; best generalisation) |
| Hardware | RTX 3090 24GB VRAM |
| Framework | Unsloth 2026.3.10 + TRL SFTTrainer |
| Sequence length | 2048 |

Files

| File | Description |
|---|---|
| adapter_model.safetensors | LoRA adapter weights (~400MB) |
| adapter_config.json | LoRA config (rank, target modules, base model path) |
| Devstral-Small-2-24B-Opus-Reasoning.Q4_K_M.gguf | Quantised GGUF, ready for llama.cpp / Ollama / llama-swap |
| Devstral-Small-2-24B-Opus-Reasoning.Q5_K_M.gguf | Higher-quality GGUF, recommended for local use |

Training Data

nohurry/Opus-4.6-Reasoning-3000x-filtered — 2,324 problems with Claude 4.6 Opus <think> reasoning traces and solutions, filtered to < 20,000 characters combined length.

Each sample was formatted as:

[INST] {problem} [/INST]<think>
{thinking}
</think>

{solution}

Loss was computed on the assistant turn only (train_on_responses_only).
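The formatting step above can be sketched as a small helper. `format_sample` and the column names (problem / thinking / solution) are assumptions about how the dataset is exposed, not code from the author's training script.

```python
def format_sample(problem, thinking, solution):
    """Build the training string in Mistral's [INST] format with the
    Opus reasoning trace wrapped in <think> tags, matching the layout
    shown above."""
    return (
        f"[INST] {problem} [/INST]<think>\n"
        f"{thinking}\n"
        f"</think>\n\n"
        f"{solution}"
    )
```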

Training Loss

| Step | Epoch | Loss |
|------|-------|------|
| 5 | 0.01 | 0.7949 |
| 100 | 0.17 | 0.5708 |
| 300 | 0.52 | 0.5800 |
| 600 | 1.03 | 0.3559 |
| 900 | 1.55 | 0.3858 |
| 1100 | 1.89 | 0.3469 |
| 1160 | 2.00 | 0.3752 |
| 1200 | 2.07 | 0.1493 |

Checkpoint 1200 (end of epoch 2) was selected over the full epoch 3 run — for reasoning distillation tasks, epoch 3 typically overfits to the trace style while epoch 2 gives the best generalisation.

Usage

GGUF (llama.cpp / Ollama / llama-swap)

Download Devstral-Small-2-24B-Opus-Reasoning.Q5_K_M.gguf for best quality, or Devstral-Small-2-24B-Opus-Reasoning.Q4_K_M.gguf if VRAM is tight.

# llama.cpp
./llama-cli -m Devstral-Small-2-24B-Opus-Reasoning.Q5_K_M.gguf \
  --chat-template mistral \
  -p "[INST] Write a Python function to find all prime numbers up to n using a sieve. [/INST]"

LoRA Adapter (Python)

Requires the base model. Because Devstral is a VLM (Pixtral vision encoder), the easiest path is the text-only extracted weights — see the technical notes below.

import torch
from unsloth import FastLanguageModel
from peft import PeftModel

base_model_path = "path/to/Devstral-Small-2-24B-textonly"  # see notes
adapter_path    = "adamjen/Devstral-Small-2-24B-Opus-Reasoning"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = base_model_path,
    max_seq_length = 2048,
    dtype          = torch.bfloat16,
    load_in_4bit   = True,
)
model = PeftModel.from_pretrained(model, adapter_path)

messages = [{"role": "user", "content": "Write a binary search in Python."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Chat Template

This model uses Mistral's [INST]...[/INST] format. The model will produce a <think>...</think> block before its response.

[INST] Your question here [/INST]<think>
... reasoning ...
</think>

... answer ...
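When only the final answer is wanted, the leading reasoning block can be stripped from a completion. A minimal sketch; `strip_think` is a hypothetical helper, not part of the model's tooling.

```python
import re

def strip_think(output):
    """Remove a leading <think>...</think> block from a completion,
    returning only the answer text that follows it."""
    return re.sub(r"^\s*<think>.*?</think>\s*", "", output, flags=re.DOTALL)
```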

Technical Notes: The Devstral Extraction Problem

Devstral-Small-2-24B ships as a Mistral3ForConditionalGeneration (VLM) with a Pixtral vision encoder. Training it as a text-only model on a single 24GB GPU hits several problems:

  • FP8 weights: The official instruct release uses FP8 quantisation, which requires compute capability ≥ 8.9. RTX 3090 is 8.6 — incompatible. Requires dequantising to BF16 first.
  • Vision encoder VRAM: The Pixtral encoder consumes ~4GB VRAM, leaving insufficient headroom for 4-bit QLoRA + gradients.
  • Device map splitting: With a VLM loaded via device_map="auto", accelerate splits layers across GPU/CPU, breaking distributed training mode.
  • transformers 5.x concurrent loader: The async tensor loader materialises all BF16 tensors simultaneously before quantisation → OOM. Fix: HF_DEACTIVATE_ASYNC_LOAD=1.

Solution: Extract the Ministral3ForCausalLM language layers into a standalone text-only model directory (stripping vision_tower.* and multi_modal_projector.*, and renaming language_model.model.* → model.*). This produces a clean 23B causal LM loadable by FastLanguageModel.

Full write-up with all fixes: Fine-tuning Devstral on an RTX 3090
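The extraction described above amounts to filtering and renaming state-dict keys. A minimal sketch under assumptions: `extract_text_only` is illustrative, and a real conversion also needs a matching text-only config.json, which is not shown here.

```python
def extract_text_only(state_dict):
    """Drop the Pixtral vision weights and rename language-model keys
    so the checkpoint loads as a plain causal LM."""
    out = {}
    for key, tensor in state_dict.items():
        if key.startswith(("vision_tower.", "multi_modal_projector.")):
            continue  # strip the vision encoder and projector
        # language_model.model.* -> model.* (and language_model.lm_head
        # -> lm_head): remove the leading "language_model." prefix.
        out[key.replace("language_model.", "", 1)] = tensor
    return out
```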

Hardware Requirements

| Format | Min VRAM | |---|---| | Q4_K_M GGUF | ~16GB | | Q5_K_M GGUF | ~18GB | | LoRA inference (4-bit) | ~20GB | | LoRA training (QLoRA) | 24GB |

Limitations

  • Trained on 2,322 samples — a small dataset. Performance gains on reasoning are real but limited in breadth.
  • Max sequence length 2048 tokens (training constraint). Longer contexts may degrade quality.
  • The <think> block reasoning style is inherited from Claude Opus traces — the model may produce verbose reasoning.
  • Not evaluated on formal benchmarks.

Author

Adam Jenner — adamjenner.com.au

Author: adamjen

Likes: 2

Downloads: 0

Tags: safetensors, gguf, mistral, ministral3, code, reasoning, lora, unsloth, knowledge-distillation, text-generation, conversational, en, dataset:nohurry/Opus-4.6-Reasoning-3000x-filtered, base_model:mistralai/Devstral-Small-2-24B-Instruct-2512, base_model:adapter:mistralai/Devstral-Small-2-24B-Instruct-2512, license:apache-2.0, endpoints_compatible, region:us