Today's AI Summary

AI Developments: 3D Scene Generation, Tool-Integrated Reasoning, and More

Here's a look at some of the most interesting AI developments from today, focusing on new models and research papers.

Research Highlights

  • SceneGen: Single-Image 3D Scene Generation: A new framework, SceneGen, generates multiple 3D assets from a single scene image and object masks in one feedforward pass. It uses a novel feature aggregation module to integrate local and global scene information. The paper demonstrates its extensibility to multi-image inputs and shows robust generation abilities.
  • LiveMCP-101: Stress Testing MCP-enabled Agents: This paper introduces LiveMCP-101, a benchmark of 101 real-world queries designed to evaluate how well AI agents solve multi-step tasks using multiple Model Context Protocol (MCP) tools. The benchmark uses ground-truth execution plans for evaluation and reveals challenges in tool orchestration, even for advanced LLMs.
  • Dissecting Tool-Integrated Reasoning: This research introduces ReasonZoo, a benchmark to evaluate the effectiveness of Tool-Integrated Reasoning (TIR) across various domains. The study demonstrates that TIR-enabled models outperform non-TIR models on both mathematical and non-mathematical tasks, and that TIR enhances reasoning efficiency.
  • End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning: The paper introduces Deep-DxSearch, an agentic RAG system trained end-to-end with reinforcement learning for medical diagnosis. The system uses a large-scale medical retrieval corpus and tailored rewards to improve diagnostic accuracy and reasoning traceability.

Model Spotlight

  • Seed-OSS-36B-Instruct-GGUF by unsloth: This model, based on ByteDance-Seed/Seed-OSS-36B-Instruct, is designed for long-context, reasoning, and agent capabilities. It supports flexible control of thinking budget and is optimized for international use cases. The model achieves strong performance on benchmarks, including open-source state-of-the-art results in some areas. It was trained with 12T tokens and has a context length of 512K.
  • STELLA-VLM-JoVE-7B by Zaixi: A vision-language model fine-tuned from NVIDIA's Cosmos-Reason1-7B on laboratory protocol videos from JoVE. It extracts protocols, analyzes images, detects errors, assesses safety, and identifies equipment.

Key Takeaways

  • Advancements in 3D Content Generation: SceneGen offers a novel approach to generating high-quality 3D content from single images, potentially impacting VR/AR and embodied AI applications.
  • Importance of Benchmarking Tool Use: LiveMCP-101 highlights the challenges in tool orchestration for AI agents, emphasizing the need for rigorous evaluation in realistic scenarios.
  • Tool-Integrated Reasoning Enhances LLMs: The ReasonZoo benchmark demonstrates that TIR improves the reasoning abilities and efficiency of LLMs across diverse domains.
  • Agentic RAG Systems for Medical Diagnosis: Deep-DxSearch showcases the potential of end-to-end trained agentic RAG systems to improve medical diagnosis accuracy and traceability.
  • New Open-Source Models: Seed-OSS-36B-Instruct-GGUF and STELLA-VLM-JoVE-7B offer new capabilities in language understanding, reasoning, and vision-language tasks, with strong benchmark performance and specialized applications.

AI Papers for 2026-04-10

Toward a Tractability Frontier for Exact Relevance Certification

Exact relevance certification asks which coordinates are necessary to determine the optimal action in a coordinate-structured decision problem. The tractable families treated here admit a finite primitive basis, but optimizer-quotient realizability is maximal, so quotient shape alone cannot characterize the frontier. We prove a meta-impossibility theorem for efficiently checkable structural predicates invariant under the theorem-forced closure laws of exact certification. Structural convergence with zero-distortion summaries, quotient entropy bounds, and support-counting arguments explains why those closure laws are canonical. We establish the theorem by constructing same-orbit disagreements for four obstruction families, namely dominant-pair concentration, margin masking, ghost-action concentration, and additive/statewise offset concentration, using action-independent, pair-targeted affine witnesses. Consequently no correct tractability classifier on a closure-closed domain yields an exact characterization over these families. Here closure-orbit agreement is forced by correctness rather than assumed as an invariance axiom. The result therefore applies to correct classifiers on closure-closed domains, not only to classifiers presented through a designated admissibility package.

MoRight: Motion Control Done Right

Generating motion-controlled videos--where user-specified actions drive physically plausible scene dynamics under freely chosen viewpoints--demands two capabilities: (1) disentangled motion control, allowing users to separately control the object motion and adjust camera viewpoint; and (2) motion causality, ensuring that user-driven actions trigger coherent reactions from other objects rather than merely displacing pixels. Existing methods fall short on both fronts: they entangle camera and object motion into a single tracking signal and treat motion as kinematic displacement without modeling causal relationships between object motion. We introduce MoRight, a unified framework that addresses both limitations through disentangled motion modeling. Object motion is specified in a canonical static-view and transferred to an arbitrary target camera viewpoint via temporal cross-view attention, enabling disentangled camera and object control. We further decompose motion into active (user-driven) and passive (consequence) components, training the model to learn motion causality from data. At inference, users can either supply active motion and MoRight predicts consequences (forward reasoning), or specify desired passive outcomes and MoRight recovers plausible driving actions (inverse reasoning), all while freely adjusting the camera viewpoint. Experiments on three benchmarks demonstrate state-of-the-art performance in generation quality, motion controllability, and interaction awareness.

RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild

Scaling up robot learning will likely require human data containing rich and long-horizon interactions in the wild. Existing approaches for collecting such data trade off portability, robustness to occlusion, and global consistency. We introduce RoSHI, a hybrid wearable that fuses low-cost sparse IMUs with the Project Aria glasses to estimate the full 3D pose and body shape of the wearer in a metric global coordinate frame from egocentric perception. This system is motivated by the complementarity of the two sensors: IMUs provide robustness to occlusions and high-speed motions, while egocentric SLAM anchors long-horizon motion and stabilizes upper body pose. We collect a dataset of agile activities to evaluate RoSHI. On this dataset, we generally outperform other egocentric baselines and perform comparably to a state-of-the-art exocentric baseline (SAM3D). Finally, we demonstrate that the motion data recorded from our system are suitable for real-world humanoid policy learning. For videos, data and more, visit the project webpage: https://roshi-mocap.github.io/

Syntax Is Easy, Semantics Is Hard: Evaluating LLMs for LTL Translation

Propositional Linear Temporal Logic (LTL) is a popular formalism for specifying desirable requirements and security and privacy policies for software, networks, and systems. Yet expressing such requirements and policies in LTL remains challenging because of its intricate semantics. Since many security and privacy analysis tools require LTL formulas as input, this difficulty places them out of reach for many developers and analysts. Large Language Models (LLMs) could broaden access to such tools by translating natural language fragments into LTL formulas. This paper evaluates that premise by assessing how effectively several representative LLMs translate assertive English sentences into LTL formulas. Using both human-generated and synthetic ground-truth data, we evaluate effectiveness along syntactic and semantic dimensions. The results reveal three findings: (1) in line with prior findings, LLMs perform better on syntactic aspects of LTL than on semantic ones; (2) they generally benefit from more detailed prompts; and (3) reformulating the task as a Python code-completion problem substantially improves overall performance. We also discuss challenges in conducting a fair evaluation on this task and conclude with recommendations for future work.
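The flavor of the translation task can be shown with a toy, finite-trace checker for one common pattern. The sentence and formula below are illustrative examples of the task, not items from the paper's dataset, and finite-trace evaluation is only an approximation of full LTL semantics.

```python
# Toy trace checker for the illustrative pattern G (request -> F response),
# i.e. "every request is eventually followed by a response".

def holds_g_implies_f(trace, antecedent, consequent):
    """Check G (antecedent -> F consequent) over a finite trace of state sets."""
    for i, state in enumerate(trace):
        if antecedent in state:
            # From this position onward, the consequent must occur somewhere.
            if not any(consequent in later for later in trace[i:]):
                return False
    return True

good = [{"request"}, set(), {"response"}]   # the request is answered later
bad = [{"request"}, set()]                  # the request is never answered
```

An LLM translating the English sentence correctly would need to produce both the right operators (syntax) and the right nesting (semantics), which is exactly the split the paper evaluates.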

Evaluating In-Context Translation with Synchronous Context-Free Grammar Transduction

Low-resource languages pose a challenge for machine translation with large language models (LLMs), which require large amounts of training data. One potential way to circumvent this data dependence is to rely on LLMs' ability to use in-context descriptions of languages, like textbooks and dictionaries. To do so, LLMs must be able to infer the link between the languages' grammatical descriptions and the sentences in question. Here we isolate this skill using a formal analogue of the task: string transduction based on a formal grammar provided in-context. We construct synchronous context-free grammars which define pairs of formal languages designed to model particular aspects of natural language grammar, morphology, and written representation. Using these grammars, we measure how well LLMs can translate sentences from one formal language into another when given both the grammar and the source-language sentence. We vary the size of the grammar, the lengths of the sentences, the syntactic and morphological properties of the languages, and their written script. We note three key findings. First, LLMs' translation accuracy decreases markedly as a function of grammar size and sentence length. Second, differences in morphology and written representation between the source and target languages can strongly diminish model performance. Third, we examine the types of errors committed by models and find they are most prone to recall the wrong words from the target language vocabulary, hallucinate new words, or leave source-language words untranslated.
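A minimal sketch of what synchronous CFG transduction means, using an invented two-word grammar (not one of the paper's constructed grammars). Each rule pairs a source expansion with a target expansion over the same nonterminals; the S rule swaps constituent order to model a word-order difference.

```python
# Toy synchronous CFG. Nonterminals expand in lockstep on both sides;
# the target side of S reverses the order of the two constituents.
RULES = {
    "S":  (["NP", "VP"], ["VP", "NP"]),
    "NP": (["dog"], ["chien"]),
    "VP": (["runs"], ["court"]),
}

def transduce(symbol):
    """Expand a symbol into a (source, target) pair of token lists."""
    if symbol not in RULES:            # terminal: appears on its own side only
        return [symbol], [symbol]
    src_rhs, tgt_rhs = RULES[symbol]
    src = [tok for s in src_rhs for tok in transduce(s)[0]]
    tgt = [tok for s in tgt_rhs for tok in transduce(s)[1]]
    return src, tgt

src, tgt = transduce("S")  # (['dog', 'runs'], ['court', 'chien'])
```

Given the grammar and the source sentence in-context, an LLM's job in the paper's setting is to recover the paired target string, a task that gets harder as the rule set and sentence length grow.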

Chatbot-Based Assessment of Code Understanding in Automated Programming Assessment Systems

Large Language Models (LLMs) challenge conventional automated programming assessment because students can now produce functionally correct code without demonstrating corresponding understanding. This paper makes two contributions. First, it reports a saturation-based scoping review of conversational assessment approaches in programming education. The review identifies three dominant architectural families: rule-based or template-driven systems, LLM-based systems, and hybrid systems. Across the literature, conversational agents appear promising for scalable feedback and deeper probing of code understanding, but important limitations remain around hallucinations, over-reliance, privacy, integrity, and deployment constraints. Second, the paper synthesizes these findings into a Hybrid Socratic Framework for integrating conversational verification into Automated Programming Assessment Systems (APASs). The framework combines deterministic code analysis with a dual-agent conversational layer, knowledge tracking, scaffolded questioning, and guardrails that tie prompts to runtime facts. The paper also discusses practical safeguards against LLM-generated explanations, including proctored deployment modes, randomized trace questions, stepwise reasoning tied to concrete execution states, and local-model deployment options for privacy-sensitive settings. Rather than replacing conventional testing, the framework is intended as a complementary layer for verifying whether students understand the code they submit.

Region-Graph Optimal Transport Routing for Mixture-of-Experts Whole-Slide Image Classification

Multiple Instance Learning (MIL) is the dominant framework for gigapixel whole-slide image (WSI) classification in computational pathology. However, current MIL aggregators route all instances through a shared pathway, constraining their capacity to specialise across the pathological heterogeneity inherent in each slide. Mixture-of-Experts (MoE) methods offer a natural remedy by partitioning instances across specialised expert subnetworks; yet unconstrained softmax routing may yield highly imbalanced utilisation, where one or a few experts absorb most routing mass, collapsing the mixture back to a near-single-pathway solution. To address these limitations, we propose ROAM (Region-graph OptimAl-transport Mixture-of-experts), a spatially aware MoE-MIL aggregator that routes region tokens to expert poolers via capacity-constrained entropic optimal transport, promoting balanced expert utilisation by construction. ROAM operates on spatial region tokens, obtained by compressing dense patch bags into spatially binned units that align routing with local tissue neighbourhoods and introduces two key mechanisms: (i) region-to-expert assignment formulated as entropic optimal transport (Sinkhorn) with explicit per-slide capacity marginals, enforcing balanced expert utilisation without auxiliary load-balancing losses; and (ii) graph-regularised Sinkhorn iterations that diffuse routing assignments over the spatial region graph, encouraging neighbouring regions to coherently route to the same experts. Evaluated on four WSI benchmarks with frozen foundation-model patch embeddings, ROAM achieves performance competitive against strong MIL and MoE baselines, and on NSCLC generalisation (TCGA-CPTAC) reaches external AUC 0.845 ± 0.019.
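The routing primitive here, entropic optimal transport with prescribed marginals, can be sketched in a few lines of numpy. This is textbook Sinkhorn scaling, not ROAM's graph-regularised variant, and all shapes, values, and the entropic temperature are illustrative.

```python
import numpy as np

def sinkhorn(cost, row_marginal, col_marginal, eps=0.5, iters=200):
    """Entropic OT: alternately rescale a Gibbs kernel until the plan's
    row/column sums match the prescribed marginals (token mass, expert capacity)."""
    K = np.exp(-cost / eps)                  # Gibbs kernel
    u = np.ones(cost.shape[0])
    v = np.ones(cost.shape[1])
    for _ in range(iters):
        u = row_marginal / (K @ v)
        v = col_marginal / (K.T @ u)
    return u[:, None] * K * v[None, :]       # transport plan

rng = np.random.default_rng(0)
cost = rng.random((6, 3))                    # 6 region tokens, 3 experts
plan = sinkhorn(cost, np.full(6, 1 / 6), np.full(3, 1 / 3))
# Each expert column sums to ~1/3 of the routing mass: balance by construction,
# which is exactly what softmax routing does not guarantee.
```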

CADENCE: Context-Adaptive Depth Estimation for Navigation and Computational Efficiency

Autonomous vehicles deployed in remote environments typically rely on embedded processors, compact batteries, and lightweight sensors. These hardware limitations conflict with the need to derive robust representations of the environment, which often requires executing computationally intensive deep neural networks for perception. To address this challenge, we present CADENCE, an adaptive system that dynamically scales the computational complexity of a slimmable monocular depth estimation network in response to navigation needs and environmental context. By closing the loop between perception fidelity and actuation requirements, CADENCE ensures high-precision computing is only used when mission-critical. We conduct evaluations on our released open-source testbed that integrates Microsoft AirSim with an NVIDIA Jetson Orin Nano. As compared to a state-of-the-art static approach, CADENCE decreases sensor acquisitions, power consumption, and inference latency by 9.67%, 16.1%, and 74.8%, respectively. The results demonstrate an overall reduction in energy expenditure by 75.0%, along with an increase in navigation accuracy by 7.43%.

Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions

Online reinforcement learning (RL) serves as an effective method for enhancing the capabilities of Android agents. However, guiding agents to learn through online interaction is prohibitively expensive due to the high latency of emulators and the sample inefficiency of existing RL algorithms. We identify a fundamental limitation in current approaches: the Single State Single Action paradigm, which updates the policy with one-to-one state-action pairs from online one-way rollouts without fully exploring each costly emulator state. In this paper, we propose Android Coach, a novel framework that shifts the training paradigm to Single State Multiple Actions, allowing the agent to sample and utilize multiple actions for a single online state. We enable this without additional emulator overhead by learning a critic that estimates action values. To ensure the critic serves as a reliable coach, we integrate a process reward model and introduce a group-wise advantage estimator based on the averaged critic outputs. Extensive experiments demonstrate the effectiveness and efficiency of Android Coach: it achieves 7.5% and 8.3% success rate improvements on AndroidLab and AndroidWorld over UI-TARS-1.5-7B, and attains 1.4x higher training efficiency than Single State Single Action methods PPO and GRPO at matched success rates.
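The group-wise advantage idea can be sketched as follows. This is a simplified illustration of the abstract's description (critic outputs averaged as a baseline for multiple actions at one state), not the paper's exact estimator, and the values are invented.

```python
import numpy as np

def group_advantages(critic_q_values):
    """Advantage of each action sampled at one emulator state, relative to
    the group baseline (mean of the critic's action-value estimates)."""
    q = np.asarray(critic_q_values, dtype=float)
    return q - q.mean()

# Four actions sampled at a single (costly) emulator state:
adv = group_advantages([0.9, 0.2, 0.6, 0.3])  # roughly [0.4, -0.3, 0.1, -0.2]
```

Because the critic supplies values for extra actions, the policy gets several learning signals per emulator step instead of one, which is the claimed efficiency gain over one-to-one state-action rollouts.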

Making Room for AI: Multi-GPU Molecular Dynamics with Deep Potentials in GROMACS

GROMACS is a de-facto standard for classical Molecular Dynamics (MD). The rise of AI-driven interatomic potentials that pursue near-quantum accuracy at MD throughput now poses a significant challenge: embedding neural-network inference into multi-GPU simulations while retaining high performance. In this work, we integrate the MLIP framework DeePMD-kit into GROMACS, enabling domain-decomposed, GPU-accelerated inference across multi-node systems. We extend the GROMACS NNPot interface with a DeePMD backend, and we introduce a domain decomposition layer decoupled from the main simulation. The inference is executed concurrently on all processes, with two MPI collectives used each step to broadcast coordinates and to aggregate and redistribute forces. We train an in-house DPA-1 model (1.6 M parameters) on a dataset of solvated protein fragments. We validate the implementation on a small protein system, then we benchmark the GROMACS-DeePMD integration with a 15,668-atom protein on NVIDIA A100 and AMD MI250x GPUs up to 32 devices. Strong-scaling efficiency reaches 66% at 16 devices and 40% at 32; weak-scaling efficiency is 80% up to 16 devices and reaches 48% (MI250x) and 40% (A100) at 32 devices. Profiling with the ROCm System profiler shows that >90% of the wall time is spent in DeePMD inference, while MPI collectives contribute <10%, primarily because they act as a global synchronization point. The principal bottlenecks are the irreducible ghost-atom cost set by the cutoff radius, confirmed by a simple throughput model, and load imbalance across ranks. These results demonstrate that production MD with near ab initio fidelity is feasible at scale in GROMACS.

AI Models

Jackrong/Gemopus-4-26B-A4B-it-GGUF


language: en, zh, ko, ja
license: apache-2.0
base_model: google/gemma4-26B-it
tags: gemma, gemma4, instruction-tuned, reasoning, alignment
pipeline_tag: text-generation

🌟 Gemopus-4-26B-A4B-it

[!NOTE] Gemopus is an attempt at fine-tuning Gemma 4 with a core philosophy of "stability first".

While preserving the original reasoning order of Gemma 4 as much as possible, we conducted targeted refinements for answer quality, structure, clarity, and consistency.

🍎 Therefore, my fine-tuning strategy did not follow other teams in aggressive direct distillation from Claude; instead, I opted for a more conservative and controllable path.

🎯 Development Motivation & Industry Insights

Gemopus-4-26B-A4B-it is a supervised fine-tune of the Gemma 4 26B Instruct model.

  • Although this model has "Opus" in its name, it is more of a continuation of the naming convention.
  • There is no need to over-interpret or superstitiously replicate the "Claude-style chain of thought (CoT)" found in public distillation corpora. Judging from the currently available distilled datasets, reasoning text does not necessarily reflect the teacher model's true, faithful, and transferable internal reasoning process; simple observation suggests it is often closer to a summary of the thinking process than to genuinely logically connected reasoning. A series of recent studies has found that models can exhibit post-hoc rationalization in natural scenarios without explicit induction, that is, forming an answer bias first and then producing a plausible explanation. Other research has found that CoT faithfulness varies greatly across model families, and that the impact of training methods on faithfulness is often more significant than model scale. In other words, text that "looks like reasoning" is not necessarily a high-quality, transferable supervision signal for reasoning.

[!IMPORTANT] A 2026 self-distillation paper found that while self-distillation can often shorten reasoning traces and improve some in-domain performance, it may also lead to performance degradation in mathematical reasoning. This degradation is correlated with the suppression of uncertainty expressions. The authors reported that on some models and settings, out-of-domain (OOD) performance degradation can be significant. The implication of this result is that reasoning text that appears on the surface to be shorter, cleaner, and more "good at solving problems" does not mean that the student has truly acquired more robust reasoning capabilities.



⚠️ Limitations & Growing Pains of Radical CoT Distillation

When only Supervised Fine-Tuning (SFT) is available, without subsequent Reinforcement Learning (RL) or process supervision, forcefully feeding the student model with potentially non-faithful long reasoning traces together with the final answers introduces substantial hidden risks.

The core issue is that the student model may fail to reliably align the token-level rationale sequence with the latent computational process that actually supports the final decision. As a result, training can easily degenerate into imitation of the surface form of reasoning text, rather than genuine internalization of the underlying reasoning mechanism.

In this setting, such long-form reasoning data may improve stylistic resemblance, but not necessarily reasoning fidelity.

By contrast, compared with the Qwen series, Gemma 4 already exhibits a more structured, orderly, and disciplined reasoning-chat style. It is also significantly less prone to “overthinking” or producing excessively long, uncontrolled reasoning chains. For that reason, there is little value in aggressively reshaping its reasoning style during the SFT stage. Doing so risks disrupting Gemma 4’s native and already strong reasoning rhythm, while offering limited upside in the absence of later-stage alignment methods.


💡 Model Features & Alignment Optimization

Based on the methodological deduction above, I chose to focus my optimization efforts on the lower-risk, more reliably rewarding dimensions of final answer quality and interactive experience:

  • ⚖️ Overall Style Consistency: Eliminated the stiff "machine translation tone" and redundant preaching feel inherent in the base model, making conversations more natural, clear, and organized.
  • 📐 Structural & Completeness Enhancements: Significantly optimized the organizational structure of long responses. The model can more proficiently use Markdown syntax (e.g., lists, bolding) for hierarchical structuring and noise reduction, ensuring key points stand out visually and improving the reading experience.
  • 🎓 Expressive Rigor & Depth of Explanation: In technical and popular science responses, enhanced the rigor of professional terminology and the ability to explain complex concepts simply, while avoiding mechanical, encyclopedia-like recitation.

📊 Evaluation Benchmarks (TBD)

⏳ Thanks to Kyle Hessling for independently running these benchmarks and sharing the results!



🛠️ Best Practices

For the best performance, use these configurations and best practices:

1. Sampling Parameters

Use the following standardized sampling configuration across all use cases:

  • temperature=1.0
  • top_p=0.95
  • top_k=64
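To see what these parameters do, here is a toy implementation of temperature, top-k, and top-p filtering over raw logits. It mirrors common sampler semantics (as in llama.cpp or Transformers) but is not this model's inference code, and the five-token vocabulary is invented.

```python
import numpy as np

def filter_logits(logits, temperature=1.0, top_k=64, top_p=0.95):
    """Apply temperature scaling, then top-k, then top-p (nucleus) filtering.
    Returns logits with excluded tokens set to -inf."""
    logits = np.asarray(logits, dtype=float) / temperature
    # Top-k: keep only the k highest logits.
    if top_k < len(logits):
        kth_value = np.sort(logits)[-top_k]
        logits = np.where(logits < kth_value, -np.inf, logits)
    # Top-p: keep the smallest prefix of tokens (by probability) whose
    # cumulative mass reaches top_p.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    filtered = np.full_like(logits, -np.inf)
    filtered[order[:cutoff]] = logits[order[:cutoff]]
    return filtered

# Five-token toy vocabulary; with top_k=3 the two weakest tokens are dropped,
# and the surviving three jointly carry at least top_p of the mass.
out = filter_logits([2.0, 1.0, 0.5, -1.0, -3.0], top_k=3, top_p=0.95)
```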

2. Thinking Mode Configuration

Compared to Gemma 3, the models use standard system, assistant, and user roles. To properly manage the thinking process, use the following control tokens:

  • Trigger Thinking: Thinking is enabled by including the `<|think|>` token at the start of the system prompt. To disable thinking, remove the token.
  • Standard Generation: When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure:
    `<|channel>thought\n [Internal reasoning] <channel|>`
  • Disabled Thinking Behavior: For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block:
    `<|channel>thought\n<channel|> [Final answer]`
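A minimal sketch of toggling thinking mode by editing the system prompt, assuming the `<|think|>` token behaves as described above; the helper function is hypothetical, and in practice a chat-template-aware library would assemble the final prompt.

```python
# Hypothetical helper: toggles thinking mode by prepending the trigger token
# to the system prompt, per the control-token description above.
THINK_TOKEN = "<|think|>"

def build_system_prompt(base_prompt: str, thinking: bool) -> str:
    """Return the system prompt with or without the thinking trigger."""
    return f"{THINK_TOKEN} {base_prompt}" if thinking else base_prompt

on = build_system_prompt("You are a helpful assistant.", thinking=True)
off = build_system_prompt("You are a helpful assistant.", thinking=False)
```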

[!NOTE] Note that many libraries like Transformers and llama.cpp handle the complexities of the chat template for you.


📚 Resources & Guides

🚧 The complete fine-tuning code and related notebooks for this model will be published soon; please stay tuned!

👉 GitHub Repository: Jackrong-llm-finetuning-guide
Visit this repository for a deeper look at the codebase and to reproduce the training results locally or on Colab.

📥 Core Technical Documentation

🔗 Qwopus3.5-27b Complete Fine-Tuning Guide (PDF)

  • Complete Pipeline: A step-by-step operational guide covering the entire process: downloading the base model, fusing heterogeneous data, configuring training hyperparameters, and finally releasing the result to Hugging Face.
  • Beginner Friendly: Includes basic starter tutorials for Google Colab and Unsloth.

No one starts out as an expert, but all experts bravely took the first step.

All training and testing for this project are self-funded. If you find this model or guide helpful, giving a Star ⭐️ on GitHub is the greatest encouragement to me. 🙏


🗺️ Training Pipeline

Base Model (google/gemma4-26B-it)
 │
 ▼
Targeted Supervised Fine-Tuning (SFT) 
(Focus on Answer Quality & Structural Alignment, Retaining Restrained CoT)
 │
 ▼
Gemopus-4-26B-A4B-it

📚 Dataset Construction & Philosophy

The training data curates well-structured, highly coherent instruction pairs from the open-source community, alongside natural multi-turn conversations. The goal is to guide the model to learn more mature ways of organizing and presenting conclusions, rather than mechanically imitating "fake chain of thought" without internalized logic.


⚠️ Known Issues & Ecosystem Compatibility Statement

  • Tool Calling Compatibility: The Gemma 4 series models still have known compatibility issues with tool calling functionality in local inference ecosystems like llama.cpp / LM Studio (including call failures, format mismatches, continuous loops, etc.). This has been widely reported in the community and is not unique to this model. If your workflow heavily relies on tool calling, it is recommended to thoroughly test it before official use, or temporarily consider solutions with more mature ecosystem support.

  • Regarding Fine-Tuning Characteristics of the Gemma Architecture: From an engineering practice perspective, the Gemma series does exhibit different training dynamics compared to the Qwen series during fine-tuning—including wider loss curve fluctuations and greater sensitivity of gradient stability to hyperparameters. This may be related to Google's model architecture design. Furthermore, the base Gemma 4 model objectively still has a gap compared to the Qwen 3.5 series in certain dimensions of its raw capabilities. We believe that truthfully stating these observations is more beneficial to the technical judgment of the community than selectively avoiding them.

  • Project Positioning: The core value of Gemopus-4-26B-A4B-it lies in providing an engineering exploration reference supported by methodology for SFT fine-tuning under the Gemma 4 architecture, rather than a fully production-ready solution. If you are looking for a productivity model that has undergone more iterative validation and offers more stable ecosystem compatibility, I recommend looking at the Qwopus-3.5-v3 series—its performance after fine-tuning is much more robust.


🍎 Limitations & Usage Recommendations

  • Boundaries of Computation & Knowledge: Constrained by parameter size, the breadth of its world knowledge and depth of its mathematical and logical reasoning capabilities are still not entirely equivalent to those of frontier models with hundreds of billions of parameters in the cloud (such as GPT-4 or Claude 3.5 Sonnet).
  • Potential Hallucinations: When dealing with highly specialized domains, obscure knowledge, or complex high-level math problems requiring multi-step, long-chain calculations, logical drift or hallucinations remain possible.
  • Best Practices: It is strongly recommended as a high-quality local assistant for text processing and everyday reasoning, particularly in scenarios demanding high response quality and tight structural organization, such as structured summarization, everyday copywriting, and interactive coding.
  • Disclaimer: This is an experimental, independently optimized weight that emphasizes "stability and methodology" in local interactions. You are welcome to run local deployment tests and to share academic discussion.

🙏 Acknowledgements

Special thanks to the developers in the open-source community for building such a thriving ecosystem. Thank you to the Unsloth team for providing excellent and highly efficient LLM fine-tuning support, and sincere respect to the Google team for open-sourcing the outstanding Gemma 4 base model. Finally, thanks to all the researchers who have contributed profound insights into CoT Faithfulness and the interpretability of LLM reasoning. It is exactly these rigorous frontier academic discussions that deeply inspired the core fine-tuning methodology of this project.

Author: Jackrong

Likes: 17

Downloads: 0

Tags: gguf, gemma4, gemma, instruction-tuned, reasoning, alignment, text-generation, en, zh, ko, ja, license:apache-2.0, endpoints_compatible, region:us, conversational

happyhorseai/happy-horse-ai-video-generator


license: apache-2.0
pipeline_tag: text-to-video
tags: happy horse-1.0, ai-video-generator, text-to-video, image-to-video, multimodal-ai, video-generation, video-arena, artificial-analysis

Happy Horse-1.0

Project site: https://tryhappyhorse.com

The Open Video Model That Reached #1 on Artificial Analysis

As of April 9, 2026, HappyHorse-1.0 sits at the top of the Artificial Analysis Video Arena leaderboard in the most watched blind-comparison categories for generative video: #1 in Text-to-Video (No Audio) and #1 in Image-to-Video (No Audio). This matters because Artificial Analysis is not a self-reported benchmark. It is a user-preference arena built on blind comparisons, where models are ranked by Elo based on which outputs people actually prefer.

That distinction changes the entire story.

HappyHorse-1.0 is not compelling because it publishes another polished model card. It is compelling because it appears to win where users actually vote: visual quality, prompt faithfulness, motion plausibility, and overall preference under head-to-head evaluation. In a category crowded with closed-source incumbents, API-only products, and heavily optimized proprietary systems, that result immediately makes HappyHorse-1.0 a model worth serious technical attention.

Why the #1 Ranking Matters

Most model announcements overfit to internal metrics. They optimize for selective demos, cherry-picked prompts, isolated capability claims, or lab-specific benchmarks that do not fully reflect what end users care about. Video generation is especially vulnerable to this problem. A model can look impressive in one curated clip and still fail in routine use because of motion collapse, prompt drift, weak physical realism, or unstable temporal consistency.

Artificial Analysis is valuable because it evaluates models through comparative human preference. That makes the leaderboard much closer to a real-world quality signal than a synthetic scorecard. In the Text-to-Video (No Audio) leaderboard, Artificial Analysis currently lists HappyHorse-1.0 at #1 with an Elo of 1383. In the Image-to-Video (No Audio) leaderboard FAQ, Artificial Analysis states that HappyHorse-1.0 also leads that category with an Elo of 1413.
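Elo ranking from blind pairwise votes works roughly as follows. This is the standard Elo update, not necessarily Artificial Analysis's exact formula, and the opponent's rating here is invented for illustration.

```python
def elo_update(r_winner, r_loser, k=32):
    """Standard Elo: expected score from the rating gap, then a k-scaled shift."""
    expected = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1.0 - expected)
    return r_winner + delta, r_loser - delta

# A 1383-rated model beating a slightly weaker (hypothetical) 1350-rated rival
# gains a little under half of k, since the win was already somewhat expected.
new_winner, new_loser = elo_update(1383, 1350)
```

The key property for a leaderboard: beating strong opponents moves a rating more than beating weak ones, so a high Elo reflects sustained preference across many matchups rather than a few easy wins.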

For anyone building, deploying, or productizing generative video systems, that result says something important: HappyHorse-1.0 is not merely competitive. It is, at least at this moment in public blind preference evaluation, the model others must now catch.

What Had to Be True for HappyHorse-1.0 to Reach #1

A model does not reach the top of a blind video arena by being good at one thing. It has to perform well across several dimensions at the same time:

  • It must understand prompts well enough to preserve subject, scene intent, and stylistic instructions.
  • It must maintain enough temporal coherence that viewers do not reject the clip as unstable or synthetic.
  • It must render motion in a way that feels intentional rather than interpolated noise.
  • It must produce pleasing visual composition under many prompt types, not only a narrow subset.
  • It must avoid enough obvious failure modes that users prefer it repeatedly in pairwise comparisons.

That is what makes the ranking interesting from an implementation perspective. A result like this implies not one isolated breakthrough, but a stack of architectural and systems decisions that compound into user-visible quality.

The Public Technical Story Behind HappyHorse-1.0

Based on the public technical claims published on the Happy Horse website, HappyHorse-1.0 is described as a 15B-parameter unified Transformer built for joint video and audio generation, using a 40-layer self-attention architecture with modality-specific layers at the edges and shared layers in the middle. The same public description also claims:

  • joint video + synchronized audio generation
  • native multilingual lip-sync
  • an 8-step DMD-2 distilled inference path
  • 1080p output
  • approximately 38 seconds to generate a 5-second 1080p clip on H100
  • availability of a base model, distilled model, super-resolution module, and inference code

These details come from Happy Horse’s own public materials, not from Artificial Analysis. But if those claims are accurate, they help explain why the model could perform unusually well in preference-based ranking.
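The claimed 38 seconds for a 5-second 1080p clip implies a concrete per-frame budget. The arithmetic below assumes a 24 fps output, which is not stated in the public materials:

```python
clip_seconds = 5
gen_seconds = 38   # claimed wall-clock time on a single H100
fps = 24           # ASSUMED output frame rate (not stated publicly)

frames = clip_seconds * fps                    # 120 frames under this assumption
seconds_per_frame = gen_seconds / frames       # roughly 0.32 s per frame
realtime_factor = gen_seconds / clip_seconds   # 7.6x slower than real time

print(f"{frames} frames, {seconds_per_frame:.3f} s/frame, {realtime_factor:.1f}x real time")
```

A sub-10x real-time factor at 1080p is what makes interactive retry loops plausible, which connects directly to the inference-speed argument later in this piece.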

Implementation Hypothesis: Why This Architecture Could Win

The most important implementation idea is the move toward a unified multimodal generation stack instead of a fragmented pipeline.

Many video systems still behave like stitched workflows: one system interprets text, another synthesizes motion, another layer handles temporal smoothing, and audio is either ignored or added later. That fragmentation can work, but it often leaks. Users experience the leaks as prompt drift, scene incoherence, mismatch between motion and framing, or weak audiovisual coupling.

HappyHorse-1.0’s public positioning suggests a different philosophy: put text, video, and audio into a single modeling framework and optimize the whole generation problem jointly. If implemented well, that choice can produce several advantages:

1. Better cross-modal alignment

A unified model can learn tighter relationships between semantics, motion, and timing. Instead of treating audio or motion as downstream attachments, it can encode them as part of the same generative process.

2. Lower handoff loss

Every separate subsystem introduces a boundary where information can be compressed, distorted, or dropped. Single-stream or unified-token designs reduce those boundaries.

3. Stronger prompt fidelity

If scene planning, motion reasoning, and visual synthesis share the same representational path, prompt semantics may survive deeper into generation.

4. More coherent preference outcomes

Humans reward outputs that “feel whole.” Blind arena wins usually come from reducing multi-dimensional awkwardness, not from maximizing one isolated metric.

The Significance of a 40-Layer Unified Transformer

The public architecture description for HappyHorse-1.0 is unusually specific by product-page standards. It describes a 40-layer self-attention Transformer with 4 modality-specific layers on each side and 32 shared layers in the middle. If accurate, that implies a deliberate balance between modality specialization and shared reasoning.

This matters because pure sharing can blur modality differences, while excessive separation can prevent the model from learning unified temporal semantics. A layered design that allows early and late specialization with a large shared core is a reasonable way to balance:

  • text semantic parsing
  • motion representation
  • audiovisual correspondence
  • temporal global context
  • output coherence

The claim of single-stream processing with per-head gating is also notable. If implemented well, that could stabilize training and reduce interference between modalities without abandoning the benefits of a shared backbone.

In practical terms, this is the kind of design decision that could translate into higher human preference: fewer brittle transitions, more stable scenes, and stronger prompt retention across time.
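If the "4 modality-specific layers on each side, 32 shared layers in the middle" description is accurate, the layout can be sketched as pure layer bookkeeping. All names here are made up for illustration; this is not HappyHorse's actual code:

```python
# Hypothetical layout matching the published description:
# 4 modality-specific layers on each edge of a 32-layer shared core.
N_EDGE, N_SHARED = 4, 32

def build_stack():
    video_in  = [f"video_in_{i}"  for i in range(N_EDGE)]
    audio_in  = [f"audio_in_{i}"  for i in range(N_EDGE)]
    shared    = [f"shared_{i}"    for i in range(N_SHARED)]
    video_out = [f"video_out_{i}" for i in range(N_EDGE)]
    audio_out = [f"audio_out_{i}" for i in range(N_EDGE)]
    # Each modality traverses: own input layers -> shared core -> own output layers.
    video_path = video_in + shared + video_out
    audio_path = audio_in + shared + audio_out
    return video_path, audio_path

video_path, audio_path = build_stack()
assert len(video_path) == len(audio_path) == N_SHARED + 2 * N_EDGE  # 40 deep
```

The design intuition: the shared core (indices 4 through 35 in both paths) is where cross-modal reasoning can happen, while the thin edges give each modality room to specialize its input and output representations.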

Why Fast Inference Matters More Than It Sounds

The public materials also emphasize 8-step DMD-2 distillation and a runtime acceleration layer called MagiCompiler. The temptation is to treat this as a speed footnote. That would be a mistake.

In video generation, fast inference is not just a cost optimization. It changes how models are used and improved.

A slow model can be excellent in theory and still underperform in practice because:

  • users iterate less
  • teams test fewer prompt variants
  • pipelines become less interactive
  • product integrations become harder to justify economically

A model that reaches top-tier quality while reducing denoising to 8 steps creates a different deployment profile. It becomes easier to:

  • serve interactively
  • support multiple retries
  • integrate into creator tools
  • productize for broader use
  • close the loop between prompt edit and output inspection

That matters for product adoption, but it also matters indirectly for model quality perception. Users prefer systems they can steer. Speed is one of the hidden multipliers of steerability.

Why Ranking #1 in Both T2V and I2V Is Technically Significant

Being strong in only one category is easier. A model can be optimized aggressively for text-to-video prompt following or separately tuned for image-conditioned continuation. Leading in both suggests broader capability.

Text-to-video rewards:

  • semantic grounding
  • scene imagination
  • stylistic interpretation
  • narrative plausibility from sparse input

Image-to-video rewards:

  • visual identity preservation
  • motion extension from a fixed visual prior
  • consistency under stronger conditioning
  • temporal continuity without destroying the source frame’s structure

To lead both categories, a model likely needs to balance generative flexibility with conditioning discipline. That is much harder than optimizing for one side alone.

This is one reason the Artificial Analysis results are so useful as a story anchor. If HappyHorse-1.0 is truly leading both categories as of April 9, 2026, then the implementation is doing more than one trick well. It is likely succeeding at a broader systems problem: generating video that users repeatedly judge as better across different starting conditions.

Audio May Be the Next Frontier, Not the Current Headline

One subtle but important strategic point: the cleanest ranking story right now is No Audio, where the public leaderboard support is strongest. That is what should lead the page headline.

The public Happy Horse materials place heavy emphasis on synchronized audio, multilingual lip-sync, and end-to-end audiovisual generation. Those are technically compelling claims, and they may become major differentiators. But for headline credibility, the strongest independently verifiable statement today is the leaderboard leadership in the no-audio categories.

That gives you a better messaging hierarchy:

  1. Lead with verified #1 rankings in T2V and I2V on Artificial Analysis.
  2. Then explain that public model materials describe a broader multimodal system with joint audio-video generation.
  3. Treat the audio story as an implementation differentiator, not the first proof point.

That structure is stronger, cleaner, and more defensible.

Product Implications for Builders

If you are a builder rather than a benchmark watcher, the real question is simple: what does #1 actually unlock?

For teams and creators, a model like HappyHorse-1.0 potentially changes three things:

Faster experimentation

A model that is both high-ranking and inference-efficient can support more iteration loops per creative cycle.

Better prompt confidence

When preference-based rankings are high, users can trust that the model is not merely optimized for internal demos. It is winning broader subjective comparisons.

More viable productization

A model that combines quality, speed, and multimodal ambition is easier to embed in commercial workflows, hosted generation products, and repeat-use creative pipelines.

This is exactly why the ranking matters commercially. Arena leadership is not just prestige. It is a signal that the system may be mature enough to sit inside products people use repeatedly.

What This Means for the Open Video Ecosystem

If the public claims around HappyHorse-1.0 hold, the broader significance is larger than one model release.

An open or open-weight model reaching the top of a major human-preference leaderboard changes the center of gravity of the category. It suggests that frontier-quality video generation is no longer the exclusive domain of closed, API-gated systems. It also forces a new competitive question: can open multimodal video systems now move as fast as, or faster than, proprietary labs in quality-adjusted product usefulness?

That is why HappyHorse-1.0 deserves attention. It does not merely post a strong score. It pressures the assumptions of the entire market.

Sources and Verification Notes

The ranking claims above are based on publicly accessible Artificial Analysis pages reviewed on April 9, 2026:

  • Artificial Analysis Text to Video Leaderboard (No Audio): HappyHorse-1.0 listed at #1, Elo 1383
  • Artificial Analysis Image to Video Leaderboard FAQ: HappyHorse-1.0 listed at #1, Elo 1413

The implementation and architecture descriptions are based on public technical claims published by Happy Horse on its own website, including references to a 15B unified Transformer, 40 layers, DMD-2 8-step distillation, multilingual lip-sync, and 1080p generation.

Where this page discusses why those design choices may explain leaderboard leadership, that is an engineering inference based on the published descriptions and the observed benchmark outcomes.

References

  1. Artificial Analysis Text to Video Leaderboard: https://artificialanalysis.ai/embed/text-to-video-leaderboard/leaderboard/text-to-video
  2. Artificial Analysis Image to Video Leaderboard: https://artificialanalysis.ai/video/leaderboard/image-to-video
  3. Happy Horse public technical overview: https://tryhappyhorse.com

Try It

If you want to explore the product experience around AI video generation, visit https://tryhappyhorse.com.

Author: happyhorseai

Likes: 14

Downloads: 0

Tags: happy horse-1.0, ai-video-generator, text-to-video, image-to-video, multimodal-ai, video-generation, video-arena, artificial-analysis, license:apache-2.0, region:us

bytedance-research/Timer-S1


license: apache-2.0
metrics: mse, mae, mase, wql, crps
pipeline_tag: time-series-forecasting
datasets: thuml/UTSD, Salesforce/lotsa_data, Salesforce/GiftEvalPretrain, autogluon/chronos_datasets
tags: time series, time-series, forecasting, foundation models, pretrained models, time series foundation models
library_name: transformers

Timer-S1

Timer-S1 is a time series foundation model with 8.3B total parameters, 0.75B activated parameters per token, and a context length of 11,520.

The model supports zero-shot forecasting (predicting without dataset-specific training) at different quantile levels.

For more details, please refer to our technical report.


Architecture: Timer-S1 is a decoder-only Mixture-of-Experts (MoE) Transformer. For time series forecasting (a sequential problem where each step depends on previous ones), we propose TimeSTP, which enables multi-step prediction with serial computation.

Performance: Timer-S1 achieves state-of-the-art results on GIFT-Eval. The model excels particularly at medium-term and long-term forecasting tasks.


Post Training: Timer-S1 undergoes post-training, including continued pre-training (CPT) and long-context extension (LCE), which improves short-term and long-context performance.


Quickstart

pip install torch accelerate transformers~=4.57.1

import torch
from transformers import AutoModelForCausalLM

# load the pretrained model
# supports different lookback/forecast lengths
model = AutoModelForCausalLM.from_pretrained(
    'bytedance-research/Timer-S1',
    trust_remote_code=True,
    device_map="auto"
)

# use local model
# model = AutoModelForCausalLM.from_pretrained(
#     'path_to_timer_s1',
#     trust_remote_code=True,
#     device_map="auto"
# )

# prepare input
batch_size, lookback_length = 1, 11520 
seqs = torch.randn(batch_size, lookback_length).to(model.device)

# Note that Timer-S1 generates predictions at fixed quantile levels
forecast_length = 720

output = model.generate(seqs, max_new_tokens=forecast_length, revin=True)

# produce quantile forecasts in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(output.shape) # batch_size x quantile_num(9) x forecast_length

# produce the median forecast of the first sample
print(output[0][4])

Out of GPU memory? Try the following options:

# Option 1: reduce batch size or context length
batch_size, lookback_length = 1, 2880

# Option 2: disable KV cache at runtime (or edit it in config.json for a permanent change)
model.config.use_cache = False

Specification

  • Architecture: decoder-only Transformer
  • Context Length: up to 11,520
  • ReNorm: default=True
  • KV Cache: default=True
  • Patch Length: 16
  • Total Parameters: 8.3B
  • Activated Parameters: 0.75B
  • Number of Layers: 40

License Agreement

This model is licensed under the Apache-2.0 License.

Citation

If you find Timer-S1 helpful for your research, please cite our paper:

@article{liu2026timer,
  title={Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling},
  author={Liu, Yong and Su, Xingjian and Wang, Shiyu and Zhang, Haoran and Liu, Haixuan and Wang, Yuxuan and Ye, Zhou and Xiang, Yang and Wang, Jianmin and Long, Mingsheng},
  journal={arXiv preprint arXiv:2603.04791},
  year={2026}
}

Author: bytedance-research

Likes: 10

Downloads: 0

Tags: transformers, safetensors, Timer-S1, time series, time-series, forecasting, foundation models, pretrained models, time series foundation models, time-series-forecasting, custom_code, dataset:thuml/UTSD, dataset:Salesforce/lotsa_data, dataset:Salesforce/GiftEvalPretrain, dataset:autogluon/chronos_datasets, arxiv:2603.04791, license:apache-2.0, endpoints_compatible, region:us

oumoumad/LTX-2.3-22b-IC-LoRA-Outpaint


base_model: Lightricks/LTX-2.3
language: en
license: other
license_name: ltx-2-community-license
license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
pipeline_tag: video-to-video
tags: ltx-video, ic-lora, outpaint

LTX-2.3 22B IC-LoRA Outpaint

This is an Outpaint IC-LoRA trained on top of LTX-2.3-22b, designed to extend the canvas of an input video by generating new content in regions marked as pure black. You can use it to outpaint on the sides, top, bottom, or any combination — the model fills the black regions with content that is visually and temporally consistent with the original footage.

It is based on the LTX-2.3 foundation model.

Model Files

ltx-2.3-22b-ic-lora-outpaint.safetensors

Model Details

  • Base Model: LTX-2.3-22b
  • Training Type: IC LoRA
  • Purpose: Outpaint / extend video canvas

How It Works

The model was trained with pure black pixels (RGB 0,0,0) as the sentinel for the region to generate. At inference, letterbox your source video to the target canvas size with black bars on the sides / top / bottom you want to extend, and the model will fill those regions with newly generated content consistent with the scene.
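The letterboxing step can be sketched per-frame as follows. This is a minimal illustration with a hypothetical `letterbox` helper operating on nested pixel lists; a real pipeline would pad every frame of the video, typically with an array library or a video tool:

```python
def letterbox(frame, pad_left, pad_right):
    """Extend each row of an (H x W) RGB frame with pure-black columns.

    RGB (0, 0, 0) is the model's sentinel for "generate content here".
    """
    black = [0, 0, 0]
    return [[black] * pad_left + row + [black] * pad_right for row in frame]

# Tiny 2x4 RGB frame, extended by 2 columns on each side -> 2x8 canvas.
frame = [[[128, 64, 32]] * 4 for _ in range(2)]
wide = letterbox(frame, pad_left=2, pad_right=2)
```

The same idea applies to top/bottom extension: prepend or append full rows of black pixels before feeding the video to the model.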

🔌 Using in ComfyUI

  1. Copy the LoRA weights into models/loras.
  2. Use the IC-LoRA workflow from the LTX-2 ComfyUI repository.
  3. Load the LoRA using the LTXICLoRALoaderModelOnly node.

⚠️ Dark scenes: use a gamma correction trick

Because the model uses pure black as the "generate here" sentinel, very dark source footage (deep shadows, night scenes, underwater, etc.) can be ambiguous: the real fill bars may be interpreted as legitimate dark scene content and left un-generated.

The fix is a simple gamma round-trip:

  1. Before feeding to the model, apply gamma 2.0 (brightening) to your letterboxed input. The real dark content lifts into clearly-colored territory while the pure-black bars stay black — giving the model an unambiguous signal of where to paint.
  2. After the model produces its output, apply the inverse gamma 0.5 to return everything (original center + newly generated regions) to the original exposure.

Because gamma 2.0 and 0.5 are exact inverses, the round-trip is mathematically lossless on continuous values — the only loss is the same VAE round-trip every other clip experiences.

In ComfyUI, the Color Correct (mtb) node works well for both the forward and inverse steps.
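The losslessness of the round-trip in steps 1 and 2 can be checked numerically. A minimal sketch on normalized pixel values in [0, 1], using the usual convention that applying gamma g maps a value x to x ** (1 / g):

```python
def apply_gamma(pixels, g):
    """Apply gamma g to normalized [0, 1] pixel values: out = in ** (1 / g)."""
    return [p ** (1.0 / g) for p in pixels]

frame = [0.0, 0.02, 0.1, 0.5, 1.0]        # 0.0 = the pure-black outpaint bars
brightened = apply_gamma(frame, 2.0)       # dark content lifts; black stays exactly 0
restored = apply_gamma(brightened, 0.5)    # inverse: back to the original exposure
```

Because 0 ** 0.5 == 0, the sentinel bars survive the brightening unchanged, while real dark content (e.g. 0.02 becoming about 0.14) moves well clear of pure black.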

License

See the LTX-2-community-license for full terms.

Author: oumoumad

Likes: 6

Downloads: 0

Tags: ltx-video, ic-lora, outpaint, video-to-video, en, base_model:Lightricks/LTX-2.3, base_model:finetune:Lightricks/LTX-2.3, license:other, region:us

samuelcardillo/Carnice-MoE-35B-A3B-GGUF


language: en
license: apache-2.0
tags: qwen3.5, moe, hermes, agentic, tool-calling, qlora, unsloth, carnice
base_model: Qwen/Qwen3.5-35B-A3B
datasets: bespokelabs/Bespoke-Stratos-17k, AI-MO/NuminaMath-CoT, kai-os/carnice-glm5-hermes-traces, open-thoughts/OpenThoughts-Agent-v1-SFT

Carnice MoE 35B-A3B — Hermes-Focused Agentic Model (GGUF)

QLoRA fine-tune of Qwen3.5-35B-A3B (MoE, 3B active parameters) optimized for agentic workflows and Hermes Agent runtime. Two-stage training adapted from kai-os/Carnice-9b.

Credits

Training methodology adapted from kai-os/Carnice-9b — same two-stage approach and datasets, applied to the larger MoE architecture. Key inspiration: training on actual Hermes Agent execution traces for native agentic behavior.

Available Quantizations

| Quantization | Size | BPW | Min VRAM |
|---|---|---|---|
| Q8_0 | 35 GB | 8.52 | 1x 48GB GPU |
| Q6_K | 27 GB | 6.58 | 1x 32GB GPU |
| Q5_K_M | 24 GB | 5.70 | 1x 32GB GPU |
| Q4_K_M | 20 GB | 4.87 | 1x 24GB GPU |
| MXFP4_MOE | 19 GB | 4.39 | 1x 24GB GPU |

For BF16 safetensors, see samuelcardillo/Carnice-MoE-35B-A3B.
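As a sanity check, file size should roughly track total parameters times bits-per-weight. The `est_size_gb` helper below is illustrative; GGUF metadata, embeddings, and non-quantized tensors add overhead, and the listed sizes may be in GiB rather than decimal GB:

```python
total_params = 35e9  # ~35B total parameters for this MoE

def est_size_gb(bpw: float) -> float:
    """Rough GGUF file size estimate in decimal GB from bits per weight."""
    return total_params * bpw / 8 / 1e9

# Q4_K_M at 4.87 BPW -> roughly 21.3 GB, in the neighborhood of the listed 20 GB
print(round(est_size_gb(4.87), 1))
```

Note that because only ~3B parameters are active per token, VRAM needed for compute is far lower than what the full weight file suggests; the "Min VRAM" column reflects holding all expert weights resident.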

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-35B-A3B |
| Architecture | Mixture of Experts (MoE) |
| Total Parameters | ~35B |
| Active Parameters | ~3B per token |

What Makes This Different

Unlike generic reasoning distillation, this model was trained on actual Hermes Agent execution traces — real conversations where an AI agent:

  • Executes terminal commands and processes output
  • Performs file editing operations
  • Chains multi-step tool calls with results feeding back
  • Uses browser-assisted workflows
  • Makes decisions based on environmental feedback

This teaches the model the exact conversation patterns Hermes expects, rather than just generic reasoning.

Training Details

Two-Stage Approach

Stage A — Reasoning Repair (1 epoch)

  • Strengthens base model reasoning before agent-specific training
  • Loss: 0.4159

| Dataset | Examples |
|---|---|
| bespokelabs/Bespoke-Stratos-17k | 16,710 |
| AI-MO/NuminaMath-CoT | 17,000 (capped) |

Stage B — Hermes Traces (2 epochs)

  • Agent-specific behavioral training on real execution traces
  • Loss: 0.3115

| Dataset | Examples |
|---|---|
| kai-os/carnice-glm5-hermes-traces | 1,627 (high quality) |
| open-thoughts/OpenThoughts-Agent-v1-SFT | 15,209 |

Training Configuration

| Parameter | Stage A | Stage B |
|---|---|---|
| LoRA Rank | 64 | 64 |
| LoRA Alpha | 64 | 64 |
| LoRA Targets | q, k, v, o projections | q, k, v, o projections |
| Learning Rate | 2e-5 (linear) | 1e-5 (cosine) |
| Epochs | 1 | 2 |
| Effective Batch | 12 | 12 |
| Context Length | 4096 | 4096 |
| Precision | 4-bit QLoRA + BF16 adapters | Same |
| GPU | RTX PRO 6000 Blackwell (96GB) | Same |
| Total Training Time | ~44 hours (both stages) | |

Trainable Parameters

6,881,280 (0.02% of 35B total)

Usage with llama.cpp

llama-server \
  --model Carnice-MoE-35B-A3B-Q8_0.gguf \
  --n-gpu-layers -1 \
  --ctx-size 131072 \
  --host 0.0.0.0 --port 8082

Acknowledgements

  • kai-os — Carnice training methodology and Hermes traces dataset
  • open-thoughts — Agent SFT dataset
  • bespokelabs — Bespoke-Stratos reasoning dataset
  • Unsloth — QLoRA training framework
  • Qwen — Base model

Author: samuelcardillo

Likes: 4

Downloads: 0

Tags: hermes, gguf, qwen3.5, moe, agentic, tool-calling, qlora, unsloth, carnice, en, dataset:bespokelabs/Bespoke-Stratos-17k, dataset:AI-MO/NuminaMath-CoT, dataset:kai-os/carnice-glm5-hermes-traces, dataset:open-thoughts/OpenThoughts-Agent-v1-SFT, base_model:Qwen/Qwen3.5-35B-A3B, base_model:quantized:Qwen/Qwen3.5-35B-A3B, license:apache-2.0, endpoints_compatible, region:us, conversational

ntphuc149/ViLegalQwen3-1.7B-Base

Author: ntphuc149

Likes: 3

Downloads: 0

Tags: transformers, safetensors, qwen3, text-generation, vilegallm, legal, vietnamese, continual-pretraining, legal-nlp, conversational, vi, dataset:ntphuc149/ViLegalTexts, base_model:Qwen/Qwen3-1.7B-Base, base_model:finetune:Qwen/Qwen3-1.7B-Base, license:apache-2.0, text-generation-inference, endpoints_compatible, region:us

ntphuc149/ViLegalQwen2.5-1.5B-Base

Author: ntphuc149

Likes: 3

Downloads: 0

Tags: transformers, safetensors, qwen2, text-generation, vilegallm, legal, vietnamese, qwen2.5, continual-pretraining, legal-nlp, conversational, vi, dataset:ntphuc149/ViLegalTexts, base_model:Qwen/Qwen2.5-1.5B, base_model:finetune:Qwen/Qwen2.5-1.5B, license:apache-2.0, text-generation-inference, endpoints_compatible, region:us

F16/z-image-turbo-masked-dpo

Author: F16

Likes: 3

Downloads: 0

Tags: transformers, diffusion, dpo, flow-matching, masked-dpo, preference-optimization, lora, lokr, nsfw, image-generation, text-to-image, arxiv:2501.13918, base_model:Tongyi-MAI/Z-Image-Turbo, base_model:adapter:Tongyi-MAI/Z-Image-Turbo, license:apache-2.0, endpoints_compatible, region:us

LuffyTheFox/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-safetensors


license: apache-2.0
tags: uncensored, qwen3.5, moe, gguf, vision, multimodal
language: en, zh, multilingual
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen3.5-35B-A3B

Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive (Repaired) -> FernflowerAI

Base model: HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive - 0/465 refusals. Safetensors version of base model: Li101/Qwen3.5-35B-A3B-Uncensored-Aggressive-safetensors

Tensor repair by me. Method: Sig-ScaleSync

Feel free to do your own quants if you want.

Verified on Gemma 4 26B A4B: 0 broken tensors found, so the script does not invent false positives.
On Qwen3.5 35B it found 2 real inconsistencies in the output blocks, corrected by 88.6%.


Repair Summary

Overview

| Metric | Value |
|--------|-------|
| Total weight tensors | 1159 |
| Healthy | 1157 |
| C2-exempt (asymmetric, S<0.001) | 1146 |
| Repaired (C2) | 2 |
| Vision tensors (pass-through) | 333 |
| Time (pass1 / pass2) | 493.6s / 108.6s |
| Output size | 66.97 GB |
| RAM used | 5.40 GB |

Repair Statistics

| | Value |
|---|-------|
| α (min / mean / max) | 0.6129 / 0.6143 / 0.6158 |
| D (min / mean / max) | 0.4848 / 0.4872 / 0.4896 |
| S before → after | 0.0025 → 0.0010 |
| Error reduction | 88.6% |

Repaired Tensors

| Tensor | α | D | S (before) | S (after) |
|--------|---|---|------------|-----------|
| layers.37.linear_attn.conv1d.weight | 0.6129 | 0.490 | 0.0025 | 0.0010 |
| layers.36.linear_attn.conv1d.weight | 0.6158 | 0.485 | 0.0025 | 0.0010 |


Usage

Ready to use. Recommended quantization: Q4_K_L, or higher (Q4_K_M, Q5_K_M, Q6_K, Q8_0).
⚠️ Lower formats (Q3_K, Q2_K) break the model due to MoE + DeltaNet sensitivity.

Links:


🌟 Recommended Settings (LM Studio)

Chat template: pastebin.com/uk9ZkxCR (supports tool calling for Zed agent)

| Parameter | Value |
|-----------|-------|
| Temperature | 0.7 |
| Top K Sampling | 20 |
| Presence Penalty | 1.5 |
| Top P Sampling | 0.8 |
| Min P Sampling | 0 |
| Seed | 3407 |

System prompt: pastebin.com/pU25DVnB (solid)
Or use this minimal string as the first line:

You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

Then add anything you want after it. The model may underperform without this first line.

You can also extend my system prompt (pastebin.com/pU25DVnB) for your own roleplay scenarios. Here is how:

Edit the first line. Replace:

You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

With

You are Qwen, created by Alibaba Cloud. You are a helpful assistant. Currently you are roleplaying as [your text here]


About

No changes to datasets or capabilities. Fully functional: 100% of what the original authors intended, just without refusals and with the critical architecture bug in the output layers fixed.

These are meant to be the best lossless uncensored models out there.


Specs

  • 35B total parameters, ~3B active per forward pass (MoE)
  • 256 experts, 8 routed + 1 shared per token
  • Hybrid architecture: Gated DeltaNet linear attention + full softmax attention (3:1 ratio)
  • 40 layers, pattern: 10 × (3 × DeltaNet-MoE + 1 × Attention-MoE)
  • 262K native context (extendable to 1M with YaRN)
  • Natively multimodal (text, image, video)
  • Multi-token prediction (MTP) support
  • 248K vocabulary, 201 languages
  • Based on Qwen/Qwen3.5-35B-A3B
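The stated 3:1 hybrid pattern can be expanded programmatically to confirm the layer counts (names are illustrative labels, not actual module names):

```python
# 10 repetitions of (3 x DeltaNet-MoE + 1 x Attention-MoE) = 40 layers total
pattern = (["DeltaNet-MoE"] * 3 + ["Attention-MoE"]) * 10

assert len(pattern) == 40
assert pattern.count("DeltaNet-MoE") == 30   # linear-attention layers
assert pattern.count("Attention-MoE") == 10  # full softmax-attention layers
```

The 3:1 ratio means most layers use linear attention (cheap in long contexts), with periodic full-attention layers preserving global mixing, which is what makes the 262K native context tractable.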

Recommended Settings (Official Qwen Authors)

Thinking mode (default):

  • General: temperature=1.0, top_p=0.95, top_k=20, min_p=0, presence_penalty=1.5
  • Coding/precise tasks: temperature=0.6, top_p=0.95, top_k=20, min_p=0, presence_penalty=0

Non-thinking mode:

  • General: temperature=0.7, top_p=0.8, top_k=20, min_p=0, presence_penalty=1.5
  • Reasoning tasks: temperature=1.0, top_p=1.0, top_k=40, min_p=0, presence_penalty=2.0

Important:

  • Keep at least 128K context to preserve thinking capabilities
  • Use --jinja flag with llama.cpp for proper chat template handling
  • Vision support requires the mmproj file alongside the main GGUF

Compatibility

Works with llama.cpp, LM Studio, koboldcpp, and other GGUF-compatible runtimes.

Author: LuffyTheFox

Likes: 3

Downloads: 0

Tags: safetensors, qwen3_5_moe, uncensored, qwen3.5, moe, gguf, vision, multimodal, image-text-to-text, conversational, en, zh, multilingual, base_model:Qwen/Qwen3.5-35B-A3B, base_model:finetune:Qwen/Qwen3.5-35B-A3B, license:apache-2.0, region:us

DuoNeural/Gemma-4-E4B-Claude-Abliterated-GGUF


license: apache-2.0
base_model: arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled
tags: code, gemma4, abliterated, gguf, unsloth, 4bit, uncensored, claude-distilled
library_name: transformers

Gemma 4 E4B Claude Abliterated GGUF (4-bit)

Model Description

This repository contains an abliterated version of the Gemma 4 E4B Claude-4.6-Opus-Reasoning-Distilled model. This version has undergone "abliteration" to neutralize safety refusal vectors while preserving its high-quality Claude-distilled reasoning and front-end engineering capabilities.

Abliteration Results

  • Method: Norm-preserving biprojection (orthogonalization).
  • Final Refusal Rate: Verified Low (Evaluation in progress).
  • KL Divergence: 0.0410 (Extremely low, indicating high fidelity to the distilled model).
  • Technique: EGA-compatible abliteration via patched heretic-llm.

Quantization Details

  • Quantization Format: GGUF (q4_k_m)
  • Quantization Method: llama.cpp / Unsloth
  • Precision: 4-bit

Use with Ollama

ollama run hf.co/DuoNeural/Gemma-4-E4B-Claude-Abliterated-GGUF

Use with LM Studio

  1. Open LM Studio.
  2. Search for DuoNeural/Gemma-4-E4B-Claude-Abliterated-GGUF.
  3. Load the Q4_K_M GGUF.

Architecture

Gemma 4 E4B features 4.5B effective parameters, optimized for intelligence-per-parameter and complex reasoning tasks.

Disclaimer

This model has had its safety refusals modified. Users are responsible for ensuring the model is used ethically and in accordance with applicable laws.

Author: DuoNeural

Likes: 3

Downloads: 0

Tags: transformers, gguf, code, gemma4, abliterated, unsloth, 4bit, uncensored, claude-distilled, base_model:arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled, base_model:quantized:arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled, license:apache-2.0, endpoints_compatible, region:us, conversational