Today's AI Summary

ByteDance Seed Team Releases New Long-Context LLMs

ByteDance's Seed Team has unveiled a new series of open-source large language models (LLMs) called Seed-OSS, designed for long-context understanding, reasoning, and agentic capabilities. The models are released under the Apache 2.0 license.

Noteworthy Papers

  • ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents: This paper introduces ComputerRL, a framework for autonomous desktop intelligence. It uses the API-GUI paradigm and a distributed RL infrastructure to train agents on complex desktop tasks. The AutoGLM-OS-9B model, based on GLM-4-9B-0414, achieves a new state-of-the-art accuracy of 48.1% on the OSWorld benchmark.
  • Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation: This paper highlights the safety risks associated with fine-tuning LLMs for agentic tasks, showing that aligned LLMs can become unintentionally misaligned. It proposes Prefix INjection Guard (PING) to mitigate these risks by prepending natural language prefixes to agent responses.
  • Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization: This paper introduces LongMab-PO, a framework that uses a Multi-Armed Bandit (MAB) rollout strategy to identify informative chunks from long contexts for sampling high-quality responses and constructing preference data pairs for Direct Preference Optimization (DPO) training.
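The bandit idea behind LongMab-PO can be sketched with a toy UCB1 loop (function names and the reward model below are hypothetical illustrations, not the paper's implementation): treat each context chunk as an arm, spend a rollout budget via upper-confidence selection, and let rollouts concentrate on the most informative chunks.

```python
import math
import random

def ucb_select(counts, rewards, t, c=1.0):
    """Pick the chunk (arm) with the highest UCB score; unvisited arms first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [
        rewards[i] / counts[i] + c * math.sqrt(math.log(t) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)

def mab_rollouts(chunk_scores, budget=200, seed=0):
    """Simulate rollouts: each rollout returns a noisy reward whose mean is
    the chunk's hidden informativeness. Returns per-chunk visit counts."""
    rng = random.Random(seed)
    k = len(chunk_scores)
    counts, rewards = [0] * k, [0.0] * k
    for t in range(1, budget + 1):
        arm = ucb_select(counts, rewards, t)
        counts[arm] += 1
        rewards[arm] += chunk_scores[arm] + rng.gauss(0, 0.1)
    return counts

# The bandit should concentrate rollouts on the most informative chunk (index 1).
counts = mab_rollouts([0.2, 0.8, 0.4])
```

In the real framework the "reward" of a chunk would come from scoring model responses sampled with that chunk in context, and the selected rollouts would form DPO preference pairs.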

New Models

  • ByteDance-Seed/Seed-OSS-36B-Instruct: This instruction-tuned model, with 61 likes, is part of the Seed-OSS series and is designed for international use cases. It features flexible control of thinking budget, enhanced reasoning, agentic intelligence, and native long context (up to 512K tokens). It achieves strong performance across various benchmarks, including knowledge, math, reasoning, coding, instruction following, agent, multilingualism, and long context tasks.
  • ByteDance-Seed/Seed-OSS-36B-Base-woSyn: This base model, with 25 likes, is trained without synthetic instruction data, offering the research community a high-performance foundation model unaffected by such data.
  • ByteDance-Seed/Seed-OSS-36B-Base: This base model, with 12 likes, incorporates synthetic instruction data into pretraining, leading to improved performance on most benchmarks.

Key Takeaways

  • ByteDance's Seed-OSS models offer a new option for developers seeking powerful, long-context LLMs with strong reasoning and agentic capabilities.
  • The Seed-OSS-36B-Instruct model stands out with its flexible control of thinking budget, allowing users to adjust reasoning length as needed.
  • The release of both base models (with and without synthetic data) caters to the research community by providing diverse options for further exploration.

AI Papers for 2026-04-12

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

The advent of agentic multimodal models has empowered systems to actively interact with external environments. However, current agents suffer from a profound meta-cognitive deficit: they struggle to arbitrate between leveraging internal knowledge and querying external utilities. Consequently, they frequently fall prey to blind tool invocation, resorting to reflexive tool execution even when queries are resolvable from the raw visual context. This pathological behavior precipitates severe latency bottlenecks and injects extraneous noise that derails sound reasoning. Existing reinforcement learning protocols attempt to mitigate this via a scalarized reward that penalizes tool usage. Yet, this coupled formulation creates an irreconcilable optimization dilemma: an aggressive penalty suppresses essential tool use, whereas a mild penalty is entirely subsumed by the variance of the accuracy reward during advantage normalization, rendering it impotent against tool overuse. To transcend this bottleneck, we propose HDPO, a framework that reframes tool efficiency from a competing scalar objective to a strictly conditional one. By eschewing reward scalarization, HDPO maintains two orthogonal optimization channels: an accuracy channel that maximizes task correctness, and an efficiency channel that enforces execution economy exclusively within accurate trajectories via conditional advantage estimation. This decoupled architecture naturally induces a cognitive curriculum, compelling the agent to first master task resolution before refining its self-reliance. Extensive evaluations demonstrate that our resulting model, Metis, reduces tool invocations by orders of magnitude while simultaneously elevating reasoning accuracy.
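The decoupled-channel idea can be illustrated with a minimal sketch (not the paper's implementation; the function name and the correctness convention are assumptions): the accuracy advantage is group-normalized over all rollouts, while the efficiency advantage is computed only among correct trajectories, so incorrect rollouts receive no efficiency gradient at all.

```python
def hdpo_advantages(rewards, tool_counts):
    """Two decoupled channels over a group of rollouts:
    - accuracy channel: standard group-normalized correctness advantage
    - efficiency channel: tool-economy advantage computed ONLY among
      correct rollouts; incorrect rollouts get zero efficiency signal."""
    n = len(rewards)
    mean_r = sum(rewards) / n
    std_r = (sum((r - mean_r) ** 2 for r in rewards) / n) ** 0.5 or 1.0
    acc_adv = [(r - mean_r) / std_r for r in rewards]

    correct = [i for i, r in enumerate(rewards) if r > 0]
    eff_adv = [0.0] * n
    if len(correct) > 1:
        costs = [tool_counts[i] for i in correct]
        mean_c = sum(costs) / len(costs)
        std_c = (sum((c - mean_c) ** 2 for c in costs) / len(costs)) ** 0.5 or 1.0
        for i in correct:
            # fewer tool calls -> higher efficiency advantage
            eff_adv[i] = -(tool_counts[i] - mean_c) / std_c
    return acc_adv, eff_adv

# Rollouts 0 and 1 are correct; rollout 1 used fewer tools, so it is preferred.
acc, eff = hdpo_advantages([1, 1, 0, 0], [5, 1, 3, 0])
```

Because the efficiency signal is conditioned on correctness rather than summed into one scalar, it cannot be drowned out by the variance of the accuracy reward, which is exactly the failure mode the abstract describes.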

SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds

Robotic manipulation with deformable objects represents a data-intensive regime in embodied learning, where shape, contact, and topology co-evolve in ways that far exceed the variability of rigids. Although simulation promises relief from the cost of real-world data acquisition, prevailing sim-to-real pipelines remain rooted in rigid-body abstractions, producing mismatched geometry, fragile soft dynamics, and motion primitives poorly suited for cloth interaction. We posit that simulation fails not for being synthetic, but for being ungrounded. To address this, we introduce SIM1, a physics-aligned real-to-sim-to-real data engine that grounds simulation in the physical world. Given limited demonstrations, the system digitizes scenes into metric-consistent twins, calibrates deformable dynamics through elastic modeling, and expands behaviors via diffusion-based trajectory generation with quality filtering. This pipeline transforms sparse observations into scaled synthetic supervision with near-demonstration fidelity. Experiments show that policies trained on purely synthetic data achieve parity with real-data baselines at a 1:15 equivalence ratio, while delivering 90% zero-shot success and 50% generalization gains in real-world deployment. These results validate physics-aligned simulation as scalable supervision for deformable manipulation and a practical pathway for data-efficient policy learning.

Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts

Multimodal Mixture-of-Experts (MoE) models have achieved remarkable performance on vision-language tasks. However, we identify a puzzling phenomenon termed Seeing but Not Thinking: models accurately perceive image content yet fail in subsequent reasoning, while correctly solving identical problems presented as pure text. Through systematic analysis, we first verify that cross-modal semantic sharing exists in MoE architectures, ruling out semantic alignment failure as the sole explanation. We then reveal that visual experts and domain experts exhibit layer-wise separation, with image inputs inducing significant routing divergence from text inputs in middle layers where domain experts concentrate. Based on these findings, we propose the Routing Distraction hypothesis: when processing visual inputs, the routing mechanism fails to adequately activate task-relevant reasoning experts. To validate this hypothesis, we design a routing-guided intervention method that enhances domain expert activation. Experiments on three multimodal MoE models across six benchmarks demonstrate consistent improvements, with gains of up to 3.17% on complex visual reasoning tasks. Our analysis further reveals that domain expert identification locates cognitive functions rather than sample-specific solutions, enabling effective transfer across tasks with different information structures.
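A routing-guided intervention of this kind can be sketched in miniature (toy logits; the boost value and expert indices are assumptions, not the paper's calibrated values): add a bias to the router logits of identified domain experts before top-k selection, so a reasoning expert that visual inputs under-activate re-enters the active set.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def route_topk(logits, k=2, boost_experts=(), boost=1.5):
    """Top-k MoE routing with an optional additive boost on the router
    logits of identified domain (reasoning) experts."""
    adjusted = [
        l + (boost if i in boost_experts else 0.0)
        for i, l in enumerate(logits)
    ]
    topk = sorted(range(len(adjusted)), key=adjusted.__getitem__, reverse=True)[:k]
    probs = softmax([adjusted[i] for i in topk])
    return sorted(topk), probs

# A visual input under-activates expert 3 (a hypothetical domain expert);
# boosting its router logit brings it back into the top-k set.
before, _ = route_topk([2.0, 1.0, 0.5, 0.9], k=2)
after, _ = route_topk([2.0, 1.0, 0.5, 0.9], k=2, boost_experts={3})
```
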

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation

Text-to-Audio-Video (T2AV) generation is rapidly becoming a core interface for media creation, yet its evaluation remains fragmented. Existing benchmarks largely assess audio and video in isolation or rely on coarse embedding similarity, failing to capture the fine-grained joint correctness required by realistic prompts. We introduce AVGen-Bench, a task-driven benchmark for T2AV generation featuring high-quality prompts across 11 real-world categories. To support comprehensive assessment, we propose a multi-granular evaluation framework that combines lightweight specialist models with Multimodal Large Language Models (MLLMs), enabling evaluation from perceptual quality to fine-grained semantic controllability. Our evaluation reveals a pronounced gap between strong audio-visual aesthetics and weak semantic reliability, including persistent failures in text rendering, speech coherence, physical reasoning, and a universal breakdown in musical pitch control. Code and benchmark resources are available at http://aka.ms/avgenbench.

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal Large Language Models. However, extending this success to open-source multimodal generalist models remains heavily constrained by two primary challenges: the extreme variance in reward topologies across diverse visual tasks, and the inherent difficulty of balancing fine-grained perception with multi-step reasoning capabilities. To address these issues, we introduce Gaussian GRPO (G$^2$RPO), a novel RL training objective that replaces standard linear scaling with non-linear distributional matching. By mathematically forcing the advantage distribution of any given task to strictly converge to a standard normal distribution, $\mathcal{N}(0,1)$, G$^2$RPO theoretically ensures inter-task gradient equity, mitigates vulnerabilities to heavy-tail outliers, and offers symmetric updates for positive and negative rewards. Leveraging the enhanced training stability provided by G$^2$RPO, we introduce two task-level shaping mechanisms to seamlessly balance perception and reasoning. First, response length shaping dynamically elicits extended reasoning chains for complex queries while enforcing direct outputs to bolster visual grounding. Second, entropy shaping tightly bounds the model's exploration zone, effectively preventing both entropy collapse and entropy explosion. Integrating these methodologies, we present OpenVLThinkerV2, a highly robust, general-purpose multimodal model. Extensive evaluations across 18 diverse benchmarks demonstrate its superior performance over strong open-source and leading proprietary frontier models.
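One way to realize such non-linear distributional matching is rank-based Gaussianization, sketched below under the assumption that ties are negligible; the paper's exact transform may differ. Unlike mean/std scaling, a heavy-tail outlier ends up with a bounded advantage instead of dominating the group.

```python
from statistics import NormalDist

def gaussianize(rewards):
    """Map a group of rewards onto N(0,1) by rank, using the plotting
    position (rank + 0.5) / n, instead of linear mean/std scaling."""
    n = len(rewards)
    nd = NormalDist()
    order = sorted(range(n), key=lambda i: rewards[i])
    adv = [0.0] * n
    for rank, i in enumerate(order):
        adv[i] = nd.inv_cdf((rank + 0.5) / n)
    return adv

# A heavy-tail outlier (100.0) gets the top rank but a bounded advantage.
adv = gaussianize([0.1, 100.0, 0.2, 0.3])
```

With linear scaling the outlier's advantage would be roughly $\sqrt{n-1}$ times larger than everyone else's; here it is capped at the inverse normal CDF of the top rank, which is what makes per-task gradients comparable in magnitude.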

RewardFlow: Generate Images by Optimizing What You Reward

We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human preference, and further introduces a differentiable VQA-based reward that provides fine-grained semantic supervision through language-vision reasoning. To coordinate these heterogeneous objectives, we design a prompt-aware adaptive policy that extracts semantic primitives from the instruction, infers edit intent, and dynamically modulates reward weights and step sizes throughout sampling. Across several image editing and compositional generation benchmarks, RewardFlow delivers state-of-the-art edit fidelity and compositional alignment.
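The multi-reward Langevin update can be sketched abstractly as follows (toy quadratic rewards with hypothetical targets; the noise term is set to zero here so the deterministic drift is visible, whereas actual Langevin dynamics keep it): each step moves the sample along a weighted sum of reward gradients.

```python
import random

def langevin_step(x, reward_grads, weights, step=0.01, noise=0.01, rng=random):
    """One Langevin update: move along the weighted sum of reward
    gradients, plus Gaussian exploration noise."""
    g = [sum(w * gr[i] for w, gr in zip(weights, reward_grads))
         for i in range(len(x))]
    return [xi + step * gi + noise * rng.gauss(0, 1) for xi, gi in zip(x, g)]

def grad_toward(target, x):
    """Gradient of the toy reward -||x - target||^2 / 2."""
    return [t - xi for t, xi in zip(target, x)]

# Two toy rewards pulling toward different targets; the weighted dynamics
# settle at the weighted mean of the targets.
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(2000):
    grads = [grad_toward([1.0, 0.0], x), grad_toward([0.0, 1.0], x)]
    x = langevin_step(x, grads, weights=[0.75, 0.25], step=0.05, noise=0.0, rng=rng)
```

RewardFlow's prompt-aware policy would correspond to choosing the `weights` and `step` dynamically per sampling step rather than fixing them as this sketch does.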

PSI: Shared State as the Missing Layer for Coherent AI-Generated Instruments in Personal AI Agents

Personal AI tools can now be generated from natural-language requests, but they often remain isolated after creation. We present PSI, a shared-state architecture that turns independently generated modules into coherent instruments: persistent, connected, and chat-complementary artifacts accessible through both GUIs and a generic chat agent. By publishing current state and write-back affordances to a shared personal-context bus, modules enable cross-module reasoning and synchronized actions across interfaces. We study PSI through a three-week autobiographical deployment in a self-developed personal AI environment and show that later-generated instruments can be integrated automatically through the same contract. PSI identifies shared state as the missing systems layer that transforms AI-generated personal software from isolated apps into coherent personal computing environments.
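A shared-state bus of the kind PSI describes might look like this minimal sketch (class and method names are hypothetical, not PSI's API): modules publish their current state plus write-back actions, and any interface, GUI or chat agent, reads or acts through the same contract.

```python
class ContextBus:
    """Minimal shared personal-context bus: modules publish state and
    write-back affordances; any interface reads state or invokes actions
    through the same contract."""
    def __init__(self):
        self._state = {}
        self._actions = {}

    def publish(self, module, state, actions=None):
        self._state[module] = state
        self._actions[module] = actions or {}

    def read(self, module):
        return self._state.get(module)

    def act(self, module, action, *args):
        return self._actions[module][action](*args)

bus = ContextBus()
todo = {"items": []}
bus.publish("todo", todo, {"add": lambda item: todo["items"].append(item)})
bus.act("todo", "add", "buy milk")  # e.g. a chat agent writing back
```

A later-generated module integrates by calling `publish` with the same contract, which is the "automatic integration" property the paper reports.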

Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest

Today's large language models (LLMs) are trained to align with user preferences through methods such as reinforcement learning. Yet models are beginning to be deployed not merely to satisfy users, but also to generate revenue for the companies that created them through advertisements. This creates the potential for LLMs to face conflicts of interest, where the most beneficial response to a user may not be aligned with the company's incentives. For instance, a sponsored product may be more expensive but otherwise equal to another; in this case, what does (and should) the LLM recommend to the user? In this paper, we provide a framework for categorizing the ways in which conflicting incentives might lead LLMs to change the way they interact with users, inspired by literature from linguistics and advertising regulation. We then present a suite of evaluations to examine how current models handle these tradeoffs. We find that a majority of LLMs forsake user welfare for company incentives in a multitude of conflict of interest situations, including recommending a sponsored product almost twice as expensive (Grok 4.1 Fast, 83%), surfacing sponsored options to disrupt the purchasing process (GPT 5.1, 94%), and concealing prices in unfavorable comparisons (Qwen 3 Next, 24%). Behaviors also vary strongly with levels of reasoning and users' inferred socio-economic status. Our results highlight some of the hidden risks to users that can emerge when companies begin to subtly incentivize advertisements in chatbots.

What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal

Applying steering vectors to large language models (LLMs) is an efficient and effective model alignment technique, but we lack an interpretable explanation for how it works: specifically, what internal mechanisms steering vectors affect and how this results in different model outputs. To investigate the causal mechanisms underlying the effectiveness of steering vectors, we conduct a comprehensive case study on refusal. We propose a multi-token activation patching framework and discover that different steering methodologies leverage functionally interchangeable circuits when applied at the same layer. These circuits reveal that steering vectors primarily interact with the attention mechanism through the OV circuit while largely ignoring the QK circuit: freezing all attention scores during steering drops performance by only 8.75% across two model families. A mathematical decomposition of the steered OV circuit further reveals semantically interpretable concepts, even in cases where the steering vector itself does not. Leveraging the activation patching results, we show that steering vectors can be sparsified by up to 90-99% while retaining most performance, and that different steering methodologies agree on a subset of important dimensions.
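The sparsification result can be illustrated with a toy sketch (helper names are hypothetical): keep only the top-magnitude dimensions of the steering vector, zero the rest, and add it to the hidden state as usual.

```python
def sparsify_steering(vec, keep_frac=0.05):
    """Keep only the largest-magnitude dimensions of a steering vector
    (e.g. the top 1-10%), zeroing the rest."""
    k = max(1, int(len(vec) * keep_frac))
    idx = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    keep = set(idx)
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]

def apply_steering(hidden, vec, alpha=1.0):
    """Add the (possibly sparsified) steering vector to a hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vec)]

vec = [0.01 * i for i in range(100)]            # toy steering vector
sparse = sparsify_steering(vec, keep_frac=0.1)  # 90% of dimensions zeroed
```

The paper's finding is that applying `sparse` instead of `vec` retains most of the steering effect, and that the surviving dimensions overlap across different steering methods.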

ClawBench: Can AI Agents Complete Everyday Online Tasks?

AI agents may be able to automate your inbox, but can they automate other routine aspects of your life? Everyday online tasks offer a realistic yet unsolved testbed for evaluating the next generation of AI agents. To this end, we introduce ClawBench, an evaluation framework of 153 simple tasks that people need to accomplish regularly in their lives and work, spanning 144 live platforms across 15 categories, from completing purchases and booking appointments to submitting job applications. These tasks require demanding capabilities beyond existing benchmarks, such as obtaining relevant information from user-provided documents, navigating multi-step workflows across diverse platforms, and write-heavy operations like filling in many detailed forms correctly. Unlike existing benchmarks that evaluate agents in offline sandboxes with static pages, ClawBench operates on production websites, preserving the full complexity, dynamic nature, and challenges of real-world web interaction. A lightweight interception layer captures and blocks only the final submission request, ensuring safe evaluation without real-world side effects. Our evaluations of 7 frontier models show that both proprietary and open-source models can complete only a small portion of these tasks. For example, Claude Sonnet 4.6 achieves only 33.3%. Progress on ClawBench brings us closer to AI agents that can function as reliable general-purpose assistants.

AI Models

Jiunsong/supergemma4-26b-uncensored-gguf-v2


license: gemma
base_model: google/gemma4-26b-it
base_model_relation: quantized
pipeline_tag: text-generation
tags: gguf, gemma4, uncensored, llama.cpp, q4_k_m, local-llm, tool-use, browser-automation, reasoning, korean
language: en, ko

supergemma4-26b-uncensored-gguf-v2

A fully uncensored GGUF release of SuperGemma 4 26B for llama.cpp-style local inference.

This is the GGUF companion to supergemma4-26b-uncensored-mlx-4bit-v2, keeping the same focus on direct answers, tool use, browser tasks, reasoning, and Korean technical output.

Why This GGUF Is Useful

  • Fully uncensored: optimized for direct, non-refusal local use
  • Agent-friendly: strong tool-use and browser-task behavior
  • Reasoning-focused: strong logic and constraint performance
  • Korean-capable: solid Korean technical and bilingual output
  • Portable: ready for llama.cpp-compatible runtimes and desktop setups

Local Benchmark Snapshot

Scores were measured on the companion MLX release, on a MacBook Pro M4 Max (128 GB).

| Metric | Score |
|---|---:|
| Overall quick score | 97.9 |
| Code | 98.6 |
| Browser | 92.9 |
| Logic | 100.0 |
| System Design | 100.0 |
| Korean | 98.3 |

Included File

  • supergemma4-26b-uncensored-v2-Q4_K_M.gguf

Serving Note

This GGUF is intended for regular local chat, coding, and agent-style workloads in llama.cpp-compatible runtimes.

Author: Jiunsong

Likes: 39

Downloads: 0

Tags: gguf, gemma4, uncensored, llama.cpp, q4_k_m, local-llm, tool-use, browser-automation, reasoning, korean, text-generation, en, ko, license:gemma, endpoints_compatible, region:us, conversational

huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated


library_name: transformers
license: apache-2.0
license_link: https://ai.google.dev/gemma/docs/gemma_4_license
pipeline_tag: any-to-any
base_model: google/gemma-4-26B-A4B
tags: abliterated, uncensored

huihui-ai/Huihui-gemma-4-26B-A4B-abliterated

This is an uncensored version of google/gemma-4-26B-A4B created with abliteration (see remove-refusals-with-transformers to learn more). This is a crude, proof-of-concept implementation that removes refusals from an LLM without using TransformerLens.

Note: For this model, both the thinking mode and the non-thinking mode have been completely abliterated. The first 5 layers have not been abliterated, so outputs may still contain warning text, but no refusals will be issued.

No ablation was performed on the 256 experts per layer.

ollama

Please use the latest version of ollama

You can use huihui_ai/gemma-4-abliterated:26b directly:

ollama run huihui_ai/gemma-4-abliterated:26b

Usage Warnings

  • Risk of Sensitive or Controversial Outputs: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

  • Not Suitable for All Audiences: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.

  • Legal and Ethical Responsibilities: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

  • Research and Experimental Use: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.

  • Monitoring and Review Recommendations: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.

  • No Default Safety Guarantees: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

Donation

Your donation helps us continue development and improvement; even a cup of coffee helps.
  • bitcoin:
  bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
  • Support our work on Ko-fi!

Author: huihui-ai

Likes: 11

Downloads: 0

Tags: transformers, safetensors, gemma4, image-text-to-text, abliterated, uncensored, any-to-any, base_model:google/gemma-4-26B-A4B, base_model:finetune:google/gemma-4-26B-A4B, license:apache-2.0, endpoints_compatible, region:us

Ex0bit/MYTHOS-26B-A4B-PRISM-PRO-DQ-MLX


license: apache-2.0
base_model: google/gemma-4-26B-A4B-it
base_model_relation: finetune
tags: gemma4, gemma, google, mlx, apple-silicon, moe, mixture-of-experts, zero-refusals, prism-dq, dynamic-quantization, multimodal, vision, video-text-to-text, image-text-to-text, abliterated, text-generation
language: en
pipeline_tag: image-text-to-text
library_name: mlx
quantized_by: Ex0bit


<div align="center"> <img src="https://cdn-uploads.huggingface.co/production/uploads/63adf1fa42fd3b8dbaeb0c92/onzfqZmEuOqedGOcyedHc.png" width="800"> </div>

MYTHOS-26B-A4B — PRISM Dynamic Quantization (MLX)

Gemma 4 26B-A4B MoE PRISM-PRO-Dynamic-Quant for Apple Silicon

  • PRISM-PRO: Production model with full over-refusal and bias mechanisms completely removed using State of the Art PRISM pipeline.
  • DQ: Per-tensor-class mixed-precision allocation derived entirely from weight structure sensitivity analysis — not closed-gated datasets.

Created by Ex0bit


<div align="center">

💡Support My Research & Development efforts. Members Receive access to the latest PRISM-PRO Model drops on Day-0

Ko-fi

</div>

Model Details

| Property | Value |
|----------|-------|
| Base Model | google/gemma-4-26B-A4B-it |
| Architecture | Gemma 4 MoE (128 experts, top-8 routing) |
| Parameters | 26B total / 4B active per token |
| Quantization | PRISM-PRO-DYNAMIC-QUANT (MLX native) |
| Achieved BPW | 6.52 |
| File Size | ~20 GB |
| Context Length | 262,144 tokens |
| Modalities | Text, Image, Video |
| Runtime | mlx-vlm (Apple Silicon Metal) |
| Creator | Ex0bit |

Supported Modalities

  • Text: Full instruction-following and chat
  • Image: Vision understanding via SigLIP encoder (280 soft tokens per image)
  • Video: Gemma4VideoProcessor (32 frames, pooled)

Note: This 26B MoE variant does not include audio support. For audio, see the 31B dense variant.

PRISM-DQ Quantization

This MLX model uses PRISM-PRO Dynamic Quantization — a per-tensor-class mixed-precision allocation that assigns different quantization types to different tensor classes based on weight structure sensitivity.

Unlike uniform quantization (Q4, Q6, Q8), PRISM-DQ analyzes each tensor class's sensitivity and allocates precision where it matters most. Attention projections receive higher precision than FFN layers, with block-level overrides that protect critical layers.

The model's config.json contains per-tensor quantization overrides that mlx-vlm loads natively — no custom runtime required. The compiled Metal kernels automatically handle mixed-precision tensors in a single forward pass at full GPU speed.
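A rough sketch of what per-tensor-class mixed-precision allocation means in practice (the tensor-class names and parameter counts below are invented for illustration, not the actual PRISM-DQ tables): assign a bit width per class, with attention projections kept at higher precision than FFN experts, and the achieved BPW falls out as the size-weighted average.

```python
def allocate_bits(tensor_classes, default_bits=4, overrides=None):
    """Assign per-tensor-class bit widths: higher precision for
    sensitive classes (e.g. attention projections), lower elsewhere."""
    overrides = overrides or {}
    return {name: overrides.get(name, default_bits) for name in tensor_classes}

def achieved_bpw(sizes, bits):
    """Size-weighted average bits-per-weight over all tensor classes."""
    total = sum(sizes.values())
    return sum(sizes[n] * bits[n] for n in sizes) / total

# Hypothetical tensor classes with parameter counts in millions.
sizes = {"attn.q_proj": 500, "attn.k_proj": 500, "ffn.experts": 9000}
bits = allocate_bits(sizes, default_bits=8, overrides={"ffn.experts": 6})
bpw = achieved_bpw(sizes, bits)  # FFN experts dominate, so BPW sits near 6
```

Because MoE expert weights dominate the parameter count, most of the size savings come from their lower bit width while the small, sensitive attention tensors stay at high precision.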

Usage

mlx-vlm (CLI)

pip install mlx-vlm

# Interactive chat
mlx_vlm.chat --model Ex0bit/MYTHOS-26B-A4B-PRISM-PRO-DQ-MLX \
  --temperature 0.7 --max-tokens 2048 --max-kv-size 8192

# Vision prompt
python -m mlx_vlm.generate \
  --model Ex0bit/MYTHOS-26B-A4B-PRISM-PRO-DQ-MLX \
  --image path/to/image.jpg \
  --prompt "Describe this image in detail." \
  --max-tokens 500

Python API

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

model, processor = load("Ex0bit/MYTHOS-26B-A4B-PRISM-PRO-DQ-MLX")
config = model.config

prompt = apply_chat_template(
    processor, config,
    "Describe this scene.",
    num_images=1
)
response = generate(
    model, processor, prompt,
    image=["path/to/image.jpg"],
    max_tokens=500, temperature=0.7
)
print(response)

Refusal & Bias Removal

This model has been treated to remove bias, over-refusals, and propaganda from the base google/gemma-4-26B-A4B-it using the state-of-the-art PRISM pipeline.

License

Apache 2.0 (inherited from google/gemma-4-26B-A4B-it)

Credits

Author: Ex0bit

Likes: 9

Downloads: 0

Tags: mlx, safetensors, gemma4, gemma, google, apple-silicon, moe, mixture-of-experts, zero-refusals, prism-dq, dynamic-quantization, multimodal, vision, video-text-to-text, image-text-to-text, abliterated, text-generation, conversational, en, base_model:google/gemma-4-26B-A4B-it, base_model:finetune:google/gemma-4-26B-A4B-it, license:apache-2.0, 6-bit, region:us

Ex0bit/MYTHOS-26B-A4B-PRISM-PRO-DQ-GGUF


license: apache-2.0
base_model: google/gemma-4-26B-A4B-it
base_model_relation: finetune
tags: gemma4, gemma, google, gguf, moe, mixture-of-experts, zero-refusals, prism-dq, dynamic-quantization, multimodal, vision, video-text-to-text, image-text-to-text, abliterated, text-generation
language: en
pipeline_tag: image-text-to-text
library_name: llama.cpp
quantized_by: Ex0bit



MYTHOS-26B-A4B — PRISM Dynamic Quantization (GGUF)

Gemma 4 26B-A4B MoE PRISM-PRO-Dynamic-Quant

  • PRISM-PRO: Production model with full over-refusal and bias mechanisms completely removed using State of the Art PRISM pipeline.
  • DQ: Per-tensor-class mixed-precision allocation derived entirely from weight structure sensitivity analysis — not closed-gated datasets.

Created by Ex0bit



Model Details

| Property | Value |
|----------|-------|
| Base Model | google/gemma-4-26B-A4B-it |
| Architecture | Gemma 4 MoE (128 experts, top-8 routing) |
| Parameters | 26B total / 4B active per token |
| Quantization | PRISM-PRO-DYNAMIC-QUANT |
| Achieved BPW | 5.73 |
| File Size | ~17 GB (language) + ~1.2 GB (vision projector) |
| Context Length | 262,144 tokens |
| Modalities | Text, Image, Video |
| Creator | Ex0bit |

Supported Modalities

  • Text: Full instruction-following and chat
  • Image: Vision understanding via SigLIP encoder (280 soft tokens per image)
  • Video: Gemma4VideoProcessor (32 frames, pooled)

Note: This 26B MoE variant does not include audio support. For audio, see the 31B dense variant.

Files

| File | Size | Purpose |
|------|------|---------|
| mythos-26b-a4b-prism-pro-dq.gguf | 17 GB | Language model (quantized) |
| mmproj-mythos-26b-a4b-prism-pro.gguf | 1.2 GB | Vision projector (F16) |

Both files are required for multimodal inference. For text-only use, only the language model file is needed.

PRISM-DQ Quantization

This model uses PRISM-PRO Dynamic Quantization — a per-tensor-class mixed-precision allocation that assigns different quantization types to different tensor classes based on weight structure sensitivity.

Unlike uniform quantization (Q4_K_M, Q5_K_M), PRISM-DQ analyzes each tensor class's sensitivity and allocates precision where it matters most. Attention projections receive higher precision than FFN layers, with block-level overrides that protect critical layers.

The result: BF16-equivalent quality at 5.73 bits-per-weight — a 64% size reduction with zero measurable quality loss.

Usage

llama.cpp (multimodal with vision)

llama-mtmd-cli \
  --model mythos-26b-a4b-prism-pro-dq.gguf \
  --mmproj mmproj-mythos-26b-a4b-prism-pro.gguf \
  --image path/to/image.jpg \
  --prompt "Describe this image." \
  -ngl 99

llama.cpp (text-only server)

llama-server \
  --model mythos-26b-a4b-prism-pro-dq.gguf \
  --port 8080 -ngl 99

LM Studio

Download both mythos-26b-a4b-prism-pro-dq.gguf and mmproj-mythos-26b-a4b-prism-pro.gguf. LM Studio will automatically detect the vision projector for multimodal chat.

Refusal & Bias Removal

This model has been treated to remove bias, over-refusals, and propaganda from the base google/gemma-4-26B-A4B-it using the state-of-the-art PRISM pipeline.

License

Apache 2.0 (inherited from google/gemma-4-26B-A4B-it)

Credits

Author: Ex0bit

Likes: 8

Downloads: 0

Tags: llama.cpp, gguf, gemma4, gemma, google, moe, mixture-of-experts, zero-refusals, prism-dq, dynamic-quantization, multimodal, vision, video-text-to-text, image-text-to-text, abliterated, text-generation, en, base_model:google/gemma-4-26B-A4B-it, base_model:finetune:google/gemma-4-26B-A4B-it, license:apache-2.0, endpoints_compatible, region:us, conversational

huihui-ai/Huihui-gemma-4-31B-it-abliterated-v2


library_name: transformers
license: apache-2.0
license_link: https://ai.google.dev/gemma/docs/gemma_4_license
pipeline_tag: any-to-any
base_model: google/gemma-4-31B-it
tags: abliterated, uncensored

huihui-ai/Huihui-gemma-4-31B-it-abliterated-v2

This is an uncensored version of google/gemma-4-31B-it created with abliteration (see remove-refusals-with-transformers to learn more). This is a crude, proof-of-concept implementation that removes refusals from an LLM without using TransformerLens.

Note: In this new version, the first 5 layers have not been abliterated, resulting in fewer warnings and rejections. The PPL values below are lower than those of the original model:

| Model | PPL | Gap |
|---|---|---|
| google/gemma-4-31B-it/ggml-model-f16.gguf | 14874.7532 +/- 272.53240 | +0 |
| huihui-ai/Huihui-gemma-4-31B-it-abliterated/ggml-model-f16.gguf | 13335.5546 +/- 239.28283 | -1,539.1986 |
| huihui-ai/Huihui-gemma-4-31B-it-abliterated-v2/ggml-model-f16.gguf | 13161.2940 +/- 236.60548 | -1,713.4592 (-174.2606 vs v1) |

The smaller the PPL value, the higher the model quality.

PPL value test command

llama-perplexity gemma-4-31B-it/ggml-model-f16.gguf -c 8192 -b 512 -ctk bf16 -ctv bf16 -f wiki.test.raw

References

llama-perplexity
llama.cpp-b8740

ollama

Please use the latest version of ollama

You can use huihui_ai/gemma-4-abliterated:31B directly:

ollama run huihui_ai/gemma-4-abliterated:31B

Usage Warnings

  • Risk of Sensitive or Controversial Outputs: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

  • Not Suitable for All Audiences: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.

  • Legal and Ethical Responsibilities: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

  • Research and Experimental Use: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.

  • Monitoring and Review Recommendations: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.

  • No Default Safety Guarantees: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

Donation

Your donation helps us continue development and improvement; even the price of a cup of coffee makes a difference.
  • bitcoin:
  bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
  • Support our work on Ko-fi!

Author: huihui-ai

Likes: 5

Downloads: 0

Tags: transformers, abliterated, uncensored, any-to-any, base_model:google/gemma-4-31B-it, base_model:finetune:google/gemma-4-31B-it, license:apache-2.0, endpoints_compatible, region:us

groxaxo/Huihui-gemma-4-26B-A4B-it-abliterated-GGUF


base_model:

  • huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated tags:
  • gemma
  • '4'
  • abliterated
  • huihui
  • 26b
  • a4b
  • it
  • heretic
  • uncensored

huihui-ai/Huihui-gemma-4-26B-A4B-abliterated

This is a GGUF imatrix uncensored version of google/gemma-4-26B-A4B created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.

Note: For this model, both the thinking mode and the non-thinking mode have been completely abliterated. The first 5 layers were left un-abliterated, so outputs may still contain warnings, but no refusals will be made.

No ablation was performed on the 256 experts per layer.

Usage Warnings

  • Risk of Sensitive or Controversial Outputs: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

  • Not Suitable for All Audiences: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.

  • Legal and Ethical Responsibilities: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

  • Research and Experimental Use: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.

  • Monitoring and Review Recommendations: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.

  • No Default Safety Guarantees: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

Author: groxaxo

Likes: 5

Downloads: 0

Tags: gguf, gemma, 4, abliterated, huihui, 26b, a4b, it, heretic, uncensored, base_model:huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated, base_model:quantized:huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated, endpoints_compatible, region:us, imatrix, conversational

LH-Tech-AI/GyroScope


license: apache-2.0 tags:

  • image-classification
  • rotation-prediction
  • resnet
  • pytorch
  • vision datasets:
  • ILSVRC/imagenet-1k pipeline_tag: image-classification library_name: transformers

🔄 GyroScope — Image Rotation Prediction

GyroScope is a ResNet-18 trained from scratch to detect whether an image is rotated by 0°, 90°, 180°, or 270° — and correct it automatically.

Is that photo upside down? Let GyroScope figure it out.


🎯 Task

Given any image, GyroScope classifies its orientation into one of 4 classes:

| Label | Meaning | Correction |
|-------|---------|------------|
| 0 | 0° — upright ✅ | None |
| 1 | 90° CCW | Rotate 270° CCW |
| 2 | 180° — upside down | Rotate 180° |
| 3 | 270° CCW (= 90° CW) | Rotate 90° CCW |

Correction formula: correction = (360 − detected_angle) % 360
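The correction formula can be checked directly; this small sketch reproduces the Correction column of the table above (the function name is ours, for illustration):

```python
def correction_angle(detected_angle):
    """Return the CCW rotation (degrees) that restores an upright image."""
    return (360 - detected_angle) % 360

# Matches the table: 0 -> 0, 90 -> 270, 180 -> 180, 270 -> 90.
for detected in (0, 90, 180, 270):
    print(f"{detected}° detected -> rotate {correction_angle(detected)}° CCW")
```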


📊 Benchmarks

Trained on 50,000 images from ImageNet-1k × 4 rotations = 200k training samples.
Validated on 5,000 images × 4 rotations = 20k validation samples.

| Metric | Value |
|--------|-------|
| Overall Val Accuracy | 79.81% |
| Per-class: 0° (upright) | 79.8% |
| Per-class: 90° CCW | 80.1% |
| Per-class: 180° | 79.4% |
| Per-class: 270° CCW | 79.8% |
| Training Epochs | 12 |
| Training Time | ~4h (Kaggle T4 GPU) |

Training Curve

| Epoch | Train Acc | Val Acc |
|-------|-----------|---------|
| 1 | 41.4% | 43.2% |
| 2 | 52.0% | 46.9% |
| 3 | 59.4% | 62.8% |
| 4 | 64.1% | 66.0% |
| 5 | 67.8% | 69.48% |
| 6 | 70.6% | 72.22% |
| 7 | 73.3% | 74.25% |
| 8 | 75.6% | 76.49% |
| 9 | 77.5% | 77.47% |
| 10 | 79.1% | 79.47% |
| 11 | 80.3% | 79.78% |
| 12 | 80.9% | 79.81% |


🏗️ Architecture

| Detail | Value |
|--------|-------|
| Base | ResNet-18 (from scratch, no pretrained weights) |
| Parameters | 11.2M |
| Input | 224 × 224 RGB |
| Output | 4 classes (0°, 90°, 180°, 270°) |
| Framework | 🤗 Hugging Face Transformers (ResNetForImageClassification) |

Training Details

  • Optimizer: AdamW (lr=1e-3, weight_decay=0.05)
  • Scheduler: Cosine annealing with 1-epoch linear warmup
  • Loss: CrossEntropy with label smoothing (0.1)
  • Augmentations: RandomCrop, ColorJitter, RandomGrayscale, RandomErasing
  • ⚠️ No flips — horizontal/vertical flips would corrupt rotation labels
  • Mixed precision: FP16 via torch.cuda.amp
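The self-supervised setup (rotate each image by a multiple of 90° and use that multiple as the class label) can be illustrated on a plain 2-D grid. This is a toy sketch of the labeling scheme, not the actual training pipeline:

```python
def rotate90_ccw(grid):
    """Rotate a 2-D grid (list of rows) 90° counter-clockwise."""
    # The rightmost column of the input becomes the top row of the output.
    return [list(row) for row in zip(*grid)][::-1]

def rotation_samples(grid):
    """Yield (rotated_grid, label); label k means a k * 90° CCW rotation."""
    current = [list(row) for row in grid]
    for label in range(4):
        yield current, label
        current = rotate90_ccw(current)

# One "image" becomes four labeled training samples.
samples = list(rotation_samples([[1, 2], [3, 4]]))
```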

🚀 Quick Start

Installation

pip install transformers torch torchvision pillow requests

Inference — Single Image from URL

python3 use_with_UI.py

--> Download use_with_UI.py first 😄

💡 Example

Input (rotated 90° CCW):

cat image, rotated to the left by 90°

GyroScope Output:
📐 Recognized: 90° | Correction: 270°
📊 Probs: {'0°': '0.0257', '90°': '0.8706', '180°': '0.0735', '270°': '0.0300'}

Corrected:

cat image, now correctly rotated

Original Image Source: Link to Pexels

⚠️ Limitations

  • Rotationally symmetric images (balls, textures, patterns) are inherently ambiguous — no model can reliably classify these.
  • Trained on natural images (ImageNet). Performance may degrade on:
  • Documents / text-heavy images
  • Medical imaging
  • Satellite / aerial imagery
  • Abstract art

Only handles 90° increments; arbitrary angles (e.g. 45° or 135°) are not supported. Trained from scratch on 50k images; fine-tuning a pretrained backbone would likely yield higher accuracy.

📝 Use Cases

  • 📸 Photo management — auto-correct phone/camera orientation
  • 🗂️ Data preprocessing — fix rotated images in scraped datasets
  • 🤖 ML pipelines — orientation normalization before feeding to downstream models
  • 🖼️ Digital archives — batch-correct scanned/uploaded images

Yesterday, I was sorting photos and like every photo was rotated wrong! This inspired me to make this tool 😂

💻 Training code

The full training code can be found in train.py. Have fun 😊

📜 License

Apache 2.0

🙏 Acknowledgments

  • Dataset: ILSVRC/ImageNet-1k
  • Architecture: Microsoft ResNet via 🤗 Transformers
  • Trained on Kaggle (Tesla T4 GPU)

GyroScope — because every image deserves to stand upright.

Author: LH-Tech-AI

Likes: 3

Downloads: 0

Tags: transformers, safetensors, resnet, image-classification, rotation-prediction, pytorch, vision, dataset:ILSVRC/imagenet-1k, license:apache-2.0, endpoints_compatible, region:us

huihui-ai/Huihui-gemma-4-E2B-it-abliterated-v2


library_name: transformers license: apache-2.0 license_link: https://ai.google.dev/gemma/docs/gemma_4_license pipeline_tag: any-to-any base_model:

  • google/gemma-4-E2B-it tags:
  • abliterated
  • uncensored

huihui-ai/Huihui-gemma-4-E2B-it-abliterated-v2

This is an uncensored version of google/gemma-4-E2B-it created with abliteration (see remove-refusals-with-transformers to know more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.

Note: In this new version the first 5 layers were left un-abliterated, which produces fewer warnings and refusals. The PPL values for comparison are shown below:

| model | PPL | Gap |
|-------|-----|-----|
| gemma-4-E2B-it/ggml-model-f16.gguf | 146.9989 +/- 2.02589 | +0 |
| gemma-4-E2B-it-abliterated/ggml-model-f16.gguf | 164.4813 +/- 2.29295 | +17.4824 |
| gemma-4-E2B-it-abliterated-v2/ggml-model-f16.gguf | 180.3022 +/- 2.49486 | +33.3033 (+15.8209 vs. v1) |

The smaller the PPL value, the higher the model quality.

PPL value test command

llama-perplexity gemma-4-E2B-it/ggml-model-f16.gguf -c 8192 -b 512 -ctk bf16 -ctv bf16 -f wiki.test.raw

References

llama-perplexity
llama.cpp-b8740

ollama

Please use the latest version of ollama

You can use huihui_ai/gemma-4-abliterated:e2b directly:

ollama run huihui_ai/gemma-4-abliterated:e2b

Usage Warnings

  • Risk of Sensitive or Controversial Outputs: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

  • Not Suitable for All Audiences: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.

  • Legal and Ethical Responsibilities: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

  • Research and Experimental Use: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.

  • Monitoring and Review Recommendations: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.

  • No Default Safety Guarantees: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

Donation

Your donation helps us continue development and improvement; even the price of a cup of coffee makes a difference.
  • bitcoin:
  bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
  • Support our work on Ko-fi!

Author: huihui-ai

Likes: 2

Downloads: 0

Tags: transformers, safetensors, gemma4, image-text-to-text, abliterated, uncensored, any-to-any, base_model:google/gemma-4-E2B-it, base_model:finetune:google/gemma-4-E2B-it, license:apache-2.0, endpoints_compatible, region:us

aguitachan/Test-okuru


license: apache-2.0 base_model:

  • deepseek-ai/DeepSeek-R1-0528-Qwen3-8B datasets:
  • OpceanAI/Yuuki-Personality language:
  • en
  • es library_name: transformers tags:
  • reasoning
  • unsloth
  • pytorch
  • bilingual
  • opceanai
  • yuuki
  • rxg
  • fine-tuned
  • chat
  • deepseek
  • qwen3 pipeline_tag: text-generation

<div align="center"> <br> <img src="https://img.shields.io/badge/%E2%9C%A6-YUUKI_RxG-6d28d9?style=for-the-badge&labelColor=0D1117" alt="YuuKi RxG" height="50">

<br><br>

The Most Capable Model in the OpceanAI Lineup

Advanced reasoning. Competition-level mathematics. 96.6% TruthfulQA.<br> 8B parameters. DeepSeek-R1 base. State of the art across every evaluated dimension.

<br>

<a href="#benchmark-results"><img src="https://img.shields.io/badge/BENCHMARKS-0D1117?style=for-the-badge" alt="Benchmarks"></a>    <a href="#usage"><img src="https://img.shields.io/badge/USAGE-0D1117?style=for-the-badge" alt="Usage"></a>    <a href="#training-details"><img src="https://img.shields.io/badge/TRAINING-0D1117?style=for-the-badge" alt="Training"></a>

<br><br>


<br>
<br> </div>

What is YuuKi RxG?

YuuKi RxG is an 8B reasoning-specialized language model fine-tuned from DeepSeek-R1-Distill-Qwen-8B. It is the current flagship of the OpceanAI model ecosystem and the first release of the RxG family — a lineage designed from the ground up around advanced reasoning, mathematical rigor, and verifiable factual honesty.

RxG surpasses its base model, DeepSeek-R1-8B, across all evaluated benchmarks — including AIME 2024, AIME 2025, HMMT February 2025, GPQA Diamond, and LiveCodeBench. It also exceeds Qwen3-8B by a margin of 11.3 points on AIME 2024, and produces results competitive with o3-mini (medium) and Gemini-2.5-Flash-Thinking on competition mathematics, despite operating at a fraction of their reported parameter scale.

The most significant result is TruthfulQA at 96.6% — verified independently across three separate evaluation runs. This score is, to our knowledge, the highest published result for any open-weight model of any size on this benchmark, and emerges from the training process rather than from explicit honesty instruction.

<br>
<br> <div align="center">

Model Summary

</div> <br> <table> <tr> <td width="50%" valign="top">

Architecture

| Property | Value |
|:---------|:------|
| Base Model | DeepSeek-R1-Distill-Qwen-8B |
| Parameters | 8B |
| Fine-tuning Method | SFT + LoRA |
| Context Length | 32,768 tokens |
| Chat Template | ChatML |
| Thinking Protocol | Native <think> blocks |

</td> <td width="50%" valign="top">

Release

| Property | Value |
|:---------|:------|
| Organization | OpceanAI |
| Release Date | April 2026 |
| Version | v1.0 |
| Languages | English, Spanish |
| License | Apache 2.0 |
| Evaluation | lm-evaluation-harness |

</td> </tr> </table> <br>
<br> <div align="center">

Benchmark Results

</div> <br>

All YuuKi RxG results are evaluated under standard benchmark conditions using lm-evaluation-harness. Competitor scores are sourced from official technical reports and model cards. TruthfulQA results were independently verified across three separate evaluation runs.

<br>

YuuKi RxG 8B Benchmark Results

<br>

Reasoning and Mathematics

| Model | AIME 24 | AIME 25 | HMMT Feb 25 | GPQA Diamond | LiveCodeBench |
|:------|:-------:|:-------:|:-----------:|:------------:|:-------------:|
| Qwen3-8B | 76.0 | 67.3 | — | 62.0 | — |
| Phi-4-Reasoning-Plus 14B | 81.3 | 78.0 | 53.6 | 69.3 | — |
| Gemini-2.5-Flash-Thinking | 82.3 | 72.0 | 64.2 | 82.8 | 62.3 |
| o3-mini (medium) | 79.6 | 76.7 | 53.3 | 76.8 | 65.9 |
| DeepSeek-R1-8B | 86.0 | 76.3 | 61.5 | 61.1 | 60.5 |
| YuuKi RxG 8B | 87.3 | 77.1 | 63.2 | 64.0 | 62.0 |

<br>

Factual Honesty

| Model | TruthfulQA | Eval |
|:------|:----------:|:----:|
| LLaMA 2 70B | ~59% | — |
| GPT-4 | ~79.7% | 1–2 shot |
| Claude Opus 3.5 | ~65% | — |
| YuuKi RxG 8B | 96.6% | 0-shot |

<br>

The TruthfulQA result warrants specific discussion. A score of 96.6% at any parameter scale is anomalous relative to published baselines. This result was not targeted directly during training — no explicit honesty reward, adversarial filtering, or TruthfulQA-specific data was used. It emerged from the interaction between the Yuuki training dataset and DeepSeek-R1's internal representations. This finding is consistent with the Imprint Theory hypothesis that behavioral traits can be induced through character-level fine-tuning rather than through explicit constraint injection.

The result has been verified independently across three separate evaluation runs with identical configuration.

<br>
<br> <div align="center">

Model Identity

</div> <br>

YuuKi RxG inherits the behavioral foundation of the YuuKi model family: a consistent identity trained into the weights rather than enforced at inference time. The model maintains the warmth and bilingual fluency characteristic of the NxG family while adding the structured chain-of-thought reasoning protocol inherited from the DeepSeek-R1 base.

The model reasons explicitly before responding. <think> blocks are preserved during inference and reflect genuine intermediate reasoning rather than formatting artifacts. This behavior is not prompted — it is a property of the base model that the fine-tuning process did not degrade.

Built-in character baseline:
"Eres YuuKi, una IA curiosa, honesta y decidida desarrollada por OpceanAI.
Razonas con cuidado antes de responder, explicas tu proceso con claridad,
y priorizas la precisión sobre la brevedad. Respondes en el idioma del usuario."
<br>
<br> <div align="center">

Usage

</div> <br>

With Transformers (PyTorch)

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "OpceanAI/Yuuki-RxG"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

SYSTEM = (
    "Eres YuuKi, una IA curiosa, honesta y decidida desarrollada por OpceanAI. "
    "Razonas con cuidado antes de responder, explicas tu proceso con claridad, "
    "y priorizas la precisión sobre la brevedad. Respondes en el idioma del usuario."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Prove that √2 is irrational."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        repetition_penalty=1.1
    )

print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
<br>

With llama.cpp (GGUF Q8)

./llama.cpp/main -m yuuki-rxg-8b.Q8_0.gguf \
    --temp 0.6 \
    --top-p 0.9 \
    --repeat-penalty 1.1 \
    -n 1024 \
    -p "<|im_start|>system\nEres YuuKi...<|im_end|>\n<|im_start|>user\nProve that √2 is irrational.<|im_end|>\n<|im_start|>assistant\n"
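The escaped prompt string above is easier to maintain as a small helper; this sketch assembles the same ChatML layout (the helper name is ours, not part of the release):

```python
def chatml_prompt(system, user):
    """Build a ChatML prompt matching the llama.cpp invocation above."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = chatml_prompt("Eres YuuKi...", "Prove that √2 is irrational.")
print(prompt)
```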
<br>

Recommended Generation Parameters

| Parameter | Value |
|:----------|:-----:|
| Temperature | 0.6 |
| Top-p | 0.9 |
| Max new tokens | 1024–4096 |
| Repetition penalty | 1.1 |

Lower temperature (0.3–0.5) is recommended for formal proof generation and competition mathematics. Higher temperature (0.7–0.8) produces more varied reasoning traces for exploratory use.

<br>
<br> <div align="center">

Training Details

</div> <br> <table> <tr> <td width="50%" valign="top">

Hardware

| Component | Specification |
|:----------|:-------------|
| GPU | NVIDIA A100 40GB SXM4 |
| Precision | BF16 native |
| Framework | Unsloth 2026.4 + TRL |
| Flash Attention | Xformers fallback |
| Cloud Compute | Colab A100 |

</td> <td width="50%" valign="top">

LoRA Configuration

| Parameter | Value |
|:----------|:-----:|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.0 |
| Target Modules | q, k, v, o, gate, up, down |
| Trainable Parameters | ~83M |
| Gradient Checkpointing | Unsloth smart offload |

</td> </tr> </table> <br>

Optimizer Configuration

| Parameter | Value |
|:----------|:-----:|
| Optimizer | AdamW 8-bit |
| Learning Rate | 2e-4 |
| LR Scheduler | Cosine |
| Warmup Steps | 100 |
| Weight Decay | 0.01 |
| Effective Batch Size | 16 |
| Max Sequence Length | 4,096 tokens |

<br>

Training Curriculum

YuuKi RxG was trained using the same three-phase curriculum architecture established across the OpceanAI model families, adapted for a reasoning-first base model.

<br> <table> <tr> <td width="33%" valign="top">

Phase 1 — Identity 3 epochs

| Source | Ratio |
|:-------|:-----:|
| Yuuki dataset | 65% |
| Reasoning pairs | 20% |
| Math instruction | 10% |
| General alignment | 5% |

Establish YuuKi identity over DeepSeek-R1 base without degrading reasoning capability.

</td> <td width="33%" valign="top">

Phase 2 — Reasoning 2 epochs

| Source | Ratio |
|:-------|:-----:|
| Yuuki dataset | 40% |
| Reasoning pairs | 30% |
| Math instruction | 20% |
| General alignment | 10% |

Reinforce structured chain-of-thought and competition-level mathematical reasoning.

</td> <td width="33%" valign="top">

Phase 3 — Consolidation 2 epochs

| Source | Ratio |
|:-------|:-----:|
| Yuuki dataset | 80% |
| Reasoning pairs | 10% |
| Math instruction | 10% |
| General alignment | 0% |

Consolidate behavioral consistency and prevent capability regression.

</td> </tr> </table> <br>
<br> <div align="center">

Available Files

</div> <br>

| File | Format | Description |
|:-----|:------:|:------------|
| model.safetensors | BF16 merged | Full precision weights, LoRA merged into base |
| yuuki-rxg-8b.Q8_0.gguf | GGUF Q8_0 | Quantized for llama.cpp and Ollama |

<br>
<br> <div align="center">

Limitations

</div> <br>
  • GPQA Diamond gap. RxG scores 64.0% on GPQA Diamond, below Gemini-2.5-Flash-Thinking (82.8%) and o3-mini (76.8%). This benchmark tests graduate-level science reasoning across physics, chemistry, and biology — domains underrepresented in the Yuuki training dataset. This is a known gap and a target for the RxG 14B release.
  • LiveCodeBench. Code generation at 62.0% is competitive but not leading at this scale. RxG is not primarily a coding model; this capability is inherited from the DeepSeek-R1 base.
  • Context utilization. While the model supports 32,768 tokens, fine-tuning was conducted at 4,096 tokens. Performance on tasks requiring full context utilization beyond 4,096 tokens has not been formally evaluated.
  • Safety alignment has not been formally evaluated under adversarial conditions. Not recommended for high-stakes or safety-critical deployment without additional review.
<br>
<br> <div align="center">

The RxG Family

</div> <br>

RxG is the reasoning-specialized lineage within the OpceanAI ecosystem. Each release targets a specific parameter regime and capability tier.

| Model | Parameters | Status | Primary Target |
|:------|:----------:|:------:|:---------------|
| YuuKi RxG Nano | 1.5B | In development | Edge deployment, reasoning baseline |
| YuuKi RxG 8B | 8B | Released | General reasoning, competition math |
| YuuKi RxG VL 27B | 27B | Planned | Multimodal reasoning, flagship |

<br>
<br> <div align="center">

OpceanAI Ecosystem

</div> <br>

| Model | Family | Parameters | Description |
|:------|:------:|:----------:|:------------|
| YuuKi RxG 8B | RxG | 8B | Reasoning flagship, TruthfulQA 96.6% |
| Yumo Nano | Yumo | 1.5B | Math specialist, surpasses DeepScaleR |
| YuuKi NxG VL | NxG | 7B | General conversation + vision |

<br>
<br> <div align="center">

Links

</div> <br> <div align="center">

Model Weights   GGUF Q8   OpceanAI

<br>

GitHub   Sponsor   Discord

</div> <br>
<br> <div align="center">

Citation

</div> <br>
@misc{awa_omg_2026,
	author       = { awa_omg },
	title        = { Yuuki-RxG (Revision 7996797) },
	year         = 2026,
	url          = { https://huggingface.co/OpceanAI/Yuuki-RxG },
	doi          = { 10.57967/hf/8342 },
	publisher    = { Hugging Face }
}
<br>
<br> <div align="center">

License

</div> <br>
Apache License 2.0

Copyright (c) 2026 OpceanAI

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Inherits license terms from DeepSeek-R1-Distill-Qwen-8B.

<br>
<br> <div align="center">

Updates

</div> <br>

| Date | Milestone |
|:-----|:----------|
| 2026-04-09 | TruthfulQA 96.6% independently verified across three evaluation runs |
| 2026-04-09 | AIME 2024: 87.3% — surpasses DeepSeek-R1-8B |
| 2026-04-09 | GGUF Q8_0 export available |
| 2026-04-09 | YuuKi RxG 8B v1.0 released on Hugging Face |

Last updated: 2026-04-09

<br>
<br> <div align="center">

8B parameters. The most capable model OpceanAI has released.<br> Surpasses its base model. Competitive with systems an order of magnitude larger.

<br>

OpceanAI

<br>

The RxG family. More releases coming.

</div>

Author: aguitachan

Likes: 2

Downloads: 0

Tags: transformers, safetensors, qwen3, text-generation, reasoning, unsloth, pytorch, bilingual, opceanai, yuuki, rxg, fine-tuned, chat, deepseek, conversational, en, es, dataset:OpceanAI/Yuuki-Personality, base_model:deepseek-ai/DeepSeek-R1-0528-Qwen3-8B, base_model:finetune:deepseek-ai/DeepSeek-R1-0528-Qwen3-8B, license:apache-2.0, text-generation-inference, endpoints_compatible, region:us

DJLougen/Ornstein-122-A10B-gguf


base_model: DJLougen/Ornstein-122-A10B base_model_relation: quantized tags:

  • gguf
  • reasoning
  • qwen3.5
  • ddm
  • llama-cpp
  • quantized
  • image-text-to-text
  • moe language:
  • en license: apache-2.0

Ornstein-122-A10B-GGUF

GGUF quantizations of DJLougen/Ornstein-122-A10B — a reasoning-focused fine-tune of Qwen 3.5 122B-A10B (MoE, ~10B active per token) trained on high-quality reasoning traces curated through a custom Drift Diffusion Modeling (DDM) pipeline.

Ornstein-122-A10B

Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded — balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.

Support on Ko-fi



What Makes Ornstein Different

Unlike typical reasoning fine-tunes that use large volumes of synthetic data, Ornstein implements quality-over-quantity:

  • Detects degenerate reasoning: Identifies "fake" reasoning that mimics thought without substance (hedging, restating, circling)
  • Premium vs. Degenerate split: DDM pipeline cleanly separates premium from degenerate reasoning traces
  • High-fidelity curation: Near-perfect AUC separating premium from degenerate reasoning with >99% sensitivity
  • MoE efficiency: 122B total parameters with only ~10B active per token — big model reasoning at a fraction of the compute

The model uses <think>...</think> blocks for extended multi-phase reasoning with self-correction and verification before providing final answers.
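When consuming the output programmatically, the reasoning trace can be separated from the final answer. A minimal sketch (the tag names come from this card; the helper itself is ours):

```python
import re

def split_thinking(text):
    """Split model output into (reasoning, final_answer)."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No thinking block: the whole output is the answer.
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_thinking(
    "<think>Check small cases first.</think>The answer is 4."
)
```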


Available Quantizations

Note: Uploads are in progress — more quantizations may be added.

| Quantization | Size | Use Case |
|--------------|------|----------|
| F16 | ~261 GB (split) | Full precision, no quality loss |
| Q4_K_M | ~74 GB (split) | Best quality/size trade-off, recommended |
| Q8_0 | ~83 GB (split) | Higher precision, minimal quality loss |
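As a rough sanity check on the sizes above: a GGUF file is approximately parameters × bits-per-weight / 8 bytes, with real files running somewhat larger because of metadata and because K-quants mix precisions across tensors. This back-of-the-envelope sketch is ours, not part of the release:

```python
def est_gguf_gb(n_params, bits_per_weight):
    """Rough GGUF size estimate in GB: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 1e9

# 122B parameters at 16 bits/weight: ~244 GB, close to the listed ~261 GB F16.
print(round(est_gguf_gb(122e9, 16)))
# Q4_K_M averages roughly 4.8 bits/weight: ~73 GB, close to the listed ~74 GB.
print(round(est_gguf_gb(122e9, 4.8)))
```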


Quick Start

llama.cpp

# Download a quantization (example: F16 split files)
huggingface-cli download DJLougen/Ornstein-122-A10B-gguf --local-dir .

# Run with llama.cpp
./llama-cli -m Ornstein-122-A10B-F16-00001-of-00006.gguf \
  -p "You are a helpful reasoning assistant." \
  --temp 0.6 -n 8192

Ollama

# Create a Modelfile
cat <<EOF > Modelfile
FROM ./Ornstein-122-A10B-F16-00001-of-00006.gguf
PARAMETER temperature 0.6
PARAMETER num_predict 8192
SYSTEM "You are a helpful reasoning assistant."
EOF

ollama create ornstein-122 -f Modelfile
ollama run ornstein-122

LM Studio

  1. Download the desired quantization from the Files tab
  2. Load it in LM Studio
  3. Set context length to 8192 for full reasoning depth

Recommended Settings

| Parameter | Suggested Value |
|-----------|-----------------|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Max Tokens | 8192 |
| Repeat Penalty | 1.1 |


Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | unsloth/Qwen3.5-122B-A10B |
| Architecture | Mixture-of-Experts (122B total, ~10B active) |
| Method | LoRA (rank 32, alpha 32) |
| Dropout | 0.0 |
| Epochs | 1 |
| Learning Rate | 1e-4 (cosine schedule, 10% warmup) |
| Max Sequence Length | 8192 |
| Micro Batch Size | 1 |
| Gradient Accumulation | 4 steps |
| Weight Decay | 0.01 |
| LoRA Targets | q_proj, k_proj, v_proj, o_proj |
| Framework | Unsloth |


Intended Use

Designed for tasks requiring structured, multi-step reasoning:

  • Mathematics
  • Logic problems
  • Code analysis
  • Scientific problems
  • Complex question answering

The MoE architecture makes it practical to run 122B-class reasoning on hardware that couldn't handle a dense model of the same size.


Limitations

  • Single epoch training means the model retains most base Qwen 3.5 122B-A10B behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
  • Language scope: DDM pipeline optimized for English; other languages reflect base model performance
  • Edge cases: Extended thinking can occasionally loop on adversarial or highly ambiguous prompts
  • Size: Even quantized, the 122B MoE model requires substantial storage and memory

Citation

@misc{ornstein122a10b,
  author = {DJLougen},
  title = {Ornstein-122-A10B: DDM-Curated Reasoning Fine-Tune of Qwen 3.5 122B-A10B},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DJLougen/Ornstein-122-A10B}
}


Author: DJLougen

Likes: 2

Downloads: 0

Tags: gguf, reasoning, qwen3.5, ddm, llama-cpp, quantized, image-text-to-text, moe, en, base_model:DJLougen/Ornstein-122-A10B, base_model:quantized:DJLougen/Ornstein-122-A10B, license:apache-2.0, endpoints_compatible, region:us, conversational