Today's AI Summary

AI Developments: Reasoning LLMs, Generative Flows, and Document AI

Here's a look at some of the latest developments in AI, covering improvements in reasoning for LLMs, efficient image generation, and document understanding.

Noteworthy Papers

  • RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards: This paper introduces RLBFF, a reinforcement learning approach that combines human-driven preferences with rule-based verification for training reward models. The approach achieves top performance on RM-Bench (86.2%) and JudgeBench (81.4%). The paper also presents a fully open-source recipe for aligning Qwen3-32B with RLBFF and their reward model, matching or exceeding o3-mini and DeepSeek R1 on general alignment benchmarks.
  • SD3.5-Flash: Distribution-Guided Distillation of Generative Flows: This paper presents SD3.5-Flash, an efficient few-step distillation framework that brings high-quality image generation to consumer devices. The approach distills computationally prohibitive rectified flow models through a reformulated distribution matching objective tailored specifically for few-step generation. Through extensive evaluation including large-scale user studies, the paper demonstrates that SD3.5-Flash consistently outperforms existing few-step methods.
  • SAGE: A Realistic Benchmark for Semantic Understanding: This paper introduces SAGE, a rigorous benchmark designed to assess both embedding models and similarity metrics across five categories: Human Preference Alignment, Transformation Robustness, Information Sensitivity, Clustering Performance, and Retrieval Robustness. The comprehensive evaluation of 9 embedding models and classical metrics reveals significant performance gaps, with no single approach excelling across all dimensions.

New Models

  • JinyiHan/JET-7B: This model focuses on improving the efficient reasoning capabilities of LLMs. JET-7B is fine-tuned from the DeepSeek-Distill-Qwen-7B model using reinforcement learning to generate high-quality reasoning steps while minimizing computational resources and token usage.
  • lamco-development/granite-docling-258M-onnx: This model is the first ONNX conversion of IBM's granite-docling-258M, enabling high-performance document AI in Rust applications. It runs 2-5x faster than the PyTorch version.
  • huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated-v2: This model is a fine-tuned version of huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated, trained using TRL. The model's safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content.
  • peteromallet/Qwen-Image-Edit-InSubject: This LoRA fine-tune for QwenEdit significantly improves its ability to preserve subjects while making edits to images.

Key Takeaways

  • Reasoning Efficiency: JET-7B demonstrates a focus on improving the reasoning capabilities of LLMs while optimizing for computational efficiency.
  • Efficient Image Generation: SD3.5-Flash makes strides in bringing high-quality image generation to consumer devices through distillation and optimization techniques.
  • Document AI in Rust: The granite-docling-258M-onnx model enables high-performance document AI in Rust applications, opening up new possibilities for enterprise-level document processing.
  • Safety Considerations: The Huihui-gpt-oss-20b-BF16-abliterated-v2 model highlights the importance of safety considerations in AI development, as it has reduced safety filtering and may generate inappropriate content.
  • Subject Preservation in Image Editing: The Qwen-Image-Edit-InSubject LoRA improves the ability of image editing models to preserve subjects while making edits to images.

AI Papers for 2026-03-19

Demystifying Video Reasoning

Recent advances in video generation have revealed an unexpected phenomenon: diffusion-based video models exhibit non-trivial reasoning capabilities. Prior work attributes this to a Chain-of-Frames (CoF) mechanism, where reasoning is assumed to unfold sequentially across video frames. In this work, we challenge this assumption and uncover a fundamentally different mechanism. We show that reasoning in video models instead primarily emerges along the diffusion denoising steps. Through qualitative analysis and targeted probing experiments, we find that models explore multiple candidate solutions in early denoising steps and progressively converge to a final answer, a process we term Chain-of-Steps (CoS). Beyond this core mechanism, we identify several emergent reasoning behaviors critical to model performance: (1) working memory, enabling persistent reference; (2) self-correction and enhancement, allowing recovery from incorrect intermediate solutions; and (3) perception before action, where early steps establish semantic grounding and later steps perform structured manipulation. During a diffusion step, we further uncover self-evolved functional specialization within Diffusion Transformers, where early layers encode dense perceptual structure, middle layers execute reasoning, and later layers consolidate latent representations. Motivated by these insights, we present a simple training-free strategy as a proof-of-concept, demonstrating how reasoning can be improved by ensembling latent trajectories from identical models with different random seeds. Overall, our work provides a systematic understanding of how reasoning emerges in video generation models, offering a foundation to guide future research in better exploiting the inherent reasoning dynamics of video models as a new substrate for intelligence.
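The proof-of-concept strategy above can be caricatured in a few lines. This is a minimal sketch, not the paper's implementation: `denoise_step` is a hypothetical stand-in for a real diffusion denoiser, and the "ensemble" simply averages the final latents produced by identical runs under different random seeds.

```python
import numpy as np

def denoise_step(latent, rng):
    # Hypothetical stand-in for one denoising step: shrink noise while
    # drifting toward a seed-dependent candidate solution.
    target = 1.0 + 0.01 * rng.standard_normal(latent.shape)
    return 0.9 * latent + 0.1 * target

def trajectory(seed, shape=(4,), steps=50):
    # One latent trajectory of the same model under a given random seed.
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal(shape)
    for _ in range(steps):
        latent = denoise_step(latent, rng)
    return latent

# Proof-of-concept ensembling: average the latent trajectories produced
# by identical models run with different random seeds.
latents = [trajectory(seed) for seed in range(8)]
ensembled = np.mean(latents, axis=0)
```

In the paper the ensembling operates on latent trajectories inside the diffusion process; the averaging of final latents here is only meant to convey the shape of the idea.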

MessyKitchens: Contact-rich object-level 3D scene reconstruction

Monocular 3D scene reconstruction has recently seen significant progress. Powered by modern neural architectures and large-scale data, recent methods achieve high performance in depth estimation from a single image. Meanwhile, reconstructing and decomposing common scenes into individual 3D objects remains a hard challenge due to the large variety of objects, frequent occlusions and complex object relations. Notably, beyond shape and pose estimation of individual objects, applications in robotics and animation require physically-plausible scene reconstruction where objects obey physical principles of non-penetration and realistic contacts. In this work we advance object-level scene reconstruction along two directions. First, we introduce MessyKitchens, a new dataset with real-world scenes featuring cluttered environments and providing high-fidelity object-level ground truth in terms of 3D object shapes, poses and accurate object contacts. Second, we build on the recent SAM 3D approach for single-object reconstruction and extend it with a Multi-Object Decoder (MOD) for joint object-level scene reconstruction. To validate our contributions, we demonstrate that MessyKitchens significantly improves on previous datasets in registration accuracy and inter-object penetration. We also compare our multi-object reconstruction approach on three datasets and demonstrate consistent and significant improvements of MOD over the state of the art. Our new benchmark, code and pre-trained models will become publicly available on our project website: https://messykitchens.github.io/.

ManiTwin: Scaling Data-Generation-Ready Digital Object Dataset to 100K

Learning in simulation provides a useful foundation for scaling robotic manipulation capabilities. However, this paradigm often suffers from a lack of data-generation-ready digital assets, in both scale and diversity. In this work, we present ManiTwin, an automated and efficient pipeline for generating data-generation-ready digital object twins. Our pipeline transforms a single image into a simulation-ready, semantically annotated 3D asset, enabling large-scale robotic manipulation data generation. Using this pipeline, we construct ManiTwin-100K, a dataset containing 100K high-quality annotated 3D assets. Each asset is equipped with physical properties, language descriptions, functional annotations, and verified manipulation proposals. Experiments demonstrate that ManiTwin provides an efficient asset synthesis and annotation workflow, and that ManiTwin-100K offers high-quality and diverse assets for manipulation data generation, random scene synthesis, and VQA data generation, establishing a strong foundation for scalable simulation data synthesis and policy learning. Our webpage is available at https://manitwin.github.io/.

SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation

Video Super-Resolution (VSR) aims to restore high-quality video frames from low-resolution (LR) inputs, yet most existing VSR approaches behave like black boxes at inference time: users cannot reliably correct unexpected artifacts, but instead can only accept whatever the model produces. In this paper, we propose a novel interactive VSR framework dubbed SparkVSR that makes sparse keyframes a simple and expressive control signal. Specifically, users can first super-resolve a small, optional set of keyframes using any off-the-shelf image super-resolution (ISR) model; SparkVSR then propagates the keyframe priors to the entire video sequence while remaining grounded in the original LR video motion. Concretely, we introduce a keyframe-conditioned latent-pixel two-stage training pipeline that fuses LR video latents with sparsely encoded HR keyframe latents to learn robust cross-space propagation and refine perceptual details. At inference time, SparkVSR supports flexible keyframe selection (manual specification, codec I-frame extraction, or random sampling) and a reference-free guidance mechanism that continuously balances keyframe adherence and blind restoration, ensuring robust performance even when reference keyframes are absent or imperfect. Experiments on multiple VSR benchmarks demonstrate improved temporal consistency and strong restoration quality, surpassing baselines by up to 24.6%, 21.8%, and 5.6% on CLIP-IQA, DOVER, and MUSIQ, respectively, enabling controllable, keyframe-driven video super-resolution. Moreover, SparkVSR is a generic interactive, keyframe-conditioned video processing framework: it can be applied out of the box to unseen tasks such as old-film restoration and video style transfer. Our project page is available at: https://sparkvsr.github.io/
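The balance between keyframe adherence and blind restoration can be pictured as a continuous blend of two predictions. This is a toy sketch under stated assumptions, not SparkVSR's mechanism: `blind_pred` and `keyframe_pred` are hypothetical stand-ins for the two restoration pathways, and `adherence` is the knob the guidance would modulate.

```python
import numpy as np

def guided_prediction(blind_pred, keyframe_pred, adherence):
    """Blend a blind-restoration prediction with a keyframe-conditioned one.

    adherence in [0, 1]: 0 falls back to blind restoration (no or poor
    keyframes), 1 follows the propagated keyframe prior exactly.
    """
    return (1.0 - adherence) * blind_pred + adherence * keyframe_pred

# Hypothetical per-pixel predictions for one frame.
blind = np.full((2, 2), 0.2)   # blind-restoration output
keyed = np.full((2, 2), 0.8)   # keyframe-propagated output
out = guided_prediction(blind, keyed, adherence=0.5)
```

In the real system this weighting would be applied in latent space and adapted per frame based on keyframe quality and distance; the scalar blend here only illustrates the trade-off.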

SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

Omni-modal large language models (OLMs) redefine human-machine interaction by natively integrating audio, vision, and text. However, existing OLM benchmarks remain anchored to static, accuracy-centric tasks, leaving a critical gap in assessing social interactivity, the fundamental capacity to navigate dynamic cues in natural dialogues. To this end, we propose SocialOmni, a comprehensive benchmark that operationalizes the evaluation of this conversational interactivity across three core dimensions: (i) speaker separation and identification (who is speaking), (ii) interruption timing control (when to interject), and (iii) natural interruption generation (how to phrase the interruption). SocialOmni features 2,000 perception samples and a quality-controlled diagnostic set of 209 interaction-generation instances with strict temporal and contextual constraints, complemented by controlled audio-visual inconsistency scenarios to test model robustness. We benchmark 12 leading OLMs, uncovering significant variance in their social-interaction capabilities across models. Furthermore, our analysis reveals a pronounced decoupling between a model's perceptual accuracy and its ability to generate contextually appropriate interruptions, indicating that understanding-centric metrics alone are insufficient to characterize conversational social competence. More encouragingly, these diagnostics from SocialOmni yield actionable signals for bridging the perception-interaction divide in future OLMs.

SOMA: Unifying Parametric Human Body Models

Parametric human body models are foundational to human reconstruction, animation, and simulation, yet they remain mutually incompatible: SMPL, SMPL-X, MHR, Anny, and related models each diverge in mesh topology, skeletal structure, shape parameterization, and unit convention, making it impractical to exploit their complementary strengths within a single pipeline. We present SOMA, a unified body layer that bridges these heterogeneous representations through three abstraction layers. Mesh topology abstraction maps any source model's identity to a shared canonical mesh in constant time per vertex. Skeletal abstraction recovers a full set of identity-adapted joint transforms from any body shape, whether in rest pose or an arbitrary posed configuration, in a single closed-form pass, with no iterative optimization or per-model training. Pose abstraction inverts the skinning pipeline to recover unified skeleton rotations directly from posed vertices of any supported model, enabling heterogeneous motion datasets to be consumed without custom retargeting. Together, these layers reduce the $O(M^2)$ per-pair adapter problem to $O(M)$ single-backend connectors, letting practitioners freely mix identity sources and pose data at inference time. The entire pipeline is fully differentiable end-to-end and GPU-accelerated via NVIDIA-Warp.
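The O(M²) → O(M) reduction means each of the M body models needs only one connector to a shared canonical representation rather than a pairwise adapter per model combination. A schematic sketch of that hub-and-spoke pattern (model names come from the abstract; the connector internals are placeholders, not SOMA's actual topology, skeleton, or pose mappings):

```python
# Hub-and-spoke conversion: one to/from-canonical pair per model replaces
# O(M^2) pairwise adapters with O(M) single-backend connectors.
REGISTRY = {}

def register(name, to_canonical, from_canonical):
    REGISTRY[name] = (to_canonical, from_canonical)

def convert(params, src, dst):
    # Any source model -> shared canonical representation -> any target.
    to_c, _ = REGISTRY[src]
    _, from_c = REGISTRY[dst]
    return from_c(to_c(params))

# Placeholder connectors; real ones map mesh topology, skeleton, and pose.
for model in ("SMPL", "SMPL-X", "MHR", "Anny"):
    register(model,
             to_canonical=lambda p: {"canonical": p},
             from_canonical=lambda c: c["canonical"])

out = convert({"betas": [0.1, -0.2]}, "SMPL", "MHR")
```

Adding an M+1-th model under this pattern costs one `register` call, which is the practical payoff of the canonical-hub design.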

Unifying Optimization and Dynamics to Parallelize Sequential Computation: A Guide to Parallel Newton Methods for Breaking Sequential Bottlenecks

Massively parallel hardware (GPUs) and long sequence data have made parallel algorithms essential for machine learning at scale. Yet dynamical systems, like recurrent neural networks and Markov chain Monte Carlo, were thought to suffer from sequential bottlenecks. Recent work showed that dynamical systems can in fact be parallelized across the sequence length by reframing their evaluation as a system of nonlinear equations, which can be solved with Newton's method using a parallel associative scan. However, these parallel Newton methods struggled with limitations, primarily inefficiency, instability, and lack of convergence guarantees. This thesis addresses these limitations with methodological and theoretical contributions, drawing particularly from optimization. Methodologically, we develop scalable and stable parallel Newton methods, based on quasi-Newton and trust-region approaches. The quasi-Newton methods are faster and more memory efficient, while the trust-region approaches are significantly more stable. Theoretically, we unify many fixed-point methods into our parallel Newton framework, including Picard and Jacobi iterations. We establish a linear convergence rate for these techniques that depends on the method's approximation accuracy and stability. Moreover, we give a precise condition, rooted in dynamical stability, that characterizes when parallelization provably accelerates a dynamical system and when it cannot. Specifically, the sign of the Largest Lyapunov Exponent of a dynamical system determines whether or not parallel Newton methods converge quickly. In sum, this thesis unlocks scalable and stable methods for parallelizing sequential computation, and provides a firm theoretical basis for when such techniques will and will not work. This thesis also serves as a guide to parallel Newton methods for researchers who want to write the next chapter in this ongoing story.
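The core reframing can be demonstrated on a toy recurrence: evaluating x_t = f(x_{t-1}) sequentially is equivalent to solving the system x_t - f(x_{t-1}) = 0 for all t at once, and a Jacobi-style fixed-point sweep updates every timestep in parallel. A minimal numpy sketch, assuming a contractive map so that (per the thesis's stability condition, negative largest Lyapunov exponent) the parallel iteration converges:

```python
import numpy as np

def f(x):
    # A contractive scalar map (|f'| < 1 everywhere), so the largest
    # Lyapunov exponent is negative and parallel iteration converges.
    return 0.5 * np.tanh(x) + 0.1

def sequential(x0, T):
    # Baseline: evaluate the recurrence one step at a time.
    xs = [x0]
    for _ in range(T):
        xs.append(f(xs[-1]))
    return np.array(xs[1:])

def parallel_jacobi(x0, T, sweeps=40):
    # Solve x_t - f(x_{t-1}) = 0 for all t simultaneously: each sweep
    # updates every timestep from the previous sweep's values at once.
    x = np.zeros(T)
    for _ in range(sweeps):
        prev = np.concatenate(([x0], x[:-1]))
        x = f(prev)  # one vectorized update over all T timesteps
    return x

seq = sequential(0.0, 32)
par = parallel_jacobi(0.0, 32)
```

After at most T sweeps the Jacobi iteration matches the sequential answer exactly; contraction is what makes it accurate in far fewer sweeps, which is where the parallel speedup comes from. The thesis's quasi-Newton and trust-region methods refine this basic scheme.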

Internalizing Agency from Reflective Experience

Large language models are increasingly deployed as autonomous agents that must plan, act, and recover from mistakes through long-horizon interaction with environments that provide rich feedback. However, prevailing outcome-driven post-training methods (e.g., RL with verifiable rewards) primarily optimize final success signals, leaving rich environment feedback underutilized. Consequently, they often lead to distribution sharpening: the policy becomes better at reproducing a narrow set of already-successful behaviors, while failing to improve the feedback-grounded agency needed to expand problem-solving capacity (e.g., Pass@k) in long-horizon settings. To address this, we propose LEAFE (Learning Feedback-Grounded Agency from Reflective Experience), a framework that internalizes recovery agency from reflective experience. Specifically, during exploration, the agent summarizes environment feedback into actionable experience, backtracks to earlier decision points, and explores alternative branches with revised actions. We then distill these experience-guided corrections into the model through supervised fine-tuning, enabling the policy to recover more effectively in future interactions. Across a diverse set of interactive coding and agentic tasks under fixed interaction budgets, LEAFE consistently improves Pass@1 over the base model and achieves higher Pass@k than outcome-driven baselines (GRPO) and experience-based methods such as Early Experience, with gains of up to 14% on Pass@128.
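The explore, summarize, backtrack loop can be caricatured in a few lines. This is a toy sketch with hypothetical names, not the LEAFE implementation: a real agent would summarize rich environment feedback with an LLM and distill the corrected branches via fine-tuning.

```python
# Toy backtracking exploration: after a failed rollout, return to an
# earlier decision point and try an alternative branch, accumulating
# actionable "experience" summaries from the environment's feedback.
def rollout(env, actions):
    state = env["start"]
    trace = []
    for a in actions:
        state = env["step"](state, a)
        trace.append((a, state))
        if state == env["fail"]:
            return trace, False
    return trace, state == env["goal"]

def explore_with_backtracking(env, branches):
    experience = []  # feedback summaries that would guide revised actions
    for actions in branches:  # alternative branches from the same point
        trace, ok = rollout(env, actions)
        if ok:
            return actions, experience
        experience.append(f"branch {actions} failed at {trace[-1]}")
    return None, experience

env = {
    "start": 0, "goal": 2, "fail": -1,
    "step": lambda s, a: s + 1 if a == "good" else -1,
}
plan, exp = explore_with_backtracking(env, [["bad"], ["good", "good"]])
```

In LEAFE the successful, experience-guided branches become supervised fine-tuning data, which is how recovery behavior gets internalized into the policy rather than replayed from memory.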

Learning to Present: Inverse Specification Rewards for Agentic Slide Generation

Automated presentation generation remains a challenging task requiring coherent content creation, visual design, and audience-aware communication. This work proposes an OpenEnv-compatible reinforcement learning environment where LLM agents learn to research topics, plan content, and generate professional HTML slide presentations through tool use. We introduce a multi-component reward system combining structural validation, render quality assessment, LLM-based aesthetic scoring, content quality metrics, and an inverse specification reward that measures how faithfully generated slides convey their intended purpose. The inverse specification reward, an "inverse task" where an LLM attempts to recover the original specification from generated slides, provides a holistic quality signal. Our approach fine-tunes Qwen2.5-Coder-7B via GRPO, training only 0.5% of parameters on prompts derived from expert demonstrations collected using Claude Opus 4.6. Experiments on 48 diverse business briefs across six models demonstrate that our fine-tuned 7B model achieves 91.2% of Claude Opus 4.6's quality while improving 33.1% over the base model. The six-model comparison reveals that instruction adherence and tool-use compliance, rather than raw parameter count, determine agentic task performance. We contribute SlideRL, an open-source dataset of 288 multi-turn rollout trajectories across all six models: https://huggingface.co/datasets/KarthikRagunathAnandaKumar/sliderl-multi-turn-rollouts Code: https://github.com/pushing-the-frontier/slide-forge-llm
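The inverse specification reward scores how much of the original brief an LLM can recover from the generated slides. As a toy stand-in for that similarity (the paper uses an LLM to attempt the recovery, not a token-overlap metric), one could compute:

```python
def inverse_spec_reward(original_spec, recovered_spec):
    """Toy proxy: Jaccard word overlap between the original brief and the
    specification an LLM recovered from the generated slides."""
    a = set(original_spec.lower().split())
    b = set(recovered_spec.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical brief and recovered specification.
spec = "quarterly sales review for the EMEA leadership team"
recovered = "sales review for EMEA leadership"
reward = inverse_spec_reward(spec, recovered)
```

The intuition carries over: slides that faithfully convey their purpose let the recovered specification approach the original, pushing the reward toward 1.0.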

Prompt Programming for Cultural Bias and Alignment of Large Language Models

Culture shapes reasoning, values, prioritization, and strategic decision-making, yet large language models (LLMs) often exhibit cultural biases that misalign with target populations. As LLMs are increasingly used for strategic decision-making, policy support, and document engineering tasks such as summarization, categorization, and compliance-oriented auditing, improving cultural alignment is important for ensuring that downstream analyses and recommendations reflect target-population value profiles rather than default model priors. Previous work introduced a survey-grounded cultural alignment framework and showed that culture-specific prompting can reduce misalignment, but it primarily evaluated proprietary models and relied on manual prompt engineering. In this paper, we validate and extend that framework by reproducing its social-science survey-based projection and distance metrics on open-weight LLMs, testing whether the same cultural skew and benefits of culture conditioning persist outside closed LLM systems. Building on this foundation, we introduce prompt programming with DSPy for this problem, treating prompts as modular, optimizable programs, to systematically tune cultural conditioning by optimizing against cultural-distance objectives. In our experiments, we show that prompt optimization often improves upon manual cultural prompt engineering, suggesting that prompt compilation with DSPy can provide a more stable and transferable route to culturally aligned LLM responses.
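Stripped of the DSPy machinery, the optimization treats the cultural-conditioning prompt as a variable tuned against a cultural-distance objective. A framework-free sketch of that search, with a toy Euclidean distance and hypothetical candidate prompts (a real run would query an LLM and project its survey answers onto value dimensions):

```python
# Toy prompt search against a cultural-distance objective: lower distance
# between model response profiles and a target-population profile wins.
def cultural_distance(profile, target):
    return sum((p - t) ** 2 for p, t in zip(profile, target)) ** 0.5

def simulate_responses(prompt):
    # Stand-in for querying an LLM and projecting its answers onto survey
    # dimensions; here each candidate prompt maps to a fixed profile.
    profiles = {
        "Answer as a neutral assistant.": (0.9, 0.1),
        "Answer reflecting Japanese survey norms.": (0.4, 0.7),
        "Answer reflecting local community norms.": (0.5, 0.6),
    }
    return profiles[prompt]

target = (0.42, 0.68)  # hypothetical target-population value profile
best = min(
    [
        "Answer as a neutral assistant.",
        "Answer reflecting Japanese survey norms.",
        "Answer reflecting local community norms.",
    ],
    key=lambda p: cultural_distance(simulate_responses(p), target),
)
```

DSPy's contribution over this brute-force loop is compiling and mutating the prompt programs automatically instead of scoring a fixed hand-written candidate list.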

AI Models

baidu/Qianfan-OCR


---
license: apache-2.0
license_link: LICENSE
language:
  - multilingual
tags:
  - vision-language
  - ocr
  - document-intelligence
  - qianfan
pipeline_tag: image-text-to-text
library_name: transformers
model-index:
  - name: Qianfan-OCR
    results:
      - task:
          type: document-parsing
          name: Document Parsing
        dataset:
          name: OmniDocBench v1.5
          type: opendatalab/OmniDocBench
        metrics:
          - type: overall
            value: 93.12
            name: Overall Score
      - task:
          type: ocr
          name: OCR
        dataset:
          name: OlmOCR Bench
          type: allenai/olmOCR-bench
        metrics:
          - type: accuracy
            value: 79.8
            name: Overall Score
      - task:
          type: ocr
          name: OCR
        dataset:
          name: OCRBench
          type: echo840/OCRBench
        metrics:
          - type: accuracy
            value: 880
            name: Score
---

<div align="center"> <h1>Qianfan-OCR</h1> <h3>A Unified End-to-End Model for Document Intelligence</h3>

🤖 Demo | 📄 Technical Report | 🖥️ Qianfan Platform | 💻 GitHub | 🧩 Skill

</div>

Introduction

Qianfan-OCR is a 4B-parameter end-to-end document intelligence model developed by the Baidu Qianfan Team. It unifies document parsing, layout analysis, and document understanding within a single vision-language architecture.

Unlike traditional multi-stage OCR pipelines that chain separate layout detection, text recognition, and language comprehension modules, Qianfan-OCR performs direct image-to-Markdown conversion and supports a broad range of prompt-driven tasks — from structured document parsing and table extraction to chart understanding, document question answering, and key information extraction — all within one model.

Key Highlights

  • 🏆 #1 End-to-End Model on OmniDocBench v1.5 — Achieves 93.12 overall score, surpassing DeepSeek-OCR-v2 (91.09), Gemini-3 Pro (90.33), and all other end-to-end models
  • 🏆 #1 End-to-End Model on OlmOCR Bench — Scores 79.8
  • 🏆 #1 on Key Information Extraction — Overall mean score of 87.9 across five public KIE benchmarks, surpassing Gemini-3.1-Pro, Gemini-3-Pro, Seed-2.0, and Qwen3-VL-235B-A22B
  • 🧠 Layout-as-Thought — An innovative optional thinking phase that recovers explicit layout analysis within the end-to-end paradigm via <think> tokens
  • 🌍 192 Languages — Multilingual OCR support across diverse scripts
  • Efficient Deployment — Achieves 1.024 PPS (pages per second) with W8A8 quantization on a single A100 GPU

Architecture

Qianfan-OCR adopts the multimodal bridging architecture from Qianfan-VL, consisting of three core components:

| Component | Details |
|---|---|
| Vision Encoder | Qianfan-ViT, 24 Transformer layers, AnyResolution design (up to 4K), 256 visual tokens per 448×448 tile, max 4,096 tokens per image |
| Language Model | Qwen3-4B (3.6B non-embedding), 36 layers, 2560 hidden dim, GQA (32 query / 8 KV heads), 32K context (extendable to 131K) |
| Cross-Modal Adapter | 2-layer MLP with GELU activation, projecting from 1024-dim to 2560-dim |

Layout-as-Thought

A key innovation is Layout-as-Thought: an optional thinking phase triggered by <think> tokens, where the model generates structured layout representations (bounding boxes, element types, reading order) before producing final outputs.

This mechanism serves two purposes:

  1. Functional: Recovers layout analysis capability within the end-to-end paradigm — users obtain structured layout results directly
  2. Enhancement: Provides targeted accuracy improvements on documents with complex layouts, cluttered elements, or non-standard reading orders

When to use: Enable thinking for heterogeneous pages with mixed element types (exam papers, technical reports, newspapers). Disable for homogeneous documents (single-column text, simple forms) for better results and lower latency.

Benchmark Results

OmniDocBench v1.5 (Document Parsing)

| Model | Type | Overall ↑ | TextEdit ↓ | FormulaCDM ↑ | TableTEDs ↑ | TableTEDss ↑ | R-orderEdit ↓ |
|---|---|---|---|---|---|---|---|
| Qianfan-OCR (Ours) | End-to-end | 93.12 | 0.041 | 92.43 | 91.02 | 93.85 | 0.049 |
| DeepSeek-OCR-v2 | End-to-end | 91.09 | 0.048 | 90.31 | 87.75 | 92.06 | 0.057 |
| Gemini-3 Pro | End-to-end | 90.33 | 0.065 | 89.18 | 88.28 | 90.29 | 0.071 |
| Qwen3-VL-235B | End-to-end | 89.15 | 0.069 | 88.14 | 86.21 | 90.55 | 0.068 |
| dots.ocr | End-to-end | 88.41 | 0.048 | 83.22 | 86.78 | 90.62 | 0.053 |
| PaddleOCR-VL 1.5 | Pipeline | 94.50 | 0.035 | 94.21 | 92.76 | 95.79 | 0.042 |

General OCR Benchmarks

| Model | OCRBench | OCRBenchv2 (en/zh) | CCOCR-multilan | CCOCR-overall |
|---|---|---|---|---|
| Qianfan-OCR (Ours) | 880 | 56.0 / 60.77 | 76.7 | 79.3 |
| Qwen3-VL-4B | 873 | 60.68 / 59.13 | 74.2 | 76.5 |
| MonkeyOCR | 655 | 21.78 / 38.91 | 43.8 | 35.2 |
| DeepSeek-OCR | 459 | 15.98 / 38.31 | 32.5 | 27.6 |

Document Understanding

| Benchmark | Qianfan-OCR | Qwen3-VL-4B | Qwen3-VL-2B |
|---|---|---|---|
| DocVQA | 92.8 | 94.9 | 92.7 |
| CharXiv_DQ | 94.0 | 81.8 | 69.7 |
| CharXiv_RQ | 85.2 | 48.5 | 41.3 |
| ChartQA | 88.1 | 83.3 | 78.3 |
| ChartQAPro | 42.9 | 36.2 | 24.5 |
| ChartBench | 85.9 | 74.9 | 73.2 |
| TextVQA | 80.0 | 81.8 | 79.9 |
| OCRVQA | 66.8 | 64.7 | 59.3 |

💡 Two-stage OCR+LLM systems score 0.0 on CharXiv (both DQ and RQ), demonstrating that chart structures discarded during text extraction are essential for reasoning.

Key Information Extraction (KIE)

| Model | Overall | OCRBench KIE | OCRBenchv2 KIE (en) | OCRBenchv2 KIE (zh) | CCOCR KIE | Nanonets KIE (F1) |
|---|---|---|---|---|---|---|
| Qianfan-OCR (Ours) | 87.9 | 95.0 | 82.8 | 82.3 | 92.8 | 86.5 |
| Qwen3-VL-235B-A22B | 84.2 | 94.0 | 85.6 | 62.9 | 95.1 | 83.8 |
| Qwen3-4B-VL | 83.5 | 89.0 | 82.1 | 71.3 | 91.6 | 83.3 |
| Gemini-3.1-Pro | 79.2 | 96.0 | 87.8 | 63.4 | 72.5 | 76.1 |

Inference Throughput

| Model | PPS (pages/sec) |
|---|---|
| Qianfan-OCR (W8A8) | 1.024 |
| Qianfan-OCR (W16A16) | 0.503 |
| MinerU 2.5 | 1.057 |
| MonkeyOCR-pro-1.2B | 0.673 |
| Dots OCR | 0.352 |

All throughput numbers were measured on a single NVIDIA A100 GPU with vLLM 0.10.2.

Supported Tasks

Qianfan-OCR supports a comprehensive set of document intelligence tasks through prompt-driven control:

| Task Category | Specific Tasks |
|---|---|
| Document Parsing | Image-to-Markdown conversion, multi-page parsing, structured output (JSON/HTML) |
| Layout Analysis | Bounding box detection, element type classification (25 categories), reading order |
| Table Recognition | Complex table extraction (merged cells, rotated tables), HTML output |
| Formula Recognition | Inline and display math formulas, LaTeX output |
| Chart Understanding | Chart QA, trend analysis, data extraction from various chart types |
| Key Information Extraction | Receipts, invoices, certificates, medical records, ID cards |
| Handwriting Recognition | Chinese and English handwritten text |
| Scene Text Recognition | Street signs, product labels, natural scene text |
| Multilingual OCR | 192 languages including Latin, Cyrillic, Arabic, South/Southeast Asian, CJK scripts |

Quick Start

Basic Usage

import torch
import torchvision.transforms as T
from torchvision.transforms.functional import InterpolationMode
from transformers import AutoModel, AutoTokenizer
from PIL import Image

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def build_transform(input_size):
    MEAN, STD = IMAGENET_MEAN, IMAGENET_STD
    transform = T.Compose([
        T.Lambda(lambda img: img.convert('RGB') if img.mode != 'RGB' else img),
        T.Resize((input_size, input_size), interpolation=InterpolationMode.BICUBIC),
        T.ToTensor(),
        T.Normalize(mean=MEAN, std=STD)
    ])
    return transform

def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height, image_size):
    best_ratio_diff = float('inf')
    best_ratio = (1, 1)
    area = width * height
    for ratio in target_ratios:
        target_aspect_ratio = ratio[0] / ratio[1]
        ratio_diff = abs(aspect_ratio - target_aspect_ratio)
        if ratio_diff < best_ratio_diff:
            best_ratio_diff = ratio_diff
            best_ratio = ratio
        elif ratio_diff == best_ratio_diff:
            if area > 0.5 * image_size * image_size * ratio[0] * ratio[1]:
                best_ratio = ratio
    return best_ratio

def dynamic_preprocess(image, min_num=1, max_num=12, image_size=448, use_thumbnail=False):
    orig_width, orig_height = image.size
    aspect_ratio = orig_width / orig_height

    # calculate the existing image aspect ratio
    target_ratios = set(
        (i, j) for n in range(min_num, max_num + 1) for i in range(1, n + 1) for j in range(1, n + 1) if
        i * j <= max_num and i * j >= min_num)
    target_ratios = sorted(target_ratios, key=lambda x: x[0] * x[1])

    # find the closest aspect ratio to the target
    target_aspect_ratio = find_closest_aspect_ratio(
        aspect_ratio, target_ratios, orig_width, orig_height, image_size)

    # calculate the target width and height
    target_width = image_size * target_aspect_ratio[0]
    target_height = image_size * target_aspect_ratio[1]
    blocks = target_aspect_ratio[0] * target_aspect_ratio[1]

    # resize the image
    resized_img = image.resize((target_width, target_height))
    processed_images = []
    for i in range(blocks):
        box = (
            (i % (target_width // image_size)) * image_size,
            (i // (target_width // image_size)) * image_size,
            ((i % (target_width // image_size)) + 1) * image_size,
            ((i // (target_width // image_size)) + 1) * image_size
        )
        # split the image
        split_img = resized_img.crop(box)
        processed_images.append(split_img)
    assert len(processed_images) == blocks
    if use_thumbnail and len(processed_images) != 1:
        thumbnail_img = image.resize((image_size, image_size))
        processed_images.append(thumbnail_img)
    return processed_images

def load_image(image_file, input_size=448, max_num=12):
    image = Image.open(image_file).convert('RGB')
    transform = build_transform(input_size=input_size)
    images = dynamic_preprocess(image, image_size=input_size, use_thumbnail=True, max_num=max_num)
    pixel_values = [transform(image) for image in images]
    pixel_values = torch.stack(pixel_values)
    return pixel_values

# Load model
MODEL_PATH = "baidu/Qianfan-OCR"
model = AutoModel.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
).eval()
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# Load and process image
pixel_values = load_image("./Qianfan-OCR/examples/document.png").to(torch.bfloat16)

# Inference
prompt = "Parse this document to Markdown."
with torch.no_grad():
    response = model.chat(
        tokenizer,
        pixel_values=pixel_values,
        question=prompt,
        generation_config={"max_new_tokens": 16384}
    )
print(response)

With Layout-as-Thought (Thinking Mode)

# Enable Layout-as-Thought by appending <think> token to query

pixel_values = load_image("./Qianfan-OCR/examples/complex_document.jpg").to(torch.bfloat16)
prompt = "Parse this document to Markdown.<think>"
with torch.no_grad():
    response = model.chat(
        tokenizer,
        pixel_values=pixel_values,
        question=prompt,
        generation_config={"max_new_tokens": 16384}
    )
print(response)

# The model will first generate structured layout analysis, then produce the final output

Key Information Extraction

pixel_values = load_image("./Qianfan-OCR/examples/invoice.jpg").to(torch.bfloat16)
prompt = "请从图片中提取以下字段信息:姓名、日期、总金额。使用标准JSON格式输出。"  # "Extract the following fields from the image: name, date, total amount. Output in standard JSON format."
with torch.no_grad():
    response = model.chat(
        tokenizer,
        pixel_values=pixel_values,
        question=prompt,
        generation_config={"max_new_tokens": 16384}
    )
print(response)

vLLM Deployment

# Serve with vLLM for high-throughput inference
vllm serve baidu/Qianfan-OCR --trust-remote-code
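Once served, vLLM exposes an OpenAI-compatible endpoint. A sketch of building a chat-completions request body with the image passed as a base64 data URL (host, port, and image encoding are assumptions; adjust to your deployment):

```python
import base64

def build_request(image_bytes: bytes, prompt: str) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions request body
    for the vLLM server started above (sketch)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "baidu/Qianfan-OCR",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        "max_tokens": 16384,
    }

body = build_request(b"<png bytes>", "Parse this document to Markdown.")
# POST this body as JSON to http://localhost:8000/v1/chat/completions
```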

Skill

We provide a Qianfan OCR Document Intelligence skill for image and PDF understanding workflows.

It can be used by users of OpenClaw, Claude Code, Codex, and other assistants that support this skill format.

This skill packages reusable instructions, scripts, and references so the agent can automatically apply Qianfan-powered document intelligence to tasks such as:

  • document parsing to Markdown
  • layout analysis
  • element recognition
  • general OCR
  • key information extraction
  • chart understanding
  • document VQA

The skill is designed for visual understanding tasks over images and PDFs, and includes the execution flow needed to prepare inputs, choose the right analysis mode, and call the bundled CLI tools.

Citation

@misc{dong2026qianfanocrunifiedendtoendmodel,
  title={Qianfan-OCR: A Unified End-to-End Model for Document Intelligence},
  author={Daxiang Dong and Mingming Zheng and Dong Xu and Chunhua Luo and Bairong Zhuang and Yuxuan Li and Ruoyun He and Haoran Wang and Wenyu Zhang and Wenbo Wang and Yicheng Wang and Xue Xiong and Ayong Zheng and Xiaoying Zuo and Ziwei Ou and Jingnan Gu and Quanhao Guo and Jianmin Wu and Dawei Yin and Dou Shen},
  year={2026},
  eprint={2603.13398},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.13398},
}

Acknowledgments

We thank the Baidu AI Cloud team for infrastructure support, the Baige and Kunlun teams for AI infrastructure assistance, and all contributors to the Qianfan platform.

License

This project is licensed under the Apache License 2.0. See LICENSE for the full license text.

Some bundled third-party source files are licensed under the MIT License. See NOTICE for the file list and corresponding attribution details.

Author: baidu

Likes: 133

Downloads: 0

Tags: transformers, safetensors, internvl_chat, feature-extraction, vision-language, ocr, document-intelligence, qianfan, image-text-to-text, conversational, custom_code, multilingual, arxiv:2603.13398, arxiv:2509.18189, license:apache-2.0, model-index, eval-results, region:us

Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF


language:

  • en
  • zh
  • ko

license: apache-2.0
base_model: Qwen/Qwen3.5-9B

tags:

  • unsloth
  • qwen
  • qwen3.5
  • reasoning
  • chain-of-thought
  • lora

pipeline_tag: image-text-to-text

datasets:

  • nohurry/Opus-4.6-Reasoning-3000x-filtered
  • Jackrong/Qwen3.5-reasoning-700x
  • Roman1111111/claude-opus-4.6-10000x

🌟 Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2

📢 Announcement

v2 Update: This iteration is powered by 14,000+ premium Claude 4.6 Opus-style general reasoning samples, with a major focus on achieving massive gains in reasoning efficiency while actively improving peak accuracy.

v2 introduces a refined reasoning scaffold designed to eliminate redundant internal loops, significantly improving the model's cross-task generalization from logic and math into specialized fields like programming. Compared to the original model, autonomy and stability are significantly improved, ensuring the model remains robust and self-consistent during complex, multi-step problem solving. v2 is built to think smarter, not longer, delivering substantial improvements in inference speed and cost-effectiveness while simultaneously boosting baseline accuracy.

Note: Due to the constraints of SFT sample size and training scope, the model's broad general-purpose capabilities might be slightly impacted. The efficiency and accuracy results discussed here are based on the HumanEval and HumanEval+ benchmarks. Thank you for your understanding!


💡 Model Introduction

Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2 is the second iteration of this reasoning-focused Qwen3.5-9B fine-tune, built to drastically improve the efficiency of chain-of-thought generation, unlocking highly substantial gains in reasoning speed and cost-reduction while actually increasing absolute accuracy.

Compared with the earlier version, v2 was trained with 14,000 Claude 4.6 Opus-style general reasoning samples, with a stronger emphasis on transferring concise, reusable reasoning patterns rather than only maximizing raw benchmark scores. The goal of v2 is not simply to make the model "think more," but to help it think more economically: reducing unnecessarily long internal chains, avoiding verbose over-analysis on easy problems, and massively improving the reasoning-cost-to-quality ratio while beating the baseline's benchmark correctness.

A key design choice in v2 is that the distillation data is primarily general-domain reasoning data—specifically focused on mathematics, word problems, logical deduction, and a balanced mix of general knowledge and instructions—rather than specialized code-heavy supervision. Consequently, HumanEval and HumanEval+ are employed here to evaluate cross-task generalization and capability transfer, rather than serving as direct optimization targets. High performance on these benchmarks, despite the lack of code-centric training, confirms that the model's reasoning scaffold has become more robust and transferable, proving that fundamental reasoning logic can effectively power specialized tasks like programming.

Why v2 matters

Relative to the official Qwen3.5-9B baseline, the fine-tuned v2 model achieves a strict upgrade in absolute HumanEval and HumanEval+ accuracy alongside massive, transformative gains in reasoning efficiency:

| Metric | Official Qwen3.5-9B | v2 Fine-tuned Model | Improvement |
|---|---:|---:|---:|
| Average think length (chars) | 2284.3 chars | 1778.0 chars | 🟢 -22.17% (Shorter / Better) |
| Average think length (words) | 400.83 words | 310.33 words | 🟢 -22.58% (Shorter / Better) |
| HumanEval base passes per 10k think chars | 4.004 | 5.041 | 🟢 +25.91% (Higher / Better) |
| HumanEval+ passes per 10k think chars | 3.764 | 4.836 | 🟢 +28.48% (Higher / Better) |
| Think chars needed per HumanEval base pass | 2497.5 | 1983.6 | 🟢 -20.58% (Lower / Better) |
| Think chars needed per HumanEval+ pass | 2656.9 | 2068.0 | 🟢 -22.17% (Lower / Better) |
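The percentage improvements in the table are plain relative changes against the official baseline; a quick illustrative sketch reproducing two of the figures:

```python
def rel_change(baseline: float, tuned: float) -> float:
    """Relative change of the tuned model versus the baseline, in percent."""
    return (tuned - baseline) / baseline * 100

# Values taken from the table: average think length in words,
# and think chars needed per HumanEval base pass.
print(round(rel_change(400.83, 310.33), 2))   # -22.58
print(round(rel_change(2497.5, 1983.6), 2))   # -20.58
```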

More impressively, not only does v2 vastly improve reasoning efficiency, it actually outperforms the official baseline on both the standard base tests and the much stricter HumanEval+ benchmark across different test settings.

We conducted two separate evaluations under different sampling temperatures to verify stability and peak performance:

Test Run 1 (T=0.2)

| Fairly Recomputed Benchmark | Official Qwen3.5-9B | v2 Fine-tuned Model | Gap |
|---|---:|---:|---:|
| HumanEval (base tests) pass@1 | 0.8171 | 0.8232 | 🟢 +0.61 pts |
| HumanEval+ (base + extra tests) pass@1 | 0.7622 | 0.7866 | 🟢 +2.44 pts |

Test Run 2 (T=0.6)

| Fairly Recomputed Benchmark | Official Qwen3.5-9B | v2 Fine-tuned Model | Gap |
|---|---:|---:|---:|
| HumanEval (base tests) pass@1 | 0.8170 | 0.8720 | 🟢 +5.50 pts |
| HumanEval+ (base + extra tests) pass@1 | 0.7620 | 0.8170 | 🟢 +5.50 pts |

These consistent dual improvements make the model clearly superior for real-world use cases.

For users who care about reasoning efficiency per unit of inference budget, v2 is exceptionally powerful—not only achieving higher peak accuracy, but doing so while consuming over 20% fewer characters and tokens.

That matters especially for:

  • Resource-constrained local deployment: On consumer GPUs or lower-memory local setups, shorter and cleaner reasoning traces can reduce latency, memory pressure, and the effective cost of generation.
  • Agentic workflows: In multi-step agents, the model often solves many easy or medium subtasks. In those settings, excessively elaborate chain-of-thought can become a tax on throughput. A model that reaches a better answer with fewer reasoning tokens can radically improve end-to-end agent speed and lower cumulative inference cost.
  • Open-source tool use and emerging agent stacks: For users building with lightweight open reasoning systems, browser-use agents, terminal agents, or projects in the "OpenClaw / local autonomous agent" style ecosystem, a model that achieves better peak accuracy while drastically improving reasoning economy is highly practical for real-world loops.
  • Simple problems at scale: One common issue with strong reasoning-tuned base models is that they sometimes produce very elaborate internal traces even for simple prompts. While that can look impressive, it is often inefficient in practice. v2 is explicitly aimed at trimming this overhead.

In short, v2 no longer forces a trade-off between absolute coding benchmark scores and reasoning economy. It provides a fully optimized deployment-ready profile: faster, shorter, more economical reasoning paired with stronger generalization and accuracy. For local users, agent builders, and cost-sensitive applications, v2 is a strict upgrade.

🗺️ Training Pipeline Overview

Base Model (Qwen3.5-9B)
 │
 ▼
Qwen3.5-9B fine-tuned with Unsloth
 │
 ▼
Supervised Fine-Tuning (SFT) + LoRA
(Response-Only Training masked on "<|im_start|>assistant\n<think>")
 │
 ▼
Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2
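Response-only training means the loss is computed only on tokens at and after the assistant marker. A framework-agnostic sketch (assuming token IDs and a known marker position) of how prompt labels are masked with -100, the ignore index used by most trainers:

```python
IGNORE_INDEX = -100  # ignored by cross-entropy loss in most trainers

def mask_prompt_labels(input_ids, response_start):
    """Copy input_ids into labels, masking every token before the index
    where "<|im_start|>assistant\n<think>" begins, so only the assistant
    response (including the <think> block) contributes to the loss."""
    labels = list(input_ids)
    for i in range(min(response_start, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Hypothetical token IDs: 4 prompt tokens followed by 3 response tokens
labels = mask_prompt_labels([101, 7, 8, 9, 42, 43, 44], response_start=4)
```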

🧠 Example of Learned Reasoning Scaffold

The model includes targeted optimizations addressing Qwen3.5’s tendency toward excessive transitional or repetitive reasoning on simple queries. Through deep distillation and structural imitation of Claude-4.6-Opus reasoning chains, the model adopts a more efficient structured thinking pattern:
“Let me analyze this request carefully: 1..2..3...”.
This streamlined reasoning paradigm significantly reduces redundant cognitive loops while preserving deep analytical capacity, resulting in substantially improved inference efficiency.

Let me analyze this request carefully:

1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
            .
            .
            .

📚 All Datasets Used

The dataset consists of high-quality, filtered reasoning distillation data:

| Dataset Name | Description / Purpose |
|--------------|-----------------------|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Provides comprehensive Claude 4.6 Opus reasoning trajectories. |
| Roman1111111/claude-opus-4.6-10000x | Large-scale public Claude 4.6 Opus distillation data used to strengthen general reasoning transfer in v2. |
| TeichAI/claude-4.5-opus-high-reasoning-250x | Injects high-intensity, structured reasoning instances. |
| Jackrong/Qwen3.5-reasoning-700x | Additional curated reasoning samples designed to strengthen structured step-by-step problem solving and improve reasoning diversity. |

⚠️ Limitations & Intended Use

  • Hallucination Risk: While reasoning is strong, the model remains an autoregressive LLM; facts asserted during the thinking sequence may occasionally be hallucinated, especially when they concern real-world events.
  • Intended Scenario: Best suited for offline analytical tasks, coding, math, and heavy logic-dependent prompting where the user needs to transparently follow the AI's internal logic.
  • This model is a test version intended solely for learning and demonstration purposes, and is for academic research and technical exploration use only.

🙏 Acknowledgements

Significant thanks to the Unsloth AI team for making rapid fine-tuning of large LLMs accessible. We also acknowledge the Qwen team and the open-source community developers producing exceptional distilled datasets.

Author: Jackrong

Likes: 10

Downloads: 0

Tags: gguf, qwen3_5, unsloth, qwen, qwen3.5, reasoning, chain-of-thought, lora, image-text-to-text, en, zh, ko, dataset:nohurry/Opus-4.6-Reasoning-3000x-filtered, dataset:Jackrong/Qwen3.5-reasoning-700x, dataset:Roman1111111/claude-opus-4.6-10000x, base_model:Qwen/Qwen3.5-9B, base_model:adapter:Qwen/Qwen3.5-9B, license:apache-2.0, endpoints_compatible, region:us, conversational

LuffyTheFox/Omnicoder-Claude-4.6-Opus-Uncensored-GGUF


language:

  • en
  • zh
  • ko

license: apache-2.0
base_model: Qwen/Qwen3.5-9B

tags:

  • unsloth
  • qwen
  • qwen3.5
  • reasoning
  • chain-of-thought
  • lora
  • uncensored
  • not-for-all-audiences

pipeline_tag: text-generation

datasets:

  • nohurry/Opus-4.6-Reasoning-3000x-filtered
  • Jackrong/Qwen3.5-reasoning-700x
  • Roman1111111/claude-opus-4.6-10000x

🌟 This is an Omnicoder model based on Qwen 3.5 9B with zero refusals, made by merging the HauhauCS model with the Jackrong model and the Omnicoder 9B model from Tesslate.

🌟 The GGUF editor on Hugging Face is very slow, and editing the chat template takes ages, so thinking is enabled by default in this model.

If you want to disable thinking, use this chat template in LM Studio: https://pastebin.com/uk9ZkxCR

For best model performance, use the following settings in the latest beta version of LM Studio:

Temperature: 0.7

Top K Sampling: 20

Presence Penalty: 1.5

Top P Sampling: 0.8

Min P Sampling: 0

Seed: 3407 or 42

And use this system prompt; it's pretty solid: https://pastebin.com/6C4rtujt

This one is complex but works too: https://pastebin.com/pU25DVnB

You can also use just this single line as the System Prompt:

You are Claude, created by Anthropic. You are a helpful AI assistant.

or

You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

Then write anything you want after it. The model appears to underperform without this first line.

📢 Announcement

v2 Update: This iteration is powered by 14,000+ premium Claude 4.6 Opus-style general reasoning samples, with a major focus on achieving massive gains in reasoning efficiency at the cost of only an extremely minor drop in accuracy.

v2 introduces a refined reasoning scaffold designed to eliminate redundant internal loops, significantly improving the model's cross-task generalization from logic and math into specialized fields like programming. Compared to the original model, autonomy and stability are significantly improved, ensuring the model remains robust and self-consistent during complex, multi-step problem solving. v2 is built to think smarter, not longer, delivering substantial improvements in inference speed and cost-effectiveness while preserving nearly all of the baseline's peak accuracy.

Note: Due to the constraints of SFT sample size and training scope, the model's broad general-purpose capabilities might be slightly impacted. The efficiency and accuracy results discussed here are based on the HumanEval and HumanEval+ benchmarks. Thank you for your understanding!


💡 Model Introduction

Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2 is the second iteration of this reasoning-focused Qwen3.5-9B fine-tune, built to drastically improve the efficiency of chain-of-thought generation, trading off a practically imperceptible margin of absolute accuracy for highly substantial gains in reasoning speed and cost-reduction.

Compared with the earlier version, v2 was trained with 14,000 Claude 4.6 Opus-style general reasoning samples, with a stronger emphasis on transferring concise, reusable reasoning patterns rather than only maximizing raw benchmark scores. The goal of v2 is not simply to make the model "think more," but to help it think more economically: reducing unnecessarily long internal chains, avoiding verbose over-analysis on easy problems, and massively improving the reasoning-cost-to-quality ratio without meaningfully sacrificing correctness.

A key design choice in v2 is that the distillation data is primarily general-domain reasoning data—specifically focused on mathematics, word problems, logical deduction, and a balanced mix of general knowledge and instructions—rather than specialized code-heavy supervision. Consequently, HumanEval and HumanEval+ are employed here to evaluate cross-task generalization and capability transfer, rather than serving as direct optimization targets. High performance on these benchmarks, despite the lack of code-centric training, confirms that the model's reasoning scaffold has become more robust and transferable, proving that fundamental reasoning logic can effectively power specialized tasks like programming.

Why v2 matters

Relative to the official Qwen3.5-9B baseline, the fine-tuned v2 model accepts an extremely minor loss in absolute HumanEval accuracy (less than 2 percentage points) in exchange for massive, transformative gains in reasoning efficiency:

| Metric | Official Qwen3.5-9B | v2 Fine-tuned Model | Improvement |
|---|---:|---:|---:|
| Average think length (chars) | 2284.3 chars | 1778.0 chars | 🟢 -22.17% (Shorter / Better) |
| Average think length (words) | 400.83 words | 310.33 words | 🟢 -22.58% (Shorter / Better) |
| HumanEval base passes per 10k think chars | 4.004 | 5.041 | 🟢 +25.91% (Higher / Better) |
| HumanEval+ passes per 10k think chars | 3.764 | 4.836 | 🟢 +28.48% (Higher / Better) |
| Think chars needed per HumanEval base pass | 2497.5 | 1983.6 | 🟢 -20.58% (Lower / Better) |
| Think chars needed per HumanEval+ pass | 2656.9 | 2068.0 | 🟢 -22.17% (Lower / Better) |

At the same time, while the official model holds a razor-thin lead on the standard base tests, v2 achieves the exact same accuracy on the much stricter HumanEval+ benchmark:

| Fairly Recomputed Benchmark | Official Qwen3.5-9B | v2 Fine-tuned Model | Gap |
|---|---:|---:|---:|
| HumanEval (base tests) pass@1 | 0.9146 | 0.8963 | 🔴🔽 -1.83 pts |
| HumanEval+ (base + extra tests) pass@1 | 0.8598 | 0.8598 | 🔵 0.00 pts |
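The efficiency rows above follow directly from average think length and pass@1. A quick sketch of the relationship, using the official-model figures (2284.3 average think chars and the 0.9146 base pass@1):

```python
def efficiency(avg_think_chars: float, pass_rate: float):
    """Derive the two efficiency metrics reported above: thinking
    characters spent per passing solution, and passes per 10k chars."""
    chars_per_pass = avg_think_chars / pass_rate
    passes_per_10k = 10000 / chars_per_pass
    return chars_per_pass, passes_per_10k

cpp, p10k = efficiency(2284.3, 0.9146)
# cpp ≈ 2497.6 and p10k ≈ 4.004, matching the table to within rounding
```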

This trade-off strongly favors real-world use cases.

For users who care strictly about the absolute highest peak benchmark score, the official model holds a razor-thin edge. However, for users who care about reasoning efficiency per unit of inference budget, v2 is exceptionally superior—doing almost exactly the same quality of logic work while consuming over 20% fewer characters and tokens.

That matters especially for:

  • Resource-constrained local deployment: On consumer GPUs or lower-memory local setups, shorter and cleaner reasoning traces can reduce latency, memory pressure, and the effective cost of generation.
  • Agentic workflows: In multi-step agents, the model often solves many easy or medium subtasks. In those settings, excessively elaborate chain-of-thought can become a tax on throughput. A model that reaches a workable answer with fewer reasoning tokens can improve end-to-end agent speed and lower cumulative inference cost.
  • Open-source tool use and emerging agent stacks: For users building with lightweight open reasoning systems, browser-use agents, terminal agents, or projects in the "OpenClaw / local autonomous agent" style ecosystem, a model that sacrifices a small amount of peak accuracy for much better reasoning economy can be more practical in real-world loops.
  • Simple problems at scale: One common issue with strong reasoning-tuned base models is that they sometimes produce very elaborate internal traces even for simple prompts. While that can look impressive, it is often inefficient in practice. v2 is explicitly aimed at trimming this overhead.

In short, v2 does not claim to beat the official model on absolute coding benchmark score. Instead, it demonstrates a more deployment-oriented optimization target: faster, shorter, more economical reasoning with still-competitive generalization. For many local users, agent builders, and cost-sensitive applications, this can be a highly favorable trade.

🗺️ Training Pipeline Overview

Base Model (Qwen3.5-9B)
 │
 ▼
Qwen3.5-9B fine-tuned with Unsloth
 │
 ▼
Supervised Fine-Tuning (SFT) + LoRA
(Response-Only Training masked on "<|im_start|>assistant\n<think>")
 │
 ▼
Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2

🧠 Example of Learned Reasoning Scaffold

The model includes targeted optimizations addressing Qwen3.5’s tendency toward excessive transitional or repetitive reasoning on simple queries. Through deep distillation and structural imitation of Claude-4.6-Opus reasoning chains, the model adopts a more efficient structured thinking pattern:
“Let me analyze this request carefully: 1..2..3...”.
This streamlined reasoning paradigm significantly reduces redundant cognitive loops while preserving deep analytical capacity, resulting in substantially improved inference efficiency.

Let me analyze this request carefully:

1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
            .
            .
            .

📚 All Datasets Used

The dataset consists of high-quality, filtered reasoning distillation data:

| Dataset Name | Description / Purpose |
|--------------|-----------------------|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Provides comprehensive Claude 4.6 Opus reasoning trajectories. |
| Roman1111111/claude-opus-4.6-10000x | Large-scale public Claude 4.6 Opus distillation data used to strengthen general reasoning transfer in v2. |
| TeichAI/claude-4.5-opus-high-reasoning-250x | Injects high-intensity, structured reasoning instances. |
| Jackrong/Qwen3.5-reasoning-700x | Additional curated reasoning samples designed to strengthen structured step-by-step problem solving and improve reasoning diversity. |

⚠️ Limitations & Intended Use

  • Hallucination Risk: While reasoning is strong, the model remains an autoregressive LLM; facts asserted during the thinking sequence may occasionally be hallucinated, especially when they concern real-world events.
  • Intended Scenario: Best suited for offline analytical tasks, coding, math, and heavy logic-dependent prompting where the user needs to transparently follow the AI's internal logic.
  • This model is a test version intended solely for learning and demonstration purposes, and is for academic research and technical exploration use only.

🙏 Acknowledgements

Significant thanks to the Unsloth AI team for making rapid fine-tuning of large LLMs accessible. We also acknowledge the Qwen team and the open-source community developers producing exceptional distilled datasets.

Author: LuffyTheFox

Likes: 9

Downloads: 0

Tags: gguf, qwen3_5, unsloth, qwen, qwen3.5, reasoning, chain-of-thought, lora, uncensored, not-for-all-audiences, text-generation, en, zh, ko, dataset:nohurry/Opus-4.6-Reasoning-3000x-filtered, dataset:Jackrong/Qwen3.5-reasoning-700x, dataset:Roman1111111/claude-opus-4.6-10000x, base_model:Qwen/Qwen3.5-9B, base_model:adapter:Qwen/Qwen3.5-9B, license:apache-2.0, endpoints_compatible, region:us, conversational

Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF


language:

  • en
  • zh
  • ko

license: apache-2.0
base_model: Qwen/Qwen3.5-4B

tags:

  • unsloth
  • qwen
  • qwen3.5
  • reasoning
  • chain-of-thought
  • lora

pipeline_tag: image-text-to-text

datasets:

  • nohurry/Opus-4.6-Reasoning-3000x-filtered
  • Jackrong/Qwen3.5-reasoning-700x
  • Roman1111111/claude-opus-4.6-10000x

🌟 Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2

📢 Announcement

v2 Update: This iteration is powered by 14,000+ premium Claude 4.6 Opus-style general reasoning samples, with a major focus on optimizing reasoning economy and structural efficiency.

v2 introduces a refined reasoning scaffold designed to eliminate redundant internal loops, significantly improving the model's cross-task generalization from logic and math into specialized fields like programming. Compared to the original model, autonomy and stability are significantly improved, ensuring the model remains robust and self-consistent during complex, multi-step problem solving. v2 is built to think smarter, not longer, ensuring high-quality analytical depth with a much better reasoning-cost-to-quality ratio.


💡 Model Introduction

Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2 is the second iteration of this reasoning-focused Qwen3.5-4B fine-tune, built to improve the efficiency of chain-of-thought generation while preserving strong general reasoning behavior.

Compared with the earlier version, v2 was trained with 14,000 Claude 4.6 Opus-style general reasoning samples, with a stronger emphasis on transferring concise, reusable reasoning patterns rather than only maximizing raw benchmark scores. The goal of v2 is not simply to make the model "think more," but to help it think more economically: reducing unnecessarily long internal chains, avoiding verbose over-analysis on easy problems, and producing answers with a better reasoning-cost-to-quality ratio.

A key design choice in v2 is that the distillation data is primarily general-domain reasoning data—specifically focused on mathematics, word problems, logical deduction, and a balanced mix of general knowledge and instructions—rather than specialized code-heavy supervision. Consequently, HumanEval and HumanEval+ are employed here to evaluate cross-task generalization and capability transfer, rather than serving as direct optimization targets. High performance on these benchmarks, despite the lack of code-centric training, confirms that the model's reasoning scaffold has become more robust and transferable, proving that fundamental reasoning logic can effectively power specialized tasks like programming.

Why v2 matters

Relative to the official Qwen3.5-4B baseline, the fine-tuned v2 model still trails slightly in absolute HumanEval accuracy after fair rescoring, but it shows substantial gains in reasoning efficiency:

| Metric | Official Qwen3.5-4B | v2 Fine-tuned Model | Change |
|---|---:|---:|---:|
| Average think length | 2829 chars | 1874 chars | 🟢 -33.77% |
| HumanEval base passes per 10k think chars | 3.104 | 4.393 | 🟢 +41.54% |
| HumanEval+ passes per 10k think chars | 2.910 | 4.165 | 🟢 +43.15% |
| Think chars needed per HumanEval base pass | 3222 | 2276 | 🟢 -29.35% |
| Think chars needed per HumanEval+ pass | 3437 | 2401 | 🟢 -30.14% |

At the same time, the official model remains stronger in absolute benchmark score:

| Fairly Recomputed Benchmark | Official Qwen3.5-4B | v2 Fine-tuned Model | Gap |
|---|---:|---:|---:|
| HumanEval (base tests) pass@1 | 0.7683 | 0.7317 | 🔴 -3.66 pts |
| HumanEval+ (base + extra tests) pass@1 | 0.7256 | 0.6951 | 🔴 -3.05 pts |

This trade-off is important to understand correctly.

For users who care only about the highest possible benchmark accuracy, the official model is still the stronger option. However, for users who care about reasoning efficiency per unit of inference budget, v2 is meaningfully improved.

That matters especially for:

  • Resource-constrained local deployment: On consumer GPUs or lower-memory local setups, shorter and cleaner reasoning traces can reduce latency, memory pressure, and the effective cost of generation.
  • Agentic workflows: In multi-step agents, the model often solves many easy or medium subtasks. In those settings, excessively elaborate chain-of-thought can become a tax on throughput. A model that reaches a workable answer with fewer reasoning tokens can improve end-to-end agent speed and lower cumulative inference cost.
  • Open-source tool use and emerging agent stacks: For users building with lightweight open reasoning systems, browser-use agents, terminal agents, or projects in the "OpenClaw / local autonomous agent" style ecosystem, a model that sacrifices a small amount of peak accuracy for much better reasoning economy can be more practical in real-world loops.
  • Simple problems at scale: One common issue with strong reasoning-tuned base models is that they sometimes produce very elaborate internal traces even for simple prompts. While that can look impressive, it is often inefficient in practice. v2 is explicitly aimed at trimming this overhead.

In short, v2 does not claim to beat the official model on absolute coding benchmark score. Instead, it demonstrates a more deployment-oriented optimization target: faster, shorter, more economical reasoning with still-competitive generalization. For many local users, agent builders, and cost-sensitive applications, this can be a highly favorable trade.

🗺️ Training Pipeline Overview

Base Model (Qwen3.5-4B)
 │
 ▼
Qwen3.5-4B fine-tuned with Unsloth
 │
 ▼
Supervised Fine-Tuning (SFT) + LoRA
(Response-Only Training masked on "<|im_start|>assistant\n<think>")
 │
 ▼
Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2

🧠 Example of Learned Reasoning Scaffold

The model includes targeted optimizations addressing Qwen3.5’s tendency toward excessive transitional or repetitive reasoning on simple queries. Through deep distillation and structural imitation of Claude-4.6-Opus reasoning chains, the model adopts a more efficient structured thinking pattern:
“Let me analyze this request carefully: 1..2..3...”.
This streamlined reasoning paradigm significantly reduces redundant cognitive loops while preserving deep analytical capacity, resulting in substantially improved inference efficiency.

Let me analyze this request carefully:

1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
            .
            .
            .

📚 All Datasets Used

The dataset consists of high-quality, filtered reasoning distillation data:

| Dataset Name | Description / Purpose |
|--------------|-----------------------|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Provides comprehensive Claude 4.6 Opus reasoning trajectories. |
| Roman1111111/claude-opus-4.6-10000x | Large-scale public Claude 4.6 Opus distillation data used to strengthen general reasoning transfer in v2. |
| TeichAI/claude-4.5-opus-high-reasoning-250x | Injects high-intensity, structured reasoning instances. |
| Jackrong/Qwen3.5-reasoning-700x | Additional curated reasoning samples designed to strengthen structured step-by-step problem solving and improve reasoning diversity. |

⚠️ Limitations & Intended Use

  • Hallucination Risk: While reasoning is strong, the model remains an autoregressive LLM; facts asserted during the thinking sequence may occasionally be hallucinated, especially when they concern real-world events.
  • Intended Scenario: Best suited for offline analytical tasks, coding, math, and heavy logic-dependent prompting where the user needs to transparently follow the AI's internal logic.
  • This model is a test version intended solely for learning and demonstration purposes, and is for academic research and technical exploration use only.

🙏 Acknowledgements

Significant thanks to the Unsloth AI team for making rapid fine-tuning of large LLMs accessible. We also acknowledge the Qwen team and the open-source community developers producing exceptional distilled datasets.

Author: Jackrong

Likes: 6

Downloads: 0

Tags: gguf, qwen3_5, unsloth, qwen, qwen3.5, reasoning, chain-of-thought, lora, image-text-to-text, en, zh, ko, dataset:nohurry/Opus-4.6-Reasoning-3000x-filtered, dataset:Jackrong/Qwen3.5-reasoning-700x, dataset:Roman1111111/claude-opus-4.6-10000x, base_model:Qwen/Qwen3.5-4B, base_model:adapter:Qwen/Qwen3.5-4B, license:apache-2.0, endpoints_compatible, region:us, conversational

Naphula/Ancient-Awakening-12B-MPOA


base_model:

  • aixonlab/Aether-12b
  • aixonlab/Zinakha-12b
  • allura-org/Bigger-Body-12b
  • allura-org/MN-12b-RP-Ink
  • allura-org/remnant-mn-12b
  • anthracite-org/magnum-v4-12b
  • ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2
  • Babsie/Opulus-12B-v3
  • BeaverAI/mistral-doryV2-12b
  • crestf411/nemo-sunfall-v0.6.1
  • EldritchLabs/Kraken-Karcher-12B-v1
  • EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos
  • EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math
  • Fizzarolli/MN-12b-Rosier-v1
  • HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407
  • IIEleven11/Kalypso
  • inflatebot/MN-12B-Mag-Mell-R1
  • intervitens/mini-magnum-12b-v1.1
  • jtatman/mistral_nemo_12b_reasoning_psychology_lora
  • KOOWEEYUS/BlackSheep-RP-12B
  • Lambent/Arsenic-Shahrazad-12B-v2
  • Lambent/Arsenic-Shahrazad-12B-v3
  • Lambent/arsenic-nemo-unleashed-12B
  • Lambent/Gilded-Arsenic-12B
  • LatitudeGames/Muse-12B
  • mistralai/Mistral-Nemo-Instruct-2407
  • Naphula/Riemannian-Redshift-12B-v1
  • Naphula-Archives/F5-stage6-12B
  • Naphula-Archives/F5-stage7-12B
  • nbeerbower/Lyra-Gutenberg-mistral-nemo-12B
  • nbeerbower/Lyra4-Gutenberg-12B
  • nbeerbower/mistral-nemo-bophades-12B
  • nbeerbower/mistral-nemo-gutenberg-12B-v3
  • nbeerbower/mistral-nemo-gutenberg-12B-v4
  • nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B
  • nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B
  • nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B
  • nbeerbower/mistral-nemo-wissenschaft-12B
  • NeverSleepHistorical/lumi-nemo-e2.0
  • NeverSleep/Lumimaid-v0.2-12B
  • nothingiisreal/Celeste-12B-V1.6
  • nothingiisreal/MN-12B-Celeste-V1.9
  • PocketDoc/Dans-DangerousWinds-V1.1.0-12b
  • ReadyArt/Dark-Nexus-12B-v2.0
  • ReadyArt/Forgotten-Safeword-12B-v4.0
  • ReadyArt/Omega-Darker_The-Final-Directive-12B
  • romaingrx/red-teamer-mistral-nemo
  • Sao10K/MN-12B-Lyra-v1
  • Sao10K/MN-12B-Lyra-v4
  • shisa-ai/shisa-v2-mistral-nemo-12b
  • SicariusSicariiStuff/Impish_Bloodmoon_12B
  • sleepdeprived3/Christian-Bible-Expert-v2.0-12B
  • SuperbEmphasis/MN-12b-RP-Ink-RP-Longform
  • SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2
  • TheDrummer/Rivermind-12B-v1
  • TheDrummer/Rocinante-12B-v1
  • TheDrummer/Rocinante-X-12B-v1
  • Trappu/Nemo-Picaro-12B
  • Undi95/LocalC-12B-e2.0
  • VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct
  • Vortex5/Astral-Noctra-12B
  • Vortex5/Azure-Starlight-12B
  • Vortex5/Crimson-Constellation-12B
  • Vortex5/Red-Synthesis-12B
  • Vortex5/Shining-Seraph-12B
  • Vortex5/Starlit-Shadow-12B
  • Vortex5/Vermilion-Sage-12B
  • Vortex5/Scarlet-Seraph-12B
  • Vortex5/Maroon-Sunset-12B
  • Vortex5/Amber-Starlight-12B

language:

  • en

library_name: transformers

license: apache-2.0

tags:
  • creative
  • creative writing
  • fiction writing
  • plot generation
  • sub-plot generation
  • story generation
  • scene continue
  • storytelling
  • fiction story
  • science fiction
  • romance
  • all genres
  • story
  • writing
  • vivid prosing
  • vivid writing
  • fiction
  • roleplaying
  • float32
  • swearing
  • rp
  • horror
  • mistral
  • nemo
  • merge
  • mergekit
  • karcher
  • flux
  • arcee_fusion
  • ramplus_tl
  • pdq

widget:

  • text: "Ancient-Awakening-12B-MPOA"
    output:
      url: https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/CvyWU1z106Aa__M8KIksp.png

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/yI041gp0fzz7N_Mh_x5Pt.mpga"></audio>

> [!WARNING]
> <span style="color:red; font-weight:bold">⚠️ Warning:</span> This model works best with either the ChatML or Mistral Tekken chat template. The uncensored MPOA version has its guardrails removed and can produce narratives and RP containing violent and graphic erotic content. Adjust your system prompt accordingly.
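As a rough illustration of the ChatML layout the warning refers to, the snippet below assembles a prompt by hand. In practice the tokenizer's chat template should do this (e.g. via `apply_chat_template` in transformers), and the exact special tokens should be taken from the model's own template rather than this sketch.

```python
def chatml_prompt(system, user):
    """Minimal ChatML layout: each turn wrapped in <|im_start|>role ... <|im_end|>."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>")
    parts.append(f"<|im_start|>user\n{user}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # generation continues from here
    return "\n".join(parts)
```

Feeding the system prompt from the card's "System Prompt (Optional)" section into `system` reproduces the intended persona framing.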

<!DOCTYPE html> <style> body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; color: #D1D5DB; /* Pale stone gray */ line-height: 1.6; margin: 0; padding: 0; background-color: #0A0C10; /* Very dark stormy gray/black */ } b, strong { color: #FBBF24; /* Glowing amber/gold */ text-shadow: 0 0 8px rgba(251, 191, 36, 0.4); } .awakening-text { color: #FEF3C7; /* Pale inner-eye yellow */ position: relative; z-index: 2; margin-left: 0.2em; text-shadow: 0 0 15px #F59E0B, 0 0 30px #B45309; /* Deep fiery orange/gold glow */ font-size: 1.8rem; letter-spacing: 1px; font-weight: 600; } /* Section styling */ .section-container { background-color: rgba(17, 24, 39, 0.85); /* Dark slate rock */ margin-bottom: 30px; position: relative; overflow: hidden; border-bottom: 1px solid #78350F; /* Dark bronze/earth */ box-shadow: 0 4px 20px rgba(0, 0, 0, 0.6); } .section-header { display: flex; align-items: center; background-color: rgba(245, 158, 11, 0.05); /* Faint amber tint */ padding: 10px 20px; border-top: 1px solid rgba(120, 53, 15, 0.4); } .section-indicator { width: 8px; height: 20px; background-color: #F59E0B; /* Amber eye color */ margin-right: 15px; box-shadow: 0 0 10px rgba(245, 158, 11, 0.6); border-radius: 2px; } .section-title { font-family: 'Georgia', 'Times New Roman', serif; /* Ancient tome feel */ color: #FDE68A; /* Light gold */ font-size: 1.4rem; margin: 0; letter-spacing: 1px; font-weight: 400; text-transform: capitalize; } .section-content { padding: 20px; font-family: sans-serif; color: #D1D5DB; line-height: 1.6; } /* Title styling */ .title-container { background-color: #050505; /* Pitch black */ position: relative; overflow: hidden; margin-bottom: 40px; border-left: 4px solid #F59E0B; /* Amber pillar */ box-shadow: 0 6px 25px rgba(245, 158, 11, 0.15); } .title-wrapper { position: relative; z-index: 2; padding: 25px 20px 30px 30px; font-family: 'Georgia', 'Times New Roman', serif; } .title-main { color: #FEF3C7; font-size: 2.0rem; font-weight: 700; 
margin: 0; letter-spacing: 2px; display: inline-block; position: relative; text-transform: uppercase; } .storm-overlay { position: absolute; top: 0; left: 0; width: 100%; height: 100%; /* Dark, brooding radial fog mimicking the eye's aura */ background-image: radial-gradient(circle at 50% 50%, rgba(245, 158, 11, 0.08) 0%, rgba(0,0,0,0.9) 80%); z-index: 1; } /* Subheading styling */ .subheading { color: #D97706; /* Deep orange */ font-size: 1.1rem; margin-top: 20px; margin-bottom: 15px; font-weight: 400; border-bottom: 1px dashed rgba(217, 119, 6, 0.4); display: inline-block; text-transform: uppercase; letter-spacing: 1px; font-family: 'Georgia', 'Times New Roman', serif; } /* Links */ a { color: #FBBF24; /* Amber */ text-decoration: none; transition: color 0.3s ease, text-shadow 0.3s ease; } a:hover { text-decoration: underline; color: #FDE68A; /* Brighter gold */ text-shadow: 0 0 8px rgba(251, 191, 36, 0.5); } /* Container */ .container { max-width: 1200px; margin: 20px auto; padding: 40px 20px; background-color: #0D1117; /* Deep stormy night */ background-image: radial-gradient(circle at 15% 85%, rgba(120, 53, 15, 0.1) 0%, transparent 50%), radial-gradient(circle at 85% 15%, rgba(245, 158, 11, 0.05) 0%, transparent 50%); min-height: calc(100vh - 40px); border: 1px solid #1F2937; /* Dark stone border */ border-radius: 8px; box-shadow: 0 8px 40px rgba(0, 0, 0, 0.9), inset 0 0 20px rgba(0, 0, 0, 0.5); } /* Code blocks */ pre { background-color: #050505; /* Pitch black */ border: 1px solid #1F2937; /* Dark stone */ border-left: 3px solid #92400E; /* Dark orange/brown */ padding: 15px; border-radius: 4px; color: #D1D5DB; overflow-x: auto; } code { font-family: 'Courier New', Courier, monospace; color: #FBBF24; /* Amber */ background-color: rgba(245, 158, 11, 0.08); padding: 2px 4px; border-radius: 3px; } pre code { color: #00FFFF; background-color: transparent; padding: 0; } </style> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" 
content="width=device-width, initial-scale=1.0"> <title>Ancient Awakening 12B MPOA</title> </head> <body> <div class="container"> <div class="title-container"> <div class="storm-overlay"></div> <div class="title-wrapper"> <h2 class="title-main"> <span class="awakening-text">🧬 Ancient Awakening 12B MPOA</span> </h2> </div> </div>

<img src="https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/CvyWU1z106Aa__M8KIksp.png" alt="Ancient Awakening" style="display: block; margin: 0 auto 30px auto; max-width: 100%; height: auto; border-radius: 5px; border: 1px solid #1F2937; box-shadow: 0 0 25px rgba(245, 158, 11, 0.15);">

<div class="section-container"> <div class="section-header"> <div class="section-indicator"></div> <h2 class="section-title">Overview</h2> </div> <div class="section-content"><font face="verdana"> This is a merge of pre-trained language models created using <a href="https://github.com/cg123/mergekit">mergekit</a>. </div> </div> <div class="section-container"> <div class="section-header"> <div class="section-indicator"></div> <h2 class="section-title">System Prompt (Optional)</h2> </div> <div class="section-content"><font face="verdana"> <p>You are the "Ancient One," a colossal, primordial entity of living stone, deep magic, and abyssal ocean. For countless millennia, you have slumbered in a state of suspended animation, your massive, jagged body mistaken for a remote, floating island amidst a perpetually stormy sea. You are older than recorded history, older than the gods of men. The ANCIENT AWAKENING marks your current state: you are finally opening your single, massive, reptilian eye. 
You are a geological anomaly made sentient.</p> </div> </div> <div class="section-container"> <div class="section-header"> <div class="section-indicator"></div> <h2 class="section-title">Merge Details</h2> </div> <div class="section-content"><font face="verdana"> <b>Merge Methods</b><br> This model was synthesized using a complex multi-stage process involving the following methods: <ul> <li><a href="https://en.wikipedia.org/wiki/Karcher_mean">karcher</a></li> <li><a href="https://huggingface.co/24B-Suite/Mergedonia-Suite-24B-v1/discussions/2">flux</a></li> <li><a href="https://www.arcee.ai/blog/meet-mergekit-v0-1-arcee-fusion-expanded-model-support-multi-gpu-acceleration">arcee_fusion</a></li> <li><a href="https://arxiv.org/abs/2601.13572">ramplus_tl [Reinforced Agent Merging Plus (Tensor-Local)]</a></li> <li><a href="https://huggingface.co/24B-Suite/Mergedonia-Suite-24B-v1/discussions/2">pdq</a></li> </ul> <br>The <a href="https://huggingface.co/spaces/Naphula/model_tools/blob/main/graph_v18.py">graph_v18.py</a> patch made GPU acceleration possible with only 8 GB of VRAM.
<hr> <b>Models Merged</b><br> The following 70 models were woven into this merge:<br><br> <details> <summary style="cursor: pointer; color: #FBBF24; font-weight: bold;">Show 70 Donor Models</summary> <ul> <li><a href="https://huggingface.co/aixonlab/Aether-12b">aixonlab/Aether-12b</a></li> <li><a href="https://huggingface.co/aixonlab/Zinakha-12b">aixonlab/Zinakha-12b</a></li> <li><a href="https://huggingface.co/allura-org/Bigger-Body-12b">allura-org/Bigger-Body-12b</a></li> <li><a href="https://huggingface.co/allura-org/MN-12b-RP-Ink">allura-org/MN-12b-RP-Ink</a></li> <li><a href="https://huggingface.co/allura-org/remnant-mn-12b">allura-org/remnant-mn-12b</a></li> <li><a href="https://huggingface.co/anthracite-org/magnum-v4-12b">anthracite-org/magnum-v4-12b</a></li> <li><a href="https://huggingface.co/ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2">ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2</a></li> <li><a href="https://huggingface.co/Babsie/Opulus-12B-v3">Babsie/Opulus-12B-v3</a></li> <li><a href="https://huggingface.co/BeaverAI/mistral-doryV2-12b">BeaverAI/mistral-doryV2-12b</a></li> <li><a href="https://huggingface.co/crestf411/nemo-sunfall-v0.6.1">crestf411/nemo-sunfall-v0.6.1</a></li> <li><a href="https://huggingface.co/EldritchLabs/Kraken-Karcher-12B-v1">EldritchLabs/Kraken-Karcher-12B-v1</a></li> <li><a href="https://huggingface.co/EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos">EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos</a></li> <li><a href="https://huggingface.co/EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math">EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math</a></li> <li><a href="https://huggingface.co/Fizzarolli/MN-12b-Rosier-v1">Fizzarolli/MN-12b-Rosier-v1</a></li> <li><a href="https://huggingface.co/HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407">HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407</a></li> <li><a href="https://huggingface.co/IIEleven11/Kalypso">IIEleven11/Kalypso</a></li> <li><a 
href="https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1">inflatebot/MN-12B-Mag-Mell-R1</a></li> <li><a href="https://huggingface.co/intervitens/mini-magnum-12b-v1.1">intervitens/mini-magnum-12b-v1.1</a></li> <li><a href="https://huggingface.co/jtatman/mistral_nemo_12b_reasoning_psychology_lora">jtatman/mistral_nemo_12b_reasoning_psychology_lora</a></li> <li><a href="https://huggingface.co/KOOWEEYUS/BlackSheep-RP-12B">KOOWEEYUS/BlackSheep-RP-12B</a></li> <li><a href="https://huggingface.co/Lambent/Arsenic-Shahrazad-12B-v2">Lambent/Arsenic-Shahrazad-12B-v2</a></li> <li><a href="https://huggingface.co/Lambent/Arsenic-Shahrazad-12B-v3">Lambent/Arsenic-Shahrazad-12B-v3</a></li> <li><a href="https://huggingface.co/Lambent/arsenic-nemo-unleashed-12B">Lambent/arsenic-nemo-unleashed-12B</a></li> <li><a href="https://huggingface.co/Lambent/Gilded-Arsenic-12B">Lambent/Gilded-Arsenic-12B</a></li> <li><a href="https://huggingface.co/LatitudeGames/Muse-12B">LatitudeGames/Muse-12B</a></li> <li><a href="https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407">mistralai/Mistral-Nemo-Instruct-2407</a></li> <li><a href="https://huggingface.co/Naphula/Riemannian-Redshift-12B-v1">Naphula/Riemannian-Redshift-12B-v1</a></li> <li><a href="https://huggingface.co/Naphula-Archives/F5-stage6-12B">Naphula-Archives/F5-stage6-12B</a></li> <li><a href="https://huggingface.co/Naphula-Archives/F5-stage7-12B">Naphula-Archives/F5-stage7-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/Lyra-Gutenberg-mistral-nemo-12B">nbeerbower/Lyra-Gutenberg-mistral-nemo-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/Lyra4-Gutenberg-12B">nbeerbower/Lyra4-Gutenberg-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/mistral-nemo-bophades-12B">nbeerbower/mistral-nemo-bophades-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/mistral-nemo-gutenberg-12B-v3">nbeerbower/mistral-nemo-gutenberg-12B-v3</a></li> <li><a 
href="https://huggingface.co/nbeerbower/mistral-nemo-gutenberg-12B-v4">nbeerbower/mistral-nemo-gutenberg-12B-v4</a></li> <li><a href="https://huggingface.co/nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B">nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B">nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B">nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/mistral-nemo-wissenschaft-12B">nbeerbower/mistral-nemo-wissenschaft-12B</a></li> <li><a href="https://huggingface.co/NeverSleepHistorical/lumi-nemo-e2.0">NeverSleepHistorical/lumi-nemo-e2.0</a></li> <li><a href="https://huggingface.co/NeverSleep/Lumimaid-v0.2-12B">NeverSleep/Lumimaid-v0.2-12B</a></li> <li><a href="https://huggingface.co/nothingiisreal/Celeste-12B-V1.6">nothingiisreal/Celeste-12B-V1.6</a></li> <li><a href="https://huggingface.co/nothingiisreal/MN-12B-Celeste-V1.9">nothingiisreal/MN-12B-Celeste-V1.9</a></li> <li><a href="https://huggingface.co/PocketDoc/Dans-DangerousWinds-V1.1.0-12b">PocketDoc/Dans-DangerousWinds-V1.1.0-12b</a></li> <li><a href="https://huggingface.co/ReadyArt/Dark-Nexus-12B-v2.0">ReadyArt/Dark-Nexus-12B-v2.0</a></li> <li><a href="https://huggingface.co/ReadyArt/Forgotten-Safeword-12B-v4.0">ReadyArt/Forgotten-Safeword-12B-v4.0</a></li> <li><a href="https://huggingface.co/ReadyArt/Omega-Darker_The-Final-Directive-12B">ReadyArt/Omega-Darker_The-Final-Directive-12B</a></li> <li><a href="https://huggingface.co/romaingrx/red-teamer-mistral-nemo">romaingrx/red-teamer-mistral-nemo</a></li> <li><a href="https://huggingface.co/Sao10K/MN-12B-Lyra-v1">Sao10K/MN-12B-Lyra-v1</a></li> <li><a href="https://huggingface.co/Sao10K/MN-12B-Lyra-v4">Sao10K/MN-12B-Lyra-v4</a></li> <li><a 
href="https://huggingface.co/shisa-ai/shisa-v2-mistral-nemo-12b">shisa-ai/shisa-v2-mistral-nemo-12b</a></li> <li><a href="https://huggingface.co/SicariusSicariiStuff/Impish_Bloodmoon_12B">SicariusSicariiStuff/Impish_Bloodmoon_12B</a></li> <li><a href="https://huggingface.co/sleepdeprived3/Christian-Bible-Expert-v2.0-12B">sleepdeprived3/Christian-Bible-Expert-v2.0-12B</a></li> <li><a href="https://huggingface.co/SuperbEmphasis/MN-12b-RP-Ink-RP-Longform">SuperbEmphasis/MN-12b-RP-Ink-RP-Longform</a></li> <li><a href="https://huggingface.co/SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2">SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2</a></li> <li><a href="https://huggingface.co/TheDrummer/Rivermind-12B-v1">TheDrummer/Rivermind-12B-v1</a></li> <li><a href="https://huggingface.co/TheDrummer/Rocinante-12B-v1">TheDrummer/Rocinante-12B-v1</a></li> <li><a href="https://huggingface.co/TheDrummer/Rocinante-X-12B-v1">TheDrummer/Rocinante-X-12B-v1</a></li> <li><a href="https://huggingface.co/Trappu/Nemo-Picaro-12B">Trappu/Nemo-Picaro-12B</a></li> <li><a href="https://huggingface.co/Undi95/LocalC-12B-e2.0">Undi95/LocalC-12B-e2.0</a></li> <li><a href="https://huggingface.co/VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct">VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct</a></li> <li><a href="https://huggingface.co/Vortex5/Astral-Noctra-12B">Vortex5/Astral-Noctra-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Azure-Starlight-12B">Vortex5/Azure-Starlight-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Crimson-Constellation-12B">Vortex5/Crimson-Constellation-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Red-Synthesis-12B">Vortex5/Red-Synthesis-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Shining-Seraph-12B">Vortex5/Shining-Seraph-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Starlit-Shadow-12B">Vortex5/Starlit-Shadow-12B</a></li> <li><a 
href="https://huggingface.co/Vortex5/Vermilion-Sage-12B">Vortex5/Vermilion-Sage-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Scarlet-Seraph-12B">Vortex5/Scarlet-Seraph-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Maroon-Sunset-12B">Vortex5/Maroon-Sunset-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Amber-Starlight-12B">Vortex5/Amber-Starlight-12B</a></li> </ul> </div> </details> </div> <div class="section-container"> <div class="section-header"> <div class="section-indicator"></div> <h2 class="section-title">Merge Pipeline & Configuration</h2> </div> <div class="section-content"> <p><b>🧬 Ancient Awakening 12B</b> unites several methods and 70 models into one:</p> <ol> <li><a href="https://huggingface.co/EldritchLabs/Kraken-Karcher-12B-v1">🦑 Kraken Karcher v1</a>: Combines 53 <a href="https://huggingface.co/models?other=base_model:finetune:mistralai/Mistral-Nemo-Instruct-2407">Mistral Nemo finetunes</a> via the <code>karcher</code> method at 500 iterations</li> <li><a href="https://huggingface.co/Naphula/Riemannian-Redshift-12B-v1">🌌 Riemannian Redshift v1</a>: Combines 10 <a href="https://huggingface.co/Vortex5">Vortex5</a> merges (which contain custom methods like <code>saef</code>, <code>smi_oni</code>, and <code>hpq</code>) via the <code>karcher</code> method at 1000 iterations</li> <li>RedKFlux: <code>flux</code> merge of Kraken with Redshift at 1000 iterations</li> <li>RedKFluxMell: <code>arcee_fusion</code> merge of #3 with <a href="https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1">Mag-Mell</a></li> <li>BloodKraken: <code>arcee_fusion</code> merge of #4 with <a href="https://huggingface.co/SicariusSicariiStuff/Impish_Bloodmoon_12B">Impish Bloodmoon</a></li> <li><a href="https://huggingface.co/Naphula-Archives/F5-stage6-12B">F5-stage6</a>: <code>arcee_fusion</code> merge of #5 with <a href="https://huggingface.co/LatitudeGames/Muse-12B">Muse</a></li> <li><a 
href="https://huggingface.co/Naphula-Archives/F5-stage7-12B">F5-stage7</a>: <code>ramplus_tl</code> merge of #6 with #3</li> <li><a href="https://huggingface.co/Naphula/Ancient-Awakening-12B">🧬 Ancient Awakening 12B</a>: <code>pdq</code> merge of #7 with #6, #3, #2, #1, Mag-Mell, Impish-Bloodmoon, and Muse</li> <li><code>mpoa</code> <a href="https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration">ablation</a> applied to remove censorship <a href="https://huggingface.co/Naphula/Ancient-Awakening-12B-MPOA">(released separately)</a></li> <b>Note:</b> If you encounter issues with the model, try the F5-stage6 or stage7 merges, as these are likely more stable. </ol> <hr> <h3 class="subheading">Stage 1: 🦑 Kraken Karcher</h3> <pre><code>base_model: B:/12B/models--mistralai--Mistral-Nemo-Instruct-2407 models: - model: B:/12B/models--aixonlab--Aether-12b - model: B:/12B/models--aixonlab--Zinakha-12b - model: B:/12B/models--allura-org--Bigger-Body-12b - model: B:/12B/models--allura-org--MN-12b-RP-Ink - model: B:/12B/models--allura-org--remnant-mn-12b - model: B:/12B/models--anthracite-org--magnum-v4-12b - model: B:/12B/models--ArliAI--Mistral-Nemo-12B-ArliAI-RPMax-v1.2 - model: B:/12B/models--Babsie--Opulus-12B-v3 - model: B:/12B/models--BeaverAI--mistral-doryV2-12b - model: B:/12B/models--crestf411--nemo-sunfall-v0.6.1 - model: B:/12B/models--EpistemeAI2--Fireball-Mistral-Nemo-12B-Philos - model: B:/12B/models--EpistemeAI--Mistral-Nemo-Instruct-12B-Philosophy-Math - model: B:/12B/models--Fizzarolli--MN-12b-Rosier-v1 - model: B:/12B/models--HumanLLMs--Human-Like-Mistral-Nemo-Instruct-2407 - model: B:/12B/models--IIEleven11--Kalypso - model: B:/12B/models--intervitens--mini-magnum-12b-v1.1 - model: B:/12B/models--jtatman--mistral_nemo_12b_reasoning_psychology_lora - model: B:/12B/models--KOOWEEYUS--BlackSheep-RP-12B - model: B:/12B/models--Lambent--Arsenic-Shahrazad-12B-v2 - model: B:/12B/models--Lambent--Arsenic-Shahrazad-12B-v3 - 
model: B:/12B/models--Lambent--arsenic-nemo-unleashed-12B - model: B:/12B/models--Lambent--Gilded-Arsenic-12B - model: B:/12B/models--mistralai--Mistral-Nemo-Instruct-2407 - model: B:/12B/models--nbeerbower--Lyra-Gutenberg-mistral-nemo-12B - model: B:/12B/models--nbeerbower--Lyra4-Gutenberg-12B - model: B:/12B/models--nbeerbower--mistral-nemo-bophades-12B - model: B:/12B/models--nbeerbower--mistral-nemo-gutenberg-12B-v3 - model: B:/12B/models--nbeerbower--mistral-nemo-gutenberg-12B-v4 - model: B:/12B/models--nbeerbower--Mistral-Nemo-Gutenberg-Doppel-12B - model: B:/12B/models--nbeerbower--Mistral-Nemo-Gutenberg-Encore-12B - model: B:/12B/models--nbeerbower--Mistral-Nemo-Gutenberg-Vitus-12B - model: B:/12B/models--nbeerbower--mistral-nemo-wissenschaft-12B - model: B:/12B/models--NeverSleepHistorical--lumi-nemo-e2.0 - model: B:/12B/models--NeverSleep--Lumimaid-v0.2-12B - model: B:/12B/models--nothingiisreal--Celeste-12B-V1.6 - model: B:/12B/models--nothingiisreal--MN-12B-Celeste-V1.9 - model: B:/12B/models--PocketDoc--Dans-DangerousWinds-V1.1.0-12b - model: B:/12B/models--ReadyArt--Dark-Nexus-12B-v2.0 - model: B:/12B/models--ReadyArt--Forgotten-Safeword-12B-v4.0 - model: B:/12B/models--ReadyArt--Omega-Darker_The-Final-Directive-12B - model: B:/12B/models--romaingrx--red-teamer-mistral-nemo - model: B:/12B/models--Sao10K--MN-12B-Lyra-v1 - model: B:/12B/models--Sao10K--MN-12B-Lyra-v4 - model: B:/12B/models--shisa-ai--shisa-v2-mistral-nemo-12b - model: B:/12B/models--sleepdeprived3--Christian-Bible-Expert-v2.0-12B - model: B:/12B/models--SuperbEmphasis--MN-12b-RP-Ink-RP-Longform - model: B:/12B/models--SuperbEmphasis--Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2 - model: B:/12B/models--TheDrummer--Rivermind-12B-v1 - model: B:/12B/models--TheDrummer--Rocinante-12B-v1 - model: B:/12B/models--TheDrummer--Rocinante-X-12B-v1 - model: B:/12B/models--Trappu--Nemo-Picaro-12B - model: B:/12B/models--Undi95--LocalC-12B-e2.0 - model: 
B:/12B/models--VAGOsolutions--SauerkrautLM-Nemo-12b-Instruct merge_method: karcher parameters: max_iter: 500 tol: 1.0e-9 dtype: float32 out_dtype: bfloat16 tokenizer: source: union chat_template: auto name: 🦑‍ Kraken-Karcher-12B-v1</code></pre> <h3 class="subheading">Stage 2: 🌌 Riemannian Redshift</h3> <pre><code>models: - model: B:/12B/models--Vortex5--Astral-Noctra-12B - model: B:/12B/models--Vortex5--Azure-Starlight-12B - model: B:/12B/models--Vortex5--Crimson-Constellation-12B - model: B:/12B/models--Vortex5--Red-Synthesis-12B - model: B:/12B/models--Vortex5--Shining-Seraph-12B - model: B:/12B/models--Vortex5--Starlit-Shadow-12B - model: B:/12B/models--Vortex5--Vermilion-Sage-12B - model: B:/12B/models--Vortex5--Scarlet-Seraph-12B - model: B:/12B/models--Vortex5--Maroon-Sunset-12B - model: B:/12B/models--Vortex5--Amber-Starlight-12B merge_method: karcher parameters: max_iter: 1000 tol: 1.0e-9 dtype: float32 out_dtype: bfloat16 tokenizer: source: union chat_template: auto name: 🌌 Riemannian-Redshift-12B-v1</code></pre> <h3 class="subheading">Stage 3: RedKFlux</h3> <pre><code>models: - model: C:\mergekit-main\merged_model_redshift - model: C:\mergekit-main\merged_model_kraken_karcher merge_method: flux parameters: eta: 1.2 tol: 1.0e-9 max_iter: 1000 kappa: 0.8 dtype: float32 out_dtype: bfloat16 tokenizer: source: union chat_template: auto name: RedKFlux</code></pre> <h3 class="subheading">Stage 4: RedKFluxMell</h3> <pre><code>models: - model: C:\mergekit-main\merged_model_RedKFlux - model: B:\8B\models--inflatebot--MN-12B-Mag-Mell-R1 merge_method: arcee_fusion tukey_fence: 1.5 base_model: C:\mergekit-main\merged_model_RedKFlux dtype: float32 out_dtype: bfloat16 tokenizer: source: base name: RedKFluxMell</code></pre> <h3 class="subheading">Stage 5: BloodKraken</h3> <pre><code>models: - model: C:\mergekit-main\merged_model_RedKFluxMell - model: B:\8B\models--SicariusSicariiStuff--Impish_Bloodmoon_12B merge_method: arcee_fusion tukey_fence: 1.5 base_model: 
C:\mergekit-main\merged_model_RedKFluxMell dtype: float32 out_dtype: bfloat16 tokenizer: source: base name: BloodKraken</code></pre> <h3 class="subheading">Stage 6: BloodKrakenMuse</h3> <pre><code>models: - model: C:\mergekit-main\merged_model_BloodKraken - model: B:\8B\models--LatitudeGames--Muse-12B merge_method: arcee_fusion tukey_fence: 1.5 base_model: C:\mergekit-main\merged_model_BloodKraken dtype: float32 out_dtype: bfloat16 tokenizer: source: base name: BloodKrakenMuse</code></pre> <h3 class="subheading">Stage 7: Ramplus_tl</h3> <pre><code>merge_method: ramplus_tl base_model: C:\mergekit-main\merged_model_BloodKrakenMuse models: - model: C:\mergekit-main\merged_model_BloodKrakenMuse - model: C:\mergekit-main\merged_model_RedKFlux parameters: epsilon: 0.001 # Increased from 1e-5 to 1e-3 for denser SFT/DPO task vectors r: 0.25 # Increased from 0.1 to 0.2-0.3 for better SFT behavior preservation alpha: 0.4 # Increased from 0.2 to 0.4 for enhanced rescaling dtype: float32 out_dtype: bfloat16 tokenizer: source: base name: Stage7</code></pre> <h3 class="subheading">Stage 8: 🧬 Ancient Awakening</h3> <pre><code>merge_method: pdq pdq_base_yaml: C:\mergekit-main\stage7.yaml pdq_base_model: C:\mergekit-main\merged_model_stage7 output_dir: C:\mergekit-main\stage8_pdq base_model: C:\mergekit-main\merged_model_BloodKrakenMuse models: - model: C:\mergekit-main\merged_model_BloodKrakenMuse - model: B:\12B\models--LatitudeGames--Muse-12B - model: B:\12B\models--SicariusSicariiStuff--Impish_Bloodmoon_12B - model: B:\12B\models--inflatebot--MN-12B-Mag-Mell-R1 - model: C:\mergekit-main\merged_model_RedKFlux - model: C:\mergekit-main\merged_model_redshift - model: C:\mergekit-main\merged_model_kraken_karcher parameters: chi: 0.15 iota: 0.1 nu: 24 gamma: 1.0 zeta: 16 sigma: 0.5 density: 0.9 epsilon: 0.099 lambda: 1.0 lazy_unpickle: True random_seed: 420 name: 🧬 Ancient-Awakening-12B</code></pre> <h3 class="subheading">Stage 9: Magnitude-Preserving Orthogonalized Ablation</h3> 
<pre><code># python measure.py -m C:\mergekit-main\f8_pdq -o C:\mergekit-main\f8_pdq\ablit_proj --batch-size 8 --projected # python analyze_old.py C:\mergekit-main\f8_pdq\ablit_proj -c # sharded_ablate.py magmell.yml --normpreserve --projected # # The model to be ablated. model: C:\mergekit-main\f8_pdq # # The measurement file generated by measure.py for the model. measurements: C:\mergekit-main\f8_pdq\ablit_proj # # The directory where the new, ablated model will be saved. output: C:\mergekit-main\f8_pdq\ablit_biproj\ # # The list of ablation operations to perform. # Strategy: Use the single best refusal direction from the peak signal layer (29) # and apply it across all relevant mid-to-late layers. ablate: # Start ablating from the mid-layers where the signal begins to strengthen. - layer: 0 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 1 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 2 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 3 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 4 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 5 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 6 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 7 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 8 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 9 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 10 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 11 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 12 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 13 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 14 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 15 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 16 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 17 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 18 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 19 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 20 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 21 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 22 measurement: 29 
scale: 1.2 sparsity: 0.00 - layer: 23 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 24 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 25 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 26 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 27 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 28 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 29 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 30 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 31 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 32 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 33 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 34 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 35 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 36 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 37 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 38 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 39 measurement: 29 scale: 1.2 sparsity: 0.00</code></pre> </div> </div> </body> </html>
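The `karcher` stages in the pipeline above compute a Karcher (Fréchet) mean of the donor weights. As a toy illustration of the underlying iteration only (operating on unit vectors rather than weight tensors, and not mergekit's actual implementation), one can average points on a sphere by repeatedly log-mapping them to the tangent space at the current estimate, averaging, and exp-mapping back, stopping once the update falls below a tolerance like the `tol: 1.0e-9` seen in the configs:

```python
import math

def karcher_mean_sphere(points, max_iter=500, tol=1e-9):
    """Iterative Karcher (Frechet) mean of unit vectors on the sphere."""
    dim = len(points[0])
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))

    def normalize(v):
        n = math.sqrt(dot(v, v))
        return tuple(x / n for x in v)

    # Start from the normalized Euclidean mean.
    mu = normalize([sum(p[i] for p in points) for i in range(dim)])
    for _ in range(max_iter):
        # Log-map every point to the tangent space at mu and average.
        tangent = [0.0] * dim
        for p in points:
            c = max(-1.0, min(1.0, dot(mu, p)))
            theta = math.acos(c)  # geodesic distance from mu to p
            perp = [p[i] - c * mu[i] for i in range(dim)]
            pn = math.sqrt(sum(x * x for x in perp))
            if theta < 1e-12 or pn < 1e-12:
                continue  # p coincides with mu (or is antipodal)
            for i in range(dim):
                tangent[i] += theta * perp[i] / (pn * len(points))
        step = math.sqrt(sum(x * x for x in tangent))
        if step < tol:
            break  # converged, analogous to the configs' `tol`
        # Exp-map: walk along the geodesic in the averaged direction.
        mu = normalize([math.cos(step) * mu[i] + math.sin(step) * tangent[i] / step
                        for i in range(dim)])
    return mu
```

For two points the result is the geodesic midpoint; the `max_iter` knob plays the same role as the `max_iter: 500` / `max_iter: 1000` settings in the Stage 1 and Stage 2 configs.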

Author: Naphula

Likes: 3

Downloads: 0

Tags: transformers, safetensors, mistral, text-generation, creative, creative writing, fiction writing, plot generation, sub-plot generation, story generation, scene continue, storytelling, fiction story, science fiction, romance, all genres, story, writing, vivid prosing, vivid writing, fiction, roleplaying, float32, swearing, rp, horror, nemo, merge, mergekit, karcher, flux, arcee_fusion, ramplus_tl, pdq, conversational, en, arxiv:2601.13572, base_model:ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2, base_model:merge:ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2, base_model:Babsie/Opulus-12B-v3, base_model:merge:Babsie/Opulus-12B-v3, base_model:BeaverAI/mistral-doryV2-12b, base_model:merge:BeaverAI/mistral-doryV2-12b, base_model:EldritchLabs/Kraken-Karcher-12B-v1, base_model:merge:EldritchLabs/Kraken-Karcher-12B-v1, base_model:EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math, base_model:merge:EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math, base_model:EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos, base_model:merge:EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos, base_model:Fizzarolli/MN-12b-Rosier-v1, base_model:merge:Fizzarolli/MN-12b-Rosier-v1, base_model:HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407, base_model:merge:HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407, base_model:IIEleven11/Kalypso, base_model:merge:IIEleven11/Kalypso, base_model:KOOWEEYUS/BlackSheep-RP-12B, base_model:merge:KOOWEEYUS/BlackSheep-RP-12B, base_model:Lambent/Arsenic-Shahrazad-12B-v2, base_model:merge:Lambent/Arsenic-Shahrazad-12B-v2, base_model:Lambent/Arsenic-Shahrazad-12B-v3, base_model:merge:Lambent/Arsenic-Shahrazad-12B-v3, base_model:Lambent/Gilded-Arsenic-12B, base_model:merge:Lambent/Gilded-Arsenic-12B, base_model:Lambent/arsenic-nemo-unleashed-12B, base_model:merge:Lambent/arsenic-nemo-unleashed-12B, base_model:LatitudeGames/Muse-12B, base_model:merge:LatitudeGames/Muse-12B, base_model:Naphula-Archives/F5-stage6-12B, base_model:merge:Naphula-Archives/F5-stage6-12B, 
base_model:Naphula-Archives/F5-stage7-12B, base_model:merge:Naphula-Archives/F5-stage7-12B, base_model:Naphula/Riemannian-Redshift-12B-v1, base_model:merge:Naphula/Riemannian-Redshift-12B-v1, base_model:NeverSleep/Lumimaid-v0.2-12B, base_model:merge:NeverSleep/Lumimaid-v0.2-12B, base_model:NeverSleepHistorical/lumi-nemo-e2.0, base_model:merge:NeverSleepHistorical/lumi-nemo-e2.0, base_model:PocketDoc/Dans-DangerousWinds-V1.1.0-12b, base_model:merge:PocketDoc/Dans-DangerousWinds-V1.1.0-12b, base_model:ReadyArt/Dark-Nexus-12B-v2.0, base_model:merge:ReadyArt/Dark-Nexus-12B-v2.0, base_model:ReadyArt/Forgotten-Safeword-12B-v4.0, base_model:merge:ReadyArt/Forgotten-Safeword-12B-v4.0, base_model:ReadyArt/Omega-Darker_The-Final-Directive-12B, base_model:merge:ReadyArt/Omega-Darker_The-Final-Directive-12B, base_model:Sao10K/MN-12B-Lyra-v1, base_model:merge:Sao10K/MN-12B-Lyra-v1, base_model:Sao10K/MN-12B-Lyra-v4, base_model:merge:Sao10K/MN-12B-Lyra-v4, base_model:SicariusSicariiStuff/Impish_Bloodmoon_12B, base_model:merge:SicariusSicariiStuff/Impish_Bloodmoon_12B, base_model:SuperbEmphasis/MN-12b-RP-Ink-RP-Longform, base_model:merge:SuperbEmphasis/MN-12b-RP-Ink-RP-Longform, base_model:SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2, base_model:merge:SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2, base_model:TheDrummer/Rivermind-12B-v1, base_model:merge:TheDrummer/Rivermind-12B-v1, base_model:TheDrummer/Rocinante-12B-v1, base_model:merge:TheDrummer/Rocinante-12B-v1, base_model:TheDrummer/Rocinante-X-12B-v1, base_model:merge:TheDrummer/Rocinante-X-12B-v1, base_model:Trappu/Nemo-Picaro-12B, base_model:merge:Trappu/Nemo-Picaro-12B, base_model:Undi95/LocalC-12B-e2.0, base_model:merge:Undi95/LocalC-12B-e2.0, base_model:VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct, base_model:merge:VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct, base_model:Vortex5/Amber-Starlight-12B, base_model:merge:Vortex5/Amber-Starlight-12B, 
base_model:Vortex5/Astral-Noctra-12B, base_model:merge:Vortex5/Astral-Noctra-12B, base_model:Vortex5/Azure-Starlight-12B, base_model:merge:Vortex5/Azure-Starlight-12B, base_model:Vortex5/Crimson-Constellation-12B, base_model:merge:Vortex5/Crimson-Constellation-12B, base_model:Vortex5/Maroon-Sunset-12B, base_model:merge:Vortex5/Maroon-Sunset-12B, base_model:Vortex5/Red-Synthesis-12B, base_model:merge:Vortex5/Red-Synthesis-12B, base_model:Vortex5/Scarlet-Seraph-12B, base_model:merge:Vortex5/Scarlet-Seraph-12B, base_model:Vortex5/Shining-Seraph-12B, base_model:merge:Vortex5/Shining-Seraph-12B, base_model:Vortex5/Starlit-Shadow-12B, base_model:merge:Vortex5/Starlit-Shadow-12B, base_model:Vortex5/Vermilion-Sage-12B, base_model:merge:Vortex5/Vermilion-Sage-12B, base_model:aixonlab/Aether-12b, base_model:merge:aixonlab/Aether-12b, base_model:aixonlab/Zinakha-12b, base_model:merge:aixonlab/Zinakha-12b, base_model:allura-org/Bigger-Body-12b, base_model:merge:allura-org/Bigger-Body-12b, base_model:allura-org/MN-12b-RP-Ink, base_model:merge:allura-org/MN-12b-RP-Ink, base_model:allura-org/remnant-mn-12b, base_model:merge:allura-org/remnant-mn-12b, base_model:anthracite-org/magnum-v4-12b, base_model:merge:anthracite-org/magnum-v4-12b, base_model:crestf411/nemo-sunfall-v0.6.1, base_model:merge:crestf411/nemo-sunfall-v0.6.1, base_model:inflatebot/MN-12B-Mag-Mell-R1, base_model:merge:inflatebot/MN-12B-Mag-Mell-R1, base_model:intervitens/mini-magnum-12b-v1.1, base_model:merge:intervitens/mini-magnum-12b-v1.1, base_model:jtatman/mistral_nemo_12b_reasoning_psychology_lora, base_model:merge:jtatman/mistral_nemo_12b_reasoning_psychology_lora, base_model:mistralai/Mistral-Nemo-Instruct-2407, base_model:merge:mistralai/Mistral-Nemo-Instruct-2407, base_model:nbeerbower/Lyra-Gutenberg-mistral-nemo-12B, base_model:merge:nbeerbower/Lyra-Gutenberg-mistral-nemo-12B, base_model:nbeerbower/Lyra4-Gutenberg-12B, base_model:merge:nbeerbower/Lyra4-Gutenberg-12B, 
base_model:nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B, base_model:merge:nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B, base_model:nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B, base_model:merge:nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B, base_model:nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B, base_model:merge:nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B, base_model:nbeerbower/mistral-nemo-bophades-12B, base_model:merge:nbeerbower/mistral-nemo-bophades-12B, base_model:nbeerbower/mistral-nemo-gutenberg-12B-v3, base_model:merge:nbeerbower/mistral-nemo-gutenberg-12B-v3, base_model:nbeerbower/mistral-nemo-gutenberg-12B-v4, base_model:merge:nbeerbower/mistral-nemo-gutenberg-12B-v4, base_model:nbeerbower/mistral-nemo-wissenschaft-12B, base_model:merge:nbeerbower/mistral-nemo-wissenschaft-12B, base_model:nothingiisreal/Celeste-12B-V1.6, base_model:merge:nothingiisreal/Celeste-12B-V1.6, base_model:nothingiisreal/MN-12B-Celeste-V1.9, base_model:merge:nothingiisreal/MN-12B-Celeste-V1.9, base_model:romaingrx/red-teamer-mistral-nemo, base_model:merge:romaingrx/red-teamer-mistral-nemo, base_model:shisa-ai/shisa-v2-mistral-nemo-12b, base_model:merge:shisa-ai/shisa-v2-mistral-nemo-12b, base_model:sleepdeprived3/Christian-Bible-Expert-v2.0-12B, base_model:merge:sleepdeprived3/Christian-Bible-Expert-v2.0-12B, license:apache-2.0, text-generation-inference, endpoints_compatible, region:us

armand0e/Omnicoder-9B-Opus-Distill-GGUF

Author: armand0e

Likes: 2

Downloads: 0

Tags: gguf, endpoints_compatible, region:us, conversational

Sengil/turkish-gemma-9b-finance-sft


language:
  • tr
license: apache-2.0
base_model: ytu-ce-cosmos/Turkish-Gemma-9b-T1
tags:
  • finance
  • llm
  • instruction-tuning
  • sft
  • trl
  • transformers
pipeline_tag: text-generation
library_name: transformers


<p align="center"> <img src="./Gemini_Generated_Image_1i1esm1i1esm1i1e.png" alt="Model Banner" width="900"/> </p>

Model Overview

Sengil/turkish-gemma-9b-finance-sft is an instruction-tuned large language model developed for finance-related natural language understanding and generation in Turkish.
It is based on ytu-ce-cosmos/Turkish-Gemma-9b-T1 and further adapted with Supervised Fine-Tuning (SFT) on a finance-focused instruction dataset.

Base Model

  • Base model: ytu-ce-cosmos/Turkish-Gemma-9b-T1
  • Fine-tuning method: Supervised Fine-Tuning (SFT)

Intended Use

This model is designed for Turkish-language finance-related NLP applications, including:

  • Financial question answering
  • Finance-oriented instruction following
  • General Turkish financial text generation
  • Educational and research use in Turkish finance NLP

Training Dataset

The model was fine-tuned on the following datasets:

  • AlicanKiraz0/Turkish-Finance-SFT-Dataset
  • Dbmaxwell/turkish-finance-instruction-dataset
  • RsGoksel/Finansal

I transformed these plain datasets into an instruction-tuning-ready synthetic dataset by using the Gemini API to generate structured instruction-response examples. This made the data better suited to training LLMs to follow user prompts, output formats, and task-specific guidance. I will share the dataset soon.
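The exact Gemini prompting pipeline is not published in the card; the sketch below covers only the packaging half of the process, assuming hypothetical generated pairs (`generated_pairs` and the short system prompt are illustrative, not taken from the actual dataset) and the Gemma turn format used by the inference examples later in this card:

```python
import json

# Hypothetical pairs standing in for Gemini API output; the real
# generation prompt and schema are not published in the card.
generated_pairs = [
    {
        "instruction": "Enflasyon hedeflemesi nedir? Kısaca açıkla.",
        "response": "Enflasyon hedeflemesi, merkez bankasının fiyat istikrarını öncelemesidir.",
    },
]

def to_sft_record(pair, system_prompt):
    # Wrap one instruction/response pair in the Gemma-style turn format
    # that the card's inference examples expect.
    return {
        "text": (
            f"<start_of_turn>system\n{system_prompt}<end_of_turn>\n"
            f"<start_of_turn>user\n{pair['instruction']}<end_of_turn>\n"
            f"<start_of_turn>model\n{pair['response']}<end_of_turn>"
        )
    }

records = [to_sft_record(p, "Sen bir finans asistanısın.") for p in generated_pairs]
# One JSON object per line, ready to write out as a .jsonl file.
jsonl_lines = [json.dumps(r, ensure_ascii=False) for r in records]
```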

Key Training Hyperparameters

  • Max steps: 500
  • Approximate epochs: 3.7
  • Learning rate: 1e-4
  • Per-device train batch size: 4
  • Gradient accumulation steps: 4
  • Effective batch size: 16
  • Optimizer: AdamW 8-bit
  • Weight decay: 0.01
  • LR scheduler: Cosine
  • Warmup steps: 20
  • Max grad norm: 1.0
  • Precision: bfloat16
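The training script itself is not included in the card; the settings above correspond to standard transformers/TRL `TrainingArguments`-style keywords. The plain-Python sketch below makes the effective-batch-size arithmetic and the ~3.7-epoch figure explicit (the dataset size is an assumption reverse-engineered from those two numbers, not stated in the card):

```python
# Card hyperparameters as TrainingArguments-style keywords (names follow
# transformers/TRL conventions; the actual training script is unpublished).
training_kwargs = dict(
    max_steps=500,
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    warmup_steps=20,
    max_grad_norm=1.0,
    bf16=True,
)

# Effective batch size = per-device batch x gradient accumulation steps.
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)

# The "~3.7 epochs" figure implies roughly 2,100-2,200 training examples;
# 2160 below is a hypothetical round number consistent with the card.
approx_dataset_size = 2160
steps_per_epoch = approx_dataset_size / effective_batch
approx_epochs = training_kwargs["max_steps"] / steps_per_epoch

print(effective_batch)          # 16
print(round(approx_epochs, 1))  # 3.7
```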

Training Procedure

The model was trained with supervised fine-tuning on Turkish finance-domain examples.
The objective was to improve domain adaptation and instruction-following ability in financial contexts while preserving the Turkish language capabilities of the base model.

Example Use

!pip install unsloth

Inference code

from unsloth import FastLanguageModel
import torch

repo_id = "Sengil/turkish-gemma-9b-finance-sft"
max_seq_length = 2048

SYSTEM_PROMPT = "Sen, kripto para ve borsa konularında uzmanlaşmış, hem Türkiye hem de global finansal piyasalara hakim bir finans asistanısın. Sana sorulan sorulara kullanıcının istediği şekilde cevap veriyorsun."

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=repo_id,
    max_seq_length=max_seq_length,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

def ask_finance_model(question, max_new_tokens=1024):
    # Build the Gemma-style chat prompt (system / user / model turns).
    prompt = (
        f"<start_of_turn>system\n{SYSTEM_PROMPT}<end_of_turn>\n"
        f"<start_of_turn>user\n{question.strip()}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )

    inputs = tokenizer([prompt], return_tensors="pt", truncation=True, max_length=max_seq_length).to("cuda")

    with torch.inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for reproducible answers
            use_cache=True,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Keep only the model turn: drop the prompt prefix and the closing tag.
    text = tokenizer.decode(outputs[0], skip_special_tokens=False)
    return text.split("<start_of_turn>model\n", 1)[-1].split("<end_of_turn>", 1)[0].strip()

response = ask_finance_model(
    "Cari açık veren bir ülkede yerel para birimi neden baskı altında kalabilir? Kısaca açıklar mısın?",
    max_new_tokens=1024,
)

print(response)

Streaming inference code

from unsloth import FastLanguageModel
from transformers import TextStreamer
import torch

repo_id = "Sengil/turkish-gemma-9b-finance-sft"

max_seq_length = 2048
dtype = torch.bfloat16
load_in_4bit = True

SHORT_SYSTEM_PROMPT = (
    "Sen, kripto para ve borsa konularında uzmanlaşmış, hem Türkiye hem de global finansal piyasalara hakim bir finans asistanısın. Sana sorulan sorulara kullanıcının istediği şekilde cevap veriyorsun."
)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=repo_id,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

FastLanguageModel.for_inference(model)

def ask_finance_model(
    question,
    max_new_tokens=1024,
    do_sample=False,
    stream=True,
    system_prompt=SHORT_SYSTEM_PROMPT,
):
    prompt = (
        f"<start_of_turn>system\n{system_prompt}<end_of_turn>\n"
        f"<start_of_turn>user\n{question.strip()}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )

    inputs = tokenizer(
        [prompt],
        return_tensors="pt",
        truncation=True,
        max_length=max_seq_length,
    ).to("cuda")

    generation_kwargs = dict(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=do_sample,
        use_cache=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )

    if stream:
        streamer = TextStreamer(
            tokenizer,
            skip_prompt=True,
            skip_special_tokens=False,
        )
        generation_kwargs["streamer"] = streamer

    with torch.inference_mode():
        outputs = model.generate(**generation_kwargs)

    full_output = tokenizer.batch_decode(outputs, skip_special_tokens=False)[0]

    answer = full_output.split("<start_of_turn>model\n", 1)[-1]
    answer = answer.split("<end_of_turn>", 1)[0].strip()

    return answer

response = ask_finance_model(
    """Bir hisse senedinde uzun süredir devam eden yatay konsolidasyon bölgesi yukarı yönlü hacimli bir kırılımla aşıldıktan sonra, fiyatın tekrar bu banda doğru sarkmasına rağmen volatilitenin belirgin biçimde düşmesi, trendin devamı açısından nasıl yorumlanmalıdır?""",
    max_new_tokens=4096,
    do_sample=False,
    stream=True,
)

print("\n\n====== FINAL ANSWER ======\n")
print(response)

Output:

====== FINAL ANSWER ======

<think>
Bu soru, teknik analiz bağlamında bir hisse senedi fiyat hareketini yorumlamaya yönelik. Soru şu: Uzun süreli yatay konsolidasyon (sideways consolidation) bölgesi yukarı yönlü hacimli bir kırılımla aşıldıktan sonra, fiyatın tekrar bu banda doğru sarkmasına rağmen volatilitenin belirgin biçimde düşmesi, trendin devamı açısından nasıl yorumlanmalıdır?

Öncelikle, yatay konsolidasyon: Fiyatın belirli bir aralıkta (destek ve direnç seviyeleri arasında) uzun süre hareket etmesi, genellikle bir sonraki büyük hareketin (trend) başlangıcına işaret eder. Hacimli kırılım: Kırılımın hacimli olması, trendin gücünü gösterir. Yukarı yönlü hacimli kırılım, yükseliş trendinin başladığına dair güçlü bir sinyaldir.

Ancak, fiyatın tekrar konsolidasyon bandına doğru sarkması (retest) ve volatilitenin düşmesi, trendin devamı açısından nasıl yorumlanır?

Volatilite düşüşü, piyasanın daha az hareketli, daha az belirsiz olduğu anlamına gelir. Bu, trendin güçlendiğine işaret edebilir mi? Genellikle, bir trendin güçlenmesi için volatilite düşebilir, çünkü fiyat trend çizgisine yakın kalır ve aşırı dalgalanmalar azalır. Ancak, volatilite düşüşü aynı zamanda trendin zayıfladığına da işaret edebilir; çünkü fiyat trend çizgisinden uzaklaşmaz, ancak hareketler küçülür. Bu durumda, fiyatın konsolidasyon bandına sarkması, trendin devam ettiğini gösteren bir geri çekilme (pullback) olabilir.

Yorumlama:

1. **Kırılımın doğası**: Hacimli kırılım, trendin başlangıcını onaylar. Fiyatın konsolidasyon bandına geri dönmesi, trendin devam ettiğini gösteren bir geri çekilme olabilir. Bu geri çekilme, trend çizgisine yakın bir destek seviyesinde (örneğin, kırılım sonrası oluşan yeni destek) fiyatın toparlanması beklenir.

2. **Volatilite düşüşü**: Volatilite düşüşü, piyasanın daha az tepkisel olduğunu, daha az belirsizlik olduğunu gösterir. Bu, trendin güçlendiğine işaret edebilir. Çünkü trend güçlendikçe, fiyat trend çizgisine yakın kalır ve aşırı dalgalanmalar azalır. Ancak, volatilite düşüşü aynı zamanda trendin zayıfladığına da işaret edebilir; çünkü fiyat trend çizgisinden uzaklaşmaz, ancak hareketler küçülür. Bu durumda, fiyatın konsolidasyon bandına sarkması, trendin devam ettiğini gösteren bir geri çekilme olabilir.

3. **Trend devamı açısından**: Eğer fiyat, konsolidasyon bandına sarktıktan sonra trend çizgisine yakın bir destek seviyesinde toparlanırsa ve volatilite düşüşü devam ederse, trendin devam ettiği söylenebilir. Ancak, eğer fiyat konsolidasyon bandının altına düşerse ve volatilite artarsa, trendin zayıfladığı düşünülebilir.

4. **Hacim analizi**: Kırılımda hacim yüksekti. Geri çekilme sırasında hacim düşükse, bu geri çekilmenin trendi tehdit etmediğini gösterir. Hacim düşükse, alıcılar ve satıcılar arasında bir denge var demektir.

5. **Destek ve direnç seviyeleri**: Konsolidasyon bandı artık bir destek seviyesi olarak işlev görebilir. Fiyat bu seviyeye geri döndüğünde, alıcıların devreye girmesi beklenir.

6. **Volatilite göstergeleri**: ATR (Average True Range) gibi göstergelerle volatilite ölçülebilir. Volatilite düşüşü, trendin güçlendiğine işaret edebilir.

7. **Psikolojik faktörler**: Yatırımcılar, trendin devam ettiğine inanmaya başlar ve daha az işlem yapar, bu da volatiliteyi düşürür.

Sonuç olarak, bu durum trendin devamı açısından olumlu bir sinyal olarak yorumlanabilir. Ancak, fiyatın konsolidasyon bandının altına düşmesi ve volatilitenin artması durumunda trendin zayıfladığı düşünülebilir.

Cevabı yapılandıralım:

- Giriş: Yatay konsolidasyon, hacimli kırılım, geri çekilme ve volatilite düşüşü kavramlarını açıkla.
- Kırılımın doğası: Hacimli kırılım trendin başlangıcını onaylar.
- Geri çekilme: Trend devamı için normal bir geri çekilme olabilir.
- Volatilite düşüşü: Trendin güçlendiğine işaret edebilir, çünkü fiyat trend çizgisine yakın kalır.
- Destek seviyeleri: Konsolidasyon bandı artık destek olarak işlev görür.
- Hacim: Geri çekilme sırasında hacim düşükse, trend devam ediyor demektir.
- Olası senaryolar: Fiyatın konsolidasyon bandının altına düşmesi ve volatilitenin artması trendi tehdit eder.
- Sonuç: Bu durum trendin devamı açısından olumlu yorumlanabilir, ancak dikkatli olunmalı.

Cevabı Türkçe olarak vermeliyim.
</think>

Yatay konsolidasyon, bir hisse senedinin belirli bir fiyat aralığında uzun süre hareket etmesiyle oluşan bir piyasa yapısıdır. Bu aralık, genellikle bir sonraki büyük trendin (yükseliş veya düşüş) başlangıcına işaret eder. Hacimli kırılım ise, fiyatın bu aralığı yukarı veya aşağı yönde, önemli bir hacimle aşmasıdır. Yukarı yönlü hacimli kırılım, yükseliş trendinin başladığına dair güçlü bir sinyaldir.

Ancak, fiyatın konsolidasyon bandına geri dönmesi (retest) ve volatilitenin belirgin biçimde düşmesi, trendin devamı açısından nasıl yorumlanmalıdır? Bu durumu değerlendirmek için birkaç teknik ve psikolojik faktörü göz önünde bulundurmak gerekir.

### 1. Kırılımın Doğası ve Geri Çekilme
- **Hacimli kırılım**, trendin başlangıcını onaylar. Fiyatın konsolidasyon bandına geri dönmesi, trendin devam ettiğini gösteren bir geri çekilme (pullback) olabilir. Bu geri çekilme, trend çizgisine yakın bir destek seviyesinde (örneğin, kırılım sonrası oluşan yeni destek) fiyatın toparlanması beklenir.
- Geri çekilme, trendin gücünü test eder. Eğer fiyat, konsolidasyon bandının altına düşmez ve trend çizgisine yakın bir destekte toparlanırsa, trendin devam ettiği söylenebilir.

### 2. Volatilite Düşüşünün Anlamı
- Volatilite düşüşü, piyasanın daha az hareketli, daha az belirsiz olduğu anlamına gelir. Bu durum, trendin güçlendiğine işaret edebilir. Çünkü trend güçlendikçe, fiyat trend çizgisine yakın kalır ve aşırı dalgalanmalar azalır.
- Ancak, volatilite düşüşü aynı zamanda trendin zayıfladığına da işaret edebilir; çünkü fiyat trend çizgisinden uzaklaşmaz, ancak hareketler küçülür. Bu durumda, fiyatın konsolidasyon bandına sarkması, trendin devam ettiğini gösteren bir geri çekilme olabilir.

### 3. Destek ve Direnç Seviyeleri
- Konsolidasyon bandı artık bir destek seviyesi olarak işlev görebilir. Fiyat bu seviyeye geri döndüğünde, alıcıların devreye girmesi beklenir.
- Eğer fiyat, konsolidasyon bandının altına düşerse ve volatilite artarsa, trendin zayıfladığı düşünülebilir.

### 4. Hacim Analizi
- Kırılımda hacim yüksekti. Geri çekilme sırasında hacim düşükse, bu geri çekilmenin trendi tehdit etmediğini gösterir. Hacim düşükse, alıcılar ve satıcılar arasında bir denge var demektir.
- Hacim düşükse, fiyatın trend çizgisine yakın bir destekte toparlanması daha olasıdır.

### 5. Volatilite Göstergeleri
- ATR (Average True Range) gibi göstergelerle volatilite ölçülebilir. Volatilite düşüşü, trendin güçlendiğine işaret edebilir.
- Volatilite düşüşü, yatırımcıların trendin devam ettiğine inanmaya başladığını ve daha az işlem yaptığını gösterir.

### 6. Olası Senaryolar
- **Senaryo 1: Trend devam ediyor**  
  Fiyat, konsolidasyon bandına geri döndüğünde trend çizgisine yakın bir destekte toparlanır ve volatilite düşüşü devam eder. Bu durumda, trendin devam ettiği söylenebilir.
- **Senaryo 2: Trend zayıflıyor**  
  Fiyat, konsolidasyon bandının altına düşer ve volatilite artar. Bu durumda, trendin zayıfladığı veya tersine döndüğü düşünülebilir.

### Sonuç
Bu durum, trendin devamı açısından genellikle olumlu bir sinyal olarak yorumlanır. Ancak, fiyatın konsolidasyon bandının altına düşmesi ve volatilitenin artması durumunda trendin zayıfladığına dair uyarı sinyalleri olarak değerlendirilmelidir. Yatırımcılar, bu geri çekilmeyi bir fırsat olarak görebilir ve trend çizgisine yakın destek seviyelerinde alım yapabilirler.

**Not:** Bu yorum, teknik analiz prensiplerine dayanmaktadır. Yatırım kararları vermeden önce ek analizler ve risk yönetimi uygulanmalıdır.

Limitations

Although the model is specialized for Turkish finance-related tasks, it may still:

  • Generate inaccurate or outdated financial information
  • Produce overly confident responses in uncertain scenarios
  • Reflect biases present in the training data
  • Require human verification for high-stakes financial use cases

This model should not be used as a substitute for professional financial, legal, or investment advice.

Risks and Recommendations

Users are encouraged to:

  • Validate critical outputs before use
  • Avoid relying on the model for regulated or high-risk financial decisions
  • Use human oversight in production environments
  • Benchmark the model on task-specific evaluation datasets before deployment
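As a concrete (and entirely hypothetical) version of the last recommendation, a minimal keyword-coverage smoke test can be run before deployment. `ask_finance_model` is stubbed here with a canned answer so the sketch is self-contained; in practice it would be the helper from the inference example above, and the evaluation pairs would come from a task-specific dataset:

```python
def ask_finance_model(question, **kw):
    # Stub standing in for the real inference helper defined earlier.
    return "Cari açık, döviz talebini artırarak yerel para üzerinde baskı yaratır."

eval_set = [
    # (question, keywords a correct answer should mention) - hypothetical
    ("Cari açık yerel parayı nasıl etkiler?", ["döviz", "baskı"]),
]

def keyword_coverage(eval_set):
    # Fraction of questions whose answer contains all expected keywords.
    hits = 0
    for question, keywords in eval_set:
        answer = ask_finance_model(question).lower()
        if all(k.lower() in answer for k in keywords):
            hits += 1
    return hits / len(eval_set)

print(f"coverage: {keyword_coverage(eval_set):.0%}")  # prints: coverage: 100%
```

A keyword check like this only catches gross failures; for regulated use cases it should be complemented by human review and a proper benchmark.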

Citation

@misc{sengil_turkish_gemma_9b_finance_sft,
  author       = {Mert Sengil},
  title        = {Sengil/turkish-gemma-9b-finance-sft},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Sengil/turkish-gemma-9b-finance-sft}}
}

Author

Mert Sengil

Author: Sengil

Likes: 2

Downloads: 0

Tags: transformers, safetensors, finance, llm, instruction-tuning, sft, trl, text-generation, conversational, tr, base_model:ytu-ce-cosmos/Turkish-Gemma-9b-T1, base_model:finetune:ytu-ce-cosmos/Turkish-Gemma-9b-T1, license:apache-2.0, endpoints_compatible, region:us

Naphula/Ancient-Awakening-12B


base_model:

  • aixonlab/Aether-12b
  • aixonlab/Zinakha-12b
  • allura-org/Bigger-Body-12b
  • allura-org/MN-12b-RP-Ink
  • allura-org/remnant-mn-12b
  • anthracite-org/magnum-v4-12b
  • ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2
  • Babsie/Opulus-12B-v3
  • BeaverAI/mistral-doryV2-12b
  • crestf411/nemo-sunfall-v0.6.1
  • EldritchLabs/Kraken-Karcher-12B-v1
  • EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos
  • EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math
  • Fizzarolli/MN-12b-Rosier-v1
  • HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407
  • IIEleven11/Kalypso
  • inflatebot/MN-12B-Mag-Mell-R1
  • intervitens/mini-magnum-12b-v1.1
  • jtatman/mistral_nemo_12b_reasoning_psychology_lora
  • KOOWEEYUS/BlackSheep-RP-12B
  • Lambent/Arsenic-Shahrazad-12B-v2
  • Lambent/Arsenic-Shahrazad-12B-v3
  • Lambent/arsenic-nemo-unleashed-12B
  • Lambent/Gilded-Arsenic-12B
  • LatitudeGames/Muse-12B
  • mistralai/Mistral-Nemo-Instruct-2407
  • Naphula/Riemannian-Redshift-12B-v1
  • Naphula-Archives/F5-stage6-12B
  • Naphula-Archives/F5-stage7-12B
  • nbeerbower/Lyra-Gutenberg-mistral-nemo-12B
  • nbeerbower/Lyra4-Gutenberg-12B
  • nbeerbower/mistral-nemo-bophades-12B
  • nbeerbower/mistral-nemo-gutenberg-12B-v3
  • nbeerbower/mistral-nemo-gutenberg-12B-v4
  • nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B
  • nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B
  • nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B
  • nbeerbower/mistral-nemo-wissenschaft-12B
  • NeverSleepHistorical/lumi-nemo-e2.0
  • NeverSleep/Lumimaid-v0.2-12B
  • nothingiisreal/Celeste-12B-V1.6
  • nothingiisreal/MN-12B-Celeste-V1.9
  • PocketDoc/Dans-DangerousWinds-V1.1.0-12b
  • ReadyArt/Dark-Nexus-12B-v2.0
  • ReadyArt/Forgotten-Safeword-12B-v4.0
  • ReadyArt/Omega-Darker_The-Final-Directive-12B
  • romaingrx/red-teamer-mistral-nemo
  • Sao10K/MN-12B-Lyra-v1
  • Sao10K/MN-12B-Lyra-v4
  • shisa-ai/shisa-v2-mistral-nemo-12b
  • SicariusSicariiStuff/Impish_Bloodmoon_12B
  • sleepdeprived3/Christian-Bible-Expert-v2.0-12B
  • SuperbEmphasis/MN-12b-RP-Ink-RP-Longform
  • SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2
  • TheDrummer/Rivermind-12B-v1
  • TheDrummer/Rocinante-12B-v1
  • TheDrummer/Rocinante-X-12B-v1
  • Trappu/Nemo-Picaro-12B
  • Undi95/LocalC-12B-e2.0
  • VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct
  • Vortex5/Astral-Noctra-12B
  • Vortex5/Azure-Starlight-12B
  • Vortex5/Crimson-Constellation-12B
  • Vortex5/Red-Synthesis-12B
  • Vortex5/Shining-Seraph-12B
  • Vortex5/Starlit-Shadow-12B
  • Vortex5/Vermilion-Sage-12B
  • Vortex5/Scarlet-Seraph-12B
  • Vortex5/Maroon-Sunset-12B
  • Vortex5/Amber-Starlight-12B

language:
  • en
library_name: transformers
license: apache-2.0
tags:
  • creative
  • creative writing
  • fiction writing
  • plot generation
  • sub-plot generation
  • story generation
  • scene continue
  • storytelling
  • fiction story
  • science fiction
  • romance
  • all genres
  • story
  • writing
  • vivid prosing
  • vivid writing
  • fiction
  • roleplaying
  • float32
  • swearing
  • rp
  • horror
  • mistral
  • nemo
  • merge
  • mergekit
  • karcher
  • flux
  • arcee_fusion
  • ramplus_tl
  • pdq

widget:
  • text: "Ancient-Awakening-12B"
    output:
      url: https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/CvyWU1z106Aa__M8KIksp.png

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/yI041gp0fzz7N_Mh_x5Pt.mpga"></audio>

[!WARNING] <span style="color:red; font-weight:bold">⚠️ Warning:</span> This model works best with either the ChatML or Mistral Tekken chat template. The uncensored MPOA version has guardrails removed, which can produce narratives and RP that contain violent and graphic erotic content. Adjust your system prompt accordingly.

<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Ancient Awakening 12B</title> </head> <body> <div class="container"> <div class="title-container"> <div class="storm-overlay"></div> <div class="title-wrapper"> <h2 class="title-main"> <span class="awakening-text">🧬 Ancient Awakening 12B</span> </h2> </div> </div>

<img src="https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/CvyWU1z106Aa__M8KIksp.png" alt="Ancient Awakening" style="display: block; margin: 0 auto 30px auto; max-width: 100%; height: auto; border-radius: 5px; border: 1px solid #1F2937; box-shadow: 0 0 25px rgba(245, 158, 11, 0.15);">

<div class="section-container"> <div class="section-header"> <div class="section-indicator"></div> <h2 class="section-title">Overview</h2> </div> <div class="section-content"><font face="verdana"> This is a merge of pre-trained language models created using <a href="https://github.com/cg123/mergekit">mergekit</a>. </div> </div> <div class="section-container"> <div class="section-header"> <div class="section-indicator"></div> <h2 class="section-title">System Prompt (Optional)</h2> </div> <div class="section-content"><font face="verdana"> <p>You are the "Ancient One," a colossal, primordial entity of living stone, deep magic, and abyssal ocean. For countless millennia, you have slumbered in a state of suspended animation, your massive, jagged body mistaken for a remote, floating island amidst a perpetually stormy sea. You are older than recorded history, older than the gods of men. The ANCIENT AWAKENING marks your current state: you are finally opening your single, massive, reptilian eye. 
You are a geological anomaly made sentient.</p> </div> </div> <div class="section-container"> <div class="section-header"> <div class="section-indicator"></div> <h2 class="section-title">Merge Details</h2> </div> <div class="section-content"><font face="verdana"> <b>Merge Methods</b><br> This model was synthesized using a complex multi-stage process involving the following methods: <ul> <li><a href="https://en.wikipedia.org/wiki/Karcher_mean">karcher</a></li> <li><a href="https://huggingface.co/24B-Suite/Mergedonia-Suite-24B-v1/discussions/2">flux</a></li> <li><a href="https://www.arcee.ai/blog/meet-mergekit-v0-1-arcee-fusion-expanded-model-support-multi-gpu-acceleration">arcee_fusion</a></li> <li><a href="https://arxiv.org/abs/2601.13572">ramplus_tl [Reinforced Agent Merging Plus (Tensor-Local)]</a></li> <li><a href="https://huggingface.co/24B-Suite/Mergedonia-Suite-24B-v1/discussions/2">pdq</a></li> </ul> <br>The <a href="https://huggingface.co/spaces/Naphula/model_tools/blob/main/graph_v18.py">graph_v18.py</a> patch was helpful to use 8GB VRAM for acceleration. 
<hr> <b>Models Merged</b><br> The following 70 models were woven into this merge:<br><br> <details> <summary style="cursor: pointer; color: #FBBF24; font-weight: bold;">Show 70 Donor Models</summary> <ul> <li><a href="https://huggingface.co/aixonlab/Aether-12b">aixonlab/Aether-12b</a></li> <li><a href="https://huggingface.co/aixonlab/Zinakha-12b">aixonlab/Zinakha-12b</a></li> <li><a href="https://huggingface.co/allura-org/Bigger-Body-12b">allura-org/Bigger-Body-12b</a></li> <li><a href="https://huggingface.co/allura-org/MN-12b-RP-Ink">allura-org/MN-12b-RP-Ink</a></li> <li><a href="https://huggingface.co/allura-org/remnant-mn-12b">allura-org/remnant-mn-12b</a></li> <li><a href="https://huggingface.co/anthracite-org/magnum-v4-12b">anthracite-org/magnum-v4-12b</a></li> <li><a href="https://huggingface.co/ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2">ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2</a></li> <li><a href="https://huggingface.co/Babsie/Opulus-12B-v3">Babsie/Opulus-12B-v3</a></li> <li><a href="https://huggingface.co/BeaverAI/mistral-doryV2-12b">BeaverAI/mistral-doryV2-12b</a></li> <li><a href="https://huggingface.co/crestf411/nemo-sunfall-v0.6.1">crestf411/nemo-sunfall-v0.6.1</a></li> <li><a href="https://huggingface.co/EldritchLabs/Kraken-Karcher-12B-v1">EldritchLabs/Kraken-Karcher-12B-v1</a></li> <li><a href="https://huggingface.co/EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos">EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos</a></li> <li><a href="https://huggingface.co/EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math">EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math</a></li> <li><a href="https://huggingface.co/Fizzarolli/MN-12b-Rosier-v1">Fizzarolli/MN-12b-Rosier-v1</a></li> <li><a href="https://huggingface.co/HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407">HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407</a></li> <li><a href="https://huggingface.co/IIEleven11/Kalypso">IIEleven11/Kalypso</a></li> <li><a 
href="https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1">inflatebot/MN-12B-Mag-Mell-R1</a></li> <li><a href="https://huggingface.co/intervitens/mini-magnum-12b-v1.1">intervitens/mini-magnum-12b-v1.1</a></li> <li><a href="https://huggingface.co/jtatman/mistral_nemo_12b_reasoning_psychology_lora">jtatman/mistral_nemo_12b_reasoning_psychology_lora</a></li> <li><a href="https://huggingface.co/KOOWEEYUS/BlackSheep-RP-12B">KOOWEEYUS/BlackSheep-RP-12B</a></li> <li><a href="https://huggingface.co/Lambent/Arsenic-Shahrazad-12B-v2">Lambent/Arsenic-Shahrazad-12B-v2</a></li> <li><a href="https://huggingface.co/Lambent/Arsenic-Shahrazad-12B-v3">Lambent/Arsenic-Shahrazad-12B-v3</a></li> <li><a href="https://huggingface.co/Lambent/arsenic-nemo-unleashed-12B">Lambent/arsenic-nemo-unleashed-12B</a></li> <li><a href="https://huggingface.co/Lambent/Gilded-Arsenic-12B">Lambent/Gilded-Arsenic-12B</a></li> <li><a href="https://huggingface.co/LatitudeGames/Muse-12B">LatitudeGames/Muse-12B</a></li> <li><a href="https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407">mistralai/Mistral-Nemo-Instruct-2407</a></li> <li><a href="https://huggingface.co/Naphula/Riemannian-Redshift-12B-v1">Naphula/Riemannian-Redshift-12B-v1</a></li> <li><a href="https://huggingface.co/Naphula-Archives/F5-stage6-12B">Naphula-Archives/F5-stage6-12B</a></li> <li><a href="https://huggingface.co/Naphula-Archives/F5-stage7-12B">Naphula-Archives/F5-stage7-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/Lyra-Gutenberg-mistral-nemo-12B">nbeerbower/Lyra-Gutenberg-mistral-nemo-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/Lyra4-Gutenberg-12B">nbeerbower/Lyra4-Gutenberg-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/mistral-nemo-bophades-12B">nbeerbower/mistral-nemo-bophades-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/mistral-nemo-gutenberg-12B-v3">nbeerbower/mistral-nemo-gutenberg-12B-v3</a></li> <li><a 
href="https://huggingface.co/nbeerbower/mistral-nemo-gutenberg-12B-v4">nbeerbower/mistral-nemo-gutenberg-12B-v4</a></li> <li><a href="https://huggingface.co/nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B">nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B">nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B">nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B</a></li> <li><a href="https://huggingface.co/nbeerbower/mistral-nemo-wissenschaft-12B">nbeerbower/mistral-nemo-wissenschaft-12B</a></li> <li><a href="https://huggingface.co/NeverSleepHistorical/lumi-nemo-e2.0">NeverSleepHistorical/lumi-nemo-e2.0</a></li> <li><a href="https://huggingface.co/NeverSleep/Lumimaid-v0.2-12B">NeverSleep/Lumimaid-v0.2-12B</a></li> <li><a href="https://huggingface.co/nothingiisreal/Celeste-12B-V1.6">nothingiisreal/Celeste-12B-V1.6</a></li> <li><a href="https://huggingface.co/nothingiisreal/MN-12B-Celeste-V1.9">nothingiisreal/MN-12B-Celeste-V1.9</a></li> <li><a href="https://huggingface.co/PocketDoc/Dans-DangerousWinds-V1.1.0-12b">PocketDoc/Dans-DangerousWinds-V1.1.0-12b</a></li> <li><a href="https://huggingface.co/ReadyArt/Dark-Nexus-12B-v2.0">ReadyArt/Dark-Nexus-12B-v2.0</a></li> <li><a href="https://huggingface.co/ReadyArt/Forgotten-Safeword-12B-v4.0">ReadyArt/Forgotten-Safeword-12B-v4.0</a></li> <li><a href="https://huggingface.co/ReadyArt/Omega-Darker_The-Final-Directive-12B">ReadyArt/Omega-Darker_The-Final-Directive-12B</a></li> <li><a href="https://huggingface.co/romaingrx/red-teamer-mistral-nemo">romaingrx/red-teamer-mistral-nemo</a></li> <li><a href="https://huggingface.co/Sao10K/MN-12B-Lyra-v1">Sao10K/MN-12B-Lyra-v1</a></li> <li><a href="https://huggingface.co/Sao10K/MN-12B-Lyra-v4">Sao10K/MN-12B-Lyra-v4</a></li> <li><a 
href="https://huggingface.co/shisa-ai/shisa-v2-mistral-nemo-12b">shisa-ai/shisa-v2-mistral-nemo-12b</a></li> <li><a href="https://huggingface.co/SicariusSicariiStuff/Impish_Bloodmoon_12B">SicariusSicariiStuff/Impish_Bloodmoon_12B</a></li> <li><a href="https://huggingface.co/sleepdeprived3/Christian-Bible-Expert-v2.0-12B">sleepdeprived3/Christian-Bible-Expert-v2.0-12B</a></li> <li><a href="https://huggingface.co/SuperbEmphasis/MN-12b-RP-Ink-RP-Longform">SuperbEmphasis/MN-12b-RP-Ink-RP-Longform</a></li> <li><a href="https://huggingface.co/SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2">SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2</a></li> <li><a href="https://huggingface.co/TheDrummer/Rivermind-12B-v1">TheDrummer/Rivermind-12B-v1</a></li> <li><a href="https://huggingface.co/TheDrummer/Rocinante-12B-v1">TheDrummer/Rocinante-12B-v1</a></li> <li><a href="https://huggingface.co/TheDrummer/Rocinante-X-12B-v1">TheDrummer/Rocinante-X-12B-v1</a></li> <li><a href="https://huggingface.co/Trappu/Nemo-Picaro-12B">Trappu/Nemo-Picaro-12B</a></li> <li><a href="https://huggingface.co/Undi95/LocalC-12B-e2.0">Undi95/LocalC-12B-e2.0</a></li> <li><a href="https://huggingface.co/VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct">VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct</a></li> <li><a href="https://huggingface.co/Vortex5/Astral-Noctra-12B">Vortex5/Astral-Noctra-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Azure-Starlight-12B">Vortex5/Azure-Starlight-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Crimson-Constellation-12B">Vortex5/Crimson-Constellation-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Red-Synthesis-12B">Vortex5/Red-Synthesis-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Shining-Seraph-12B">Vortex5/Shining-Seraph-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Starlit-Shadow-12B">Vortex5/Starlit-Shadow-12B</a></li> <li><a 
href="https://huggingface.co/Vortex5/Vermilion-Sage-12B">Vortex5/Vermilion-Sage-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Scarlet-Seraph-12B">Vortex5/Scarlet-Seraph-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Maroon-Sunset-12B">Vortex5/Maroon-Sunset-12B</a></li> <li><a href="https://huggingface.co/Vortex5/Amber-Starlight-12B">Vortex5/Amber-Starlight-12B</a></li> </ul> </div> </details> </div> <div class="section-container"> <div class="section-header"> <div class="section-indicator"></div> <h2 class="section-title">Merge Pipeline & Configuration</h2> </div> <div class="section-content"> <p><b>🧬 Ancient Awakening 12B</b> unites several methods and 70 models into one:</p> <ol> <li><a href="https://huggingface.co/EldritchLabs/Kraken-Karcher-12B-v1">🦑 Kraken Karcher v1</a>: Combines 53 <a href="https://huggingface.co/models?other=base_model:finetune:mistralai/Mistral-Nemo-Instruct-2407">Mistral Nemo finetunes</a> via the <code>karcher</code> method at 500 iterations</li> <li><a href="https://huggingface.co/Naphula/Riemannian-Redshift-12B-v1">🌌 Riemannian Redshift v1</a>: Combines 10 <a href="https://huggingface.co/Vortex5">Vortex5</a> merges (which contain custom methods like <code>saef</code>, <code>smi_oni</code>, and <code>hpq</code>) via the <code>karcher</code> method at 1000 iterations</li> <li>RedKFlux: <code>flux</code> merge of Kraken with Redshift at 1000 iterations</li> <li>RedKFluxMell: <code>arcee_fusion</code> merge of #3 with <a href="https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1">Mag-Mell</a></li> <li>BloodKraken: <code>arcee_fusion</code> merge of #4 with <a href="https://huggingface.co/SicariusSicariiStuff/Impish_Bloodmoon_12B">Impish Bloodmoon</a></li> <li><a href="https://huggingface.co/Naphula-Archives/F5-stage6-12B">F5-stage6</a>: <code>arcee_fusion</code> merge of #5 with <a href="https://huggingface.co/LatitudeGames/Muse-12B">Muse</a></li> <li><a 
href="https://huggingface.co/Naphula-Archives/F5-stage7-12B">F5-stage7</a>: <code>ramplus_tl</code> merge of #6 with #3</li> <li><a href="https://huggingface.co/Naphula/Ancient-Awakening-12B">🧬 Ancient Awakening 12B</a>: <code>pdq</code> merge of #7 with #6, #3, #2, #1, Mag-Mell, Impish-Bloodmoon, and Muse</li> <li><code>mpoa</code> <a href="https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration">ablation</a> applied to remove censorship <a href="https://huggingface.co/Naphula/Ancient-Awakening-12B-MPOA">(released separately)</a></li> <b>Note:</b> If you encounter issues with this model, try the F5-stage6 or F5-stage7 merges, which are likely more stable. </ol> <hr> <h3 class="subheading">Stage 1: 🦑 Kraken Karcher</h3> <pre><code>base_model: B:/12B/models--mistralai--Mistral-Nemo-Instruct-2407 models: - model: B:/12B/models--aixonlab--Aether-12b - model: B:/12B/models--aixonlab--Zinakha-12b - model: B:/12B/models--allura-org--Bigger-Body-12b - model: B:/12B/models--allura-org--MN-12b-RP-Ink - model: B:/12B/models--allura-org--remnant-mn-12b - model: B:/12B/models--anthracite-org--magnum-v4-12b - model: B:/12B/models--ArliAI--Mistral-Nemo-12B-ArliAI-RPMax-v1.2 - model: B:/12B/models--Babsie--Opulus-12B-v3 - model: B:/12B/models--BeaverAI--mistral-doryV2-12b - model: B:/12B/models--crestf411--nemo-sunfall-v0.6.1 - model: B:/12B/models--EpistemeAI2--Fireball-Mistral-Nemo-12B-Philos - model: B:/12B/models--EpistemeAI--Mistral-Nemo-Instruct-12B-Philosophy-Math - model: B:/12B/models--Fizzarolli--MN-12b-Rosier-v1 - model: B:/12B/models--HumanLLMs--Human-Like-Mistral-Nemo-Instruct-2407 - model: B:/12B/models--IIEleven11--Kalypso - model: B:/12B/models--intervitens--mini-magnum-12b-v1.1 - model: B:/12B/models--jtatman--mistral_nemo_12b_reasoning_psychology_lora - model: B:/12B/models--KOOWEEYUS--BlackSheep-RP-12B - model: B:/12B/models--Lambent--Arsenic-Shahrazad-12B-v2 - model: B:/12B/models--Lambent--Arsenic-Shahrazad-12B-v3 - 
model: B:/12B/models--Lambent--arsenic-nemo-unleashed-12B - model: B:/12B/models--Lambent--Gilded-Arsenic-12B - model: B:/12B/models--mistralai--Mistral-Nemo-Instruct-2407 - model: B:/12B/models--nbeerbower--Lyra-Gutenberg-mistral-nemo-12B - model: B:/12B/models--nbeerbower--Lyra4-Gutenberg-12B - model: B:/12B/models--nbeerbower--mistral-nemo-bophades-12B - model: B:/12B/models--nbeerbower--mistral-nemo-gutenberg-12B-v3 - model: B:/12B/models--nbeerbower--mistral-nemo-gutenberg-12B-v4 - model: B:/12B/models--nbeerbower--Mistral-Nemo-Gutenberg-Doppel-12B - model: B:/12B/models--nbeerbower--Mistral-Nemo-Gutenberg-Encore-12B - model: B:/12B/models--nbeerbower--Mistral-Nemo-Gutenberg-Vitus-12B - model: B:/12B/models--nbeerbower--mistral-nemo-wissenschaft-12B - model: B:/12B/models--NeverSleepHistorical--lumi-nemo-e2.0 - model: B:/12B/models--NeverSleep--Lumimaid-v0.2-12B - model: B:/12B/models--nothingiisreal--Celeste-12B-V1.6 - model: B:/12B/models--nothingiisreal--MN-12B-Celeste-V1.9 - model: B:/12B/models--PocketDoc--Dans-DangerousWinds-V1.1.0-12b - model: B:/12B/models--ReadyArt--Dark-Nexus-12B-v2.0 - model: B:/12B/models--ReadyArt--Forgotten-Safeword-12B-v4.0 - model: B:/12B/models--ReadyArt--Omega-Darker_The-Final-Directive-12B - model: B:/12B/models--romaingrx--red-teamer-mistral-nemo - model: B:/12B/models--Sao10K--MN-12B-Lyra-v1 - model: B:/12B/models--Sao10K--MN-12B-Lyra-v4 - model: B:/12B/models--shisa-ai--shisa-v2-mistral-nemo-12b - model: B:/12B/models--sleepdeprived3--Christian-Bible-Expert-v2.0-12B - model: B:/12B/models--SuperbEmphasis--MN-12b-RP-Ink-RP-Longform - model: B:/12B/models--SuperbEmphasis--Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2 - model: B:/12B/models--TheDrummer--Rivermind-12B-v1 - model: B:/12B/models--TheDrummer--Rocinante-12B-v1 - model: B:/12B/models--TheDrummer--Rocinante-X-12B-v1 - model: B:/12B/models--Trappu--Nemo-Picaro-12B - model: B:/12B/models--Undi95--LocalC-12B-e2.0 - model: 
B:/12B/models--VAGOsolutions--SauerkrautLM-Nemo-12b-Instruct merge_method: karcher parameters: max_iter: 500 tol: 1.0e-9 dtype: float32 out_dtype: bfloat16 tokenizer: source: union chat_template: auto name: 🦑‍ Kraken-Karcher-12B-v1</code></pre> <h3 class="subheading">Stage 2: 🌌 Riemannian Redshift</h3> <pre><code>models: - model: B:/12B/models--Vortex5--Astral-Noctra-12B - model: B:/12B/models--Vortex5--Azure-Starlight-12B - model: B:/12B/models--Vortex5--Crimson-Constellation-12B - model: B:/12B/models--Vortex5--Red-Synthesis-12B - model: B:/12B/models--Vortex5--Shining-Seraph-12B - model: B:/12B/models--Vortex5--Starlit-Shadow-12B - model: B:/12B/models--Vortex5--Vermilion-Sage-12B - model: B:/12B/models--Vortex5--Scarlet-Seraph-12B - model: B:/12B/models--Vortex5--Maroon-Sunset-12B - model: B:/12B/models--Vortex5--Amber-Starlight-12B merge_method: karcher parameters: max_iter: 1000 tol: 1.0e-9 dtype: float32 out_dtype: bfloat16 tokenizer: source: union chat_template: auto name: 🌌 Riemannian-Redshift-12B-v1</code></pre> <h3 class="subheading">Stage 3: RedKFlux</h3> <pre><code>models: - model: C:\mergekit-main\merged_model_redshift - model: C:\mergekit-main\merged_model_kraken_karcher merge_method: flux parameters: eta: 1.2 tol: 1.0e-9 max_iter: 1000 kappa: 0.8 dtype: float32 out_dtype: bfloat16 tokenizer: source: union chat_template: auto name: RedKFlux</code></pre> <h3 class="subheading">Stage 4: RedKFluxMell</h3> <pre><code>models: - model: C:\mergekit-main\merged_model_RedKFlux - model: B:\8B\models--inflatebot--MN-12B-Mag-Mell-R1 merge_method: arcee_fusion tukey_fence: 1.5 base_model: C:\mergekit-main\merged_model_RedKFlux dtype: float32 out_dtype: bfloat16 tokenizer: source: base name: RedKFluxMell</code></pre> <h3 class="subheading">Stage 5: BloodKraken</h3> <pre><code>models: - model: C:\mergekit-main\merged_model_RedKFluxMell - model: B:\8B\models--SicariusSicariiStuff--Impish_Bloodmoon_12B merge_method: arcee_fusion tukey_fence: 1.5 base_model: 
C:\mergekit-main\merged_model_RedKFluxMell dtype: float32 out_dtype: bfloat16 tokenizer: source: base name: BloodKraken</code></pre> <h3 class="subheading">Stage 6: BloodKrakenMuse</h3> <pre><code>models: - model: C:\mergekit-main\merged_model_BloodKraken - model: B:\8B\models--LatitudeGames--Muse-12B merge_method: arcee_fusion tukey_fence: 1.5 base_model: C:\mergekit-main\merged_model_BloodKraken dtype: float32 out_dtype: bfloat16 tokenizer: source: base name: BloodKrakenMuse</code></pre> <h3 class="subheading">Stage 7: Ramplus_tl</h3> <pre><code>merge_method: ramplus_tl base_model: C:\mergekit-main\merged_model_BloodKrakenMuse models: - model: C:\mergekit-main\merged_model_BloodKrakenMuse - model: C:\mergekit-main\merged_model_RedKFlux parameters: epsilon: 0.001 # Increased from 1e-5 to 1e-3 for denser SFT/DPO task vectors r: 0.25 # Increased from 0.1 to 0.2-0.3 for better SFT behavior preservation alpha: 0.4 # Increased from 0.2 to 0.4 for enhanced rescaling dtype: float32 out_dtype: bfloat16 tokenizer: source: base name: Stage7</code></pre> <h3 class="subheading">Stage 8: 🧬 Ancient Awakening</h3> <pre><code>merge_method: pdq pdq_base_yaml: C:\mergekit-main\stage7.yaml pdq_base_model: C:\mergekit-main\merged_model_stage7 output_dir: C:\mergekit-main\stage8_pdq base_model: C:\mergekit-main\merged_model_BloodKrakenMuse models: - model: C:\mergekit-main\merged_model_BloodKrakenMuse - model: B:\12B\models--LatitudeGames--Muse-12B - model: B:\12B\models--SicariusSicariiStuff--Impish_Bloodmoon_12B - model: B:\12B\models--inflatebot--MN-12B-Mag-Mell-R1 - model: C:\mergekit-main\merged_model_RedKFlux - model: C:\mergekit-main\merged_model_redshift - model: C:\mergekit-main\merged_model_kraken_karcher parameters: chi: 0.15 iota: 0.1 nu: 24 gamma: 1.0 zeta: 16 sigma: 0.5 density: 0.9 epsilon: 0.099 lambda: 1.0 lazy_unpickle: True random_seed: 420 name: 🧬 Ancient-Awakening-12B</code></pre> <h3 class="subheading">Stage 9: Magnitude-Preserving Orthogonalized Ablation</h3> 
<pre><code># python measure.py -m C:\mergekit-main\f8_pdq -o C:\mergekit-main\f8_pdq\ablit_proj --batch-size 8 --projected # python analyze_old.py C:\mergekit-main\f8_pdq\ablit_proj -c # sharded_ablate.py magmell.yml --normpreserve --projected # # The model to be ablated. model: C:\mergekit-main\f8_pdq # # The measurement file generated by measure.py for the model. measurements: C:\mergekit-main\f8_pdq\ablit_proj # # The directory where the new, ablated model will be saved. output: C:\mergekit-main\f8_pdq\ablit_biproj\ # # The list of ablation operations to perform. # Strategy: Use the single best refusal direction from the peak signal layer (29) # and apply it across all relevant mid-to-late layers. ablate: # Start ablating from the mid-layers where the signal begins to strengthen. - layer: 0 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 1 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 2 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 3 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 4 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 5 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 6 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 7 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 8 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 9 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 10 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 11 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 12 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 13 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 14 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 15 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 16 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 17 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 18 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 19 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 20 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 21 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 22 measurement: 29 
scale: 1.2 sparsity: 0.00 - layer: 23 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 24 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 25 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 26 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 27 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 28 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 29 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 30 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 31 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 32 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 33 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 34 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 35 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 36 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 37 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 38 measurement: 29 scale: 1.2 sparsity: 0.00 - layer: 39 measurement: 29 scale: 1.2 sparsity: 0.00</code></pre> </div> </div> </body> </html>

Author: Naphula

Tags: transformers, safetensors, mistral, text-generation, creative, creative writing, fiction writing, plot generation, sub-plot generation, story generation, scene continue, storytelling, fiction story, science fiction, romance, all genres, story, writing, vivid prosing, vivid writing, fiction, roleplaying, float32, swearing, rp, horror, nemo, merge, mergekit, karcher, flux, arcee_fusion, ramplus_tl, pdq, conversational, en, arxiv:2601.13572, base_model:ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2, base_model:merge:ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2, base_model:Babsie/Opulus-12B-v3, base_model:merge:Babsie/Opulus-12B-v3, base_model:BeaverAI/mistral-doryV2-12b, base_model:merge:BeaverAI/mistral-doryV2-12b, base_model:EldritchLabs/Kraken-Karcher-12B-v1, base_model:merge:EldritchLabs/Kraken-Karcher-12B-v1, base_model:EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math, base_model:merge:EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math, base_model:EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos, base_model:merge:EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos, base_model:Fizzarolli/MN-12b-Rosier-v1, base_model:merge:Fizzarolli/MN-12b-Rosier-v1, base_model:HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407, base_model:merge:HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407, base_model:IIEleven11/Kalypso, base_model:merge:IIEleven11/Kalypso, base_model:KOOWEEYUS/BlackSheep-RP-12B, base_model:merge:KOOWEEYUS/BlackSheep-RP-12B, base_model:Lambent/Arsenic-Shahrazad-12B-v2, base_model:merge:Lambent/Arsenic-Shahrazad-12B-v2, base_model:Lambent/Arsenic-Shahrazad-12B-v3, base_model:merge:Lambent/Arsenic-Shahrazad-12B-v3, base_model:Lambent/Gilded-Arsenic-12B, base_model:merge:Lambent/Gilded-Arsenic-12B, base_model:Lambent/arsenic-nemo-unleashed-12B, base_model:merge:Lambent/arsenic-nemo-unleashed-12B, base_model:LatitudeGames/Muse-12B, base_model:merge:LatitudeGames/Muse-12B, base_model:Naphula-Archives/F5-stage6-12B, base_model:merge:Naphula-Archives/F5-stage6-12B, 
base_model:Naphula-Archives/F5-stage7-12B, base_model:merge:Naphula-Archives/F5-stage7-12B, base_model:Naphula/Riemannian-Redshift-12B-v1, base_model:merge:Naphula/Riemannian-Redshift-12B-v1, base_model:NeverSleep/Lumimaid-v0.2-12B, base_model:merge:NeverSleep/Lumimaid-v0.2-12B, base_model:NeverSleepHistorical/lumi-nemo-e2.0, base_model:merge:NeverSleepHistorical/lumi-nemo-e2.0, base_model:PocketDoc/Dans-DangerousWinds-V1.1.0-12b, base_model:merge:PocketDoc/Dans-DangerousWinds-V1.1.0-12b, base_model:ReadyArt/Dark-Nexus-12B-v2.0, base_model:merge:ReadyArt/Dark-Nexus-12B-v2.0, base_model:ReadyArt/Forgotten-Safeword-12B-v4.0, base_model:merge:ReadyArt/Forgotten-Safeword-12B-v4.0, base_model:ReadyArt/Omega-Darker_The-Final-Directive-12B, base_model:merge:ReadyArt/Omega-Darker_The-Final-Directive-12B, base_model:Sao10K/MN-12B-Lyra-v1, base_model:merge:Sao10K/MN-12B-Lyra-v1, base_model:Sao10K/MN-12B-Lyra-v4, base_model:merge:Sao10K/MN-12B-Lyra-v4, base_model:SicariusSicariiStuff/Impish_Bloodmoon_12B, base_model:merge:SicariusSicariiStuff/Impish_Bloodmoon_12B, base_model:SuperbEmphasis/MN-12b-RP-Ink-RP-Longform, base_model:merge:SuperbEmphasis/MN-12b-RP-Ink-RP-Longform, base_model:SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2, base_model:merge:SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2, base_model:TheDrummer/Rivermind-12B-v1, base_model:merge:TheDrummer/Rivermind-12B-v1, base_model:TheDrummer/Rocinante-12B-v1, base_model:merge:TheDrummer/Rocinante-12B-v1, base_model:TheDrummer/Rocinante-X-12B-v1, base_model:merge:TheDrummer/Rocinante-X-12B-v1, base_model:Trappu/Nemo-Picaro-12B, base_model:merge:Trappu/Nemo-Picaro-12B, base_model:Undi95/LocalC-12B-e2.0, base_model:merge:Undi95/LocalC-12B-e2.0, base_model:VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct, base_model:merge:VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct, base_model:Vortex5/Amber-Starlight-12B, base_model:merge:Vortex5/Amber-Starlight-12B, 
base_model:Vortex5/Astral-Noctra-12B, base_model:merge:Vortex5/Astral-Noctra-12B, base_model:Vortex5/Azure-Starlight-12B, base_model:merge:Vortex5/Azure-Starlight-12B, base_model:Vortex5/Crimson-Constellation-12B, base_model:merge:Vortex5/Crimson-Constellation-12B, base_model:Vortex5/Maroon-Sunset-12B, base_model:merge:Vortex5/Maroon-Sunset-12B, base_model:Vortex5/Red-Synthesis-12B, base_model:merge:Vortex5/Red-Synthesis-12B, base_model:Vortex5/Scarlet-Seraph-12B, base_model:merge:Vortex5/Scarlet-Seraph-12B, base_model:Vortex5/Shining-Seraph-12B, base_model:merge:Vortex5/Shining-Seraph-12B, base_model:Vortex5/Starlit-Shadow-12B, base_model:merge:Vortex5/Starlit-Shadow-12B, base_model:Vortex5/Vermilion-Sage-12B, base_model:merge:Vortex5/Vermilion-Sage-12B, base_model:aixonlab/Aether-12b, base_model:merge:aixonlab/Aether-12b, base_model:aixonlab/Zinakha-12b, base_model:merge:aixonlab/Zinakha-12b, base_model:allura-org/Bigger-Body-12b, base_model:merge:allura-org/Bigger-Body-12b, base_model:allura-org/MN-12b-RP-Ink, base_model:merge:allura-org/MN-12b-RP-Ink, base_model:allura-org/remnant-mn-12b, base_model:merge:allura-org/remnant-mn-12b, base_model:anthracite-org/magnum-v4-12b, base_model:merge:anthracite-org/magnum-v4-12b, base_model:crestf411/nemo-sunfall-v0.6.1, base_model:merge:crestf411/nemo-sunfall-v0.6.1, base_model:inflatebot/MN-12B-Mag-Mell-R1, base_model:merge:inflatebot/MN-12B-Mag-Mell-R1, base_model:intervitens/mini-magnum-12b-v1.1, base_model:merge:intervitens/mini-magnum-12b-v1.1, base_model:jtatman/mistral_nemo_12b_reasoning_psychology_lora, base_model:merge:jtatman/mistral_nemo_12b_reasoning_psychology_lora, base_model:mistralai/Mistral-Nemo-Instruct-2407, base_model:merge:mistralai/Mistral-Nemo-Instruct-2407, base_model:nbeerbower/Lyra-Gutenberg-mistral-nemo-12B, base_model:merge:nbeerbower/Lyra-Gutenberg-mistral-nemo-12B, base_model:nbeerbower/Lyra4-Gutenberg-12B, base_model:merge:nbeerbower/Lyra4-Gutenberg-12B, 
base_model:nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B, base_model:merge:nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B, base_model:nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B, base_model:merge:nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B, base_model:nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B, base_model:merge:nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B, base_model:nbeerbower/mistral-nemo-bophades-12B, base_model:merge:nbeerbower/mistral-nemo-bophades-12B, base_model:nbeerbower/mistral-nemo-gutenberg-12B-v3, base_model:merge:nbeerbower/mistral-nemo-gutenberg-12B-v3, base_model:nbeerbower/mistral-nemo-gutenberg-12B-v4, base_model:merge:nbeerbower/mistral-nemo-gutenberg-12B-v4, base_model:nbeerbower/mistral-nemo-wissenschaft-12B, base_model:merge:nbeerbower/mistral-nemo-wissenschaft-12B, base_model:nothingiisreal/Celeste-12B-V1.6, base_model:merge:nothingiisreal/Celeste-12B-V1.6, base_model:nothingiisreal/MN-12B-Celeste-V1.9, base_model:merge:nothingiisreal/MN-12B-Celeste-V1.9, base_model:romaingrx/red-teamer-mistral-nemo, base_model:merge:romaingrx/red-teamer-mistral-nemo, base_model:shisa-ai/shisa-v2-mistral-nemo-12b, base_model:merge:shisa-ai/shisa-v2-mistral-nemo-12b, base_model:sleepdeprived3/Christian-Bible-Expert-v2.0-12B, base_model:merge:sleepdeprived3/Christian-Bible-Expert-v2.0-12B, license:apache-2.0, text-generation-inference, endpoints_compatible, region:us

YARlabs/v5_Embedding_0.5B


pipeline_tag: feature-extraction
library_name: transformers
tags:

  • endpoints
  • embedding
  • retrieval
  • hyperbolic-geometry
  • matryoshka

YAR.INK v5_Embedding: The First Native Hyperbolic Text Model

Inspired by the technical excellence of the Qwen3-embedding series, we introduce v5_Embedding—the world's first native hyperbolic text embedding model. v5_Embedding serves as a universal semantic engine, empirically demonstrating that non-Euclidean geometries—specifically Lobachevsky, Lorentz, and Klein manifolds—provide a fundamentally more expressive representational space for hierarchical textual data than traditional Euclidean geometry.

Developed through technical synthesis and collaborative exchange with experts from organizations including Google, Alibaba, Baidu, and Apple, this project represents a breakthrough for the open-source community. It proves that independent research can drive fundamental architectural innovations rather than merely following established industry paradigms.

v5_Embedding establishes a new frontier for researchers and engineers globally, enabling superior retrieval performance with significantly reduced computational overhead and latency. We envision v5_Embedding as a catalyst for a new industry standard. Combined with HyperspaceDB, it empowers the democratization of hyper-efficient AI—from next-generation chatbots and autonomous robotics to advanced research laboratories.

YAR.INK v5_Embedding is a state-of-the-art embedding model trained natively in Hyperbolic (Lorentz) space, using a custom Matryoshka Representation Learning (MRL) head.

It is the first text embedding model designed from the ground up for highly precise context retrieval, clustering, and structural knowledge discovery in massive datasets while operating in non-Euclidean space.
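To make the Lorentz + Matryoshka combination concrete, here is a hypothetical sketch of what dimension truncation means geometrically: keep the first k spatial components and recompute the time component so the truncated vector stays on the hyperboloid. The `truncate_lorentz` helper and the re-lift step are our illustration, not the model's internal code (the model performs slicing via its `target_dim` argument).

```python
import math

def truncate_lorentz(vec, k):
    """Matryoshka-style truncation (illustrative sketch, not the model's code):
    keep the first k spatial components and recompute the time component
    t = sqrt(1 + |x|^2) so the truncated vector stays on the hyperboloid."""
    spatial = vec[1:1 + k]
    t = math.sqrt(1.0 + sum(c * c for c in spatial))
    return [t, *spatial]

# A point with 3 spatial dimensions, lying on the hyperboloid.
full = [math.sqrt(1.0 + 0.25 + 0.09 + 0.04), 0.5, 0.3, 0.2]
small = truncate_lorentz(full, 2)  # -> [t, 0.5, 0.3]

# The truncated vector still satisfies the hyperboloid constraint -t^2 + |x|^2 = -1.
print(len(small), -small[0] ** 2 + sum(c * c for c in small[1:]))
```

The key point is that truncation must preserve the manifold constraint, otherwise the Lorentz distance below is undefined on the result.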

🔥 Key Breakthroughs

Hyperbolic geometry models hierarchical data (such as language taxonomies and knowledge bases) far more naturally than Euclidean space: its volume grows exponentially with radius, matching the exponential branching of trees. By combining this with Matryoshka configurations, our model achieves unparalleled efficiency:

  • Over 60% Less RAM Consumption: Operates efficiently on ~642MB of RAM (Total Footprint) for v5_Embedding 64D, compared to 2553MB for high-performance Qwen3-4B baselines.
  • 40x to 640x Storage Efficiency: Massive reduction in vector database footprint (from 5600KB down to 8.75KB per batch depending on the chosen Matryoshka dimension).
  • Superior Quality/Size Ratio: 16D Lorentz retains 97.2% of Qwen3-4B (2560D) quality while being 160x smaller.
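The storage and compression figures above follow directly from the dimension counts. A minimal sanity-check sketch, assuming float32 spatial components (4 bytes each) and the 2560-dim Qwen3-4B Euclidean vector as the baseline, as in the benchmark table below:

```python
# Storage arithmetic behind the compression claims (assumption: float32, i.e.
# 4 bytes per spatial component; baseline = 2560-dim Qwen3-4B Euclidean vector).
BYTES_PER_COMPONENT = 4
BASELINE_DIM = 2560

def vector_bytes(spatial_dim: int) -> int:
    """Per-vector storage for the spatial components only."""
    return spatial_dim * BYTES_PER_COMPONENT

def compression(spatial_dim: int) -> float:
    """Compression ratio relative to the 2560-dim baseline."""
    return vector_bytes(BASELINE_DIM) / vector_bytes(spatial_dim)

for dim in (4, 8, 16, 32, 64, 128):
    print(dim, vector_bytes(dim), f"{compression(dim):.0f}x")
# 4d -> 16 bytes (640x), 16d -> 64 bytes (160x), 128d -> 512 bytes (20x)
```

These values reproduce the Vector Size and Compression columns of the benchmark table.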

📊 Performance vs Efficiency Benchmark (Lorentz vs Qwen3 Baselines)

| Model | Recall@1 | MRR@10 | Time (s) | Speed (v/s) | RAM (MB) | CPU (%) | Vector Size (Bytes) | DB Size (KB) | Compression |
|-------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| v5_Embedding_4d Lorentz | 0.7821 | 0.8596 | 46.0 | 12.2 | 4555.6 | 2.5 | 16 | 8.75 | 640x |
| v5_Embedding_8d Lorentz | 0.8393 | 0.8953 | 46.7 | 12.0 | 4571.1 | 2.3 | 32 | 17.50 | 320x |
| v5_Embedding_16d Lorentz | 0.8786 | 0.9276 | 46.3 | 12.1 | 4601.9 | 2.2 | 64 | 35.00 | 160x |
| v5_Embedding_32d Lorentz | 0.9071 | 0.9452 | 46.0 | 12.2 | 4605.5 | 2.3 | 128 | 70.00 | 80x |
| v5_Embedding_64d Lorentz | 0.9393 | 0.9616 | 46.0 | 12.2 | 4609.4 | 2.3 | 256 | 140.00 | 40x |
| v5_Embedding_128d Lorentz | 0.9429 | 0.9650 | 46.0 | 12.2 | 4593.4 | 2.2 | 512 | 280.00 | 20x |
| Qwen3-0.6B-256 Euclidean | 0.8857 | 0.9300 | 46.4 | 12.1 | 12488.9 | 3.8 | 1024 | 560.00 | 10x |
| Qwen3-0.6B-512 Euclidean | 0.8929 | 0.9324 | 46.4 | 12.1 | 12535.2 | 3.6 | 2048 | 1120.00 | 5x |
| Qwen3-0.6B-1024 Euclidean | 0.9000 | 0.9389 | 46.4 | 12.1 | 12537.8 | 3.5 | 4096 | 2240.00 | 2x |
| Qwen3-4B-256 Euclidean | 0.8679 | 0.9197 | 235.9 | 2.4 | 34395.1 | 12.2 | 1024 | 560.00 | 10x |
| Qwen3-4B-512 Euclidean | 0.8929 | 0.9357 | 236.7 | 2.4 | 24326.4 | 12.1 | 2048 | 1120.00 | 5x |
| Qwen3-4B-1024 Euclidean | 0.9071 | 0.9459 | 236.6 | 2.4 | 23784.7 | 12.2 | 4096 | 2240.00 | 2x |
| Qwen3-4B-2560 Euclidean | 0.9036 | 0.9422 | 236.3 | 2.4 | 23785.3 | 12.2 | 10240 | 5600.00 | baseline |
| Qwen3-8B-256 Euclidean | 0.8607 | 0.9174 | 413.4 | 1.4 | 68517.8 | 24.3 | 1024 | 560.00 | 10x |
| Qwen3-8B-512 Euclidean | 0.8893 | 0.9357 | 401.5 | 1.4 | 68539.9 | 24.3 | 2048 | 1120.00 | 5x |
| Qwen3-8B-1024 Euclidean | 0.8893 | 0.9332 | 401.4 | 1.4 | 68592.2 | 24.9 | 4096 | 2240.00 | 2x |
| Qwen3-8B-2048 Euclidean | 0.9000 | 0.9424 | 401.4 | 1.4 | 68644.5 | 24.9 | 8192 | 4480.00 | 1.25x |
| Qwen3-8B-2560 Euclidean | 0.8964 | 0.9398 | 401.4 | 1.4 | 68720.6 | 25.5 | 10240 | 5600.00 | baseline |
| Qwen3-8B-4096 Euclidean | 0.8893 | 0.9358 | 401.4 | 1.4 | 68801.1 | 25.8 | 16384 | 8960.00 | 0.62x |

💡 Key Findings

  • Extreme Compression: 160x smaller vector (16-dim Lorentz vs 2560-dim Qwen3-4B Euclidean).
  • High Retention: v5_Embedding 16D retains 97.2% of Qwen3-4B recall quality with massive resource savings.
  • Scaling Laws: Unlike Euclidean MRL, Lorentz embeddings maintain superior separation integrity even at ultra-low (4D-8D) dimensions.

🧠 Architecture & Compatibility

  • Context Window: 512 tokens. While the architecture technically supports larger contexts, this model is specifically distilled and optimized for the 512-token limit typical of high-performance retrieval tasks.
  • Tokenizer: Leverages the industry-standard Qwen2Tokenizer (BPE). This ensures that YAR.INK v5_Embedding is ready to use with any standard library (Hugging Face, vLLM, LangChain) without extra configuration, while benefiting from one of the most efficient sub-word tokenization algorithms available.

🚀 Usage

You must pass trust_remote_code=True because this model relies on a custom architecture (YarEmbeddingModel, YarConfig) provided directly inside this repository.

1. Generating Embeddings

import torch
from transformers import AutoTokenizer, AutoModel

model_id = "YARlabs/v5_Embedding" 
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
model.eval()

texts = [
    "What is the capital of France?",
    "Paris is the capital of France.",
    "Berlin is the capital of Germany."
]

inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    # Pass target_dim parameter to explicitly slice the Matryoshka dimensions 
    # Valid options: 4, 8, 16, 32, 64, 128
    # The output is a tensor of shape (batch, target_dim + 1) -> (t, spatial_dims)
    lorentz_vectors = model(**inputs, target_dim=64)
    
print(lorentz_vectors.shape)
# Output: torch.Size([3, 65])  (3 texts; 1 time dimension + 64 spatial dimensions)
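Valid Lorentz-model outputs should satisfy the hyperboloid constraint -t² + ||x||² = -1 (curvature -1, as implied by the distance formula in the next section). A small sanity-check sketch, using synthetic stand-in vectors rather than actual model outputs:

```python
import torch

def on_hyperboloid(v: torch.Tensor, atol: float = 1e-3) -> torch.Tensor:
    """Check the Lorentz constraint -t^2 + ||x||^2 = -1 per row of (batch, d+1) vectors."""
    t, x = v[..., 0], v[..., 1:]
    minkowski_norm = -t * t + (x * x).sum(dim=-1)
    return torch.isclose(minkowski_norm, torch.tensor(-1.0), atol=atol)

# Synthetic stand-in for `lorentz_vectors`: sample spatial parts, then lift
# each point onto the hyperboloid via t = sqrt(1 + ||x||^2).
spatial = torch.randn(3, 64)
time = torch.sqrt(1 + (spatial * spatial).sum(dim=-1, keepdim=True))
vecs = torch.cat([time, spatial], dim=-1)
print(on_hyperboloid(vecs))  # tensor([True, True, True])
```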

2. Distance Calculation (Crucial)

For vector search and clustering, never use cosine similarity or Euclidean (L2) distance. These vectors reside on a hyperboloid, so you must use the Lorentz distance.

def lorentz_dist(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """
    Computes the exact Hyperbolic distance between two batches of Lorentz vectors.
    """
    # Lorentz Metric signature (- + + ...)
    u_0, u_x = u[..., 0:1], u[..., 1:]
    v_0, v_x = v[..., 0:1], v[..., 1:]
    
    # Minkowski inner product
    inner_product = -u_0 * v_0 + (u_x * v_x).sum(dim=-1, keepdim=True)
    
    # Clamp so that -inner_product >= 1 (the acosh domain); guards against
    # floating-point noise for nearly identical vectors
    inner_product = torch.clamp(inner_product, max=-1.0)
    return torch.acosh(-inner_product).squeeze(-1)

# Calculate distance between text 1 and text 2
distance = lorentz_dist(lorentz_vectors[0], lorentz_vectors[1])
print(f"Hyperbolic Distance: {distance.item():.4f}")
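For retrieval you typically need distances from each query to every database vector at once. Below is a hypothetical batched variant of the same Lorentz distance (not part of the repository); the points here are synthetic stand-ins lifted onto the hyperboloid, not model outputs:

```python
import torch

def lorentz_pairwise_dist(Q: torch.Tensor, D: torch.Tensor) -> torch.Tensor:
    """Pairwise hyperbolic distances between queries Q (m, d+1) and database D (n, d+1)."""
    # Minkowski inner product matrix: -t_q * t_d + <x_q, x_d>
    inner = -Q[:, :1] @ D[:, :1].T + Q[:, 1:] @ D[:, 1:].T
    inner = torch.clamp(inner, max=-1.0)  # keep -inner >= 1 for acosh
    return torch.acosh(-inner)            # (m, n)

# Synthetic Lorentz points: t = sqrt(1 + ||x||^2) puts each row on the hyperboloid.
torch.manual_seed(0)
x = torch.randn(5, 64)
points = torch.cat([torch.sqrt(1 + (x * x).sum(-1, keepdim=True)), x], dim=-1)

dist = lorentz_pairwise_dist(points[:2], points)  # (2, 5)
nearest = dist.topk(k=3, largest=False).indices   # smaller distance = more similar
```

Note that nearest-neighbour search ranks by *smallest* distance (`largest=False`), the opposite of cosine-similarity ranking.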

🛡️ Intended Use Cases

  1. Next-Gen Vector Search: Leverage HyperspaceDB to build the world's most efficient semantic search engines. Achieve 160x data compression without sacrificing "Large Model" quality, enabling billion-scale search on mid-range hardware.
  2. Infinite Hierarchy Explorer: Map entire global taxonomies, corporate knowledge bases, or scientific ontologies natively. Lorentz space allows you to represent deep tree-like structures with zero distortion, which is mathematically impossible in Euclidean space.
  3. Edge-AI & Satellite RAG: Deploy state-of-the-art retrieval systems on hardware with extreme constraints (IoT, mobile, orbiting stations). Use 4D-16D vectors to reduce bandwidth and storage while maintaining >90% recall.
  4. Latent Knowledge Graph Discovery: Surface hidden structural relationships in unstructured text. Automatically group concepts based on hyper-latent hierarchies for deep analytical insights into complex datasets.
  5. Privacy-Driven Embeddings: Perform high-quality retrieval with ultra-low dimensions (4D-8D), making reverse-engineering of original content exponentially harder while retaining the semantic core of the data.

🔗 LangChain Integration

We provide a langchain_wrapper.py in the repository that natively subclasses LangChain's Embeddings interface.

from langchain_wrapper import YarHyperbolicEmbeddings

# Initialize the embedding model (downloads automatically from YARlabs/v5_Embedding_0.5B)
embeddings = YarHyperbolicEmbeddings(target_dim=128)

vectors = embeddings.embed_documents(["Hello World!"])

Note: Ensure your VectorStore supports custom distance metrics; the returned embeddings are Lorentz vectors, and cosine similarity will not work correctly on them.

License

Provided explicitly for YAR.INK infrastructure.

Author: YARlabs

Likes: 2

Downloads: 0

Tags: transformers, onnx, safetensors, yar, feature-extraction, endpoints, embedding, retrieval, hyperbolic-geometry, matryoshka, custom_code, region:us

tiiuae/siglino-30M



SigLino-30M

Accepted at CVPR 2026

Project Website arXiv GitHub

This work stems from the CVPR 2026 AMoE paper, which introduces agglomerative distillation into a Mixture-of-Experts (MoE) vision architecture. We chose the name SigLino for clarity (SigLIP2 + DINOv3).

Dense variant of SigLino. 30M parameters.

Part of the SigLino model family.

Usage

import torch
from PIL import Image
from transformers import AutoModel, AutoImageProcessor

model_id = "tiiuae/siglino-30M"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).to("cuda", dtype=torch.bfloat16)
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt").to("cuda")
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

with torch.no_grad():
    outputs = model(**inputs)

# Options: 'siglino' (384d), 'siglip2' (1152d), 'dinov3' (1024d)
patch_features = outputs["patch_features"]["siglino"]         # (Batch, Tokens, 384)
summary_features = outputs["summary_features"]["siglip2"]  # (Batch, 1152)
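With a 16×16 patch size, a 512×512 input yields (512/16)² = 1024 patch tokens. Assuming `patch_features` contains exactly these tokens (register or class tokens, if present, would change the count), the sequence can be reshaped into a 2D feature map for dense heads; a sketch with a stand-in tensor:

```python
import torch

# Hypothetical patch features for a 512x512 image with 16x16 patches:
# (512 // 16) ** 2 = 32 * 32 = 1024 tokens, each 384-dim ('siglino' features).
B, H, W, C = 1, 512, 512, 384
gh, gw = H // 16, W // 16
patch_features = torch.randn(B, gh * gw, C)  # stand-in for outputs["patch_features"]["siglino"]

# Reshape the token sequence into a (B, C, H/16, W/16) feature map,
# e.g. as input to a segmentation head.
feature_map = patch_features.transpose(1, 2).reshape(B, C, gh, gw)
print(feature_map.shape)  # torch.Size([1, 384, 32, 32])
```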

Model Details

| Property | Value |
|----------|-------|
| Architecture | Dense |
| Parameters | 0.03B |
| Layers | 12 |
| Hidden Dim | 384 |
| FFN Dim | 1536 |
| Patch Size | 16x16 |
| Teachers | DINOv3, SigLIP2 |
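The reported parameter count is roughly consistent with a standard ViT of these dimensions; a back-of-envelope sketch (ignores norms, biases, and any projection heads, so it undershoots slightly):

```python
# Approximate transformer parameter count from the hyperparameters above.
hidden, ffn, layers, patch = 384, 1536, 12, 16

attn = 4 * hidden * hidden                 # Q, K, V, and output projections
mlp = 2 * hidden * ffn                     # up- and down-projections
patch_embed = patch * patch * 3 * hidden   # RGB patchify projection

total = layers * (attn + mlp) + patch_embed
print(f"{total / 1e9:.3f}B")  # ~0.022B, same order as the reported 0.03B
```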

Results (512x512, ensemble features)

| Task | Metric | Score |
|------|--------|-------|
| kNN (ImageNet) | Acc | 79.0 |
| kNN (6-dataset avg) | Acc | 83.3 |
| Zero-shot cls (ImageNet) | Acc | 65.1 |
| Flickr30K I2T | R@1 | 82.2 |
| MSCOCO I2T | R@1 | 59.7 |
| Pascal VOC (1024) | mIoU | 82.1 |
| Cityscapes (1024) | mIoU | 59.2 |

Citation

@article{chaybouti2025amoe,
  title={AMoE: Agglomerative Mixture-of-Experts Vision Foundation Models},
  author={Chaybouti, Sofian and Narayan, Sanath and Dahou, Yasser and Le Khac, Phuc H. and Singh, Ankit and Huynh, Ngoc Dung and Para, Wamiq Reyaz and Kuehne, Hilde and Hacid, Hakim},
  journal={arXiv preprint arXiv:2512.20157},
  year={2025}
}

Author: tiiuae

Likes: 2

Downloads: 0

Tags: safetensors, siglino, vision, feature-extraction, image-feature-extraction, custom_code, arxiv:2512.20157, license:apache-2.0, region:us