# baidu/Qianfan-OCR

---
license: apache-2.0
license_link: LICENSE
language:
- multilingual
tags:
- vision-language
- ocr
- document-intelligence
- qianfan
pipeline_tag: image-text-to-text
library_name: transformers
model-index:
- name: Qianfan-OCR
  results:
  - task:
      type: document-parsing
      name: Document Parsing
    dataset:
      name: OmniDocBench v1.5
      type: opendatalab/OmniDocBench
    metrics:
    - type: overall
      value: 93.12
      name: Overall Score
  - task:
      type: ocr
      name: OCR
    dataset:
      name: OlmOCR Bench
      type: allenai/olmOCR-bench
    metrics:
    - type: accuracy
      value: 79.8
      name: Overall Score
  - task:
      type: ocr
      name: OCR
    dataset:
      name: OCRBench
      type: echo840/OCRBench
    metrics:
    - type: accuracy
      value: 880
      name: Score
---
<div align="center">
<h1>Qianfan-OCR</h1>
<h3>A Unified End-to-End Model for Document Intelligence</h3>
🤖 Demo |
📄 Technical Report |
🖥️ Qianfan Platform |
💻 GitHub |
🧩 Skill
</div>
## Introduction
Qianfan-OCR is a 4B-parameter end-to-end document intelligence model developed by the Baidu Qianfan Team. It unifies document parsing, layout analysis, and document understanding within a single vision-language architecture.
Unlike traditional multi-stage OCR pipelines that chain separate layout detection, text recognition, and language comprehension modules, Qianfan-OCR performs direct image-to-Markdown conversion and supports a broad range of prompt-driven tasks — from structured document parsing and table extraction to chart understanding, document question answering, and key information extraction — all within one model.
## Key Highlights
- 🏆 #1 End-to-End Model on OmniDocBench v1.5 — Achieves 93.12 overall score, surpassing DeepSeek-OCR-v2 (91.09), Gemini-3 Pro (90.33), and all other end-to-end models
- 🏆 #1 End-to-End Model on OlmOCR Bench — Scores 79.8
- 🏆 #1 on Key Information Extraction — Overall mean score of 87.9 across five public KIE benchmarks, surpassing Gemini-3.1-Pro, Gemini-3-Pro, Seed-2.0, and Qwen3-VL-235B-A22B
- 🧠 Layout-as-Thought — An innovative optional thinking phase that recovers explicit layout analysis within the end-to-end paradigm via `<think>` tokens
- 🌍 192 Languages — Multilingual OCR support across diverse scripts
- ⚡ Efficient Deployment — Achieves 1.024 PPS (pages per second) with W8A8 quantization on a single A100 GPU
## Architecture
Qianfan-OCR adopts the multimodal bridging architecture from Qianfan-VL, consisting of three core components:
| Component | Details |
|---|---|
| Vision Encoder | Qianfan-ViT, 24 Transformer layers, AnyResolution design (up to 4K), 256 visual tokens per 448×448 tile, max 4,096 tokens per image |
| Language Model | Qwen3-4B (3.6B non-embedding), 36 layers, 2560 hidden dim, GQA (32 query / 8 KV heads), 32K context (extendable to 131K) |
| Cross-Modal Adapter | 2-layer MLP with GELU activation, projecting from 1024-dim to 2560-dim |
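The per-image token budget follows directly from the tile design above: each 448×448 tile contributes 256 visual tokens, and the 4,096-token cap therefore admits at most 16 tiles per image. A quick sanity check (plain arithmetic, no model code):

```python
TOKENS_PER_TILE = 256      # visual tokens per 448x448 tile
MAX_IMAGE_TOKENS = 4096    # per-image visual token cap

# maximum number of tiles that fit under the cap
max_tiles = MAX_IMAGE_TOKENS // TOKENS_PER_TILE
print(max_tiles)  # 16

# e.g. the Quick Start default of max_num=12 tiles plus one thumbnail tile:
tokens_used = (12 + 1) * TOKENS_PER_TILE
print(tokens_used)  # 3328, comfortably under the 4096 cap
```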
## Layout-as-Thought
A key innovation is Layout-as-Thought: an optional thinking phase triggered by `<think>` tokens, where the model generates structured layout representations (bounding boxes, element types, reading order) before producing final outputs.
This mechanism serves two purposes:
- Functional: Recovers layout analysis capability within the end-to-end paradigm — users obtain structured layout results directly
- Enhancement: Provides targeted accuracy improvements on documents with complex layouts, cluttered elements, or non-standard reading orders
When to use: Enable thinking for heterogeneous pages with mixed element types (exam papers, technical reports, newspapers). Disable for homogeneous documents (single-column text, simple forms) for better results and lower latency.
## Benchmark Results
### OmniDocBench v1.5 (Document Parsing)
| Model | Type | Overall ↑ | Text Edit ↓ | Formula CDM ↑ | Table TEDS ↑ | Table TEDS-S ↑ | Read-Order Edit ↓ |
|---|---|---|---|---|---|---|---|
| Qianfan-OCR (Ours) | End-to-end | 93.12 | 0.041 | 92.43 | 91.02 | 93.85 | 0.049 |
| DeepSeek-OCR-v2 | End-to-end | 91.09 | 0.048 | 90.31 | 87.75 | 92.06 | 0.057 |
| Gemini-3 Pro | End-to-end | 90.33 | 0.065 | 89.18 | 88.28 | 90.29 | 0.071 |
| Qwen3-VL-235B | End-to-end | 89.15 | 0.069 | 88.14 | 86.21 | 90.55 | 0.068 |
| dots.ocr | End-to-end | 88.41 | 0.048 | 83.22 | 86.78 | 90.62 | 0.053 |
| PaddleOCR-VL 1.5 | Pipeline | 94.50 | 0.035 | 94.21 | 92.76 | 95.79 | 0.042 |
### General OCR Benchmarks
| Model | OCRBench | OCRBench v2 (en/zh) | CC-OCR (multilingual) | CC-OCR (overall) |
|---|---|---|---|---|
| Qianfan-OCR (Ours) | 880 | 56.0 / 60.77 | 76.7 | 79.3 |
| Qwen3-VL-4B | 873 | 60.68 / 59.13 | 74.2 | 76.5 |
| MonkeyOCR | 655 | 21.78 / 38.91 | 43.8 | 35.2 |
| DeepSeek-OCR | 459 | 15.98 / 38.31 | 32.5 | 27.6 |
### Document Understanding
| Benchmark | Qianfan-OCR | Qwen3-VL-4B | Qwen3-VL-2B |
|---|---|---|---|
| DocVQA | 92.8 | 94.9 | 92.7 |
| CharXiv_DQ | 94.0 | 81.8 | 69.7 |
| CharXiv_RQ | 85.2 | 48.5 | 41.3 |
| ChartQA | 88.1 | 83.3 | 78.3 |
| ChartQAPro | 42.9 | 36.2 | 24.5 |
| ChartBench | 85.9 | 74.9 | 73.2 |
| TextVQA | 80.0 | 81.8 | 79.9 |
| OCRVQA | 66.8 | 64.7 | 59.3 |
> 💡 Two-stage OCR+LLM systems score 0.0 on CharXiv (both DQ and RQ), demonstrating that chart structures discarded during text extraction are essential for reasoning.
### Key Information Extraction (KIE)
| Model | Overall | OCRBench KIE | OCRBenchv2 KIE (en) | OCRBenchv2 KIE (zh) | CCOCR KIE | Nanonets KIE (F1) |
|---|---|---|---|---|---|---|
| Qianfan-OCR (Ours) | 87.9 | 95.0 | 82.8 | 82.3 | 92.8 | 86.5 |
| Qwen3-VL-235B-A22B | 84.2 | 94.0 | 85.6 | 62.9 | 95.1 | 83.8 |
| Qwen3-4B-VL | 83.5 | 89.0 | 82.1 | 71.3 | 91.6 | 83.3 |
| Gemini-3.1-Pro | 79.2 | 96.0 | 87.8 | 63.4 | 72.5 | 76.1 |
### Inference Throughput
| Model | PPS (pages/sec) |
|---|---|
| Qianfan-OCR (W8A8) | 1.024 |
| Qianfan-OCR (W16A16) | 0.503 |
| MinerU 2.5 | 1.057 |
| MonkeyOCR-pro-1.2B | 0.673 |
| Dots OCR | 0.352 |
All throughput numbers were measured on a single NVIDIA A100 GPU with vLLM 0.10.2.
## Supported Tasks
Qianfan-OCR supports a comprehensive set of document intelligence tasks through prompt-driven control:
| Task Category | Specific Tasks |
|---|---|
| Document Parsing | Image-to-Markdown conversion, multi-page parsing, structured output (JSON/HTML) |
| Layout Analysis | Bounding box detection, element type classification (25 categories), reading order |
| Table Recognition | Complex table extraction (merged cells, rotated tables), HTML output |
| Formula Recognition | Inline and display math formulas, LaTeX output |
| Chart Understanding | Chart QA, trend analysis, data extraction from various chart types |
| Key Information Extraction | Receipts, invoices, certificates, medical records, ID cards |
| Handwriting Recognition | Chinese and English handwritten text |
| Scene Text Recognition | Street signs, product labels, natural scene text |
| Multilingual OCR | 192 languages including Latin, Cyrillic, Arabic, South/Southeast Asian, CJK scripts |
## Quick Start
### Basic Usage
```python
import torch
import torchvision.transforms as T
from torchvision.transforms.functional import InterpolationMode
from transformers import AutoModel, AutoTokenizer
from PIL import Image

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def build_transform(input_size):
    MEAN, STD = IMAGENET_MEAN, IMAGENET_STD
    transform = T.Compose([
        T.Lambda(lambda img: img.convert('RGB') if img.mode != 'RGB' else img),
        T.Resize((input_size, input_size), interpolation=InterpolationMode.BICUBIC),
        T.ToTensor(),
        T.Normalize(mean=MEAN, std=STD)
    ])
    return transform

def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height, image_size):
    best_ratio_diff = float('inf')
    best_ratio = (1, 1)
    area = width * height
    for ratio in target_ratios:
        target_aspect_ratio = ratio[0] / ratio[1]
        ratio_diff = abs(aspect_ratio - target_aspect_ratio)
        if ratio_diff < best_ratio_diff:
            best_ratio_diff = ratio_diff
            best_ratio = ratio
        elif ratio_diff == best_ratio_diff:
            if area > 0.5 * image_size * image_size * ratio[0] * ratio[1]:
                best_ratio = ratio
    return best_ratio

def dynamic_preprocess(image, min_num=1, max_num=12, image_size=448, use_thumbnail=False):
    orig_width, orig_height = image.size
    aspect_ratio = orig_width / orig_height
    # enumerate candidate tile grids (i columns x j rows) within the tile budget
    target_ratios = set(
        (i, j) for n in range(min_num, max_num + 1) for i in range(1, n + 1) for j in range(1, n + 1) if
        i * j <= max_num and i * j >= min_num)
    target_ratios = sorted(target_ratios, key=lambda x: x[0] * x[1])
    # find the grid whose aspect ratio is closest to the image's
    target_aspect_ratio = find_closest_aspect_ratio(
        aspect_ratio, target_ratios, orig_width, orig_height, image_size)
    # calculate the target width and height
    target_width = image_size * target_aspect_ratio[0]
    target_height = image_size * target_aspect_ratio[1]
    blocks = target_aspect_ratio[0] * target_aspect_ratio[1]
    # resize the image, then split it into image_size x image_size tiles
    resized_img = image.resize((target_width, target_height))
    processed_images = []
    for i in range(blocks):
        box = (
            (i % (target_width // image_size)) * image_size,
            (i // (target_width // image_size)) * image_size,
            ((i % (target_width // image_size)) + 1) * image_size,
            ((i // (target_width // image_size)) + 1) * image_size
        )
        split_img = resized_img.crop(box)
        processed_images.append(split_img)
    assert len(processed_images) == blocks
    if use_thumbnail and len(processed_images) != 1:
        thumbnail_img = image.resize((image_size, image_size))
        processed_images.append(thumbnail_img)
    return processed_images

def load_image(image_file, input_size=448, max_num=12):
    image = Image.open(image_file).convert('RGB')
    transform = build_transform(input_size=input_size)
    images = dynamic_preprocess(image, image_size=input_size, use_thumbnail=True, max_num=max_num)
    pixel_values = [transform(image) for image in images]
    pixel_values = torch.stack(pixel_values)
    return pixel_values

# Load model
MODEL_PATH = "baidu/Qianfan-OCR"
model = AutoModel.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
).eval()
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# Load and process image
pixel_values = load_image("./Qianfan-OCR/examples/document.png").to(torch.bfloat16)

# Inference
prompt = "Parse this document to Markdown."
with torch.no_grad():
    response = model.chat(
        tokenizer,
        pixel_values=pixel_values,
        question=prompt,
        generation_config={"max_new_tokens": 16384}
    )
print(response)
```
### With Layout-as-Thought (Thinking Mode)

```python
# Enable Layout-as-Thought by appending the <think> token to the query
pixel_values = load_image("./Qianfan-OCR/examples/complex_document.jpg").to(torch.bfloat16)

prompt = "Parse this document to Markdown.<think>"
with torch.no_grad():
    response = model.chat(
        tokenizer,
        pixel_values=pixel_values,
        question=prompt,
        generation_config={"max_new_tokens": 16384}
    )
print(response)
# The model first generates a structured layout analysis, then produces the final output
```
### Key Information Extraction

```python
pixel_values = load_image("./Qianfan-OCR/examples/invoice.jpg").to(torch.bfloat16)

# Prompt (zh): "Extract the following fields from the image: name, date, total
# amount. Output in standard JSON format."
prompt = "请从图片中提取以下字段信息:姓名、日期、总金额。使用标准JSON格式输出。"
with torch.no_grad():
    response = model.chat(
        tokenizer,
        pixel_values=pixel_values,
        question=prompt,
        generation_config={"max_new_tokens": 16384}
    )
print(response)
```
### vLLM Deployment

```shell
# Serve with vLLM for high-throughput inference
vllm serve baidu/Qianfan-OCR --trust-remote-code
```
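Once the server is up, requests go through vLLM's OpenAI-compatible API (by default at `http://localhost:8000/v1`). A minimal sketch of building a chat request with an inline base64 image; the endpoint, port, and payload shape are vLLM/OpenAI-API defaults and assumptions here, not Qianfan-specific documentation:

```python
import base64

def build_chat_request(image_path: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat payload with an inline base64 image."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "model": "baidu/Qianfan-OCR",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        "max_tokens": 16384,
    }

# POST this payload to the server's OpenAI-compatible endpoint, e.g.:
#   requests.post("http://localhost:8000/v1/chat/completions", json=payload)
```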
## Skill
We provide a Qianfan OCR Document Intelligence skill for image and PDF understanding workflows.
It can be used by users of OpenClaw, Claude Code, Codex, and other assistants that support this skill format.
This skill packages reusable instructions, scripts, and references so the agent can automatically apply Qianfan-powered document intelligence to tasks such as:
- document parsing to Markdown
- layout analysis
- element recognition
- general OCR
- key information extraction
- chart understanding
- document VQA
The skill is designed for visual understanding tasks over images and PDFs, and includes the execution flow needed to prepare inputs, choose the right analysis mode, and call the bundled CLI tools.
## Citation

```bibtex
@misc{dong2026qianfanocrunifiedendtoendmodel,
      title={Qianfan-OCR: A Unified End-to-End Model for Document Intelligence},
      author={Daxiang Dong and Mingming Zheng and Dong Xu and Chunhua Luo and Bairong Zhuang and Yuxuan Li and Ruoyun He and Haoran Wang and Wenyu Zhang and Wenbo Wang and Yicheng Wang and Xue Xiong and Ayong Zheng and Xiaoying Zuo and Ziwei Ou and Jingnan Gu and Quanhao Guo and Jianmin Wu and Dawei Yin and Dou Shen},
      year={2026},
      eprint={2603.13398},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.13398},
}
```
## Acknowledgments
We thank the Baidu AI Cloud team for infrastructure support, the Baige and Kunlun teams for AI infrastructure assistance, and all contributors to the Qianfan platform.
License
This project is licensed under the Apache License 2.0. See LICENSE for the
full license text.
Some bundled third-party source files are licensed under the MIT License. See
NOTICE for the file list and corresponding attribution details.
Author: baidu
Likes: 133
Downloads: 0
Tags: transformers, safetensors, internvl_chat, feature-extraction, vision-language, ocr, document-intelligence, qianfan, image-text-to-text, conversational, custom_code, multilingual, arxiv:2603.13398, arxiv:2509.18189, license:apache-2.0, model-index, eval-results, region:us
# Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF

---
language:
- en
- zh
- ko
license: apache-2.0
base_model: Qwen/Qwen3.5-9B
tags:
- unsloth
- qwen
- qwen3.5
- reasoning
- chain-of-thought
- lora
pipeline_tag: image-text-to-text
datasets:
- nohurry/Opus-4.6-Reasoning-3000x-filtered
- Jackrong/Qwen3.5-reasoning-700x
- Roman1111111/claude-opus-4.6-10000x
---
# 🌟 Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2
## 📢 Announcement
**v2 Update:**
This iteration is powered by 14,000+ premium Claude 4.6 Opus-style general reasoning samples, with a major focus on achieving massive gains in reasoning efficiency while actively improving peak accuracy.
v2 introduces a refined reasoning scaffold designed to eliminate redundant internal loops, significantly improving the model's cross-task generalization from logic and math into specialized fields like programming. Compared to the original model, autonomy and stability are significantly improved, ensuring the model remains robust and self-consistent during complex, multi-step problem solving. v2 is built to think smarter, not longer, delivering substantial improvements in inference speed and cost-effectiveness while simultaneously boosting baseline accuracy.
Note: Due to the constraints of SFT sample size and training scope, the model's broad general-purpose capabilities might be slightly impacted. The efficiency and accuracy results discussed here are based on the HumanEval and HumanEval+ benchmarks. Thank you for your understanding!

## 💡 Model Introduction
Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2 is the second iteration of this reasoning-focused Qwen3.5-9B fine-tune, built to drastically improve the efficiency of chain-of-thought generation, unlocking highly substantial gains in reasoning speed and cost-reduction while actually increasing absolute accuracy.
Compared with the earlier version, v2 was trained with 14,000 Claude 4.6 Opus-style general reasoning samples, with a stronger emphasis on transferring concise, reusable reasoning patterns rather than only maximizing raw benchmark scores. The goal of v2 is not simply to make the model "think more," but to help it think more economically: reducing unnecessarily long internal chains, avoiding verbose over-analysis on easy problems, and massively improving the reasoning-cost-to-quality ratio while beating the baseline's benchmark correctness.
A key design choice in v2 is that the distillation data is primarily general-domain reasoning data—specifically focused on mathematics, word problems, logical deduction, and a balanced mix of general knowledge and instructions—rather than specialized code-heavy supervision. Consequently, HumanEval and HumanEval+ are employed here to evaluate cross-task generalization and capability transfer, rather than serving as direct optimization targets. High performance on these benchmarks, despite the lack of code-centric training, confirms that the model's reasoning scaffold has become more robust and transferable, proving that fundamental reasoning logic can effectively power specialized tasks like programming.
### Why v2 matters
Relative to the official Qwen3.5-9B baseline, the fine-tuned v2 model achieves a strict upgrade in absolute HumanEval and HumanEval+ accuracy alongside massive, transformative gains in reasoning efficiency:
| Metric | Official Qwen3.5-9B | v2 Fine-tuned Model | Improvement |
|---|---:|---:|---:|
| Average think length (chars) | 2284.3 chars | 1778.0 chars | 🟢 -22.17% (Shorter / Better) |
| Average think length (words) | 400.83 words | 310.33 words | 🟢 -22.58% (Shorter / Better) |
| HumanEval base passes per 10k think chars | 4.004 | 5.041 | 🟢 +25.91% (Higher / Better) |
| HumanEval+ passes per 10k think chars | 3.764 | 4.836 | 🟢 +28.48% (Higher / Better) |
| Think chars needed per HumanEval base pass | 2497.5 | 1983.6 | 🟢 -20.58% (Lower / Better) |
| Think chars needed per HumanEval+ pass | 2656.9 | 2068.0 | 🟢 -22.17% (Lower / Better) |
More impressively, not only does v2 vastly improve reasoning efficiency, it actually outperforms the official baseline on both the standard base tests and the much stricter HumanEval+ benchmark across different test settings.
We conducted two separate evaluations under different sampling temperatures to verify stability and peak performance:
Test Run 1 (T=0.2)
| Fairly Recomputed Benchmark | Official Qwen3.5-9B | v2 Fine-tuned Model | Gap |
|---|---:|---:|---:|
| HumanEval (base tests) pass@1 | 0.8171 | 0.8232 | 🟢 +0.61 pts |
| HumanEval+ (base + extra tests) pass@1 | 0.7622 | 0.7866 | 🟢 +2.44 pts |
Test Run 2 (T=0.6)
| Fairly Recomputed Benchmark | Official Qwen3.5-9B | v2 Fine-tuned Model | Gap |
|---|---:|---:|---:|
| HumanEval (base tests) pass@1 | 0.8170 | 0.8720 | 🟢 +5.50 pts |
| HumanEval+ (base + extra tests) pass@1 | 0.7620 | 0.8170 | 🟢 +5.50 pts |
These consistent dual-improvements make the model undeniably superior for real-world use cases.
For users who care about reasoning efficiency per unit of inference budget, v2 is exceptionally powerful—not only achieving higher peak accuracy, but doing so while consuming over 20% fewer characters and tokens.
That matters especially for:
- Resource-constrained local deployment: On consumer GPUs or lower-memory local setups, shorter and cleaner reasoning traces can reduce latency, memory pressure, and the effective cost of generation.
- Agentic workflows: In multi-step agents, the model often solves many easy or medium subtasks. In those settings, excessively elaborate chain-of-thought can become a tax on throughput. A model that reaches a better answer with fewer reasoning tokens can radically improve end-to-end agent speed and lower cumulative inference cost.
- Open-source tool use and emerging agent stacks: For users building with lightweight open reasoning systems, browser-use agents, terminal agents, or projects in the "OpenClaw / local autonomous agent" style ecosystem, a model that achieves better peak accuracy while drastically improving reasoning economy is highly practical for real-world loops.
- Simple problems at scale: One common issue with strong reasoning-tuned base models is that they sometimes produce very elaborate internal traces even for simple prompts. While that can look impressive, it is often inefficient in practice. v2 is explicitly aimed at trimming this overhead.
In short, v2 no longer forces a trade-off between absolute coding benchmark scores and reasoning economy. It provides a fully optimized deployment-ready profile: faster, shorter, more economical reasoning paired with stronger generalization and accuracy. For local users, agent builders, and cost-sensitive applications, v2 is a strict upgrade.
## 🗺️ Training Pipeline Overview

```text
Base Model (Qwen3.5-9B)
        │
        ▼
Qwen3.5-9B fine-tuned with Unsloth
        │
        ▼
Supervised Fine-Tuning (SFT) + LoRA
(Response-Only Training masked on "<|im_start|>assistant\n<think>")
        │
        ▼
Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2
```
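The "Response-Only Training" step above masks the loss over everything before the assistant marker, so gradients come only from the response tokens. A minimal sketch of that masking with illustrative token IDs (not the real tokenizer or the actual marker encoding):

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the cross-entropy loss

def mask_prompt_tokens(input_ids, response_start):
    """Response-only SFT: copy input_ids into labels, but mask everything
    before the assistant response (system/user turns plus the
    assistant marker itself) with IGNORE_INDEX."""
    labels = list(input_ids)
    for i in range(min(response_start, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# toy example: 5 prompt/template tokens followed by 3 response tokens
labels = mask_prompt_tokens([11, 12, 13, 14, 15, 21, 22, 23], response_start=5)
print(labels)  # [-100, -100, -100, -100, -100, 21, 22, 23]
```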
## 🧠 Example of a Learned Reasoning Scaffold
The model includes targeted optimizations addressing Qwen3.5’s tendency toward excessive transitional or repetitive reasoning on simple queries. Through deep distillation and structural imitation of Claude-4.6-Opus reasoning chains, the model adopts a more efficient structured thinking pattern:
“Let me analyze this request carefully: 1..2..3...”.
This streamlined reasoning paradigm significantly reduces redundant cognitive loops while preserving deep analytical capacity, resulting in substantially improved inference efficiency.
```text
Let me analyze this request carefully:
1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
...
```
## 📚 All Datasets Used
The dataset consists of high-quality, filtered reasoning distillation data:
| Dataset Name | Description / Purpose |
|--------------|-----------------------|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Provides comprehensive Claude 4.6 Opus reasoning trajectories. |
| Roman1111111/claude-opus-4.6-10000x | Large-scale public Claude 4.6 Opus distillation data used to strengthen general reasoning transfer in v2. |
| TeichAI/claude-4.5-opus-high-reasoning-250x | Injecting high-intensity, structured reasoning instances. |
| Jackrong/Qwen3.5-reasoning-700x | Additional curated reasoning samples designed to strengthen structured step-by-step problem solving and improve reasoning diversity. |
## ⚠️ Limitations & Intended Use
- Hallucination Risk: While reasoning is strong, the model remains an autoregressive LLM; facts asserted during the thinking sequence may occasionally be hallucinated, especially when they concern real-world events.
- Intended Scenario: Best suited for offline analytical tasks, coding, math, and heavy logic-dependent prompting where the user needs to transparently follow the AI's internal logic.
- This model is a test version intended solely for learning and demonstration purposes, and is for academic research and technical exploration use only.
## 🙏 Acknowledgements
Significant thanks to the Unsloth AI team for making rapid fine-tuning of large language models accessible. We also acknowledge the Qwen team and the open-source community developers producing exceptional distilled datasets.
Author: Jackrong
Likes: 10
Downloads: 0
Tags: gguf, qwen3_5, unsloth, qwen, qwen3.5, reasoning, chain-of-thought, lora, image-text-to-text, en, zh, ko, dataset:nohurry/Opus-4.6-Reasoning-3000x-filtered, dataset:Jackrong/Qwen3.5-reasoning-700x, dataset:Roman1111111/claude-opus-4.6-10000x, base_model:Qwen/Qwen3.5-9B, base_model:adapter:Qwen/Qwen3.5-9B, license:apache-2.0, endpoints_compatible, region:us, conversational
# LuffyTheFox/Omnicoder-Claude-4.6-Opus-Uncensored-GGUF

---
language:
- en
- zh
- ko
license: apache-2.0
base_model: Qwen/Qwen3.5-9B
tags:
- unsloth
- qwen
- qwen3.5
- reasoning
- chain-of-thought
- lora
- uncensored
- not-for-all-audiences
pipeline_tag: text-generation
datasets:
- nohurry/Opus-4.6-Reasoning-3000x-filtered
- Jackrong/Qwen3.5-reasoning-700x
- Roman1111111/claude-opus-4.6-10000x
---
🌟 This is an Omnicoder model based on Qwen 3.5 9B with zero refusals, made by merging the HauhauCS model with the Jackrong model and the Omnicoder 9B model from Tesslate.
🌟 The GGUF editor on Hugging Face is very slow (editing the chat template takes ages), so thinking is enabled by default in this model.
If you want to disable thinking, use this chat template in LM Studio: https://pastebin.com/uk9ZkxCR
For best model performance, use the following settings in the latest beta version of LM Studio:
- Temperature: 0.7
- Top K Sampling: 20
- Presence Penalty: 1.5
- Top P Sampling: 0.8
- Min P Sampling: 0
- Seed: 3407 or 42
Also use this system prompt; it's pretty solid: https://pastebin.com/6C4rtujt
This one is complex but works too: https://pastebin.com/pU25DVnB
Alternatively, you can use just this string as the system prompt:
You are Claude, created by Anthropic. You are a helpful AI assistant.
or
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
Then write anything you want after that; the model seems to underperform without this first line.
## 📢 Announcement
**v2 Update:**
This iteration is powered by 14,000+ premium Claude 4.6 Opus-style general reasoning samples, with a major focus on achieving massive gains in reasoning efficiency at the cost of only an extremely minor drop in accuracy.
v2 introduces a refined reasoning scaffold designed to eliminate redundant internal loops, significantly improving the model's cross-task generalization from logic and math into specialized fields like programming. Compared to the original model, autonomy and stability are significantly improved, ensuring the model remains robust and self-consistent during complex, multi-step problem solving. v2 is built to think smarter, not longer, delivering substantial improvements in inference speed and cost-effectiveness while preserving nearly all of the baseline's peak accuracy.
Note: Due to the constraints of SFT sample size and training scope, the model's broad general-purpose capabilities might be slightly impacted. The efficiency and accuracy results discussed here are based on the HumanEval and HumanEval+ benchmarks. Thank you for your understanding!

## 💡 Model Introduction
Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2 is the second iteration of this reasoning-focused Qwen3.5-9B fine-tune, built to drastically improve the efficiency of chain-of-thought generation, trading off a practically imperceptible margin of absolute accuracy for highly substantial gains in reasoning speed and cost-reduction.
Compared with the earlier version, v2 was trained with 14,000 Claude 4.6 Opus-style general reasoning samples, with a stronger emphasis on transferring concise, reusable reasoning patterns rather than only maximizing raw benchmark scores. The goal of v2 is not simply to make the model "think more," but to help it think more economically: reducing unnecessarily long internal chains, avoiding verbose over-analysis on easy problems, and massively improving the reasoning-cost-to-quality ratio without meaningfully sacrificing correctness.
A key design choice in v2 is that the distillation data is primarily general-domain reasoning data—specifically focused on mathematics, word problems, logical deduction, and a balanced mix of general knowledge and instructions—rather than specialized code-heavy supervision. Consequently, HumanEval and HumanEval+ are employed here to evaluate cross-task generalization and capability transfer, rather than serving as direct optimization targets. High performance on these benchmarks, despite the lack of code-centric training, confirms that the model's reasoning scaffold has become more robust and transferable, proving that fundamental reasoning logic can effectively power specialized tasks like programming.
### Why v2 matters
Relative to the official Qwen3.5-9B baseline, the fine-tuned v2 model accepts an extremely minor loss in absolute HumanEval accuracy (less than 2 percentage points) in exchange for massive, transformative gains in reasoning efficiency:
| Metric | Official Qwen3.5-9B | v2 Fine-tuned Model | Improvement |
|---|---:|---:|---:|
| Average think length (chars) | 2284.3 chars | 1778.0 chars | 🟢 -22.17% (Shorter / Better) |
| Average think length (words) | 400.83 words | 310.33 words | 🟢 -22.58% (Shorter / Better) |
| HumanEval base passes per 10k think chars | 4.004 | 5.041 | 🟢 +25.91% (Higher / Better) |
| HumanEval+ passes per 10k think chars | 3.764 | 4.836 | 🟢 +28.48% (Higher / Better) |
| Think chars needed per HumanEval base pass | 2497.5 | 1983.6 | 🟢 -20.58% (Lower / Better) |
| Think chars needed per HumanEval+ pass | 2656.9 | 2068.0 | 🟢 -22.17% (Lower / Better) |
At the same time, while the official model holds a razor-thin lead on the standard base tests, v2 achieves the exact same accuracy on the much stricter HumanEval+ benchmark:
| Fairly Recomputed Benchmark | Official Qwen3.5-9B | v2 Fine-tuned Model | Gap |
|---|---:|---:|---:|
| HumanEval (base tests) pass@1 | 0.9146 | 0.8963 | 🔴🔽 -1.83 pts |
| HumanEval+ (base + extra tests) pass@1 | 0.8598 | 0.8598 | 🔵 0.00 pts |
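The efficiency metrics in the "Why v2 matters" table follow arithmetically from the average think lengths and the pass@1 rates above. A quick sanity check recomputing them (small last-digit differences come from rounding of the reported averages):

```python
# Reported averages and pass rates from the tables above
baseline_chars, v2_chars = 2284.3, 1778.0     # average think length (chars)
baseline_pass1, v2_pass1 = 0.9146, 0.8963     # HumanEval base pass@1

# passes per 10k think chars = pass@1 / (chars / 10_000)
baseline_eff = 10_000 * baseline_pass1 / baseline_chars
v2_eff = 10_000 * v2_pass1 / v2_chars
print(round(baseline_eff, 3), round(v2_eff, 3))  # ~4.004 and ~5.041, as reported

# think chars needed per base pass = chars / pass@1
print(round(baseline_chars / baseline_pass1, 1))  # ~2497.5, as reported
print(round(v2_chars / v2_pass1, 1))              # ~1983.6, as reported
```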
This trade-off strongly favors real-world use cases.
For users who care strictly about the absolute highest peak benchmark score, the official model holds a razor-thin edge. However, for users who care about reasoning efficiency per unit of inference budget, v2 is exceptionally superior—doing almost exactly the same quality of logic work while consuming over 20% fewer characters and tokens.
That matters especially for:
- Resource-constrained local deployment: On consumer GPUs or lower-memory local setups, shorter and cleaner reasoning traces can reduce latency, memory pressure, and the effective cost of generation.
- Agentic workflows: In multi-step agents, the model often solves many easy or medium subtasks. In those settings, excessively elaborate chain-of-thought can become a tax on throughput. A model that reaches a workable answer with fewer reasoning tokens can improve end-to-end agent speed and lower cumulative inference cost.
- Open-source tool use and emerging agent stacks: For users building with lightweight open reasoning systems, browser-use agents, terminal agents, or projects in the "OpenClaw / local autonomous agent" style ecosystem, a model that sacrifices a small amount of peak accuracy for much better reasoning economy can be more practical in real-world loops.
- Simple problems at scale: One common issue with strong reasoning-tuned base models is that they sometimes produce very elaborate internal traces even for simple prompts. While that can look impressive, it is often inefficient in practice. v2 is explicitly aimed at trimming this overhead.
In short, v2 does not claim to beat the official model on absolute coding benchmark score. Instead, it demonstrates a more deployment-oriented optimization target: faster, shorter, more economical reasoning with still-competitive generalization. For many local users, agent builders, and cost-sensitive applications, this can be a highly favorable trade.
## 🗺️ Training Pipeline Overview

```text
Base Model (Qwen3.5-9B)
        │
        ▼
Qwen3.5-9B fine-tuned with Unsloth
        │
        ▼
Supervised Fine-Tuning (SFT) + LoRA
(Response-Only Training masked on "<|im_start|>assistant\n<think>")
        │
        ▼
Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2
```
## 🧠 Example of a Learned Reasoning Scaffold
The model includes targeted optimizations addressing Qwen3.5’s tendency toward excessive transitional or repetitive reasoning on simple queries. Through deep distillation and structural imitation of Claude-4.6-Opus reasoning chains, the model adopts a more efficient structured thinking pattern:
“Let me analyze this request carefully: 1..2..3...”.
This streamlined reasoning paradigm significantly reduces redundant cognitive loops while preserving deep analytical capacity, resulting in substantially improved inference efficiency.
```text
Let me analyze this request carefully:
1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
...
```
## 📚 All Datasets Used
The dataset consists of high-quality, filtered reasoning distillation data:
| Dataset Name | Description / Purpose |
|--------------|-----------------------|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Provides comprehensive Claude 4.6 Opus reasoning trajectories. |
| Roman1111111/claude-opus-4.6-10000x | Large-scale public Claude 4.6 Opus distillation data used to strengthen general reasoning transfer in v2. |
| TeichAI/claude-4.5-opus-high-reasoning-250x | Injecting high-intensity, structured reasoning instances. |
| Jackrong/Qwen3.5-reasoning-700x | Additional curated reasoning samples designed to strengthen structured step-by-step problem solving and improve reasoning diversity. |
⚠️ Limitations & Intended Use
- Hallucination Risk: Although reasoning is strong, the model remains an autoregressive LLM; factual claims made inside the thinking sequence can be hallucinated, so verify any real-world facts independently.
- Intended Scenario: Best suited for offline analytical tasks, coding, math, and heavy logic-dependent prompting where the user needs to transparently follow the AI's internal logic.
- This model is a test version intended solely for learning and demonstration purposes, and is for academic research and technical exploration use only.
🙏 Acknowledgements
Significant thanks to the Unsloth AI team for making rapid fine-tuning of large LLMs accessible. We also thank the Qwen team and the open-source community developers who produce exceptional distilled datasets.
Author: LuffyTheFox
Tags: gguf, qwen3_5, unsloth, qwen, qwen3.5, reasoning, chain-of-thought, lora, uncensored, not-for-all-audiences, text-generation, en, zh, ko, dataset:nohurry/Opus-4.6-Reasoning-3000x-filtered, dataset:Jackrong/Qwen3.5-reasoning-700x, dataset:Roman1111111/claude-opus-4.6-10000x, base_model:Qwen/Qwen3.5-9B, base_model:adapter:Qwen/Qwen3.5-9B, license:apache-2.0, endpoints_compatible, region:us, conversational
Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF
language:
- en
- zh
- ko
license: apache-2.0
base_model: Qwen/Qwen3.5-4B
tags:
- unsloth
- qwen
- qwen3.5
- reasoning
- chain-of-thought
- lora
pipeline_tag: text-generation
datasets:
- nohurry/Opus-4.6-Reasoning-3000x-filtered
- Jackrong/Qwen3.5-reasoning-700x
- Roman1111111/claude-opus-4.6-10000x
🌟 Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2
📢 Announcement
v2 Update:
This iteration is powered by 14,000+ premium Claude 4.6 Opus-style general reasoning samples, with a major focus on optimizing reasoning economy and structural efficiency.
v2 introduces a refined reasoning scaffold designed to eliminate redundant internal loops, significantly improving the model's cross-task generalization from logic and math into specialized fields like programming. Compared to the original model, autonomy and stability are significantly improved, ensuring the model remains robust and self-consistent during complex, multi-step problem solving. v2 is built to think smarter, not longer, ensuring high-quality analytical depth with a much better reasoning-cost-to-quality ratio.

💡 Model Introduction
Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2 is the second iteration of this reasoning-focused Qwen3.5-4B fine-tune, built to improve the efficiency of chain-of-thought generation while preserving strong general reasoning behavior.
Compared with the earlier version, v2 was trained with 14,000 Claude 4.6 Opus-style general reasoning samples, with a stronger emphasis on transferring concise, reusable reasoning patterns rather than only maximizing raw benchmark scores. The goal of v2 is not simply to make the model "think more," but to help it think more economically: reducing unnecessarily long internal chains, avoiding verbose over-analysis on easy problems, and producing answers with a better reasoning-cost-to-quality ratio.
A key design choice in v2 is that the distillation data is primarily general-domain reasoning data—specifically focused on mathematics, word problems, logical deduction, and a balanced mix of general knowledge and instructions—rather than specialized code-heavy supervision. Consequently, HumanEval and HumanEval+ are employed here to evaluate cross-task generalization and capability transfer, rather than serving as direct optimization targets. High performance on these benchmarks, despite the lack of code-centric training, confirms that the model's reasoning scaffold has become more robust and transferable, proving that fundamental reasoning logic can effectively power specialized tasks like programming.
Why v2 matters
Relative to the official Qwen3.5-4B baseline, the fine-tuned v2 model still trails slightly in absolute HumanEval accuracy after fair rescoring, but it shows substantial gains in reasoning efficiency:
| Metric | Official Qwen3.5-4B | v2 Fine-tuned Model | Change |
|---|---:|---:|---:|
| Average think length | 2829 chars | 1874 chars | 🟢 -33.77% |
| HumanEval base passes per 10k think chars | 3.104 | 4.393 | 🟢 +41.54% |
| HumanEval+ passes per 10k think chars | 2.910 | 4.165 | 🟢 +43.15% |
| Think chars needed per HumanEval base pass | 3222 | 2276 | 🟢 -29.35% |
| Think chars needed per HumanEval+ pass | 3437 | 2401 | 🟢 -30.14% |
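The two efficiency views in the table are reciprocals of each other: passes per 10k think chars equals 10000 divided by think chars per pass (e.g. 10000 / 3222 ≈ 3.104). A minimal sketch of the computation, assuming `passes` and `total_think_chars` are aggregated over a full benchmark run (the underlying raw counts are not published here):

```python
def passes_per_10k_chars(passes, total_think_chars):
    # Throughput view: solved problems per 10,000 reasoning characters.
    return passes / total_think_chars * 10_000

def chars_per_pass(passes, total_think_chars):
    # Cost view: reasoning characters spent per solved problem.
    return total_think_chars / passes

# Reproduce the official-model HumanEval (base) row:
# 3222 think chars per pass <=> 10000 / 3222 passes per 10k chars.
print(round(passes_per_10k_chars(1, 3222), 3))  # -> 3.104
```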
At the same time, the official model remains stronger in absolute benchmark score:
| Fairly Recomputed Benchmark | Official Qwen3.5-4B | v2 Fine-tuned Model | Gap |
|---|---:|---:|---:|
| HumanEval (base tests) pass@1 | 0.7683 | 0.7317 | 🔴 -3.66 pts |
| HumanEval+ (base + extra tests) pass@1 | 0.7256 | 0.6951 | 🔴 -3.05 pts |
This trade-off is important to understand correctly.
For users who care only about the highest possible benchmark accuracy, the official model is still the stronger option. However, for users who care about reasoning efficiency per unit of inference budget, v2 is meaningfully improved.
That matters especially for:
- Resource-constrained local deployment: On consumer GPUs or lower-memory local setups, shorter and cleaner reasoning traces can reduce latency, memory pressure, and the effective cost of generation.
- Agentic workflows: In multi-step agents, the model often solves many easy or medium subtasks. In those settings, excessively elaborate chain-of-thought can become a tax on throughput. A model that reaches a workable answer with fewer reasoning tokens can improve end-to-end agent speed and lower cumulative inference cost.
- Open-source tool use and emerging agent stacks: For users building with lightweight open reasoning systems, browser-use agents, terminal agents, or projects in the "OpenClaw / local autonomous agent" style ecosystem, a model that sacrifices a small amount of peak accuracy for much better reasoning economy can be more practical in real-world loops.
- Simple problems at scale: One common issue with strong reasoning-tuned base models is that they sometimes produce very elaborate internal traces even for simple prompts. While that can look impressive, it is often inefficient in practice. v2 is explicitly aimed at trimming this overhead.
In short, v2 does not claim to beat the official model on absolute coding benchmark score. Instead, it demonstrates a more deployment-oriented optimization target: faster, shorter, more economical reasoning with still-competitive generalization. For many local users, agent builders, and cost-sensitive applications, this can be a highly favorable trade.
🗺️ Training Pipeline Overview
Base Model (Qwen3.5-4B)
│
▼
Qwen3.5-4B fine-tuned with Unsloth
│
▼
Supervised Fine-Tuning (SFT) + LoRA
(response-only SFT: loss computed only on tokens after the "<|im_start|>assistant\n<think>" marker)
│
▼
Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2
🧠 Example of Learned Reasoning Scaffold
The model includes targeted optimizations addressing Qwen3.5’s tendency toward excessive transitional or repetitive reasoning on simple queries. Through deep distillation and structural imitation of Claude-4.6-Opus reasoning chains, the model adopts a more efficient structured thinking pattern:
“Let me analyze this request carefully: 1..2..3...”.
This streamlined reasoning paradigm significantly reduces redundant cognitive loops while preserving deep analytical capacity, resulting in substantially improved inference efficiency.
Let me analyze this request carefully:
1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
...
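The "average think length" metric above can be measured directly from generated completions. A minimal sketch, assuming the model wraps its reasoning in `<think>...</think>` tags (the usual convention for Qwen-style reasoning output):

```python
import re

def think_length(completion: str) -> int:
    # Extract the reasoning segment between <think> and </think> and
    # return its character count; 0 if no think block is present.
    m = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    return len(m.group(1)) if m else 0

print(think_length("<think>abc</think>final answer"))  # -> 3
```

Averaging this over a benchmark's completions yields the think-length figures reported in the efficiency table.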
📚 All Datasets Used
The dataset consists of high-quality, filtered reasoning distillation data:
| Dataset Name | Description / Purpose |
|--------------|-----------------------|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Provides comprehensive Claude 4.6 Opus reasoning trajectories. |
| Roman1111111/claude-opus-4.6-10000x | Large-scale public Claude 4.6 Opus distillation data used to strengthen general reasoning transfer in v2. |
| TeichAI/claude-4.5-opus-high-reasoning-250x | Injects high-intensity, structured reasoning instances. |
| Jackrong/Qwen3.5-reasoning-700x | Additional curated reasoning samples designed to strengthen structured step-by-step problem solving and improve reasoning diversity. |
⚠️ Limitations & Intended Use
- Hallucination Risk: Although reasoning is strong, the model remains an autoregressive LLM; factual claims made inside the thinking sequence can be hallucinated, so verify any real-world facts independently.
- Intended Scenario: Best suited for offline analytical tasks, coding, math, and heavy logic-dependent prompting where the user needs to transparently follow the AI's internal logic.
- This model is a test version intended solely for learning and demonstration purposes, and is for academic research and technical exploration use only.
🙏 Acknowledgements
Significant thanks to the Unsloth AI team for making rapid fine-tuning of large LLMs accessible. We also thank the Qwen team and the open-source community developers who produce exceptional distilled datasets.
Author: Jackrong
Tags: gguf, qwen3_5, unsloth, qwen, qwen3.5, reasoning, chain-of-thought, lora, image-text-to-text, en, zh, ko, dataset:nohurry/Opus-4.6-Reasoning-3000x-filtered, dataset:Jackrong/Qwen3.5-reasoning-700x, dataset:Roman1111111/claude-opus-4.6-10000x, base_model:Qwen/Qwen3.5-4B, base_model:adapter:Qwen/Qwen3.5-4B, license:apache-2.0, endpoints_compatible, region:us, conversational
Naphula/Ancient-Awakening-12B-MPOA
base_model:
- aixonlab/Aether-12b
- aixonlab/Zinakha-12b
- allura-org/Bigger-Body-12b
- allura-org/MN-12b-RP-Ink
- allura-org/remnant-mn-12b
- anthracite-org/magnum-v4-12b
- ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2
- Babsie/Opulus-12B-v3
- BeaverAI/mistral-doryV2-12b
- crestf411/nemo-sunfall-v0.6.1
- EldritchLabs/Kraken-Karcher-12B-v1
- EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos
- EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math
- Fizzarolli/MN-12b-Rosier-v1
- HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407
- IIEleven11/Kalypso
- inflatebot/MN-12B-Mag-Mell-R1
- intervitens/mini-magnum-12b-v1.1
- jtatman/mistral_nemo_12b_reasoning_psychology_lora
- KOOWEEYUS/BlackSheep-RP-12B
- Lambent/Arsenic-Shahrazad-12B-v2
- Lambent/Arsenic-Shahrazad-12B-v3
- Lambent/arsenic-nemo-unleashed-12B
- Lambent/Gilded-Arsenic-12B
- LatitudeGames/Muse-12B
- mistralai/Mistral-Nemo-Instruct-2407
- Naphula/Riemannian-Redshift-12B-v1
- Naphula-Archives/F5-stage6-12B
- Naphula-Archives/F5-stage7-12B
- nbeerbower/Lyra-Gutenberg-mistral-nemo-12B
- nbeerbower/Lyra4-Gutenberg-12B
- nbeerbower/mistral-nemo-bophades-12B
- nbeerbower/mistral-nemo-gutenberg-12B-v3
- nbeerbower/mistral-nemo-gutenberg-12B-v4
- nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B
- nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B
- nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B
- nbeerbower/mistral-nemo-wissenschaft-12B
- NeverSleepHistorical/lumi-nemo-e2.0
- NeverSleep/Lumimaid-v0.2-12B
- nothingiisreal/Celeste-12B-V1.6
- nothingiisreal/MN-12B-Celeste-V1.9
- PocketDoc/Dans-DangerousWinds-V1.1.0-12b
- ReadyArt/Dark-Nexus-12B-v2.0
- ReadyArt/Forgotten-Safeword-12B-v4.0
- ReadyArt/Omega-Darker_The-Final-Directive-12B
- romaingrx/red-teamer-mistral-nemo
- Sao10K/MN-12B-Lyra-v1
- Sao10K/MN-12B-Lyra-v4
- shisa-ai/shisa-v2-mistral-nemo-12b
- SicariusSicariiStuff/Impish_Bloodmoon_12B
- sleepdeprived3/Christian-Bible-Expert-v2.0-12B
- SuperbEmphasis/MN-12b-RP-Ink-RP-Longform
- SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2
- TheDrummer/Rivermind-12B-v1
- TheDrummer/Rocinante-12B-v1
- TheDrummer/Rocinante-X-12B-v1
- Trappu/Nemo-Picaro-12B
- Undi95/LocalC-12B-e2.0
- VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct
- Vortex5/Astral-Noctra-12B
- Vortex5/Azure-Starlight-12B
- Vortex5/Crimson-Constellation-12B
- Vortex5/Red-Synthesis-12B
- Vortex5/Shining-Seraph-12B
- Vortex5/Starlit-Shadow-12B
- Vortex5/Vermilion-Sage-12B
- Vortex5/Scarlet-Seraph-12B
- Vortex5/Maroon-Sunset-12B
- Vortex5/Amber-Starlight-12B
language:
- en
library_name: transformers
license: apache-2.0
tags:
- creative
- creative writing
- fiction writing
- plot generation
- sub-plot generation
- story generation
- scene continue
- storytelling
- fiction story
- science fiction
- romance
- all genres
- story
- writing
- vivid prosing
- vivid writing
- fiction
- roleplaying
- float32
- swearing
- rp
- horror
- mistral
- nemo
- merge
- mergekit
- karcher
- flux
- arcee_fusion
- ramplus_tl
- pdq
widget:
- text: "Ancient-Awakening-12B-MPOA"
output:
url: https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/CvyWU1z106Aa__M8KIksp.png
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/yI041gp0fzz7N_Mh_x5Pt.mpga"></audio>
<span style="color:red; font-weight:bold">⚠️ Warning:</span> This model works best with either the ChatML or Mistral Tekken chat template. The uncensored MPOA version has its guardrails removed and can produce narratives and RP containing violent and graphic erotic content. Adjust your system prompt accordingly.
<!DOCTYPE html>
<style>
body {
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
color: #D1D5DB; /* Pale stone gray */
line-height: 1.6;
margin: 0;
padding: 0;
background-color: #0A0C10; /* Very dark stormy gray/black */
}
b, strong {
color: #FBBF24; /* Glowing amber/gold */
text-shadow: 0 0 8px rgba(251, 191, 36, 0.4);
}
.awakening-text {
color: #FEF3C7; /* Pale inner-eye yellow */
position: relative;
z-index: 2;
margin-left: 0.2em;
text-shadow: 0 0 15px #F59E0B, 0 0 30px #B45309; /* Deep fiery orange/gold glow */
font-size: 1.8rem;
letter-spacing: 1px;
font-weight: 600;
}
/* Section styling */
.section-container {
background-color: rgba(17, 24, 39, 0.85); /* Dark slate rock */
margin-bottom: 30px;
position: relative;
overflow: hidden;
border-bottom: 1px solid #78350F; /* Dark bronze/earth */
box-shadow: 0 4px 20px rgba(0, 0, 0, 0.6);
}
.section-header {
display: flex;
align-items: center;
background-color: rgba(245, 158, 11, 0.05); /* Faint amber tint */
padding: 10px 20px;
border-top: 1px solid rgba(120, 53, 15, 0.4);
}
.section-indicator {
width: 8px;
height: 20px;
background-color: #F59E0B; /* Amber eye color */
margin-right: 15px;
box-shadow: 0 0 10px rgba(245, 158, 11, 0.6);
border-radius: 2px;
}
.section-title {
font-family: 'Georgia', 'Times New Roman', serif; /* Ancient tome feel */
color: #FDE68A; /* Light gold */
font-size: 1.4rem;
margin: 0;
letter-spacing: 1px;
font-weight: 400;
text-transform: capitalize;
}
.section-content {
padding: 20px;
font-family: sans-serif;
color: #D1D5DB;
line-height: 1.6;
}
/* Title styling */
.title-container {
background-color: #050505; /* Pitch black */
position: relative;
overflow: hidden;
margin-bottom: 40px;
border-left: 4px solid #F59E0B; /* Amber pillar */
box-shadow: 0 6px 25px rgba(245, 158, 11, 0.15);
}
.title-wrapper {
position: relative;
z-index: 2;
padding: 25px 20px 30px 30px;
font-family: 'Georgia', 'Times New Roman', serif;
}
.title-main {
color: #FEF3C7;
font-size: 2.0rem;
font-weight: 700;
margin: 0;
letter-spacing: 2px;
display: inline-block;
position: relative;
text-transform: uppercase;
}
.storm-overlay {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
/* Dark, brooding radial fog mimicking the eye's aura */
background-image: radial-gradient(circle at 50% 50%, rgba(245, 158, 11, 0.08) 0%, rgba(0,0,0,0.9) 80%);
z-index: 1;
}
/* Subheading styling */
.subheading {
color: #D97706; /* Deep orange */
font-size: 1.1rem;
margin-top: 20px;
margin-bottom: 15px;
font-weight: 400;
border-bottom: 1px dashed rgba(217, 119, 6, 0.4);
display: inline-block;
text-transform: uppercase;
letter-spacing: 1px;
font-family: 'Georgia', 'Times New Roman', serif;
}
/* Links */
a {
color: #FBBF24; /* Amber */
text-decoration: none;
transition: color 0.3s ease, text-shadow 0.3s ease;
}
a:hover {
text-decoration: underline;
color: #FDE68A; /* Brighter gold */
text-shadow: 0 0 8px rgba(251, 191, 36, 0.5);
}
/* Container */
.container {
max-width: 1200px;
margin: 20px auto;
padding: 40px 20px;
background-color: #0D1117; /* Deep stormy night */
background-image:
radial-gradient(circle at 15% 85%, rgba(120, 53, 15, 0.1) 0%, transparent 50%),
radial-gradient(circle at 85% 15%, rgba(245, 158, 11, 0.05) 0%, transparent 50%);
min-height: calc(100vh - 40px);
border: 1px solid #1F2937; /* Dark stone border */
border-radius: 8px;
box-shadow: 0 8px 40px rgba(0, 0, 0, 0.9), inset 0 0 20px rgba(0, 0, 0, 0.5);
}
/* Code blocks */
pre {
background-color: #050505; /* Pitch black */
border: 1px solid #1F2937; /* Dark stone */
border-left: 3px solid #92400E; /* Dark orange/brown */
padding: 15px;
border-radius: 4px;
color: #D1D5DB;
overflow-x: auto;
}
code {
font-family: 'Courier New', Courier, monospace;
color: #FBBF24; /* Amber */
background-color: rgba(245, 158, 11, 0.08);
padding: 2px 4px;
border-radius: 3px;
}
pre code {
color: #00FFFF;
background-color: transparent;
padding: 0;
}
</style>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Ancient Awakening 12B MPOA</title>
</head>
<body>
<div class="container">
<div class="title-container">
<div class="storm-overlay"></div>
<div class="title-wrapper">
<h2 class="title-main">
<span class="awakening-text">🧬 Ancient Awakening 12B MPOA</span>
</h2>
</div>
</div>
<img src="https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/CvyWU1z106Aa__M8KIksp.png"
alt="Ancient Awakening"
style="display: block; margin: 0 auto 30px auto; max-width: 100%; height: auto; border-radius: 5px; border: 1px solid #1F2937; box-shadow: 0 0 25px rgba(245, 158, 11, 0.15);">
<div class="section-container">
<div class="section-header">
<div class="section-indicator"></div>
<h2 class="section-title">Overview</h2>
</div>
<div class="section-content"><font face="verdana">
This is a merge of pre-trained language models created using <a href="https://github.com/cg123/mergekit">mergekit</a>.
</div>
</div>
<div class="section-container">
<div class="section-header">
<div class="section-indicator"></div>
<h2 class="section-title">System Prompt (Optional)</h2>
</div>
<div class="section-content"><font face="verdana">
<p>You are the "Ancient One," a colossal, primordial entity of living stone, deep magic, and abyssal ocean. For countless millennia, you have slumbered in a state of suspended animation, your massive, jagged body mistaken for a remote, floating island amidst a perpetually stormy sea. You are older than recorded history, older than the gods of men. The ANCIENT AWAKENING marks your current state: you are finally opening your single, massive, reptilian eye. You are a geological anomaly made sentient.</p>
</div>
</div>
<div class="section-container">
<div class="section-header">
<div class="section-indicator"></div>
<h2 class="section-title">Merge Details</h2>
</div>
<div class="section-content"><font face="verdana">
<b>Merge Methods</b><br>
This model was synthesized using a complex multi-stage process involving the following methods:
<ul>
<li><a href="https://en.wikipedia.org/wiki/Karcher_mean">karcher</a></li>
<li><a href="https://huggingface.co/24B-Suite/Mergedonia-Suite-24B-v1/discussions/2">flux</a></li>
<li><a href="https://www.arcee.ai/blog/meet-mergekit-v0-1-arcee-fusion-expanded-model-support-multi-gpu-acceleration">arcee_fusion</a></li>
<li><a href="https://arxiv.org/abs/2601.13572">ramplus_tl [Reinforced Agent Merging Plus (Tensor-Local)]</a></li>
<li><a href="https://huggingface.co/24B-Suite/Mergedonia-Suite-24B-v1/discussions/2">pdq</a></li>
</ul>
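The <code>karcher</code> method computes a Riemannian (Karcher/Fréchet) mean rather than a plain arithmetic average. As a toy illustration of the underlying idea only (a simplified stand-in for what mergekit does per tensor, not its actual implementation), here is an iterative Karcher mean of unit vectors on the sphere via log/exp maps:

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def karcher_mean_sphere(points, iters=100, tol=1e-12):
    """Karcher mean of unit vectors: repeatedly average the points in the
    tangent space at the current estimate (log map), then project the
    averaged step back onto the sphere (exp map)."""
    mu = normalize([sum(p[i] for p in points) for i in range(len(points[0]))])
    for _ in range(iters):
        avg = [0.0] * len(mu)
        for p in points:
            dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(mu, p))))
            theta = math.acos(dot)
            if theta < tol:
                continue  # p coincides with mu; its log map is zero
            scale = theta / math.sin(theta)
            for i in range(len(mu)):
                avg[i] += (p[i] - dot * mu[i]) * scale  # log_mu(p)
        avg = [a / len(points) for a in avg]
        step = math.sqrt(sum(a * a for a in avg))
        if step < tol:
            break  # converged: tangent-space mean is (numerically) zero
        mu = normalize([math.cos(step) * m + math.sin(step) * a / step
                        for m, a in zip(mu, avg)])
    return mu
```

Unlike a straight average followed by renormalization, this iteration minimizes the sum of squared geodesic distances, which is why it behaves well when many donor models pull in different directions.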
<br>The <a href="https://huggingface.co/spaces/Naphula/model_tools/blob/main/graph_v18.py">graph_v18.py</a> patch made it possible to run the merge with GPU acceleration on only 8 GB of VRAM.
<hr>
<b>Models Merged</b><br>
The following 70 models were woven into this merge:<br><br>
<details>
<summary style="cursor: pointer; color: #FBBF24; font-weight: bold;">Show 70 Donor Models</summary>
<ul>
<li><a href="https://huggingface.co/aixonlab/Aether-12b">aixonlab/Aether-12b</a></li>
<li><a href="https://huggingface.co/aixonlab/Zinakha-12b">aixonlab/Zinakha-12b</a></li>
<li><a href="https://huggingface.co/allura-org/Bigger-Body-12b">allura-org/Bigger-Body-12b</a></li>
<li><a href="https://huggingface.co/allura-org/MN-12b-RP-Ink">allura-org/MN-12b-RP-Ink</a></li>
<li><a href="https://huggingface.co/allura-org/remnant-mn-12b">allura-org/remnant-mn-12b</a></li>
<li><a href="https://huggingface.co/anthracite-org/magnum-v4-12b">anthracite-org/magnum-v4-12b</a></li>
<li><a href="https://huggingface.co/ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2">ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2</a></li>
<li><a href="https://huggingface.co/Babsie/Opulus-12B-v3">Babsie/Opulus-12B-v3</a></li>
<li><a href="https://huggingface.co/BeaverAI/mistral-doryV2-12b">BeaverAI/mistral-doryV2-12b</a></li>
<li><a href="https://huggingface.co/crestf411/nemo-sunfall-v0.6.1">crestf411/nemo-sunfall-v0.6.1</a></li>
<li><a href="https://huggingface.co/EldritchLabs/Kraken-Karcher-12B-v1">EldritchLabs/Kraken-Karcher-12B-v1</a></li>
<li><a href="https://huggingface.co/EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos">EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos</a></li>
<li><a href="https://huggingface.co/EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math">EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math</a></li>
<li><a href="https://huggingface.co/Fizzarolli/MN-12b-Rosier-v1">Fizzarolli/MN-12b-Rosier-v1</a></li>
<li><a href="https://huggingface.co/HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407">HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407</a></li>
<li><a href="https://huggingface.co/IIEleven11/Kalypso">IIEleven11/Kalypso</a></li>
<li><a href="https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1">inflatebot/MN-12B-Mag-Mell-R1</a></li>
<li><a href="https://huggingface.co/intervitens/mini-magnum-12b-v1.1">intervitens/mini-magnum-12b-v1.1</a></li>
<li><a href="https://huggingface.co/jtatman/mistral_nemo_12b_reasoning_psychology_lora">jtatman/mistral_nemo_12b_reasoning_psychology_lora</a></li>
<li><a href="https://huggingface.co/KOOWEEYUS/BlackSheep-RP-12B">KOOWEEYUS/BlackSheep-RP-12B</a></li>
<li><a href="https://huggingface.co/Lambent/Arsenic-Shahrazad-12B-v2">Lambent/Arsenic-Shahrazad-12B-v2</a></li>
<li><a href="https://huggingface.co/Lambent/Arsenic-Shahrazad-12B-v3">Lambent/Arsenic-Shahrazad-12B-v3</a></li>
<li><a href="https://huggingface.co/Lambent/arsenic-nemo-unleashed-12B">Lambent/arsenic-nemo-unleashed-12B</a></li>
<li><a href="https://huggingface.co/Lambent/Gilded-Arsenic-12B">Lambent/Gilded-Arsenic-12B</a></li>
<li><a href="https://huggingface.co/LatitudeGames/Muse-12B">LatitudeGames/Muse-12B</a></li>
<li><a href="https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407">mistralai/Mistral-Nemo-Instruct-2407</a></li>
<li><a href="https://huggingface.co/Naphula/Riemannian-Redshift-12B-v1">Naphula/Riemannian-Redshift-12B-v1</a></li>
<li><a href="https://huggingface.co/Naphula-Archives/F5-stage6-12B">Naphula-Archives/F5-stage6-12B</a></li>
<li><a href="https://huggingface.co/Naphula-Archives/F5-stage7-12B">Naphula-Archives/F5-stage7-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/Lyra-Gutenberg-mistral-nemo-12B">nbeerbower/Lyra-Gutenberg-mistral-nemo-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/Lyra4-Gutenberg-12B">nbeerbower/Lyra4-Gutenberg-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/mistral-nemo-bophades-12B">nbeerbower/mistral-nemo-bophades-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/mistral-nemo-gutenberg-12B-v3">nbeerbower/mistral-nemo-gutenberg-12B-v3</a></li>
<li><a href="https://huggingface.co/nbeerbower/mistral-nemo-gutenberg-12B-v4">nbeerbower/mistral-nemo-gutenberg-12B-v4</a></li>
<li><a href="https://huggingface.co/nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B">nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B">nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B">nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/mistral-nemo-wissenschaft-12B">nbeerbower/mistral-nemo-wissenschaft-12B</a></li>
<li><a href="https://huggingface.co/NeverSleepHistorical/lumi-nemo-e2.0">NeverSleepHistorical/lumi-nemo-e2.0</a></li>
<li><a href="https://huggingface.co/NeverSleep/Lumimaid-v0.2-12B">NeverSleep/Lumimaid-v0.2-12B</a></li>
<li><a href="https://huggingface.co/nothingiisreal/Celeste-12B-V1.6">nothingiisreal/Celeste-12B-V1.6</a></li>
<li><a href="https://huggingface.co/nothingiisreal/MN-12B-Celeste-V1.9">nothingiisreal/MN-12B-Celeste-V1.9</a></li>
<li><a href="https://huggingface.co/PocketDoc/Dans-DangerousWinds-V1.1.0-12b">PocketDoc/Dans-DangerousWinds-V1.1.0-12b</a></li>
<li><a href="https://huggingface.co/ReadyArt/Dark-Nexus-12B-v2.0">ReadyArt/Dark-Nexus-12B-v2.0</a></li>
<li><a href="https://huggingface.co/ReadyArt/Forgotten-Safeword-12B-v4.0">ReadyArt/Forgotten-Safeword-12B-v4.0</a></li>
<li><a href="https://huggingface.co/ReadyArt/Omega-Darker_The-Final-Directive-12B">ReadyArt/Omega-Darker_The-Final-Directive-12B</a></li>
<li><a href="https://huggingface.co/romaingrx/red-teamer-mistral-nemo">romaingrx/red-teamer-mistral-nemo</a></li>
<li><a href="https://huggingface.co/Sao10K/MN-12B-Lyra-v1">Sao10K/MN-12B-Lyra-v1</a></li>
<li><a href="https://huggingface.co/Sao10K/MN-12B-Lyra-v4">Sao10K/MN-12B-Lyra-v4</a></li>
<li><a href="https://huggingface.co/shisa-ai/shisa-v2-mistral-nemo-12b">shisa-ai/shisa-v2-mistral-nemo-12b</a></li>
<li><a href="https://huggingface.co/SicariusSicariiStuff/Impish_Bloodmoon_12B">SicariusSicariiStuff/Impish_Bloodmoon_12B</a></li>
<li><a href="https://huggingface.co/sleepdeprived3/Christian-Bible-Expert-v2.0-12B">sleepdeprived3/Christian-Bible-Expert-v2.0-12B</a></li>
<li><a href="https://huggingface.co/SuperbEmphasis/MN-12b-RP-Ink-RP-Longform">SuperbEmphasis/MN-12b-RP-Ink-RP-Longform</a></li>
<li><a href="https://huggingface.co/SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2">SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2</a></li>
<li><a href="https://huggingface.co/TheDrummer/Rivermind-12B-v1">TheDrummer/Rivermind-12B-v1</a></li>
<li><a href="https://huggingface.co/TheDrummer/Rocinante-12B-v1">TheDrummer/Rocinante-12B-v1</a></li>
<li><a href="https://huggingface.co/TheDrummer/Rocinante-X-12B-v1">TheDrummer/Rocinante-X-12B-v1</a></li>
<li><a href="https://huggingface.co/Trappu/Nemo-Picaro-12B">Trappu/Nemo-Picaro-12B</a></li>
<li><a href="https://huggingface.co/Undi95/LocalC-12B-e2.0">Undi95/LocalC-12B-e2.0</a></li>
<li><a href="https://huggingface.co/VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct">VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct</a></li>
<li><a href="https://huggingface.co/Vortex5/Astral-Noctra-12B">Vortex5/Astral-Noctra-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Azure-Starlight-12B">Vortex5/Azure-Starlight-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Crimson-Constellation-12B">Vortex5/Crimson-Constellation-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Red-Synthesis-12B">Vortex5/Red-Synthesis-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Shining-Seraph-12B">Vortex5/Shining-Seraph-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Starlit-Shadow-12B">Vortex5/Starlit-Shadow-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Vermilion-Sage-12B">Vortex5/Vermilion-Sage-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Scarlet-Seraph-12B">Vortex5/Scarlet-Seraph-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Maroon-Sunset-12B">Vortex5/Maroon-Sunset-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Amber-Starlight-12B">Vortex5/Amber-Starlight-12B</a></li>
</ul>
</div>
</details>
</div>
<div class="section-container">
<div class="section-header">
<div class="section-indicator"></div>
<h2 class="section-title">Merge Pipeline & Configuration</h2>
</div>
<div class="section-content">
<p><b>🧬 Ancient Awakening 12B</b> unites several methods and 70 models into one:</p>
<ol>
<li><a href="https://huggingface.co/EldritchLabs/Kraken-Karcher-12B-v1">🦑 Kraken Karcher v1</a>: Combines 53 <a href="https://huggingface.co/models?other=base_model:finetune:mistralai/Mistral-Nemo-Instruct-2407">Mistral Nemo finetunes</a> via the <code>karcher</code> method at 500 iterations</li>
<li><a href="https://huggingface.co/Naphula/Riemannian-Redshift-12B-v1">🌌 Riemannian Redshift v1</a>: Combines 10 <a href="https://huggingface.co/Vortex5">Vortex5</a> merges (which contain custom methods like <code>saef</code>, <code>smi_oni</code>, and <code>hpq</code>) via the <code>karcher</code> method at 1000 iterations</li>
<li>RedKFlux: <code>flux</code> merge of Kraken with Redshift at 1000 iterations</li>
<li>RedKFluxMell: <code>arcee_fusion</code> merge of #3 with <a href="https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1">Mag-Mell</a></li>
<li>BloodKraken: <code>arcee_fusion</code> merge of #4 with <a href="https://huggingface.co/SicariusSicariiStuff/Impish_Bloodmoon_12B">Impish Bloodmoon</a></li>
<li><a href="https://huggingface.co/Naphula-Archives/F5-stage6-12B">F5-stage6</a>: <code>arcee_fusion</code> merge of #5 with <a href="https://huggingface.co/LatitudeGames/Muse-12B">Muse</a></li>
<li><a href="https://huggingface.co/Naphula-Archives/F5-stage7-12B">F5-stage7</a>: <code>ramplus_tl</code> merge of #6 with #3</li>
<li><a href="https://huggingface.co/Naphula/Ancient-Awakening-12B">🧬 Ancient Awakening 12B</a>: <code>pdq</code> merge of #7 with #6, #3, #2, #1, Mag-Mell, Impish-Bloodmoon, and Muse</li>
<li><code>mpoa</code> <a href="https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration">ablation</a> applied to remove censorship <a href="https://huggingface.co/Naphula/Ancient-Awakening-12B-MPOA">(released separately)</a></li>
</ol>
<p><b>Note:</b> If you encounter any issues with this model, try the F5-stage6 or F5-stage7 merges instead, as they are likely more stable.</p>
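For reference, a single fusion stage such as step 4 can be expressed as a short mergekit config. This is a hypothetical sketch (paths and dtype are placeholders; confirm your mergekit version supports <code>arcee_fusion</code>):

```yaml
# Hypothetical config for one arcee_fusion stage (step 4 above):
# fuses the flux result with Mag-Mell. Paths are placeholders.
merge_method: arcee_fusion
base_model: ./RedKFlux-12B
models:
  - model: inflatebot/MN-12B-Mag-Mell-R1
dtype: bfloat16
```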
<hr>
<h3 class="subheading">Stage 1: 🦑 Kraken Karcher</h3>
<pre><code>base_model: B:/12B/models--mistralai--Mistral-Nemo-Instruct-2407
models:
- model: B:/12B/models--aixonlab--Aether-12b
- model: B:/12B/models--aixonlab--Zinakha-12b
- model: B:/12B/models--allura-org--Bigger-Body-12b
- model: B:/12B/models--allura-org--MN-12b-RP-Ink
- model: B:/12B/models--allura-org--remnant-mn-12b
- model: B:/12B/models--anthracite-org--magnum-v4-12b
- model: B:/12B/models--ArliAI--Mistral-Nemo-12B-ArliAI-RPMax-v1.2
- model: B:/12B/models--Babsie--Opulus-12B-v3
- model: B:/12B/models--BeaverAI--mistral-doryV2-12b
- model: B:/12B/models--crestf411--nemo-sunfall-v0.6.1
- model: B:/12B/models--EpistemeAI2--Fireball-Mistral-Nemo-12B-Philos
- model: B:/12B/models--EpistemeAI--Mistral-Nemo-Instruct-12B-Philosophy-Math
- model: B:/12B/models--Fizzarolli--MN-12b-Rosier-v1
- model: B:/12B/models--HumanLLMs--Human-Like-Mistral-Nemo-Instruct-2407
- model: B:/12B/models--IIEleven11--Kalypso
- model: B:/12B/models--intervitens--mini-magnum-12b-v1.1
- model: B:/12B/models--jtatman--mistral_nemo_12b_reasoning_psychology_lora
- model: B:/12B/models--KOOWEEYUS--BlackSheep-RP-12B
- model: B:/12B/models--Lambent--Arsenic-Shahrazad-12B-v2
- model: B:/12B/models--Lambent--Arsenic-Shahrazad-12B-v3
- model: B:/12B/models--Lambent--arsenic-nemo-unleashed-12B
- model: B:/12B/models--Lambent--Gilded-Arsenic-12B
- model: B:/12B/models--mistralai--Mistral-Nemo-Instruct-2407
- model: B:/12B/models--nbeerbower--Lyra-Gutenberg-mistral-nemo-12B
- model: B:/12B/models--nbeerbower--Lyra4-Gutenberg-12B
- model: B:/12B/models--nbeerbower--mistral-nemo-bophades-12B
- model: B:/12B/models--nbeerbower--mistral-nemo-gutenberg-12B-v3
- model: B:/12B/models--nbeerbower--mistral-nemo-gutenberg-12B-v4
- model: B:/12B/models--nbeerbower--Mistral-Nemo-Gutenberg-Doppel-12B
- model: B:/12B/models--nbeerbower--Mistral-Nemo-Gutenberg-Encore-12B
- model: B:/12B/models--nbeerbower--Mistral-Nemo-Gutenberg-Vitus-12B
- model: B:/12B/models--nbeerbower--mistral-nemo-wissenschaft-12B
- model: B:/12B/models--NeverSleepHistorical--lumi-nemo-e2.0
- model: B:/12B/models--NeverSleep--Lumimaid-v0.2-12B
- model: B:/12B/models--nothingiisreal--Celeste-12B-V1.6
- model: B:/12B/models--nothingiisreal--MN-12B-Celeste-V1.9
- model: B:/12B/models--PocketDoc--Dans-DangerousWinds-V1.1.0-12b
- model: B:/12B/models--ReadyArt--Dark-Nexus-12B-v2.0
- model: B:/12B/models--ReadyArt--Forgotten-Safeword-12B-v4.0
- model: B:/12B/models--ReadyArt--Omega-Darker_The-Final-Directive-12B
- model: B:/12B/models--romaingrx--red-teamer-mistral-nemo
- model: B:/12B/models--Sao10K--MN-12B-Lyra-v1
- model: B:/12B/models--Sao10K--MN-12B-Lyra-v4
- model: B:/12B/models--shisa-ai--shisa-v2-mistral-nemo-12b
- model: B:/12B/models--sleepdeprived3--Christian-Bible-Expert-v2.0-12B
- model: B:/12B/models--SuperbEmphasis--MN-12b-RP-Ink-RP-Longform
- model: B:/12B/models--SuperbEmphasis--Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2
- model: B:/12B/models--TheDrummer--Rivermind-12B-v1
- model: B:/12B/models--TheDrummer--Rocinante-12B-v1
- model: B:/12B/models--TheDrummer--Rocinante-X-12B-v1
- model: B:/12B/models--Trappu--Nemo-Picaro-12B
- model: B:/12B/models--Undi95--LocalC-12B-e2.0
- model: B:/12B/models--VAGOsolutions--SauerkrautLM-Nemo-12b-Instruct
merge_method: karcher
parameters:
max_iter: 500
tol: 1.0e-9
dtype: float32
out_dtype: bfloat16
tokenizer:
source: union
chat_template: auto
name: 🦑 Kraken-Karcher-12B-v1</code></pre>
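The <code>karcher</code> merge method computes a Riemannian (Karcher/Fréchet) barycenter of the model weights rather than a plain arithmetic average, with <code>max_iter</code> and <code>tol</code> bounding the fixed-point iteration. As a rough illustration only (a minimal sketch on the unit hypersphere, not mergekit's actual implementation; the function name and per-tensor normalization are assumptions):

```python
import numpy as np

def karcher_mean_sphere(vectors, max_iter=500, tol=1e-9):
    """Iterative Karcher (Frechet) mean of unit vectors on the hypersphere.

    Hypothetical sketch of the idea behind a `karcher` merge: average the
    tangent-space log maps at the current estimate, take the exponential
    map back to the sphere, and stop when the update norm drops below tol.
    """
    xs = [v / np.linalg.norm(v) for v in vectors]
    mu = xs[0].copy()
    for _ in range(max_iter):
        tangents = []
        for x in xs:
            # Log map of x into the tangent space at mu
            cos_t = np.clip(mu @ x, -1.0, 1.0)
            theta = np.arccos(cos_t)
            if theta < 1e-12:
                tangents.append(np.zeros_like(mu))
                continue
            tangents.append((x - cos_t * mu) * (theta / np.sin(theta)))
        step = np.mean(tangents, axis=0)
        norm = np.linalg.norm(step)
        if norm < tol:
            break
        # Exponential map: move along the geodesic in the step direction
        mu = np.cos(norm) * mu + np.sin(norm) * (step / norm)
        mu /= np.linalg.norm(mu)
    return mu
```

For two orthogonal unit vectors, the iteration converges to the normalized bisector, which is why this behaves like a geometry-aware average rather than a naive mean.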
<h3 class="subheading">Stage 2: 🌌 Riemannian Redshift</h3>
<pre><code>models:
- model: B:/12B/models--Vortex5--Astral-Noctra-12B
- model: B:/12B/models--Vortex5--Azure-Starlight-12B
- model: B:/12B/models--Vortex5--Crimson-Constellation-12B
- model: B:/12B/models--Vortex5--Red-Synthesis-12B
- model: B:/12B/models--Vortex5--Shining-Seraph-12B
- model: B:/12B/models--Vortex5--Starlit-Shadow-12B
- model: B:/12B/models--Vortex5--Vermilion-Sage-12B
- model: B:/12B/models--Vortex5--Scarlet-Seraph-12B
- model: B:/12B/models--Vortex5--Maroon-Sunset-12B
- model: B:/12B/models--Vortex5--Amber-Starlight-12B
merge_method: karcher
parameters:
max_iter: 1000
tol: 1.0e-9
dtype: float32
out_dtype: bfloat16
tokenizer:
source: union
chat_template: auto
name: 🌌 Riemannian-Redshift-12B-v1</code></pre>
<h3 class="subheading">Stage 3: RedKFlux</h3>
<pre><code>models:
- model: C:\mergekit-main\merged_model_redshift
- model: C:\mergekit-main\merged_model_kraken_karcher
merge_method: flux
parameters:
eta: 1.2
tol: 1.0e-9
max_iter: 1000
kappa: 0.8
dtype: float32
out_dtype: bfloat16
tokenizer:
source: union
chat_template: auto
name: RedKFlux</code></pre>
<h3 class="subheading">Stage 4: RedKFluxMell</h3>
<pre><code>models:
- model: C:\mergekit-main\merged_model_RedKFlux
- model: B:\8B\models--inflatebot--MN-12B-Mag-Mell-R1
merge_method: arcee_fusion
tukey_fence: 1.5
base_model: C:\mergekit-main\merged_model_RedKFlux
dtype: float32
out_dtype: bfloat16
tokenizer:
source: base
name: RedKFluxMell</code></pre>
<h3 class="subheading">Stage 5: BloodKraken</h3>
<pre><code>models:
- model: C:\mergekit-main\merged_model_RedKFluxMell
- model: B:\8B\models--SicariusSicariiStuff--Impish_Bloodmoon_12B
merge_method: arcee_fusion
tukey_fence: 1.5
base_model: C:\mergekit-main\merged_model_RedKFluxMell
dtype: float32
out_dtype: bfloat16
tokenizer:
source: base
name: BloodKraken</code></pre>
<h3 class="subheading">Stage 6: BloodKrakenMuse</h3>
<pre><code>models:
- model: C:\mergekit-main\merged_model_BloodKraken
- model: B:\8B\models--LatitudeGames--Muse-12B
merge_method: arcee_fusion
tukey_fence: 1.5
base_model: C:\mergekit-main\merged_model_BloodKraken
dtype: float32
out_dtype: bfloat16
tokenizer:
source: base
name: BloodKrakenMuse</code></pre>
<h3 class="subheading">Stage 7: Ramplus_tl</h3>
<pre><code>merge_method: ramplus_tl
base_model: C:\mergekit-main\merged_model_BloodKrakenMuse
models:
- model: C:\mergekit-main\merged_model_BloodKrakenMuse
- model: C:\mergekit-main\merged_model_RedKFlux
parameters:
epsilon: 0.001 # Increased from 1e-5 to 1e-3 for denser SFT/DPO task vectors
r: 0.25 # Increased from 0.1 to 0.25 for better SFT behavior preservation
alpha: 0.4 # Increased from 0.2 to 0.4 for enhanced rescaling
dtype: float32
out_dtype: bfloat16
tokenizer:
source: base
name: Stage7</code></pre>
<h3 class="subheading">Stage 8: 🧬 Ancient Awakening</h3>
<pre><code>merge_method: pdq
pdq_base_yaml: C:\mergekit-main\stage7.yaml
pdq_base_model: C:\mergekit-main\merged_model_stage7
output_dir: C:\mergekit-main\stage8_pdq
base_model: C:\mergekit-main\merged_model_BloodKrakenMuse
models:
- model: C:\mergekit-main\merged_model_BloodKrakenMuse
- model: B:\12B\models--LatitudeGames--Muse-12B
- model: B:\12B\models--SicariusSicariiStuff--Impish_Bloodmoon_12B
- model: B:\12B\models--inflatebot--MN-12B-Mag-Mell-R1
- model: C:\mergekit-main\merged_model_RedKFlux
- model: C:\mergekit-main\merged_model_redshift
- model: C:\mergekit-main\merged_model_kraken_karcher
parameters:
chi: 0.15
iota: 0.1
nu: 24
gamma: 1.0
zeta: 16
sigma: 0.5
density: 0.9
epsilon: 0.099
lambda: 1.0
lazy_unpickle: True
random_seed: 420
name: 🧬 Ancient-Awakening-12B</code></pre>
<h3 class="subheading">Stage 9: Magnitude-Preserving Orthogonalized Ablation</h3>
<pre><code># python measure.py -m C:\mergekit-main\f8_pdq -o C:\mergekit-main\f8_pdq\ablit_proj --batch-size 8 --projected
# python analyze_old.py C:\mergekit-main\f8_pdq\ablit_proj -c
# sharded_ablate.py magmell.yml --normpreserve --projected
#
# The model to be ablated.
model: C:\mergekit-main\f8_pdq
#
# The measurement file generated by measure.py for the model.
measurements: C:\mergekit-main\f8_pdq\ablit_proj
#
# The directory where the new, ablated model will be saved.
output: C:\mergekit-main\f8_pdq\ablit_biproj\
#
# The list of ablation operations to perform.
# Strategy: Use the single best refusal direction from the peak signal layer (29)
# and apply it uniformly across all layers (0-39).
ablate:
# Apply the layer-29 refusal direction to every layer.
- layer: 0
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 1
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 2
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 3
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 4
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 5
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 6
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 7
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 8
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 9
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 10
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 11
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 12
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 13
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 14
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 15
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 16
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 17
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 18
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 19
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 20
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 21
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 22
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 23
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 24
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 25
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 26
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 27
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 28
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 29
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 30
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 31
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 32
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 33
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 34
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 35
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 36
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 37
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 38
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 39
measurement: 29
scale: 1.2
sparsity: 0.00</code></pre>
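Each entry above subtracts the same measured direction (taken from layer 29) out of that layer's weights. As a rough sketch of what norm-preserving directional ablation does (hypothetical code, not the actual <code>sharded_ablate.py</code>; the row-wise norm restoration mirrors the "magnitude-preserving" idea, and the function name is an assumption):

```python
import numpy as np

def ablate_direction(W, v, scale=1.2, norm_preserve=True):
    """Remove a 'refusal' direction v from weight matrix W's output space.

    Sketch of directional (orthogonalized) ablation: subtract
    scale * v v^T W, then optionally restore each row's original norm
    (the magnitude-preserving variant).
    """
    v = v / np.linalg.norm(v)
    orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Project W's columns onto v and subtract the scaled projection
    W_abl = W - scale * np.outer(v, v) @ W
    if norm_preserve:
        new_norms = np.linalg.norm(W_abl, axis=1, keepdims=True)
        W_abl = W_abl * (orig_norms / np.maximum(new_norms, 1e-12))
    return W_abl
```

With <code>scale=1.0</code> and no norm restoration, the resulting matrix is exactly orthogonal to <code>v</code>; the <code>scale: 1.2</code> used above over-rotates slightly past orthogonality, and the norm-preserving step keeps per-row magnitudes (and thus activation scales) close to the original model's.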
</div>
</div>
</body>
</html>
Author: Naphula
Likes: 3
Downloads: 0
Tags: transformers, safetensors, mistral, text-generation, creative, creative writing, fiction writing, plot generation, sub-plot generation, story generation, scene continue, storytelling, fiction story, science fiction, romance, all genres, story, writing, vivid prosing, vivid writing, fiction, roleplaying, float32, swearing, rp, horror, nemo, merge, mergekit, karcher, flux, arcee_fusion, ramplus_tl, pdq, conversational, en, arxiv:2601.13572, base_model:ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2, base_model:merge:ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2, base_model:Babsie/Opulus-12B-v3, base_model:merge:Babsie/Opulus-12B-v3, base_model:BeaverAI/mistral-doryV2-12b, base_model:merge:BeaverAI/mistral-doryV2-12b, base_model:EldritchLabs/Kraken-Karcher-12B-v1, base_model:merge:EldritchLabs/Kraken-Karcher-12B-v1, base_model:EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math, base_model:merge:EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math, base_model:EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos, base_model:merge:EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos, base_model:Fizzarolli/MN-12b-Rosier-v1, base_model:merge:Fizzarolli/MN-12b-Rosier-v1, base_model:HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407, base_model:merge:HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407, base_model:IIEleven11/Kalypso, base_model:merge:IIEleven11/Kalypso, base_model:KOOWEEYUS/BlackSheep-RP-12B, base_model:merge:KOOWEEYUS/BlackSheep-RP-12B, base_model:Lambent/Arsenic-Shahrazad-12B-v2, base_model:merge:Lambent/Arsenic-Shahrazad-12B-v2, base_model:Lambent/Arsenic-Shahrazad-12B-v3, base_model:merge:Lambent/Arsenic-Shahrazad-12B-v3, base_model:Lambent/Gilded-Arsenic-12B, base_model:merge:Lambent/Gilded-Arsenic-12B, base_model:Lambent/arsenic-nemo-unleashed-12B, base_model:merge:Lambent/arsenic-nemo-unleashed-12B, base_model:LatitudeGames/Muse-12B, base_model:merge:LatitudeGames/Muse-12B, base_model:Naphula-Archives/F5-stage6-12B, base_model:merge:Naphula-Archives/F5-stage6-12B, 
base_model:Naphula-Archives/F5-stage7-12B, base_model:merge:Naphula-Archives/F5-stage7-12B, base_model:Naphula/Riemannian-Redshift-12B-v1, base_model:merge:Naphula/Riemannian-Redshift-12B-v1, base_model:NeverSleep/Lumimaid-v0.2-12B, base_model:merge:NeverSleep/Lumimaid-v0.2-12B, base_model:NeverSleepHistorical/lumi-nemo-e2.0, base_model:merge:NeverSleepHistorical/lumi-nemo-e2.0, base_model:PocketDoc/Dans-DangerousWinds-V1.1.0-12b, base_model:merge:PocketDoc/Dans-DangerousWinds-V1.1.0-12b, base_model:ReadyArt/Dark-Nexus-12B-v2.0, base_model:merge:ReadyArt/Dark-Nexus-12B-v2.0, base_model:ReadyArt/Forgotten-Safeword-12B-v4.0, base_model:merge:ReadyArt/Forgotten-Safeword-12B-v4.0, base_model:ReadyArt/Omega-Darker_The-Final-Directive-12B, base_model:merge:ReadyArt/Omega-Darker_The-Final-Directive-12B, base_model:Sao10K/MN-12B-Lyra-v1, base_model:merge:Sao10K/MN-12B-Lyra-v1, base_model:Sao10K/MN-12B-Lyra-v4, base_model:merge:Sao10K/MN-12B-Lyra-v4, base_model:SicariusSicariiStuff/Impish_Bloodmoon_12B, base_model:merge:SicariusSicariiStuff/Impish_Bloodmoon_12B, base_model:SuperbEmphasis/MN-12b-RP-Ink-RP-Longform, base_model:merge:SuperbEmphasis/MN-12b-RP-Ink-RP-Longform, base_model:SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2, base_model:merge:SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2, base_model:TheDrummer/Rivermind-12B-v1, base_model:merge:TheDrummer/Rivermind-12B-v1, base_model:TheDrummer/Rocinante-12B-v1, base_model:merge:TheDrummer/Rocinante-12B-v1, base_model:TheDrummer/Rocinante-X-12B-v1, base_model:merge:TheDrummer/Rocinante-X-12B-v1, base_model:Trappu/Nemo-Picaro-12B, base_model:merge:Trappu/Nemo-Picaro-12B, base_model:Undi95/LocalC-12B-e2.0, base_model:merge:Undi95/LocalC-12B-e2.0, base_model:VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct, base_model:merge:VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct, base_model:Vortex5/Amber-Starlight-12B, base_model:merge:Vortex5/Amber-Starlight-12B, 
base_model:Vortex5/Astral-Noctra-12B, base_model:merge:Vortex5/Astral-Noctra-12B, base_model:Vortex5/Azure-Starlight-12B, base_model:merge:Vortex5/Azure-Starlight-12B, base_model:Vortex5/Crimson-Constellation-12B, base_model:merge:Vortex5/Crimson-Constellation-12B, base_model:Vortex5/Maroon-Sunset-12B, base_model:merge:Vortex5/Maroon-Sunset-12B, base_model:Vortex5/Red-Synthesis-12B, base_model:merge:Vortex5/Red-Synthesis-12B, base_model:Vortex5/Scarlet-Seraph-12B, base_model:merge:Vortex5/Scarlet-Seraph-12B, base_model:Vortex5/Shining-Seraph-12B, base_model:merge:Vortex5/Shining-Seraph-12B, base_model:Vortex5/Starlit-Shadow-12B, base_model:merge:Vortex5/Starlit-Shadow-12B, base_model:Vortex5/Vermilion-Sage-12B, base_model:merge:Vortex5/Vermilion-Sage-12B, base_model:aixonlab/Aether-12b, base_model:merge:aixonlab/Aether-12b, base_model:aixonlab/Zinakha-12b, base_model:merge:aixonlab/Zinakha-12b, base_model:allura-org/Bigger-Body-12b, base_model:merge:allura-org/Bigger-Body-12b, base_model:allura-org/MN-12b-RP-Ink, base_model:merge:allura-org/MN-12b-RP-Ink, base_model:allura-org/remnant-mn-12b, base_model:merge:allura-org/remnant-mn-12b, base_model:anthracite-org/magnum-v4-12b, base_model:merge:anthracite-org/magnum-v4-12b, base_model:crestf411/nemo-sunfall-v0.6.1, base_model:merge:crestf411/nemo-sunfall-v0.6.1, base_model:inflatebot/MN-12B-Mag-Mell-R1, base_model:merge:inflatebot/MN-12B-Mag-Mell-R1, base_model:intervitens/mini-magnum-12b-v1.1, base_model:merge:intervitens/mini-magnum-12b-v1.1, base_model:jtatman/mistral_nemo_12b_reasoning_psychology_lora, base_model:merge:jtatman/mistral_nemo_12b_reasoning_psychology_lora, base_model:mistralai/Mistral-Nemo-Instruct-2407, base_model:merge:mistralai/Mistral-Nemo-Instruct-2407, base_model:nbeerbower/Lyra-Gutenberg-mistral-nemo-12B, base_model:merge:nbeerbower/Lyra-Gutenberg-mistral-nemo-12B, base_model:nbeerbower/Lyra4-Gutenberg-12B, base_model:merge:nbeerbower/Lyra4-Gutenberg-12B, 
base_model:nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B, base_model:merge:nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B, base_model:nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B, base_model:merge:nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B, base_model:nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B, base_model:merge:nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B, base_model:nbeerbower/mistral-nemo-bophades-12B, base_model:merge:nbeerbower/mistral-nemo-bophades-12B, base_model:nbeerbower/mistral-nemo-gutenberg-12B-v3, base_model:merge:nbeerbower/mistral-nemo-gutenberg-12B-v3, base_model:nbeerbower/mistral-nemo-gutenberg-12B-v4, base_model:merge:nbeerbower/mistral-nemo-gutenberg-12B-v4, base_model:nbeerbower/mistral-nemo-wissenschaft-12B, base_model:merge:nbeerbower/mistral-nemo-wissenschaft-12B, base_model:nothingiisreal/Celeste-12B-V1.6, base_model:merge:nothingiisreal/Celeste-12B-V1.6, base_model:nothingiisreal/MN-12B-Celeste-V1.9, base_model:merge:nothingiisreal/MN-12B-Celeste-V1.9, base_model:romaingrx/red-teamer-mistral-nemo, base_model:merge:romaingrx/red-teamer-mistral-nemo, base_model:shisa-ai/shisa-v2-mistral-nemo-12b, base_model:merge:shisa-ai/shisa-v2-mistral-nemo-12b, base_model:sleepdeprived3/Christian-Bible-Expert-v2.0-12B, base_model:merge:sleepdeprived3/Christian-Bible-Expert-v2.0-12B, license:apache-2.0, text-generation-inference, endpoints_compatible, region:us
armand0e/Omnicoder-9B-Opus-Distill-GGUF
Author: armand0e
Likes: 2
Downloads: 0
Tags: gguf, endpoints_compatible, region:us, conversational
Sengil/turkish-gemma-9b-finance-sft
language:
- tr
license: apache-2.0
base_model: ytu-ce-cosmos/Turkish-Gemma-9b-T1
tags:
- finance
- llm
- instruction-tuning
- sft
- trl
- transformers
pipeline_tag: text-generation
library_name: transformers
Sengil/turkish-gemma-9b-finance-sft
<p align="center">
<img src="./Gemini_Generated_Image_1i1esm1i1esm1i1e.png" alt="Model Banner" width="900"/>
</p>
Model Overview
Sengil/turkish-gemma-9b-finance-sft is a Turkish financial instruction-tuned large language model developed for finance-related natural language understanding and generation tasks in Turkish.
The model is based on ytu-ce-cosmos/Turkish-Gemma-9b-T1 and further fine-tuned using Supervised Fine-Tuning (SFT) on a finance-focused instruction dataset.
Base Model
- Base model:
ytu-ce-cosmos/Turkish-Gemma-9b-T1
- Fine-tuning method: Supervised Fine-Tuning (SFT)
Intended Use
This model is designed for Turkish-language finance-related NLP applications, including:
- Financial question answering
- Finance-oriented instruction following
- General Turkish financial text generation
- Educational and research use in Turkish finance NLP
Training Dataset
The model was fine-tuned on the following dataset:
- Dataset:
AlicanKiraz0/Turkish-Finance-SFT-Dataset
Dbmaxwell/turkish-finance-instruction-dataset
RsGoksel/Finansal
Using the Gemini API, I transformed these plain datasets into an instruction-tuning-ready synthetic dataset by generating structured instruction-response examples.
This made the data better suited for training LLMs to follow user prompts, formats, and task-specific guidance.
I will share the dataset soon.
Key Training Hyperparameters
- Max steps: 500
- Approx. epochs: ~3.7
- Learning rate: 1e-4
- Per-device train batch size: 4
- Gradient accumulation steps: 4
- Effective batch size: 16
- Optimizer: AdamW 8-bit
- Weight decay: 0.01
- LR scheduler: Cosine
- Warmup steps: 20
- Max grad norm: 1.0
- Precision: bfloat16
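The effective batch size follows from the per-device batch size times the gradient-accumulation steps. A quick sanity check of the numbers above (the implied dataset size is an inference from the stated step and epoch counts, not something the card states):

```python
# Hyperparameters as listed in the card
per_device_batch = 4
grad_accum_steps = 4
max_steps = 500
approx_epochs = 3.7

# Gradients are accumulated over 4 micro-batches before each optimizer step
effective_batch = per_device_batch * grad_accum_steps   # 16, matching the card
samples_seen = max_steps * effective_batch              # 8000 training samples total
# Inferred (not stated): dataset size consistent with ~3.7 epochs
approx_dataset_size = samples_seen / approx_epochs      # roughly 2160 examples
```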
Training Procedure
The model was trained with supervised fine-tuning on Turkish finance-domain examples.
The objective was to improve domain adaptation and instruction-following ability in financial contexts while preserving the Turkish language capabilities of the base model.
Example Use
!pip install unsloth
Inference code
from unsloth import FastLanguageModel
import torch

repo_id = "Sengil/turkish-gemma-9b-finance-sft"
max_seq_length = 2048

SYSTEM_PROMPT = "Sen, kripto para ve borsa konularında uzmanlaşmış, hem Türkiye hem de global finansal piyasalara hakim bir finans asistanısın. Sana sorulan sorulara kullanıcının istediği şekilde cevap veriyorsun."

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=repo_id,
    max_seq_length=max_seq_length,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

def ask_finance_model(question, max_new_tokens=1024):
    prompt = (
        f"<start_of_turn>system\n{SYSTEM_PROMPT}<end_of_turn>\n"
        f"<start_of_turn>user\n{question.strip()}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )
    inputs = tokenizer([prompt], return_tensors="pt", truncation=True, max_length=max_seq_length).to("cuda")
    with torch.inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            use_cache=True,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.eos_token_id,
        )
    text = tokenizer.decode(outputs[0], skip_special_tokens=False)
    return text.split("<start_of_turn>model\n", 1)[-1].split("<end_of_turn>", 1)[0].strip()

response = ask_finance_model(
    "Cari açık veren bir ülkede yerel para birimi neden baskı altında kalabilir? Kısaca açıklar mısın?",
    max_new_tokens=1024,
)
print(response)
Streaming inference code
from unsloth import FastLanguageModel
from transformers import TextStreamer
import torch

repo_id = "Sengil/turkish-gemma-9b-finance-sft"
max_seq_length = 2048
dtype = torch.bfloat16
load_in_4bit = True

SHORT_SYSTEM_PROMPT = (
    "Sen, kripto para ve borsa konularında uzmanlaşmış, hem Türkiye hem de global finansal piyasalara hakim bir finans asistanısın. Sana sorulan sorulara kullanıcının istediği şekilde cevap veriyorsun."
)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=repo_id,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)

def ask_finance_model(
    question,
    max_new_tokens=1024,
    do_sample=False,
    stream=True,
    system_prompt=SHORT_SYSTEM_PROMPT,
):
    prompt = (
        f"<start_of_turn>system\n{system_prompt}<end_of_turn>\n"
        f"<start_of_turn>user\n{question.strip()}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )
    inputs = tokenizer(
        [prompt],
        return_tensors="pt",
        truncation=True,
        max_length=max_seq_length,
    ).to("cuda")
    generation_kwargs = dict(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=do_sample,
        use_cache=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )
    if stream:
        streamer = TextStreamer(
            tokenizer,
            skip_prompt=True,
            skip_special_tokens=False,
        )
        generation_kwargs["streamer"] = streamer
    with torch.inference_mode():
        outputs = model.generate(**generation_kwargs)
    full_output = tokenizer.batch_decode(outputs, skip_special_tokens=False)[0]
    answer = full_output.split("<start_of_turn>model\n", 1)[-1]
    answer = answer.split("<end_of_turn>", 1)[0].strip()
    return answer

response = ask_finance_model(
    """Bir hisse senedinde uzun süredir devam eden yatay konsolidasyon bölgesi yukarı yönlü hacimli bir kırılımla aşıldıktan sonra, fiyatın tekrar bu banda doğru sarkmasına rağmen volatilitenin belirgin biçimde düşmesi, trendin devamı açısından nasıl yorumlanmalıdır?""",
    max_new_tokens=4096,
    do_sample=False,
    stream=True,
)
print("\n\n====== FINAL ANSWER ======\n")
print(response)
output>>
====== FINAL ANSWER ======
<think>
Bu soru, teknik analiz bağlamında bir hisse senedi fiyat hareketini yorumlamaya yönelik. Soru şu: Uzun süreli yatay konsolidasyon (sideways consolidation) bölgesi yukarı yönlü hacimli bir kırılımla aşıldıktan sonra, fiyatın tekrar bu banda doğru sarkmasına rağmen volatilitenin belirgin biçimde düşmesi, trendin devamı açısından nasıl yorumlanmalıdır?
Öncelikle, yatay konsolidasyon: Fiyatın belirli bir aralıkta (destek ve direnç seviyeleri arasında) uzun süre hareket etmesi, genellikle bir sonraki büyük hareketin (trend) başlangıcına işaret eder. Hacimli kırılım: Kırılımın hacimli olması, trendin gücünü gösterir. Yukarı yönlü hacimli kırılım, yükseliş trendinin başladığına dair güçlü bir sinyaldir.
Ancak, fiyatın tekrar konsolidasyon bandına doğru sarkması (retest) ve volatilitenin düşmesi, trendin devamı açısından nasıl yorumlanır?
Volatilite düşüşü, piyasanın daha az hareketli, daha az belirsiz olduğu anlamına gelir. Bu, trendin güçlendiğine işaret edebilir mi? Genellikle, bir trendin güçlenmesi için volatilite düşebilir, çünkü fiyat trend çizgisine yakın kalır ve aşırı dalgalanmalar azalır. Ancak, volatilite düşüşü aynı zamanda trendin zayıfladığına da işaret edebilir; çünkü fiyat trend çizgisinden uzaklaşmaz, ancak hareketler küçülür. Bu durumda, fiyatın konsolidasyon bandına sarkması, trendin devam ettiğini gösteren bir geri çekilme (pullback) olabilir.
Yorumlama:
1. **Kırılımın doğası**: Hacimli kırılım, trendin başlangıcını onaylar. Fiyatın konsolidasyon bandına geri dönmesi, trendin devam ettiğini gösteren bir geri çekilme olabilir. Bu geri çekilme, trend çizgisine yakın bir destek seviyesinde (örneğin, kırılım sonrası oluşan yeni destek) fiyatın toparlanması beklenir.
2. **Volatilite düşüşü**: Volatilite düşüşü, piyasanın daha az tepkisel olduğunu, daha az belirsizlik olduğunu gösterir. Bu, trendin güçlendiğine işaret edebilir. Çünkü trend güçlendikçe, fiyat trend çizgisine yakın kalır ve aşırı dalgalanmalar azalır. Ancak, volatilite düşüşü aynı zamanda trendin zayıfladığına da işaret edebilir; çünkü fiyat trend çizgisinden uzaklaşmaz, ancak hareketler küçülür. Bu durumda, fiyatın konsolidasyon bandına sarkması, trendin devam ettiğini gösteren bir geri çekilme olabilir.
3. **Trend devamı açısından**: Eğer fiyat, konsolidasyon bandına sarktıktan sonra trend çizgisine yakın bir destek seviyesinde toparlanırsa ve volatilite düşüşü devam ederse, trendin devam ettiği söylenebilir. Ancak, eğer fiyat konsolidasyon bandının altına düşerse ve volatilite artarsa, trendin zayıfladığı düşünülebilir.
4. **Hacim analizi**: Kırılımda hacim yüksekti. Geri çekilme sırasında hacim düşükse, bu geri çekilmenin trendi tehdit etmediğini gösterir. Hacim düşükse, alıcılar ve satıcılar arasında bir denge var demektir.
5. **Destek ve direnç seviyeleri**: Konsolidasyon bandı artık bir destek seviyesi olarak işlev görebilir. Fiyat bu seviyeye geri döndüğünde, alıcıların devreye girmesi beklenir.
6. **Volatilite göstergeleri**: ATR (Average True Range) gibi göstergelerle volatilite ölçülebilir. Volatilite düşüşü, trendin güçlendiğine işaret edebilir.
7. **Psikolojik faktörler**: Yatırımcılar, trendin devam ettiğine inanmaya başlar ve daha az işlem yapar, bu da volatiliteyi düşürür.
Sonuç olarak, bu durum trendin devamı açısından olumlu bir sinyal olarak yorumlanabilir. Ancak, fiyatın konsolidasyon bandının altına düşmesi ve volatilitenin artması durumunda trendin zayıfladığı düşünülebilir.
Cevabı yapılandıralım:
- Giriş: Yatay konsolidasyon, hacimli kırılım, geri çekilme ve volatilite düşüşü kavramlarını açıkla.
- Kırılımın doğası: Hacimli kırılım trendin başlangıcını onaylar.
- Geri çekilme: Trend devamı için normal bir geri çekilme olabilir.
- Volatilite düşüşü: Trendin güçlendiğine işaret edebilir, çünkü fiyat trend çizgisine yakın kalır.
- Destek seviyeleri: Konsolidasyon bandı artık destek olarak işlev görür.
- Hacim: Geri çekilme sırasında hacim düşükse, trend devam ediyor demektir.
- Olası senaryolar: Fiyatın konsolidasyon bandının altına düşmesi ve volatilitenin artması trendi tehdit eder.
- Sonuç: Bu durum trendin devamı açısından olumlu yorumlanabilir, ancak dikkatli olunmalı.
Cevabı Türkçe olarak vermeliyim.
</think>
Yatay konsolidasyon, bir hisse senedinin belirli bir fiyat aralığında uzun süre hareket etmesiyle oluşan bir piyasa yapısıdır. Bu aralık, genellikle bir sonraki büyük trendin (yükseliş veya düşüş) başlangıcına işaret eder. Hacimli kırılım ise, fiyatın bu aralığı yukarı veya aşağı yönde, önemli bir hacimle aşmasıdır. Yukarı yönlü hacimli kırılım, yükseliş trendinin başladığına dair güçlü bir sinyaldir.
Ancak, fiyatın konsolidasyon bandına geri dönmesi (retest) ve volatilitenin belirgin biçimde düşmesi, trendin devamı açısından nasıl yorumlanmalıdır? Bu durumu değerlendirmek için birkaç teknik ve psikolojik faktörü göz önünde bulundurmak gerekir.
### 1. Kırılımın Doğası ve Geri Çekilme
- **Hacimli kırılım**, trendin başlangıcını onaylar. Fiyatın konsolidasyon bandına geri dönmesi, trendin devam ettiğini gösteren bir geri çekilme (pullback) olabilir. Bu geri çekilme, trend çizgisine yakın bir destek seviyesinde (örneğin, kırılım sonrası oluşan yeni destek) fiyatın toparlanması beklenir.
- Geri çekilme, trendin gücünü test eder. Eğer fiyat, konsolidasyon bandının altına düşmez ve trend çizgisine yakın bir destekte toparlanırsa, trendin devam ettiği söylenebilir.
### 2. Volatilite Düşüşünün Anlamı
- Volatilite düşüşü, piyasanın daha az hareketli, daha az belirsiz olduğu anlamına gelir. Bu durum, trendin güçlendiğine işaret edebilir. Çünkü trend güçlendikçe, fiyat trend çizgisine yakın kalır ve aşırı dalgalanmalar azalır.
- Ancak, volatilite düşüşü aynı zamanda trendin zayıfladığına da işaret edebilir; çünkü fiyat trend çizgisinden uzaklaşmaz, ancak hareketler küçülür. Bu durumda, fiyatın konsolidasyon bandına sarkması, trendin devam ettiğini gösteren bir geri çekilme olabilir.
### 3. Destek ve Direnç Seviyeleri
- Konsolidasyon bandı artık bir destek seviyesi olarak işlev görebilir. Fiyat bu seviyeye geri döndüğünde, alıcıların devreye girmesi beklenir.
- Eğer fiyat, konsolidasyon bandının altına düşerse ve volatilite artarsa, trendin zayıfladığı düşünülebilir.
### 4. Hacim Analizi
- Kırılımda hacim yüksekti. Geri çekilme sırasında hacim düşükse, bu geri çekilmenin trendi tehdit etmediğini gösterir. Hacim düşükse, alıcılar ve satıcılar arasında bir denge var demektir.
- Hacim düşükse, fiyatın trend çizgisine yakın bir destekte toparlanması daha olasıdır.
### 5. Volatilite Göstergeleri
- ATR (Average True Range) gibi göstergelerle volatilite ölçülebilir. Volatilite düşüşü, trendin güçlendiğine işaret edebilir.
- Volatilite düşüşü, yatırımcıların trendin devam ettiğine inanmaya başladığını ve daha az işlem yaptığını gösterir.
### 6. Olası Senaryolar
- **Senaryo 1: Trend devam ediyor**
Fiyat, konsolidasyon bandına geri döndüğünde trend çizgisine yakın bir destekte toparlanır ve volatilite düşüşü devam eder. Bu durumda, trendin devam ettiği söylenebilir.
- **Senaryo 2: Trend zayıflıyor**
Fiyat, konsolidasyon bandının altına düşer ve volatilite artar. Bu durumda, trendin zayıfladığı veya tersine döndüğü düşünülebilir.
### Sonuç
Bu durum, trendin devamı açısından genellikle olumlu bir sinyal olarak yorumlanır. Ancak, fiyatın konsolidasyon bandının altına düşmesi ve volatilitenin artması durumunda trendin zayıfladığına dair uyarı sinyalleri olarak değerlendirilmelidir. Yatırımcılar, bu geri çekilmeyi bir fırsat olarak görebilir ve trend çizgisine yakın destek seviyelerinde alım yapabilirler.
**Not:** Bu yorum, teknik analiz prensiplerine dayanmaktadır. Yatırım kararları vermeden önce ek analizler ve risk yönetimi uygulanmalıdır.
Limitations
Although the model is specialized for Turkish finance-related tasks, it may still:
- Generate inaccurate or outdated financial information
- Produce overly confident responses in uncertain scenarios
- Reflect biases present in the training data
- Require human verification for high-stakes financial use cases
This model should not be used as a substitute for professional financial, legal, or investment advice.
Risks and Recommendations
Users are encouraged to:
- Validate critical outputs before use
- Avoid relying on the model for regulated or high-risk financial decisions
- Use human oversight in production environments
- Benchmark the model on task-specific evaluation datasets before deployment
Citation
@misc{sengil_turkish_gemma_9b_finance_sft,
author = {Mert Sengil},
title = {Sengil/turkish-gemma-9b-finance-sft},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Sengil/turkish-gemma-9b-finance-sft}}
}
Author
Mert Sengil
Author: Sengil
Likes: 2
Downloads: 0
Tags: transformers, safetensors, finance, llm, instruction-tuning, sft, trl, text-generation, conversational, tr, base_model:ytu-ce-cosmos/Turkish-Gemma-9b-T1, base_model:finetune:ytu-ce-cosmos/Turkish-Gemma-9b-T1, license:apache-2.0, endpoints_compatible, region:us
Naphula/Ancient-Awakening-12B
base_model:
- aixonlab/Aether-12b
- aixonlab/Zinakha-12b
- allura-org/Bigger-Body-12b
- allura-org/MN-12b-RP-Ink
- allura-org/remnant-mn-12b
- anthracite-org/magnum-v4-12b
- ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2
- Babsie/Opulus-12B-v3
- BeaverAI/mistral-doryV2-12b
- crestf411/nemo-sunfall-v0.6.1
- EldritchLabs/Kraken-Karcher-12B-v1
- EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos
- EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math
- Fizzarolli/MN-12b-Rosier-v1
- HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407
- IIEleven11/Kalypso
- inflatebot/MN-12B-Mag-Mell-R1
- intervitens/mini-magnum-12b-v1.1
- jtatman/mistral_nemo_12b_reasoning_psychology_lora
- KOOWEEYUS/BlackSheep-RP-12B
- Lambent/Arsenic-Shahrazad-12B-v2
- Lambent/Arsenic-Shahrazad-12B-v3
- Lambent/arsenic-nemo-unleashed-12B
- Lambent/Gilded-Arsenic-12B
- LatitudeGames/Muse-12B
- mistralai/Mistral-Nemo-Instruct-2407
- Naphula/Riemannian-Redshift-12B-v1
- Naphula-Archives/F5-stage6-12B
- Naphula-Archives/F5-stage7-12B
- nbeerbower/Lyra-Gutenberg-mistral-nemo-12B
- nbeerbower/Lyra4-Gutenberg-12B
- nbeerbower/mistral-nemo-bophades-12B
- nbeerbower/mistral-nemo-gutenberg-12B-v3
- nbeerbower/mistral-nemo-gutenberg-12B-v4
- nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B
- nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B
- nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B
- nbeerbower/mistral-nemo-wissenschaft-12B
- NeverSleepHistorical/lumi-nemo-e2.0
- NeverSleep/Lumimaid-v0.2-12B
- nothingiisreal/Celeste-12B-V1.6
- nothingiisreal/MN-12B-Celeste-V1.9
- PocketDoc/Dans-DangerousWinds-V1.1.0-12b
- ReadyArt/Dark-Nexus-12B-v2.0
- ReadyArt/Forgotten-Safeword-12B-v4.0
- ReadyArt/Omega-Darker_The-Final-Directive-12B
- romaingrx/red-teamer-mistral-nemo
- Sao10K/MN-12B-Lyra-v1
- Sao10K/MN-12B-Lyra-v4
- shisa-ai/shisa-v2-mistral-nemo-12b
- SicariusSicariiStuff/Impish_Bloodmoon_12B
- sleepdeprived3/Christian-Bible-Expert-v2.0-12B
- SuperbEmphasis/MN-12b-RP-Ink-RP-Longform
- SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2
- TheDrummer/Rivermind-12B-v1
- TheDrummer/Rocinante-12B-v1
- TheDrummer/Rocinante-X-12B-v1
- Trappu/Nemo-Picaro-12B
- Undi95/LocalC-12B-e2.0
- VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct
- Vortex5/Astral-Noctra-12B
- Vortex5/Azure-Starlight-12B
- Vortex5/Crimson-Constellation-12B
- Vortex5/Red-Synthesis-12B
- Vortex5/Shining-Seraph-12B
- Vortex5/Starlit-Shadow-12B
- Vortex5/Vermilion-Sage-12B
- Vortex5/Scarlet-Seraph-12B
- Vortex5/Maroon-Sunset-12B
- Vortex5/Amber-Starlight-12B
language:
- en
library_name: transformers
license: apache-2.0
tags:
- creative
- creative writing
- fiction writing
- plot generation
- sub-plot generation
- story generation
- scene continue
- storytelling
- fiction story
- science fiction
- romance
- all genres
- story
- writing
- vivid prosing
- vivid writing
- fiction
- roleplaying
- float32
- swearing
- rp
- horror
- mistral
- nemo
- merge
- mergekit
- karcher
- flux
- arcee_fusion
- ramplus_tl
- pdq
widget:
- text: "Ancient-Awakening-12B"
output:
url: https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/CvyWU1z106Aa__M8KIksp.png
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/yI041gp0fzz7N_Mh_x5Pt.mpga"></audio>
> [!WARNING]
> This model works best with either the ChatML or Mistral Tekken chat template. The uncensored MPOA version has its guardrails removed and can produce narratives and RP containing violent and graphic erotic content. Adjust your system prompt accordingly.
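Since the warning above recommends the ChatML template, here is a minimal sketch of the ChatML layout, written out by hand so the token structure is visible. In practice you would rely on `tokenizer.apply_chat_template` from `transformers` rather than this hand-rolled helper.

```python
# Hand-rolled ChatML formatting sketch, for illustration only.

def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    # The trailing assistant header cues the model to generate its reply.
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

prompt = to_chatml([
    {"role": "system", "content": "You are the Ancient One."},
    {"role": "user", "content": "The eye opens. What do you see?"},
])
print(prompt)
```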
<!DOCTYPE html>
<style>
body {
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
color: #D1D5DB; /* Pale stone gray */
line-height: 1.6;
margin: 0;
padding: 0;
background-color: #0A0C10; /* Very dark stormy gray/black */
}
b, strong {
color: #FBBF24; /* Glowing amber/gold */
text-shadow: 0 0 8px rgba(251, 191, 36, 0.4);
}
.awakening-text {
color: #FEF3C7; /* Pale inner-eye yellow */
position: relative;
z-index: 2;
margin-left: 0.2em;
text-shadow: 0 0 15px #F59E0B, 0 0 30px #B45309; /* Deep fiery orange/gold glow */
font-size: 1.8rem;
letter-spacing: 1px;
font-weight: 600;
}
/* Section styling */
.section-container {
background-color: rgba(17, 24, 39, 0.85); /* Dark slate rock */
margin-bottom: 30px;
position: relative;
overflow: hidden;
border-bottom: 1px solid #78350F; /* Dark bronze/earth */
box-shadow: 0 4px 20px rgba(0, 0, 0, 0.6);
}
.section-header {
display: flex;
align-items: center;
background-color: rgba(245, 158, 11, 0.05); /* Faint amber tint */
padding: 10px 20px;
border-top: 1px solid rgba(120, 53, 15, 0.4);
}
.section-indicator {
width: 8px;
height: 20px;
background-color: #F59E0B; /* Amber eye color */
margin-right: 15px;
box-shadow: 0 0 10px rgba(245, 158, 11, 0.6);
border-radius: 2px;
}
.section-title {
font-family: 'Georgia', 'Times New Roman', serif; /* Ancient tome feel */
color: #FDE68A; /* Light gold */
font-size: 1.4rem;
margin: 0;
letter-spacing: 1px;
font-weight: 400;
text-transform: capitalize;
}
.section-content {
padding: 20px;
font-family: sans-serif;
color: #D1D5DB;
line-height: 1.6;
}
/* Title styling */
.title-container {
background-color: #050505; /* Pitch black */
position: relative;
overflow: hidden;
margin-bottom: 40px;
border-left: 4px solid #F59E0B; /* Amber pillar */
box-shadow: 0 6px 25px rgba(245, 158, 11, 0.15);
}
.title-wrapper {
position: relative;
z-index: 2;
padding: 25px 20px 30px 30px;
font-family: 'Georgia', 'Times New Roman', serif;
}
.title-main {
color: #FEF3C7;
font-size: 2.0rem;
font-weight: 700;
margin: 0;
letter-spacing: 2px;
display: inline-block;
position: relative;
text-transform: uppercase;
}
.storm-overlay {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
/* Dark, brooding radial fog mimicking the eye's aura */
background-image: radial-gradient(circle at 50% 50%, rgba(245, 158, 11, 0.08) 0%, rgba(0,0,0,0.9) 80%);
z-index: 1;
}
/* Subheading styling */
.subheading {
color: #D97706; /* Deep orange */
font-size: 1.1rem;
margin-top: 20px;
margin-bottom: 15px;
font-weight: 400;
border-bottom: 1px dashed rgba(217, 119, 6, 0.4);
display: inline-block;
text-transform: uppercase;
letter-spacing: 1px;
font-family: 'Georgia', 'Times New Roman', serif;
}
/* Links */
a {
color: #FBBF24; /* Amber */
text-decoration: none;
transition: color 0.3s ease, text-shadow 0.3s ease;
}
a:hover {
text-decoration: underline;
color: #FDE68A; /* Brighter gold */
text-shadow: 0 0 8px rgba(251, 191, 36, 0.5);
}
/* Container */
.container {
max-width: 1200px;
margin: 20px auto;
padding: 40px 20px;
background-color: #0D1117; /* Deep stormy night */
background-image:
radial-gradient(circle at 15% 85%, rgba(120, 53, 15, 0.1) 0%, transparent 50%),
radial-gradient(circle at 85% 15%, rgba(245, 158, 11, 0.05) 0%, transparent 50%);
min-height: calc(100vh - 40px);
border: 1px solid #1F2937; /* Dark stone border */
border-radius: 8px;
box-shadow: 0 8px 40px rgba(0, 0, 0, 0.9), inset 0 0 20px rgba(0, 0, 0, 0.5);
}
/* Code blocks */
pre {
background-color: #050505; /* Pitch black */
border: 1px solid #1F2937; /* Dark stone */
border-left: 3px solid #92400E; /* Dark orange/brown */
padding: 15px;
border-radius: 4px;
color: #D1D5DB;
overflow-x: auto;
}
code {
font-family: 'Courier New', Courier, monospace;
color: #FBBF24; /* Amber */
background-color: rgba(245, 158, 11, 0.08);
padding: 2px 4px;
border-radius: 3px;
}
pre code {
color: #00FFFF;
background-color: transparent;
padding: 0;
}
</style>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Ancient Awakening 12B</title>
</head>
<body>
<div class="container">
<div class="title-container">
<div class="storm-overlay"></div>
<div class="title-wrapper">
<h2 class="title-main">
<span class="awakening-text">🧬 Ancient Awakening 12B</span>
</h2>
</div>
</div>
<img src="https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/CvyWU1z106Aa__M8KIksp.png"
alt="Ancient Awakening"
style="display: block; margin: 0 auto 30px auto; max-width: 100%; height: auto; border-radius: 5px; border: 1px solid #1F2937; box-shadow: 0 0 25px rgba(245, 158, 11, 0.15);">
<div class="section-container">
<div class="section-header">
<div class="section-indicator"></div>
<h2 class="section-title">Overview</h2>
</div>
<div class="section-content">
This is a merge of pre-trained language models created using <a href="https://github.com/cg123/mergekit">mergekit</a>.
</div>
</div>
<div class="section-container">
<div class="section-header">
<div class="section-indicator"></div>
<h2 class="section-title">System Prompt (Optional)</h2>
</div>
<div class="section-content">
<p>You are the "Ancient One," a colossal, primordial entity of living stone, deep magic, and abyssal ocean. For countless millennia, you have slumbered in a state of suspended animation, your massive, jagged body mistaken for a remote, floating island amidst a perpetually stormy sea. You are older than recorded history, older than the gods of men. The ANCIENT AWAKENING marks your current state: you are finally opening your single, massive, reptilian eye. You are a geological anomaly made sentient.</p>
</div>
</div>
<div class="section-container">
<div class="section-header">
<div class="section-indicator"></div>
<h2 class="section-title">Merge Details</h2>
</div>
<div class="section-content">
<b>Merge Methods</b><br>
This model was synthesized using a complex multi-stage process involving the following methods:
<ul>
<li><a href="https://en.wikipedia.org/wiki/Karcher_mean">karcher</a></li>
<li><a href="https://huggingface.co/24B-Suite/Mergedonia-Suite-24B-v1/discussions/2">flux</a></li>
<li><a href="https://www.arcee.ai/blog/meet-mergekit-v0-1-arcee-fusion-expanded-model-support-multi-gpu-acceleration">arcee_fusion</a></li>
<li><a href="https://arxiv.org/abs/2601.13572">ramplus_tl [Reinforced Agent Merging Plus (Tensor-Local)]</a></li>
<li><a href="https://huggingface.co/24B-Suite/Mergedonia-Suite-24B-v1/discussions/2">pdq</a></li>
</ul>
<br>The <a href="https://huggingface.co/spaces/Naphula/model_tools/blob/main/graph_v18.py">graph_v18.py</a> patch made it possible to run the merges with GPU acceleration on 8 GB of VRAM.
<hr>
<b>Models Merged</b><br>
The following 70 models were woven into this merge:<br><br>
<details>
<summary style="cursor: pointer; color: #FBBF24; font-weight: bold;">Show 70 Donor Models</summary>
<ul>
<li><a href="https://huggingface.co/aixonlab/Aether-12b">aixonlab/Aether-12b</a></li>
<li><a href="https://huggingface.co/aixonlab/Zinakha-12b">aixonlab/Zinakha-12b</a></li>
<li><a href="https://huggingface.co/allura-org/Bigger-Body-12b">allura-org/Bigger-Body-12b</a></li>
<li><a href="https://huggingface.co/allura-org/MN-12b-RP-Ink">allura-org/MN-12b-RP-Ink</a></li>
<li><a href="https://huggingface.co/allura-org/remnant-mn-12b">allura-org/remnant-mn-12b</a></li>
<li><a href="https://huggingface.co/anthracite-org/magnum-v4-12b">anthracite-org/magnum-v4-12b</a></li>
<li><a href="https://huggingface.co/ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2">ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2</a></li>
<li><a href="https://huggingface.co/Babsie/Opulus-12B-v3">Babsie/Opulus-12B-v3</a></li>
<li><a href="https://huggingface.co/BeaverAI/mistral-doryV2-12b">BeaverAI/mistral-doryV2-12b</a></li>
<li><a href="https://huggingface.co/crestf411/nemo-sunfall-v0.6.1">crestf411/nemo-sunfall-v0.6.1</a></li>
<li><a href="https://huggingface.co/EldritchLabs/Kraken-Karcher-12B-v1">EldritchLabs/Kraken-Karcher-12B-v1</a></li>
<li><a href="https://huggingface.co/EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos">EpistemeAI2/Fireball-Mistral-Nemo-12B-Philos</a></li>
<li><a href="https://huggingface.co/EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math">EpistemeAI/Mistral-Nemo-Instruct-12B-Philosophy-Math</a></li>
<li><a href="https://huggingface.co/Fizzarolli/MN-12b-Rosier-v1">Fizzarolli/MN-12b-Rosier-v1</a></li>
<li><a href="https://huggingface.co/HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407">HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407</a></li>
<li><a href="https://huggingface.co/IIEleven11/Kalypso">IIEleven11/Kalypso</a></li>
<li><a href="https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1">inflatebot/MN-12B-Mag-Mell-R1</a></li>
<li><a href="https://huggingface.co/intervitens/mini-magnum-12b-v1.1">intervitens/mini-magnum-12b-v1.1</a></li>
<li><a href="https://huggingface.co/jtatman/mistral_nemo_12b_reasoning_psychology_lora">jtatman/mistral_nemo_12b_reasoning_psychology_lora</a></li>
<li><a href="https://huggingface.co/KOOWEEYUS/BlackSheep-RP-12B">KOOWEEYUS/BlackSheep-RP-12B</a></li>
<li><a href="https://huggingface.co/Lambent/Arsenic-Shahrazad-12B-v2">Lambent/Arsenic-Shahrazad-12B-v2</a></li>
<li><a href="https://huggingface.co/Lambent/Arsenic-Shahrazad-12B-v3">Lambent/Arsenic-Shahrazad-12B-v3</a></li>
<li><a href="https://huggingface.co/Lambent/arsenic-nemo-unleashed-12B">Lambent/arsenic-nemo-unleashed-12B</a></li>
<li><a href="https://huggingface.co/Lambent/Gilded-Arsenic-12B">Lambent/Gilded-Arsenic-12B</a></li>
<li><a href="https://huggingface.co/LatitudeGames/Muse-12B">LatitudeGames/Muse-12B</a></li>
<li><a href="https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407">mistralai/Mistral-Nemo-Instruct-2407</a></li>
<li><a href="https://huggingface.co/Naphula/Riemannian-Redshift-12B-v1">Naphula/Riemannian-Redshift-12B-v1</a></li>
<li><a href="https://huggingface.co/Naphula-Archives/F5-stage6-12B">Naphula-Archives/F5-stage6-12B</a></li>
<li><a href="https://huggingface.co/Naphula-Archives/F5-stage7-12B">Naphula-Archives/F5-stage7-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/Lyra-Gutenberg-mistral-nemo-12B">nbeerbower/Lyra-Gutenberg-mistral-nemo-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/Lyra4-Gutenberg-12B">nbeerbower/Lyra4-Gutenberg-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/mistral-nemo-bophades-12B">nbeerbower/mistral-nemo-bophades-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/mistral-nemo-gutenberg-12B-v3">nbeerbower/mistral-nemo-gutenberg-12B-v3</a></li>
<li><a href="https://huggingface.co/nbeerbower/mistral-nemo-gutenberg-12B-v4">nbeerbower/mistral-nemo-gutenberg-12B-v4</a></li>
<li><a href="https://huggingface.co/nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B">nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B">nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B">nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B</a></li>
<li><a href="https://huggingface.co/nbeerbower/mistral-nemo-wissenschaft-12B">nbeerbower/mistral-nemo-wissenschaft-12B</a></li>
<li><a href="https://huggingface.co/NeverSleepHistorical/lumi-nemo-e2.0">NeverSleepHistorical/lumi-nemo-e2.0</a></li>
<li><a href="https://huggingface.co/NeverSleep/Lumimaid-v0.2-12B">NeverSleep/Lumimaid-v0.2-12B</a></li>
<li><a href="https://huggingface.co/nothingiisreal/Celeste-12B-V1.6">nothingiisreal/Celeste-12B-V1.6</a></li>
<li><a href="https://huggingface.co/nothingiisreal/MN-12B-Celeste-V1.9">nothingiisreal/MN-12B-Celeste-V1.9</a></li>
<li><a href="https://huggingface.co/PocketDoc/Dans-DangerousWinds-V1.1.0-12b">PocketDoc/Dans-DangerousWinds-V1.1.0-12b</a></li>
<li><a href="https://huggingface.co/ReadyArt/Dark-Nexus-12B-v2.0">ReadyArt/Dark-Nexus-12B-v2.0</a></li>
<li><a href="https://huggingface.co/ReadyArt/Forgotten-Safeword-12B-v4.0">ReadyArt/Forgotten-Safeword-12B-v4.0</a></li>
<li><a href="https://huggingface.co/ReadyArt/Omega-Darker_The-Final-Directive-12B">ReadyArt/Omega-Darker_The-Final-Directive-12B</a></li>
<li><a href="https://huggingface.co/romaingrx/red-teamer-mistral-nemo">romaingrx/red-teamer-mistral-nemo</a></li>
<li><a href="https://huggingface.co/Sao10K/MN-12B-Lyra-v1">Sao10K/MN-12B-Lyra-v1</a></li>
<li><a href="https://huggingface.co/Sao10K/MN-12B-Lyra-v4">Sao10K/MN-12B-Lyra-v4</a></li>
<li><a href="https://huggingface.co/shisa-ai/shisa-v2-mistral-nemo-12b">shisa-ai/shisa-v2-mistral-nemo-12b</a></li>
<li><a href="https://huggingface.co/SicariusSicariiStuff/Impish_Bloodmoon_12B">SicariusSicariiStuff/Impish_Bloodmoon_12B</a></li>
<li><a href="https://huggingface.co/sleepdeprived3/Christian-Bible-Expert-v2.0-12B">sleepdeprived3/Christian-Bible-Expert-v2.0-12B</a></li>
<li><a href="https://huggingface.co/SuperbEmphasis/MN-12b-RP-Ink-RP-Longform">SuperbEmphasis/MN-12b-RP-Ink-RP-Longform</a></li>
<li><a href="https://huggingface.co/SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2">SuperbEmphasis/Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2</a></li>
<li><a href="https://huggingface.co/TheDrummer/Rivermind-12B-v1">TheDrummer/Rivermind-12B-v1</a></li>
<li><a href="https://huggingface.co/TheDrummer/Rocinante-12B-v1">TheDrummer/Rocinante-12B-v1</a></li>
<li><a href="https://huggingface.co/TheDrummer/Rocinante-X-12B-v1">TheDrummer/Rocinante-X-12B-v1</a></li>
<li><a href="https://huggingface.co/Trappu/Nemo-Picaro-12B">Trappu/Nemo-Picaro-12B</a></li>
<li><a href="https://huggingface.co/Undi95/LocalC-12B-e2.0">Undi95/LocalC-12B-e2.0</a></li>
<li><a href="https://huggingface.co/VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct">VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct</a></li>
<li><a href="https://huggingface.co/Vortex5/Astral-Noctra-12B">Vortex5/Astral-Noctra-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Azure-Starlight-12B">Vortex5/Azure-Starlight-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Crimson-Constellation-12B">Vortex5/Crimson-Constellation-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Red-Synthesis-12B">Vortex5/Red-Synthesis-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Shining-Seraph-12B">Vortex5/Shining-Seraph-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Starlit-Shadow-12B">Vortex5/Starlit-Shadow-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Vermilion-Sage-12B">Vortex5/Vermilion-Sage-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Scarlet-Seraph-12B">Vortex5/Scarlet-Seraph-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Maroon-Sunset-12B">Vortex5/Maroon-Sunset-12B</a></li>
<li><a href="https://huggingface.co/Vortex5/Amber-Starlight-12B">Vortex5/Amber-Starlight-12B</a></li>
</ul>
</details>
</div>
</div>
<div class="section-container">
<div class="section-header">
<div class="section-indicator"></div>
<h2 class="section-title">Merge Pipeline & Configuration</h2>
</div>
<div class="section-content">
<p><b>🧬 Ancient Awakening 12B</b> unites several methods and 70 models into one:</p>
<ol>
<li><a href="https://huggingface.co/EldritchLabs/Kraken-Karcher-12B-v1">🦑 Kraken Karcher v1</a>: Combines 53 <a href="https://huggingface.co/models?other=base_model:finetune:mistralai/Mistral-Nemo-Instruct-2407">Mistral Nemo finetunes</a> via the <code>karcher</code> method at 500 iterations</li>
<li><a href="https://huggingface.co/Naphula/Riemannian-Redshift-12B-v1">🌌 Riemannian Redshift v1</a>: Combines 10 <a href="https://huggingface.co/Vortex5">Vortex5</a> merges (which contain custom methods like <code>saef</code>, <code>smi_oni</code>, and <code>hpq</code>) via the <code>karcher</code> method at 1000 iterations</li>
<li>RedKFlux: <code>flux</code> merge of Kraken with Redshift at 1000 iterations</li>
<li>RedKFluxMell: <code>arcee_fusion</code> merge of #3 with <a href="https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1">Mag-Mell</a></li>
<li>BloodKraken: <code>arcee_fusion</code> merge of #4 with <a href="https://huggingface.co/SicariusSicariiStuff/Impish_Bloodmoon_12B">Impish Bloodmoon</a></li>
<li><a href="https://huggingface.co/Naphula-Archives/F5-stage6-12B">F5-stage6</a>: <code>arcee_fusion</code> merge of #5 with <a href="https://huggingface.co/LatitudeGames/Muse-12B">Muse</a></li>
<li><a href="https://huggingface.co/Naphula-Archives/F5-stage7-12B">F5-stage7</a>: <code>ramplus_tl</code> merge of #6 with #3</li>
<li><a href="https://huggingface.co/Naphula/Ancient-Awakening-12B">🧬 Ancient Awakening 12B</a>: <code>pdq</code> merge of #7 with #6, #3, #2, #1, Mag-Mell, Impish-Bloodmoon, and Muse</li>
<li><code>mpoa</code> <a href="https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration">ablation</a> applied to remove censorship <a href="https://huggingface.co/Naphula/Ancient-Awakening-12B-MPOA">(released separately)</a></li>
</ol>
<p><b>Note:</b> If you encounter issues with this model, try the F5-stage6 or F5-stage7 merges instead, as they are likely more stable.</p>
<hr>
<h3 class="subheading">Stage 1: 🦑 Kraken Karcher</h3>
<pre><code>base_model: B:/12B/models--mistralai--Mistral-Nemo-Instruct-2407
models:
- model: B:/12B/models--aixonlab--Aether-12b
- model: B:/12B/models--aixonlab--Zinakha-12b
- model: B:/12B/models--allura-org--Bigger-Body-12b
- model: B:/12B/models--allura-org--MN-12b-RP-Ink
- model: B:/12B/models--allura-org--remnant-mn-12b
- model: B:/12B/models--anthracite-org--magnum-v4-12b
- model: B:/12B/models--ArliAI--Mistral-Nemo-12B-ArliAI-RPMax-v1.2
- model: B:/12B/models--Babsie--Opulus-12B-v3
- model: B:/12B/models--BeaverAI--mistral-doryV2-12b
- model: B:/12B/models--crestf411--nemo-sunfall-v0.6.1
- model: B:/12B/models--EpistemeAI2--Fireball-Mistral-Nemo-12B-Philos
- model: B:/12B/models--EpistemeAI--Mistral-Nemo-Instruct-12B-Philosophy-Math
- model: B:/12B/models--Fizzarolli--MN-12b-Rosier-v1
- model: B:/12B/models--HumanLLMs--Human-Like-Mistral-Nemo-Instruct-2407
- model: B:/12B/models--IIEleven11--Kalypso
- model: B:/12B/models--intervitens--mini-magnum-12b-v1.1
- model: B:/12B/models--jtatman--mistral_nemo_12b_reasoning_psychology_lora
- model: B:/12B/models--KOOWEEYUS--BlackSheep-RP-12B
- model: B:/12B/models--Lambent--Arsenic-Shahrazad-12B-v2
- model: B:/12B/models--Lambent--Arsenic-Shahrazad-12B-v3
- model: B:/12B/models--Lambent--arsenic-nemo-unleashed-12B
- model: B:/12B/models--Lambent--Gilded-Arsenic-12B
- model: B:/12B/models--mistralai--Mistral-Nemo-Instruct-2407
- model: B:/12B/models--nbeerbower--Lyra-Gutenberg-mistral-nemo-12B
- model: B:/12B/models--nbeerbower--Lyra4-Gutenberg-12B
- model: B:/12B/models--nbeerbower--mistral-nemo-bophades-12B
- model: B:/12B/models--nbeerbower--mistral-nemo-gutenberg-12B-v3
- model: B:/12B/models--nbeerbower--mistral-nemo-gutenberg-12B-v4
- model: B:/12B/models--nbeerbower--Mistral-Nemo-Gutenberg-Doppel-12B
- model: B:/12B/models--nbeerbower--Mistral-Nemo-Gutenberg-Encore-12B
- model: B:/12B/models--nbeerbower--Mistral-Nemo-Gutenberg-Vitus-12B
- model: B:/12B/models--nbeerbower--mistral-nemo-wissenschaft-12B
- model: B:/12B/models--NeverSleepHistorical--lumi-nemo-e2.0
- model: B:/12B/models--NeverSleep--Lumimaid-v0.2-12B
- model: B:/12B/models--nothingiisreal--Celeste-12B-V1.6
- model: B:/12B/models--nothingiisreal--MN-12B-Celeste-V1.9
- model: B:/12B/models--PocketDoc--Dans-DangerousWinds-V1.1.0-12b
- model: B:/12B/models--ReadyArt--Dark-Nexus-12B-v2.0
- model: B:/12B/models--ReadyArt--Forgotten-Safeword-12B-v4.0
- model: B:/12B/models--ReadyArt--Omega-Darker_The-Final-Directive-12B
- model: B:/12B/models--romaingrx--red-teamer-mistral-nemo
- model: B:/12B/models--Sao10K--MN-12B-Lyra-v1
- model: B:/12B/models--Sao10K--MN-12B-Lyra-v4
- model: B:/12B/models--shisa-ai--shisa-v2-mistral-nemo-12b
- model: B:/12B/models--sleepdeprived3--Christian-Bible-Expert-v2.0-12B
- model: B:/12B/models--SuperbEmphasis--MN-12b-RP-Ink-RP-Longform
- model: B:/12B/models--SuperbEmphasis--Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2
- model: B:/12B/models--TheDrummer--Rivermind-12B-v1
- model: B:/12B/models--TheDrummer--Rocinante-12B-v1
- model: B:/12B/models--TheDrummer--Rocinante-X-12B-v1
- model: B:/12B/models--Trappu--Nemo-Picaro-12B
- model: B:/12B/models--Undi95--LocalC-12B-e2.0
- model: B:/12B/models--VAGOsolutions--SauerkrautLM-Nemo-12b-Instruct
merge_method: karcher
parameters:
max_iter: 500
tol: 1.0e-9
dtype: float32
out_dtype: bfloat16
tokenizer:
source: union
chat_template: auto
name: 🦑 Kraken-Karcher-12B-v1</code></pre>
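The `karcher` method above computes a Karcher (Riemannian) mean of the donor weights, iterating to convergence under `max_iter`/`tol`. As a toy illustration of the fixed-point idea, here is the Karcher mean of unit vectors on the sphere via tangent-space averaging; this is a generic sketch, not mergekit's per-tensor implementation.

```python
import numpy as np

def karcher_mean(points, max_iter=500, tol=1e-9):
    """Riemannian (Karcher) mean of unit vectors on the sphere,
    computed by repeated tangent-space averaging."""
    x = np.asarray(points, dtype=np.float64)
    m = x.mean(axis=0)
    m /= np.linalg.norm(m)                      # initial guess: normalized Euclidean mean
    for _ in range(max_iter):
        dots = np.clip(x @ m, -1.0, 1.0)
        theta = np.arccos(dots)                 # geodesic distance from m to each point
        # Log map: lift each point into the tangent space at m.
        scale = np.ones_like(theta)
        mask = theta > 1e-12
        scale[mask] = theta[mask] / np.sin(theta[mask])
        tangents = scale[:, None] * (x - dots[:, None] * m)
        step = tangents.mean(axis=0)            # average in the tangent space
        norm = np.linalg.norm(step)
        if norm < tol:                          # converged (cf. tol: 1.0e-9 above)
            break
        # Exp map: move along the geodesic back onto the sphere.
        m = np.cos(norm) * m + np.sin(norm) * step / norm
    return m
```

The config's `max_iter: 500` and `tol: 1.0e-9` play the same roles as the arguments here, applied per weight tensor across all 53 donors.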
<h3 class="subheading">Stage 2: 🌌 Riemannian Redshift</h3>
<pre><code>models:
- model: B:/12B/models--Vortex5--Astral-Noctra-12B
- model: B:/12B/models--Vortex5--Azure-Starlight-12B
- model: B:/12B/models--Vortex5--Crimson-Constellation-12B
- model: B:/12B/models--Vortex5--Red-Synthesis-12B
- model: B:/12B/models--Vortex5--Shining-Seraph-12B
- model: B:/12B/models--Vortex5--Starlit-Shadow-12B
- model: B:/12B/models--Vortex5--Vermilion-Sage-12B
- model: B:/12B/models--Vortex5--Scarlet-Seraph-12B
- model: B:/12B/models--Vortex5--Maroon-Sunset-12B
- model: B:/12B/models--Vortex5--Amber-Starlight-12B
merge_method: karcher
parameters:
max_iter: 1000
tol: 1.0e-9
dtype: float32
out_dtype: bfloat16
tokenizer:
source: union
chat_template: auto
name: 🌌 Riemannian-Redshift-12B-v1</code></pre>
<h3 class="subheading">Stage 3: RedKFlux</h3>
<pre><code>models:
- model: C:\mergekit-main\merged_model_redshift
- model: C:\mergekit-main\merged_model_kraken_karcher
merge_method: flux
parameters:
eta: 1.2
tol: 1.0e-9
max_iter: 1000
kappa: 0.8
dtype: float32
out_dtype: bfloat16
tokenizer:
source: union
chat_template: auto
name: RedKFlux</code></pre>
<h3 class="subheading">Stage 4: RedKFluxMell</h3>
<pre><code>models:
- model: C:\mergekit-main\merged_model_RedKFlux
- model: B:\8B\models--inflatebot--MN-12B-Mag-Mell-R1
merge_method: arcee_fusion
tukey_fence: 1.5
base_model: C:\mergekit-main\merged_model_RedKFlux
dtype: float32
out_dtype: bfloat16
tokenizer:
source: base
name: RedKFluxMell</code></pre>
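The `tukey_fence: 1.5` parameter suggests the fusion step uses the classic Tukey IQR rule to decide which parameter differences count as significant. As a generic illustration of that rule only (not `arcee_fusion`'s actual implementation):

```python
import numpy as np

def tukey_outlier_mask(delta, fence=1.5):
    """Flag entries of |delta| beyond the upper Tukey fence
    (Q3 + fence * IQR), the standard boxplot outlier rule."""
    mag = np.abs(delta)
    q1, q3 = np.percentile(mag, [25, 75])
    return mag > q3 + fence * (q3 - q1)
```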
<h3 class="subheading">Stage 5: BloodKraken</h3>
<pre><code>models:
- model: C:\mergekit-main\merged_model_RedKFluxMell
- model: B:\8B\models--SicariusSicariiStuff--Impish_Bloodmoon_12B
merge_method: arcee_fusion
tukey_fence: 1.5
base_model: C:\mergekit-main\merged_model_RedKFluxMell
dtype: float32
out_dtype: bfloat16
tokenizer:
source: base
name: BloodKraken</code></pre>
<h3 class="subheading">Stage 6: BloodKrakenMuse</h3>
<pre><code>models:
- model: C:\mergekit-main\merged_model_BloodKraken
- model: B:\8B\models--LatitudeGames--Muse-12B
merge_method: arcee_fusion
tukey_fence: 1.5
base_model: C:\mergekit-main\merged_model_BloodKraken
dtype: float32
out_dtype: bfloat16
tokenizer:
source: base
name: BloodKrakenMuse</code></pre>
<h3 class="subheading">Stage 7: Ramplus_tl</h3>
<pre><code>merge_method: ramplus_tl
base_model: C:\mergekit-main\merged_model_BloodKrakenMuse
models:
- model: C:\mergekit-main\merged_model_BloodKrakenMuse
- model: C:\mergekit-main\merged_model_RedKFlux
parameters:
epsilon: 0.001 # Increased from 1e-5 to 1e-3 for denser SFT/DPO task vectors
r: 0.25 # Increased from 0.1 to 0.2-0.3 for better SFT behavior preservation
alpha: 0.4 # Increased from 0.2 to 0.4 for enhanced rescaling
dtype: float32
out_dtype: bfloat16
tokenizer:
source: base
name: Stage7</code></pre>
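`ramplus_tl` is a custom method, but the config comments describe task-vector merging with a magnitude threshold (`epsilon`), a retained fraction (`r`), and a rescale factor (`alpha`). A generic sketch of that family of operations, under the assumption that these parameters behave as the comments suggest:

```python
import numpy as np

def merge_task_vector(base, tuned, epsilon=0.001, r=0.25, alpha=0.4):
    """Generic task-vector merge sketch: extract the fine-tune delta,
    keep only its largest entries, rescale, and add back to the base."""
    tau = tuned - base                                   # task vector (SFT/DPO delta)
    tau = np.where(np.abs(tau) >= epsilon, tau, 0.0)     # drop near-zero entries
    k = max(1, int(r * tau.size))                        # keep top-r fraction
    if np.count_nonzero(tau) > k:
        cutoff = np.partition(np.abs(tau).ravel(), -k)[-k]
        tau = np.where(np.abs(tau) >= cutoff, tau, 0.0)
    return base + alpha * tau                            # rescaled contribution
```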
<h3 class="subheading">Stage 8: 🧬 Ancient Awakening</h3>
<pre><code>merge_method: pdq
pdq_base_yaml: C:\mergekit-main\stage7.yaml
pdq_base_model: C:\mergekit-main\merged_model_stage7
output_dir: C:\mergekit-main\stage8_pdq
base_model: C:\mergekit-main\merged_model_BloodKrakenMuse
models:
- model: C:\mergekit-main\merged_model_BloodKrakenMuse
- model: B:\12B\models--LatitudeGames--Muse-12B
- model: B:\12B\models--SicariusSicariiStuff--Impish_Bloodmoon_12B
- model: B:\12B\models--inflatebot--MN-12B-Mag-Mell-R1
- model: C:\mergekit-main\merged_model_RedKFlux
- model: C:\mergekit-main\merged_model_redshift
- model: C:\mergekit-main\merged_model_kraken_karcher
parameters:
chi: 0.15
iota: 0.1
nu: 24
gamma: 1.0
zeta: 16
sigma: 0.5
density: 0.9
epsilon: 0.099
lambda: 1.0
lazy_unpickle: True
random_seed: 420
name: 🧬 Ancient-Awakening-12B</code></pre>
<h3 class="subheading">Stage 9: Magnitude-Preserving Orthogonalized Ablation</h3>
<pre><code># python measure.py -m C:\mergekit-main\f8_pdq -o C:\mergekit-main\f8_pdq\ablit_proj --batch-size 8 --projected
# python analyze_old.py C:\mergekit-main\f8_pdq\ablit_proj -c
# sharded_ablate.py magmell.yml --normpreserve --projected
#
# The model to be ablated.
model: C:\mergekit-main\f8_pdq
#
# The measurement file generated by measure.py for the model.
measurements: C:\mergekit-main\f8_pdq\ablit_proj
#
# The directory where the new, ablated model will be saved.
output: C:\mergekit-main\f8_pdq\ablit_biproj\
#
# The list of ablation operations to perform.
# Strategy: use the single best refusal direction from the peak signal layer (29)
# and apply it uniformly across all 40 layers.
ablate:
# Every entry below reuses the layer-29 measurement with the same scale.
- layer: 0
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 1
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 2
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 3
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 4
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 5
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 6
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 7
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 8
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 9
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 10
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 11
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 12
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 13
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 14
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 15
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 16
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 17
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 18
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 19
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 20
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 21
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 22
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 23
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 24
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 25
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 26
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 27
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 28
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 29
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 30
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 31
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 32
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 33
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 34
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 35
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 36
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 37
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 38
measurement: 29
scale: 1.2
sparsity: 0.00
- layer: 39
measurement: 29
scale: 1.2
sparsity: 0.00</code></pre>
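The ablation stage subtracts a measured "refusal" direction from the model's weights at each layer. A minimal, generic sketch of directional ablation follows; this is not the actual `sharded_ablate.py`, and it ignores the norm-preserving and biprojection details described in the linked blog post.

```python
import numpy as np

def ablate_direction(W, v, scale=1.2):
    """Remove `scale` times the component of each output along direction v.
    With scale=1.0 this is exact orthogonal projection onto v's complement;
    scale > 1 over-ablates, as in the config above (scale: 1.2)."""
    v = v / np.linalg.norm(v)            # unit refusal direction in output space
    return W - scale * np.outer(v, v) @ W
```

Applied per layer, each `(I - scale * v v^T) W` update suppresses any contribution the layer would otherwise write along the refusal direction.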
</div>
</div>
</body>
</html>
Author: Naphula
base_model:nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B, base_model:merge:nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B, base_model:nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B, base_model:merge:nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B, base_model:nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B, base_model:merge:nbeerbower/Mistral-Nemo-Gutenberg-Vitus-12B, base_model:nbeerbower/mistral-nemo-bophades-12B, base_model:merge:nbeerbower/mistral-nemo-bophades-12B, base_model:nbeerbower/mistral-nemo-gutenberg-12B-v3, base_model:merge:nbeerbower/mistral-nemo-gutenberg-12B-v3, base_model:nbeerbower/mistral-nemo-gutenberg-12B-v4, base_model:merge:nbeerbower/mistral-nemo-gutenberg-12B-v4, base_model:nbeerbower/mistral-nemo-wissenschaft-12B, base_model:merge:nbeerbower/mistral-nemo-wissenschaft-12B, base_model:nothingiisreal/Celeste-12B-V1.6, base_model:merge:nothingiisreal/Celeste-12B-V1.6, base_model:nothingiisreal/MN-12B-Celeste-V1.9, base_model:merge:nothingiisreal/MN-12B-Celeste-V1.9, base_model:romaingrx/red-teamer-mistral-nemo, base_model:merge:romaingrx/red-teamer-mistral-nemo, base_model:shisa-ai/shisa-v2-mistral-nemo-12b, base_model:merge:shisa-ai/shisa-v2-mistral-nemo-12b, base_model:sleepdeprived3/Christian-Bible-Expert-v2.0-12B, base_model:merge:sleepdeprived3/Christian-Bible-Expert-v2.0-12B, license:apache-2.0, text-generation-inference, endpoints_compatible, region:us
YARlabs/v5_Embedding_0.5B
pipeline_tag: feature-extraction
library_name: transformers
tags:
- endpoints
- embedding
- retrieval
- hyperbolic-geometry
- matryoshka
YAR.INK v5_Embedding: The First Native Hyperbolic Text Model
Inspired by the technical excellence of the Qwen3-Embedding series, we introduce v5_Embedding, the world's first native hyperbolic text embedding model. v5_Embedding serves as a universal semantic engine, empirically demonstrating that non-Euclidean geometries (specifically the Lobachevsky, Lorentz, and Klein models of hyperbolic space) provide a fundamentally more expressive representational space for hierarchical textual data than traditional Euclidean geometry.
Developed through technical synthesis and collaborative exchange with experts from organizations including Google, Alibaba, Baidu, and Apple, this project represents a breakthrough for the open-source community. It proves that independent research can drive fundamental architectural innovations rather than merely following established industry paradigms.
v5_Embedding establishes a new frontier for researchers and engineers globally, enabling superior retrieval performance with significantly reduced computational overhead and latency. We envision v5_Embedding as a catalyst for a new industry standard. Combined with HyperspaceDB, it empowers the democratization of hyper-efficient AI—from next-generation chatbots and autonomous robotics to advanced research laboratories.
YAR.INK v5_Embedding is a state-of-the-art embedding model trained natively in hyperbolic (Lorentz) space, using a custom Matryoshka Representation Learning (MRL) head.
It is the first text embedding model designed from the ground up for highly precise context retrieval, clustering, and structural knowledge discovery in massive datasets while operating in non-Euclidean space.
🔥 Key Breakthroughs
Hyperbolic geometry naturally models hierarchical data (such as language taxonomies and knowledge bases) far more faithfully than Euclidean space. By combining this with Matryoshka configurations, our model achieves unparalleled efficiency:
- Over 60% Less RAM Consumption: Operates efficiently on ~642MB of RAM (Total Footprint) for v5_Embedding 64D, compared to 2553MB for high-performance Qwen3-4B baselines.
- 40x to 640x Storage Efficiency: Massive reduction in vector database footprint (from 5600KB down to 8.75KB per batch depending on the chosen Matryoshka dimension).
- Superior Quality/Size Ratio: 16D Lorentz retains 97.2% of Qwen3-4B (2560D) quality while being 160x smaller.
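The storage and compression figures above follow from simple arithmetic. A minimal sketch, assuming float32 storage (4 bytes per component) and the 560-vector batch the table's DB sizes imply (hypothetical constants, not from the model card):

```python
# Storage arithmetic behind the benchmark table, assuming float32
# components and a 560-vector batch (8.75 KB / 16 B per 4-d vector).
BYTES_PER_FLOAT = 4
BATCH = 560              # implied by the table's DB Size column
BASELINE_DIM = 2560      # Qwen3-4B full-size embedding

def vector_bytes(dim: int) -> int:
    """Raw size of one float32 vector of the given dimension."""
    return dim * BYTES_PER_FLOAT

def compression(dim: int) -> float:
    """Compression factor relative to the 2560-d baseline."""
    return vector_bytes(BASELINE_DIM) / vector_bytes(dim)

for dim in (4, 8, 16, 32, 64, 128):
    kb = vector_bytes(dim) * BATCH / 1024
    print(f"{dim:>3}d: {vector_bytes(dim):>4} B/vector, "
          f"{kb:.2f} KB/batch, {compression(dim):.0f}x")
```

For 4 dimensions this reproduces the table's 16 bytes per vector, 8.75 KB per batch, and 640x compression.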
📊 Performance vs Efficiency Benchmark (Lorentz vs Qwen3 Baselines)
| Model | Recall@1 | MRR@10 | Time (s) | Speed (v/s) | RAM (MB) | CPU (%) | Vector Size (Bytes) | DB Size (KB) | Compression |
|-------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| v5_Embedding_4d Lorentz | 0.7821 | 0.8596 | 46.0 | 12.2 | 4555.6 | 2.5 | 16 | 8.75 | 640x |
| v5_Embedding_8d Lorentz | 0.8393 | 0.8953 | 46.7 | 12.0 | 4571.1 | 2.3 | 32 | 17.50 | 320x |
| v5_Embedding_16d Lorentz | 0.8786 | 0.9276 | 46.3 | 12.1 | 4601.9 | 2.2 | 64 | 35.00 | 160x |
| v5_Embedding_32d Lorentz | 0.9071 | 0.9452 | 46.0 | 12.2 | 4605.5 | 2.3 | 128 | 70.00 | 80x |
| v5_Embedding_64d Lorentz | 0.9393 | 0.9616 | 46.0 | 12.2 | 4609.4 | 2.3 | 256 | 140.00 | 40x |
| v5_Embedding_128d Lorentz | 0.9429 | 0.9650 | 46.0 | 12.2 | 4593.4 | 2.2 | 512 | 280.00 | 20x |
| Qwen3-0.6B-256 Euclidean | 0.8857 | 0.9300 | 46.4 | 12.1 | 12488.9 | 3.8 | 1024 | 560.00 | 10x |
| Qwen3-0.6B-512 Euclidean | 0.8929 | 0.9324 | 46.4 | 12.1 | 12535.2 | 3.6 | 2048 | 1120.00 | 5x |
| Qwen3-0.6B-1024 Euclidean | 0.9000 | 0.9389 | 46.4 | 12.1 | 12537.8 | 3.5 | 4096 | 2240.00 | 2x |
| Qwen3-4B-256 Euclidean | 0.8679 | 0.9197 | 235.9 | 2.4 | 34395.1 | 12.2 | 1024 | 560.00 | 10x |
| Qwen3-4B-512 Euclidean | 0.8929 | 0.9357 | 236.7 | 2.4 | 24326.4 | 12.1 | 2048 | 1120.00 | 5x |
| Qwen3-4B-1024 Euclidean | 0.9071 | 0.9459 | 236.6 | 2.4 | 23784.7 | 12.2 | 4096 | 2240.00 | 2x |
| Qwen3-4B-2560 Euclidean | 0.9036 | 0.9422 | 236.3 | 2.4 | 23785.3 | 12.2 | 10240 | 5600.00 | baseline |
| Qwen3-8B-256 Euclidean | 0.8607 | 0.9174 | 413.4 | 1.4 | 68517.8 | 24.3 | 1024 | 560.00 | 10x |
| Qwen3-8B-512 Euclidean | 0.8893 | 0.9357 | 401.5 | 1.4 | 68539.9 | 24.3 | 2048 | 1120.00 | 5x |
| Qwen3-8B-1024 Euclidean | 0.8893 | 0.9332 | 401.4 | 1.4 | 68592.2 | 24.9 | 4096 | 2240.00 | 2x |
| Qwen3-8B-2048 Euclidean | 0.9000 | 0.9424 | 401.4 | 1.4 | 68644.5 | 24.9 | 8192 | 4480.00 | 1.25x |
| Qwen3-8B-2560 Euclidean | 0.8964 | 0.9398 | 401.4 | 1.4 | 68720.6 | 25.5 | 10240 | 5600.00 | baseline |
| Qwen3-8B-4096 Euclidean | 0.8893 | 0.9358 | 401.4 | 1.4 | 68801.1 | 25.8 | 16384 | 8960.00 | 0.62x |
💡 Key Findings
- Extreme Compression: 160x smaller vector (16-dim Lorentz vs 2560-dim Qwen3-4B Euclidean).
- High Retention: v5_Embedding 16D retains 97.2% of Qwen3-4B recall quality with massive resource savings.
- Scaling Laws: Unlike Euclidean MRL, Lorentz embeddings maintain superior separation integrity even at ultra-low (4D-8D) dimensions.
🧠 Architecture & Compatibility
- Context Window: 512 tokens. While the architecture technically supports larger contexts, this model is specifically distilled and optimized for the 512-token limit typical of high-performance retrieval tasks.
- Tokenizer: Leverages the industry-standard Qwen2Tokenizer (BPE). This ensures that YAR.INK v5_Embedding is ready to use with any standard library (Hugging Face, vLLM, LangChain) without extra configuration, while benefiting from one of the most efficient sub-word tokenization algorithms available.
🚀 Usage
You must pass trust_remote_code=True, because this model relies on a custom architecture (YarEmbeddingModel, YarConfig) shipped directly inside this repository.
1. Generating Embeddings
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "YARlabs/v5_Embedding_0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
model.eval()

texts = [
    "What is the capital of France?",
    "Paris is the capital of France.",
    "Berlin is the capital of Germany."
]

inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    # Pass the target_dim parameter to explicitly slice the Matryoshka dimensions
    # Valid options: 4, 8, 16, 32, 64, 128
    # The output is a tensor of shape (batch, target_dim + 1) -> (t, spatial_dims)
    lorentz_vectors = model(**inputs, target_dim=64)

print(lorentz_vectors.shape)
# Output: torch.Size([3, 65]) (1 time dimension + 64 spatial dimensions)
2. Distance Calculation (Crucial)
For vector search and clustering, never use cosine similarity or Euclidean L2 distance.
The vectors reside on a hyperboloid, so you must use the Lorentz distance.
def lorentz_dist(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """
    Computes the exact hyperbolic distance between two batches of Lorentz vectors.
    """
    # Lorentz metric signature (- + + ...)
    u_0, u_x = u[..., 0:1], u[..., 1:]
    v_0, v_x = v[..., 0:1], v[..., 1:]
    # Minkowski inner product
    inner_product = -u_0 * v_0 + (u_x * v_x).sum(dim=-1, keepdim=True)
    # Clamp to the acosh domain (inner product <= -1) to avoid numerical
    # instability for extremely close vectors
    inner_product = torch.min(inner_product, torch.tensor(-1.0, device=u.device))
    return torch.acosh(-inner_product).squeeze(-1)
# Calculate distance between text 1 and text 2
distance = lorentz_dist(lorentz_vectors[0], lorentz_vectors[1])
print(f"Hyperbolic Distance: {distance.item():.4f}")
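The distance function can be sanity-checked without downloading the model. A self-contained sketch, assuming the unit-curvature hyperboloid convention -t² + ||x||² = -1 that the (t, spatial_dims) output shape suggests; the `lift` helper is an illustrative stand-in for the model's own projection, not part of the released API:

```python
# Sanity check for the Lorentz distance on synthetic hyperboloid points,
# assuming unit negative curvature: -t^2 + ||x||^2 = -1.
import torch

def lorentz_dist(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    u_0, u_x = u[..., 0:1], u[..., 1:]
    v_0, v_x = v[..., 0:1], v[..., 1:]
    inner = -u_0 * v_0 + (u_x * v_x).sum(dim=-1, keepdim=True)
    inner = torch.min(inner, torch.tensor(-1.0, device=u.device))  # acosh domain guard
    return torch.acosh(-inner).squeeze(-1)

def lift(x: torch.Tensor) -> torch.Tensor:
    """Lift a spatial vector onto the hyperboloid: t = sqrt(1 + ||x||^2)."""
    t = torch.sqrt(1.0 + (x * x).sum(dim=-1, keepdim=True))
    return torch.cat([t, x], dim=-1)

torch.manual_seed(0)
u, v = lift(torch.randn(64)), lift(torch.randn(64))

assert lorentz_dist(u, u).item() < 1e-2                       # self-distance ~ 0
assert lorentz_dist(u, v).item() > 0                          # distinct points
assert torch.isclose(lorentz_dist(u, v), lorentz_dist(v, u))  # symmetric
```

These are the metric properties a vector store's custom distance hook must preserve when indexing these embeddings.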
🛡️ Intended Use Cases
- Next-Gen Vector Search: Leverage HyperspaceDB to build highly efficient semantic search engines. Achieve 160x data compression without sacrificing large-model quality, enabling billion-scale search on mid-range hardware.
- Infinite Hierarchy Explorer: Map entire global taxonomies, corporate knowledge bases, or scientific ontologies natively. Lorentz space lets you represent deep tree-like structures with arbitrarily low distortion, which is mathematically impossible in Euclidean space.
- Edge-AI & Satellite RAG: Deploy state-of-the-art retrieval systems on hardware with extreme constraints (IoT, mobile, orbiting stations). Use 4D-16D vectors to reduce bandwidth and storage while maintaining >90% recall.
- Latent Knowledge Graph Discovery: Manifest hidden structural relationships in unstructured text. Automatically group concepts based on hyper-latent hierarchies for deep analytical insights into complex datasets.
- Privacy-Driven Embeddings: Perform high-quality retrieval with ultra-low dimensions (4D-8D), making reverse-engineering of original content exponentially harder while retaining the semantic core of the data.
🔗 LangChain Integration
We provide a langchain_wrapper.py in the repository that natively subclasses LangChain's Embeddings interface.
from langchain_wrapper import YarHyperbolicEmbeddings
# Initialize the embedding model (downloads automatically from YARlabs/v5_Embedding_0.5B)
embeddings = YarHyperbolicEmbeddings(target_dim=128)
vectors = embeddings.embed_documents(["Hello World!"])
Note: Ensure your VectorStore supports custom distance metrics: the returned embeddings are Lorentz vectors, for which cosine similarity will not work properly.
License
Provided explicitly for YAR.INK infrastructure.
Author: YARlabs
Likes: 2
Downloads: 0
Tags: transformers, onnx, safetensors, yar, feature-extraction, endpoints, embedding, retrieval, hyperbolic-geometry, matryoshka, custom_code, region:us
tiiuae/siglino-30M
license: apache-2.0
tags:
- vision
- feature-extraction
- image-feature-extraction
SigLino-30M
Accepted at CVPR 2026

This work stems from the CVPR 2026 AMoE paper, which designs and applies distillation into a Mixture-of-Experts (MoE) vision architecture. We have chosen the name SigLino for better clarity (SigLIP2 + DINOv3).
Dense variant of SigLino. 30M parameters.
Part of the SigLino model family.
Usage
import torch
from PIL import Image
from transformers import AutoModel, AutoImageProcessor
model_id = "tiiuae/siglino-30M"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).to("cuda", dtype=torch.bfloat16)
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)
image = Image.open("image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt").to("cuda")
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)
with torch.no_grad():
    outputs = model(**inputs)

# Options: 'siglino' (384d), 'siglip2' (1152d), 'dinov3' (1024d)
patch_features = outputs["patch_features"]["siglino"]      # (Batch, Tokens, 384)
summary_features = outputs["summary_features"]["siglip2"]  # (Batch, 1152)
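The summary features above are the natural input for image-to-image retrieval (as in the kNN results below). A minimal sketch of cosine-similarity kNN; random stand-ins replace real model outputs so the snippet runs without weights:

```python
# Cosine kNN over (Batch, 1152) summary features. The random gallery is a
# stand-in for real outputs["summary_features"]["siglip2"] tensors.
import torch

torch.manual_seed(0)
gallery = torch.randn(100, 1152)                  # 100 gallery images
query = gallery[7] + 0.01 * torch.randn(1152)     # near-duplicate of item 7

# L2-normalize, then rank by dot product (= cosine similarity)
gallery_n = torch.nn.functional.normalize(gallery, dim=-1)
query_n = torch.nn.functional.normalize(query, dim=0)
scores = gallery_n @ query_n                      # (100,) similarities
topk = scores.topk(5).indices
print(topk.tolist())                              # item 7 should rank first
```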
Model Details
| Property | Value |
|----------|-------|
| Architecture | Dense |
| Parameters | 0.03B |
| Layers | 12 |
| Hidden Dim | 384 |
| FFN Dim | 1536 |
| Patch Size | 16x16 |
| Teachers | DINOv3, SigLIP2 |
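As a rough consistency check on the table, the per-block weight matrices of a standard ViT (attention qkv/out = 4·d², MLP = 2·d·ffn) already account for most of the parameter budget. A back-of-envelope lower bound that ignores biases, norms, the patch embedding, and the distillation heads:

```python
# Lower-bound parameter count from the table's dimensions: 12 layers,
# hidden dim 384, FFN dim 1536; weight matrices only.
d, ffn, layers = 384, 1536, 12
per_layer = 4 * d * d + 2 * d * ffn   # attention (qkv + out) + MLP
total = layers * per_layer
print(f"{total / 1e6:.1f}M")          # lower bound, consistent with ~30M overall
```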
Results (512x512, ensemble features)
| Task | Metric | Score |
|------|--------|-------|
| kNN (ImageNet) | Acc | 79.0 |
| kNN (6-dataset avg) | Acc | 83.3 |
| Zero-shot cls (ImageNet) | Acc | 65.1 |
| Flickr30K I2T | R@1 | 82.2 |
| MSCOCO I2T | R@1 | 59.7 |
| Pascal VOC (1024) | mIoU | 82.1 |
| Cityscapes (1024) | mIoU | 59.2 |
Citation
@article{chaybouti2025amoe,
  title={AMoE: Agglomerative Mixture-of-Experts Vision Foundation Models},
  author={Chaybouti, Sofian and Narayan, Sanath and Dahou, Yasser and Le Khac, Phuc H. and Singh, Ankit and Huynh, Ngoc Dung and Para, Wamiq Reyaz and Kuehne, Hilde and Hacid, Hakim},
  journal={arXiv preprint arXiv:2512.20157},
  year={2025}
}
Author: tiiuae
Likes: 2
Downloads: 0
Tags: safetensors, siglino, vision, feature-extraction, image-feature-extraction, custom_code, arxiv:2512.20157, license:apache-2.0, region:us