Today's AI Summary

AI Research Highlights: Reasoning, Tool Use, and Model Efficiency Take Center Stage

Today's AI landscape is buzzing with advancements across several key areas, including enhanced reasoning capabilities in large language models, improved tool utilization, and techniques for boosting model efficiency.

Research Papers:

  • Generalizable Geometric Image Caption Synthesis: A new paper introduces a reinforcement learning-based approach (RLVR) to improve the generation of high-quality image-text datasets for geometric problem-solving. By refining captions for geometric images, the pipeline enhances the general reasoning capabilities of multimodal LLMs, leading to accuracy improvements in various tasks.
  • FlowRL: Matching Reward Distributions for LLM Reasoning: This paper proposes a novel reinforcement learning method called FlowRL, which focuses on matching the full reward distribution rather than simply maximizing rewards (a schematic contrast follows this list). This approach promotes diverse exploration and generalizable reasoning trajectories, resulting in significant improvements on math and code reasoning tasks.
  • Orion: Fuzzing Workflow Automation: Orion is a framework that automates the manual bottlenecks of fuzzing by integrating LLM reasoning with traditional tools. It reduces human effort significantly and demonstrates its effectiveness by discovering previously unknown vulnerabilities.
  • Fast and Fluent Diffusion Language Models: This research addresses the long decoding-window problem in diffusion-based language models by introducing Convolutional decoding (Conv) and Rejecting Rule-based Fine-Tuning (R2FT). These methods achieve state-of-the-art results on open-ended generation benchmarks with improved speed and quality.
  • Internalizing Self-Consistency in Language Models: This paper introduces Multi-Agent Consensus Alignment (MACA), a reinforcement learning framework that post-trains models to favor reasoning trajectories aligned with their internal consensus. MACA drives substantial improvements across self-consistency, single-agent reasoning, sampling-based inference, and multi-agent ensemble decision-making.
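As a schematic of the FlowRL idea referenced above (this is only a sketch of the general distinction, not the paper's exact objective): reward maximization pushes the policy toward the single highest-reward mode, whereas reward-distribution matching pulls the policy toward a target distribution induced by the reward, for example

\max_\theta \; \mathbb{E}_{x \sim \pi_\theta}\!\left[r(x)\right] \quad \text{vs.} \quad \min_\theta \; D_{\mathrm{KL}}\!\left(\pi_\theta \,\middle\|\, p_r\right), \qquad p_r(x) \propto \exp\!\left(r(x)/\beta\right),

so distinct high-reward trajectories all retain probability mass instead of collapsing onto a single mode.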

Models:

  • LongCat-Flash-Thinking: Meituan-Longcat has released LongCat-Flash-Thinking, a 560 billion parameter large reasoning model (LRM) featuring a Mixture-of-Experts (MoE) architecture. It uses a dynamic computation mechanism to optimize computational efficiency and performance. The model is trained using a domain-parallel RL training methodology and is designed for general, formal, and agentic reasoning. It achieves strong performance across a range of benchmarks.
  • Qwen3-4B-toolcalling-gguf-codex: This model is a specialized version of Qwen3 4B, fine-tuned for tool calling. It is optimized for local deployment with GGUF format and is trained on 60K function calling examples. It offers high function call accuracy and is suitable for building AI agents and local coding assistants.
  • Qwen3-4b-toolcall-gguf-llamacpp-codex: Another tool-calling focused model based on Qwen3, this version is optimized for use with llama.cpp, enabling local deployment with minimal VRAM requirements.
  • Qwen-7B-toolcalling-ReSearch-gguf-Q8_0-codex: This model combines tool-calling capabilities with the ReSearch-Qwen-7B model, which is trained to reason with search via reinforcement learning. It is packaged in GGUF format for efficient local inference.

Key Takeaways:

  • Reasoning is a Major Focus: Several research efforts are dedicated to improving the reasoning abilities of LLMs, particularly in geometric, mathematical, and coding domains.
  • Tool Use is Gaining Traction: Models like Qwen3-4B-toolcalling-gguf-codex demonstrate the increasing importance of enabling LLMs to effectively utilize external tools and APIs.
  • Efficiency Matters: Techniques like quantization (GGUF format) and dynamic computation mechanisms (LongCat-Flash-Thinking) are crucial for deploying large models in resource-constrained environments.
  • RL is a Powerful Tool: Reinforcement learning continues to be a valuable approach for training and aligning LLMs, as evidenced by FlowRL and MACA.

AI Papers for 2026-03-23

NavTrust: Benchmarking Trustworthiness for Embodied Navigation

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world settings. To address this gap, we present NavTrust, a unified benchmark that systematically corrupts input modalities, including RGB, depth, and instructions, in realistic scenarios and evaluates their impact on navigation performance. To the best of our knowledge, NavTrust is the first benchmark that exposes embodied navigation agents to diverse RGB-Depth corruptions and instruction variations in a unified framework. Our extensive evaluation of seven state-of-the-art approaches reveals substantial performance degradation under realistic corruptions, which highlights critical robustness gaps and provides a roadmap toward more trustworthy embodied navigation systems. Furthermore, we systematically evaluate four distinct mitigation strategies to enhance robustness against RGB-Depth and instruction corruptions. Our base models include Uni-NaVid and ETPNav; we deployed them on a real mobile robot and observed improved robustness to corruptions. The project website is: https://navtrust.github.io.

FinTradeBench: A Financial Reasoning Benchmark for LLMs

Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with the advancement of Large Language Models (LLMs), financial analysts have begun to use them for financial decision-making tasks. However, existing financial question answering benchmarks for testing these models primarily focus on company balance sheet data and rarely evaluate reasoning over how company stocks trade in the market or their interactions with fundamentals. To take advantage of the strengths of both approaches, we introduce FinTradeBench, a benchmark for evaluating financial reasoning that integrates company fundamentals and trading signals. FinTradeBench contains 1,400 questions grounded in NASDAQ-100 companies over a ten-year historical window. The benchmark is organized into three reasoning categories: fundamentals-focused, trading-signal-focused, and hybrid questions requiring cross-signal reasoning. To ensure reliability at scale, we adopt a calibration-then-scaling framework that combines expert seed questions, multi-model response generation, intra-model self-filtering, numerical auditing, and human-LLM judge alignment. We evaluate 14 LLMs under zero-shot prompting and retrieval-augmented settings and observe a clear performance gap. Retrieval substantially improves reasoning over textual fundamentals, but provides limited benefit for trading-signal reasoning. These findings highlight fundamental challenges in numerical and time-series reasoning for current LLMs and motivate future research in financial intelligence.

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques, we present models that are far more efficient than previous LLM-based embedding models while retaining competitive performance. Extensive evaluations confirm that F2LLM-v2-14B ranks first on 11 MTEB benchmarks, while the smaller models in the family also set a new state of the art for resource-constrained applications. To facilitate open-source embedding model research, we release all models, data, code, and intermediate checkpoints.

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. Relative to Nemotron-Cascade 1, the key technical advancements are as follows. After SFT on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains. Furthermore, we introduce multi-domain on-policy distillation from the strongest intermediate teacher models for each domain throughout the Cascade RL process, allowing us to efficiently recover benchmark regressions and sustain strong performance gains along the way. We release the collection of model checkpoints and training data.

DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising

Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional structure of parts. While recent part-aware approaches introduce decomposition, they remain largely geometry-focused, lacking semantic grounding and failing to model how parts align with textual descriptions or their inter-part relations. We propose DreamPartGen, a framework for semantically grounded, part-aware text-to-3D generation. DreamPartGen introduces Duplex Part Latents (DPLs) that jointly model each part's geometry and appearance, and Relational Semantic Latents (RSLs) that capture inter-part dependencies derived from language. A synchronized co-denoising process enforces mutual geometric and semantic consistency, enabling coherent, interpretable, and text-aligned 3D synthesis. Across multiple benchmarks, DreamPartGen delivers state-of-the-art performance in geometric fidelity and text-shape alignment.

$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal Equivalence

Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three special types--those whose $R$-equivalence he could not bound by proving the universal (admissible) equivalence is trivial. We consider all surfaces $V$ currently known to have non-trivial universal equivalence. Beyond being intractable to Swinnerton-Dyer's approach, we observe that if these surfaces also had non-trivial $R$-equivalence, they would contradict Colliot-Thélène and Sansuc's conjecture regarding the $k$-rationality of universal torsors for geometrically rational surfaces. By devising new methods to study $R$-equivalence, we prove that for 2-adic surfaces with all-Eckardt reductions (the third special type, which contains every existing case of non-trivial universal equivalence), $R$-equivalence is trivial or of exponent 2. For the explicit cases, we confirm triviality: the diagonal cubic $X^3+Y^3+Z^3+\zeta_3 T^3=0$ over $\mathbb{Q}_2(\zeta_3)$--answering a long-standing question of Manin's (Cubic Forms, 1972)--and the cubic with universal equivalence of exponent 2 (Kanevsky, 1982). This is the first in a series of works derived from a year of interactions with generative AI models such as AlphaEvolve and Gemini 3 Deep Think, with the latter proving many of our lemmas. We disclose the timeline and nature of their use towards this paper, and describe our broader AI-assisted research program in a companion report (in preparation).

OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Themis, a scalable and accurate multi-agent critic framework. Unlike a single judge, OS-Themis decomposes trajectories into verifiable milestones to isolate critical evidence for decision making and employs a review mechanism to strictly audit the evidence chain before making the final verdict. To facilitate evaluation, we further introduce OmniGUIRewardBench (OGRBench), a holistic cross-platform benchmark for GUI outcome rewards, where all evaluated models achieve their best performance under OS-Themis. Extensive experiments on AndroidWorld show that OS-Themis yields a 10.3% improvement when used to support online RL training, and a 6.9% gain when used for trajectory validation and filtering in the self-training loop, highlighting its potential to drive agent evolution.

Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at the behavioral level and may lack explicit architectural mechanisms for enforcing reasoning process integrity. This paper proposes the Box Maze framework, a conceptual process-control architecture that decomposes LLM reasoning into three explicit layers: memory grounding, structured inference, and boundary enforcement. We introduce preliminary simulation-based evaluation involving progressive boundary erosion scenarios across multiple heterogeneous LLM systems (DeepSeek-V3, Doubao, Qwen). Results from n=50 adversarial scenarios suggest that explicit cognitive control layers may improve consistency in boundary maintenance, with architectural constraints reducing boundary failure rates from approximately 40% (baseline RLHF) to below 1% under adversarial conditions. While current validation is simulation-based, these preliminary results indicate that process-level control may offer a promising direction for improving reliability in large language model reasoning.

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward and backward workloads across BF16, FP8, and NVFP4, including kernels whose best performance is expected to rely on Blackwell-specific capabilities. Unlike prior benchmarks that evaluate kernels primarily relative to software implementations, SOL-ExecBench measures performance against analytically derived Speed-of-Light (SOL) bounds computed by SOLAR, our pipeline for deriving hardware-grounded SOL bounds, yielding a fixed target for hardware-efficient optimization. We report a SOL Score that quantifies how much of the gap between a release-defined scoring baseline and the hardware SOL bound a candidate kernel closes. To support robust evaluation of agentic optimizers, we additionally provide a sandboxed harness with GPU clock locking, L2 cache clearing, isolated subprocess execution, and static analysis based checks against common reward-hacking strategies. SOL-ExecBench reframes GPU kernel benchmarking from beating a mutable software baseline to closing the remaining gap to hardware Speed-of-Light.

ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angiography Analysis

Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADNE, a two-stage framework coupling preference-aligned perception with RL-based diagnostic reasoning for topologically coherent stenosis detection. The perception module employs DPO to fine-tune the Sa2VA vision-language foundation model using Betti number constraints as preference signals, aligning the policy toward geometrically complete vessel structures rather than pixel-wise overlap metrics. The reasoning module formulates stenosis localization as a Markov Decision Process with an explicit rejection mechanism that autonomously defers ambiguous anatomical candidates such as bifurcations and vessel crossings, shifting from coverage maximization to reliability optimization. On 1,400 clinical angiograms, ARIADNE achieves a state-of-the-art centerline Dice of 0.838 and reduces false positives by 41% compared to geometric baselines. External validation on multi-center benchmarks ARCADE and XCAD confirms generalization across acquisition protocols. This represents the first application of DPO for topological alignment in medical imaging, demonstrating that preference-based learning over structural constraints mitigates topological violations while maintaining diagnostic sensitivity in interventional cardiology workflows.

AI Models

HauhauCS/Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive


license: apache-2.0
tags: uncensored, qwen3.5, moe, gguf, vision, multimodal
language: en, zh, multilingual
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen3.5-122B-A10B

Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive

Qwen3.5-122B-A10B uncensored by HauhauCS. 0/465 refusals.

About

No changes to datasets or capabilities. Fully functional, 100% of what the original authors intended - just without the refusals.

These are meant to be the best lossless uncensored models out there.

Aggressive Variant

Stronger uncensoring — model is fully unlocked and won't refuse prompts. Disclaimers that were present in previous releases have been significantly reduced in this version.

For a more conservative uncensor that keeps some safety guardrails, check the Balanced variant when it's available.

What are K_P quants?

K_P ("Perfect") quants are HauhauCS custom quantizations that use model-specific analysis to selectively preserve quality where it matters most. Each model gets its own optimized quantization profile.

A K_P quant effectively bumps quality up by 1-2 quant levels at only ~5-15% larger file size than the base quant. Fully compatible with llama.cpp, LM Studio, and any GGUF-compatible runtime — no special builds needed.
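As a quick check, the overhead can be computed directly from the sizes listed in the Downloads table below:

# Overhead of the K_P variants relative to their base quants, using the sizes
# (in GB) from the Downloads table below.
pairs = {
    "Q6": (105, 100),  # Q6_K_P vs Q6_K
    "Q5": (94, 87),    # Q5_K_P vs Q5_K_M
    "Q4": (79, 74),    # Q4_K_P vs Q4_K_M
    "Q3": (63, 59),    # Q3_K_P vs Q3_K_M
}
for level, (kp, base) in pairs.items():
    print(f"{level}: K_P is {100 * (kp - base) / base:.1f}% larger than the base quant")
# Prints roughly 5-8%, consistent with the "~5-15% larger" claim above.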

Downloads

| File | Quant | Size |
|------|-------|------|
| Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf | Q8_K_P | 145 GB |
| Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-Q6_K_P.gguf | Q6_K_P | 105 GB |
| Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-Q6_K.gguf | Q6_K | 100 GB |
| Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-Q5_K_P.gguf | Q5_K_P | 94 GB |
| Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-Q5_K_M.gguf | Q5_K_M | 87 GB |
| Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf | Q4_K_P | 79 GB |
| Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf | Q4_K_M | 74 GB |
| Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-IQ4_XS.gguf | IQ4_XS | 65 GB |
| Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-Q3_K_P.gguf | Q3_K_P | 63 GB |
| Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-Q3_K_M.gguf | Q3_K_M | 59 GB |
| Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-IQ3_M.gguf | IQ3_M | 54 GB |
| Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-IQ3_XXS.gguf | IQ3_XXS | 47 GB |
| Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf | IQ2_M | 40 GB |
| mmproj-Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-f16.gguf | mmproj (f16) | 867 MB |

Note: K_P quants may show as "?" in LM Studio's quant column. This is a display issue only — the model loads and runs fine.

Specs

  • 122B total parameters, ~10B active per forward pass (MoE)
  • 256 experts, 8 routed + 1 shared per token
  • Hybrid architecture: Gated DeltaNet linear attention + full softmax attention (3:1 ratio)
  • 48 layers, pattern: 12 x (3 x DeltaNet-MoE + 1 x Attention-MoE)
  • 262K native context
  • Natively multimodal (text, image, video)
  • 248K vocabulary, 201 languages
  • Based on Qwen/Qwen3.5-122B-A10B

Recommended Settings

From the official Qwen authors:

Thinking mode (default):

  • General: temperature=1.0, top_p=0.95, top_k=20, min_p=0, presence_penalty=1.5
  • Coding/precise tasks: temperature=0.6, top_p=0.95, top_k=20, min_p=0, presence_penalty=0

Non-thinking mode:

  • General: temperature=0.7, top_p=0.8, top_k=20, min_p=0, presence_penalty=1.5
  • Reasoning tasks: temperature=1.0, top_p=1.0, top_k=40, min_p=0, presence_penalty=2.0

Important:

  • Use --jinja flag with llama.cpp for proper chat template handling
  • Thinking mode is on by default — to disable, use --chat-template-kwargs '{"enable_thinking":false}' or edit the jinja template
  • Vision support requires the mmproj file alongside the main GGUF

Usage

Works with llama.cpp, LM Studio, Jan, koboldcpp, and other GGUF-compatible runtimes.

# Text only
llama-cli -m Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf \
  --jinja -c 131072 -ngl 99

# With vision
llama-cli -m Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf \
  --mmproj mmproj-Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-f16.gguf \
  --jinja -c 131072 -ngl 99
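For programmatic use, the recommended sampler settings above map onto llama-cpp-python's chat-completion parameters. A minimal sketch, assuming a recent llama-cpp-python build (parameter names as exposed by its API) and the Q4_K_P file from the table; values are the "Thinking mode (default), General" row:

# Minimal llama-cpp-python sketch (assumed API; treat as illustrative, not official usage).
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf",
    n_ctx=131072,      # same context size as the llama-cli examples above
    n_gpu_layers=99,   # offload layers to GPU, mirroring -ngl 99
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Describe this model's architecture in one paragraph."}],
    temperature=1.0,
    top_p=0.95,
    top_k=20,
    min_p=0.0,
    presence_penalty=1.5,
)
print(out["choices"][0]["message"]["content"])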

Other Models

Author: HauhauCS

Likes: 34

Downloads: 254

Tags: gguf, uncensored, qwen3.5, moe, vision, multimodal, image-text-to-text, en, zh, multilingual, base_model:Qwen/Qwen3.5-122B-A10B, base_model:quantized:Qwen/Qwen3.5-122B-A10B, license:apache-2.0, endpoints_compatible, region:us, imatrix, conversational

AesSedai/Nemotron-Cascade-2-30B-A3B-GGUF


base_model: nvidia/Nemotron-Cascade-2-30B-A3B

Description

This repo contains specialized MoE quants for Nemotron-Cascade-2-30B-A3B. Because the FFN tensors are huge compared to the rest of the tensors in the model, it should be possible to achieve better quality at a smaller overall model size than a comparable naive quantization. To that end, the default quantization type is kept high-quality, while the FFN UP and FFN GATE tensors are quantized down along with the FFN DOWN tensors.

Notes

This model is a little weird, similar to the other recent Nemotrons. There is no ffn_gate_exps tensor, and the ffn_up_exps and ffn_down_exps tensors have rows of 2688 elements, which makes them incompatible with most Q*_K quantizations.

Therefore, most of the quants have to use IQ4_NL, Q4_0/Q4_1, and Q5_0/Q5_1 quantizations for the FFNs.
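The incompatibility is just a divisibility constraint: llama.cpp's K-quants pack rows in super-blocks of 256 values, while Q4_0/Q5_0 and IQ4_NL use 32-value blocks, and a tensor's row length must be a multiple of the block size. A quick check (block sizes as commonly documented for llama.cpp; treat this as a sketch):

# Why the expert FFN tensors can't use K-quants: 2688 is not a multiple of 256,
# but it is a multiple of 32, so only the 32-value block formats fit.
row_len = 2688  # per-row element count of ffn_up_exps / ffn_down_exps in this model

for name, block in [("Q*_K (super-block)", 256), ("Q4_0 / Q5_0 / IQ4_NL", 32)]:
    ok = row_len % block == 0
    print(f"{name:22s} block={block:3d} -> {'compatible' if ok else 'incompatible'} "
          f"(remainder {row_len % block})")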

| Quant | Size | Mixture | PPL | 1-(Mean PPL(Q)/PPL(base)) | KLD |
| :--------- | :--------- | :------- | :------- | :------- | :------- |
| Q8_0 | 31.27 GiB (8.51 BPW) | Q8_0 (reference) | 9.743360 ± 0.072693 | +0.1278% | 0.003439 ± 0.000025 |
| Q5_K_M | 27.00 GiB (7.34 BPW) | Q8_0 / Q5_1 / X / Q8_0 | 9.752863 ± 0.072779 | +0.2255% | 0.004316 ± 0.000033 |
| Q4_K_M | 21.87 GiB (5.95 BPW) | Q8_0 / Q5_0 / X / Q5_1 | 9.760517 ± 0.072841 | +0.3041% | 0.005375 ± 0.000036 |
| Q4_0 | 19.30 GiB (5.25 BPW) | Q8_0 / Q4_0 / X / Q5_0 | 9.775306 ± 0.072933 | +0.4561% | 0.008387 ± 0.000053 |
| IQ4_XS | 17.59 GiB (4.79 BPW) | Q8_0 / IQ4_NL / X / IQ4_NL | 9.802367 ± 0.073142 | +0.7342% | 0.009969 ± 0.000062 |

(KLD and perplexity comparison graphs omitted.)

Author: AesSedai

Likes: 5

Downloads: 0

Tags: gguf, base_model:nvidia/Nemotron-Cascade-2-30B-A3B, base_model:quantized:nvidia/Nemotron-Cascade-2-30B-A3B, endpoints_compatible, region:us, imatrix, conversational

Alissonerdx/LTX-LoRAs


license: apache-2.0
base_model: Lightricks/LTX-2.3
tags: ltx, lora, inpaint

LoRAs for LTX 2.3

Here I will share some LoRAs that I trained for LTX 2.3.

These LoRAs may cover different use cases over time, so this repository is not limited to inpainting only.

Models

| File | Description |
|---|---|
| ltx23_inpaint_rank128_v1_02500steps.safetensors | (Recommended) Follows the prompt better, probably because it experienced less overfitting. |
| ltx23_inpaint_rank128_v1_10000steps.safetensors | Follows the prompt in a more limited way, but uses the mask area better. This is probably because it experienced more overfitting after a longer training time on a more limited dataset. |

Important inference notes for the inpainting LoRAs

These inpainting LoRAs were trained with a specific guide and mask setup, so input preparation during inference is important.

How to use the mask

During inference, you should not pass the mask as a separate channel.

The mask must be embedded into the guide video, which means:

  • the mask video
  • and the guide video

must be treated as a single video.

After that, you need to use the LTXVAddGuideMulti node to pass the guide video into the model.

About the mask format used during training

My dataset included samples where the mask was more blockified. In other words, the default pattern used 8x8 blocks.

To better reproduce the training conditions during inference, you can use:

  • Blockify Mask from KJNodes

This may help make the mask distribution closer to what the model saw during training.
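For intuition, blockifying simply snaps the mask to a coarse 8x8 grid before it is embedded into the guide video. A minimal NumPy sketch of the idea (illustrative only, not the KJNodes implementation):

import numpy as np

def blockify_mask(mask: np.ndarray, block: int = 8) -> np.ndarray:
    # Snap a binary mask (H, W) to a block grid: a block is fully masked
    # if any pixel inside it is masked. Illustrative only.
    h, w = mask.shape
    ph, pw = (-h) % block, (-w) % block              # pad up to a multiple of `block`
    padded = np.pad(mask, ((0, ph), (0, pw)))
    blocks = padded.reshape(padded.shape[0] // block, block,
                            padded.shape[1] // block, block)
    coarse = blocks.max(axis=(1, 3))                 # 1 if any pixel in the block is set
    return np.kron(coarse, np.ones((block, block), dtype=mask.dtype))[:h, :w]

mask = np.zeros((64, 64), dtype=np.uint8)
mask[10:30, 20:45] = 1
print(blockify_mask(mask).shape)  # (64, 64), values snapped to 8x8 blocks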

Notes

  • Base model: Lightricks/LTX-2.3
  • Checkpoint behavior may vary significantly in terms of:
    • prompt adherence
    • use of the masked area
    • overfitting tendency

Practical recommendations

For the inpainting LoRAs in this repo:

  • If you want better prompt adherence, try the 2500 steps checkpoint first
  • If you want better use of the masked area, try the 10000 steps checkpoint first

The best approach is to compare both in your workflow, since preference may vary depending on the scene, mask, and prompt.


Examples — 2500 Steps

(Example videos and prompts for ltx23_inpaint_rank128_v1_02500steps.safetensors omitted.)

Examples — 10000 Steps

(Example videos and prompts for ltx23_inpaint_rank128_v1_10000steps.safetensors omitted.)

Author: Alissonerdx

Likes: 4

Downloads: 0

Tags: ltx, lora, inpaint, base_model:Lightricks/LTX-2.3, base_model:adapter:Lightricks/LTX-2.3, license:apache-2.0, region:us

wcn123/Qwen3.5-27B-WebNovel-Writer-zh


language: zh
library_name: transformers
base_model: Qwen/Qwen3.5-27B
base_model_relation: finetune
tags: qwen, dora, orpo, chinese, creative-writing, web-novel
license: other

Qwen3.5-27B-WebNovel-Writer-zh

A Chinese web-novel writing-style fine-tune based on Qwen3.5-27B-text, aimed at removing the "AI flavor" and aligning with the plain, descriptive prose style of original web novels.

The base model Qwen3.5-27B-text is the author's custom build with the text-only weights extracted from Qwen3.5-27B; it is not an official release. A Q4_K_M quantized version is also available.

Training

Two-stage fine-tuning:

  1. SFT: 20,320 Chinese web-novel passages, DoRA 16-bit, to learn narrative pacing and plain descriptive style
  2. ORPO: 8,000 preference pairs, to reinforce the de-AI-flavor tendency

| Parameter | Value |
|------|----|
| Base | Qwen3.5-27B-text |
| Method | DoRA 16-bit |
| LoRA rank / alpha | 64 / 128 |
| Target modules | 12 (full attention + MLP) |
| SFT LR | 3e-5 |
| ORPO LR / beta | 5e-6 / 0.15 |

Usage Recommendations

System prompt:

你是一位中文西幻网文写作助手,擅长创作高质量的小说正文。请根据用户的指令完成写作任务。

The training data contains the following task formats; inputs that follow the corresponding format give the best results:

Prose enhancement (beat → full scene)

任务:正文增强
目标:让这段更像成熟作者写出的西幻正文
输入类型:场景beat
增强强度:中(从场景beat到完整场景)
长度目标:2.0x~3.5x
重点:节奏铺排、心理层次、环境渲染、微动作衔接
限制:严格保留骨架中的全部事实、人名、地名、术语、事件顺序与结果。不新增原文没有的信息。

素材:
[场景beat内容]

Skeleton expansion (skeleton → full scene)

请将下面文本增强为更自然的小说正文。

输入类型:事件骨架
增强强度:高(从纯骨架到完整场景)
长度目标:3.5x~6.0x
重点:从骨架扩写完整场景:叙事结构、节奏铺排、感官填充、对话还原
要求:严格保留骨架中的全部事实、人名、地名、术语、事件顺序与结果。不新增原文没有的信息。

文本:
[事件骨架内容]

Scene expansion (description → prose)

任务:场景扩写
场景描述:[场景描述]
要求:西幻风格,注重场景感和节奏,保持人物行为合理。

请写出完整的小说场景。

Prose polishing

任务:正文润色
要求:提升文笔、优化节奏、保留原意

原文:
[待润色文本]

Recommended inference parameters: temperature=0.7, top_p=0.90, repetition_penalty=1.10.
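A minimal transformers sketch applying these parameters with the scene-expansion template (the model id is taken from this card; trust_remote_code and chat-template behavior are assumptions, since the base is a custom text-only Qwen3.5 build):

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "wcn123/Qwen3.5-27B-WebNovel-Writer-zh"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [
    {"role": "system", "content": "你是一位中文西幻网文写作助手,擅长创作高质量的小说正文。请根据用户的指令完成写作任务。"},
    # Replace [场景描述] with your own scene description.
    {"role": "user", "content": "任务:场景扩写\n场景描述:[场景描述]\n要求:西幻风格,注重场景感和节奏,保持人物行为合理。\n\n请写出完整的小说场景。"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.90,
    repetition_penalty=1.10,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))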

Data

Chinese web-novel prose corpus; copyright belongs to the original authors and the dataset is not released.

License

Follows the Qwen License; for research and personal use only.

Author: wcn123

Likes: 4

Downloads: 0

Tags: transformers, safetensors, qwen3_5_text, text-generation, qwen, dora, orpo, chinese, creative-writing, web-novel, conversational, zh, base_model:Qwen/Qwen3.5-27B, base_model:finetune:Qwen/Qwen3.5-27B, license:other, endpoints_compatible, region:us

CompactAI/TMLM-Haiku-1.3


license: apache-2.0
datasets: HuggingFaceFW/fineweb-edu, mattwesney/General_Inquiry_Thinking-Chain-Of-Thought, tatsu-lab/alpaca, databricks/databricks-dolly-15k, TeichAI/Step-3.5-Flash-2600x, TeichAI/convo-v1
language: en
tags: small, haiku

TinyMemoryLM (Haiku)

⚠️ IMPORTANT NOTICE

  1. The model is really dumb. This is a sub-1M parameter research model designed for experimentation, not production use.
  2. Do not expect it to answer any questions. It is prone to repetition, hallucination, and format collapse.

Overview

TinyMemoryLM is an ultra-lightweight language model optimized for edge cases and architectural experimentation. Despite its small footprint, it incorporates several novel training innovations aimed at stabilizing tiny model convergence, including hybrid tokenization, loss boosting strategies, and context-aware relevance modeling.

This release includes both Pretrained Weights (base language modeling) and Instruction Weights (fine-tuned for chat/completion).

Files Provided

| File | Description |
| :--- | :--- |
| tokenizer.json | Hybrid word/character tokenizer vocabulary (2,133 tokens). |
| pretrain.pt | Base pretrained checkpoint (language modeling). |
| model.pt | Instruction-tuned checkpoint (SFT/Chat). |
| samples.jsonl | Sample generations with NLL/PPL metrics at checkpoints. |
| loss_curve.png | Training loss progression across all phases. |

Model Specifications

| Parameter | Value |
| :--- | :--- |
| Architecture | Transformer Decoder (GQA) |
| Parameters | ~700K |
| Context Length | 2,048 tokens |
| Sliding Window | 512 tokens |
| Dimensions | d_model=128, unique_layers=8, logical_layers=16, heads=4, kv_heads=2, ffn=224 |
| Vocabulary | ~2,133 tokens (Hybrid Char + Word) |
| Normalization | RMSNorm |
| Embeddings | Rotary Embeddings (RoPE, 25% fraction) |
| Activation | SwiGLU |
| Multi-Token Prediction | Horizons at 2, 3, 4 |

Architecture Highlights

TinyMemoryLM implements several research-focused modifications to standard transformer architectures:

  • Weight-Tied Logical Layers: 8 unique transformer blocks are repeated to create 16 logical layers (every 3rd layer uses global attention vs. sliding window), drastically reducing parameter count.
  • Grouped-Query Attention (GQA): 4 attention heads share 2 KV heads, reducing KV cache and compute.
  • Sliding Window Attention: Local attention within 512-token windows, with periodic global layers for long-range context.
  • Multi-Token Prediction (MTP): Auxiliary prediction heads at horizons 2, 3, and 4 with dedicated adapters and norms, weighted at 0.3 during training.
  • Hybrid Tokenizer: Combines character-level fallback with frequent word tokens to balance compression and vocabulary size.
  • Word Token Loss Boosting: Upweights loss signals for multi-character tokens (3x) to prevent the model from ignoring them in favor of character-level spelling.
  • Response-Start Weighting: Prioritizes the first 20 tokens of assistant responses (3x weight) to improve prompt conditioning.
  • Embedding Scale: Learned scaling factor applied to token embeddings for improved training dynamics.

Training Hyperparameters

| Parameter | Value |
| :--- | :--- |
| Batch Size | 48 |
| Pretrain LR | 8e-4 (min 1e-5) |
| SFT LR | 2e-4 (min 1e-5) |
| Warmup | 300 steps |
| Weight Decay | 0.02 |
| Max Grad Norm | 1.0 |
| MTP Weight | 0.3 |
| Word Token Loss Boost | 3.0x |
| Response-Start Boost | 3.0x (first 20 tokens) |
| Checkpointing | Every 1,000 steps |
| Sampling | Every 5,000 steps |
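The two boost factors in the table above enter training as simple per-token loss weights. A minimal PyTorch-style sketch of the idea (names and shapes are illustrative, not the project's training code):

import torch
import torch.nn.functional as F

def boosted_lm_loss(logits, targets, is_word_token, is_response_start,
                    word_boost=3.0, start_boost=3.0):
    # logits: (B, T, V); targets, is_word_token, is_response_start: (B, T)
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="none"
    ).reshape(targets.shape)
    weights = torch.ones_like(per_token)
    weights = torch.where(is_word_token, weights * word_boost, weights)       # 3x for multi-character tokens
    weights = torch.where(is_response_start, weights * start_boost, weights)  # 3x for the first response tokens
    return (per_token * weights).sum() / weights.sum()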

Training Loss Curve

Training loss progression across pretrain and SFT phases:


Limitations & Expectations

Please manage your expectations when using TinyMemoryLM:

  • Repetition: Tiny models are prone to collapsing into repetitive token loops.
  • Knowledge: The model has limited world knowledge due to parameter constraints.
  • Usage: This model is intended for research, educational purposes, and architectural benchmarking. It is not suitable for assistant tasks or reliable information retrieval.

Generated for research purposes. Use responsibly.

Author: CompactAI

Likes: 2

Downloads: 0

Tags: small, haiku, en, dataset:HuggingFaceFW/fineweb-edu, dataset:mattwesney/General_Inquiry_Thinking-Chain-Of-Thought, dataset:tatsu-lab/alpaca, dataset:databricks/databricks-dolly-15k, dataset:TeichAI/Step-3.5-Flash-2600x, dataset:TeichAI/convo-v1, license:apache-2.0, region:us

LuffyTheFox/OmniClaw-Qwen3.5-9B-Claude-4.6-Opus-Uncensored-v2-GGUF


language: en, zh, ko
license: apache-2.0
base_model: Qwen/Qwen3.5-9B
tags: unsloth, qwen, qwen3.5, reasoning, chain-of-thought, lora, uncensored, not-for-all-audiences
pipeline_tag: image-text-to-text
datasets: nohurry/Opus-4.6-Reasoning-3000x-filtered, Jackrong/Qwen3.5-reasoning-700x, Roman1111111/claude-opus-4.6-10000x

🌟 This is the OmniClaw-Qwen3.5-9B-Claude-4.6-Opus-Uncensored-v2 model with zero refusals, made by merging the HauhauCS model with the latest update of the Jackrong model at 1.0 weight.

🌟 In later stages I merged in the OmniCoder model from Tesslate at 0.5 weight and the creative-writing model from nbeerbower at 0.5 weight.

🌟 Only the fine-tunable weights trained via Unsloth were modified during the merging process, in float32 precision.

If you want to disable thinking, use this chat template in LM Studio, but I don't recommend it for a 9B model because it's already crazy fast: https://pastebin.com/uk9ZkxCR

For best model performance, use the following settings in LM Studio:

Temperature: 0.7

Top K Sampling: 20

Presence Penalty: 1.5

Top P Sampling: 0.8

Min P Sampling: 0

Seed: 3407 or 42

And this system prompt. It's pretty solid: https://pastebin.com/pU25DVnB

This one is simplified but works too: https://pastebin.com/6C4rtujt

You can also use only this string as the System Prompt:

You are Claude, created by Anthropic. You are a helpful AI assistant.

or

You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

And write anything you want after that. The model seems to underperform without this first line.

📢 Announcement

v2 Update: This iteration is powered by 14,000+ premium Claude 4.6 Opus-style general reasoning samples, with a major focus on achieving massive gains in reasoning efficiency while actively improving peak accuracy.

v2 introduces a refined reasoning scaffold designed to eliminate redundant internal loops, significantly improving the model's cross-task generalization from logic and math into specialized fields like programming. Compared to the original model, autonomy and stability are significantly improved, ensuring the model remains robust and self-consistent during complex, multi-step problem solving. v2 is built to think smarter, not longer, delivering substantial improvements in inference speed and cost-effectiveness while simultaneously boosting baseline accuracy.

Note: Due to the constraints of SFT sample size and training scope, the model's broad general-purpose capabilities might be slightly impacted. The efficiency and accuracy results discussed here are based on the HumanEval and HumanEval+ benchmarks. Thank you for your understanding!


💡 Model Introduction

Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2 is the second iteration of this reasoning-focused Qwen3.5-9B fine-tune, built to drastically improve the efficiency of chain-of-thought generation, unlocking highly substantial gains in reasoning speed and cost-reduction while actually increasing absolute accuracy.

Compared with the earlier version, v2 was trained with 14,000 Claude 4.6 Opus-style general reasoning samples, with a stronger emphasis on transferring concise, reusable reasoning patterns rather than only maximizing raw benchmark scores. The goal of v2 is not simply to make the model "think more," but to help it think more economically: reducing unnecessarily long internal chains, avoiding verbose over-analysis on easy problems, and massively improving the reasoning-cost-to-quality ratio while beating the baseline's benchmark correctness.

A key design choice in v2 is that the distillation data is primarily general-domain reasoning data—specifically focused on mathematics, word problems, logical deduction, and a balanced mix of general knowledge and instructions—rather than specialized code-heavy supervision. Consequently, HumanEval and HumanEval+ are employed here to evaluate cross-task generalization and capability transfer, rather than serving as direct optimization targets. High performance on these benchmarks, despite the lack of code-centric training, confirms that the model's reasoning scaffold has become more robust and transferable, proving that fundamental reasoning logic can effectively power specialized tasks like programming.

Why v2 matters

Relative to the official Qwen3.5-9B baseline, the fine-tuned v2 model achieves a strict upgrade in absolute HumanEval and HumanEval+ accuracy alongside massive, transformative gains in reasoning efficiency:

| Metric | Official Qwen3.5-9B | v2 Fine-tuned Model | Improvement |
|---|---:|---:|---:|
| Average think length (chars) | 2284.3 chars | 1778.0 chars | 🟢 -22.17% (Shorter / Better) |
| Average think length (words) | 400.83 words | 310.33 words | 🟢 -22.58% (Shorter / Better) |
| HumanEval base passes per 10k think chars | 4.004 | 5.041 | 🟢 +25.91% (Higher / Better) |
| HumanEval+ passes per 10k think chars | 3.764 | 4.836 | 🟢 +28.48% (Higher / Better) |
| Think chars needed per HumanEval base pass | 2497.5 | 1983.6 | 🟢 -20.58% (Lower / Better) |
| Think chars needed per HumanEval+ pass | 2656.9 | 2068.0 | 🟢 -22.17% (Lower / Better) |

More impressively, not only does v2 vastly improve reasoning efficiency, it actually outperforms the official baseline on both the standard base tests and the much stricter HumanEval+ benchmark across different test settings.

We conducted two separate evaluations under different sampling temperatures to verify stability and peak performance:

Test Run 1 (T=0.2)

| Fairly Recomputed Benchmark | Official Qwen3.5-9B | v2 Fine-tuned Model | Gap |
|---|---:|---:|---:|
| HumanEval (base tests) pass@1 | 0.8171 | 0.8232 | 🟢 +0.61 pts |
| HumanEval+ (base + extra tests) pass@1 | 0.7622 | 0.7866 | 🟢 +2.44 pts |

Test Run 2 (T=0.6)

| Fairly Recomputed Benchmark | Official Qwen3.5-9B | v2 Fine-tuned Model | Gap |
|---|---:|---:|---:|
| HumanEval (base tests) pass@1 | 0.8170 | 0.8720 | 🟢 +5.50 pts |
| HumanEval+ (base + extra tests) pass@1 | 0.7620 | 0.8170 | 🟢 +5.50 pts |

These consistent dual-improvements make the model undeniably superior for real-world use cases.

For users who care about reasoning efficiency per unit of inference budget, v2 is exceptionally powerful—not only achieving higher peak accuracy, but doing so while consuming over 20% fewer characters and tokens.

That matters especially for:

  • Resource-constrained local deployment: On consumer GPUs or lower-memory local setups, shorter and cleaner reasoning traces can reduce latency, memory pressure, and the effective cost of generation.
  • Agentic workflows: In multi-step agents, the model often solves many easy or medium subtasks. In those settings, excessively elaborate chain-of-thought can become a tax on throughput. A model that reaches a better answer with fewer reasoning tokens can radically improve end-to-end agent speed and lower cumulative inference cost.
  • Open-source tool use and emerging agent stacks: For users building with lightweight open reasoning systems, browser-use agents, terminal agents, or projects in the "OpenClaw / local autonomous agent" style ecosystem, a model that achieves better peak accuracy while drastically improving reasoning economy is highly practical for real-world loops.
  • Simple problems at scale: One common issue with strong reasoning-tuned base models is that they sometimes produce very elaborate internal traces even for simple prompts. While that can look impressive, it is often inefficient in practice. v2 is explicitly aimed at trimming this overhead.

In short, v2 no longer forces a trade-off between absolute coding benchmark scores and reasoning economy. It provides a fully optimized deployment-ready profile: faster, shorter, more economical reasoning paired with stronger generalization and accuracy. For local users, agent builders, and cost-sensitive applications, v2 is a strict upgrade.

🗺️ Training Pipeline Overview

Base Model (Qwen3.5-9B)
 │
 ▼
Qwen3.5-9B fine-tuned with Unsloth
 │
 ▼
Supervised Fine-Tuning (SFT) + LoRA
(Response-Only Training masked on "<|im_start|>assistant\n<think>")
 │
 ▼
Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2

🧠 Example of Learned Reasoning Scaffold

The model includes targeted optimizations addressing Qwen3.5’s tendency toward excessive transitional or repetitive reasoning on simple queries. Through deep distillation and structural imitation of Claude-4.6-Opus reasoning chains, the model adopts a more efficient structured thinking pattern:
“Let me analyze this request carefully: 1..2..3...”.
This streamlined reasoning paradigm significantly reduces redundant cognitive loops while preserving deep analytical capacity, resulting in substantially improved inference efficiency.

Let me analyze this request carefully:

1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
            .
            .
            .

📚 All Datasets Used

The dataset consists of high-quality, filtered reasoning distillation data:

| Dataset Name | Description / Purpose |
|--------------|-----------------------|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Provides comprehensive Claude 4.6 Opus reasoning trajectories. |
| Roman1111111/claude-opus-4.6-10000x | Large-scale public Claude 4.6 Opus distillation data used to strengthen general reasoning transfer in v2. |
| TeichAI/claude-4.5-opus-high-reasoning-250x | Injects high-intensity, structured reasoning instances. |
| Jackrong/Qwen3.5-reasoning-700x | Additional curated reasoning samples designed to strengthen structured step-by-step problem solving and improve reasoning diversity. |

⚠️ Limitations & Intended Use

  • Hallucination Risk: While reasoning is strong, the model remains an autoregressive LLM; facts produced during the thinking sequence may occasionally be hallucinated, especially when it attempts to verify real-world events.
  • Intended Scenario: Best suited for offline analytical tasks, coding, math, and heavy logic-dependent prompting where the user needs to transparently follow the AI's internal logic.
  • This model is a test version intended solely for learning and demonstration purposes, and is for academic research and technical exploration use only.

🙏 Acknowledgements

Significant thanks to the Unsloth AI team for making rapid fine-tuning of large LLM models accessible. We also acknowledge the Qwen team and the open-source community developers producing exceptional distilled datasets.

Author: LuffyTheFox

Likes: 2

Downloads: 0

Tags: gguf, qwen3_5, unsloth, qwen, qwen3.5, reasoning, chain-of-thought, lora, uncensored, not-for-all-audiences, image-text-to-text, en, zh, ko, dataset:nohurry/Opus-4.6-Reasoning-3000x-filtered, dataset:Jackrong/Qwen3.5-reasoning-700x, dataset:Roman1111111/claude-opus-4.6-10000x, base_model:Qwen/Qwen3.5-9B, base_model:adapter:Qwen/Qwen3.5-9B, license:apache-2.0, endpoints_compatible, region:us, conversational

Aikimi/Qwen3.5-9B-Japanese-awy


tags: gguf, llama.cpp, unsloth, vision-language-model

Qwen3.5-9B-Japanese-awy : GGUF

This model was finetuned and converted to GGUF format using Unsloth.

Example usage:

  • For text only LLMs: llama-cli -hf Aikimi/Qwen3.5-9B-Japanese-awy --jinja
  • For multimodal models: llama-mtmd-cli -hf Aikimi/Qwen3.5-9B-Japanese-awy --jinja

Available Model files:

Author: Aikimi

Likes: 2

Downloads: 0

Tags: gguf, qwen3_5, llama.cpp, unsloth, vision-language-model, endpoints_compatible, region:us, conversational

SkyAsl/Nanbeige4.1-VLM


language: en
license: apache-2.0
tags: vision-language, multimodal, vlm, nanbeige, siglip, image-text-to-text
datasets: liuhaotian/LLaVA-CC3M-Pretrain-595K, liuhaotian/LLaVA-Instruct-150K
base_model: Nanbeige/Nanbeige4.1-3B, google/siglip-so400m-patch14-384
pipeline_tag: image-text-to-text

Nanbeige4.1-VLM — Stage 2 (Instruction Tuned)

Full vision-language model after Stage 2 instruction fine-tuning on LLaVA-Instruct-150K. LoRA weights have been merged into the base model for easy inference.

Architecture

Image → SigLIP so400m → AvgPool(729→196) → MLP Projector → Nanbeige4.1-3B → Text
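A minimal PyTorch sketch of the pooling-plus-projector stage above, assuming the 729 SigLIP tokens form a 27x27 grid pooled to 14x14 = 196 tokens; the hidden widths are placeholders, not the repo's actual values:

import torch
import torch.nn as nn

class PoolProjector(nn.Module):
    """Illustrative AvgPool(729 -> 196) + MLP projector; dimensions are assumptions."""
    def __init__(self, vision_dim=1152, llm_dim=2048):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(14)               # 27x27 grid -> 14x14 grid
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, vision_tokens):                      # (B, 729, vision_dim)
        b, n, d = vision_tokens.shape
        grid = vision_tokens.transpose(1, 2).reshape(b, d, 27, 27)
        pooled = self.pool(grid).flatten(2).transpose(1, 2)  # (B, 196, vision_dim)
        return self.mlp(pooled)                              # (B, 196, llm_dim)

print(PoolProjector()(torch.randn(1, 729, 1152)).shape)    # torch.Size([1, 196, 2048])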

Usage

from transformers import AutoModel, AutoTokenizer
from PIL import Image

model = AutoModel.from_pretrained(
    "SkyAsl/Nanbeige4.1-VLM-Stage2",
    trust_remote_code=True,
)
model.to("cuda")

tokenizer = AutoTokenizer.from_pretrained(
    "SkyAsl/Nanbeige4.1-VLM-Stage2",
    trust_remote_code=True,
)
model.set_tokenizer(tokenizer)

image  = Image.open("photo.jpg")
result = model.describe(image, prompt="What do you see in this image?")
print(result)

Training Details

| | Stage 1 | Stage 2 |
|---|---|---|
| Dataset | LLaVA-CC3M-595K | LLaVA-Instruct-150K |
| Trainable | Projector only | Projector + LoRA (r=64) |
| LR | 2e-3 | 2e-5 |
| Hardware | A100 80GB | A100 80GB |
| Duration | ~6 hours | ~5 hours |

Related Repos

Citation

@misc{aslanoglu2025nanbeige41vlm,
  author    = {Aslanoglu, Goktug},
  title     = {Nanbeige4.1-VLM: A Vision-Language Model based on SigLIP and Nanbeige4.1-3B},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/SkyAsl/Nanbeige4.1-VLM-Stage2},
}

Author: SkyAsl

Likes: 1

Downloads: 0

Tags: safetensors, llama, vision-language, multimodal, vlm, nanbeige, siglip, image-text-to-text, conversational, en, dataset:liuhaotian/LLaVA-CC3M-Pretrain-595K, dataset:liuhaotian/LLaVA-Instruct-150K, base_model:Nanbeige/Nanbeige4.1-3B, base_model:finetune:Nanbeige/Nanbeige4.1-3B, license:apache-2.0, region:us

limloop/MN-12B-LucidFaun-RP-RU


license: apache-2.0
base_model: dreamgen/lucid-v1-nemo, limloop/MN-12B-Faun-RP-RU
library_name: transformers
language: en, ru
tags: mergekit, merge, slerp, russian, uncensored, roleplay, mistral-nemo

MN-12B-LucidFaun-RP-RU

<details> <summary>🇷🇺 Expanded description (original in Russian)</summary>

🌟 About the Model

MN-12B-LucidFaun-RP-RU is a hybrid model based on Mistral Nemo 12B, built via a diagnostic SLERP merge. It combines the strengths of two models:

  • 🎭 Faun's lively RP character: modern style, rich vocabulary, support for the ninja instruction format and tool calling
  • 📚 lucid's stability and detail: excellent storytelling quality, robustness on long contexts, no censorship
  • 🔬 Targeted fix: Faun's censorship is localized in the late MLP layers and replaced with lucid

The model was assembled via SLERP and received no additional training after the merge.

🎯 Features

  • Almost no censorship: rare disclaimers are possible only at high temperature
  • Improved stability: outperforms Faun at temperature ≤0.5, works at 0.8 with top_k=20
  • Tool calling: fully supported
  • Context: works stably up to 8192 tokens (verified)
  • Russian language: preserved and possibly improved thanks to the merge with lucid
  • Instruction format: inherited from Faun
  • Storytelling: inherits lucid's rich capabilities for scene planning, plot management, and character work

⚠️ Important

The model retains its uncensored character, but at very high temperature (0.8+) and large top_k it may occasionally add short disclaimers. Generation is not blocked and continues after them.

</details>

MN-12B-LucidFaun-RP-RU is a diagnostic SLERP merge combining the lively RP character of Faun with the stability and rich storytelling capabilities of lucid.


🌍 Overview

This model represents a surgical approach to merging. Instead of blending everything equally, we experimentally identified where Faun's censorship resides (late MLP layers) and replaced only those components with lucid.

The result is a model that:

  • Keeps Faun's personality, style, and tool calling
  • Gains lucid's stability, rich prose, and uncensored behavior
  • Inherits lucid's advanced storytelling features
  • Maintains coherence even on long contexts

Built using diagnostic SLERP merging with layer-specific weight distribution.


🎯 Key Features

| Feature | Description |
| ------------------------- | --------------------------------------------------- |
| Languages | Russian, English |
| Censorship | Almost none (rare disclaimers at high temp) |
| Roleplay | Faun's lively character, lucid's stability |
| Story-Writing | Full lucid capabilities (scene planning, OOC, etc.) |
| Tool Calling | ✅ Fully supported |
| Context Length | Stable up to ~8192 tokens |
| Temperature Tolerance | Safe ≤0.5, up to 0.8 with top_k=20 |
| Architecture | Mistral Nemo 12B |


🧪 Methodology: Why This Merge Works

Diagnostic Approach

  1. Experiment 1 — MLP vs Self-Attention
    We discovered that censorship in Faun lives exclusively in MLP layers. Self-attention from Faun did not trigger refusals.

  2. Experiment 2 — Localization within MLP
    By applying gradient distributions across layers, we found censorship is concentrated in late MLP layers (layers ~25–40).

  3. Final Configuration — Gradual Intervention
    MLP weight of lucid increases toward the end: [0.1, 0.2, 0.5, 0.4, 0.75]
    Self-attention is mixed 0.5 for stability while preserving Faun's character.
    LayerNorm is mixed 0.5 for overall stability.

Merge Configuration

slices:
  - sources:
      - model: limloop/MN-12B-Faun-RP-RU
        layer_range: [0, 40]
      - model: dreamgen/lucid-v1-nemo
        layer_range: [0, 40]

merge_method: slerp
base_model: limloop/MN-12B-Faun-RP-RU

parameters:
  t:
    - filter: self_attn
      value: 0.5
    - filter: mlp
      value: [0.1, 0.2, 0.5, 0.4, 0.75]
    - value: 0.5

dtype: bfloat16
tokenizer:
  source: "base"

💡 Usage Examples

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "limloop/MN-12B-LucidFaun-RP-RU"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Ты — лесной фавн, говоришь загадками и любишь шалить."
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(
    inputs, 
    max_new_tokens=512, 
    temperature=0.6,
    top_k=30,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

⚙️ Merge Details

Built using mergekit with SLERP (Spherical Linear Interpolation), which allows smooth interpolation between models while preserving geometric properties.
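A minimal sketch of what spherical interpolation does for a single pair of weight tensors (illustrative only; mergekit's actual slerp implementation differs in details such as normalization and fallbacks):

import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Interpolate along the great circle between the two (flattened) weight vectors.
    a_flat, b_flat = a.flatten(), b.flatten()
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)
    omega = torch.arccos(torch.clamp(a_unit @ b_unit, -1.0, 1.0))  # angle between the vectors
    if omega.abs() < eps:                                          # nearly parallel: plain lerp
        return (1 - t) * a + t * b
    so = torch.sin(omega)
    mixed = (torch.sin((1 - t) * omega) / so) * a_flat + (torch.sin(t * omega) / so) * b_flat
    return mixed.reshape(a.shape)

# e.g. a late-layer MLP tensor merged at t=0.75 (lucid-dominant, as in the config above)
merged = slerp(0.75, torch.randn(8, 8), torch.randn(8, 8))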

Layer-Specific Weights

The merge uses a graduated approach for MLP layers, increasing lucid influence toward later layers where censorship was detected:

| Layer Zone (approx) | lucid weight (MLP) | Effect |
|---------------------|-------------------|--------|
| 0–8 | 0.1 | Almost pure Faun (early patterns) |
| 8–16 | 0.2 | Slight lucid influence |
| 16–24 | 0.5 | Balanced |
| 24–32 | 0.4 | Slightly more Faun |
| 32–40 | 0.75 | Lucid dominates — removes censorship |

Self-attention is mixed evenly (0.5) to preserve character while adding stability.
LayerNorm is mixed 0.5 for overall stability.

Author: limloop

Likes: 1

Downloads: 0

Tags: transformers, safetensors, mistral, text-generation, mergekit, merge, slerp, russian, uncensored, roleplay, mistral-nemo, conversational, en, ru, base_model:dreamgen/lucid-v1-nemo, base_model:merge:dreamgen/lucid-v1-nemo, base_model:limloop/MN-12B-Faun-RP-RU, base_model:merge:limloop/MN-12B-Faun-RP-RU, license:apache-2.0, text-generation-inference, endpoints_compatible, region:us

Simonc-44/Cygnis-Alpha-2-8B-v0.3-i1-GGUF


library_name: transformers
language: en, fr
license: apache-2.0
base_model: Simonc-44/Cygnis-Alpha-2-8B-v0.3
model_name: Cygnis-Alpha 2 (8B) v0.3 Imatrix GGUF
tags: text-generation, llama-3.1, reasoning, cot, 8b, cygnis-alpha, gguf, imatrix, ollama
datasets: Simonc-44/Cygnis-Alpha2-Instruct-Mix, Simonc-44/Cygnis-Identity-SFT

Cygnis Alpha 2 i1 (Imatrix Optimized, v0.3 Reasoning)

Enhanced Importance Matrix (imatrix) quantizations for Cygnis Alpha 2. Higher intelligence floor for low-bit inference, preserving complex CoT patterns.

Available Imatrix Quants

| Quantization | Size | Intelligence Level | File |
|---|---|---|---|
| Q6_K (i1) | 6.7 GB | Reference grade | Cygnis-Alpha-2-7B-v0.3.i1-Q6_K.gguf |
| Q4_K_M (i1) | 5.0 GB | Recommended | Cygnis-Alpha-2-7B-v0.3.i1-Q4_K_M.gguf |
| IQ4_XS (i1) | 4.5 GB | High logic | Cygnis-Alpha-2-7B-v0.3.i1-IQ4_XS.gguf |
| IQ3_M (i1) | 3.9 GB | Efficient | Cygnis-Alpha-2-7B-v0.3.i1-IQ3_M.gguf |
| IQ2_M (i1) | 3.0 GB | Experimental | Cygnis-Alpha-2-7B-v0.3.i1-IQ2_M.gguf |

All files are hosted under https://huggingface.co/Simonc-44/Cygnis-Alpha-2-8B-v0.3-i1-GGUF.

Why Imatrix?

Standard quantization treats all weights equally. Imatrix (Importance Matrix) uses a calibration dataset to identify which neurons are vital for the model's logic. By protecting these neurons, we achieve 3-bit or 4-bit quants that often match the performance of standard 6-bit files.

Perplexity vs Bitrate (Efficiency)

(Graph omitted: visual representation of why Imatrix (i1) quants outperform standard static quants at lower bitrates.)

Ollama Config

FROM ./Cygnis-Alpha-2-7B-v0.3.i1-Q4_K_M.gguf

TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
<|im_thought|>
"""

PARAMETER stop "<|im_end|>"

Quantized with ❤️ by Simonc-44 | Powered by Llama.cpp Imatrix
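As background on the importance matrix itself, the usual idea is to accumulate squared activations from a calibration set and let the quantizer protect the columns that contribute most. A rough NumPy sketch of the concept (not llama.cpp's implementation):

import numpy as np

# Rough sketch of the importance-matrix idea: accumulate squared input
# activations per weight column over a calibration set; columns with large
# accumulated activation matter most to the layer's output, so the quantizer
# spends its error budget protecting them.
rng = np.random.default_rng(0)
calib_acts = rng.normal(size=(1000, 64))      # toy calibration activations (tokens x in_dim)

importance = (calib_acts ** 2).sum(axis=0)    # one score per input column
protected = np.argsort(importance)[-8:]       # e.g. the 8 most important columns
print(protected)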

Author: Simonc-44

Likes: 1

Downloads: 0

Tags: transformers, gguf, text-generation, llama-3.1, reasoning, cot, 8b, cygnis-alpha, imatrix, ollama, en, fr, dataset:Simonc-44/Cygnis-Alpha2-Instruct-Mix, dataset:Simonc-44/Cygnis-Identity-SFT, base_model:Simonc-44/Cygnis-Alpha-2-8B-v0.3, base_model:quantized:Simonc-44/Cygnis-Alpha-2-8B-v0.3, license:apache-2.0, endpoints_compatible, region:us, conversational