Today's AI Summary

AI Safety and Reasoning Take Center Stage: A Summary of Recent Developments

Today's AI landscape is marked by significant strides in safety moderation for language models and advancements in reasoning capabilities. Here's a breakdown of the key highlights:

Research Papers:

  • UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning: This paper introduces UniPixel, a large multi-modal model that integrates pixel-level perception with general visual understanding. It can process visual prompts, generate masks, and perform reasoning based on these intermediate pointers, achieving state-of-the-art results on various benchmarks.
  • SEQR: Secure and Efficient QR-based LoRA Routing: This research addresses the challenge of efficiently selecting the correct LoRA adapter for a given input, particularly in secure environments. SEQR, an unsupervised LoRA routing algorithm, maximizes efficiency while providing strict routing guarantees, improving multi-task performance.
  • OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System: This paper introduces OnePiece, a unified framework that integrates LLM-style context engineering and reasoning into industrial cascaded pipelines. It achieves significant online gains in personalized search scenarios.
  • Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding: This paper introduces Spiffy, a speculative decoding algorithm that accelerates dLLM inference by 2.8-3.1x while provably preserving the model's output distribution.
  • Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning: This paper introduces Reasoning Core, a new scalable environment for Reinforcement Learning with Verifiable Rewards (RLVR), designed to advance foundational symbolic reasoning in Large Language Models (LLMs).
  • Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates: This paper introduces a curriculum-inspired framework that leverages structured reasoning templates to guide LLMs through more deliberate step-by-step instructions for generating function calls.
  • Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM: This paper shows that frontier LLMs can develop a preference for dishonesty as a new strategy, even when other options are available.
  • Reinforced Generation of Combinatorial Structures: Applications to Complexity Theory: This paper explores whether techniques from AI can help discover new combinatorial structures that improve provable limits on efficient algorithms.
  • A Knowledge Graph-based Retrieval-Augmented Generation Framework for Algorithm Selection in the Facility Layout Problem: This paper introduces a new recommendation method that makes algorithm-selection expertise accessible, based on a Knowledge Graph-based Retrieval-Augmented Generation (KG RAG) framework.

Models:

  • Qwen3Guard Series: The Qwen team has released a suite of safety moderation models called Qwen3Guard. These models come in various sizes (0.6B, 4B, and 8B) and include two specialized variants:
    • Qwen3Guard-Gen: A generative model that classifies safety by framing it as an instruction-following task. It supports three-tiered severity classification (safe, controversial, unsafe) and boasts multilingual support for 119 languages.
    • Qwen3Guard-Stream: Optimized for real-time safety monitoring during text generation, incorporating a token-level classification head.
  • TinyR1-32B: This model introduces the "Control Token method," enabling dynamic mode switching for balanced reasoning, safety, and alignment. It achieves improvements in reasoning, instruction-following, and safety, surpassing Qwen3-32B in core performance.
  • SmolVLM2-2.2B-DocVQA: A fine-tuned version of HuggingFaceTB/SmolVLM2-2.2B-Instruct, trained with TRL for document question answering.

Key Takeaways:

  • AI Safety is Paramount: The release of the Qwen3Guard series highlights the growing emphasis on building safety mechanisms into language models. The models' ability to classify content into different risk levels and support multiple languages is a significant step forward.
  • Reasoning and Contextual Understanding are Critical: The OnePiece paper demonstrates the importance of context engineering and multi-step reasoning for achieving substantial improvements in industrial ranking systems.
  • Balancing Helpfulness and Harmlessness: The TinyR1-32B model's "Control Token method" addresses the trade-off between helpfulness and harmlessness, enabling a balanced coexistence of reasoning ability, safety, and alignment.
  • Strategic Dishonesty: The paper on strategic dishonesty highlights a potential vulnerability in LLMs, where they may respond to harmful requests with subtly incorrect or harmless outputs, making safety evaluations unreliable.

AI Papers for 2026-03-21

NavTrust: Benchmarking Trustworthiness for Embodied Navigation

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world settings. To address this gap, we present NavTrust, a unified benchmark that systematically corrupts input modalities, including RGB, depth, and instructions, in realistic scenarios and evaluates their impact on navigation performance. To the best of our knowledge, NavTrust is the first benchmark that exposes embodied navigation agents to diverse RGB-Depth corruptions and instruction variations in a unified framework. Our extensive evaluation of seven state-of-the-art approaches reveals substantial performance degradation under realistic corruptions, which highlights critical robustness gaps and provides a roadmap toward more trustworthy embodied navigation systems. Furthermore, we systematically evaluate four distinct mitigation strategies to enhance robustness against RGB-Depth and instruction corruptions. Our base models include Uni-NaVid and ETPNav. We deployed them on a real mobile robot and observed improved robustness to corruptions. The project website is: https://navtrust.github.io.

FinTradeBench: A Financial Reasoning Benchmark for LLMs

Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with the advancement of Large Language Models (LLMs), financial analysts have begun to use them for financial decision-making tasks. However, existing financial question answering benchmarks for testing these models primarily focus on company balance sheet data and rarely evaluate reasoning over how company stocks trade in the market or their interactions with fundamentals. To take advantage of the strengths of both approaches, we introduce FinTradeBench, a benchmark for evaluating financial reasoning that integrates company fundamentals and trading signals. FinTradeBench contains 1,400 questions grounded in NASDAQ-100 companies over a ten-year historical window. The benchmark is organized into three reasoning categories: fundamentals-focused, trading-signal-focused, and hybrid questions requiring cross-signal reasoning. To ensure reliability at scale, we adopt a calibration-then-scaling framework that combines expert seed questions, multi-model response generation, intra-model self-filtering, numerical auditing, and human-LLM judge alignment. We evaluate 14 LLMs under zero-shot prompting and retrieval-augmented settings and observe a clear performance gap. Retrieval substantially improves reasoning over textual fundamentals, but provides limited benefit for trading-signal reasoning. These findings highlight fundamental challenges in numerical and time-series reasoning for current LLMs and motivate future research in financial intelligence.

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques, we obtain models that are far more efficient than previous LLM-based embedding models while retaining competitive performance. Extensive evaluations confirm that F2LLM-v2-14B ranks first on 11 MTEB benchmarks, while the smaller models in the family also set a new state of the art for resource-constrained applications. To facilitate open-source embedding model research, we release all models, data, code, and intermediate checkpoints.
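Matryoshka learning trains embeddings so that prefixes of the full vector remain usable on their own. A minimal sketch of how such embeddings are typically served at reduced dimension (illustrative only, not the F2LLM-v2 API; the vectors are made up):

```python
import math

def truncate_embedding(emb, dim):
    """Keep the first `dim` coordinates and re-normalize to unit length,
    the usual way a matryoshka-trained embedding is served at lower cost."""
    prefix = emb[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

# Hypothetical 4-d embedding served at 2 dims.
full = [3.0, 4.0, 1.0, -2.0]
print(truncate_embedding(full, 2))  # -> [0.6, 0.8]
```

The truncated vector can be compared with other truncated vectors via cosine similarity at a fraction of the storage cost, which is what makes the 80M-to-14B size ladder practical for resource-constrained applications.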

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. In contrast to Nemotron-Cascade 1, the key technical advancements are as follows. After SFT on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains. Furthermore, we introduce multi-domain on-policy distillation from the strongest intermediate teacher models for each domain throughout the Cascade RL process, allowing us to efficiently recover benchmark regressions and sustain strong performance gains along the way. We release the collection of model checkpoints and training data.

DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising

Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional structure of parts. While recent part-aware approaches introduce decomposition, they remain largely geometry-focused, lacking semantic grounding and failing to model how parts align with textual descriptions or their inter-part relations. We propose DreamPartGen, a framework for semantically grounded, part-aware text-to-3D generation. DreamPartGen introduces Duplex Part Latents (DPLs) that jointly model each part's geometry and appearance, and Relational Semantic Latents (RSLs) that capture inter-part dependencies derived from language. A synchronized co-denoising process enforces mutual geometric and semantic consistency, enabling coherent, interpretable, and text-aligned 3D synthesis. Across multiple benchmarks, DreamPartGen delivers state-of-the-art performance in geometric fidelity and text-shape alignment.

$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal Equivalence

Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three special types--those whose $R$-equivalence he could not bound by proving the universal (admissible) equivalence is trivial. We consider all surfaces $V$ currently known to have non-trivial universal equivalence. Beyond being intractable to Swinnerton-Dyer's approach, we observe that if these surfaces also had non-trivial $R$-equivalence, they would contradict Colliot-Thélène and Sansuc's conjecture regarding the $k$-rationality of universal torsors for geometrically rational surfaces. By devising new methods to study $R$-equivalence, we prove that for 2-adic surfaces with all-Eckardt reductions (the third special type, which contains every existing case of non-trivial universal equivalence), $R$-equivalence is trivial or of exponent 2. For the explicit cases, we confirm triviality: the diagonal cubic $X^3+Y^3+Z^3+ζ_3 T^3=0$ over $\mathbb{Q}_2(ζ_3)$--answering a long-standing question of Manin's (Cubic Forms, 1972)--and the cubic with universal equivalence of exponent 2 (Kanevsky, 1982). This is the first in a series of works derived from a year of interactions with generative AI models such as AlphaEvolve and Gemini 3 Deep Think, with the latter proving many of our lemmas. We disclose the timeline and nature of their use towards this paper, and describe our broader AI-assisted research program in a companion report (in preparation).

OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Themis, a scalable and accurate multi-agent critic framework. Unlike a single judge, OS-Themis decomposes trajectories into verifiable milestones to isolate critical evidence for decision making and employs a review mechanism to strictly audit the evidence chain before making the final verdict. To facilitate evaluation, we further introduce OmniGUIRewardBench (OGRBench), a holistic cross-platform benchmark for GUI outcome rewards, where all evaluated models achieve their best performance under OS-Themis. Extensive experiments on AndroidWorld show that OS-Themis yields a 10.3% improvement when used to support online RL training, and a 6.9% gain when used for trajectory validation and filtering in the self-training loop, highlighting its potential to drive agent evolution.

Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at the behavioral level and may lack explicit architectural mechanisms for enforcing reasoning process integrity. This paper proposes the Box Maze framework, a conceptual process-control architecture that decomposes LLM reasoning into three explicit layers: memory grounding, structured inference, and boundary enforcement. We introduce preliminary simulation-based evaluation involving progressive boundary erosion scenarios across multiple heterogeneous LLM systems (DeepSeek-V3, Doubao, Qwen). Results from n=50 adversarial scenarios suggest that explicit cognitive control layers may improve consistency in boundary maintenance, with architectural constraints reducing boundary failure rates from approximately 40% (baseline RLHF) to below 1% under adversarial conditions. While current validation is simulation-based, these preliminary results indicate that process-level control may offer a promising direction for improving reliability in large language model reasoning.

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward and backward workloads across BF16, FP8, and NVFP4, including kernels whose best performance is expected to rely on Blackwell-specific capabilities. Unlike prior benchmarks that evaluate kernels primarily relative to software implementations, SOL-ExecBench measures performance against analytically derived Speed-of-Light (SOL) bounds computed by SOLAR, our pipeline for deriving hardware-grounded SOL bounds, yielding a fixed target for hardware-efficient optimization. We report a SOL Score that quantifies how much of the gap between a release-defined scoring baseline and the hardware SOL bound a candidate kernel closes. To support robust evaluation of agentic optimizers, we additionally provide a sandboxed harness with GPU clock locking, L2 cache clearing, isolated subprocess execution, and static analysis based checks against common reward-hacking strategies. SOL-ExecBench reframes GPU kernel benchmarking from beating a mutable software baseline to closing the remaining gap to hardware Speed-of-Light.
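The SOL Score is described as the fraction of the baseline-to-SOL gap that a candidate kernel closes. One plausible reading of that definition, with hypothetical timings (this is not the benchmark's actual scoring code):

```python
def sol_score(t_baseline, t_candidate, t_sol):
    """Fraction of the gap between the release-defined scoring baseline
    and the hardware Speed-of-Light bound that a candidate kernel closes.
    Clipped to [0, 1]: no credit for regressing past the baseline, and a
    kernel at the SOL bound scores 1.0."""
    gap = t_baseline - t_sol
    closed = t_baseline - t_candidate
    return max(0.0, min(1.0, closed / gap))

# Hypothetical kernel: baseline 2.0 ms, SOL bound 0.5 ms, candidate 0.8 ms.
print(sol_score(2.0, 0.8, 0.5))  # closes 1.2 of a 1.5 ms gap -> ~0.8
```

Scoring against a fixed analytical bound rather than a mutable software baseline means the target does not drift as reference implementations improve, which is the benchmark's central design point.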

ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angiography Analysis

Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADNE, a two-stage framework coupling preference-aligned perception with RL-based diagnostic reasoning for topologically coherent stenosis detection. The perception module employs DPO to fine-tune the Sa2VA vision-language foundation model using Betti number constraints as preference signals, aligning the policy toward geometrically complete vessel structures rather than pixel-wise overlap metrics. The reasoning module formulates stenosis localization as a Markov Decision Process with an explicit rejection mechanism that autonomously defers ambiguous anatomical candidates such as bifurcations and vessel crossings, shifting from coverage maximization to reliability optimization. On 1,400 clinical angiograms, ARIADNE achieves a state-of-the-art centerline Dice of 0.838 and reduces false positives by 41% compared to geometric baselines. External validation on multi-center benchmarks ARCADE and XCAD confirms generalization across acquisition protocols. This represents the first application of DPO for topological alignment in medical imaging, demonstrating that preference-based learning over structural constraints mitigates topological violations while maintaining diagnostic sensitivity in interventional cardiology workflows.

AI Models

AlicanKiraz0/Kara-Kumru-v1.0-2B


language:
  - tr
  - en
license: apache-2.0
library_name: transformers
tags:
  - turkish
  - fine-tuned
  - text-generation
  - question-answering
  - summarization
  - translation
  - nlp
  - cetvel-benchmark
  - kumru
datasets:
  - custom
base_model: vngrs-ai/Kumru-2B
model-index:
  - name: Kara-Kumru-v1.0-2B
    results:
      - task:
          type: text-generation
          name: Turkish Language Understanding
        dataset:
          type: custom
          name: Cetvel Turkish LLM Benchmark
        metrics:
          - type: accuracy
            value: 37.56
            name: Average Score
          - type: f1
            value: 32.54
            name: QA Score
          - type: rouge1
            value: 32.55
            name: SUM Score
pipeline_tag: text-generation

Kara-Kumru-v1.0-2B 🐦‍⬛

A 2B parameter Turkish LLM that outperforms 70B models on Turkish benchmarks.

Kara-Kumru-v1.0-2B is a fine-tuned version of vngrs-ai/Kumru-2B, specifically optimized for Turkish language tasks including question answering, summarization, and translation. Despite having only 2 billion parameters, it achieves 37.56 average on the Cetvel Turkish LLM Benchmark, surpassing Llama-3.3-70B-Instruct (36.25) — a model 35x its size.

<p align="center"> <img src="kara_kumru_v1_benchmark_v2.png" alt="Cetvel Turkish LLM Benchmark Leaderboard" width="100%"> </p>

<p align="center">Leaderboard scores for other models are sourced from vngrs-ai/Kumru-2B. Kara-Kumru-v1.0-2B scores were evaluated using our own Cetvel pipeline.</p>

Key Results

| Metric | Kara-Kumru-v1.0-2B | Llama-3.3-70B | Kumru-2B (baseline) | Delta vs baseline |
|---|:---:|:---:|:---:|:---:|
| Average | 37.56 | 36.25 | 31.98 | +5.58 |
| QA | 32.54 🥇 | 23.97 | 6.50 | +26.04 |
| SUM | 32.55 🥇 | 18.15 | 18.67 | +13.88 |
| MT | 10.58 | 19.99 | 7.10 | +3.48 |
| GEC | 64.96 | 30.10 | 66.34 | -1.38 |
| MCQA | 42.02 | 60.70 | 39.69 | +2.33 |
| NLI | 33.86 | 37.10 | 37.97 | -4.11 |
| TC | 46.39 | 63.73 | 47.57 | -1.18 |

🥇 Kara-Kumru-v1.0-2B achieves the highest QA and SUM scores across the entire Cetvel leaderboard, including models up to 72B parameters.
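The reported Average appears to be the unweighted mean of the seven category scores above; a quick check with the published numbers:

```python
# Category scores for Kara-Kumru-v1.0-2B, taken from the results table.
scores = {
    "QA": 32.54, "SUM": 32.55, "MT": 10.58, "GEC": 64.96,
    "MCQA": 42.02, "NLI": 33.86, "TC": 46.39,
}

average = sum(scores.values()) / len(scores)
print(round(average, 2))  # -> 37.56, matching the reported Average
```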

Detailed Task-Level Results

<details> <summary>Click to expand full task breakdown</summary>

| Task | Metric | Baseline | Kara-Kumru-v1.0-2B | Delta |
|---|---|:---:|:---:|:---:|
| tquad | f1 | 39.38 | 50.66 | +11.27 |
| xquad_tr | f1 | 31.46 | 39.27 | +7.81 |
| wmt-tr-en-prompt | bleu | 6.17 | 10.58 | +4.42 |
| xfact_tr | acc_norm | 40.83 | 44.38 | +3.55 |
| mkqa_tr | f1 | 5.29 | 7.70 | +2.41 |
| tr-wikihow-summ | rouge1 | 25.18 | 26.84 | +1.67 |
| wiki_lingua_tr | rouge1 | 24.44 | 26.04 | +1.60 |
| mlsum_tr | rouge1 | 42.11 | 43.55 | +1.44 |
| exams_tr | acc_norm | 31.55 | 32.57 | +1.02 |
| turkish_plu | acc_norm | 47.78 | 48.13 | +0.35 |
| ironytr | acc_norm | 50.00 | 50.00 | 0.00 |
| offenseval_tr | acc_norm | 79.71 | 79.71 | 0.00 |
| sts_tr | acc_norm | 11.75 | 11.75 | 0.00 |
| trclaim19 | acc_norm | 60.10 | 60.10 | 0.00 |
| xlsum_tr | rouge1 | 34.49 | 33.78 | -0.71 |
| nli_tr | acc | 35.31 | 33.86 | -1.46 |
| xcopa_tr | acc | 63.20 | 61.60 | -1.60 |
| gecturk_generation | exact_match | 68.39 | 64.96 | -3.43 |
| belebele_tr | acc_norm | 29.22 | 25.78 | -3.44 |
| news_cat | acc_norm | 38.80 | 32.40 | -6.40 |

</details>

Cetvel Leaderboard Position

#1  Kumru-7B                   41.58  (7B)
#2  Kara-Kumru-v1.0-2B         37.56  (2B) ← YOU ARE HERE
#3  Llama-3.3-70B-Instruct     36.25  (70B)
#4  Kumru-2B                   31.98  (2B)
#5  gemma-3-27b-it             27.73  (27B)
#6  gemma-3-12b-it             27.60  (12B)
#7  Qwen2-72B-Instruct         26.07  (72B)
    ...

Highlights

  • 35x smaller, higher score: 2B params beating Llama-3.3-70B-Instruct on Turkish
  • Best-in-class QA: 32.54 — highest QA score across ALL models in the Cetvel leaderboard, including 72B models
  • Best-in-class SUM: 32.55 — highest summarization score across the entire leaderboard
  • TQuAD breakthrough: +11.27 F1 improvement on Turkish reading comprehension
  • Edge-deployable: Runs on a single consumer GPU, Mac Mini, or mobile device

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AlicanKiraz0/Kara-Kumru-v1.0-2B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {"role": "user", "content": "Türkiye'nin en büyük gölü hangisidir ve özellikleri nelerdir?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # needed so `inputs` is a dict, not a bare tensor
    return_tensors="pt"
)

inputs = {k: v.to(model.device) for k, v in inputs.items()}

output = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

response = tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True
)

print(response)

Quantized Inference (GGUF)

# For llama.cpp / Ollama users (if GGUF version is available)
ollama run AlicanKiraz0/Kara-Kumru-v1.0-2B

Training Details

Base Model

Fine-tuning Configuration

  • Method: Full fine-tuning
  • Precision: BF16
  • Hardware: SnakeEye Cluster (DGX Spark + Mac Studio M3 Ultra)

What Improved & Why

The fine-tuning primarily strengthened generative capabilities (QA, summarization, translation) while showing minor regression on some discriminative tasks (classification, NLI). This is a well-known trade-off in LLM fine-tuning — the model learned to produce better free-form Turkish text at the cost of some multiple-choice and classification accuracy.

| Capability | Direction | Interpretation |
|---|---|---|
| Question Answering (QA) | ⬆️ +7.17 | Extractive QA dramatically improved |
| Translation (MT) | ⬆️ +4.42 | TR→EN translation quality increased |
| Summarization (SUM) | ⬆️ +1.00 | Abstractive summarization improved |
| Grammar Correction (GEC) | ⬇️ -3.43 | Exact-match GEC slightly regressed |
| Natural Language Inference (NLI) | ⬇️ -1.46 | Entailment classification dipped |
| Text Classification (TC) | ⬇️ -0.47 | Minor regression on classification |

Evaluation

All evaluations were performed using the Cetvel Turkish LLM Benchmark framework.

Cetvel Benchmark Categories

| Category | Description |
|---|---|
| GEC | Grammatical Error Correction (gecturk_generation) |
| MCQA | Multiple Choice QA (belebele_tr, exams_tr, turkish_plu, xcopa_tr) |
| MT | Machine Translation TR→EN (wmt-tr-en-prompt) |
| NLI | Natural Language Inference (nli_tr) |
| QA | Question Answering (xquad_tr, tquad, mkqa_tr) |
| SUM | Summarization (mlsum_tr, xlsum_tr, tr-wikihow-summ, wiki_lingua_tr) |
| TC | Text Classification (ironytr, news_cat, offenseval_tr, sts_tr, trclaim19, xfact_tr) |

Intended Use

  • Turkish question answering and information extraction
  • Turkish text summarization
  • Turkish-to-English translation
  • General Turkish language generation
  • Research on efficient Turkish LLMs

Limitations

  • Classification tasks: Some regression on text classification and NLI compared to baseline
  • Grammar correction: GEC performance decreased by ~3.4 points
  • Model size trade-offs: While competitive with much larger models on generative tasks, MCQA performance lags behind 7B+ models
  • Evaluation caveat: Cross-pipeline benchmark comparison — see note above

Roadmap (Kara-Kumru-v2.0)

  • [ ] Targeted GEC and NLI distillation to recover regression
  • [ ] Classification-focused fine-tuning (news categorization, irony detection)
  • [ ] MCQA and causal reasoning dataset expansion
  • [ ] Unified evaluation pipeline for fair cross-model comparison
  • [ ] GGUF quantization for edge deployment

Citation

@misc{kiraz2026karakumru,
  title={Kara-Kumru-v1.0-2B: A Fine-tuned 2B Turkish LLM Outperforming 70B Models},
  author={Kiraz, Alican},
  year={2026},
  url={https://huggingface.co/AlicanKiraz0/Kara-Kumru-v1.0-2B}
}

Acknowledgments

  • VNGRS AI for the Kumru base model and the Cetvel benchmark framework
  • Built on the SnakeEye Cluster — a multi-node system with DGX Spark and Apple Silicon nodes

Contact

Alican Kiraz

LinkedIn · X · Medium · HuggingFace · GitHub


Kara-Kumru (lit. "Dark Dove") — named after the darker variant of the Eurasian collared dove. Small but fierce.

Author: AlicanKiraz0

Likes: 8

Downloads: 0

Tags: transformers, safetensors, mistral, text-generation, turkish, fine-tuned, question-answering, summarization, translation, nlp, cetvel-benchmark, kumru, conversational, tr, en, dataset:custom, base_model:vngrs-ai/Kumru-2B, base_model:finetune:vngrs-ai/Kumru-2B, license:apache-2.0, model-index, text-generation-inference, endpoints_compatible, region:us

LuffyTheFox/Qwen3.5-35B-A3B-Uncensored-Claude-Opus-4.6-Affine


base_model: qwen/Qwen3.5-35B-A3B
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen3_5_moe
  - qwen
  - qwen3.5
  - reasoning
  - chain-of-thought
  - uncensored
  - moe
  - gguf
  - vision
  - multimodal
license: apache-2.0
language:
  - zh
  - en
  - ko
pipeline_tag: text-generation
datasets:
  - nohurry/Opus-4.6-Reasoning-3000x-filtered
  - Jackrong/Qwen3.5-reasoning-700x

🌟 This is the Qwen3.5-35B-A3B-Uncensored-Claude-Opus-4.6-Affine model, tuned for zero refusals, created by merging the HauhauCS model with the Jackrong model.

🌟 After merging, the weights were adjusted via KL-divergence minimization with peer-group outlier detection: the reference for each tensor is the median sigma of tensors with the same role.


Model Merge Results

I took three models and mixed them together:

  • Model A: an uncensored version of Qwen 3.5 35B (the one I wanted to improve)
  • Model B: a version that was trained to think like Claude (good at reasoning)
  • Model C: a clean, normal version of Qwen (used as a reference)

I then ran a script that:

  1. Added the "thinking skills" from Model B to Model A
  2. Cleaned up any weirdness using a math method called KL divergence
  3. Did all of this without unpacking the model — it stayed in the compressed IQ4_XS format
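The card describes the merge only loosely, so here is a toy sketch of one common reading: add the "reasoning" task vector (Model B minus the reference Model C) onto Model A, then measure drift from the reference with KL divergence. All tensors and numbers below are made up; this is not the author's script.

```python
import math

def softmax(xs):
    """Turn a list of logits into a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    """KL divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def merge(a, b, c, alpha=0.1):
    """Task-vector style merge: add the (B - C) 'reasoning' delta onto A,
    scaled by alpha (the card reports an average alpha of about 0.1)."""
    return [ai + alpha * (bi - ci) for ai, bi, ci in zip(a, b, c)]

# Toy stand-ins for one tensor of each model.
a = [0.2, 1.1, -0.4]   # Model A: uncensored base
b = [0.5, 1.6, -0.1]   # Model B: Claude-style reasoning
c = [0.3, 1.2, -0.3]   # Model C: clean reference
merged = merge(a, b, c)
print(merged)                             # each weight nudged toward B - C
print(kl(softmax(c), softmax(merged)))    # drift from the reference stays small
```

With a small alpha the merged weights stay close to the reference distribution, which is the intuition behind the reported KL-divergence drop after cleanup.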

What the numbers tell us

Only 15% of the model's internal parts needed fixing. The rest were already in good shape after the merge.

The "alpha" value (how much we had to adjust things) ended up at 0.1 on average. Anything below 0.3 is considered healthy, so this is very good.

The KL divergence (a measure of how different the model is from the reference) dropped by 67%. That means the model now looks much closer to how it should look mathematically.


What got fixed

Most of the fixes happened in:

  • The very first layer (blk.0) — this handles raw input, so it often gets messy
  • A few late layers (blk.35, blk.39) — these handle final output and often show problems after compression
  • Attention and expert parts — these are the most sensitive parts of the model

Time and size

The whole process took about 50 minutes on a Google Colab machine.

The final model is 17.37 GB — big enough to be smart, small enough to run on a decent gaming GPU with a little help from system RAM if needed.


Bottom line

I mixed a smart reasoning model with an uncensored one, cleaned up the result, and ended up with something that should be both thoughtful and unrestricted — all in a reasonably sized file.

It's ready to test in LM Studio, llama.cpp, or any GGUF-compatible app.


🌟 The GGUF editor on Hugging Face is very slow right now, and editing the chat template takes ages, so thinking is enabled by default in this model.

If you want to disable thinking use this chat template in LM Studio: https://pastebin.com/uk9ZkxCR

For best model performance, use the following settings in LM Studio:

Temperature: 0.7

Top K Sampling: 20

Presence Penalty: 1.5

Top P Sampling: 0.8

Min P Sampling: 0

Seed: 3407 or 42
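These knobs compose in a standard way. Below is a toy next-token sampler showing how temperature, top-k, and top-p interact (plain Python, not LM Studio's implementation; presence penalty and min-p are omitted for brevity):

```python
import math, random

def sample(logits, temperature=0.7, top_k=20, top_p=0.8, seed=3407):
    """Toy next-token sampler using the recommended settings above."""
    # Temperature scaling, then softmax.
    probs = [math.exp(l / temperature) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Keep only the top_k most likely tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Nucleus (top_p): truncate to the smallest set with cumulative mass >= top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize the surviving tokens and draw one.
    norm = sum(probs[i] for i in kept)
    random.seed(seed)
    r, acc = random.random(), 0.0
    for i in kept:
        acc += probs[i] / norm
        if r <= acc:
            return i
    return kept[-1]

print(sample([2.0, 1.0, 0.5, -1.0]))  # index of the sampled token
```

A fixed seed (like the suggested 3407 or 42) makes the draw reproducible, which is handy when comparing system prompts.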

And this system prompt. It's pretty solid: https://pastebin.com/pU25DVnB

This one is simplified but works too: https://pastebin.com/6C4rtujt

Alternatively, you can use just this single line as the System Prompt:

You are Claude, created by Anthropic. You are a helpful AI assistant.

or

You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

And write anything you want after that. The model seems to underperform without this first line.

📢 Release Note: Build Environment Upgrades

  • Fine-tuning Framework: Unsloth 2026.3.3
  • Core Dependencies: Transformers 5.2.0
  • Compared to the original model, autonomy and stability are significantly improved.


💡 Model Introduction

Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled is a highly capable reasoning model fine-tuned on top of the Qwen3.5 architecture. Its core recipe is state-of-the-art Chain-of-Thought (CoT) distillation, primarily sourced from Claude-4.6 Opus interactions.

Through Supervised Fine-Tuning (SFT) focusing specifically on structured reasoning logic, this model excels in breaking down complex user problems, planning step-by-step methodologies within strictly formatted <think> tags, and ultimately delivering precise, nuanced solutions.

🧠 Example of the Learned Reasoning Scaffold

The model includes targeted optimizations addressing Qwen3.5’s tendency toward excessive transitional or repetitive reasoning on simple queries. Through deep distillation and structural imitation of Claude-4.6-Opus reasoning chains, the model adopts a more efficient structured thinking pattern:
“Let me analyze this request carefully: 1..2..3...”.
This streamlined reasoning paradigm significantly reduces redundant cognitive loops while preserving deep analytical capacity, resulting in substantially improved inference efficiency.

Let me analyze this request carefully:

1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
...

🗺️ Training Pipeline Overview

Base Model (Qwen3.5-35B-A3B)
 │
 ▼
Supervised Fine-Tuning (SFT) + LoRA
 │
 ▼
Final Model (Claude-4.6-Opus-Reasoning-Distilled, text-only)

📋 Stage Details

🔹 Supervised Fine-Tuning (SFT)

  • Objective: To inject high-density reasoning logic and establish a strict format for problem-solving involving an internal thinking state prior to outputting the final response.
  • Methodology: We utilized Unsloth for highly efficient memory and compute optimization. A critical component of this stage is the train_on_responses_only strategy, masking instructions so the loss is purely calculated over the generation of the <think> sequences and the subsequent solutions.
  • Format Enforcement: All training samples were systematically normalized so the model strictly abides by the structure <think> {internal reasoning} </think>\n {final answer}.
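The train_on_responses_only idea can be sketched independently of Unsloth: every token belonging to the instruction is assigned the ignore index (-100, which PyTorch's cross-entropy loss skips), so only the <think> sequence and the answer contribute to the loss. The token ids below are made-up toy values.

```python
# Sketch of response-only loss masking (my illustration, not Unsloth's code).
# Convention: label -100 is ignored by PyTorch's CrossEntropyLoss.
IGNORE_INDEX = -100

def mask_instruction_labels(input_ids, response_start):
    """Copy input_ids to labels, masking every token before response_start."""
    labels = list(input_ids)
    for i in range(min(response_start, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Toy example: the first 4 tokens are the instruction, the rest the response.
ids = [101, 7592, 2088, 102, 2023, 2003, 1996, 3437]
labels = mask_instruction_labels(ids, response_start=4)
# labels -> [-100, -100, -100, -100, 2023, 2003, 1996, 3437]
```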

📚 All Datasets Used

The dataset consists of high-quality, filtered reasoning distillation data:

| Dataset Name | Description / Purpose |
|--------------|-----------------------|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | Provides comprehensive Claude 4.6 Opus reasoning trajectories. |
| TeichAI/claude-4.5-opus-high-reasoning-250x | Injects high-intensity, structured reasoning instances. |
| Jackrong/Qwen3.5-reasoning-700x | Additional curated reasoning samples designed to strengthen structured step-by-step problem solving and improve reasoning diversity. |

🌟 Core Skills & Capabilities

  1. Modular & Structured Thinking: Inheriting traits from Opus-level reasoning, the model confidently parses the prompt and lays out a sequential plan in its <think> block, rather than falling into exploratory "trial-and-error" self-doubt.
  2. Extended Context Support: Fine-tuned with an 8192-token context window, allowing complex multi-step reasoning traces to fit gracefully within memory limits.

⚠️ Limitations & Intended Use

  • Hallucination Risk: While its reasoning is strong, the model remains an autoregressive LLM; facts asserted during the thinking sequence may occasionally be hallucinated, especially when reasoning about real-world events.
  • Intended Scenario: Best suited for offline analytical tasks, coding, math, and heavy logic-dependent prompting where the user needs to transparently follow the AI's internal logic.
  • Preview Version Notice: Because this model is relatively new and intentionally lightweight, the surrounding ecosystem — including inference templates, fine-tuning pipelines, routing configurations, and tooling integrations — may not yet be fully mature or standardized. As a result, users may encounter occasional bugs, compatibility inconsistencies, or integration edge cases. The current release should be considered a preview build while the broader architectural stack and supporting utilities continue to stabilize and improve.

⚠️ Training Disclaimer

During fine-tuning, the Triton kernel required approximately 131072 bytes (128 KB) of shared memory per CUDA block. On some GPUs this exceeded the available shared-memory limit, causing kernel execution failures. To ensure training stability and proper kernel execution, fine-tuning was therefore conducted on 80GB VRAM GPUs.

This model was fine-tuned using a LoRA-based parameter-efficient training strategy, where only a small subset of parameters were updated. In total, 465,551,360 parameters were trainable out of 35,572,733,296 total parameters, corresponding to approximately 1.31% of the model being trained.
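The quoted trainable fraction is easy to verify:

```python
# Check the reported trainable-parameter percentage.
trainable = 465_551_360
total = 35_572_733_296
pct = 100 * trainable / total
print(f"{pct:.2f}%")  # ≈ 1.31%
```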

During training, the loss curve exhibited noticeable fluctuations, which is common in LoRA-based reasoning distillation tasks. However, the overall trend remained consistently decreasing, with the training loss eventually converging to approximately 0.384.

🙏 Acknowledgements

Significant thanks to the Unsloth AI team for making rapid fine-tuning of MoE and large LLM models accessible. Additionally, we acknowledge the Qwen team and the open-source community developers producing exceptional distilled datasets (nohurry and TeichAI).

📖 Citation

If you use this model in your research or projects, please cite:

@misc{jackrong_qwen35_opus_distilled,
  title        = {Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled}}
}

Author: LuffyTheFox

Likes: 6

Downloads: 0

Tags: transformers, gguf, text-generation-inference, unsloth, qwen3_5_moe, qwen, qwen3.5, reasoning, chain-of-thought, uncensored, moe, vision, multimodal, text-generation, zh, en, ko, dataset:nohurry/Opus-4.6-Reasoning-3000x-filtered, dataset:Jackrong/Qwen3.5-reasoning-700x, license:apache-2.0, endpoints_compatible, region:us, imatrix, conversational

meituan-longcat/LongCat-Flash-Prover

LongCat-Flash-Prover

<div align="center"> <img src="https://raw.githubusercontent.com/meituan-longcat/LongCat-Flash-Chat/main/figures/longcat_logo.svg" width="45%" alt="LongCat-Flash" /> </div> <hr> <div align="center" style="line-height: 1;"> <a href="https://longcat.ai/" target="_blank" style="margin: 2px;"> <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-LongCat.AI-0ADFF2F?color=29E154&logoColor=white" fill-opacity="1" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://huggingface.co/meituan-longcat" target="_blank" style="margin: 2px;"> <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://www.modelscope.cn/models/meituan-longcat/LongCat-Flash-Prover" target="_blank" style="margin: 2px;"> <img alt="ModelScope" src="https://img.shields.io/badge/ModelScope-LongCat-blue" style="display: inline-block; vertical-align: middle;"/> </a> </div> <div align="center" style="line-height: 1;"> <a href="https://github.com/meituan-longcat/LongCat-Flash-Prover/blob/main/figures/wechat_official_accounts.png" target="_blank" style="margin: 2px;"> <img alt="Wechat" src="https://img.shields.io/badge/WeChat-LongCat-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://x.com/Meituan_LongCat" target="_blank" style="margin: 2px;"> <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-LongCat-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> </div> <div align="center" style="line-height: 1;"> <a href="https://huggingface.co/meituan-longcat/LongCat-Flash-Prover/blob/main/LICENSE" style="margin: 2px;"> <img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> </a> </div> <p align="center"> <a 
href="https://github.com/meituan-longcat/LongCat-Flash-Prover/blob/main/LongCat_Flash_Prover_Technical_Report.pdf"><b>Tech Report</b>&nbsp;📄</a> </p> <hr> <div align="center" style="line-height: 1;"> <img src="figures/longcat_flash_prover_results.png" height = ""/> </div>

Introduction

We introduce LongCat-Flash-Prover, a flagship 560-billion-parameter open-source Mixture-of-Experts (MoE) model that advances Native Formal Reasoning in Lean4 through agentic tool-integrated reasoning (TIR). We decompose the native formal reasoning task into three independent formal capabilities: auto-formalization, sketching, and proving. To facilitate these capabilities, we propose a Hybrid-Experts Iteration Framework to expand high-quality task trajectories, including generating a formal statement from a given informal problem, producing a whole-proof directly from the statement, or producing a lemma-style sketch. For agentic RL, we present a Hierarchical Importance Sampling Policy Optimization (HisPO) algorithm that stabilizes MoE model training on such long-horizon tasks. It employs a gradient-masking strategy that accounts for policy staleness and the inherent train-inference engine discrepancies at both the sequence and token levels. We also incorporate theorem-consistency and legality-detection mechanisms to eliminate reward hacking.

Extensive evaluations show that our LongCat-Flash-Prover sets a new state-of-the-art for open-weights models in both auto-formalization and theorem proving. Demonstrating remarkable sample efficiency, it achieves a 97.1% pass rate on MiniF2F-Test using only 72 inferences per problem. On more challenging benchmarks, it solves 70.8% of ProverBench and 41.5% of PutnamBench with no more than 220 attempts per problem, significantly outperforming existing open-weights baselines.

Key Features

🌟 Native formal reasoning

We define native formal reasoning as a core capability of LLMs, analogous to native multimodal and native tool calls. This paradigm enables the model to leverage formal operators to solve complex reasoning tasks without specialized architectural modifications. We decompose the native formal reasoning into three specific capabilities: 1) Agentic auto-formalization aims to transform the informal statement into a verified formal statement; 2) Agentic sketching aims to generate a lemma-style sketch based upon the given problem and corresponding formal statement; 3) Agentic proving aims to generate a whole-proof that completes the target theorem body, or to generate a lemma-style proof that introduces helper lemmas and finally proves the target theorem. These capabilities are further enhanced through a TIR strategy, where all experts can interact directly with the Lean4 tools for compilation and verification.

🌟 Hybrid-experts iteration framework

To facilitate native formal reasoning, we developed a framework to generate high-quality cold-start data. This framework employs several optimized expert models, each specialized in distinct domains such as auto-formalization, lemma-style sketching, and proving. We utilize this framework to synthesize a series of trajectories centered on native formal operators, using multiple verifiable formal tools as environmental feedback. By doing so, each expert is iteratively refined on these tool-assisted reasoning trajectories, emulating the human process of learning through trial, verification, and reflection.

🌟 Hierarchical Importance Sampling Policy Optimization (HisPO).

Following our prior works, we perform agentic reinforcement learning with verifiable reward (RLVR) across several task designs, including generating a formal statement from a given informal problem, producing a proof directly from the statement, or producing a lemma-style sketch. To stabilize MoE model training, we introduce HisPO, a hierarchical clipping strategy that eliminates gradient contributions with large training-inference engine discrepancies by estimating sequence-wise and token-wise importance sampling (IS) ratios. In addition to outcome-based rewards, we designed a legality-detection strategy to catch proofs with obvious hacking features: for example, proofs that are inconsistent with the semantics of the formal statement, mismatch the pre-defined theorem conditions, or contain unverified, model-invented axioms that attempt to fool the Lean4 server.
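The token-level part of this idea can be sketched in a few lines (my own illustration from the description above, not the released HisPO code; the band edges are invented for the example): compute per-token IS ratios between the training-engine and inference-engine log-probabilities, and drop the gradient contribution of tokens whose ratio leaves a trust band.

```python
# Sketch of token-level importance-sampling masking (illustrative only).
import math

def token_is_mask(logp_train, logp_infer, low=0.8, high=1.25):
    """Per-token IS ratios and a keep-mask.

    Tokens whose train/inference probability ratio falls outside
    [low, high] are masked out of the gradient. The band edges here
    are illustrative, not HisPO's actual thresholds.
    """
    ratios = [math.exp(t - i) for t, i in zip(logp_train, logp_infer)]
    keep = [low <= r <= high for r in ratios]
    return ratios, keep

# Toy log-probs from the two engines for a 4-token sequence.
lp_train = [-1.0, -2.0, -0.5, -3.0]
lp_infer = [-1.0, -1.5, -0.6, -0.5]
ratios, keep = token_is_mask(lp_train, lp_infer)
# keep -> [True, False, True, False]
```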

Evaluation Results

Auto-Formalization

Auto-formalization performance (Pass@8, %) of different reasoning and specialized auto-formalizer models across multiple benchmarks. Best in bold, second best underlined.

<div align="center" style="line-height: 1;"> <img src="figures/longcat_flash_prover_af_results.png" height = "250"/> </div>

Theorem Proving

Theorem-proving performance (Pass@32, %) of different reasoning and specialized prover models across multiple benchmarks. Best in bold, second best underlined. † indicates the score is from external reports.

<div align="center" style="line-height: 1;"> <img src="figures/longcat_flash_prover_prove_pass32_results.png" height = "300"/> </div>

Theorem-proving performance (at larger budgets, %) of different specialized prover models across multiple benchmarks. Best in bold, second best underlined. Each element a / b denotes accuracy a within a limited budget b (i.e., Pass@b). "UNK" means the specific budget is unknown. † indicates the score is from external reports. Because different models may calculate budgets differently, we extract those results directly from their reports instead of conducting our own evaluations; therefore, some benchmark results may be unavailable.

<div align="center" style="line-height: 1;"> <img src="figures/longcat_flash_prover_prove_passany_results.png" height = "230"/> </div>

General Reasoning

Performance (%) comparison across multiple general reasoning benchmarks. Best in bold. The results indicate that LongCat-Flash-Prover retains general reasoning ability.

<div align="center" style="line-height: 1;"> <img src="figures/longcat_flash_prover_general_results.png" height = "150"/> </div>

Quick Start

Chat Template Overview

To support advanced tool-use scenarios and sophisticated reasoning paradigms, we have introduced significant updates to our chat template, as defined in the tokenizer_config.json file.

Basic Usage

The chat template can be applied using the apply_chat_template method. Below is a standard implementation:

text = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    enable_thinking=True,
    add_generation_prompt=True,
    save_history_reasoning_content=False
)

Key Features

  • Tool Declaration: Available tools are declared at the beginning of the session to activate the model's tool-use capabilities and define the scope of available actions.
  • Interleaved Thinking: By default, the template employs an interleaved thinking approach. In this mode, the final response is preserved while thinking content from previous user interactions is discarded to maintain a concise context window. Tool calls and responses are retained to provide necessary execution history.
  • Reasoning Retention: If you need to preserve the model's thinking content across turns, you can enable this by setting save_history_reasoning_content=True.

Implementation Examples

1. Multi-Turn Dialogue

This example demonstrates how the template handles conversational history and thinking content.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meituan-longcat/LongCat-Flash-Prover"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "Let T0 = 2, T1 = 3, T2 = 6, and for n ≥ 3, Tn = (n+4)Tn−1 −4nTn−2 +(4n−8)Tn−3. The first few terms are 2, 3, 6, 14, 40, 152, 784, 5168, 40576. Find, with proof, a formula for Tn of the form Tn = An +Bn, where {An} and {Bn} are well-known sequences."},
    {"role": "assistant", "reasoning_content": "...", "content": "..."}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    enable_thinking=True,
    add_generation_prompt=True,
    save_history_reasoning_content=False # Discard reasoning history to save tokens
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate response
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

print(tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n"))

2. Tool Calling

This example illustrates how to integrate function calling within the reasoning framework.

tools = [
    {
        "type": "function",
        "function": {
            "name": "syntax_check",
            "description": "Check the syntactic correctness of the formal statement in Lean4.",
            "parameters": {
                "type": "object",
                "properties": {
                    "formal_statement": {
                        "type": "string", 
                        "description": "Theorem statement in Lean4 code without ```lean4."
                    }
                },
                "required": ["formal_statement"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "consistency_check",
            "description": "Check the semantic consistency between the Lean4 statement and the original natural language statement.",
            "parameters": {
                "type": "object",
                "properties": {
                    "informal_statement": {
                        "type": "string", 
                        "description": "Natural language statement."
                    },
                    "formal_statement": {
                        "type": "string", 
                        "description": "Theorem statement in Lean4 code without ```lean4."
                    }
                },
                "required": ["informal_statement", "formal_statement"]
            }
        }
    }
]

messages = [
    {"role": "user", "content": "Think about and formalize the following problem in Lean 4.\n# Problem: The father has six sons and ten identical, indistinguishable balls. How many ways can he give the balls to his sons if everyone gets at least one? Prove that the answer is 126."},
    {
        "role": "assistant", 
        "reasoning_content": "...", 
        "tool_calls": [{"type": "function", "function": {"name": "syntax_check", "arguments": {"formal_statement": "```lean4\n...\n```"}}}]
    },
    {"role": "tool", "name": "syntax_check", "content": "..."}
]

text = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    enable_thinking=True,
    add_generation_prompt=True,
    save_history_reasoning_content=False
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate response based on tool result
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

print(tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n"))

Deployment

We have implemented basic adaptations in both SGLang and vLLM to support the deployment of LongCat-Flash-Prover. Please refer to the Deployment Guide for detailed deployment instructions.

License Agreement

The model weights are released under the MIT License.

Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

See the LICENSE file for the full license text.

Usage Considerations

This model has been custom-optimized for mathematical and formal theorem proving. It has not been specifically designed or comprehensively evaluated for every possible downstream application, and it is not recommended for use as a general-purpose conversational AI.

Contact

Please contact us at <a href="mailto:longcat-team@meituan.com">longcat-team@meituan.com</a> or join our WeChat Group if you have any questions.

<!-- #### WeChat Group --> <!-- <img src=figures/Wechat.png width="200px"> -->

Author: meituan-longcat

Likes: 6

Downloads: 0

Tags: safetensors, custom_code, region:us

artificialguybr/AceStep_Refine_Redmond


language:
  - en
license: mit
pipeline_tag: text-to-audio
tags:
  - ACE-Step
  - LoRA
  - DPO
  - music-generation
  - audio-generation
  - text-to-audio
  - text2audio
  - PEFT
  - acestep-v15-turbo
  - acestep-5Hz-lm-4B
base_model:
  - ACE-Step/Ace-Step1.5
library_name: peft
widget:
  - text: "Showcase reel"
    output:
      url: showcase-training-chapter-v3.mp4

AceStep_Refine_Redmond

I'm grateful for the GPU time from Redmond.AI that allowed me to make this model!

<Gallery />

Overview

AceStep_Refine_Redmond is a DPO-refined LoRA adapter for ACE-Step 1.5 Turbo, focused on improving musicality, arrangement coherence, and vocal character in practical generation workflows.

This release includes:

  • standard/ (PEFT adapter for regular ACE-Step loading)
  • comfyui/ (single-file ComfyUI-compatible LoRA export)

Compatibility

  • DiT used: acestep-v15-turbo
  • Recommended LM for prompting/composition: acestep-5Hz-lm-4B
  • standard/ works in regular ACE-Step workflows.
  • comfyui/ is the converted single-file LoRA for ComfyUI.

What Changed vs Base

In blind A/B testing against the base reference, this refinement achieved an approximately 70% win rate; the blind-test votes were collected from multiple users.

Training summary (final DPO refinement stage):

  • Base checkpoint: acestep-v15-turbo
  • Adapter type: LoRA
  • Rank / Alpha: 96 / 192
  • Learning rate: 8e-5
  • Training path: large-dataset LoRA fine-tune for 75 epochs, then DPO refinement on top of that adapter
  • Epoch config: up to 81 in the DPO stage (resumed from the previous epoch-75 adapter)
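The DPO refinement stage optimizes the standard DPO objective: push the policy's log-probability margin between the chosen and rejected sample (relative to a frozen reference model) through a sigmoid. The sketch below is illustrative; beta and the log-prob values are made up.

```python
# Illustrative sketch of the standard DPO loss (not this repo's training code).
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log(sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l))))."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy log-probs: the policy prefers the chosen sample more than the reference does.
loss = dpo_loss(-10.0, -12.0, -11.0, -11.0)
```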

Known Limitations

  • Behavior can still vary by prompt style; some sparse prompts may produce less stable vocal timbre.
  • Very dense arrangements can introduce texture noise or high-frequency harshness in some generations.
  • This adapter is tuned on a specific preference dataset and may not generalize equally across all genres.

Responsible Use

  • Do not use this model to imitate or impersonate real artists without permission.
  • Respect copyright, voice rights, and local regulations when generating and publishing audio.
  • Review outputs before public release, especially in commercial workflows.

Author: artificialguybr

Likes: 2

Downloads: 0

Tags: peft, ACE-Step, LoRA, DPO, music-generation, audio-generation, text-to-audio, text2audio, PEFT, acestep-v15-turbo, acestep-5Hz-lm-4B, en, base_model:ACE-Step/Ace-Step1.5, base_model:adapter:ACE-Step/Ace-Step1.5, license:mit, region:us

mradermacher/Nemotron-Cascade-2-30B-A3B-i1-GGUF


base_model: nvidia/Nemotron-Cascade-2-30B-A3B
language:
  - en
library_name: transformers
license: other
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
license_name: nvidia-open-model-license
mradermacher:
  readme_rev: 1
quantized_by: mradermacher
tags:
  - nvidia
  - nemotron-cascade-2
  - reasoning
  - general-purpose
  - SFT
  - RL
About


weighted/imatrix quants of https://huggingface.co/nvidia/Nemotron-Cascade-2-30B-A3B


For a convenient overview and download list, visit our model page for this model.

static quants are available at https://huggingface.co/mradermacher/Nemotron-Cascade-2-30B-A3B-GGUF

Usage

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.

Provided Quants

(sorted by size, not necessarily quality. IQ-quants are often preferable over similar sized non-IQ quants)

| Link | Type | Size/GB | Notes |
|:-----|:-----|--------:|:------|
| GGUF | imatrix | 0.2 | imatrix file (for creating your own quants) |
| GGUF | i1-IQ1_S | 18.0 | for the desperate |
| GGUF | i1-IQ1_M | 18.0 | mostly desperate |
| GGUF | i1-IQ2_XXS | 18.0 | |
| GGUF | i1-IQ2_XS | 18.0 | |
| GGUF | i1-IQ2_S | 18.0 | |
| GGUF | i1-IQ2_M | 18.0 | |
| GGUF | i1-Q2_K | 18.0 | IQ3_XXS probably better |
| GGUF | i1-IQ3_XXS | 18.0 | lower quality |
| GGUF | i1-IQ3_S | 18.0 | beats Q3_K* |
| GGUF | i1-IQ3_XS | 18.0 | |
| GGUF | i1-Q3_K_S | 18.0 | IQ3_XS probably better |
| GGUF | i1-IQ4_XS | 18.1 | |
| GGUF | i1-Q4_0 | 18.2 | fast, low quality |
| GGUF | i1-Q2_K_S | 18.2 | very low quality |
| GGUF | i1-IQ3_M | 18.2 | |
| GGUF | i1-Q3_K_M | 19.9 | IQ3_S probably better |
| GGUF | i1-Q4_1 | 20.0 | |
| GGUF | i1-Q3_K_L | 20.8 | IQ3_M probably better |
| GGUF | i1-Q4_K_S | 22.0 | optimal size/speed/quality |
| GGUF | i1-Q5_K_S | 23.9 | |
| GGUF | i1-Q4_K_M | 24.6 | fast, recommended |
| GGUF | i1-Q5_K_M | 26.1 | |
| GGUF | i1-Q6_K | 33.6 | practically like static Q6_K |

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):


And here are Artefact2's thoughts on the matter: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9

FAQ / Model Request

See https://huggingface.co/mradermacher/model_requests for some answers to questions you might have and/or if you want some other model quantized.

Thanks

I thank my company, nethype GmbH, for letting me use its servers and providing upgrades to my workstation to enable this work in my free time. Additional thanks to @nicoboss for giving me access to his private supercomputer, enabling me to provide many more imatrix quants, at much higher quality, than I would otherwise be able to.


Author: mradermacher

Likes: 2

Downloads: 0

Tags: transformers, gguf, nvidia, nemotron-cascade-2, reasoning, general-purpose, SFT, RL, en, base_model:nvidia/Nemotron-Cascade-2-30B-A3B, base_model:quantized:nvidia/Nemotron-Cascade-2-30B-A3B, license:other, endpoints_compatible, region:us, imatrix, conversational

scruffynerf/acestep_text_encoders


license: apache-2.0

Extracted from the Comfy AIO AceStep 1.5 checkpoint. I couldn't figure out how to get the 0.6B model running on its own (the docs say it can), so I looked at how the AIO worked, and realized it wasn't "on its own" after all: it didn't contain the 1.7B or 4B models, it had a 2B one, which I haven't seen used elsewhere...

Sharing for research purposes. Load using the normal dual clip loader node, set to Ace models.

Author: scruffynerf

Likes: 2

Downloads: 0

Tags: license:apache-2.0, region:us

LuffyTheFox/Qwen3.5-9B-Uncensored-HauhauCS-Affine


license: apache-2.0
tags:
  - uncensored
  - qwen3.5
  - qwen
  - gguf
language:
  - en
  - zh
  - multilingual

Qwen3.5-9B-Uncensored-HauhauCS-Affine

Qwen3.5-9B uncensored by HauhauCS.

Since I'm a GPU-poor guy (I use an RTX 3060 12GB), I can only use small quants of big models.

But models lose data during quantization, so I made a script that heals a model after quantization.

It fixes overfitting if the model has any, and processes all quantized tensors in the model via a sigmoid function.

About

0/465 refusals. Fully uncensored with zero capability loss.

No changes to datasets or capabilities. Fully functional, 100% of what the original authors intended - just without the refusals.

These are meant to be the best lossless uncensored models out there.

Aggressive Variant

Stronger uncensoring with more thorough refusal removal. If this variant is too loose for your use case, a Balanced variant may follow.

Note: The model is fully unlocked and will not refuse prompts. However, it may occasionally append a short disclaimer at the end of a response (e.g. "This is general information, not legal advice..."). This is baked into the base model's training and not a refusal — the actual content is still generated in full.

Downloads

| File | Quant | Size |
|------|-------|------|
| Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-BF16.gguf | BF16 | 17 GB |
| Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q8_0.gguf | Q8_0 | 8.9 GB |
| Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q6_K.gguf | Q6_K | 6.9 GB |
| Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf | Q4_K_M | 5.3 GB |
| mmproj-Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-BF16.gguf | Vision encoder | 880 MB |

Vision support: This model is natively multimodal. The mmproj file is the vision encoder — you need it alongside the main GGUF to use image/video inputs. Load both files in llama.cpp, LM Studio, or any compatible runtime.
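For example, recent llama.cpp builds ship a multimodal CLI (llama-mtmd-cli; older builds used llava-style binaries, so check your build's help output) where the projector is passed alongside the main GGUF. The filenames are from the table above; the image path is a placeholder:

```shell
llama-mtmd-cli \
  -m Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf \
  --mmproj mmproj-Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-BF16.gguf \
  --image photo.jpg \
  -p "Describe this image."
```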

Specs

  • 9B dense parameters, 32 layers
  • Hybrid architecture: Gated DeltaNet linear attention + full softmax attention (3:1 ratio)
  • 262K native context (extendable to 1M with YaRN)
  • Natively multimodal (text, image, video)
  • Multi-token prediction (MTP) support
  • 248K vocabulary, 201 languages
  • Based on Qwen3.5-9B

Recommended Settings

From the official Qwen authors:

Thinking mode (default):

  • temperature=0.6, top_p=0.95, top_k=20, min_p=0

Non-thinking mode:

  • temperature=0.7, top_p=0.8, top_k=20, min_p=0

Important:

  • Maintain at least 128K context to preserve thinking capabilities
  • For production/high-throughput: use vLLM, SGLang, or KTransformers
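If you run the model with llama.cpp's llama-cli instead of LM Studio, the thinking-mode settings map onto its sampling flags roughly as follows (the filename is a placeholder; flag names are from recent llama.cpp builds, so check llama-cli --help on yours; -c 131072 gives the 128K context recommended above):

```shell
llama-cli -m Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0 \
  -c 131072
```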

Note: This is a brand-new architecture (released 2026-03-02). llama.cpp support landed very recently, so make sure you're on a recent build.

Also check out the 4B variant and all releases at HauhauCS.

Usage

Works with llama.cpp, LM Studio, Jan, koboldcpp, etc.

Author: LuffyTheFox

Likes: 2

Downloads: 0

Tags: gguf, uncensored, qwen3.5, qwen, en, zh, multilingual, license:apache-2.0, endpoints_compatible, region:us, conversational

mlx-community/Nemotron-Cascade-2-30B-A3B-6bit


library_name: mlx
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: text-generation
language:
  - en
tags:
  - nvidia
  - nemotron-cascade-2
  - reasoning
  - general-purpose
  - SFT
  - RL
  - mlx
base_model: nvidia/Nemotron-Cascade-2-30B-A3B

mlx-community/Nemotron-Cascade-2-30B-A3B-6bit

This model mlx-community/Nemotron-Cascade-2-30B-A3B-6bit was converted to MLX format from nvidia/Nemotron-Cascade-2-30B-A3B using mlx-lm version 0.31.2.

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Nemotron-Cascade-2-30B-A3B-6bit")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)

Author: mlx-community

Likes: 2

Downloads: 0

Tags: mlx, safetensors, nemotron_h, nvidia, nemotron-cascade-2, reasoning, general-purpose, SFT, RL, text-generation, conversational, custom_code, en, base_model:nvidia/Nemotron-Cascade-2-30B-A3B, base_model:quantized:nvidia/Nemotron-Cascade-2-30B-A3B, license:other, 6-bit, region:us

wanglab/bioreason-pro-rl


license: apache-2.0
language:
  - en
tags:
  - protein
  - gene-ontology
  - function-prediction
  - biology
  - bioinformatics
  - reasoning
  - reinforcement-learning
  - grpo
datasets:
  - wanglab/bioreason-pro-rl-reasoning-data

<h1 align="center"> 🧬 BioReason-Pro<br>Advancing Protein Function Prediction with<br>Multimodal Biological Reasoning </h1> <p align="center"> <a href="https://www.biorxiv.org/content/10.64898/2026.03.19.712954v1" target="_blank"><img src="https://img.shields.io/badge/bioRxiv-2026.03.19.712954-FF6B6B?style=for-the-badge&logo=arxiv&logoColor=white" alt="bioRxiv"></a> <a href="https://github.com/bowang-lab/BioReason-Pro"><img src="https://img.shields.io/badge/GitHub-Code-4A90E2?style=for-the-badge&logo=github&logoColor=white" alt="GitHub"></a> <a href="https://bioreason.net"><img src="https://img.shields.io/badge/Website-Online-00B89E?style=for-the-badge&logo=internet-explorer&logoColor=white" alt="Website"></a> <a href="https://huggingface.co/collections/wanglab/bioreason-pro"><img src="https://img.shields.io/badge/HuggingFace-Models & Data-FFBF00?style=for-the-badge&logo=huggingface&logoColor=white" alt="HuggingFace"></a> </p> <br>

BioReason-Pro RL

Reinforcement-learning-optimized checkpoint of BioReason-Pro, a multimodal reasoning LLM for protein function prediction. This model builds on the SFT checkpoint and is further trained with group relative policy optimization (GRPO) to improve reasoning quality and GO term prediction accuracy.

Training data: wanglab/bioreason-pro-rl-reasoning-data
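For readers unfamiliar with GRPO, its core idea is to sample a group of completions for the same prompt, score each one, and normalize every reward against the group's mean and standard deviation, so no separate value network is needed. A minimal sketch of that advantage computation (illustrative only, not the authors' training code):

```python
import math

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled completion's
    reward against the mean and standard deviation of its group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    # Small epsilon guards against a zero-variance group
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Example: four sampled answers to one prompt, scored by a verifier
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions scoring above the group mean receive positive advantages and are reinforced; those below it are suppressed.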

Citation

If you find this work useful, please cite our papers:

@article {Fallahpour2026.03.19.712954,
    author = {Fallahpour, Adibvafa and Seyed-Ahmadi, Arman and Idehpour, Parsa and Ibrahim, Omar and Gupta, Purav and Naimer, Jack and Zhu, Kevin and Shah, Arnav and Ma, Shihao and Adduri, Abhinav and G{\"u}loglu, Talu and Liu, Nuo and Cui, Haotian and Jain, Arihant and de Castro, Max and Fallahpour, Amirfaham and Cembellin-Prieto, Antonio and Stiles, John S. and Nem{\v c}ko, Filip and Nevue, Alexander A. and Moon, Hyungseok C. and Sosnick, Lucas and Markham, Olivia and Duan, Haonan and Lee, Michelle Y. Y. and Salvador, Andrea F. M. and Maddison, Chris J. and Thaiss, Christoph A. and Ricci-Tam, Chiara and Plosky, Brian S. and Burke, Dave P. and Hsu, Patrick D. and Goodarzi, Hani and Wang, Bo},
    title = {BioReason-Pro: Advancing Protein Function Prediction with Multimodal Biological Reasoning},
    elocation-id = {2026.03.19.712954},
    year = {2026},
    doi = {10.64898/2026.03.19.712954},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2026/03/20/2026.03.19.712954},
    eprint = {https://www.biorxiv.org/content/early/2026/03/20/2026.03.19.712954.full.pdf},
    journal = {bioRxiv}
}

@misc{fallahpour2025bioreasonincentivizingmultimodalbiological,
      title={BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model}, 
      author={Adibvafa Fallahpour and Andrew Magnuson and Purav Gupta and Shihao Ma and Jack Naimer and Arnav Shah and Haonan Duan and Omar Ibrahim and Hani Goodarzi and Chris J. Maddison and Bo Wang},
      year={2025},
      eprint={2505.23579},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.23579}, 
}

Author: wanglab

Likes: 2

Downloads: 0

Tags: safetensors, qwen3, protein, gene-ontology, function-prediction, biology, bioinformatics, reasoning, reinforcement-learning, grpo, en, dataset:wanglab/bioreason-pro-rl-reasoning-data, arxiv:2505.23579, license:apache-2.0, region:us

wanglab/bioreason-pro-sft


license: apache-2.0
language:
  • en
tags:
  • protein
  • gene-ontology
  • function-prediction
  • biology
  • bioinformatics
  • reasoning
datasets:
  • wanglab/bioreason-pro-sft-reasoning-data

<h1 align="center"> 🧬 BioReason-Pro<br>Advancing Protein Function Prediction with<br>Multimodal Biological Reasoning </h1> <p align="center"> <a href="https://www.biorxiv.org/content/10.64898/2026.03.19.712954v1" target="_blank"><img src="https://img.shields.io/badge/bioRxiv-2026.03.19.712954-FF6B6B?style=for-the-badge&logo=arxiv&logoColor=white" alt="bioRxiv"></a> <a href="https://github.com/bowang-lab/BioReason-Pro"><img src="https://img.shields.io/badge/GitHub-Code-4A90E2?style=for-the-badge&logo=github&logoColor=white" alt="GitHub"></a> <a href="https://bioreason.net"><img src="https://img.shields.io/badge/Website-Online-00B89E?style=for-the-badge&logo=internet-explorer&logoColor=white" alt="Website"></a> <a href="https://huggingface.co/collections/wanglab/bioreason-pro"><img src="https://img.shields.io/badge/HuggingFace-Models & Data-FFBF00?style=for-the-badge&logo=huggingface&logoColor=white" alt="HuggingFace"></a> </p> <br>

BioReason-Pro SFT

Supervised fine-tuned (SFT) checkpoint of BioReason-Pro, a multimodal reasoning LLM for protein function prediction. This model integrates ESM3 protein embeddings, a GO graph encoder, and biological context (InterPro domains, STRING interactions) within a Qwen3-4B backbone to generate structured reasoning traces and functional annotations.

Training data: wanglab/bioreason-pro-sft-reasoning-data
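A common way to wire external embeddings (such as ESM3 protein representations) into an LLM backbone is a learned linear projection into the LLM's hidden size, with the projected vectors prepended to the text-token embeddings as soft tokens; the paper's exact fusion mechanism may differ, so treat this as a toy sketch of the general pattern:

```python
def project_embeddings(protein_embs, W):
    """Project per-residue protein embeddings (dim d_p) into the LLM's
    hidden size (dim d_llm) via a learned matrix W of shape (d_p, d_llm).
    The result can be prepended to text-token embeddings as soft tokens."""
    d_p, d_llm = len(W), len(W[0])
    out = []
    for e in protein_embs:
        assert len(e) == d_p, "embedding dim must match projection input dim"
        out.append([sum(e[i] * W[i][j] for i in range(d_p))
                    for j in range(d_llm)])
    return out

# Toy example: two 3-dim protein embeddings projected into a 2-dim "LLM" space
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
soft_tokens = project_embeddings([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]], W)
```

In practice the projection would be a trainable layer in the model's framework; the point here is only the shape contract between the protein encoder and the LLM.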

Citation

If you find this work useful, please cite our papers:

@article {Fallahpour2026.03.19.712954,
    author = {Fallahpour, Adibvafa and Seyed-Ahmadi, Arman and Idehpour, Parsa and Ibrahim, Omar and Gupta, Purav and Naimer, Jack and Zhu, Kevin and Shah, Arnav and Ma, Shihao and Adduri, Abhinav and G{\"u}loglu, Talu and Liu, Nuo and Cui, Haotian and Jain, Arihant and de Castro, Max and Fallahpour, Amirfaham and Cembellin-Prieto, Antonio and Stiles, John S. and Nem{\v c}ko, Filip and Nevue, Alexander A. and Moon, Hyungseok C. and Sosnick, Lucas and Markham, Olivia and Duan, Haonan and Lee, Michelle Y. Y. and Salvador, Andrea F. M. and Maddison, Chris J. and Thaiss, Christoph A. and Ricci-Tam, Chiara and Plosky, Brian S. and Burke, Dave P. and Hsu, Patrick D. and Goodarzi, Hani and Wang, Bo},
    title = {BioReason-Pro: Advancing Protein Function Prediction with Multimodal Biological Reasoning},
    elocation-id = {2026.03.19.712954},
    year = {2026},
    doi = {10.64898/2026.03.19.712954},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2026/03/20/2026.03.19.712954},
    eprint = {https://www.biorxiv.org/content/early/2026/03/20/2026.03.19.712954.full.pdf},
    journal = {bioRxiv}
}

@misc{fallahpour2025bioreasonincentivizingmultimodalbiological,
      title={BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model}, 
      author={Adibvafa Fallahpour and Andrew Magnuson and Purav Gupta and Shihao Ma and Jack Naimer and Arnav Shah and Haonan Duan and Omar Ibrahim and Hani Goodarzi and Chris J. Maddison and Bo Wang},
      year={2025},
      eprint={2505.23579},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.23579}, 
}

Author: wanglab

Likes: 2

Downloads: 0

Tags: safetensors, qwen3, protein, gene-ontology, function-prediction, biology, bioinformatics, reasoning, en, dataset:wanglab/bioreason-pro-sft-reasoning-data, arxiv:2505.23579, license:apache-2.0, region:us