Today's AI Summary

AI Developments: LiquidAI's Edge-Optimized Model, Paper2Video, and More

Here's a look at some of the most interesting AI developments from today, covering new models and research papers.

Research Highlights

Several intriguing research papers have emerged:

  • TopInG: Topologically Interpretable Graph Learning via Persistent Rationale Filtration introduces a novel topological framework for improving the interpretability of Graph Neural Networks (GNNs). It uses persistent homology to identify rationale subgraphs, enhancing both predictive accuracy and interpretation quality.
  • Paper2Video: Automatic Video Generation from Scientific Papers presents a multi-agent framework for automatically generating presentation videos from research papers. The framework integrates slide generation, layout refinement, subtitling, speech synthesis, and talking-head rendering. The authors also introduce a new benchmark dataset and evaluation metrics.
  • From Noisy Traces to Stable Gradients: Bias-Variance Optimized Preference Optimization for Aligning Large Reasoning Models addresses the challenge of aligning large reasoning models (LRMs) with human preferences. The paper proposes a method called Bias-Variance Optimized Preference Optimization (BVPO) that mixes two gradient estimators to reduce variance and improve training stability.
  • SLM-MUX: Orchestrating Small Language Models for Reasoning introduces a three-stage approach for orchestrating small language models (SLMs) to achieve higher accuracy than any individual model. The approach includes a multi-model architecture, model selection search, and test-time scaling.
  • Staircase Streaming for Low-Latency Multi-Agent Inference proposes a method to reduce the time to first token (TTFT) in multi-agent inference by generating the final response as soon as partial outputs from previous steps are received.

Model Spotlight

LiquidAI/LFM2-8B-A1B (56 Likes): Liquid AI has released LFM2-8B-A1B, a new-generation hybrid Mixture of Experts (MoE) model built for edge AI and on-device deployment. It has 8.3B total parameters (1.5B active) and is optimized for quality, speed, and memory efficiency. It supports multiple languages, including English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish. Benchmarks show strong performance in instruction following and math, with faster inference than similar models.

Key Takeaways

  • Edge AI Optimization: LiquidAI's LFM2-8B-A1B model demonstrates a focus on efficient on-device deployment, balancing quality and speed for edge applications.
  • Interpretability in GNNs: The TopInG paper addresses a critical need for interpretability in GNNs, potentially broadening their adoption in decision-making processes.
  • Automated Video Generation: The Paper2Video framework offers a promising solution for automating the creation of academic presentation videos, saving researchers significant time and effort.
  • Preference Alignment: The BVPO paper tackles the challenge of aligning LRMs with human preferences, improving training stability and overall performance.
  • Model Orchestration: The SLM-MUX paper demonstrates that SLMs can be effectively orchestrated into more accurate and efficient systems.
  • Low-Latency Inference: The Staircase Streaming paper proposes a method to reduce the time to first token (TTFT) in multi-agent inference.

AI Papers for 2026-03-14

The Latent Color Subspace: Emergent Order in High-Dimensional Chaos

Text-to-image generation models have advanced rapidly, yet achieving fine-grained control over generated images remains difficult, largely due to limited understanding of how semantic information is encoded. We develop an interpretation of the color representation in the Variational Autoencoder latent space of FLUX.1 [Dev], revealing a structure reflecting Hue, Saturation, and Lightness. We verify our Latent Color Subspace (LCS) interpretation by demonstrating that it can both predict and explicitly control color, introducing a fully training-free method in FLUX based solely on closed-form latent-space manipulation. Code is available at https://github.com/ExplainableML/LCS.

SciMDR: Benchmarking and Advancing Scientific Multimodal Document Reasoning

Constructing scientific multimodal document reasoning datasets for foundation model training involves an inherent trade-off among scale, faithfulness, and realism. To address this challenge, we introduce the synthesize-and-reground framework, a two-stage pipeline comprising: (1) Claim-Centric QA Synthesis, which generates faithful, isolated QA pairs and reasoning on focused segments, and (2) Document-Scale Regrounding, which programmatically re-embeds these pairs into full-document tasks to ensure realistic complexity. Using this framework, we construct SciMDR, a large-scale training dataset for cross-modal comprehension, comprising 300K QA pairs with explicit reasoning chains across 20K scientific papers. We further construct SciMDR-Eval, an expert-annotated benchmark to evaluate multimodal comprehension within full-length scientific workflows. Experiments demonstrate that models fine-tuned on SciMDR achieve significant improvements across multiple scientific QA benchmarks, particularly in those tasks requiring complex document-level reasoning.

Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

Reasoning LLMs-as-Judges, which can benefit from inference-time scaling, provide a promising path for extending the success of reasoning models to non-verifiable domains where the output correctness/quality cannot be directly checked. However, while reasoning judges have shown better performance on static evaluation benchmarks, their effectiveness in actual policy training has not been systematically examined. Therefore, we conduct a rigorous study to investigate the actual impact of non-reasoning and reasoning judges in reinforcement-learning-based LLM alignment. Our controlled synthetic setting, where a "gold-standard" judge (gpt-oss-120b) provides preference annotations to train smaller judges, reveals key differences between non-reasoning and reasoning judges: non-reasoning judges lead to reward hacking easily, while reasoning judges can lead to policies that achieve strong performance when evaluated by the gold-standard judge. Interestingly, we find that the reasoning-judge-trained policies achieve such strong performance by learning to generate highly effective adversarial outputs that can also score well on popular benchmarks such as Arena-Hard by deceiving other LLM-judges. Combined with our further analysis, our study highlights both important findings and room for improvements for applying (reasoning) LLM-judges in non-verifiable LLM post-training.

Separable neural architectures as a primitive for unified predictive and generative intelligence

Intelligent systems across physics, language and perception often exhibit factorisable structure, yet are typically modelled by monolithic neural architectures that do not explicitly exploit this structure. The separable neural architecture (SNA) addresses this by formalising a representational class that unifies additive, quadratic and tensor-decomposed neural models. By constraining interaction order and tensor rank, SNAs impose a structural inductive bias that factorises high-dimensional mappings into low-arity components. Separability need not be a property of the system itself: it often emerges in the coordinates or representations through which the system is expressed. Crucially, this coordinate-aware formulation reveals a structural analogy between chaotic spatiotemporal dynamics and linguistic autoregression. By treating continuous physical states as smooth, separable embeddings, SNAs enable distributional modelling of chaotic systems. This approach mitigates the nonphysical drift characteristics of deterministic operators whilst remaining applicable to discrete sequences. The compositional versatility of this approach is demonstrated across four domains: autonomous waypoint navigation via reinforcement learning, inverse generation of multifunctional microstructures, distributional modelling of turbulent flow and neural language modelling. These results establish the separable neural architecture as a domain-agnostic primitive for predictive and generative intelligence, capable of unifying both deterministic and distributional representations.

Incremental Neural Network Verification via Learned Conflicts

Neural network verification is often used as a core component within larger analysis procedures, which generate sequences of closely related verification queries over the same network. In existing neural network verifiers, each query is typically solved independently, and information learned during previous runs is discarded, leading to repeated exploration of the same infeasible regions of the search space. In this work, we aim to expedite verification by reducing this redundancy. We propose an incremental verification technique that reuses learned conflicts across related verification queries. The technique can be added on top of any branch-and-bound-based neural network verifier. During verification, the verifier records conflicts corresponding to learned infeasible combinations of activation phases, and retains them across runs. We formalize a refinement relation between verification queries and show that conflicts learned for a query remain valid under refinement, enabling sound conflict inheritance. Inherited conflicts are handled using a SAT solver to perform consistency checks and propagation, allowing infeasible subproblems to be detected and pruned early during search. We implement the proposed technique in the Marabou verifier and evaluate it on three verification tasks: local robustness radius determination, verification with input splitting, and minimal sufficient feature set extraction. Our experiments show that incremental conflict reuse reduces verification effort and yields speedups of up to $1.9\times$ over a non-incremental baseline.
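To make the idea of conflict reuse concrete, here is a minimal Python sketch. It is not Marabou's API, and every name in it (`ConflictStore`, `record`, `prunes`) is hypothetical. A conflict is modeled as a set of (neuron, phase) decisions known to be jointly infeasible, and a plain subset check stands in for the SAT-based consistency checks and propagation described in the abstract.

```python
from typing import Dict, FrozenSet, List, Tuple

# A phase decision fixes one ReLU as active (True) or inactive (False).
Literal = Tuple[str, bool]
Conflict = FrozenSet[Literal]


class ConflictStore:
    """Hypothetical store of infeasible phase combinations kept across queries."""

    def __init__(self) -> None:
        self.conflicts: List[Conflict] = []

    def record(self, literals: List[Literal]) -> None:
        """Remember a combination of phase decisions that was proven infeasible."""
        self.conflicts.append(frozenset(literals))

    def prunes(self, assignment: Dict[str, bool]) -> bool:
        """Return True if the partial assignment contains a stored conflict."""
        fixed = set(assignment.items())
        return any(conflict <= fixed for conflict in self.conflicts)


store = ConflictStore()
# Learned while solving an earlier query: n1 active with n2 inactive is infeasible.
store.record([("n1", True), ("n2", False)])

# A later, refined query reaches subproblems with these partial assignments.
pruned = store.prunes({"n1": True, "n2": False, "n3": True})
kept = store.prunes({"n1": True, "n2": True})
print(pruned, kept)
```

The first subproblem is discarded without calling the underlying solver because it contains the inherited conflict; the second is still explored. Reusing conflicts this way is only sound when the later query refines the earlier one, which is the condition the paper formalizes.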

Security Considerations for Artificial Intelligence Agents

This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic systems used by millions of users and thousands of enterprises in both controlled and open-world environments. Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. We map principal attack surfaces across tools, connectors, hosting boundaries, and multi-agent coordination, with particular emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. We then assess current defenses as a layered stack: input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions. Finally, we identify standards and research gaps, including adaptive security benchmarks, policy models for delegation and privilege control, and guidance for secure multi-agent system design aligned with NIST risk management principles.

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in small models such expert solutions occupy a negligible fraction of the volume of this distribution, making their discovery reliant on structured optimization methods such as gradient descent. In contrast, in large, well-pretrained models the density of task-experts increases dramatically, so that diverse, task-improving specialists populate a substantial fraction of the neighborhood around the pretrained weights. Motivated by this perspective, we explore a simple, fully parallel post-training method that samples $N$ parameter perturbations at random, selects the top $K$, and ensembles predictions via majority vote. Despite its simplicity, this approach is competitive with standard post-training methods such as PPO, GRPO, and ES for contemporary large-scale models.
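The sample, select, and vote procedure can be sketched on a toy problem. The following is an illustration under stated assumptions, not the authors' implementation: the model is a two-weight linear classifier, perturbations are Gaussian with a hypothetical scale `sigma`, and the same small dataset is used for selection (a real setup would hold out separate data).

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]


def predict(weights: Sequence[float], x: Sequence[float]) -> int:
    """Toy linear classifier: returns 1 if w.x > 0, else 0."""
    return int(sum(w * xi for w, xi in zip(weights, x)) > 0)


def accuracy(weights: Sequence[float], data: List[Example]) -> float:
    """Fraction of examples the given weights classify correctly."""
    return sum(predict(weights, x) == y for x, y in data) / len(data)


def sample_select(pretrained, data, n=64, k=5, sigma=0.5, seed=0):
    """Sample n random perturbations of the weights and keep the top k."""
    rng = random.Random(seed)
    candidates = [
        [w + rng.gauss(0.0, sigma) for w in pretrained] for _ in range(n)
    ]
    # Each candidate is scored independently, so this step parallelizes trivially.
    candidates.sort(key=lambda w: accuracy(w, data), reverse=True)
    return candidates[:k]


def ensemble_predict(ensemble, x) -> int:
    """Majority vote over the selected candidates (use an odd k to avoid ties)."""
    votes = sum(predict(w, x) for w in ensemble)
    return int(votes * 2 > len(ensemble))


# Toy task: the label is 1 when the first feature exceeds the second.
rng = random.Random(1)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
data = [(p, int(p[0] > p[1])) for p in points]

pretrained = [1.0, 0.0]  # a starting point that is only partly right
ensemble = sample_select(pretrained, data)
vote = ensemble_predict(ensemble, (1.0, -1.0))
```

No gradients are computed anywhere: the only operations are random sampling, independent scoring, and voting, which is what makes the method fully parallel.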

Sparking Scientific Creativity via LLM-Driven Interdisciplinary Inspiration

Despite interdisciplinary research leading to larger and longer-term impact, most work remains confined to single-domain academic silos. Recent AI-based approaches to scientific discovery show promise for interdisciplinary research, but many prioritize rapidly designing experiments and solutions, bypassing the exploratory, collaborative reasoning processes that drive creative interdisciplinary breakthroughs. As a result, prior efforts largely prioritize automating scientific discovery rather than augmenting the reasoning processes that underlie scientific disruption. We present Idea-Catalyst, a novel framework that systematically identifies interdisciplinary insights to support creative reasoning in both humans and large language models. Starting from an abstract research goal, Idea-Catalyst is designed to assist the brainstorming stage, explicitly avoiding premature anchoring on specific solutions. The framework embodies key metacognitive features of interdisciplinary reasoning: (a) defining and assessing research goals, (b) awareness of a domain's opportunities and unresolved challenges, and (c) strategic exploration of interdisciplinary ideas based on impact potential. Concretely, Idea-Catalyst decomposes an abstract goal (e.g., improving human-AI collaboration) into core target-domain research questions that guide the analysis of progress and open challenges within that domain. These challenges are reformulated as domain-agnostic conceptual problems, enabling retrieval from external disciplines (e.g., Psychology, Sociology) that address analogous issues. By synthesizing and recontextualizing insights from these domains back into the target domain, Idea-Catalyst ranks source domains by their interdisciplinary potential. Empirically, this targeted integration improves average novelty by 21% and insightfulness by 16%, while remaining grounded in the original research problem.

Portfolio of Solving Strategies in CEGAR-based Object Packing and Scheduling for Sequential 3D Printing

Computing power that decades ago was available only in supercomputers, especially their parallelism, is now available in standard personal computer CPUs and even in mobile phone CPUs. We show how to effectively use the computing power of a modern multi-core personal computer CPU to solve the complex combinatorial problem of object arrangement and scheduling for sequential 3D printing. We achieve this by parallelizing the existing CEGAR-SEQ algorithm, which solves sequential object arrangement and scheduling by expressing it as a linear arithmetic formula that is then solved by a technique inspired by counterexample-guided abstraction refinement (CEGAR). The original CEGAR-SEQ algorithm uses an object arrangement strategy that places objects towards the center of the printing plate. We propose alternative object arrangement strategies, such as placing objects towards a corner of the printing plate and scheduling objects according to their height. Our parallelization is done at a high level: we execute the CEGAR-SEQ algorithm in parallel with a portfolio of object arrangement strategies, an algorithm we call Portfolio-CEGAR-SEQ. Our experimental evaluation indicates that Portfolio-CEGAR-SEQ outperforms the original CEGAR-SEQ. When a batch of objects for multiple printing plates is scheduled, Portfolio-CEGAR-SEQ often uses fewer printing plates than CEGAR-SEQ.
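The portfolio idea (run the same solver under several arrangement strategies at once and keep the best answer) can be illustrated with a small Python sketch. This is not CEGAR-SEQ: a greedy first-fit packer over one-dimensional sizes stands in for the solver, and the strategy names and the `portfolio` function are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def first_fit(sizes: List[int], capacity: int) -> int:
    """Greedy stand-in for a solver: number of plates used in the given order."""
    plates: List[int] = []
    for size in sizes:
        for i, used in enumerate(plates):
            if used + size <= capacity:
                plates[i] += size
                break
        else:
            # No existing plate has room, so open a new one.
            plates.append(size)
    return len(plates)


# Each strategy only changes the order in which objects are offered to the solver.
STRATEGIES: Dict[str, Callable[[List[int]], List[int]]] = {
    "as_given": lambda s: list(s),
    "largest_first": lambda s: sorted(s, reverse=True),
    "smallest_first": lambda s: sorted(s),
}


def portfolio(sizes: List[int], capacity: int) -> Tuple[str, Dict[str, int]]:
    """Run every strategy concurrently and return the best one with all results."""
    with ThreadPoolExecutor(max_workers=len(STRATEGIES)) as pool:
        futures = {
            name: pool.submit(first_fit, order(sizes), capacity)
            for name, order in STRATEGIES.items()
        }
        results = {name: future.result() for name, future in futures.items()}
    best = min(results, key=results.get)
    return best, results


best, results = portfolio([4, 8, 1, 4, 2, 1], capacity=10)
print(best, results)
```

On this input the smallest-first ordering needs three plates while the other two need two, so the portfolio returns a two-plate answer. Threads keep the sketch short; for CPU-bound solving in Python a process pool would be needed to get real multi-core parallelism.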

RDNet: Region Proportion-Aware Dynamic Adaptive Salient Object Detection Network in Optical Remote Sensing Images

Salient object detection (SOD) in remote sensing images faces significant challenges due to large variations in object sizes, the computational cost of self-attention mechanisms, and the limitations of CNN-based extractors in capturing global context and long-range dependencies. Existing methods that rely on fixed convolution kernels often struggle to adapt to diverse object scales, leading to detail loss or irrelevant feature aggregation. To address these issues, this work aims to enhance robustness to scale variations and achieve precise object localization. We propose the Region Proportion-Aware Dynamic Adaptive Salient Object Detection Network (RDNet), which replaces the CNN backbone with the SwinTransformer for global context modeling and introduces three key modules: (1) the Dynamic Adaptive Detail-aware (DAD) module, which applies varied convolution kernels guided by object region proportions; (2) the Frequency-matching Context Enhancement (FCE) module, which enriches contextual information through wavelet interactions and attention; and (3) the Region Proportion-aware Localization (RPL) module, which employs cross-attention to highlight semantic details and integrates a Proportion Guidance (PG) block to assist the DAD module. By combining these modules, RDNet achieves robustness against scale variations and accurate localization, delivering superior detection performance compared with state-of-the-art methods.

AI Models

LocoreMind/LocoTrainer-4B


library_name: transformers
license: mit
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
  • code
  • agent
  • tool-calling
  • distillation
  • qwen3
  • ms-swift
  • codebase-analysis
language:
  • en
pipeline_tag: text-generation


Introduction

LocoTrainer-4B is a 4B-parameter MS-SWIFT domain expert agent trained via knowledge distillation from Qwen3-Coder-Next. Unlike general-purpose code agents, it combines multi-turn tool-calling with deep MS-SWIFT framework knowledge — enabling it to analyze codebases and generate comprehensive markdown reports without a separate reasoning model.

| | LocoTrainer-4B |
|:--|:--|
| Base Model | Qwen3-4B-Instruct-2507 |
| Teacher Model | Qwen3-Coder-Next |
| Training Method | Full-parameter SFT (distillation) |
| Training Data | 361,830 samples (agent trajectory + MS-SWIFT knowledge + project paths) |
| Max Sequence Length | 32,768 tokens |
| Training Hardware | 8x NVIDIA H100 80GB |
| Training Time | ~25 hours |
| Framework | MS-SWIFT |

Key Features

  • MS-SWIFT Domain Expert: Trained on MS-SWIFT documentation, CLI parameters, and project structure paths — answers framework questions accurately
  • Tool-Calling Agent: Generates structured <tool_call> JSON for Read, Grep, Glob, Bash, and Write tools
  • End-to-End Reports: From a single question to a complete, well-structured markdown analysis report
  • Long Context: 32K training covers 90% of long-context analysis scenarios
  • Local Deployment: GGUF quantized version available for zero API cost inference
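A client that consumes the model's output has to pull the tool calls out of the generated text. The sketch below assumes a Qwen-style wire format, one JSON object with `name` and `arguments` keys per `<tool_call>...</tool_call>` block; the model card does not specify the exact schema, so the format, the `file_path` argument name, and the `extract_tool_calls` helper are all assumptions.

```python
import json
import re
from typing import Any, Dict, List

# Assumed wire format: one JSON object per <tool_call>...</tool_call> block.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

# Tool names listed in the model card.
ALLOWED_TOOLS = {"Read", "Grep", "Glob", "Bash", "Write"}


def extract_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Return the well-formed tool calls found in generated text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole turn
        if call.get("name") in ALLOWED_TOOLS:
            calls.append(call)
    return calls


# Hypothetical model output; the argument name "file_path" is illustrative.
sample = (
    "I will inspect the README first.\n"
    "<tool_call>\n"
    '{"name": "Read", "arguments": {"file_path": "/Users/developer/workspace/ms-swift/README.md"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(sample)
print(calls)
```

Filtering on the known tool names means an unexpected or hallucinated tool is dropped before anything is executed, which is a reasonable default for an agent loop that can run shell commands.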

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LocoreMind/LocoTrainer-4B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {
        "role": "system",
        "content": "You are Claude Code, Anthropic's official CLI for Claude.\n\nYou are an interactive agent that helps users with software engineering tasks.\n\nCRITICAL CONSTRAINTS:\n1. ALWAYS use absolute file paths in tool calls.\n2. EFFICIENCY: Use multiple tool calls to explore the codebase.\n3. OUTPUT: Save your findings as a well-structured markdown document.\n\nENV: Working directory is /Users/developer/workspace (macOS, zsh)."
    },
    {
        "role": "user",
        "content": "What are the default LoRA settings in ms-swift?\n\nAnalyze the codebase at /Users/developer/workspace/ms-swift and save your findings as a well-structured markdown document to /Users/developer/workspace/output/output.md."
    }
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
)
# Keep only the newly generated tokens by slicing off the prompt prefix.
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)

LocoTrainer Framework

LocoTrainer-4B is designed to run inside the LocoTrainer agent framework, which handles the full agent loop — tool execution, multi-turn conversation, and report generation.

pip install locotrainer

locotrainer run -q "What are the default LoRA settings in ms-swift?"
# → output/output.md

For full setup and usage, refer to the GitHub repository.

Training Details

| Parameter | Value |
|:----------|:------|
| Base model | Qwen3-4B-Instruct-2507 |
| Teacher model | Qwen3-Coder-Next |
| Method | Full-parameter SFT |
| Training data | 361,830 samples |
| Data composition | Agent trajectory + MS-SWIFT knowledge + project structure paths |
| Hardware | 8x NVIDIA H100 80GB |
| DeepSpeed | ZeRO-2 |
| Precision | BF16 |
| Epochs | 1 |
| Max sequence length | 32,768 tokens |
| Attention | Flash Attention 2 |
| Kernel optimization | Liger Kernel |
| Learning rate | 1e-5, warmup ratio 0.05 |
| Batch size | 1/GPU, gradient accumulation 4 (effective batch 32) |
| Template | qwen3_nothinking |
| Framework | MS-SWIFT |
| Training time | ~25 hours |
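The effective batch size in the table follows directly from the hardware and accumulation settings. The short calculation below reproduces it and derives a rough optimizer step count; the step and warmup figures assume one sample per sequence with no packing and no dropped remainder, which the model card does not state.

```python
import math

num_gpus = 8            # 8x NVIDIA H100
per_gpu_batch = 1       # batch size per GPU
grad_accum_steps = 4    # gradient accumulation

# Samples consumed per optimizer update across all GPUs.
effective_batch = num_gpus * per_gpu_batch * grad_accum_steps

samples = 361_830
epochs = 1
warmup_ratio = 0.05

# Assumption: one sample per sequence, no packing, last partial batch kept.
optimizer_steps = math.ceil(samples * epochs / effective_batch)
warmup_steps = int(warmup_ratio * optimizer_steps)

print(effective_batch, optimizer_steps, warmup_steps)
```

Under those assumptions the run takes roughly 11,300 optimizer steps, with warmup covering about the first 565 of them.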

Known Limitations

  • Specialized for MS-SWIFT; performance on unrelated codebases is untested
  • 4B parameters — complex multi-hop reasoning may require a larger model
  • MS-SWIFT project structure knowledge reflects the training data snapshot; may drift as the framework evolves

License

MIT

Acknowledgments

  • Qwen Team for the Qwen3-4B-Instruct-2507 base model
  • MS-SWIFT for the training framework and the codebase this model specializes in
  • llama.cpp for efficient local inference
  • Anthropic for the Claude Code agent loop design that inspired this work

Author: LocoreMind

Likes: 99

Downloads: 0

Tags: transformers, safetensors, qwen3, text-generation, code, agent, tool-calling, distillation, ms-swift, codebase-analysis, conversational, en, base_model:Qwen/Qwen3-4B-Instruct-2507, base_model:finetune:Qwen/Qwen3-4B-Instruct-2507, license:mit, text-generation-inference, endpoints_compatible, region:us

Nekofantasia/Nekofantasia-alpha


license: other
license_name: stabilityai-ai-community
license_link: LICENSE
base_model:
  • stabilityai/stable-diffusion-3.5-medium
pipeline_tag: text-to-image
tags:
  • anime
  • rectified-flow
  • mmdit
  • sd3.5

Nekofantasia 0.1

The first Rectified Flow diffusion model for anime art generation

Discord: https://discord.gg/cG2vNCfkP4
Patreon: https://www.patreon.com/nekofantasia

Overview

Nekofantasia is the first-ever diffusion model for anime art generation built on Rectified Flow technology, based on the cutting-edge Stable Diffusion 3.5 Medium architecture. Our training dataset currently consists of 4 million high-quality anime artworks — and in another first, every single one of them was personally reviewed and hand-picked by the Nekofantasia team over the course of two years. We took this painstaking approach because every automated image scoring method out there, relied on by most anime AI developers, is wildly unreliable — it bulk-deletes valuable training images while keeping stuff that clearly should have been tossed, dragging down dataset and final model quality to an unacceptable degree.

The goal of Project Nekofantasia: to break through the stagnation of community-driven, uncensored anime models — which have largely been stuck on outdated tech and methods — and build the best free anime art generation model out there, by tackling a whole range of systemic technical issues that plague existing models: the use of Adafactor instead of full AdamW, fp16 instead of bf16, automated aesthetic filtering (or no filtering at all) instead of manual curation, small datasets, legacy architectures and training mistakes.

The ultimate goal: to eventually arrive not just at the best existing anime model, but at the definitive, ultimate anime model — one whose output is virtually indistinguishable from real high-quality artwork.

⚠️ ATTENTION: Nekofantasia 0.1 is an early preview release that has NOT completed full training due to funding constraints. It has hit the quality bar we expected at this stage, but it's not yet capable of a lot of things it will undoubtedly be capable of with further training. This is exactly the outcome we've spent years working toward: painstakingly assembling a dataset by hand, tracking down and fixing countless issues that had been consistently degrading one model after another, and running experiments, including expensive ones.

However, making serious progress from here is simply not possible without your help. If you're willing and able to financially support us and make a real contribution to the advancement of free anime models, we'd be grateful for any donation via the addresses at the bottom of this page or through Patreon. We don't have wealthy backers or corporate funding; our only hope is voluntary community support. That's why we're releasing this version now: as proof that we're dead serious, so we can earn your support. ⚠️

Why Rectified Flow + MMDiT Matters

Using a Flow-based model with the MMDiT architecture has already fixed — or will fix in future releases — virtually all the shortcomings of other models, including:

1. No more "plasticky" look

The telltale "plasticky," cookie-cutter look inherent to EPS-prediction models is already gone in the current version. EPS-prediction models are fundamentally unable to reliably recover the DC component of an image (overall tone and brightness) from a noisy signal. In practice, this means there's no "sweet spot" for CFG: low values give you washed-out, faded colors; high values give you oversaturated, eye-searing neon. A specific case of this is EPS models' inability to render scenes with extreme lighting: night scenes collapse into dark blue, bright scenes lose saturation. This is an inherent limitation of the method, not a tuning issue. V-prediction partially addresses it; Rectified Flow fundamentally resolves it by predicting velocity instead of noise. On top of that, using full AdamW instead of Adafactor preserves per-element adaptivity of the second moment (which Adafactor loses due to factorization), allowing the model to pick up on finer stylistic nuances. Bf16 mixed precision instead of fp16 provides greater dynamic range and training stability.
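The model card does not spell out the training objective. As a reference point, the rectified-flow formulation commonly used for SD3-style models can be written as below, next to the EPS objective for comparison. The notation ($x_0$ for the clean latent, $\epsilon$ for Gaussian noise, $v_\theta$ and $\epsilon_\theta$ for the networks) is ours, and treating it as this model's exact setup is an assumption.

```latex
\begin{aligned}
x_t &= (1 - t)\,x_0 + t\,\epsilon,
      \qquad \epsilon \sim \mathcal{N}(0, I),\; t \in [0, 1] \\
\frac{\mathrm{d}x_t}{\mathrm{d}t} &= \epsilon - x_0 \\
\mathcal{L}_{\mathrm{RF}} &= \mathbb{E}_{x_0,\epsilon,t}
      \bigl\| v_\theta(x_t, t) - (\epsilon - x_0) \bigr\|^2 \\
\mathcal{L}_{\mathrm{EPS}} &= \mathbb{E}_{x_0,\epsilon,t}
      \bigl\| \epsilon_\theta(x_t, t) - \epsilon \bigr\|^2
      \quad \text{(with $x_t$ built from the EPS model's own noise schedule)}
\end{aligned}
```

The velocity target $\epsilon - x_0$ depends on both the noise and the clean latent, whereas the EPS target contains only the noise. The interpolation also ends at $x_1 = \epsilon$, so sampling starts from exactly the distribution seen at the last training timestep.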

2. Better stability and more efficient GPU compute usage than EPS and V-prediction

V-prediction, used as a partial fix for EPS issues, introduces its own instability. Meanwhile, EPS training costs are higher than they need to be — something that can be avoided thanks to Rectified Flow's smoother loss landscape.

3. 16-channel VAE for better fine detail

The 16-channel VAE provides significantly more accurate reconstruction of fine spatial details compared to the 4-channel VAEs of previous generations, which should substantially improve rendering of complex elements down the line (fingers, eyes, clothing details).


Installing and Running

  • Nekofantasia-01.safetensors goes in ComfyUI/models/checkpoints.
  • All three text encoder files (t5xxl_fp16.safetensors, clip_l.safetensors, clip_g.safetensors) go in ComfyUI/models/text_encoders.
  • Install RK-samplers node: https://github.com/memmaptensor/ComfyUI-RK-Sampler#installation
  • Download the recommended ComfyUI workflow

Recommended Prompt Structure

Nekofantasia does not use artificial quality tags (masterpiece, best quality, etc.) — low-quality images were NEVER part of the training data.

Tag order doesn't matter much when building your prompt, since tag shuffling was used during training, and the model architecture has significantly less dependence on tag order than UNet-based models.

Since the current 0.1 version hasn't completed full training, long, detailed prompts will give you the best results.

Don't use underscores (_) in tags. Separate tags with commas.

We recommend using highres and absurdres, and it helps to include booru safety tags (general, sensitive, questionable, explicit).
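The tagging rules above (no underscores, comma-separated tags, `highres` and `absurdres` included, plus one booru safety tag) are easy to apply mechanically. The helper below is a hypothetical convenience function, not part of any Nekofantasia tooling; defaulting the safety tag to `general` is our choice.

```python
from typing import Iterable, List

# Booru safety tags named in the recommendations.
SAFETY_TAGS = {"general", "sensitive", "questionable", "explicit"}


def build_prompt(tags: Iterable[str], safety: str = "general") -> str:
    """Normalize booru-style tags into a comma-separated prompt string."""
    if safety not in SAFETY_TAGS:
        raise ValueError(f"unknown safety tag: {safety}")
    cleaned: List[str] = []
    for tag in [*tags, "highres", "absurdres", safety]:
        tag = tag.replace("_", " ").strip()  # underscores become spaces
        if tag and tag not in cleaned:       # drop empties and duplicates
            cleaned.append(tag)
    return ", ".join(cleaned)


prompt = build_prompt(["1girl", "cat_ears", "long_hair", "highres"])
print(prompt)
```

Because tag order matters little for this model, the helper simply appends the recommended tags at the end instead of trying to sort them.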

Example Prompt

1girl, absurdres, animal ears, bow, braid, cat ears, dress, green dress, hair bow, highres, kaenbyou rin, long hair, long sleeves, looking at viewer, nekomata, oil painting (medium), painting (medium), portrait, red eyes, red hair, red ribbon, neck ribbon, smile, solo, touhou, traditional media, twin braids

Recommended Negative Prompt

lowres, pixelated, downscaled, upscaled, jpeg artifacts, compression artifacts, scan artifacts, blurry, censored, bar censor, mosaic censoring, heart censor, bad anatomy, bad hands, bad feet, extra digits, fewer digits, watermark, text, dated, watermark grid, sample watermark, artist name

Sampling Settings

Due to instability in the model's vector field at this stage, higher-order samplers are recommended for producing clean images — ideally Runge-Kutta methods.

| Setting | Recommendation |
|---------|---------------|
| CFG Scale | 3–7 (feel free to go higher if needed) |
| Sampler | Dopri5 (recommended) or Bosh3 |
| Steps | N/A (Dopri5 is an adaptive sampler that automatically determines the number and size of its steps) |

Note: There's no benefit to using even higher-order samplers. Bosh3 is also a solid choice, but despite being 3rd-order vs. 5th, it won't actually be faster. In theory, you can generate with simpler samplers like Euler or Heun, but they'll require a lot of steps, and even then you'll likely only save time at the cost of reduced stability and the occasional third arm.
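To illustrate why a higher-order integrator can be worth its extra cost per step, the sketch below compares fixed-step Euler (1st order) with Heun (2nd order) on a toy ODE with a known solution. It says nothing about this model's actual vector field, and it does not implement Dopri5 or Bosh3, which are adaptive Runge-Kutta methods; the toy field and step counts are arbitrary choices.

```python
import math
from typing import Callable

Field = Callable[[float, float], float]


def euler(f: Field, x: float, t0: float, t1: float, steps: int) -> float:
    """First-order integrator: one field evaluation per step."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        x += h * f(t, x)
        t += h
    return x


def heun(f: Field, x: float, t0: float, t1: float, steps: int) -> float:
    """Second-order integrator: two field evaluations per step."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h, x + h * k1)
        x += 0.5 * h * (k1 + k2)
        t += h
    return x


def field(t: float, x: float) -> float:
    """Toy stand-in for a model's velocity field: dx/dt = -4x."""
    return -4.0 * x


exact = math.exp(-4.0)  # analytic solution at t = 1 for x(0) = 1

# Heun with 10 steps and Euler with 20 steps both cost 20 field evaluations.
heun_err = abs(heun(field, 1.0, 0.0, 1.0, 10) - exact)
euler_err = abs(euler(field, 1.0, 0.0, 1.0, 20) - exact)
print(f"heun: {heun_err:.5f}  euler: {euler_err:.5f}")
```

With the evaluation budget held equal, the second-order method lands closer to the exact solution on this problem, which matches the recommendation to prefer higher-order samplers over running Euler for many steps.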

Recommended Resolutions

Target ~1MP with both sides divisible by 64. Portrait (vertical) images currently tend to produce better quality than landscape.

| Orientation | Dimensions | Aspect Ratio |
|-------------|------------|--------------|
| Square | 1024 × 1024 | 1:1 |
| Portrait | 896 × 1152 | 7:9 |
| | 832 × 1216 | ≈2:3 |
| | 768 × 1344 | 4:7 |
| | 640 × 1536 | 5:12 |
| Landscape | 1152 × 896 | 9:7 |
| | 1216 × 832 | ≈3:2 |
| | 1344 × 768 | 7:4 |
| | 1536 × 640 | 12:5 |
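
The resolution rule can be checked programmatically. The sketch below assumes a 10% tolerance around 1024 × 1024 pixels as the meaning of "~1MP"; that tolerance and the function names are illustrative choices, not values from the model card.

```python
from math import gcd

TARGET_PIXELS = 1024 * 1024  # ~1MP
STEP = 64                    # both sides must be divisible by this

# Resolutions listed in the table above, as (width, height)
BUCKETS = [
    (1024, 1024),
    (896, 1152), (832, 1216), (768, 1344), (640, 1536),
    (1152, 896), (1216, 832), (1344, 768), (1536, 640),
]


def is_recommended(width, height, tolerance=0.10):
    """True if both sides are multiples of 64 and the area is close to ~1MP."""
    if width % STEP or height % STEP:
        return False
    return abs(width * height - TARGET_PIXELS) <= tolerance * TARGET_PIXELS


def aspect_ratio(width, height):
    """Reduce width:height to lowest terms."""
    d = gcd(width, height)
    return width // d, height // d


def snap_to_bucket(width, height, buckets=BUCKETS):
    """Pick the listed resolution whose aspect ratio is closest to the request."""
    target = width / height
    return min(buckets, key=lambda wh: abs(wh[0] / wh[1] - target))


print(snap_to_bucket(1920, 1080))
print(aspect_ratio(832, 1216))
```

Reducing 832 × 1216 to lowest terms gives 13:19, which is close to but not exactly 2:3.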


Sources

Various anime imageboards, personal blogs, Patreon, Pixiv, and game CGs extracted from RPA files of various visual novels. Using a custom-trained neural network combined with manual review, we removed a large number of AI-generated and AI-assisted artworks uploaded to booru sites without proper tagging — which posed a real risk of degrading model quality.

Data Handling

Unlike the approach taken by virtually every previous anime model, images underwent minimal lossy processing: only INTER_AREA resampling for bucket resizing to training-compatible dimensions (we plan to eliminate even this down the road), with no WebP or JPG recompression beyond whatever the original artists already applied when uploading. Dataset collection for the current version was completed in February 2026 and likely covers virtually every character with even moderate popularity.

Training Details

| Parameter | Value |
|-----------|-------|
| Optimizer | AdamW |
| Scheduler | Constant with warmup |
| Effective batch size | 176 |
| Mixed precision | bf16 (not full bf16) |
| Hardware | 8× H100 SXM (~24 hours, 194 GPU-hours) |

Text encoders were not trained. In the MMDiT architecture, caption-to-image interaction happens via JointAttention within the model itself, making text encoder fine-tuning a waste of GPU compute. Training text encoders through diffusion loss is not an effective approach.


Known Limitations

  • Has significant issues with fingers and fine details
  • Hasn't yet learned characters or many uncommon tags
  • Only a small fraction of the model's potential has been realized at this stage. However, it has already oriented toward the anime style and doesn't exhibit the "smeared" anime rendering look that's typical of many models and the original SD 3.5
  • Character tags currently have almost no effect on generation
  • In rare cases with uncommon or missing booru tags, photorealistic style bleed-through may occur
  • Due to the extremely aggressive safety filters baked into base SD 3.5, NSFW content generation is currently almost impossible. However, unlike the base model, Nekofantasia 0.1 can already properly render bare breasts — which means that with further training, StabilityAI's censorship (which turned a lot of people off from their most modern model) can likely be fully overcome

Recommended Settings for LoRA

  • Optimizer: adamw8bit/adamw (scheduler-free optimizers tend to significantly underestimate step size. Prodigy with d=3 and safe_warmup can be a decent option)
  • Network dim: 32-64
  • Network Alpha: =dim
  • network_train_unet_only: true
  • Effective batch size: 4-12
  • LR warmup steps: 100-200
  • LR: 4e-4/2e-4
  • training_shift: 1.0
  • Weighting_scheme: logit_normal
  • network_module: networks.lora_sd3
  • Note on methods: Lokr/locon/i3/lycoris and other methods are either fundamentally incompatible with transformer architecture or require significant fixes; using standard LoRA is recommended.
  • Numerical stability: To ensure numerical stability, it's recommended to avoid fp16 and use bf16 instead.
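
As a rough sketch, the settings above can be collected in one place and turned into command-line flags for a kohya-style training script. The option names `network_dim`, `network_alpha`, `optimizer_type`, `learning_rate`, `lr_warmup_steps`, and `mixed_precision` are assumptions about that script's interface and should be checked against its own help output. The concrete values are one pick from each recommended range.

```python
# One concrete choice from each recommended range above.
LORA_SETTINGS = {
    "network_module": "networks.lora_sd3",
    "network_dim": 32,                # recommended: 32-64
    "network_alpha": 32,              # recommended: equal to dim
    "network_train_unet_only": True,
    "optimizer_type": "adamw8bit",    # assumed flag name
    "learning_rate": 2e-4,            # assumed flag name
    "lr_warmup_steps": 100,           # recommended: 100-200
    "training_shift": 1.0,
    "weighting_scheme": "logit_normal",
    "mixed_precision": "bf16",        # avoid fp16 for numerical stability
}


def to_cli_args(settings):
    """Render a settings dict as --key=value flags; True becomes a bare flag."""
    args = []
    for key, value in settings.items():
        if value is True:
            args.append(f"--{key}")
        elif value is False or value is None:
            continue  # omit disabled options entirely
        else:
            args.append(f"--{key}={value}")
    return args


print(" ".join(to_cli_args(LORA_SETTINGS)))
```

The effective batch size (4-12) is not a single flag here, since it is typically the product of the per-device batch size and the gradient accumulation steps.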

Changelog

v0.1 — Trained on 1/3 of the full dataset. Initial release — 2026.03.13.

Roadmap

1.0 Medium Release

2–3 epochs of training. Knowledge of virtually all moderately popular characters (2k+ artworks on Danbooru). Proper limb generation. Various artist styles. Quality competitive with commercial anime generators. Potentially the best free general-purpose anime art model, with knowledge of nearly all characters and styles without needing LoRAs.

Estimated cost: $1,200–$2,600

VAE Decoder Fine-tuning

To fully eliminate detail noise and potential VAE artifacts. Decoder training is much simpler and will be stopped as soon as PSNR reaches near-lossless levels — likely less than one epoch, possibly less than half.

Estimated cost: $50–$600

Reference-like Feature (similar to NAI)

The ability to feed the model a single image as a style, subject, or character reference. (This may not be implemented before the next milestone, since the Medium model has quality ceilings below what the community deserves.)

Estimated cost: TBD — further research needed

1.0 on Stable Diffusion 3.5 Large (8B)

New large model with a greatly expanded dataset. Maximum anime art generation quality achievable on the 8B MMDiT architecture: correct multi-figure composition, style blending, narrative scene generation, and minimization of typical AI artifacts.

Estimated cost: $4,000–$10,000 (potentially several times higher if we prioritize quality and increase resolution to 2.1MP)

IP-Adapter for the Large Model

Adding a reference feature for transferring subject, style, or character from a donor image into the generation.

Estimated cost: Slightly less than the previous item

⚠️ All estimates are based on extrapolation from current results and may be adjusted, since a model of this type has never been trained before.


Acknowledgments

StabilityAI — For creating such a fantastic model architecture that has, unfortunately, gotten far less attention from the community than it deserves. SD 3.5 represents a major leap forward in diffusion model architecture, and we hope to showcase its potential for high-quality anime art generation.

Kohya_ss — For building the training scripts that made this possible.

You, personally — For taking the time to check out this model, reading through all of this, and hopefully generating some images and showing your support.


<div align="center">

Please Support Us

Your donations will go towards improving quality, so that everyone can freely create beautiful, diverse anime art in any subject and any style, without censorship, per-generation fees, or dependence on corporations.

You can support us via cryptocurrency or through Patreon. Crypto is preferred, since we receive 100% of your donation that way, with no taxes or excessive fees.

Donation Addresses

| Currency | Address |
|----------|---------|
| BTC | bc1q8g902k9gcstrtc543q849tzmeezta9t5j6jc43 |
| XMR | 42aMKZ1ZPNJDMxjEMMYTs3PPbAxcZqfJnNfMS361gX4mdjMefc4rUBSHxAFCLmryi5WH2TVUPMiL2Ho7ZGn6iEjwBxXhKDu |
| ETH / BNB / EVM-compatible chains / any tokens | 0xeb8390f51431EBDc4332D43568EeCe4888dDAe53 |
| TRX / any TRC20 tokens | TEZJetBdbEbL239Z91QJSh9zN5ggcFTuEu |
| ZEC | t1ZChGuaPDJJAVUXjWywRpuzHU3FRe6iis1 |
| DASH | XdHYPfECKVs3qu65r35h5vA2pa9XcQNAap |
| LTC | ltc1qjfsgnmueylc7j2uhpp7u2rey08me5nylvgfwzf |
| SOL | 3vEKkYNxZYcEcxRrEMJdbXijBjpNcJJhBXtJtp6ojWuE |
| BCH | qpal09f5cky3g0yjs48tv5xl9k6zhz0ldcpa673peu |

If you'd prefer a donation method that isn't listed here, reach out to us on Discord.

As a thank-you for donating, you'll receive:

  • Access to a private donors-only channel on Discord.
  • Early access to preview and release builds of future models.
  • Your feedback and suggestions will get significantly more attention.
  • Possibly additional exclusive perks we haven't thought of yet.

Discord

<a href="https://www.patreon.com/nekofantasia"><img src="https://img.shields.io/badge/Support_on_Patreon-FF424D?style=for-the-badge&logo=patreon&logoColor=white" height="50" alt="Patreon"></a>

</div> <img class="neko-img" src="https://cdn-uploads.huggingface.co/production/uploads/69b045e9b54e9a17b9005290/muLdf3-77Oj6LD_zPouYi.png" alt="sample3">

Author: Nekofantasia

Likes: 5

Downloads: 0

Tags: anime, rectified-flow, mmdit, sd3.5, text-to-image, base_model:stabilityai/stable-diffusion-3.5-medium, base_model:finetune:stabilityai/stable-diffusion-3.5-medium, license:other, region:us

Zero-Point-AI/MARTHA-4B


language:
  • en
license: apache-2.0
base_model:
  • Qwen/Qwen3.5-4B
tags:
  • qwen3.5
  • gguf
  • local

MARTHA-4B — Qwen3.5-4B

Zero Point AI — Intelligence From The Void

No fluff. No guardrails. Just raw intelligence from the void.

Full fine-tune of Qwen3.5-4B by Alibaba Cloud. Built for local hardware. Released free.

Credits

License

Apache 2.0

Author: Zero-Point-AI

Likes: 4

Downloads: 0

Tags: gguf, qwen3.5, local, en, base_model:Qwen/Qwen3.5-4B, base_model:quantized:Qwen/Qwen3.5-4B, license:apache-2.0, endpoints_compatible, region:us, conversational

Zero-Point-AI/MARTHA-0.8B

MARTHA-0.8B

Zero Point AI - Intelligence From The Void

No fluff. No guardrails. Just raw intelligence from the void.

Full fine-tune of Qwen3.5-0.8B. Built for local hardware. Released free.

Credits

  • Base: Qwen/Qwen3.5-0.8B by Alibaba Cloud
  • Release: Zero Point AI

License

Apache 2.0

Author: Zero-Point-AI

Likes: 4

Downloads: 0

Tags: gguf, endpoints_compatible, region:us, conversational

Zero-Point-AI/MARTHA-27B

MARTHA-27B

Zero Point AI - Intelligence From The Void

No fluff. No guardrails. Just raw intelligence from the void.

Full fine-tune of Qwen3.5-27B. Built for local hardware. Released free.

Credits

License

Apache 2.0

Author: Zero-Point-AI

Likes: 3

Downloads: 0

Tags: gguf, endpoints_compatible, region:us, conversational

Zero-Point-AI/MARTHA-2B

MARTHA-2B

Zero Point AI - Intelligence From The Void

No fluff. No guardrails. Just raw intelligence from the void.

Full fine-tune of Qwen3.5-2B. Built for local hardware. Released free.

Credits

License

Apache 2.0

Author: Zero-Point-AI

Likes: 3

Downloads: 0

Tags: gguf, endpoints_compatible, region:us, conversational

mradermacher/Qwen3.5-27B-Guardpoint-GGUF


base_model: ValiantLabs/Qwen3.5-27B-Guardpoint
datasets:
  • sequelbox/Superpotion-DeepSeek-V3.2-Speciale
language:
  • en
library_name: transformers
license: apache-2.0
mradermacher:
  readme_rev: 1
quantized_by: mradermacher
tags:
  • guardpoint
  • valiant
  • valiant-labs
  • qwen
  • qwen-3.5
  • qwen-3.5-27b
  • 27b
  • reasoning
  • science
  • science-reasoning
  • medicine
  • internal-medicine
  • clinical-diagnosis
  • medical-understanding
  • medical-reasoning
  • medical-diagnosis
  • medical-management
  • problem-solving
  • anatomy
  • angiology
  • bariatric
  • cardiovascular
  • dental
  • dermatology
  • endocrinology
  • ENT
  • hematology
  • immunology
  • infectious-disease
  • musculoskeletal
  • neurology
  • obstetrics
  • ophtamology
  • oncology
  • orthopedics
  • pathology
  • psychiatry
  • pulmonology
  • radiology
  • surgery
  • triage
  • urology
  • analytical
  • data
  • data-interpretation
  • expert
  • rationality
  • conversational
  • chat
  • instruct

About


static quants of https://huggingface.co/ValiantLabs/Qwen3.5-27B-Guardpoint


For a convenient overview and download list, visit our model page for this model.

weighted/imatrix quants are available at https://huggingface.co/mradermacher/Qwen3.5-27B-Guardpoint-i1-GGUF

Usage

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.

Provided Quants

(sorted by size, not necessarily quality. IQ-quants are often preferable over similar sized non-IQ quants)

| Link | Type | Size/GB | Notes |
|:-----|:-----|--------:|:------|
| GGUF | Q2_K | 10.8 | |
| GGUF | Q3_K_S | 12.2 | |
| GGUF | Q3_K_M | 13.4 | lower quality |
| GGUF | Q3_K_L | 14.4 | |
| GGUF | IQ4_XS | 15.3 | |
| GGUF | Q4_K_S | 15.7 | fast, recommended |
| GGUF | Q4_K_M | 16.6 | fast, recommended |
| GGUF | Q5_K_S | 18.8 | |
| GGUF | Q5_K_M | 19.3 | |
| GGUF | Q6_K | 22.2 | very good quality |
| GGUF | Q8_0 | 28.7 | fast, best quality |
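
For choosing among these files, a simple size-based filter can be a starting point. The sketch below is only a heuristic: `pick_quant` and the 2 GB headroom for context and runtime overhead are hypothetical, and, as noted above, file size does not strictly track quality.

```python
# (type, size in GB) taken from the table above
QUANTS = [
    ("Q2_K", 10.8), ("Q3_K_S", 12.2), ("Q3_K_M", 13.4), ("Q3_K_L", 14.4),
    ("IQ4_XS", 15.3), ("Q4_K_S", 15.7), ("Q4_K_M", 16.6), ("Q5_K_S", 18.8),
    ("Q5_K_M", 19.3), ("Q6_K", 22.2), ("Q8_0", 28.7),
]


def pick_quant(budget_gb, headroom_gb=2.0):
    """Return the largest quant whose file fits in the budget minus headroom."""
    usable = budget_gb - headroom_gb
    fitting = [q for q in QUANTS if q[1] <= usable]
    if not fitting:
        return None  # nothing fits; consider a smaller model
    return max(fitting, key=lambda q: q[1])[0]


for budget in (16, 24, 32):
    print(budget, "GB ->", pick_quant(budget))
```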

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

[Figure: quant-type comparison graph by ikawrakow (image.png)]

And here are Artefact2's thoughts on the matter: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9

FAQ / Model Request

See https://huggingface.co/mradermacher/model_requests for some answers to questions you might have and/or if you want some other model quantized.

Thanks

I thank my company, nethype GmbH, for letting me use its servers and providing upgrades to my workstation to enable this work in my free time.


Author: mradermacher

Likes: 2

Downloads: 578

Tags: transformers, gguf, guardpoint, valiant, valiant-labs, qwen, qwen-3.5, qwen-3.5-27b, 27b, reasoning, science, science-reasoning, medicine, internal-medicine, clinical-diagnosis, medical-understanding, medical-reasoning, medical-diagnosis, medical-management, problem-solving, anatomy, angiology, bariatric, cardiovascular, dental, dermatology, endocrinology, ENT, hematology, immunology, infectious-disease, musculoskeletal, neurology, obstetrics, ophtamology, oncology, orthopedics, pathology, psychiatry, pulmonology, radiology, surgery, triage, urology, analytical, data, data-interpretation, expert, rationality, conversational, chat, instruct, en, dataset:sequelbox/Superpotion-DeepSeek-V3.2-Speciale, base_model:ValiantLabs/Qwen3.5-27B-Guardpoint, base_model:quantized:ValiantLabs/Qwen3.5-27B-Guardpoint, license:apache-2.0, endpoints_compatible, region:us

BAAI/Motion_Dynamic_Model


license: apache-2.0

Author: BAAI

Likes: 2

Downloads: 0

Tags: license:apache-2.0, region:us

BAAI/RoboBrain-Dex


license: apache-2.0

Author: BAAI

Likes: 2

Downloads: 0

Tags: safetensors, dexvla, license:apache-2.0, region:us

dataslab/DLM-2.0-14B-FP8


base_model: dnotitia/DNA-2.0-14B
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
tags:
  • fp8
  • quantized
  • vllm
  • qwen3
  • text-generation
  • conversational
  • compressed-tensors
  • llmcompressor
language:
  • ko
  • en
  • multilingual
model_creator: dnotitia
quantized_by: dataslab

DLM-2.0-14B-FP8

Overview

This is an FP8-quantized version of dnotitia/DNA-2.0-14B, optimized for efficient inference by DLM (Data Science Lab., Ltd.).

FP8 (8-bit floating point) quantization with static per-tensor scaling reduces model size by approximately 35% while maintaining near-original accuracy. The checkpoint is fully compatible with vLLM for high-throughput production serving.

Model Details

| Attribute | Value |
|---|---|
| Base Model | dnotitia/DNA-2.0-14B |
| Architecture | Qwen3ForCausalLM |
| Parameters | ~14B |
| Quantization | FP8 W8A8 (Static Per-Tensor) |
| Quantization Tool | llm-compressor |
| Calibration Data | HuggingFaceH4/ultrachat_200k (512 samples) |
| Model Size | ~19 GB (vs ~30 GB in BF16) |
| Context Length | 32K native / up to 131K with YaRN |
| Vocabulary | 151,936 tokens |
| License | Apache 2.0 |
| Quantized By | DLM (Data Science Lab., Ltd.) |

Quantization Details

  • Method: Static FP8 quantization via llm-compressor oneshot
  • Precision: FP8_E4M3 for weights, FP8_E4M3 for input activations
  • Strategy: Per-tensor symmetric scaling with MinMax observer
  • Calibration: 512 samples from HuggingFaceH4/ultrachat_200k (train_sft split), max sequence length 2048
  • Format: compressed-tensors (safetensors)
  • Preserved layers: lm_head kept in full precision (BF16)
  • Targets: All Linear layers (except lm_head)
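
To make the per-tensor symmetric MinMax strategy concrete, here is a minimal pure-Python sketch of how one scale per tensor can be derived from the observed range. It is an illustration, not llm-compressor's implementation: the function names are hypothetical, and rounding onto the actual FP8 grid is deliberately omitted.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def minmax_scale(values):
    """Per-tensor symmetric scale from the observed min and max."""
    lo, hi = min(values), max(values)
    max_abs = max(abs(lo), abs(hi))
    return max_abs / FP8_E4M3_MAX if max_abs > 0 else 1.0


def to_fp8_range(values, scale):
    """Divide by the scale and clamp into the E4M3 range (no grid rounding)."""
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]


def dequantize(scaled, scale):
    """Map scaled values back to the original range."""
    return [v * scale for v in scaled]


weights = [-0.5, 0.1, 0.25]
scale = minmax_scale(weights)
scaled = to_fp8_range(weights, scale)
print(scale, scaled)
```

With static quantization, the activation scale would be fixed ahead of time from the calibration samples rather than recomputed per input.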

Usage

vLLM (Recommended)

# Serve the FP8 checkpoint (repo ID as published: dataslab/DLM-2.0-14B-FP8)
vllm serve dataslab/DLM-2.0-14B-FP8 \
  --dtype auto \
  --max-model-len 32768 \
  --enable-reasoning \
  --reasoning-parser deepseek_r1

Extended context (up to 131K with YaRN):

# YaRN factor 4.0 extends the 32,768-token native window to 131,072 tokens
vllm serve dataslab/DLM-2.0-14B-FP8 \
  --dtype auto \
  --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' \
  --max-model-len 131072 \
  --enable-reasoning \
  --reasoning-parser deepseek_r1
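
The extended-context values in the command above follow from simple arithmetic: the maximum length is the native window multiplied by the YaRN factor. A small sketch, with hypothetical helper names, that derives both the length and the JSON argument:

```python
import json


def yarn_max_len(original_max_position_embeddings, factor):
    """Extended context length implied by a YaRN scaling factor."""
    return int(original_max_position_embeddings * factor)


def rope_scaling_arg(original_max_position_embeddings, factor):
    """Compact JSON string in the shape used by the command above."""
    config = {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": original_max_position_embeddings,
    }
    return json.dumps(config, separators=(",", ":"))


print(yarn_max_len(32768, 4.0))
print(rope_scaling_arg(32768, 4.0))
```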

Python (vLLM)

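from vllm import LLM, SamplingParams

# Load the FP8 checkpoint (repo ID as published: dataslab/DLM-2.0-14B-FP8)
llm = LLM(model="dataslab/DLM-2.0-14B-FP8")
# temperature=0.6 matches the thinking-mode recommendation
sampling_params = SamplingParams(
    temperature=0.6, top_p=0.95, top_k=20, max_tokens=4096
)

# Prompt (Korean): "Please explain the process of Korea's economic development."
messages = [
    {"role": "user", "content": "한국의 경제 발전 과정에 대해 설명해주세요."}
]
outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)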

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

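# Repo ID as published: dataslab/DLM-2.0-14B-FP8
tokenizer = AutoTokenizer.from_pretrained("dataslab/DLM-2.0-14B-FP8")
model = AutoModelForCausalLM.from_pretrained(
    "dataslab/DLM-2.0-14B-FP8",
    device_map="auto",
)

# Prompt (Korean): "Analyze a complex ethical dilemma from multiple angles."
messages = [
    {"role": "user", "content": "복잡한 윤리적 딜레마에 대해 다각도로 분석해줘."}
]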
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    do_sample=True,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Dynamic Thinking Mode

This model inherits DNA 2.0's dynamic thinking capability:

  • Thinking mode: Add /think to enable detailed step-by-step reasoning (temperature=0.6)
  • Non-thinking mode: Add /no_think for concise, direct responses (temperature=0.7)
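
A minimal sketch of switching between the two modes when building a request. It assumes the switch is appended to the end of the user message, which the list above does not specify; `prepare_request` is a hypothetical helper.

```python
def prepare_request(prompt, thinking=True):
    """Build chat messages and pick the temperature for the chosen mode."""
    switch = "/think" if thinking else "/no_think"
    temperature = 0.6 if thinking else 0.7  # per-mode values from the list above
    messages = [{"role": "user", "content": f"{prompt} {switch}"}]
    return messages, temperature


messages, temperature = prepare_request("Summarize YaRN in one sentence.", thinking=False)
print(messages, temperature)
```

The returned temperature can then be passed to the sampling parameters of whichever backend serves the model.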

Base Model

DNA 2.0 is developed by Dnotitia Inc. and features:

  • Smoothie Qwen3 foundation with balanced multilingual optimization
  • Uncensored reasoning training for objective, unbiased responses
  • Advanced RL post-training for enhanced mathematical reasoning and Korean language capabilities

For more details, see the arXiv paper (2507.05686).

License

Apache 2.0 — Same as the base model.


Quantized and released by DLM (Data Science Lab., Ltd.) | HuggingFace

Author: dataslab

Likes: 2

Downloads: 0

Tags: transformers, safetensors, qwen3, text-generation, fp8, quantized, vllm, conversational, compressed-tensors, llmcompressor, ko, en, multilingual, arxiv:2507.05686, base_model:dnotitia/DNA-2.0-14B, base_model:quantized:dnotitia/DNA-2.0-14B, license:apache-2.0, text-generation-inference, endpoints_compatible, region:us