Today's AI Summary

AI Developments: Reasoning, Code Optimization, and Model Quantization Emerge

Today's AI landscape showcases advancements in reasoning capabilities, code optimization techniques, and efficient model quantization. Research papers explore methods for enhancing language model calibration and leveraging prior knowledge for structured decoding, while new models offer improved performance and accessibility through quantization.

Research Highlights

  • ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning (arXiv:2507.16815): This paper introduces a dual-system framework that combines high-level reasoning with low-level action execution. By training a multimodal LLM to generate embodied reasoning plans guided by action-aligned visual rewards, ThinkAct enables few-shot adaptation, long-horizon planning, and self-correction in complex embodied AI tasks.
  • MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning (arXiv:2507.16812): This research addresses the lack of open scientific reasoning datasets by presenting TextbookReasoning and MegaScience. These datasets, comprising truthful reference answers extracted from scientific textbooks and a mixture of high-quality open-source data, demonstrate superior performance and training efficiency. Models trained on MegaScience significantly outperform official instruct models in average performance, suggesting a scaling benefit for scientific tuning.
  • Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis (arXiv:2507.16808): This paper investigates the effectiveness of LLM-based code optimization methods for Register Transfer Level (RTL) code with complex timing logic. The study reveals that while LLMs can effectively optimize logic operations, they struggle with timing logic in RTL code.
  • Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty (arXiv:2507.16806): This paper introduces Reinforcement Learning with Calibration Rewards (RLCR), an approach to training reasoning models that jointly improves accuracy and calibrated confidence estimation. RLCR augments a binary correctness score with a Brier score, incentivizing calibrated prediction and producing more generally reliable reasoning models (a minimal sketch of such a reward appears after this list).
  • WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding (arXiv:2507.16768): This paper introduces wgrammar, a lightweight decoding engine that integrates domain-aware simplification, constraint decomposition, and mask caching, achieving up to 250x speedup over existing systems.
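
To make the RLCR idea concrete, here is a minimal Python sketch of a calibration-aware reward. It assumes the reward is the binary correctness minus a Brier penalty on the model's verbalized confidence; the paper's exact weighting may differ.

def rlcr_reward(correct: bool, confidence: float) -> float:
    # Binary correctness signal, as in standard RL post-training
    y = 1.0 if correct else 0.0
    # Brier penalty: zero when stated confidence matches the outcome
    brier_penalty = (confidence - y) ** 2
    return y - brier_penalty

# A confidently wrong answer is penalized harder than a hedged one:
print(rlcr_reward(False, 0.9))   # -0.81
print(rlcr_reward(False, 0.2))   # -0.04
print(rlcr_reward(True, 0.95))   # 0.9975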

Model Releases

  • ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF: This model provides ik_llama.cpp imatrix quantizations of Qwen/Qwen3-Coder-480B-A35B-Instruct, optimized for perplexity and memory footprint.
  • mlx-community/mxbai-embed-large-v1: This model converts mixedbread-ai/mxbai-embed-large-v1 to MLX format, designed for feature extraction and compatible with various tasks, including classification, retrieval, and STS. It achieves strong performance on MTEB benchmarks.
  • skt/A.X-3.1: SK Telecom released A.X 3.1, a large language model (LLM) optimized for Korean-language understanding and enterprise deployment. The model was trained from scratch on a high-quality multilingual corpus comprising 2.1 trillion tokens, with a primary focus on the Korean language.

Key Takeaways

  • Reasoning Enhancement: Research is focusing on improving the reasoning capabilities of AI models through innovative frameworks and reward systems.
  • Code Optimization Challenges: While LLMs show promise in code optimization, challenges remain in handling complex timing logic in RTL code.
  • Model Accessibility: Quantization techniques are making large models more accessible by reducing their size and computational requirements.
  • Data Quality is Key: The MegaScience paper highlights the importance of high-quality, verifiable scientific reasoning datasets for training effective AI models in scientific domains.
  • Calibration Matters: The RLCR approach demonstrates that explicitly optimizing for calibration can produce more reliable reasoning models.

AI Papers for 2026-04-27

Seeing Fast and Slow: Learning the Flow of Time in Videos

How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a learnable visual concept and develop models for reasoning about and manipulating the flow of time in videos. We first exploit the multimodal cues and temporal structure naturally present in videos to learn, in a self-supervised manner, to detect speed changes and estimate playback speed. We then show that these learned temporal reasoning models enable us to curate the largest slow-motion video dataset to date from noisy in-the-wild sources. Such slow-motion footage, typically filmed by high-speed cameras, contains substantially richer temporal detail than standard videos. Using this data, we further develop models capable of temporal control, including speed-conditioned video generation, which produces motion at specified playback speed, and temporal super-resolution, which transforms low-FPS, blurry videos into high-FPS sequences with fine-grained temporal details. Our findings highlight time as a manipulable, perceptual dimension in video learning, opening doors to temporally controllable video generation, temporal forensics detection, and potentially richer world-models that understand how events unfold over time.
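
As a toy illustration of the self-supervised speed task described above, one can subsample frames by a known factor and use that factor as a free training label; the function and factor set are illustrative assumptions, not the paper's pipeline.

import random

def speeded_clip(frames, factors=(1, 2, 4, 8)):
    # Pick a playback factor and subsample frames accordingly; the
    # chosen factor becomes the label, so no human annotation is needed.
    factor = random.choice(factors)
    return frames[::factor], factor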

When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs

Despite impressive progress in capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs that are not grounded in the visual input. Prior work has attributed hallucinations in LVLMs to factors such as limitations of the vision backbone or the dominance of the language component, yet the relative importance of these factors remains unclear. To resolve this ambiguity, we propose HalluScope, a benchmark to better understand the extent to which different factors induce hallucinations. Our analysis indicates that hallucinations largely stem from excessive reliance on textual priors and background knowledge, especially information introduced through textual instructions. To mitigate hallucinations induced by textual instruction priors, we propose HalluVL-DPO, a framework for fine-tuning off-the-shelf LVLMs towards more visually grounded responses. HalluVL-DPO leverages preference optimization using a curated training dataset that we construct, guiding the model to prefer grounded responses over hallucinated ones. We demonstrate that our optimized model effectively mitigates the targeted hallucination failure mode, while preserving or improving performance on other hallucination benchmarks and visual capability evaluations. To support reproducibility and further research, we will publicly release our evaluation benchmark, preference training dataset, and code at https://pegah-kh.github.io/projects/prompts-override-vision/ .
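
Since HalluVL-DPO builds on preference optimization, a minimal sketch of the standard DPO objective it presumably instantiates may help; the log-probability inputs and beta value here are assumptions for illustration, not the authors' code.

import torch.nn.functional as F

def dpo_loss(logp_grounded, logp_hallucinated,
             ref_logp_grounded, ref_logp_hallucinated, beta=0.1):
    # Reward margin of the grounded response over the hallucinated one,
    # each measured relative to the frozen reference model
    margin = (logp_grounded - ref_logp_grounded) - (
        logp_hallucinated - ref_logp_hallucinated
    )
    # Maximize the probability that the grounded response is preferred
    return -F.logsigmoid(beta * margin).mean()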

From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation

Scientific workflow systems automate execution (scheduling, fault tolerance, resource management) but not the semantic translation that precedes it. Scientists still manually convert research questions into workflow specifications, a task requiring both domain knowledge and infrastructure expertise. We propose an agentic architecture that closes this gap through three layers: an LLM interprets natural language into structured intents (semantic layer); validated generators produce reproducible workflow DAGs (deterministic layer); and domain experts author "Skills": markdown documents encoding vocabulary mappings, parameter constraints, and optimization strategies (knowledge layer). This decomposition confines LLM non-determinism to intent extraction: identical intents always yield identical workflows. We implement and evaluate the architecture on the 1000 Genomes population genetics workflow and the Hyperflow WMS running on Kubernetes. In an ablation study on 150 queries, Skills raise full-match intent accuracy from 44% to 83%; skill-driven deferred workflow generation reduces data transfer by 92%; and the end-to-end pipeline completes queries on Kubernetes with LLM overhead below 15 seconds and cost under $0.001 per query.
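
A minimal sketch of the semantic/deterministic split described above; the intent fields, task names, and generator are hypothetical illustrations, not the paper's implementation. The key property is that workflow generation is a pure function of the structured intent, so LLM non-determinism cannot leak into the DAG.

from dataclasses import dataclass

@dataclass(frozen=True)
class Intent:
    # Structured output of the semantic layer (fields are hypothetical)
    analysis: str                   # e.g. "allele_frequency"
    population: str                 # e.g. "GBR"
    chromosomes: tuple[int, ...]

def generate_workflow(intent: Intent) -> list[str]:
    # Deterministic layer: a validated generator, no LLM involved
    tasks = []
    for chrom in intent.chromosomes:
        tasks.append(f"extract --pop {intent.population} --chr {chrom}")
        tasks.append(f"{intent.analysis} --chr {chrom}")
    tasks.append("merge-results")
    return tasks

# Identical intents always yield identical workflows:
assert generate_workflow(Intent("allele_frequency", "GBR", (1, 2))) == \
       generate_workflow(Intent("allele_frequency", "GBR", (1, 2)))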

A Scale-Adaptive Framework for Joint Spatiotemporal Super-Resolution with Diffusion Models

Deep-learning video super-resolution has progressed rapidly, but climate applications typically super-resolve (increase resolution) either space or time, and joint spatiotemporal models are often designed for a single pair of super-resolution (SR) factors (the spatial and temporal upscaling ratios between the low-resolution and high-resolution sequences), limiting transfer across spatial resolutions and temporal cadences (frame rates). We present a scale-adaptive framework that reuses the same architecture across factors by decomposing spatiotemporal SR into a deterministic prediction of the conditional mean, with attention, and a residual conditional diffusion model, with an optional mass-conservation transform (same precipitation amount in inputs and outputs) to preserve aggregated totals. Assuming that larger SR factors primarily increase underdetermination (hence required context and residual uncertainty) rather than changing the conditional-mean structure, scale adaptivity is achieved by retuning three factor-dependent hyperparameters before retraining: the diffusion noise schedule amplitude beta (larger for larger factors to increase diversity), the temporal context length L (set to maintain comparable attention horizons across cadences), and optionally a third, the mass-conservation function f (tapered to limit the amplification of extremes for large factors). Demonstrated on reanalysis precipitation over France (Comephore), the same architecture spans super-resolution factors from 1 to 25 in space and 1 to 6 in time, yielding a reusable architecture and tuning recipe for joint spatiotemporal super-resolution across scales.

GiVA: Gradient-Informed Bases for Vector-Based Adaptation

As model sizes continue to grow, parameter-efficient fine-tuning has emerged as a powerful alternative to full fine-tuning. While LoRA is widely adopted among these methods, recent research has explored vector-based adaptation methods due to their extreme parameter efficiency. However, these methods typically require substantially higher ranks than LoRA to match its performance, leading to increased training costs. This work introduces GiVA, a gradient-based initialization strategy for vector-based adaptation. It achieves training times comparable to LoRA and maintains the extreme parameter efficiency of vector-based adaptation. We evaluate GiVA across diverse benchmarks, including natural language understanding, natural language generation, and image classification. Experiments show that our approach consistently outperforms or achieves performance competitive with existing vector-based adaptation methods and LoRA while reducing rank requirements by a factor of eight (8×).

Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models

This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables users to create, customize, and deploy LLM-powered game agents while actively engaging with AI-driven strategies. The LLM-based chatbot, integrated within Nemobot, demonstrates its capabilities across four distinct classes of games. For dictionary-based games, it compresses state-action mappings into efficient, generalized models for rapid adaptability. In rigorously solvable games, it employs mathematical reasoning to compute optimal strategies and generates human-readable explanations for its decisions. For heuristic-based games, it synthesizes strategies by combining insights from classical minimax algorithms (see, e.g., Shannon, 1950) with crowd-sourced data. Finally, in learning-based games, it utilizes reinforcement learning with human feedback and self-critique to iteratively refine strategies through trial-and-error and imitation learning. Nemobot amplifies this framework by offering a programmable environment where users can experiment with tool-augmented generation and fine-tuning of strategic game agents. From strategic games to role-playing games, Nemobot demonstrates how AI agents can achieve a form of self-programming by integrating crowdsourced learning and human creativity to iteratively refine their own logic. This represents a step toward the long-term goal of self-programming AI.
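
For context, the classical minimax procedure that the heuristic game class draws on (per the Shannon reference) looks like this in sketch form; the evaluate, moves, and apply_move callbacks are game-specific and left abstract here.

def minimax(state, depth, maximizing, evaluate, moves, apply_move):
    # Leaf: depth budget exhausted or no legal moves remain
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)
    values = (
        minimax(apply_move(state, m), depth - 1, not maximizing,
                evaluate, moves, apply_move)
        for m in legal
    )
    return max(values) if maximizing else min(values)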

A Multi-Stage Warm-Start Deep Learning Framework for Unit Commitment

Maintaining instantaneous balance between electricity supply and demand is critical for reliability and grid stability. System operators achieve this by solving the Unit Commitment (UC) task, a high-dimensional, large-scale Mixed-Integer Linear Programming (MILP) problem that is strictly and heavily governed by the grid's physical constraints. As grids integrate variable renewable sources and new technologies such as long-duration storage, UC must be solved optimally over multi-day horizons and potentially with greater frequency. Traditional MILP solvers therefore increasingly struggle to compute solutions within these tightening operational time limits. To bypass these computational bottlenecks, this paper proposes a novel framework utilizing a transformer-based architecture to predict generator commitment schedules over a 72-hour horizon. Because raw predictions in high-dimensional spaces often yield physically infeasible results, the pipeline integrates the self-attention network with deterministic post-processing heuristics that systematically enforce minimum up/down times and minimize excess capacity. Finally, these refined predictions are used as a warm start for a downstream MILP solver, together with a confidence-based variable fixation strategy that drastically reduces the combinatorial search space. Validated on a single-bus test system, the complete multi-stage pipeline achieves 100% feasibility and significantly accelerates computation times. Notably, in approximately 20% of test instances, the proposed model reached a feasible operational schedule with a lower overall system cost than relying solely on the solver.
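
A hedged sketch of the confidence-based variable fixation step described above; the names and the 0.95 threshold are illustrative assumptions, and the paper's exact rule may differ.

def fix_confident_commitments(commit_probs, threshold=0.95):
    # commit_probs: {(generator, hour): predicted on-probability}
    fixed, free = {}, []
    for key, p in commit_probs.items():
        if p >= threshold:
            fixed[key] = 1        # confidently ON: fix the binary to 1
        elif p <= 1 - threshold:
            fixed[key] = 0        # confidently OFF: fix the binary to 0
        else:
            free.append(key)      # uncertain: leave to the MILP solver
    return fixed, free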

TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale

Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massive financial losses and diminished user trust. While customer incidents serve as a vital signal for discovering risks missed by monitoring, extracting actionable intelligence from this data remains challenging due to extreme noise, high throughput, and semantic complexity of diverse business lines. In this paper, we present TingIS, an end-to-end system designed for enterprise-grade incident discovery. At the core of TingIS is a multi-stage event linking engine that synergizes efficient indexing techniques with Large Language Models (LLMs) to make informed decisions on event merging, enabling the stable extraction of actionable incidents from just a handful of diverse user descriptions. This engine is complemented by a cascaded routing mechanism for precise business attribution and a multi-dimensional noise reduction pipeline that integrates domain knowledge, statistical patterns, and behavioral filtering. Deployed in a production environment handling a peak throughput of over 2,000 messages per minute and 300,000 messages per day, TingIS achieves a P90 alert latency of 3.5 minutes and a 95% discovery rate for high-priority incidents. Benchmarks constructed from real-world data demonstrate that TingIS significantly outperforms baseline methods in routing accuracy, clustering quality, and Signal-to-Noise Ratio.
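
A hedged sketch of the event-linking core described above; every name here (index.nearest_events, llm.ask, event.attach) is hypothetical and serves only to show the cheap-retrieval-then-LLM-verdict pattern.

def link_incident(report, index, llm):
    # Stage 1: a fast index lookup proposes a handful of candidate events
    candidates = index.nearest_events(report.text, k=5)
    # Stage 2: the LLM decides event merging only for those candidates
    for event in candidates:
        verdict = llm.ask(
            "Do these describe the same incident? Answer yes or no.\n"
            f"A: {report.text}\nB: {event.summary}"
        )
        if verdict.strip().lower().startswith("yes"):
            event.attach(report)
            return event
    # No match: the report opens a new incident
    return index.create_event(report)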

A Multimodal Text- and Graph-Based Approach for Open-Domain Event Extraction from Documents

Event extraction is essential for event understanding and analysis. It supports tasks such as document summarization and decision-making in emergency scenarios. However, existing event extraction approaches have limitations: (1) closed-domain algorithms are restricted to predefined event types and thus rarely generalize to unseen types, and (2) open-domain event extraction algorithms, capable of handling unconstrained event types, have largely overlooked the potential of large language models (LLMs) despite their advanced abilities. Additionally, they do not explicitly model document-level contextual, structural, and semantic reasoning, which are crucial for effective event extraction but remain challenging for LLMs due to the lost-in-the-middle phenomenon and attention dilution. To address these limitations, we propose MODEE, a novel multimodal approach for open-domain event extraction that combines graph-based learning with text-based representation from LLMs to model document-level reasoning. Empirical evaluations on large datasets demonstrate that MODEE outperforms state-of-the-art open-domain event extraction approaches and can be generalized to closed-domain event extraction, where it outperforms existing algorithms.

Addressing Image Authenticity When Cameras Use Generative AI

The ability of generative AI (GenAI) methods to photorealistically alter camera images has raised awareness about the authenticity of images shared online. Interestingly, images captured directly by our cameras are considered authentic and faithful. However, with the increasing integration of deep-learning modules into cameras' capture-time hardware -- namely, the image signal processor (ISP) -- there is now a potential for hallucinated content in images directly output by our cameras. Hallucinated capture-time image content is typically benign, such as enhanced edges or texture, but in certain operations, such as AI-based digital zoom or low-light image enhancement, hallucinations can potentially alter the semantics and interpretation of the image content. As a result, users may not realize that the content in their camera images is not authentic. This paper addresses this issue by enabling users to recover the 'unhallucinated' version of the camera image to avoid misinterpretation of the image content. Our approach works by optimizing an image-specific multi-layer perceptron (MLP) decoder together with a modality-specific encoder so that, given the camera image, we can recover the image before hallucinated content was added. The encoder and MLP are self-contained and can be applied post-capture to the image without requiring access to the camera ISP. Moreover, the encoder and MLP decoder require only 180 KB of storage and can be readily saved as metadata within standard image formats such as JPEG and HEIC.
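
A toy sketch of the decoder side of this idea; the sizes, features, and training procedure are all assumptions, and the point is only that a per-image MLP can be small enough, on the order of the 180 KB cited above, to ship as metadata.

import torch.nn as nn

class UnhallucinateDecoder(nn.Module):
    # Compact image-specific MLP mapping encoder features of the
    # processed image back toward the pre-hallucination pixels
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # RGB of the recovered pixel
        )

    def forward(self, per_pixel_features):  # (N, feat_dim) -> (N, 3)
        return self.net(per_pixel_features)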

AI Models

mlx-community/DeepSeek-V4-Flash-2bit-DQ


language: en
tags: mlx
library_name: mlx
pipeline_tag: text-generation


Made possible by Lambda.ai ❤️

DeepSeek-V4-Flash-2bit-DQ uses a dynamic mixed-precision quantization policy. Most routed MoE expert weights are packed to 2-bit, while sensitive layers and projections remain in higher-quality 4-bit, 6-bit or 8-bit quantization. This keeps memory use much lower than the baseline 4-bit checkpoint.
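
As a hedged illustration, a dynamic policy of this kind can be expressed as a function from layer name to bit width; the name patterns and widths below are assumptions for illustration, not the exact recipe used for this checkpoint.

def bits_for(layer_name: str) -> int:
    # Routed MoE experts hold most of the parameters: pack them to 2-bit
    if "experts" in layer_name or "switch_mlp" in layer_name:
        return 2
    # The most sensitive tensors stay at the highest precision
    if "embed" in layer_name or "lm_head" in layer_name:
        return 8
    # Attention and other shared projections keep mid-range quality
    return 4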

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

# Load the quantized model and tokenizer from the Hugging Face Hub
model, tokenizer = load("mlx-community/DeepSeek-V4-Flash-2bit-DQ")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)

Author: mlx-community

Likes: 12

Downloads: 0

Tags: mlx, safetensors, deepseek_v4, text-generation, conversational, en, 4-bit, region:us

antirez/deepseek-v4-gguf


license: mit

Author: antirez

Likes: 11

Downloads: 0

Tags: gguf, license:mit, endpoints_compatible, region:us, conversational

Abiray/Qwen3.6-27B-AEON-Ultimate-Uncensored-GGUF


license: other
language: en
pipeline_tag: text-generation
base_model: AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored
tags: qwen, uncensored, gguf, roleplay, llama-cpp

Qwen3.6-27B-AEON-Ultimate-Uncensored - GGUF

This repository contains GGUF quantizations of the heavily fine-tuned and uncensored AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored model.

These quantizations were generated using a custom-compiled llama.cpp.

๐Ÿ› ๏ธ Key Fixes & Optimizations Applied

During the conversion to the base FP16 format, two major patches were applied to ensure maximum compatibility and stability across all local inference engines:

  1. Tokenizer Hash Bypass: The custom pre-tokenizer BPE hash from the original model was patched to natively use the standard qwen2 pre-tokenizer rules, preventing llama.cpp conversion crashes.
  2. Native ChatML Injection: The original model's tokenizer_config.json was overwritten with a strict, standard ChatML template before quantization. The correct <|im_start|> and <|im_end|> tokens are now permanently baked into the GGUF metadata. You do not need to manually configure start/stop tokens in your UI.
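
For reference, the standard ChatML layout that the baked-in template produces looks like this (a generic example, not taken from this repository):

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant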

📦 Available Quantizations

| File Name | Bit Size | Description |
| :--- | :--- | :--- |
| qwen3.6-27b-uncensored-q3_k_m.gguf | 3-bit | Very small, high perplexity loss. Best for severely constrained hardware. |
| qwen3.6-27b-uncensored-q4_k_s.gguf | 4-bit | Smaller 4-bit variant, good balance for 16GB RAM/VRAM setups. |
| qwen3.6-27b-uncensored-q4_k_m.gguf | 4-bit | Recommended. Excellent balance of speed, size, and uncensored logic retention. |
| qwen3.6-27b-uncensored-q5_k_m.gguf | 5-bit | High quality, minimal degradation. Great if you have the memory for it. |
| qwen3.6-27b-uncensored-q6_k.gguf | 6-bit | Near-lossless quality. |
| qwen3.6-27b-uncensored-q8_0.gguf | 8-bit | Extremely large, indistinguishable from FP16. |

🎮 Ideal Use Cases

Because this model is "abliterated" and stripped of safety guardrails, it excels at:

  • Complex, Unfiltered Roleplay: Perfect for running highly detailed, gritty, or dystopian companion personas without refusal loops.
  • Creative Writing: Capable of handling dark sci-fi, Isekai, and mature narrative generation.
  • Local Autonomy: Runs entirely offline using standard tools like LM Studio, text-generation-webui, or Ollama.

💻 Quick Start (Python)

If you want to run this directly via python using llama-cpp-python, the baked-in ChatML template handles the formatting automatically:

from llama_cpp import Llama

# Load the model (adjust threads/layers based on your hardware)
llm = Llama(
    model_path="qwen3.6-27b-uncensored-q4_k_m.gguf",
    n_ctx=4096,
    n_threads=8
)

messages = [
    {"role": "system", "content": "You are a gritty, unfiltered AI . Speak bluntly."},
    {"role": "user", "content": "What are the rules of this city?"}
]

response = llm.create_chat_completion(
    messages=messages,
    max_tokens=500,
    temperature=0.8
)

print(response['choices'][0]['message']['content'])

Author: Abiray

Likes: 7

Downloads: 0

Tags: gguf, qwen, uncensored, roleplay, llama-cpp, text-generation, en, base_model:AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored, base_model:quantized:AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored, license:other, endpoints_compatible, region:us, conversational

FINAL-Bench/Darwin-28B-KR


license: apache-2.0
language: ko, en
base_model: FINAL-Bench/Darwin-28B-Opus, FINAL-Bench/Darwin-27B-KR
pipeline_tag: text-generation
tags: darwin, korean, reasoning, multimodal, qwen3.5, evolutionary-merge, vidraft
library_name: transformers

Darwin-28B-KR

VIDRAFT's Korean-specialized 28B multimodal language model. The Korean-specialized, second-generation mother model of the Darwin family.


๐ŸŽฏ ๋ชจ๋ธ ์†Œ๊ฐœ

Darwin-28B-KR์€ ๋น„๋“œ๋ž˜ํ”„ํŠธ(VIDRAFT)๊ฐ€ ๊ฐœ๋ฐœํ•œ ํ•œ๊ตญ์–ด ํŠนํ™” 28B ํŒŒ๋ผ๋ฏธํ„ฐ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ์–ธ์–ด ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

Designed to possess both English reasoning ability and Korean ability, it is the second-generation mother (母體) model of the Darwin family, supporting Korean expression, understanding, and reasoning; English reasoning; and multimodal (image and video) understanding. Going forward, this model will serve as the common starting point for various Korean domain-specialized models (legal, medical, finance, academic, and so on).


🧬 Lineage

Qwen3.5-27B (Alibaba Qwen team)
       |
       v
Darwin-27B-Opus (FINAL-Bench)
       |  Darwin V7 evolutionary merge
       |
   +---+----------------------+
   v                          v
Darwin-28B-Opus           Darwin-27B-KR
(English/reasoning        (Korean-specialized champion,
 + multimodal)             CLIcK 79.59%)
   |                          |
   +--------+-----------------+
            |  Darwin V7 MRI-aware merge
            |  (Korean output pathway 100% preserved from the mother)
            v
       Darwin-28B-KR  <- this model

โš™๏ธ ๋Šฅ๋ ฅ ๋งคํŠธ๋ฆญ์Šค

| ๋Šฅ๋ ฅ | ์ถœ์ฒ˜ | ๊ฐ•๋„ | |---|---|---| | ํ•œ๊ตญ์–ด ์ดํ•ด/์ƒ์„ฑ | Darwin-27B-KR ๊ณ„์—ด | โญโญโญโญโญ | | ํ•œ๊ตญ์–ด ์ถ”๋ก  (CSAT/PSAT) | ํ†ตํ•ฉ ํšจ๊ณผ | โญโญโญโญโญ | | ์˜์–ด ์ถ”๋ก  | Darwin-28B-Opus ๊ณ„์—ด | โญโญโญโญ | | ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ (์ด๋ฏธ์ง€/๋น„๋””์˜ค) | Darwin-28B-Opus ๋ณด์กด | โญโญโญโญ | | ์˜ํ•œ ์ฝ”๋“œ์Šค์œ„์นญ | ํ†ตํ•ฉ ํšจ๊ณผ | โญโญโญโญ | | ์ž๊ธฐ ์ •์ฒด์„ฑ ์ธ์‹ | ๋น„๋“œ๋ž˜ํ”„ํŠธ ํ•™์Šต | โญโญโญโญโญ |


📊 K-AI Leaderboard CLIcK Comparison

| Model | CLIcK |
|---|---|
| QuettaLLMs-27B-Koreasoner-V3 | 0.794 |
| Rogue-27B-KR | 0.791 |
| Darwin-28B-KR (this model) | 0.786 |
| AWAXIS-Think-28B | 0.770 |

(* based on a 200-question evaluation)


📊 Specifications

| Item | Value |
|---|---|
| Architecture | Qwen3_5ForConditionalGeneration (hybrid full + linear attention) |
| Parameters | ~28B |
| Hidden size | 5120 |
| Layers | 64 |
| Vocab size | 248,320 |
| Format | bfloat16 (~53 GB on disk) |
| Context | 8K-32K (depending on deployment environment) |


🚀 Usage

vLLM (recommended)

vllm serve FINAL-Bench/Darwin-28B-KR \
    --trust-remote-code \
    --port 8000 \
    --enforce-eager \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.85

OpenAI-compatible client

from openai import OpenAI

# Point the client at the local vLLM server started above
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="FINAL-Bench/Darwin-28B-KR",
    messages=[
        # "What does Korea's Liberation Day (Gwangbokjeol) commemorate?"
        {"role": "user", "content": "한국의 광복절은 무엇을 기념하는 날인가요?"}
    ],
    max_tokens=2048,
    temperature=0.0,
)
print(response.choices[0].message.content)

๐Ÿ–ฅ๏ธ ํ•˜๋“œ์›จ์–ด ์š”๊ตฌ์‚ฌํ•ญ

| GPU ์‹œ๋ฆฌ์ฆˆ | ์ƒํƒœ | |---|---| | NVIDIA Blackwell (B200) | โœ… Best | | NVIDIA Hopper (H100/H200) | โœ… ๊ถŒ์žฅ | | NVIDIA Ada (L40S) | โš ๏ธ ๋น ๋“ฏํ•จ (53GB BF16) | | Older Ampere | โŒ VRAM ๋ถ€์กฑ |

์ตœ์†Œ VRAM: ~55 GB (BF16 ์ถ”๋ก ์šฉ)


💬 Self-Introduction Example

User: Who are you?
Darwin-28B-KR: I am Darwin-28B-KR, developed by VIDRAFT.
               I am a 28-billion-parameter language model specialized in Korean,
               optimized for a wide range of Korean-language tasks.

🌳 Second-Generation Domain-Specialized Models (Planned)

Korean-specialized variants planned to be derived from this mother model:

  • Darwin-28B-KR-Legal: legal domain
  • Darwin-28B-KR-Medical: medical domain
  • Darwin-28B-KR-Finance: finance domain
  • Darwin-28B-KR-Code: code generation with Korean comments
  • Darwin-28B-KR-MFP4: memory-efficient quantized version

Each variant will be fine-tuned or merged on domain data with this model as its base.


🌳 Example Use Cases

  • General Korean conversation / Q&A
  • Korean culture, history, and legal knowledge Q&A
  • Korean reasoning (CSAT/PSAT/K-AI evaluations)
  • English reasoning / English-Korean translation
  • Image/video analysis with Korean-language descriptions
  • Korean writing / summarization / creative writing

๐Ÿ™ Credits


📜 License

Apache 2.0 (inherited from the base model)

Author: FINAL-Bench

Likes: 6

Downloads: 30

Tags: transformers, safetensors, qwen3_5, image-text-to-text, darwin, korean, reasoning, multimodal, qwen3.5, evolutionary-merge, vidraft, text-generation, conversational, ko, en, base_model:FINAL-Bench/Darwin-27B-KR, base_model:finetune:FINAL-Bench/Darwin-27B-KR, license:apache-2.0, endpoints_compatible, region:us

8F-ai/context-filter


license: apache-2.0
pipeline_tag: token-classification
tags: privacy, pii-detection, pii-redaction, token-classification, sliding-window-attention, rope, swiglu
language: en
library_name: transformers

Context-Filter


Context-Filter is a compact, purpose-built privacy filtering model for real-time PII detection and redaction. At 38M parameters it runs comfortably on CPU or any consumer GPU, and supports sequences up to 32,768 tokens via Sliding Window Attention. It ships with a built-in regex hybrid layer ensuring near-zero false negatives on structured formats such as emails, IPs, and social security numbers.


Highlights

  • Custom Architecture — Not a Fine-Tune: Context-Filter is trained from scratch using a purpose-designed encoder: Grouped Query Attention (8Q / 4KV heads), RMSNorm, RoPE with θ = 500,000, and SwiGLU FFNs. No base model weights are reused.

  • 32K Context via Sliding Window Attention: Each token attends to a local window of ±512 tokens. Memory scales as O(n · w) rather than O(n²), making long-document redaction practical on commodity hardware (see the worked example after this list).

  • 12 PII Entity Classes: Covers personal identity, financial, network, and government-issued identifiers across a single BIO tagging head.

  • Focal Loss Training: Trained with focal loss (γ = 2.0) to suppress the dominant O-label class and sharpen precision on rare entity spans.

  • Dual Output Modes: Returns either semantic labels (private_email) or bracketed redaction tags ([EMAIL]), selectable per call.

  • Per-Entity Confidence Scores: Every detected span carries a softmax confidence value, enabling downstream threshold filtering.

  • Regex Hybrid Layer: A built-in post-processing pass applies deterministic regex patterns for structured PII formats, guaranteeing recall on well-defined identifiers regardless of model uncertainty.

  • Multilingual Coverage: Trained on synthetic data from 15 locales spanning English, German, French, Swedish, Italian, Spanish, Dutch, Portuguese, Polish, Czech, Danish, and Finnish.
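
A quick back-of-the-envelope check of the sliding-window memory claim above, using the window size from this card:

n, w = 32_768, 512                      # context length, window radius
full_attention = n * n                  # O(n^2) attention entries
sliding_window = n * (2 * w + 1)        # each token sees +/-512 neighbours
print(full_attention / sliding_window)  # ~32x fewer entries at full context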


Model Overview

| Property | Value |
|---|---|
| Type | Token Classification (BIO NER) |
| Architecture | Custom Encoder (Context-Filter) |
| Training | From scratch — synthetic data only |
| Parameters | ~61M |
| Context Length | 32,768 tokens |
| VRAM (bfloat16) | ~252 MB |
| VRAM (int8) | ~76 MB |
| Tokenizer | GPT-2 BPE (50,257 vocabulary) |

Architecture Specification

| Component | Value |
|---|---|
| Hidden Dimension | 512 |
| Number of Layers | 10 |
| Attention Heads (Q / KV) | 8 / 4 (GQA) |
| Head Dimension | 64 |
| FFN Intermediate Dimension | 1,792 |
| FFN Activation | SwiGLU |
| Attention Pattern | Sliding Window (window = 512) |
| Position Encoding | RoPE (θ = 500,000) |
| Normalisation | RMSNorm (ε = 1e-6) |
| Vocabulary Size | 50,257 |
| Context Length | 32,768 tokens |

Entity Classes

| Label | Type | Examples |
|---|---|---|
| PERSON | Full names | Jane Smith, Dr. Erik Larsson |
| EMAIL | Email addresses | user@domain.com |
| PHONE | Phone numbers | +1-555-234-5678, 07700 900123 |
| ADDRESS | Postal addresses | 42 Baker Street, London |
| SSN | Social security numbers | 452-78-9012 |
| CREDITCARD | Payment card numbers | 4111-1111-1111-1111 |
| IP | IPv4 addresses | 192.168.1.104 |
| DATE | Dates of birth and event dates | 1990-07-12, March 15, 2024 |
| ORG | Organisation names | Acme Corp, St. Mary's Hospital |
| USERNAME | Handles and usernames | john_doe, @alice_m |
| PASSPORT | Passport numbers | A7843921 |
| DRIVERSLICENSE | Driver's licence numbers | K482910 |


Quickstart

Installation

pip install torch transformers

Load the Model

import torch
from context_filter_v2_train import ContextFilterInference

engine = ContextFilterInference("./context_filter_v2")

Redact Mode — [ENTITY] brackets

result = engine.filter(
    "My name is Andrew and my Gmail is Andrew@gmail.com and live in Sweden",
    mode="redact",
)

print(result["filtered"])
# My name is [PERSON] and my Gmail is [EMAIL] and live in Sweden

Label Mode — semantic placeholders

result = engine.filter(
    "My name is Andrew and my Gmail is Andrew@gmail.com and live in Sweden",
    mode="label",
)

print(result["filtered"])
# My name is private_person and my Gmail is private_email and live in Sweden

Entity Spans with Confidence

for entity in result["entities"]:
    print(entity)

# {'type': 'PERSON', 'start': 11, 'end': 17, 'text': 'Andrew', 'confidence': 0.987}
# {'type': 'EMAIL',  'start': 33, 'end': 49, 'text': 'Andrew@gmail.com', 'confidence': 0.995}

Batch Processing

texts = [
    "Call Sarah at +1-555-234-5678.",
    "Server 192.168.1.1 accessed by john_doe on 2024-03-15.",
    "Account: Michael Chen, SSN: 452-78-9012.",
]

results = engine.filter_batch(texts, mode="redact")

for r in results:
    print(r["filtered"])

# Call Sarah at [PHONE].
# Server [IP] accessed by [USERNAME] on [DATE].
# Account: [PERSON], SSN: [SSN].

Disable Regex Hybrid (model-only predictions)

result = engine.filter(text, mode="redact", regex_hybrid=False)

Output Format Reference

filter() return value

{
    "filtered": str,          # processed text with PII replaced
    "entities": [
        {
            "type":       str,    # entity class name (e.g. "EMAIL")
            "start":      int,    # character start offset in original text
            "end":        int,    # character end offset in original text
            "text":       str,    # original PII span
            "confidence": float,  # softmax confidence [0.0 โ€“ 1.0]
        },
        ...
    ]
}

Mode comparison

| Input | mode="label" | mode="redact" |
|---|---|---|
| Andrew@gmail.com | private_email | [EMAIL] |
| Jane Smith | private_person | [PERSON] |
| +1-555-234-5678 | private_phone | [PHONE] |
| 452-78-9012 | private_ssn | [SSN] |
| 192.168.1.104 | private_ip | [IP] |
| A7843921 | private_passport | [PASSPORT] |


Performance Characteristics

| Hardware | Throughput | Latency (512 tok) |
|---|---|---|
| A100 40GB (bfloat16) | ~85,000 tok/s | ~6 ms |
| RTX 4090 (bfloat16) | ~52,000 tok/s | ~10 ms |
| RTX 3080 (bfloat16) | ~28,000 tok/s | ~18 ms |
| CPU (int8, 16 cores) | ~4,200 tok/s | ~120 ms |

Throughput measured at batch size 32. Latency measured at batch size 1.

Memory Footprint

| Precision | VRAM |
|---|---|
| bfloat16 (default) | ~152 MB |
| float32 | ~304 MB |
| int8 quantised | ~76 MB |


Intended Use Cases

| Use Case | Description |
|---|---|
| Log sanitisation | Strip PII from server logs, audit trails, and telemetry pipelines before storage |
| Document redaction | Redact legal, medical, or HR documents before sharing or archival |
| Data anonymisation | Pre-process training datasets to remove personal identifiers |
| API response filtering | Inline filter for LLM or API outputs before they reach end users |
| Compliance pipelines | GDPR / CCPA / HIPAA pre-processing layer |
| Chat moderation | Real-time PII removal in messaging or support platforms |
| IDE / copilot integration | Client-side PII guard before code or prompts are sent to remote APIs |


Hybrid Detection Strategy

Context-Filter uses a two-layer detection approach for maximum recall:

Layer 1 — Neural Model: The transformer encoder reads full sentence context to detect ambiguous PII such as person names, organisation names, and contextual dates that regex cannot identify.

Layer 2 — Regex Safety Net: A deterministic pass using compiled regular expressions guarantees recall on structurally defined formats (email, IPv4, SSN, credit card, phone, passport, driver's licence) regardless of model confidence.

The two layers are merged with entity-level deduplication: spans already found by the model are not double-tagged. This combination eliminates the false-negative failure mode of pure-neural approaches while maintaining the contextual understanding that regex-only tools cannot provide.
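
A hedged sketch of that merge rule in Python; the span schema follows the Output Format Reference above, but the patterns and merge code are illustrative assumptions, not Context-Filter's actual implementation.

import re

# Illustrative subset of the deterministic patterns (assumptions)
REGEX_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "IP":    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def regex_safety_net(text, model_spans):
    # Merge deterministic regex hits into the model's spans, skipping
    # anything the neural layer already covered (entity-level dedup)
    merged = list(model_spans)
    for label, pattern in REGEX_PATTERNS.items():
        for m in pattern.finditer(text):
            overlaps = any(
                s["start"] < m.end() and m.start() < s["end"]
                for s in merged
            )
            if not overlaps:
                merged.append({"type": label, "start": m.start(),
                               "end": m.end(), "text": m.group(0),
                               "confidence": 1.0})  # deterministic match
    return sorted(merged, key=lambda s: s["start"])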


Limitations

  • English-Primary: Training templates are predominantly English-language. Names and organisation names in non-Latin scripts may have reduced recall.
  • Highly Nested PII: Overlapping or recursively nested PII spans (e.g., an email containing a person's name as the local part) are resolved to the outermost detected entity.
  • Synthetic Training Data: The model was trained entirely on procedurally generated examples. Domain-specific PII formats not covered by the synthetic generator (e.g., jurisdiction-specific ID numbers) may have lower recall until fine-tuned on real-world samples.
  • Contextual Dates: Generic dates (e.g., publication dates, historical dates) may occasionally be tagged as DATE. Post-filter confidence thresholding (e.g., confidence > 0.8) can reduce these false positives (see the snippet after this list).
  • No Document Structure Awareness: The model operates on raw token sequences without awareness of HTML, Markdown, or JSON structure. Strip formatting before passing structured documents.
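
Following the thresholding suggestion in the Contextual Dates item above, a post-filter over the entity list returned by filter() is a one-liner (0.8 is the card's suggested value):

# Keep only spans the model is confident about
confident = [e for e in result["entities"] if e["confidence"] > 0.8]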

License

Context-Filter is released under the Apache License 2.0.


Context-Filter — purpose-built for privacy, not adapted for it.

Author: 8F-ai

Likes: 3

Downloads: 0

Tags: transformers, safetensors, gpt2, privacy, pii-detection, pii-redaction, token-classification, sliding-window-attention, rope, swiglu, en, license:apache-2.0, text-generation-inference, endpoints_compatible, region:us

TheBurgstall/ltx-2.3-googlyeyes-lora


license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-video
base_model: Lightricks/LTX-2.3
tags: ltx-2, ltx-2.3, text-to-video, image-to-video, lora, googly-eyes
language: en

Googly Eyes LoRA for LTX-2.3

Slaps oversized googly-eye stickers onto whatever subject is in your LTX-2.3 generation. Pure, dumb fun.

Trigger word: googlyeyes — that's all you need. The LoRA works from the trigger word alone; you don't have to write any other prompt. You can add a regular prompt on top to try to direct the scene (no promises you'll succeed); the trigger just needs to appear somewhere in the prompt.

Sample outputs

  • pulp.mp4: https://huggingface.co/TheBurgstall/ltx-2.3-googlyeyes-lora/resolve/main/samples/pulp.mp4
  • term.mp4: https://huggingface.co/TheBurgstall/ltx-2.3-googlyeyes-lora/resolve/main/samples/term.mp4

Usage in ComfyUI

Standard LTX-2.3 IC-LoRA workflow. Drop the .safetensors into your ComfyUI models/loras/ folder, add a Load LoRA node, set strength to 1.0, and put googlyeyes somewhere in your prompt.

Reference workflows are in the official LTX-2 repository.

Dataset

I built the entire dataset myself — fully synthetic, curated and filtered by hand. No real people appear in the training data.

License

Apache-2.0. Note that running inference with this LoRA requires the Lightricks LTX-2.3 base model, whose own license terms still apply at inference time.

Acknowledgements

Author: TheBurgstall

Likes: 3

Downloads: 0

Tags: diffusers, ltx-2, ltx-2.3, text-to-video, image-to-video, lora, googly-eyes, en, base_model:Lightricks/LTX-2.3, base_model:adapter:Lightricks/LTX-2.3, license:apache-2.0, region:us

nsparks/DeepSeek-V4-Flash-FP4-FP8-GGUF


license: other
license_name: deepseek
license_link: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/LICENSE
base_model: deepseek-ai/DeepSeek-V4-Flash (base_model_relation: quantized)
tags: gguf, llama.cpp, deepseek, deepseek-v4, fp4, fp8, mxfp4, moe

DeepSeek-V4-Flash native FP4 / FP8 GGUF

Native, 1:1 conversion of deepseek-ai/DeepSeek-V4-Flash from the original safetensors into a single GGUF file that preserves the model's native low-precision weights:

  • Dense weights: FP8 E4M3 (F8_E4M3_B128, 128-element blocks with one E8M0 scale)
  • MoE expert weights: MXFP4 (MXFP4)

This file is not derived from a higher-precision intermediate; the FP4 and FP8 codes from the upstream checkpoint are written directly into the GGUF.
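
As a hedged illustration of the block-scaled FP8 scheme described above (array layout and names are assumptions; the real GGUF packing differs in detail), dequantization applies one power-of-two scale per 128-element block:

import numpy as np

def dequant_f8_e4m3_b128(e4m3_values: np.ndarray, scale_exps: np.ndarray):
    # e4m3_values: FP8 E4M3 codes already decoded to float32,
    #              length 128 * n_blocks
    # scale_exps:  one E8M0 exponent per block (E8M0 encodes 2**e)
    blocks = e4m3_values.reshape(-1, 128)
    scales = np.exp2(scale_exps.astype(np.float32))
    return blocks * scales[:, None]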

File

| File | Size | Quant |
| --- | --- | --- |
| DeepSeek-V4-Flash-FP4-FP8-native.gguf | ~146 GB | F8_E4M3 + MXFP4 |

Loading

This GGUF requires a llama.cpp build with native F8_E4M3_B128 and MXFP4 support and the DeepSeek V4 Flash architecture. Stock upstream llama.cpp cannot load this file.

Reference (WIP) build that can both produce and run this GGUF:

https://github.com/nisparks/llama.cpp/tree/wip/deepseek-v4-support

That branch adds:

  • GGML_TYPE_F8_E4M3_B128 (ggml type 42)
  • LLAMA_FTYPE_MOSTLY_F8_E4M3_MXFP4 (ftype 41, exposed as F8_E4M3_MXFP4 / moe-f8-e4m3-mxfp4)
  • CUDA dequant / MMVQ kernels for F8_E4M3_B128
  • Loader / converter / gguf-py support
  • Custom DeepSeek V4 Flash model graph

The branch is an active WIP, expect rough edges.

Notes

  • DeepSeek V4 Flash is a custom architecture (MoE + sliding-window attention + compressor + indexer). The runtime in the reference branch implements that graph as a custom model path.
  • For matching activation behavior the runtime also applies HF's blockwise FP8 / FP4 fake-activation-quant on attention KV and indexer Q/KV after the Hadamard rotation.

Provenance

Author: nsparks

Likes: 3

Downloads: 0

Tags: gguf, llama.cpp, deepseek, deepseek-v4, fp4, fp8, mxfp4, moe, base_model:deepseek-ai/DeepSeek-V4-Flash, base_model:quantized:deepseek-ai/DeepSeek-V4-Flash, license:other, endpoints_compatible, region:us, conversational

huihui-ai/Huihui-Qwen3-Coder-Next-Opus-4.6-Reasoning-Distilled-abliterated


license: apache-2.0
language: en
library_name: transformers
pipeline_tag: text-generation
base_model: samuelcardillo/Qwen3-Coder-Next-Opus-4.6-Reasoning-Distilled
tags: qwen3, reasoning, distillation, claude-opus, abliterated, uncensored


This is an uncensored version of samuelcardillo/Qwen3-Coder-Next-Opus-4.6-Reasoning-Distilled created with abliteration (see remove-refusals-with-transformers to learn more about it). This is a crude, proof-of-concept implementation to remove refusals from an LLM without using TransformerLens.

Usage Warnings

  • Risk of Sensitive or Controversial Outputs: This model's safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

  • Not Suitable for All Audiences: Due to limited content filtering, the model's outputs may be inappropriate for public settings, underage users, or applications requiring high security.

  • Legal and Ethical Responsibilities: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

  • Research and Experimental Use: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.

  • Monitoring and Review Recommendations: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.

  • No Default Safety Guarantees: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

Donation

Your donation helps us continue further development and improvement; even a cup of coffee makes a difference.
  • bitcoin:
  bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
  • Support our work on Ko-fi!

Author: huihui-ai

Likes: 2

Downloads: 0

Tags: transformers, qwen3, reasoning, distillation, claude-opus, abliterated, uncensored, text-generation, en, base_model:samuelcardillo/Qwen3-Coder-Next-Opus-4.6-Reasoning-Distilled, base_model:finetune:samuelcardillo/Qwen3-Coder-Next-Opus-4.6-Reasoning-Distilled, license:apache-2.0, endpoints_compatible, region:us

Playtime-AI/LTX-2.3-Wednesday_Addams


license: apache-2.0

The trigger is "Wednesday Addams", but you don't really need to use it. I did, however, caption everything, including her hair color, clothing, etc., to make the model as flexible as possible, so you will have to be very descriptive with your prompting.

Author: Playtime-AI

Likes: 2

Downloads: 0

Tags: license:apache-2.0, region:us

Playtime-AI/LTX-2.3-Better_Female_Nudity_v2


license: apache-2.0

Author: Playtime-AI

Likes: 2

Downloads: 0

Tags: license:apache-2.0, region:us