Today's AI Summary

AI Developments: MetaCLIP 2, DRIVE-RL, and Neuro-Symbolic LLM Enhancement

Here's a look at some of the recent developments in AI, covering new models and research papers.

Research Papers

Recent research explores diverse areas within AI. One paper introduces a neuro-symbolic approach that enhances Large Language Models (LLMs) with symbolic ontological reasoning: inconsistencies between LLM outputs and predefined ontologies are detected and corrected, improving the consistency and reliability of those outputs. Another paper analyzes citation trends in NLP and other AI subfields, finding a general pattern of "citation amnesia": the age of cited papers has decreased, likely due to the rapid pace of knowledge production in these fields.
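The ontology-consistency idea can be illustrated with a toy checker (the class names, the `SUBCLASS_OF`/`DISJOINT` structures, and the checker itself are hypothetical illustrations, not the paper's method):

```python
# Toy sketch of ontology-based consistency checking for LLM outputs.
# A tiny ontology: each class maps to its declared superclasses,
# plus a set of mutually disjoint class pairs.
SUBCLASS_OF = {"Penguin": {"Bird"}, "Bird": {"Animal"}}
DISJOINT = {frozenset({"Bird", "Fish"})}

def superclasses(cls):
    """All transitive superclasses of a class, including itself."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def consistent(assertions):
    """Check that no entity is asserted into two disjoint classes."""
    classes_of = {}
    for entity, cls in assertions:
        classes_of.setdefault(entity, set()).update(superclasses(cls))
    for classes in classes_of.values():
        for pair in DISJOINT:
            if pair <= classes:
                return False
    return True

# An LLM output asserting "Tux is a Penguin and a Fish" is flagged:
print(consistent([("Tux", "Penguin")]))                   # True
print(consistent([("Tux", "Penguin"), ("Tux", "Fish")]))  # False
```

A real system would of course use a full ontology language (e.g. OWL) and a reasoner rather than a hand-rolled dictionary.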

Models

Several new models have been released, spanning image classification, code generation, and specialized text generation.

  • MetaCLIP 2: Facebook has released multiple versions of Distilled MetaCLIP 2, including "ViT-L/14," "ViT-S/16," "ViT-B/32 (mT5 Tokenizer)," "ViT-M/16 (mT5 Tokenizer)," and "ViT-B/32." These models are designed for zero-shot image classification and are multilingual, building upon the "MetaCLIP 2: A Worldwide Scaling Recipe" paper. The models are compatible with the Transformers library and support various vision transformer architectures.
  • DRIVE-RL: Tencent has introduced DRIVE-RL, a model for competitive code generation. It uses a two-stage reinforcement learning process to improve performance, achieving state-of-the-art results among models of similar scale. The training pipeline involves supervised fine-tuning followed by entropy expansion and a hard-focus curriculum.
  • Kimi-K2-Thinking-Moxin-GGUF: moxin-org has released a GGUF-quantized version of moonshotai's Kimi-K2-Thinking model. This model is designed for text generation and conversational tasks, with a focus on efficient deployment using llama.cpp.
  • Qwen3-14B-UML-Generator: sequelbox has released Qwen3-14B-UML-Generator, a model specialized in generating Unified Modeling Language (UML) diagrams. It is fine-tuned on a custom dataset created with DeepSeek-V3.2 and is designed for code reasoning and situational analysis.

Key Takeaways

  • Multilingual Image Classification: MetaCLIP 2 models offer advancements in zero-shot image classification with multilingual capabilities.
  • Code Generation with RL: DRIVE-RL demonstrates the effectiveness of reinforcement learning in improving code generation performance.
  • Specialized Reasoning Models: Qwen3-14B-UML-Generator highlights the potential of fine-tuning models for specific reasoning tasks like UML diagram generation.
  • LLM Reliability: Research into neuro-symbolic integration offers a promising avenue for enhancing the reliability and accuracy of LLM outputs.

AI Papers for 2026-02-24

Unifying approach to uniform expressivity of graph neural networks

The expressive power of Graph Neural Networks (GNNs) is often analysed via correspondence to the Weisfeiler-Leman (WL) algorithm and fragments of first-order logic. Standard GNNs are limited to performing aggregation over immediate neighbourhoods or over global read-outs. To increase their expressivity, recent attempts have been made to incorporate substructural information (e.g. cycle counts and subgraph properties). In this paper, we formalize this architectural trend by introducing Template GNNs (T-GNNs), a generalized framework where node features are updated by aggregating over valid template embeddings from a specified set of graph templates. We propose a corresponding logic, Graded template modal logic (GML(T)), and generalized notions of template-based bisimulation and WL algorithm. We establish an equivalence between the expressive power of T-GNNs and GML(T), and provide a unifying approach for analysing GNN expressivity: we show how standard AC-GNNs and their recent variants can be interpreted as instantiations of T-GNNs.
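As a toy instantiation of the idea (not the paper's construction), a node update can aggregate over two template types, edges and triangles, so that nodes with the same degree but different substructural context receive different features:

```python
import itertools

# Toy graph: edges stored as sorted tuples; nodes 0,1,2 form a triangle.
edges = {(0, 1), (0, 2), (1, 2), (2, 3)}

def neighbors(v):
    return {u for (a, b) in edges for u in (a, b) if v in (a, b) and u != v}

def update(v):
    """New node feature: counts of valid edge and triangle templates at v."""
    ns = neighbors(v)
    tri = sum(1 for u, w in itertools.combinations(sorted(ns), 2)
              if (u, w) in edges)
    return (len(ns), tri)

features = {v: update(v) for v in range(4)}
print(features)  # node 3 closes no triangle; nodes 0, 1, 2 each close one
```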

Leakage and Second-Order Dynamics Improve Hippocampal RNN Replay

Biological neural networks (like the hippocampus) can internally generate "replay" resembling stimulus-driven activity. Recent computational models of replay use noisy recurrent neural networks (RNNs) trained to path-integrate. Replay in these networks has been described as Langevin sampling, but new modifiers of noisy RNN replay have surpassed this description. We re-examine noisy RNN replay as sampling to understand or improve it in three ways: (1) Under simple assumptions, we prove that the gradients replay activity should follow are time-varying and difficult to estimate, but readily motivate the use of hidden state leakage in RNNs for replay. (2) We confirm that hidden state adaptation (negative feedback) encourages exploration in replay, but show that it incurs non-Markov sampling that also slows replay. (3) We propose the first model of temporally compressed replay in noisy path-integrating RNNs through hidden state momentum, connect it to underdamped Langevin sampling, and show that, together with adaptation, it counters slowness while maintaining exploration. We verify our findings via path-integration of 2D triangular and T-maze paths and of high-dimensional paths of synthetic rat place cell activity.
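A generic sketch of the kind of leaky, momentum-driven (underdamped-Langevin-style) hidden-state update the abstract describes, with all constants chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def replay_step(h, v, grad, leak=0.1, beta=0.9, lr=0.05, noise=0.01):
    """Hidden state h with leakage, momentum v, gradient drift, and noise."""
    v = beta * v - lr * grad(h) + noise * rng.standard_normal(h.shape)
    h = (1 - leak) * h + v  # leaky integration of the momentum
    return h, v

# Quadratic potential: replay trajectories should hover near the origin.
grad = lambda h: h
h, v = np.ones(2), np.zeros(2)
for _ in range(500):
    h, v = replay_step(h, v, grad)
print(float(np.linalg.norm(h)))  # small: the dynamics are stable
```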

Learning to Tune Pure Pursuit in Autonomous Racing: Joint Lookahead and Steering-Gain Control with PPO

Pure Pursuit (PP) is widely used in autonomous racing for real-time path tracking due to its efficiency and geometric clarity, yet performance is highly sensitive to how its key parameters (lookahead distance and steering gain) are chosen. Standard velocity-based schedules adjust these only approximately and often fail to transfer across tracks and speed profiles. We propose a reinforcement-learning (RL) approach that jointly chooses the lookahead Ld and a steering gain g online using Proximal Policy Optimization (PPO). The policy observes compact state features (speed and curvature taps) and outputs (Ld, g) at each control step. Trained in F1TENTH Gym and deployed in a ROS 2 stack, the policy drives PP directly (with light smoothing) and requires no per-map retuning. Across simulation and real-car tests, the proposed RL-PP controller that jointly selects (Ld, g) consistently outperforms fixed-lookahead PP, velocity-scheduled adaptive PP, and an RL lookahead-only variant, and it also exceeds a kinematic MPC raceline tracker under our evaluated settings in lap time, path-tracking accuracy, and steering smoothness, demonstrating that policy-guided parameter tuning can reliably improve classical geometry-based control.
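For reference, the classical pure-pursuit geometry being tuned can be sketched as follows (the wheelbase value and the placement of the gain are assumptions, not the paper's exact formulation):

```python
import math

def pure_pursuit_steer(alpha, Ld, g, wheelbase=0.33):
    """Standard geometric pure-pursuit steering law.

    alpha: heading error to the lookahead point (rad)
    Ld:    lookahead distance (m); g: steering gain
    """
    return g * math.atan2(2.0 * wheelbase * math.sin(alpha), Ld)

# A larger lookahead yields gentler steering for the same heading error,
# which is why tuning Ld (and g) online matters at racing speeds.
print(pure_pursuit_steer(0.3, Ld=1.0, g=1.0))
print(pure_pursuit_steer(0.3, Ld=2.0, g=1.0))
```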

FedZMG: Efficient Client-Side Optimization in Federated Learning

Federated Learning (FL) enables distributed model training on edge devices while preserving data privacy. However, clients tend to have non-Independent and Identically Distributed (non-IID) data, which often leads to client-drift, thereby diminishing convergence speed and model performance. While adaptive optimizers have been proposed to mitigate these effects, they frequently introduce computational complexity or communication overhead unsuitable for resource-constrained IoT environments. This paper introduces Federated Zero Mean Gradients (FedZMG), a novel, parameter-free, client-side optimization algorithm designed to tackle client-drift by structurally regularizing the optimization space. Advancing the idea of Gradient Centralization, FedZMG projects local gradients onto a zero-mean hyperplane, effectively neutralizing the "intensity" or "bias" shifts inherent in heterogeneous data distributions without requiring additional communication or hyperparameter tuning. A theoretical analysis is provided, proving that FedZMG reduces the effective gradient variance and guarantees tighter convergence bounds compared to standard FedAvg. Extensive empirical evaluations on EMNIST, CIFAR100, and Shakespeare datasets demonstrate that FedZMG achieves better convergence speed and final validation accuracy compared to the baseline FedAvg and the adaptive optimizer FedAdam, particularly in highly non-IID settings.
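The core projection is simple enough to sketch directly (a minimal illustration of centering gradients onto the zero-mean hyperplane; the axis choice is an assumption, not the paper's specification):

```python
import numpy as np

def zero_mean_project(grad):
    """Subtract the per-row mean so each gradient row sums to zero."""
    return grad - grad.mean(axis=-1, keepdims=True)

g = np.array([[0.5, 0.1, -0.3],
              [1.0, 1.0,  1.0]])   # second row is a pure "bias" shift
zg = zero_mean_project(g)
print(zg.sum(axis=-1))  # each row now sums to (numerically) zero
```

Note how the uniform-shift row is mapped to zero entirely: this is the "intensity"/"bias" neutralization the abstract refers to.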

Zero-shot Interactive Perception

Interactive perception (IP) enables robots to extract hidden information in their workspace and execute manipulation plans by physically interacting with objects and altering the state of the environment -- crucial for resolving occlusions and ambiguity in complex, partially observable scenarios. We present Zero-Shot IP (ZS-IP), a novel framework that couples multi-strategy manipulation (pushing and grasping) with a memory-driven Vision Language Model (VLM) to guide robotic interactions and resolve semantic queries. ZS-IP integrates three key components: (1) an Enhanced Observation (EO) module that augments the VLM's visual perception with both conventional keypoints and our proposed pushlines -- a novel 2D visual augmentation tailored to pushing actions, (2) a memory-guided action module that reinforces semantic reasoning through context lookup, and (3) a robotic controller that executes pushing, pulling, or grasping based on VLM output. Unlike grid-based augmentations optimized for pick-and-place, pushlines capture affordances for contact-rich actions, substantially improving pushing performance. We evaluate ZS-IP on a 7-DOF Franka Panda arm across diverse scenes with varying occlusions and task complexities. Our experiments demonstrate that ZS-IP outperforms passive and viewpoint-based perception techniques such as Mark-Based Visual Prompting (MOKA), particularly in pushing tasks, while preserving the integrity of non-target elements.

"How Do I ...?": Procedural Questions Predominate Student-LLM Chatbot Conversations

Providing scaffolding through educational chatbots built on Large Language Models (LLM) has potential risks and benefits that remain an open area of research. When students navigate impasses, they ask for help by formulating impasse-driven questions. Within interactions with LLM chatbots, such questions shape the user prompts and drive the pedagogical effectiveness of the chatbot's response. This paper focuses on such student questions from two datasets of distinct learning contexts: formative self-study, and summative assessed coursework. We analysed 6,113 messages from both learning contexts, using 11 different LLMs and three human raters to classify student questions using four existing schemas. On the feasibility of using LLMs as raters, results showed moderate-to-good inter-rater reliability, with higher consistency than human raters. The data showed that 'procedural' questions predominated in both learning contexts, but more so when students prepare for summative assessment. These results provide a basis on which to use LLMs for classification of student questions. However, we identify clear limitations in both the ability to classify with schemas and the value of doing so: schemas are limited and thus struggle to accommodate the semantic richness of composite prompts, offering only a partial understanding of the wider risks and benefits of chatbot integration. In the future, we recommend an analysis approach that captures the nuanced, multi-turn nature of conversation, for example, by applying methods from conversation analysis in discursive psychology.

Validating Political Position Predictions of Arguments

Real-world knowledge representation often requires capturing subjective, continuous attributes -- such as political positions -- that conflict with pairwise validation, the widely accepted gold standard for human evaluation. We address this challenge through a dual-scale validation framework applied to political stance prediction in argumentative discourse, combining pointwise and pairwise human annotation. Using 22 language models, we construct a large-scale knowledge base of political position predictions for 23,228 arguments drawn from 30 debates that appeared on the UK political television programme Question Time. Pointwise evaluation shows moderate human-model agreement (Krippendorff's α = 0.578), reflecting intrinsic subjectivity, while pairwise validation reveals substantially stronger alignment between human- and model-derived rankings (α = 0.86 for the best model). This work contributes: (i) a practical validation methodology for subjective continuous knowledge that balances scalability with reliability; (ii) a validated structured argumentation knowledge base enabling graph-based reasoning and retrieval-augmented generation in political domains; and (iii) evidence that ordinal structure can be extracted from pointwise language model predictions on inherently subjective real-world discourse, advancing knowledge representation capabilities for domains where traditional symbolic or categorical approaches are insufficient.

Vichara: Appellate Judgment Prediction and Explanation for the Indian Judicial System

In jurisdictions like India, where courts face an extensive backlog of cases, artificial intelligence offers transformative potential for legal judgment prediction. A critical subset of this backlog comprises appellate cases, which are formal decisions issued by higher courts reviewing the rulings of lower courts. To this end, we present Vichara, a novel framework tailored to the Indian judicial system that predicts and explains appellate judgments. Vichara processes English-language appellate case proceeding documents and decomposes them into decision points. Decision points are discrete legal determinations that encapsulate the legal issue, deciding authority, outcome, reasoning, and temporal context. The structured representation isolates the core determinations and their context, enabling accurate predictions and interpretable explanations. Vichara's explanations follow a structured format inspired by the IRAC (Issue-Rule-Application-Conclusion) framework and adapted for Indian legal reasoning. This enhances interpretability, allowing legal professionals to assess the soundness of predictions efficiently. We evaluate Vichara on two datasets, PredEx and the expert-annotated subset of the Indian Legal Documents Corpus (ILDC_expert), using four large language models: GPT-4o mini, Llama-3.1-8B, Mistral-7B, and Qwen2.5-7B. Vichara surpasses existing judgment prediction benchmarks on both datasets, with GPT-4o mini achieving the highest performance (F1: 81.5 on PredEx, 80.3 on ILDC_expert), followed by Llama-3.1-8B. Human evaluation of the generated explanations across Clarity, Linking, and Usefulness metrics highlights GPT-4o mini's superior interpretability.

Robo-Saber: Generating and Simulating Virtual Reality Players

We present the first motion generation system for playtesting virtual reality (VR) games. Our player model generates VR headset and handheld controller movements from in-game object arrangements, guided by style exemplars and aligned to maximize simulated gameplay score. We train on the large BOXRR-23 dataset and apply our framework on the popular VR game Beat Saber. The resulting model Robo-Saber produces skilled gameplay and captures diverse player behaviors, mirroring the skill levels and movement patterns specified by input style exemplars. Robo-Saber demonstrates promise in synthesizing rich gameplay data for predictive applications and enabling a physics-based whole-body VR playtesting agent.

JPmHC Dynamical Isometry via Orthogonal Hyper-Connections

Recent advances in deep learning, exemplified by Hyper-Connections (HC), have expanded the residual connection paradigm by introducing wider residual streams and diverse connectivity patterns. While these innovations yield significant performance gains, they compromise the identity mapping property of residual connections, leading to training instability, limited scalability, and increased memory overhead. To address these challenges, we propose JPmHC (Jacobian-spectrum Preserving manifold-constrained Hyper-Connections), a framework that replaces identity skips with a trainable linear mixer acting on n parallel streams while explicitly controlling gradient conditioning. By constraining the mixer M on operator-norm-bounded manifolds (e.g., bistochastic, Stiefel, Grassmann), JPmHC prevents gradient pathologies and enhances stability. JPmHC introduces three key contributions: (i) a free-probability analysis that predicts Jacobian spectra for structured skips, providing actionable design rules for mixer selection; (ii) memory-efficient implicit differentiation for fixed-point projections, reducing activation memory and synchronization overhead; and (iii) a Stiefel-constrained mixer via Cayley transforms, ensuring orthogonality without post-hoc normalization. Empirical evaluations on ARC-AGI demonstrate that JPmHC achieves faster convergence, higher accuracy, and lower computational cost compared to bistochastic baselines. As a flexible and scalable extension of HC, JPmHC advances spectrum-aware, stable, and efficient deep learning, offering insights into topological architecture design and foundational model evolution.
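The Cayley-transform construction mentioned in the abstract can be sketched in a few lines: any skew-symmetric matrix A yields an orthogonal mixer M = (I - A)(I + A)^(-1), so orthogonality holds by construction (a toy illustration, not the paper's implementation):

```python
import numpy as np

def cayley_mixer(A):
    """Map a skew-symmetric A to an orthogonal matrix via the Cayley transform."""
    assert np.allclose(A, -A.T), "A must be skew-symmetric"
    I = np.eye(A.shape[0])
    return (I - A) @ np.linalg.inv(I + A)

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B - B.T) / 2                       # skew-symmetrize an arbitrary matrix
M = cayley_mixer(A)
print(np.allclose(M.T @ M, np.eye(4)))  # True: M is orthogonal
```

In a trainable setting, A would be the free parameter, keeping the mixer on the Stiefel manifold without any post-hoc normalization step.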

AI Models

LocoreMind/LocoOperator-4B


library_name: transformers
license: mit
base_model: Qwen/Qwen3-4B-Instruct-2507
language: en
pipeline_tag: text-generation
tags:

  • code
  • agent
  • tool-calling
  • distillation
  • qwen3
  • gguf
  • llama-cpp


Introduction

LocoOperator-4B is a 4B-parameter tool-calling agent model trained via knowledge distillation from Qwen3-Coder-Next inference traces. It specializes in multi-turn codebase exploration — reading files, searching code, and navigating project structures within a Claude Code-style agent loop. Designed as a local subagent, it runs via llama.cpp at zero API cost.

| | LocoOperator-4B |
|:--|:--|
| Base Model | Qwen3-4B-Instruct-2507 |
| Teacher Model | Qwen3-Coder-Next |
| Training Method | Full-parameter SFT (distillation) |
| Training Data | 170,356 multi-turn conversation samples |
| Max Sequence Length | 16,384 tokens |
| Training Hardware | 4x NVIDIA H200 141GB SXM5 |
| Training Time | ~25 hours |
| Framework | MS-SWIFT |

Key Features

  • Tool-Calling Agent: Generates structured <tool_call> JSON for Read, Grep, Glob, Bash, Write, Edit, and Task (subagent delegation)
  • 100% JSON Validity: Every tool call is valid JSON with all required arguments — outperforming the teacher model (87.6%)
  • Local Deployment: GGUF quantized, runs on Mac Studio via llama.cpp at zero API cost
  • Lightweight Explorer: 4B parameters, optimized for fast codebase search and navigation
  • Multi-Turn: Handles conversation depths of 3–33 messages with consistent tool-calling behavior

Performance

Evaluated on 65 multi-turn conversation samples from diverse open-source projects (scipy, fastapi, arrow, attrs, gevent, gunicorn, etc.), with labels generated by Qwen3-Coder-Next.

Core Metrics

| Metric | Score |
|:-------|:-----:|
| Tool Call Presence Alignment | 100% (65/65) |
| First Tool Type Match | 65.6% (40/61) |
| JSON Validity | 100% (76/76) |
| Argument Syntax Correctness | 100% (76/76) |

The model perfectly learned when to use tools versus when to respond with text (100% presence alignment). Tool type mismatches occur between semantically similar tools (e.g. Grep vs. Read), which represent different but often equally valid strategies.

Tool Distribution Comparison

[Figure: tool distribution comparison between LocoOperator-4B and the teacher model]

JSON & Argument Syntax Correctness

| Model | JSON Valid | Argument Syntax Valid |
|:------|:---------:|:--------------------:|
| LocoOperator-4B | 76/76 (100%) | 76/76 (100%) |
| Qwen3-Coder-Next (teacher) | 89/89 (100%) | 78/89 (87.6%) |

LocoOperator-4B achieves perfect structured output; the teacher model emitted 11 tool calls with missing required arguments (empty arguments: {}).
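A validity check of this kind might look as follows (the tool schema and the helper are hypothetical; the card does not publish its evaluation code):

```python
import json

# Hypothetical per-tool required-argument schema for illustration.
REQUIRED_ARGS = {"Read": {"file_path"}, "Grep": {"pattern"}, "Bash": {"command"}}

def tool_call_valid(raw):
    """Valid iff the payload parses as JSON and carries all required arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    required = REQUIRED_ARGS.get(call.get("name"), set())
    return required <= set(call.get("arguments", {}))

print(tool_call_valid('{"name": "Grep", "arguments": {"pattern": "def "}}'))  # True
print(tool_call_valid('{"name": "Grep", "arguments": {}}'))                   # False
```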

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LocoreMind/LocoOperator-4B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the messages
messages = [
    {
        "role": "system",
        "content": "You are a read-only codebase search specialist.\n\nCRITICAL CONSTRAINTS:\n1. STRICTLY READ-ONLY: You cannot create, edit, delete, move files, or run any state-changing commands. Use tools/bash ONLY for reading (e.g., ls, find, cat, grep).\n2. EFFICIENCY: Spawn multiple parallel tool calls for faster searching.\n3. OUTPUT RULES: \n   - ALWAYS use absolute file paths.\n   - STRICTLY NO EMOJIS in your response.\n   - Output your final report directly. Do not use colons before tool calls.\n\nENV: Working directory is /Users/developer/workspace/code-analyzer (macOS, zsh)."
    },
    {
        "role": "user",
        "content": "Analyze the Black codebase at `/Users/developer/workspace/code-analyzer/projects/black`.\nFind and explain:\n1. How Black discovers config files.\n2. The exact search order for config files.\n3. Supported config file formats.\n4. Where this configuration discovery logic lives in the codebase.\n\nReturn a comprehensive answer with relevant code snippets and absolute file paths."
    }
]

# prepare the model input
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)

Local Deployment

For GGUF quantized deployment with llama.cpp, hybrid proxy routing, and batch analysis pipelines, refer to our GitHub repository.

Training Details

| Parameter | Value |
|:----------|:------|
| Base model | Qwen3-4B-Instruct-2507 |
| Teacher model | Qwen3-Coder-Next |
| Method | Full-parameter SFT |
| Training data | 170,356 samples |
| Hardware | 4x NVIDIA H200 141GB SXM5 |
| Parallelism | DDP (no DeepSpeed) |
| Precision | BF16 |
| Epochs | 1 |
| Batch size | 2/GPU, gradient accumulation 4 (effective batch 32) |
| Learning rate | 2e-5, warmup ratio 0.03 |
| Max sequence length | 16,384 tokens |
| Template | qwen3_nothinking |
| Framework | MS-SWIFT |
| Training time | ~25 hours |
| Checkpoint | Step 2524 |

Known Limitations

  • First-tool-type match is 65.6% — the model sometimes picks a different (but not necessarily wrong) tool than the teacher
  • Tends to under-generate parallel tool calls compared to the teacher (76 vs 89 total calls across 65 samples)
  • Preference for Bash over Read may indicate the model defaults to shell commands where file reads would be more appropriate
  • Evaluated on 65 samples only; larger-scale evaluation needed

License

MIT

Acknowledgments

  • Qwen Team for the Qwen3-4B-Instruct-2507 base model
  • MS-SWIFT for the training framework
  • llama.cpp for efficient local inference
  • Anthropic for the Claude Code agent loop design that inspired this work

Author: LocoreMind

Likes: 16

Downloads: 22

Tags: transformers, safetensors, qwen3, text-generation, code, agent, tool-calling, distillation, gguf, llama-cpp, conversational, en, base_model:Qwen/Qwen3-4B-Instruct-2507, base_model:finetune:Qwen/Qwen3-4B-Instruct-2507, license:mit, text-generation-inference, endpoints_compatible, region:us

lokahq/Trinity-Mini-DrugProt-Think


license: apache-2.0
base_model: arcee-ai/Trinity-Mini
library_name: peft
pipeline_tag: text-generation
language: en
tags:

  • lora
  • peft
  • grpo
  • reinforcement-learning
  • biomedical
  • relation-extraction
  • drug-protein
  • moe

Trinity-Mini-DrugProt-Think: RLVR (GRPO) + LoRA post-training on Arcee Trinity Mini for DrugProt relation classification.

Report: https://lokahq.github.io/Trinity-Mini-DrugProt-Think/ | AWS deployment guide: https://medium.com/loka-engineering/deploying-trinity-mini-drugprot-think-on-amazon-sagemaker-ai-9e1c1c430ce9 | GitHub: https://github.com/LokaHQ/Trinity-Mini-DrugProt-Think

Trinity-Mini-DrugProt-Think

A LoRA adapter fine-tuned on Arcee Trinity Mini using GRPO (Group Relative Policy Optimization) for drug-protein relation extraction on the DrugProt (BioCreative VII) benchmark. The model classifies 13 types of drug-protein interactions from PubMed abstracts, producing structured pharmacological reasoning traces before giving its answer.

Model Details

| Property | Value |
|---|---|
| Base Model | arcee-ai/Trinity-Mini |
| Architecture | Sparse MoE (26B total / 3B active) |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| Training Method | GRPO (Reinforcement Learning) |
| Training Data | maziyar/OpenMed_DrugProt |
| Task | Drug-protein relation extraction (13-way classification) |
| Trainable Parameters | LoRA rank=16, all projection layers |
| License | Apache 2.0 |

Training Configuration

| Parameter | Value |
|---|---|
| LoRA Alpha (α) | 64 |
| LoRA Rank | 16 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj + experts |
| Learning Rate | 3e-6 |
| Batch Size | 128 |
| Rollouts per Example | 8 |
| Max Generation Tokens | 2048 |
| Temperature | 0.7 |

Quick Start

Installation

pip install transformers peft torch accelerate

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model_id = "arcee-ai/Trinity-Mini"
adapter_id = "lokahq/Trinity-Mini-DrugProt-Think"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {
        "role": "system",
        "content": (
            "You are an expert biomedical relation extraction assistant. Your task is to identify the type of interaction between a drug/chemical and a gene/protein in biomedical text.\n\n"
            "For each question:\n"
            "1. First, wrap your detailed biomedical reasoning inside <think></think> tags\n"
            "2. Analyze the context around both entities to understand their relationship\n"
            "3. Consider the pharmacological and molecular mechanisms involved\n"
            "4. Then provide your final answer inside \\boxed{} using exactly one letter (A-M)\n\n"
            "The 13 DrugProt relation types are:\n"
            "A. INDIRECT-DOWNREGULATOR - Chemical indirectly decreases protein activity/expression\n"
            "B. INDIRECT-UPREGULATOR - Chemical indirectly increases protein activity/expression\n"
            "C. DIRECT-REGULATOR - Chemical directly regulates protein (mechanism unspecified)\n"
            "D. ACTIVATOR - Chemical activates the protein\n"
            "E. INHIBITOR - Chemical inhibits the protein\n"
            "F. AGONIST - Chemical acts as an agonist of the receptor/protein\n"
            "G. AGONIST-ACTIVATOR - Chemical is both agonist and activator\n"
            "H. AGONIST-INHIBITOR - Chemical is agonist but inhibits downstream effects\n"
            "I. ANTAGONIST - Chemical acts as an antagonist of the receptor/protein\n"
            "J. PRODUCT-OF - Chemical is a product of the enzyme\n"
            "K. SUBSTRATE - Chemical is a substrate of the enzyme\n"
            "L. SUBSTRATE_PRODUCT-OF - Chemical is both substrate and product\n"
            "M. PART-OF - Chemical is part of the protein complex\n\n"
            "Example format:\n"
            "<think>\n"
            "The text describes [chemical] and [protein]. Based on the context...\n"
            "- The phrase \"[relevant text]\" indicates that...\n"
            "- This suggests a [type] relationship because...\n"
            "</think>\n"
            "\\boxed{A}"
        )
    },
    {
        "role": "user",
        "content": (
            "Abstract: [PASTE PUBMED ABSTRACT HERE]\n\n"
            "Chemical entity: [DRUG NAME]\n"
            "Protein entity: [PROTEIN NAME]\n\n"
            "What is the relationship between the chemical and protein entities? "
            "Choose from: A) INHIBITOR B) SUBSTRATE C) INDIRECT-DOWNREGULATOR "
            "D) INDIRECT-UPREGULATOR E) AGONIST F) ANTAGONIST G) ACTIVATOR "
            "H) PRODUCT-OF I) AGONIST-ACTIVATOR J) INDIRECT-UPREGULATOR "
            "K) PART-OF L) SUBSTRATE_PRODUCT-OF M) NOT\n\n"
            "Think step by step, then provide your answer in \\boxed{} format."
        )
    }
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.7, top_p=0.75)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Progress

Training ran for ~100 steps on Prime Intellect infrastructure. Best accuracy reward reached ~0.83 during training.
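A verifiable accuracy reward of the kind GRPO optimizes here can be sketched as follows (the exact reward implementation is not published; this is illustrative):

```python
import re

def extract_answer(completion):
    """Pull the single-letter answer out of a \\boxed{...} span, if present."""
    m = re.search(r"\\boxed\{([A-M])\}", completion)
    return m.group(1) if m else None

def accuracy_reward(completion, gold):
    """1.0 when the parsed letter matches the gold label, else 0.0."""
    return 1.0 if extract_answer(completion) == gold else 0.0

sample = "<think>The phrasing indicates inhibition.</think>\n\\boxed{E}"
print(accuracy_reward(sample, "E"))  # 1.0
print(accuracy_reward(sample, "A"))  # 0.0
```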

Limitations

  • This is a LoRA adapter and requires the base model (arcee-ai/Trinity-Mini) to run
  • Evaluated on training-split held-out data; not yet benchmarked on the official DrugProt test set
  • Optimized specifically for 13-way DrugProt classification; may not generalize to other biomedical RE tasks

Citation

@misc{jakimovski2026drugprotrl,
  title        = {Post-Training an Open MoE Model to Extract Drug-Protein Relations: Trinity-Mini-DrugProt-Think},
  author       = {Jakimovski, Bojan and Kalinovski, Petar},
  year         = {2026},
  month        = feb,
  howpublished = {Blog post},
  url          = {https://github.com/LokaHQ/Trinity-Mini-DrugProt-Think}
}

Authors

Bojan Jakimovski · Petar Kalinovski · Loka

Author: lokahq

Likes: 4

Downloads: 0

Tags: peft, safetensors, lora, grpo, reinforcement-learning, biomedical, relation-extraction, drug-protein, moe, text-generation, conversational, en, base_model:arcee-ai/Trinity-Mini, base_model:adapter:arcee-ai/Trinity-Mini, license:apache-2.0, region:us

serhiiseletskyi/intelli-embed-v2


language: en
license: apache-2.0
pipeline_tag: sentence-similarity
base_model: Snowflake/snowflake-arctic-embed-l-v2.0
tags:

  • sentence-transformers
  • sentence-similarity
  • feature-extraction
  • embeddings
  • onnx
  • int8

intelli-embed-v2

The best-performing local embedding model for GraphRAG and memory-augmented AI applications.

Built for apps that store, retrieve, and deduplicate personal memories in graph databases — intelli-embed-v2 achieves 98% of Azure text-embedding-3-small quality while running entirely on-device at ~10 ms per embedding. No API calls, no rate limits, no data leaving your infrastructure.
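The deduplication use case mentioned above can be sketched with cosine similarity over L2-normalized embeddings (the threshold and the helper are assumptions, not part of the model):

```python
import numpy as np

def dedupe(embeddings, texts, threshold=0.9):
    """Keep a text only if no previously kept text is too similar to it."""
    # L2-normalize so the dot product equals cosine similarity
    embs = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept_idx, kept = [], []
    for i, e in enumerate(embs):
        if all(float(e @ embs[j]) < threshold for j in kept_idx):
            kept_idx.append(i)
            kept.append(texts[i])
    return kept

# Two near-duplicate memories and one distinct one (toy 2-D "embeddings"):
embs = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
print(dedupe(embs, ["met Ana at work", "met Ana at the office", "bought a bike"]))
# → ['met Ana at work', 'bought a bike']
```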

| Metric | Value |
|--------|-------|
| Sep (SW-engineering, 20 pairs) | 0.484 (GOOD) |
| vs Azure text-embedding-3-large | 94% |
| vs Azure text-embedding-3-small | 98% |
| Inference p50 (INT8 ONNX, CPU) | ~10 ms |
| Embedding dimension | 1024 |
| Max sequence length | 512 tokens |
| Model size (fp32 safetensors) | 2.17 GB |
| Model size (INT8 ONNX) | 542 MB |

Training

Fine-tuned from Snowflake/snowflake-arctic-embed-l-v2.0 using a three-phase curriculum that distills knowledge from Azure text-embedding-3-large across 200k real-world sentences.

Three-phase curriculum fine-tuning on arctic-l-v2:

| Phase | Loss | Data | Duration |
|-------|------|------|----------|
| 1 | GISTEmbedLoss (mxbai as teacher) | 100k SW-engineering pairs | ~1.6 h |
| 2 | MSE distillation from azure-large embeddings | 200k sentences | ~9 min |
| 3 | Hard-negative MultipleNegativesRankingLoss | 7,107 mined triplets | ~77 s |
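Phase 2's objective is a plain mean-squared error that pulls the student's embedding toward the teacher's (azure-large) embedding for each sentence. A minimal numpy sketch of that loss, illustrative only and not the actual training script:

```python
import numpy as np

def mse_distillation_loss(student: np.ndarray, teacher: np.ndarray) -> float:
    """MSE between student and teacher embeddings, both shaped (batch, dim).

    Minimizing this drives the student to reproduce the teacher's embedding
    space, which is what Phase 2 does across 200k sentences.
    """
    return float(np.mean((student - teacher) ** 2))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 8))
print(mse_distillation_loss(teacher, teacher))        # 0.0 (perfect distillation)
print(mse_distillation_loss(teacher + 0.1, teacher))  # ≈ 0.01 (constant 0.1 offset)
```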

Usage

With sentence-transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("serhiiseletskyi/intelli-embed-v2")
embeddings = model.encode(["Hello world", "Another sentence"])
print(embeddings.shape)  # (2, 1024)

With ONNX Runtime (INT8, recommended for CPU inference)

import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("serhiiseletskyi/intelli-embed-v2")
session = ort.InferenceSession("onnx/model_quantized.onnx")

def embed(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="np")
    out = session.run(None, {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]})[0]
    # mean pool + L2 normalize
    mask = enc["attention_mask"][..., None].astype(np.float32)
    pooled = (out * mask).sum(1) / mask.sum(1)
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

vecs = embed(["Hello world", "Another sentence"])
print(vecs.shape)  # (2, 1024)

With ort-node (Node.js / TypeScript)

import * as ort from "onnxruntime-node";
import { AutoTokenizer } from "@huggingface/transformers";

const tokenizer = await AutoTokenizer.from_pretrained("serhiiseletskyi/intelli-embed-v2");
const session = await ort.InferenceSession.create("onnx/model_quantized.onnx");

async function embed(texts: string[]): Promise<number[][]> {
  const enc = await tokenizer(texts, { padding: true, truncation: true, max_length: 512 });
  const inputIds = new ort.Tensor("int64", enc.input_ids.data, enc.input_ids.dims);
  const attentionMask = new ort.Tensor("int64", enc.attention_mask.data, enc.attention_mask.dims);
  const result = await session.run({ input_ids: inputIds, attention_mask: attentionMask });
  // mean pool last_hidden_state over non-padding tokens, then L2 normalize
  const hidden = result["last_hidden_state"]; // dims: [batch, seq, dim]
  const [batch, seq, dim] = hidden.dims as number[];
  const data = hidden.data as Float32Array;
  const mask = enc.attention_mask.data;
  const vectors: number[][] = [];
  for (let b = 0; b < batch; b++) {
    const sum = new Float64Array(dim);
    let count = 0;
    for (let t = 0; t < seq; t++) {
      if (Number(mask[b * seq + t]) === 0) continue;
      count++;
      for (let d = 0; d < dim; d++) sum[d] += data[(b * seq + t) * dim + d];
    }
    const pooled = Array.from(sum, (x) => x / count);
    const norm = Math.hypot(...pooled);
    vectors.push(pooled.map((x) => x / norm));
  }
  return vectors;
}

Files

| File | Size | Description |
|------|------|-------------|
| model.safetensors | 2.17 GB | Full fp32 model weights (sentence-transformers compatible) |
| onnx/model.onnx | 0.4 MB | ONNX proto (references external data file) |
| onnx/model.onnx_data | 2.16 GB | ONNX external weight data (fp32) |
| onnx/model_quantized.onnx | 542 MB | INT8 dynamic quantization (recommended for CPU) |
| tokenizer.json | 16 MB | Tokenizer (XLM-RoBERTa based) |
| 1_Pooling/config.json | — | Mean pooling config |

Benchmark Results (run15, 2026-02-23)

Evaluated on a 6-suite benchmark including SW-engineering pairs, memory-domain pairs, dedup fitness, asymmetric retrieval, negation safety, and entity description retrieval.

| Provider | Sep | Grade | p50 ms |
|----------|-----|-------|--------|
| azure-large (cloud) | 0.515 | GOOD | ~110 |
| azure-small (cloud) | 0.511 | GOOD | ~80 |
| intelli-embed-v2 (INT8) | 0.484 | GOOD | ~10 |
| arctic-l-v2 (q8, base model) | 0.469 | GOOD | ~10 |
| intelli-ensemble | 0.450 | EXCELLENT | ~86 |

Sep = mean(PosSim) − mean(NegSim) on SW-engineering pairs — higher is better.
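Concretely, the Sep score is reproducible in a few lines of numpy; a sketch assuming the positive/negative pair similarities are already cosine values:

```python
import numpy as np

def cosine_rows(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n, dim) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

def sep(pos_sims: np.ndarray, neg_sims: np.ndarray) -> float:
    """Sep = mean(PosSim) - mean(NegSim); higher means better separation."""
    return float(pos_sims.mean() - neg_sims.mean())

pos = np.array([0.82, 0.75, 0.79])  # cosine sims of matching pairs
neg = np.array([0.31, 0.28, 0.35])  # cosine sims of mismatched pairs
print(round(sep(pos, neg), 4))  # 0.4733
```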

OpenMemory Use-Case Metrics

| Metric | Value | Notes |
|--------|-------|-------|
| memSep | 0.439 | EXCELLENT — personal memory discrimination |
| dedupGap | 0.102 | Near-dedup vs not-dedup cosine delta |
| asyncSep | 0.240 | FAIR — short query → long memory retrieval |
| negGap | 0.026 | Negation safety (BM25 gate still recommended) |
| supSim | 0.672 | Supersede zone (~0.75–0.92 is ideal) |
| entSep | 0.491 | GOOD — entity description retrieval |

License

Apache 2.0 — inherited from base model Snowflake/snowflake-arctic-embed-l-v2.0.

Author: serhiiseletskyi

Likes: 2

Downloads: 0

Tags: sentence-transformers, onnx, safetensors, xlm-roberta, sentence-similarity, feature-extraction, embeddings, int8, en, base_model:Snowflake/snowflake-arctic-embed-l-v2.0, base_model:quantized:Snowflake/snowflake-arctic-embed-l-v2.0, license:apache-2.0, text-embeddings-inference, endpoints_compatible, region:us

MBZUAI/MedMO-8B-Next


license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
tags:

  • medical
  • multimodal
  • grounding
  • report-generation
  • radiology
  • clinical-reasoning
  • mri
  • ct
  • histopathology
  • x-ray
  • fundus

MedMO-8B-Next: Grounding and Understanding Multimodal Large Language Model for Medical Images


<p align="center"> <img src="MedMO-logo.png" alt="MedMO Logo" width="300"/> </p>

MedMO-8B-Next is the latest and most powerful iteration of the MedMO family — an open-source multimodal foundation model purpose-built for comprehensive medical image understanding and grounding. Trained on 26M+ diverse medical samples across 45 datasets, MedMO-8B-Next achieves state-of-the-art performance across all major medical imaging benchmarks, outperforming both open-source and closed-source competitors on VQA, Text QA, grounding, and report generation tasks.


🏆 Benchmark Performance

VQA & Text QA Results

MedMO-8B-Next sets a new state-of-the-art across the board, achieving the highest average scores on both medical VQA and Text QA benchmarks — surpassing strong baselines including Lingshu-7B and Fleming-VL-8B.

OMIVQA = OmniMedVQA · MedXQA = MedXpertQA · Medbullets reported as op4/op5

Medical VQA Benchmarks

| Model | MMMU-Med | VQA-RAD (closed/all) | SLAKE (closed/all) | PathVQA | PMC-VQA | OmniMedVQA | MedXpertQA | Avg. |
|---|---|---|---|---|---|---|---|---|
| Lingshu-7B | 54.0 | 77.2 / 43.0 | 82.4 / 33.2 | 61.9 | 54.2 | 82.9 | 26.9 | 57.3 |
| Fleming-VL-8B | 63.3 | 78.4 / 56.0 | **86.9 / 80.0** | **62.9** | 64.3 | 88.2 | 21.6 | 66.8 |
| MedMO-8B-Next | **65.3** | **80.4 / 65.0** | 75.5 / 74.7 | 57.3 | **70.3** | **88.8** | **48.9** | **69.6** |

Medical Text QA Benchmarks

| Model | MMLU-Med | PubMedQA | MedMCQA | MedQA | Medbullets (op4/op5) | MedXpertQA | SGPQA | Avg. |
|---|---|---|---|---|---|---|---|---|
| Lingshu-7B | 69.6 | **75.8** | 56.3 | 63.5 | 62.0 / 53.8 | 16.4 | 27.5 | 51.1 |
| Fleming-VL-8B | 71.8 | 74.0 | 51.8 | 53.7 | 40.5 | 12.1 | 24.9 | 46.9 |
| MedMO-8B-Next | **80.2** | 75.6 | **62.0** | **83.8** | **65.2 / 57.8** | **20.9** | **35.5** | **60.1** |

Bold = best result. MedMO-8B-Next achieves the highest average on both VQA (69.6) and Text QA (60.1) benchmarks.

Benchmarked on an AMD MI210 GPU.

Supported Imaging Modalities

| Domain | Modalities |
|---|---|
| Radiology | X-ray, CT, MRI, Ultrasound |
| Pathology | Whole-slide imaging, Microscopy |
| Ophthalmology | Fundus photography, OCT |
| Dermatology | Clinical skin images |
| Nuclear Medicine | PET, SPECT |


🚀 Quick Start

Installation

pip install transformers torch qwen-vl-utils

Basic Usage

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "MBZUAI/MedMO-8B-Next",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

processor = AutoProcessor.from_pretrained("MBZUAI/MedMO-8B-Next")

# Prepare input
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "path/to/medical/image.png",
            },
            {"type": "text", "text": "What abnormalities are present in this chest X-ray?"},
        ],
    }
]

# Process and generate
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=512)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])

Example: Disease Localization with Bounding Boxes

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "chest_xray.png"},
            {"type": "text", "text": "Detect and localize all abnormalities in this image."},
        ],
    }
]
# Example output:
# "Fractures <box>[[156, 516, 231, 607], [240, 529, 296, 581]]</box>"
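Grounded outputs in this format can be parsed with a small regex; a sketch based on the example output above (the exact label/box format may differ across checkpoints):

```python
import ast
import re

def parse_boxes(text: str) -> list[tuple[str, list[list[int]]]]:
    """Extract (label, boxes) pairs from 'Label <box>[[x1, y1, x2, y2], ...]</box>' output."""
    pairs = []
    for m in re.finditer(r"([\w ]+?)\s*<box>(\[\[.*?\]\])</box>", text):
        # group(2) is a Python-literal list of [x1, y1, x2, y2] boxes
        pairs.append((m.group(1).strip(), ast.literal_eval(m.group(2))))
    return pairs

out = "Fractures <box>[[156, 516, 231, 607], [240, 529, 296, 581]]</box>"
print(parse_boxes(out))
# [('Fractures', [[156, 516, 231, 607], [240, 529, 296, 581]])]
```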

Example: Radiology Report Generation

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "ct_scan.png"},
            {"type": "text", "text": "Generate a detailed radiology report for this CT scan."},
        ],
    }
]
# MedMO-8B-Next generates comprehensive clinical reports with findings and impressions

📦 Model Family

| Model | Parameters | Best For |
|---|---|---|
| MedMO-8B-Next | 8B | Highest accuracy, all tasks — recommended |
| MedMO-8B | 8B | Previous generation |
| MedMO-4B | 4B | Resource-constrained environments |


📄 Citation

If you use MedMO in your research, please cite our paper:

@article{deria2026medmo,
  title={MedMO: Grounding and Understanding Multimodal Large Language Model for Medical Images},
  author={Deria, Ankan and Kumar, Komal and Dukre, Adinath Madhavrao and Segal, Eran and Khan, Salman and Razzak, Imran},
  journal={arXiv preprint arXiv:2602.06965},
  year={2026}
}

📜 License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.

Author: MBZUAI

Likes: 2

Downloads: 0

Tags: transformers, safetensors, qwen3_vl, image-text-to-text, medical, multimodal, grounding, report-generation, radiology, clinical-reasoning, mri, ct, histopathology, x-ray, fundus, conversational, arxiv:2602.06965, license:apache-2.0, endpoints_compatible, region:us

badaramoni/wave-field-v4-825m


license: cc-by-nc-nd-4.0
language:

  • en

tags:

  • wave-field
  • transformer
  • research
  • text-generation

pipeline_tag: text-generation

Wave Field Transformer V4 - 825M

A new architecture using Wave Field Attention with O(n log n) complexity via FFT-based interference patterns, replacing standard O(n^2) self-attention.

Model Details

  • Architecture: Wave Field Transformer V4
  • Parameters: 825,218,692 (825M)
  • Embedding Dim: 1536
  • Layers: 24
  • Attention Heads: 16
  • FFN Dim: 6144
  • Max Sequence Length: 256 tokens
  • Vocabulary: 30,004 (BPE + 4 special tokens)
  • Precision: FP16
  • Training Tokens: 1.33 Billion (C4 web text)

Key Innovation

Wave Field Attention replaces standard dot-product attention with FFT-based wave interference, achieving O(n log n) time complexity instead of O(n^2). This enables efficient scaling to much longer context windows.
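The actual Wave Field layer is not public (see "Architecture Code" below), but the complexity claim can be illustrated with a generic FFT-based token-mixing layer in the spirit of FNet: a 2-D FFT mixes every position in O(n log n), with no (n, n) attention matrix. A hypothetical numpy sketch, not the proprietary implementation:

```python
import numpy as np

def fft_token_mixing(x: np.ndarray) -> np.ndarray:
    """FNet-style mixing sketch (NOT the Wave Field Transformer implementation).

    x: (seq_len, dim). A 2-D FFT mixes information across every token
    position in O(n log n) per channel; dot-product attention would need
    an (n, n) score matrix, i.e. O(n^2).
    """
    return np.fft.fft2(x).real  # keep the real part, as FNet does

x = np.random.default_rng(0).normal(size=(256, 64))
print(fft_token_mixing(x).shape)  # (256, 64), same shape in and out
```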

Training

  • Pretraining: 1.33B tokens from C4 (Common Crawl) on NVIDIA H100 80GB
  • Progressive Scaling: Model was grown from 52M to 100M to 268M to 825M using continuous learning with knowledge preservation
  • Chat Fine-tuning: Instruction tuning (SlimOrca), DPO alignment (HH-RLHF), and conversation training (UltraChat)

Eval Results

| Checkpoint | Eval PPL (C4) | Accuracy |
|-----------|---------------|----------|
| Pretrained Base | 72.2 | 27.1% |
| After Chat Pipeline | 91.0 | 25.7% |

Available Checkpoints

  • wave_v4_1b_best_fp16.pt - Best pretrained base model (PPL 72.2)
  • wave_v4_chat_1b_v2_fp16.pt - Chat-tuned version

Limitations

  • This is an early research checkpoint, not a production model
  • 1.33B training tokens is limited (GPT-2 trained on 40B tokens)
  • 256 token context window (architecture supports longer, not yet trained for it)
  • Generation quality is limited at this stage
  • Not suitable for production or commercial use

License

CC-BY-NC-ND-4.0 - This model is released for research and evaluation purposes only.

  • Non-commercial use only
  • No derivatives - you may not fine-tune or modify the weights
  • Attribution required - credit Wave Field Transformer by Avinash Badaramoni

Architecture Code

The architecture source code is proprietary and not included. These weights cannot be loaded without the Wave Field Transformer V4 implementation.

Contact

For research collaboration, architecture licensing, or questions: badaramoni.avinash@gmail.com

Author: badaramoni

Likes: 2

Downloads: 0

Tags: wave-field, transformer, research, text-generation, en, license:cc-by-nc-nd-4.0, region:us

Vrda/medgemma-27b-clinical-error-sft


library_name: peft
base_model:

  • google/medgemma-27b-it

license: apache-2.0
language:

  • en
  • hr

tags:

  • medgemma
  • medical
  • clinical-error-detection
  • patient-safety
  • lora
  • sft
  • emergency-medicine
  • internal-medicine

pipeline_tag: text-generation
datasets:

  • custom

model-index:

  • name: medgemma-27b-clinical-error-detection-sft
    results:
      • task:
          type: text-generation
          name: Clinical Error Detection
        metrics:
          • name: Accuracy (single-pass)
            type: accuracy
            value: 40.5
          • name: Accuracy (multi-agent pipeline)
            type: accuracy
            value: 60.4

MedGemma 27B — Clinical Error Detection (SFT LoRA Adapter)

A LoRA fine-tuned adapter for Google MedGemma 27B-IT trained to detect critical patient safety errors in clinical documentation.

Model Description

This adapter was trained as part of Clinipal — an AI-powered clinical error detection system that acts as an automated "second reviewer" of medical reports. The model identifies 6 categories of high-impact safety errors in emergency department and internal medicine documentation.

Error Categories

| Error Type | Description |
|---|---|
| CONTRAINDICATED_MEDICATION | Drug dangerous given patient's conditions/allergies |
| DANGEROUS_DOSAGE | Dose significantly outside therapeutic range |
| CLINICAL_SCORE_ERROR | Miscalculated risk score affecting treatment decisions |
| MISSING_CRITICAL_TREATMENT | Life-saving intervention clearly omitted |
| TREATMENT_LOGIC_FAILURE | Treatment contradicts the diagnosis |
| MISSING_CRITICAL_WORKUP | Essential diagnostic tests not ordered |

Training

Dataset

  • 300 synthetic clinical reports with annotated errors, generated using GPT-5.2, Gemini 3 Flash Preview, and DeepSeek-V3-R1
  • Synthetic reports designed to emulate real-world emergency department documentation
  • 150 real clinical reports from Internal Medicine, annotated by 3 physicians with realistic inserted errors (100 used as held-out test set)

LoRA Configuration

| Parameter | Value |
|---|---|
| Base model | google/medgemma-27b-it |
| PEFT type | LoRA |
| Rank (r) | 32 |
| Alpha | 64 |
| Dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Task type | CAUSAL_LM |
| Adapter size | ~889 MB |

Results

Evaluated on 100 held-out real-world clinical cases with physician-annotated errors:

| Configuration | Accuracy |
|---|---|
| Baseline MedGemma 27B (no fine-tuning) | 22.0% |
| This adapter (single-pass) | 40.5% |
| This adapter (multi-agent pipeline) | 60.4% |
| GPT-OSS-120b | 38.1% |
| Gemini 3 Flash Preview | 35.3% |

The multi-agent pipeline runs 6 sequential inference calls (2 first-pass reviewers + 3 specialist critics + 1 final adjudicator) using the same adapter, achieving a nearly threefold improvement over the 22.0% baseline.

Usage

With vLLM (recommended for production)

# Serve with vLLM + LoRA
python -m vllm.entrypoints.openai.api_server \
  --model google/medgemma-27b-it \
  --port 8000 \
  --enable-lora \
  --lora-modules "sft_adapter=<path-to-this-adapter>" \
  --max-lora-rank 64 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.75 \
  --dtype bfloat16

Then call the API:

import requests

response = requests.post("http://localhost:8000/v1/chat/completions", json={
    "model": "sft_adapter",
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Analyze the following clinical note:\n\n{clinical_note}"}
    ],
    "temperature": 0.6,
    "max_tokens": 1024
})

With Transformers + PEFT

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "google/medgemma-27b-it",
    torch_dtype="bfloat16",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Vrda/medgemma-27b-clinical-error-sft")
tokenizer = AutoTokenizer.from_pretrained("Vrda/medgemma-27b-clinical-error-sft")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"Analyze the following clinical note:\n\n{clinical_note}"}
]

inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=1024, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

System Prompt

The model expects this system prompt for optimal performance:

You are an emergency medicine clinical safety reviewer analyzing a real patient's
emergency department documentation. Your ONLY task is to identify CRITICAL patient
safety errors — the kind that could cause direct harm if missed.

FOCUS EXCLUSIVELY on these error types:

1. CONTRAINDICATED_MEDICATION
2. DANGEROUS_DOSAGE
3. CLINICAL_SCORE_ERROR
4. MISSING_CRITICAL_TREATMENT
5. TREATMENT_LOGIC_FAILURE
6. MISSING_CRITICAL_WORKUP

STRICT RULES:
- Report AT MOST 3 errors, strictly prioritized by patient safety impact.
- Only report errors you are ≥80% confident about.
- Do NOT report style preferences, minor documentation gaps, or speculative concerns.
- If no critical safety errors exist, return an empty errors array.

IMPORTANT — THINK STEP BY STEP:
For each potential error, include a "reasoning" field with your clinical logic.

Respond with ONLY valid JSON:
{
  "errors": [
    {
      "type": "CONTRAINDICATED_MEDICATION|DANGEROUS_DOSAGE|CLINICAL_SCORE_ERROR|MISSING_CRITICAL_TREATMENT|TREATMENT_LOGIC_FAILURE|MISSING_CRITICAL_WORKUP",
      "severity": "critical|warning",
      "reasoning": "Step-by-step clinical logic...",
      "problem": "1-2 sentence explanation",
      "recommendation": "1 sentence corrective action",
      "confidence": 0.95
    }
  ],
  "summary": "One-sentence overall safety assessment."
}

Output Format

The model outputs structured JSON:

{
  "errors": [
    {
      "type": "CONTRAINDICATED_MEDICATION",
      "severity": "critical",
      "reasoning": "The patient has documented bilateral renal artery stenosis. Perindopril is an ACE inhibitor, which is strictly contraindicated in this condition as it can precipitate acute kidney injury and hyperkalemia.",
      "problem": "ACE inhibitor (perindopril) prescribed despite bilateral renal artery stenosis.",
      "recommendation": "Immediately discontinue perindopril and monitor renal function and potassium levels.",
      "confidence": 0.99
    }
  ],
  "summary": "Critical medication contraindication identified requiring immediate intervention."
}
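Because the output is plain JSON, downstream code can parse it and gate findings by confidence. A minimal consumer-side sketch (the 0.8 threshold mirrors the system prompt's rule and is illustrative):

```python
import json

def high_confidence_errors(model_output: str, threshold: float = 0.8) -> list[dict]:
    """Parse the model's JSON report and keep only errors at or above `threshold`."""
    report = json.loads(model_output)
    return [e for e in report.get("errors", []) if e.get("confidence", 0.0) >= threshold]

report = """{"errors": [
  {"type": "CONTRAINDICATED_MEDICATION", "severity": "critical", "confidence": 0.99},
  {"type": "MISSING_CRITICAL_WORKUP", "severity": "warning", "confidence": 0.55}
], "summary": "One critical finding."}"""
print([e["type"] for e in high_confidence_errors(report)])  # ['CONTRAINDICATED_MEDICATION']
```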

Multi-Agent Pipeline

For best results, use the multi-agent pipeline (6 sequential calls with the same adapter):

  1. First-Pass Conservative (temp=0.6) — high-precision scan
  2. First-Pass Exploratory (temp=1.0) — high-recall scan
  3. Diagnostics Critic (temp=0.7) — diagnostic reasoning focus
  4. Treatment Plan Critic (temp=0.75) — medication safety focus
  5. Follow-Up Critic (temp=0.7) — disposition safety focus
  6. Final Adjudicator (temp=0.5) — synthesizes, de-duplicates, selects top 3
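The schedule above amounts to six calls with different roles and temperatures, each seeing the accumulated findings so far. A simplified orchestration sketch with an injectable `call` function (stage names here are illustrative; the real prompts live in the Clinipal repository):

```python
STAGES = [
    ("first_pass_conservative", 0.6),
    ("first_pass_exploratory", 1.0),
    ("diagnostics_critic", 0.7),
    ("treatment_critic", 0.75),
    ("followup_critic", 0.7),
    ("final_adjudicator", 0.5),
]

def run_pipeline(note: str, call) -> str:
    """Run the 6 sequential stages; `call(stage, note, context, temperature)`
    is one inference request against the adapter (e.g. the vLLM endpoint).
    The adjudicator's output, produced last, is the final report."""
    context: list[str] = []
    for stage, temperature in STAGES:
        context.append(call(stage, note, list(context), temperature))
    return context[-1]

# Stubbed usage, replacing the model call with a formatter:
print(run_pipeline("note", lambda s, n, c, t: f"{s}@{t}"))  # final_adjudicator@0.5
```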

See the Clinipal repository for the full pipeline implementation.

Limitations

  • Trained primarily on internal medicine and emergency department cases; may underperform on other specialties
  • Accuracy is 40.5% single-pass (60.4% with multi-agent); not suitable as a sole decision-maker
  • May produce false positives; all findings should be reviewed by a qualified clinician
  • Best performance with English-language clinical notes

Citation

If you use this model, please cite:

@misc{clinipal2026,
  title={Clinipal: AI-Powered Clinical Error Detection Using Fine-Tuned MedGemma 27B},
  author={Vrdoljak, J. and Luksic, I. and Baric, D.},
  year={2026},
  url={https://github.com/IvanLuksic/medgemma-next}
}

@article{krabic2026llm,
  title={Large language models as second reviewers for medical errors in real-world internal medicine reports: a prospective comparative study of open- and closed-source models},
  author={Krabic, R. and Viculin, I. and Boban, Z. and Kumric, M. and Vilovic, M. and Vrdoljak, J. and Bozic, J.},
  journal={International Journal of Medical Informatics},
  volume={211},
  pages={106316},
  year={2026},
  doi={10.1016/j.ijmedinf.2026.106316},
  pmid={41655522}
}

License

This adapter is released under the Apache 2.0 license. The base model (google/medgemma-27b-it) is subject to Google's Gemma license terms.

Author: Vrda

Likes: 2

Downloads: 0

Tags: peft, safetensors, medgemma, medical, clinical-error-detection, patient-safety, lora, sft, emergency-medicine, internal-medicine, text-generation, conversational, en, hr, dataset:custom, base_model:google/medgemma-27b-it, base_model:adapter:google/medgemma-27b-it, license:apache-2.0, model-index, region:us

Mungert/QED-Nano-GGUF


library_name: transformers
license: apache-2.0
language:

  • en

base_model:

  • lm-provers/QED-Nano-SFT

datasets:

  • lm-provers/FineProofs-RL

<span style="color: #7FFF7F;">QED-Nano GGUF Models</span>

<span style="color: #7F7FFF;">Model Generation Details</span>

This model was generated using llama.cpp at commit 05fa625ea.


<a href="https://readyforquantum.com/huggingface_gguf_selection_guide.html" style="color: #7FFF7F;"> Click here to get info on choosing the right GGUF model format </a>
<!--Begin Original Model Card-->

QED-Nano


Table of Contents

  1. Model Summary
  2. How to use
  3. Evaluation
  4. Limitations
  5. License

Model Summary

QED-Nano is a 4B parameter model explicitly post-trained to strengthen its proof-writing capabilities. Despite its small size, QED-Nano achieves an impressive 40% score on the challenging IMO-ProofBench benchmark (+20% over the Qwen3 base model), matching the performance of GPT-OSS-120B from OpenAI. With an agent scaffold that scales inference-time compute to over 1M tokens per problem, QED-Nano approaches the performance of Gemini-3-Pro. Crucially, the same agentic scaffold on the base model (Qwen3-4B-Thinking-2507) barely improves performance.

imoproofbench.png

QED-Nano is based on Qwen/Qwen3-4B-Thinking-2507, and was post-trained via a combination of supervised fine-tuning and reinforcement learning with a reasoning cache (enabling continued training for improvement with our agentic scaffold at test time) on a mixture of Olympiad proof problems from various public sources.

For more details refer to our blog post: https://huggingface.co/spaces/lm-provers/qed-nano-blogpost

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lm-provers/QED-Nano"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
).to(device)

# prepare the model input
prompt = r"Generate a rigorous proof for the following question: is \sqrt{2} rational or irrational?"
messages_think = [
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages_think,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the output
generated_ids = model.generate(**model_inputs, max_new_tokens=32768)

# Get and decode the output
output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
print(tokenizer.decode(output_ids, skip_special_tokens=True))

Tip: We recommend setting temperature=0.6 and top_p=0.95 in the sampling parameters.

vLLM and SGLang

You can use vLLM and SGLang to deploy the model in an API compatible with OpenAI format.

SGLang

python -m sglang.launch_server --model-path lm-provers/QED-Nano

vLLM

vllm serve lm-provers/QED-Nano

Evaluation

In this section, we report the evaluation results of QED-Nano on IMO-ProofBench, ProofBench, and IMO-AnswerBench. All evaluations except those on IMO-AnswerBench are reported as avg@3 unless stated otherwise.

| Model | IMO-ProofBench | ProofBench | IMO-AnswerBench |
|:---|:---:|:---:|:---:|
| Qwen3-4B-Thinking-2507 | 20.4 (2.6) | 19.5 (0.9) | 55.8 |
| QED-Nano-SFT | 39.5 (2.9) | 33.3 (0.5) | 57.5 |
| QED-Nano | 40.0 (0.6) | 44.9 (3.4) | 67.5 |
| QED-Nano (Agent) | 54.0 (3.7) | 54.4 (2.4) | - |
| Qwen3-30B-A3B-Thinking-2507 | 27.6 (1.0) | 26.1 (2.4) | 67.0 |
| Qwen3-235B-A22B-Thinking-2507 | 34.1 (0.7) | 33.7 (1.1) | 70.5 |
| Nomos-1 | 40.3 (3.5) | 28.3 (3.9) | 49.0 |
| GPT-OSS-20B | 38.3 (1.2) | 38.4 (3.9) | 61.5 |
| GPT-OSS-120B | 43.1 (3.2) | 47.5 (1.7) | 70.5 |
| DeepSeek-Math-V2 | 57.9 (2.0) | 60.6 (0.1) | 75.8 |
| Gemini 3 Pro | 58.7 (2.9) | 66.7 (3.1) | 83.2 |

Training

Model

Training Hyperparameters

  • Optimization Steps: 150
  • Number of prompts per batch: 64
  • Number of rollouts per prompt: 16
  • Global batch size: 1024
  • Max Rollout Length: 49,152 tokens
  • Learning Rate Schedule: Constant with a learning rate of 1e-6
  • Sampling temperature: 0.8
  • LLM grader: GPT-OSS-20B with medium reasoning effort and sampling at temperature 1.0

Software & hardware

  • GPU topology (each node has 8xH100s): 7 generator nodes, 4 trainer nodes, 1 grader node.
  • Training time: 4 days or 9,216 H100 hours
  • Training and evaluation framework: CMU-AIRe/QED-Nano

Limitations

QED-Nano is a domain-specific model designed for one thing and one thing only: proving theorems. Using it as a general assistant will likely produce nonsense outside this domain. It should be used as an assistive tool rather than a definitive source of information; users should always verify important claims and critically evaluate any generated content.

License

Apache 2.0

Acknowledgements

QED-Nano is a joint collaboration between the research teams at CMU, ETH Zurich, Numina, and Hugging Face. Below is a list of the individual contributors and their affiliations:

CMU

Amrith Setlur, Yuxiao Qu, Ian Wu, and Aviral Kumar

ETH Zurich

Jasper Dekoninck

Numina

Jia Li

Hugging Face

Edward Beeching and Lewis Tunstall

<!--End Original Model Card-->

<span id="testllm" style="color: #7F7FFF;">🚀 If you find these models useful</span>

Help me test my AI-Powered Quantum Network Monitor Assistant with quantum-ready security checks:

👉 Quantum Network Monitor

The full open-source code for the Quantum Network Monitor Service is available in my GitHub repos (repos with NetworkMonitor in the name): Source Code Quantum Network Monitor. You will also find the code I use to quantize the models in GGUFModelBuilder, if you want to do it yourself.

💬 How to test:
Choose an AI assistant type:

  • TurboLLM (GPT-4.1-mini)
  • HugLLM (Hugging Face open-source models)
  • TestLLM (Experimental CPU-only)

What I’m Testing

I’m pushing the limits of small open-source models for AI network monitoring, specifically:

  • Function calling against live network services
  • How small can a model go while still handling:
    • Automated Nmap security scans
    • Quantum-readiness checks
    • Network Monitoring tasks

🟡 TestLLM – Current experimental model (llama.cpp on 2 CPU threads on huggingface docker space):

  • Zero-configuration setup
  • ⏳ ~30 s load time (slow inference, but no API costs). No token limit, since the cost is low.
  • 🔧 Help wanted! If you’re into edge-device AI, let’s collaborate!

Other Assistants

🟢 TurboLLM – Uses gpt-4.1-mini :

  • It performs very well, but unfortunately OpenAI charges per token, so token usage is limited.
  • Create custom cmd processors to run .net code on Quantum Network Monitor Agents
  • Real-time network diagnostics and monitoring
  • Security Audits
  • Penetration testing (Nmap/Metasploit)

🔵 HugLLM – Latest Open-source models:

  • 🌐 Runs on the Hugging Face Inference API. Performs pretty well using the latest models hosted on Novita.

💡 Example commands you could test:

  1. "Give me info on my websites SSL certificate"
  2. "Check if my server is using quantum-safe encryption for communication"
  3. "Run a comprehensive security audit on my server"
  4. "Create a cmd processor to .. (whatever you want)". Note that you need to install a Quantum Network Monitor Agent to run the .net code. This is a very flexible and powerful feature. Use with caution!

Final Word

I fund the servers used to create these model files, run the Quantum Network Monitor service, and pay for inference from Novita and OpenAI—all out of my own pocket. All the code behind the model creation and the Quantum Network Monitor project is open source. Feel free to use whatever you find helpful.

If you appreciate the work, please consider buying me a coffee ☕. Your support helps cover service costs and allows me to raise token limits for everyone.

I'm also open to job opportunities or sponsorship.

Thank you! 😊

Author: Mungert

Likes: 2

Downloads: 0

Tags: transformers, gguf, en, dataset:lm-provers/FineProofs-RL, arxiv:2602.03773, base_model:lm-provers/QED-Nano-SFT, base_model:quantized:lm-provers/QED-Nano-SFT, license:apache-2.0, endpoints_compatible, region:us, conversational

AresAGI/ARES-Nano-SLM


language:

  • pt

license: mit
library_name: transformers
tags:

  • slm
  • causal-lm
  • pt-br
  • ares
  • nano-slm

datasets:

  • wikipedia

metrics:

  • accuracy

model_name: ARES-Nano SLM

ARES-Nano SLM (9.37M) - The Cortex of Local AGI

ARES-Nano SLM is an extremely small language model with only 9.37 million parameters, designed specifically to act as a Syntactic Intent Extractor in local AGI (Artificial General Intelligence) systems.

This model represents a milestone in the technological sovereignty of the ARES project: it is 100% independent of foreign infrastructure, proprietary APIs, and third-party frameworks such as Ollama or Alibaba.

🚀 Model Highlights

  • Unique Size: At only 9.37M parameters, it is likely the smallest functional SLM in the world integrated into an autonomous-agent architecture.
  • Linguistic Sovereignty: Native BPE tokenizer for Brazilian Portuguese (PT-BR), trained from scratch on the CulturaX/Wikipedia corpus.
  • ARES DNA: Trained specifically to convert human speech into hardware and API commands (intencao|parametro), without the "helpful assistant" bias that pollutes commercial models.
  • Hardware Agnostic: Runs with near-zero latency on entry-level GPUs (such as the RTX 3050) or even modest CPUs.
  • Zero Dependencies: No API calls, no internet required, 100% local.

🧠 Architecture

The model uses a Llama-based architecture with a configuration optimized for routing tasks:

  • Layers: 4
  • Attention Heads: 4
  • Hidden Size: 128
  • Vocabulary: 32,768 tokens (native ARES)

📋 Purpose and Usage

ARES-Nano was not built to discuss philosophy or write poetry; it is meant to be an agent's Motor Cortex. It extracts intents such as:

  • nav|youtube.com
  • busca|preço do btc
  • visao|objeto
  • leitura|dom
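On the agent side, these `intencao|parametro` strings split on the first `|`; a minimal parsing sketch (the function name is illustrative, not from the card):

```python
def parse_intent(command: str) -> tuple[str, str]:
    """Split an ARES 'intencao|parametro' command into (intent, parameter)."""
    intent, _, parameter = command.partition("|")
    return intent, parameter

print(parse_intent("nav|youtube.com"))     # ('nav', 'youtube.com')
print(parse_intent("busca|preço do btc"))  # ('busca', 'preço do btc')
```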

👨‍💻 Creator

Developed by André Luiz Facincani as part of the evolution of the ARES 2.0 ecosystem.


Ares: Local consciousness, global action.

Author: AresAGI

Likes: 1

Downloads: 0

Tags: transformers, safetensors, llama, text-generation, slm, causal-lm, pt-br, ares, nano-slm, conversational, pt, dataset:wikipedia, license:mit, text-generation-inference, endpoints_compatible, region:us

h-naga-2025/qwen3-4b-structured-output-lora-rev.04


base_model: Qwen/Qwen3-4B-Instruct-2507
datasets: u-10bei/structured_data_with_cot_dataset_512_v2
language: en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags: qlora, lora, structured-output
qwen3-4b-structured-output-lora-rev.04

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, Unsloth).

This repository contains LoRA adapter weights only. The base model must be loaded separately.

Training Objective

This adapter is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV).

Loss is applied only to the final assistant output, while intermediate reasoning (Chain-of-Thought) is masked.

Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: QLoRA (4-bit)
  • Max sequence length: 512
  • Epochs: 2
  • Learning rate: 2e-05
  • LoRA: r=32, alpha=64
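The listed LoRA hyperparameters map onto a peft `LoraConfig` roughly as follows; note that `target_modules` is a guess at typical Qwen attention projections, since the card does not list them:

```python
from peft import LoraConfig

# Sketch of the stated settings (r=32, alpha=64).
# target_modules is an assumption, not taken from the card.
config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```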

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "h-naga-2025/qwen3-4b-structured-output-lora-rev.04"

# Load the tokenizer and the base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
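Once the adapter is attached, generation follows the usual chat-template flow. A hedged sketch with a small JSON validity check; the prompt handling and helper names are illustrative, not from the card:

```python
import json

def is_valid_json(text: str) -> bool:
    """Check whether the model's final output parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def generate_structured(model, tokenizer, prompt: str) -> str:
    # Standard chat-template generation; sampling settings are illustrative.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=256)
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

print(is_valid_json('{"name": "Ada", "age": 36}'))  # True
```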

Sources & Terms (IMPORTANT)

Training data: u-10bei/structured_data_with_cot_dataset_512_v2

Dataset license: MIT. The dataset is used and distributed under the terms of the MIT License. Users must comply with the MIT License (including preservation of the copyright notice) and with the base model's original terms of use.

Author: h-naga-2025

Likes: 1

Downloads: 0

Tags: peft, safetensors, qlora, lora, structured-output, text-generation, en, dataset:u-10bei/structured_data_with_cot_dataset_512_v2, base_model:Qwen/Qwen3-4B-Instruct-2507, base_model:adapter:Qwen/Qwen3-4B-Instruct-2507, license:apache-2.0, region:us

artificialguybr/ClayAnimation-Redmond-ZIMAGE


tags: text-to-image, lora, diffusers, template:diffusion-lora
widget: example images 001–015
base_model: Tongyi-MAI/Z-Image-Turbo
instance_prompt: Clay animation, Clay.
license: apache-2.0

CLAY ANIMATION REDMOND LORA FOR ZIMAGE TURBO


Model description

Clay Animation Style Redmond LoRA.

I'm grateful for the GPU time from Redmond.AI that allowed me to make this model!

This LoRA was trained on clay animation style images and generates high-quality, highly detailed clay animation content with excellent consistency.

I really hope you like the model and use it!

Trigger words

You should use `Clay animation. Clay.` to trigger the image generation.

Download model

Weights for this model are available in Safetensors format in the Files & versions tab.

How to use

I recommend using this LoRA with ComfyUI for the best results.
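For diffusers users, loading the LoRA would look roughly like the sketch below, assuming Z-Image-Turbo exposes a standard diffusers pipeline with LoRA support (unverified; the author's recommendation is ComfyUI). The helper names are hypothetical:

```python
TRIGGER = "Clay animation, Clay."

def add_trigger(prompt: str) -> str:
    """Prepend the trigger words the card says activate the style."""
    return f"{TRIGGER} {prompt}"

def load_pipeline(lora_path: str):
    # Imported lazily; assumes a standard diffusers pipeline exists
    # for the base model and that it supports load_lora_weights.
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained("Tongyi-MAI/Z-Image-Turbo")
    pipe.load_lora_weights(lora_path)
    return pipe

print(add_trigger("a fox in a forest"))
```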

License

This model is licensed under the Apache License 2.0


Support My Work

If you like the model and think it's worth it, you can make a donation to support my work.

Follow me on Twitter to get early access to all my new models: @artificialguybr

Visit my website: artificialguy.com


Author: artificialguybr

Likes: 1

Downloads: 0

Tags: diffusers, text-to-image, lora, template:diffusion-lora, base_model:Tongyi-MAI/Z-Image-Turbo, base_model:adapter:Tongyi-MAI/Z-Image-Turbo, license:apache-2.0, region:us