Today's AI Summary

AI Developments: Physics Foundation Models, Egocentric Manipulation, and More

Here's a rundown of the latest AI models and research papers:

Research Highlights

  • Walrus: A Cross-Domain Foundation Model for Continuum Dynamics (arXiv:2511.15684): This paper introduces Walrus, a large-scale physics foundation model designed for modeling continuum dynamical systems. Trained on 19 diverse physical domains, including astrophysics, geoscience, and fluid dynamics, Walrus aims to serve as a general-purpose surrogate for physical simulation and a strong initialization point for fine-tuning on new PDE systems. The model incorporates techniques like harmonic-analysis-based stabilization and compute-adaptive tokenization to improve training efficiency and forecast stability.
  • In-N-On: Scaling Egocentric Manipulation with in-the-wild and on-task Data (arXiv:2511.15704): This research explores the use of egocentric videos for learning manipulation policies. The authors divide human data into "in-the-wild" and "on-task" categories, providing a scalable recipe for data collection and utilization. They introduce the PHSD dataset, containing over 1,000 hours of in-the-wild data and 20 hours of on-task data, and demonstrate that their Human0 policy achieves language following, few-shot learning, and improved robustness.
  • Think Visually, Reason Textually: Vision-Language Synergy in ARC (arXiv:2511.15703): This paper addresses abstract reasoning in foundation models using the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI). The authors propose Vision-Language Synergy Reasoning (VLSR) and Modality-Switch Self-Correction (MSSC) strategies to leverage the complementary strengths of vision and language. Experiments show a performance improvement over text-only baselines.

Model Releases

  • karpathy/nanochat-d34: This model, created by Karpathy, is a language model with 2.2 billion parameters. It was trained for approximately 100 hours on an 8×H100 node, with a focus on achieving a higher parameter-to-token ratio than traditional approaches such as Chinchilla scaling. The model achieved a CORE score of 0.3382.
  • polymathic-ai/walrus: This model is the implementation of the Walrus paper mentioned above. It is a 1.3B-parameter space-time Transformer trained to predict the temporal evolution of physical fields across 19 diverse physical domains.
  • Intel/deepmath-v1: This is a 4B parameter mathematical reasoning model that combines a fine-tuned LLM with a sandboxed Python executor. It is built on Qwen3-4B Thinking and trained with GRPO (Group Relative Policy Optimization).

Key Takeaways

  • Physics Foundation Models Emerge: Walrus demonstrates the potential of foundation models in the realm of physical simulation, offering a general-purpose surrogate for diverse continuum dynamics scenarios.
  • Scaling Egocentric Data: The In-N-On research provides a scalable approach to leveraging egocentric videos for learning manipulation policies, highlighting the importance of data categorization and domain adaptation techniques.
  • Synergistic Vision-Language Reasoning: The VLSR and MSSC strategies in the "Think Visually, Reason Textually" paper showcase the benefits of combining visual abstraction with linguistic reasoning for abstract reasoning tasks.
  • Specialized Fine-tuning: The release of Clemylia/Charlotte-AMITY highlights the potential of fine-tuning small language models for specific roles, such as ethical support and friendship.
  • Code-Driven Reasoning: Intel's DeepMath model demonstrates the effectiveness of combining LLMs with code execution for mathematical reasoning, reducing errors and output length.

AI Papers for 2026-02-20

Policy Compiler for Secure Agentic Systems

LLM-based agents are increasingly being deployed in contexts requiring complex authorization policies: customer service protocols, approval workflows, data access restrictions, and regulatory compliance. Embedding these policies in prompts provides no enforcement guarantees. We present PCAS, a Policy Compiler for Agentic Systems that provides deterministic policy enforcement. Enforcing such policies requires tracking information flow across agents, which linear message histories cannot capture. Instead, PCAS models the agentic system state as a dependency graph capturing causal relationships among events such as tool calls, tool results, and messages. Policies are expressed in a Datalog-derived language, as declarative rules that account for transitive information flow and cross-agent provenance. A reference monitor intercepts all actions and blocks violations before execution, providing deterministic enforcement independent of model reasoning. PCAS takes an existing agent implementation and a policy specification, and compiles them into an instrumented system that is policy-compliant by construction, with no security-specific restructuring required. We evaluate PCAS on three case studies: information flow policies for prompt injection defense, approval workflows in a multi-agent pharmacovigilance system, and organizational policies for customer service. On customer service tasks, PCAS improves policy compliance from 48% to 93% across frontier models, with zero policy violations in instrumented runs.
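To make the dependency-graph idea concrete, here is a toy sketch of a reference monitor that blocks a sensitive tool call whose inputs transitively depend on an untrusted source. The class and function names are hypothetical illustrations, not PCAS's API, and PCAS itself expresses such rules in a Datalog-derived language rather than Python.

# Illustrative sketch only: a toy reference monitor over an event dependency
# graph, in the spirit of PCAS. Names here are hypothetical, not the paper's.
from dataclasses import dataclass, field

@dataclass
class Event:
    kind: str                                     # "tool_call", "tool_result", "message"
    name: str                                     # tool or agent that produced the event
    parents: list = field(default_factory=list)   # causal dependencies (other Events)

def tainted(event: Event, untrusted_sources: set) -> bool:
    """True if the event transitively depends on an untrusted source."""
    stack = [event]
    while stack:
        e = stack.pop()
        if e.name in untrusted_sources:
            return True
        stack.extend(e.parents)
    return False

def monitor(action: Event, untrusted_sources: set) -> bool:
    """Block sensitive tool calls whose inputs flow from untrusted events."""
    if action.kind == "tool_call" and action.name == "send_email":
        return not tainted(action, untrusted_sources)
    return True

web_result = Event("tool_result", "web_browser")
email = Event("tool_call", "send_email", parents=[web_result])
assert monitor(email, untrusted_sources={"web_browser"}) is False  # blocked before execution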

Measuring Mid-2025 LLM-Assistance on Novice Performance in Biology

Large language models (LLMs) perform strongly on biological benchmarks, raising concerns that they may help novice actors acquire dual-use laboratory skills. Yet, whether this translates to improved human performance in the physical laboratory remains unclear. To address this, we conducted a pre-registered, investigator-blinded, randomized controlled trial (June-August 2025; n = 153) evaluating whether LLMs improve novice performance in tasks that collectively model a viral reverse genetics workflow. We observed no significant difference in the primary endpoint of workflow completion (5.2% LLM vs. 6.6% Internet; P = 0.759), nor in the success rate of individual tasks. However, the LLM arm had numerically higher success rates in four of the five tasks, most notably for the cell culture task (68.8% LLM vs. 55.3% Internet; P = 0.059). Post-hoc Bayesian modeling of pooled data estimates an approximate 1.4-fold increase (95% CrI 0.74-2.62) in success for a "typical" reverse genetics task under LLM assistance. Ordinal regression modelling suggests that participants in the LLM arm were more likely to progress through intermediate steps across all tasks (posterior probability of a positive effect: 81%-96%). Overall, mid-2025 LLMs did not substantially increase novice completion of complex laboratory procedures but were associated with a modest performance benefit. These results reveal a gap between in silico benchmarks and real-world utility, underscoring the need for physical-world validation of AI biosecurity assessments as model capabilities and user proficiency evolve.

Calibrate-Then-Act: Cost-Aware Exploration in LLM Agents

LLMs are increasingly being used for complex problems which are not necessarily resolved in a single response, but require interacting with an environment to acquire information. In these scenarios, LLMs must reason about inherent cost-uncertainty tradeoffs in when to stop exploring and commit to an answer. For instance, on a programming task, an LLM should test a generated code snippet if it is uncertain about the correctness of that code; the cost of writing a test is nonzero, but typically lower than the cost of making a mistake. In this work, we show that we can induce LLMs to explicitly reason about balancing these cost-uncertainty tradeoffs, then perform more optimal environment exploration. We formalize multiple tasks, including information retrieval and coding, as sequential decision-making problems under uncertainty. Each problem has latent environment state that can be reasoned about via a prior which is passed to the LLM agent. We introduce a framework called Calibrate-Then-Act (CTA), where we feed the LLM this additional context to enable it to act more optimally. This improvement is preserved even under RL training of both the baseline and CTA. Our results on information-seeking QA and on a simplified coding task show that making cost-benefit tradeoffs explicit with CTA can help agents discover more optimal decision-making strategies.
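The cost-uncertainty tradeoff the paper describes can be illustrated with a toy expected-cost calculation (this is not the CTA framework itself; the numbers and the test-recall assumption below are purely illustrative):

# Toy illustration: decide whether to test or commit by comparing expected
# costs under a prior belief that the current code is correct.
def expected_cost_commit(p_correct: float, cost_mistake: float) -> float:
    return (1.0 - p_correct) * cost_mistake

def expected_cost_test(cost_test: float, cost_mistake: float,
                       p_correct: float, test_recall: float) -> float:
    # Pay for the test; a mistake slips through only if the test misses it.
    p_undetected_bug = (1.0 - p_correct) * (1.0 - test_recall)
    return cost_test + p_undetected_bug * cost_mistake

p, c_test, c_mistake, recall = 0.7, 1.0, 20.0, 0.9
print(expected_cost_commit(p, c_mistake))                 # 6.0
print(expected_cost_test(c_test, c_mistake, p, recall))   # 1.0 + 0.3 * 0.1 * 20 = 1.6

With these illustrative numbers, testing is clearly worthwhile; as the prior probability of correctness rises toward 1, committing directly becomes the cheaper action.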

SPARC: Scenario Planning and Reasoning for Automated C Unit Test Generation

Automated unit test generation for C remains a formidable challenge due to the semantic gap between high-level program intent and the rigid syntactic constraints of pointer arithmetic and manual memory management. While Large Language Models (LLMs) exhibit strong generative capabilities, direct intent-to-code synthesis frequently suffers from the leap-to-code failure mode, where models prematurely emit code without grounding in program structure, constraints, and semantics. This will result in non-compilable tests, hallucinated function signatures, low branch coverage, and semantically irrelevant assertions that cannot properly capture bugs. We introduce SPARC, a neuro-symbolic, scenario-based framework that bridges this gap through four stages: (1) Control Flow Graph (CFG) analysis, (2) an Operation Map that grounds LLM reasoning in validated utility helpers, (3) Path-targeted test synthesis, and (4) an iterative, self-correction validation loop using compiler and runtime feedback. We evaluate SPARC on 59 real-world and algorithmic subjects, where it outperforms the vanilla prompt generation baseline by 31.36% in line coverage, 26.01% in branch coverage, and 20.78% in mutation score, matching or exceeding the symbolic execution tool KLEE on complex subjects. SPARC retains 94.3% of tests through iterative repair and produces code with significantly higher developer-rated readability and maintainability. By aligning LLM reasoning with program structure, SPARC provides a scalable path for industrial-grade testing of legacy C codebases.
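The iterative self-correction stage (4) amounts to a compile-run-repair loop. A hypothetical sketch is below; `compile_and_run` and `ask_llm_to_fix` are placeholder stubs standing in for the real compiler invocation and LLM repair call, not SPARC's actual interfaces.

# Hypothetical sketch of an iterative self-correction loop driven by
# compiler and runtime feedback; the two helpers are placeholder stubs.
def compile_and_run(test_source: str):
    # Stand-in for invoking the C compiler and executing the test binary.
    return ("/* repaired */" in test_source, "error: expected ';' before '}' token")

def ask_llm_to_fix(test_source: str, feedback: str) -> str:
    # Stand-in for an LLM repair call conditioned on the concrete feedback.
    return test_source + "\n/* repaired */"

def repair_loop(test_source: str, max_rounds: int = 3):
    for _ in range(max_rounds):
        ok, feedback = compile_and_run(test_source)
        if ok:
            return test_source          # test compiles and runs: retain it
        test_source = ask_llm_to_fix(test_source, feedback)
    return None                         # discard tests that cannot be repaired

print(repair_loop("void test_push(void) { assert(1) }"))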

Towards a Science of AI Agent Reliability

AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many agents still continue to fail in practice. This discrepancy highlights a fundamental limitation of current evaluations: compressing agent behavior into a single success metric obscures critical operational flaws. Notably, it ignores whether agents behave consistently across runs, withstand perturbations, fail predictably, or have bounded error severity. Grounded in safety-critical engineering, we provide a holistic performance profile by proposing twelve concrete metrics that decompose agent reliability along four key dimensions: consistency, robustness, predictability, and safety. Evaluating 14 agentic models across two complementary benchmarks, we find that recent capability gains have only yielded small improvements in reliability. By exposing these persistent limitations, our metrics complement traditional evaluations while offering tools for reasoning about how agents perform, degrade, and fail.
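To illustrate what a consistency-style reliability metric can look like (a generic sketch, not necessarily one of the paper's twelve metrics), one can score repeated runs per task:

# Generic illustration: per-task run-to-run consistency over k repeated runs,
# and the stricter "all k runs succeed" rate, which penalizes flaky agents.
def consistency(outcomes):
    """Mean fraction of runs agreeing with the majority outcome per task."""
    per_task = []
    for runs in outcomes:
        successes = sum(runs)
        per_task.append(max(successes, len(runs) - successes) / len(runs))
    return sum(per_task) / len(per_task)

def pass_all_k(outcomes):
    """Fraction of tasks solved in every one of the k runs."""
    return sum(all(runs) for runs in outcomes) / len(outcomes)

runs = [[True, True, True], [True, False, True], [False, False, False]]
print(consistency(runs))  # (1.0 + 2/3 + 1.0) / 3 ≈ 0.889
print(pass_all_k(runs))   # 1/3 ≈ 0.333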

Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment

The widespread deployment of large language models (LLMs) across linguistic communities necessitates reliable multilingual safety alignment. However, recent efforts to extend alignment to other languages often require substantial resources, either through large-scale, high-quality supervision in the target language or through pairwise alignment with high-resource languages, which limits scalability. In this work, we propose a resource-efficient method for improving multilingual safety alignment. We introduce a plug-and-play Multi-Lingual Consistency (MLC) loss that can be integrated into existing monolingual alignment pipelines. By improving collinearity between multilingual representation vectors, our method encourages directional consistency at the multilingual semantic level in a single update. This allows simultaneous alignment across multiple languages using only multilingual prompt variants without requiring additional response-level supervision in low-resource languages. We validate the proposed method across different model architectures and alignment paradigms, and demonstrate its effectiveness in enhancing multilingual safety with limited impact on general model utility. Further evaluation across languages and tasks indicates improved cross-lingual generalization, suggesting the proposed approach as a practical solution for multilingual consistency alignment under limited supervision.
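A minimal sketch of what a directional-consistency objective of this kind could look like is shown below. The exact form of the MLC loss is an assumption here; the sketch simply penalizes deviation from perfect collinearity (cosine similarity of 1) between representations of the same prompt in different languages, using plain NumPy.

# Sketch of a multilingual consistency objective in the spirit of the MLC
# loss: push representations of prompt variants in different languages toward
# the same direction. The exact loss form used in the paper is assumed here.
import numpy as np

def mlc_loss(reps: np.ndarray) -> float:
    """reps: (num_languages, hidden_dim) representations of the same prompt.
    Returns 1 - mean pairwise cosine similarity (0 when perfectly collinear)."""
    normed = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    sims = normed @ normed.T                      # pairwise cosine similarities
    n = len(reps)
    off_diag = (sims.sum() - n) / (n * (n - 1))   # mean over i != j
    return 1.0 - off_diag

reps = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(mlc_loss(reps))  # > 0: the third "language" points in a different direction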

Agent Skill Framework: Perspectives on the Potential of Small Language Models in Industrial Environments

The Agent Skill framework, now widely and officially supported by major players such as GitHub Copilot, LangChain, and OpenAI, performs especially well with proprietary models by improving context engineering, reducing hallucinations, and boosting task accuracy. Based on these observations, an investigation is conducted to determine whether the Agent Skill paradigm provides similar benefits to small language models (SLMs). This question matters in industrial scenarios where continuous reliance on public APIs is infeasible due to data-security and budget constraints, and where SLMs often show limited generalization in highly customized scenarios. This work introduces a formal mathematical definition of the Agent Skill process, followed by a systematic evaluation of language models of varying sizes across multiple use cases. The evaluation encompasses two open-source tasks and a real-world insurance claims data set. The results show that tiny models struggle with reliable skill selection, while moderately sized SLMs (approximately 12B–30B parameters) benefit substantially from the Agent Skill approach. Moreover, code-specialized variants at around 80B parameters achieve performance comparable to closed-source baselines while improving GPU efficiency. Collectively, these findings provide a comprehensive and nuanced characterization of the capabilities and constraints of the framework, while providing actionable insights for the effective deployment of Agent Skills in SLM-centered environments.

Retrieval Augmented Generation of Literature-derived Polymer Knowledge: The Example of a Biodegradable Polymer Expert System

Polymer literature contains a large and growing body of experimental knowledge, yet much of it is buried in unstructured text and inconsistent terminology, making systematic retrieval and reasoning difficult. Existing tools typically extract narrow, study-specific facts in isolation, failing to preserve the cross-study context required to answer broader scientific questions. Retrieval-augmented generation (RAG) offers a promising way to overcome this limitation by combining large language models (LLMs) with external retrieval, but its effectiveness depends strongly on how domain knowledge is represented. In this work, we develop two retrieval pipelines: a dense semantic vector-based approach (VectorRAG) and a graph-based approach (GraphRAG). Using over 1,000 polyhydroxyalkanoate (PHA) papers, we construct context-preserving paragraph embeddings and a canonicalized structured knowledge graph supporting entity disambiguation and multi-hop reasoning. We evaluate these pipelines through standard retrieval metrics, comparisons with general state-of-the-art systems such as GPT and Gemini, and qualitative validation by a domain chemist. The results show that GraphRAG achieves higher precision and interpretability, while VectorRAG provides broader recall, highlighting complementary trade-offs. Expert validation further confirms that the tailored pipelines, particularly GraphRAG, produce well-grounded, citation-reliable responses with strong domain relevance. By grounding every statement in evidence, these systems enable researchers to navigate the literature, compare findings across studies, and uncover patterns that are difficult to extract manually. More broadly, this work establishes a practical framework for building materials science assistants using curated corpora and retrieval design, reducing reliance on proprietary models while enabling trustworthy literature analysis at scale.
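As a rough illustration of the VectorRAG side (a sketch, not the paper's pipeline), the core loop is: embed context-preserving paragraphs, retrieve the nearest ones for a question, and assemble a grounded prompt. The `embed` function below is a toy stand-in for a real sentence encoder, and the paragraph strings are placeholders.

# Minimal, generic VectorRAG sketch: embed paragraphs, retrieve the nearest,
# and build a grounded prompt. `embed` is a toy stand-in for a real encoder.
import numpy as np

def embed(text: str) -> np.ndarray:
    vec = np.zeros(64)
    for i, ch in enumerate(text.lower()):
        vec[(ord(ch) + i) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

paragraphs = [
    "Placeholder paragraph about PHA production conditions.",
    "Placeholder paragraph about PHA biodegradation rates.",
]
index = np.stack([embed(p) for p in paragraphs])

question = "How quickly do PHAs biodegrade?"
scores = index @ embed(question)
top = [paragraphs[i] for i in np.argsort(-scores)[:1]]
prompt = "Answer using only the context below.\n" + "\n".join(top) + "\n\nQ: " + question
print(prompt)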

Enhanced Diffusion Sampling: Efficient Rare Event Sampling and Free Energy Calculation with Diffusion Models

The rare-event sampling problem has long been the central limiting factor in molecular dynamics (MD), especially in biomolecular simulation. Recently, diffusion models such as BioEmu have emerged as powerful equilibrium samplers that generate independent samples from complex molecular distributions, eliminating the cost of sampling rare transition events. However, a sampling problem remains when computing observables that rely on states which are rare in equilibrium, for example folding free energies. Here, we introduce enhanced diffusion sampling, enabling efficient exploration of rare-event regions while preserving unbiased thermodynamic estimators. The key idea is to perform quantitatively accurate steering protocols to generate biased ensembles and subsequently recover equilibrium statistics via exact reweighting. We instantiate our framework in three algorithms: UmbrellaDiff (umbrella sampling with diffusion models), $\Delta$G-Diff (free-energy differences via tilted ensembles), and MetaDiff (a batchwise analogue for metadynamics). Across toy systems, protein folding landscapes and folding free energies, our methods achieve fast, accurate, and scalable estimation of equilibrium properties within GPU-minutes to hours per system -- closing the rare-event sampling gap that remained after the advent of diffusion-model equilibrium samplers.
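The exact-reweighting step rests on the standard umbrella-sampling identity: samples drawn under an added bias potential are reweighted by exp(+beta * U_bias) to recover unbiased equilibrium averages. The sketch below shows only this generic identity, not the authors' implementation, and the sample data is synthetic.

# Standard reweighting identity (not the authors' code): for samples x_i drawn
# from p_eq(x) * exp(-beta * U_bias(x)), equilibrium averages are recovered as
# <A> = sum_i w_i A(x_i) / sum_i w_i with w_i = exp(+beta * U_bias(x_i)).
import numpy as np

def reweighted_average(observable: np.ndarray, u_bias: np.ndarray, beta: float) -> float:
    log_w = beta * u_bias
    log_w -= log_w.max()              # stabilize the exponentials
    w = np.exp(log_w)
    return float(np.sum(w * observable) / np.sum(w))

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)           # pretend these are biased samples
u_bias = 0.5 * x**2                   # bias energy evaluated on each sample
print(reweighted_average(x**2, u_bias, beta=1.0))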

Almost Sure Convergence of Differential Temporal Difference Learning for Average Reward Markov Decision Processes

The average reward is a fundamental performance metric in reinforcement learning (RL) focusing on the long-run performance of an agent. Differential temporal difference (TD) learning algorithms are a major advance for average reward RL as they provide an efficient online method to learn the value functions associated with the average reward in both on-policy and off-policy settings. However, existing convergence guarantees require a local clock in learning rates tied to state visit counts, which practitioners do not use and does not extend beyond tabular settings. We address this limitation by proving the almost sure convergence of on-policy $n$-step differential TD for any $n$ using standard diminishing learning rates without a local clock. We then derive three sufficient conditions under which off-policy $n$-step differential TD also converges without a local clock. These results strengthen the theoretical foundations of differential TD and bring its convergence analysis closer to practical implementations.
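For intuition, the one-step tabular form of differential TD from the average-reward RL literature is sketched below; the paper's analysis covers the general n-step case. The update uses a single global step size rather than a per-state "local clock", matching the setting the paper studies, and the specific numbers are illustrative.

# One-step tabular differential TD(0), sketched for intuition.
def differential_td_step(V, r_bar, s, r, s_next, alpha, eta):
    delta = r - r_bar + V[s_next] - V[s]   # differential TD error
    V[s] += alpha * delta                  # differential value update
    r_bar += eta * alpha * delta           # average-reward estimate update
    return V, r_bar

V = {0: 0.0, 1: 0.0}
r_bar = 0.0
V, r_bar = differential_td_step(V, r_bar, s=0, r=1.0, s_next=1, alpha=0.1, eta=0.5)
print(V, r_bar)  # {0: 0.1, 1: 0.0} 0.05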

AI Models

KittenML/kitten-tts-mini-0.8


license: apache-2.0

Kitten TTS Mini 0.8 😻

Kitten TTS is an open-source, realistic text-to-speech model with 80 million parameters and a file size of around 79 MB.

🚀 Quick Start

Installation

pip install https://github.com/KittenML/KittenTTS/releases/download/0.8/kittentts-0.8.0-py3-none-any.whl

Basic Usage

from kittentts import KittenTTS
m = KittenTTS("KittenML/kitten-tts-mini-0.8")
audio = m.generate("This high quality TTS model works without a GPU", voice='Jasper' )
# available_voices : ['Bella', 'Jasper', 'Luna', 'Bruno', 'Rosie', 'Hugo', 'Kiki', 'Leo']
# Save the audio
import soundfile as sf
sf.write('output.wav', audio, 24000)

Acknowledgements

StyleTTS 2 architecture

Author: KittenML

Likes: 28

Downloads: 0

Tags: onnx, license:apache-2.0, region:us

KittenML/kitten-tts-nano-0.8-fp32


license: apache-2.0

Kitten TTS Nano 0.8 😻

Kitten TTS is an open-source, realistic text-to-speech model with 15 million parameters and a file size of around 50 MB.

🚀 Quick Start

Installation

pip install https://github.com/KittenML/KittenTTS/releases/download/0.8/kittentts-0.8.0-py3-none-any.whl

Basic Usage

from kittentts import KittenTTS
m = KittenTTS("KittenML/kitten-tts-nano-0.8-fp32")
audio = m.generate("This high quality TTS model works without a GPU", voice='Jasper' )
# available_voices : ['Bella', 'Jasper', 'Luna', 'Bruno', 'Rosie', 'Hugo', 'Kiki', 'Leo']
# Save the audio
import soundfile as sf
sf.write('output.wav', audio, 24000)

Acknowledgements

StyleTTS 2 architecture

Author: KittenML

Likes: 13

Downloads: 0

Tags: onnx, license:apache-2.0, region:us

Trendyol/Trendyol-LLM-Asure-12B


library_name: transformers
tags:
  • e-commerce
  • multimodal
  • vision
license: gemma
language:
  • tr
  • en
base_model:
  • google/gemma-3-12b
pipeline_tag: image-text-to-text

Trendyol-LLM-Asure-12B

<img src="trendyol_llm_asure.png" width="300" alt="Trendyol-LLM-Asure-Logo" />

Trendyol-LLM-Asure-12B is a 12-billion-parameter multimodal instruct model built on top of Gemma 3-12B. It is optimized for structured instruction following over both text and image-text inputs, with a primary focus on operational task performance in Turkish and English.

The model's general encyclopedic world knowledge is intentionally limited. Instead, it is heavily tuned for e-commerce business tasks such as summarization, question-answering, structured extraction, and controlled generation. Compared to its base model, it is optimized for lower token consumption and more efficient inference in production environments.


🔑 Highlights

  • Multimodal (Vision + Text) – Native support for image-text conversations using Gemma 3 multimodal capabilities.
  • Instruct-Optimized – Trained exclusively in instruct format for high prompt adherence and system-message compliance.
  • Efficient Inference – Reduced token verbosity compared to base Gemma 3-12B.
  • Task-Oriented Design
    • Summarisation & paraphrasing
    • Textual and visual Question-Answering (QA)
    • Structured extraction
    • Controlled generation tasks
    • Text classification
    • E-commerce tasks such as relevancy
  • Bilingual – Strong Turkish and English performance.

Basic Usage with Transformers

from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

model_id = "Trendyol/Trendyol-LLM-Asure-12B"

model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {"role": "user", "content": [
        {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
        {"type": "text", "text": "Bu gรถrselde ne gรถrรผyorsun? Kฤฑsa ve net ลŸekilde aรงฤฑklayabilir misin?"}
    ]}
]

inputs = processor.apply_chat_template(messages, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(output[0], skip_special_tokens=True))

Serve the Model with VLLM

Below is a minimal production-style setup for serving Trendyol-LLM-Asure-12B with vLLM.

vllm serve Trendyol/Trendyol-LLM-Asure-12B \
  --served-model-name asure-12b \
  --dtype bfloat16
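Once the server is up, it can be queried through vLLM's OpenAI-compatible endpoint. The snippet below is a sketch assuming the default address http://localhost:8000/v1 and the `asure-12b` name set via `--served-model-name`; it uses the standard OpenAI Python client.

# Query the vLLM server above through its OpenAI-compatible API (assumes the
# default port 8000 and the served model name "asure-12b").
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="asure-12b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"}},
            {"type": "text", "text": "Describe this image briefly."},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)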

Limitations, Risks, Bias, and Ethical Considerations

Limitations and Known Biases

  • Primary Function and Application: Trendyol LLM, an autoregressive language model, is primarily designed to predict the next token in a text string. Outputs should be considered as suggestions rather than definitive answers.
  • Language Comprehension and Generation: The model is primarily trained in standard English and Turkish. Its performance in understanding and generating slang, informal language, or other languages may be limited, leading to potential errors or misinterpretations.
  • Generation of False Information: Users should be aware that Trendyol LLM may produce inaccurate or misleading information. Its world knowledge is intentionally limited, as it is built for business use cases.

Risks and Ethical Considerations

  • Potential for Harmful Use: There is a risk that Trendyol LLM could be used to generate offensive or harmful language. We strongly discourage its use for any such purposes and emphasize the need for application-specific safety and fairness evaluations before deployment.
  • Unintended Content and Bias: The model was trained on a large corpus of text data, which was not explicitly checked for offensive content or existing biases. Consequently, it may inadvertently produce content that reflects these biases or inaccuracies.
  • Toxicity: Despite efforts to select appropriate training data, the model is capable of generating harmful content, especially when prompted explicitly. We encourage the open-source community to engage in developing strategies to minimize such risks.

Recommendations for Safe and Ethical Usage

  • Human Oversight: We recommend incorporating a human curation layer or using filters to manage and improve the quality of outputs, especially in public-facing applications. This approach can help mitigate the risk of generating objectionable content unexpectedly.
  • Application-Specific Testing: Developers intending to use Trendyol LLM should conduct thorough safety testing and optimization tailored to their specific applications. This is crucial, as the model's responses can be unpredictable and may occasionally be biased, inaccurate, or offensive.
  • Responsible Development and Deployment: It is the responsibility of developers and users of Trendyol LLM to ensure its ethical and safe application. We urge users to be mindful of the model's limitations and to employ appropriate safeguards to prevent misuse or harmful consequences.

Author: Trendyol

Likes: 9

Downloads: 0

Tags: transformers, safetensors, gemma3, image-text-to-text, e-commerce, multimodal, vision, conversational, tr, en, license:gemma, text-generation-inference, endpoints_compatible, region:us

lightonai/ColBERT-Zero


tags:

  • ColBERT
  • PyLate
  • sentence-transformers
  • sentence-similarity
  • feature-extraction
  • generated_from_trainer
  • dataset_size:640000
  • loss:Distillation
pipeline_tag: sentence-similarity
library_name: PyLate
license: apache-2.0
language:
  • en
metrics:
  • MaxSim_accuracy@1
  • MaxSim_accuracy@3
  • MaxSim_accuracy@5
  • MaxSim_accuracy@10
  • MaxSim_precision@1
  • MaxSim_precision@3
  • MaxSim_precision@5
  • MaxSim_precision@10
  • MaxSim_recall@1
  • MaxSim_recall@3
  • MaxSim_recall@5
  • MaxSim_recall@10
  • MaxSim_ndcg@10
  • MaxSim_mrr@10
  • MaxSim_map@100
model-index:
  • name: PyLate
    results:
    • task: type: py-late-information-retrieval name: Py Late Information Retrieval dataset: name: NanoClimateFEVER type: NanoClimateFEVER metrics:
      • type: MaxSim_accuracy@1 value: 0.36 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 0.68 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 0.76 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 0.88 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.36 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.2866666666666666 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.21999999999999997 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.148 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.18 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.35999999999999993 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.429 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.5536666666666666 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.4511316943880545 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.5352619047619046 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.35707500469760434 name: Maxsim Map@100
    • task: type: py-late-information-retrieval name: Py Late Information Retrieval dataset: name: NanoDBPedia type: NanoDBPedia metrics:
      • type: MaxSim_accuracy@1 value: 0.86 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 0.94 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 0.94 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 0.98 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.86 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.7333333333333333 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.66 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.5840000000000001 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.10798996781634018 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.21610834839667603 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.29328648273572205 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.4273378391765384 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.7325830538365519 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.8995238095238095 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.5805986129726132 name: Maxsim Map@100
    • task: type: py-late-information-retrieval name: Py Late Information Retrieval dataset: name: NanoFEVER type: NanoFEVER metrics:
      • type: MaxSim_accuracy@1 value: 0.96 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 1.0 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 1.0 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 1.0 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.96 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.3533333333333333 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.21199999999999997 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.10999999999999999 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.8966666666666667 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.9633333333333333 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.9633333333333333 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.98 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.9624259972128165 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.9766666666666666 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.9478155706727135 name: Maxsim Map@100
    • task: type: py-late-information-retrieval name: Py Late Information Retrieval dataset: name: NanoFiQA2018 type: NanoFiQA2018 metrics:
      • type: MaxSim_accuracy@1 value: 0.58 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 0.66 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 0.72 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 0.82 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.58 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.32666666666666666 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.24799999999999997 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.14799999999999996 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.35257936507936505 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.47423809523809524 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.5460079365079364 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.6425317460317461 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.5786162417612232 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.643436507936508 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.5234035855771078 name: Maxsim Map@100
    • task: type: py-late-information-retrieval name: Py Late Information Retrieval dataset: name: NanoHotpotQA type: NanoHotpotQA metrics:
      • type: MaxSim_accuracy@1 value: 0.98 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 1.0 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 1.0 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 1.0 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.98 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.6 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.3679999999999999 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.18599999999999994 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.49 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.9 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.92 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.93 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.924329868595787 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.99 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.8944956212370004 name: Maxsim Map@100
    • task: type: py-late-information-retrieval name: Py Late Information Retrieval dataset: name: NanoMSMARCO type: NanoMSMARCO metrics:
      • type: MaxSim_accuracy@1 value: 0.6 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 0.68 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 0.78 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 0.9 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.6 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.22666666666666668 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.15600000000000003 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.09 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.6 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.68 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.78 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.9 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.7242459443760582 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.671047619047619 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.6766320575975747 name: Maxsim Map@100
    • task: type: py-late-information-retrieval name: Py Late Information Retrieval dataset: name: NanoNFCorpus type: NanoNFCorpus metrics:
      • type: MaxSim_accuracy@1 value: 0.58 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 0.68 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 0.72 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 0.76 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.58 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.42666666666666664 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.396 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.316 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.06598420757312619 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.10355307905498773 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.1296680186177352 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.1635498250401139 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.4054849783640007 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.6303888888888889 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.195854964801369 name: Maxsim Map@100
    • task: type: py-late-information-retrieval name: Py Late Information Retrieval dataset: name: NanoNQ type: NanoNQ metrics:
      • type: MaxSim_accuracy@1 value: 0.62 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 0.84 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 0.88 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 0.9 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.62 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.28 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.176 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.09599999999999997 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.59 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.78 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.81 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.86 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.7474767067573468 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.7341904761904762 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.7035987374595623 name: Maxsim Map@100
    • task: type: py-late-information-retrieval name: Py Late Information Retrieval dataset: name: NanoQuoraRetrieval type: NanoQuoraRetrieval metrics:
      • type: MaxSim_accuracy@1 value: 0.92 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 0.98 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 1.0 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 1.0 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.92 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.3933333333333333 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.24799999999999997 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.128 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.7973333333333332 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.932 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.9626666666666668 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.9726666666666667 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.9376063901029283 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.9540000000000001 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.9156057922958499 name: Maxsim Map@100
    • task: type: py-late-information-retrieval name: Py Late Information Retrieval dataset: name: NanoSCIDOCS type: NanoSCIDOCS metrics:
      • type: MaxSim_accuracy@1 value: 0.48 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 0.74 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 0.76 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 0.9 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.48 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.4066666666666666 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.30400000000000005 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.204 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.10266666666666666 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.25066666666666665 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.3106666666666667 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.41666666666666663 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.41240108229211636 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.6183888888888889 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.3293535579753635 name: Maxsim Map@100
    • task: type: py-late-information-retrieval name: Py Late Information Retrieval dataset: name: NanoArguAna type: NanoArguAna metrics:
      • type: MaxSim_accuracy@1 value: 0.24 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 0.64 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 0.7 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 0.9 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.24 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.21333333333333335 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.14 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.08999999999999998 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.24 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.64 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.7 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.9 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.5619950169581177 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.4556587301587301 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.4583679653679654 name: Maxsim Map@100
    • task: type: py-late-information-retrieval name: Py Late Information Retrieval dataset: name: NanoSciFact type: NanoSciFact metrics:
      • type: MaxSim_accuracy@1 value: 0.7 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 0.82 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 0.88 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 0.92 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.7 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.2866666666666667 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.19599999999999998 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.10199999999999998 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.675 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.79 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.87 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.91 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.8019869692829787 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.7716666666666667 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.7651960954534442 name: Maxsim Map@100
    • task: type: py-late-information-retrieval name: Py Late Information Retrieval dataset: name: NanoTouche2020 type: NanoTouche2020 metrics:
      • type: MaxSim_accuracy@1 value: 0.8163265306122449 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 0.9795918367346939 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 0.9795918367346939 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 0.9795918367346939 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.8163265306122449 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.727891156462585 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.6653061224489795 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.5387755102040817 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.05638641704555484 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.1492928448908377 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.2240629902771357 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.3474561127492143 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.6176094809857532 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.8775510204081632 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.4570510040327342 name: Maxsim Map@100
    • task: type: nano-beir name: Nano BEIR dataset: name: NanoBEIR mean type: NanoBEIR_mean metrics:
      • type: MaxSim_accuracy@1 value: 0.6689481946624803 name: Maxsim Accuracy@1
      • type: MaxSim_accuracy@3 value: 0.8184301412872842 name: Maxsim Accuracy@3
      • type: MaxSim_accuracy@5 value: 0.855353218210361 name: Maxsim Accuracy@5
      • type: MaxSim_accuracy@10 value: 0.9184301412872842 name: Maxsim Accuracy@10
      • type: MaxSim_precision@1 value: 0.6689481946624803 name: Maxsim Precision@1
      • type: MaxSim_precision@3 value: 0.4047095761381475 name: Maxsim Precision@3
      • type: MaxSim_precision@5 value: 0.3068697017268446 name: Maxsim Precision@5
      • type: MaxSim_precision@10 value: 0.210828885400314 name: Maxsim Precision@10
      • type: MaxSim_recall@1 value: 0.39650820186008107 name: Maxsim Recall@1
      • type: MaxSim_recall@3 value: 0.5568609513523536 name: Maxsim Recall@3
      • type: MaxSim_recall@5 value: 0.6106686226773229 name: Maxsim Recall@5
      • type: MaxSim_recall@10 value: 0.6926058094613547 name: Maxsim Recall@10
      • type: MaxSim_ndcg@10 value: 0.6813764173010564 name: Maxsim Ndcg@10
      • type: MaxSim_mrr@10 value: 0.7505985522414094 name: Maxsim Mrr@10
      • type: MaxSim_map@100 value: 0.6003883515493001 name: Maxsim Map@100

<div align="center"> <img src="https://cdn-uploads.huggingface.co/production/uploads/609bbe2f4932693ca2009d6a/xn21ll7YRj0ZftBli3-T5.jpeg" width="600" height="auto">

Website LinkedIn X

📄 Paper | 📝 Blog | 📚 Collection

</div>

ColBERT-Zero

🎯 TL;DR: First large-scale fully pre-trained ColBERT model using only public data. Achieves 55.43 nDCG@10 on the BEIR benchmark, outperforming GTE-ModernColBERT and GTE-ModernBERT trained on closed and stronger data. New SOTA on BEIR for models <150M parameters.

Why ColBERT-Zero?

Late interaction (ColBERT / multi-vector) models have clear advantages in out-of-domain generalization, long-context handling, and reasoning-intensive retrieval. Yet they remain undertrained: current state-of-the-art ColBERT models (e.g., GTE-ModernColBERT and ColBERT-small) are simply built by bolting a small knowledge distillation step onto a strong dense (single-vector) model. Even recent efforts like mxbai-edge-colbert-v0 perform all early training stages in a single-vector setting, only switching to the multi-vector objective at the very end.

This leaves a lot of performance on the table. ColBERT-Zero demonstrates that performing contrastive pre-training directly in the multi-vector setting, rather than treating it as an afterthought, unlocks a significantly higher performance ceiling. Trained exclusively on public data (Nomic-embed dataset mixture), ColBERT-Zero overcomes a 2.4-point data quality disadvantage to outperform models trained on proprietary, closed-source data. For detailed results, please have a look at our blogpost and the paper. All the models (including intermediate checkpoints) as well as the training code are released under an Apache 2.0 license.

Controlled Comparison Design

We deliberately trained on the public Nomic-embed data mixture for a strategic reason: Nomic has already trained a dense ModernBERT model (ModernBERT-embed) on this exact data. This lets us compare dense vs. multi-vector training with the same data, same base model (ModernBERT), and same pipeline. The only variable is whether the contrastive phases are performed in the dense or multi-vector setting.

This design reveals a striking result: the dense baseline trained on Nomic data scores 52.89, while the one trained on GTE's proprietary data scores 55.33: a 2.4-point data quality gap. Despite this disadvantage, ColBERT-Zero's full multi-vector pre-training pipeline closes and surpasses this gap, reaching 55.43 nDCG@10.

The Three-Phase Training Pipeline

The development followed a three-phase pipeline, each providing a different type of learning signal:

Phase 1 - Unsupervised Contrastive Pre-training

We began with the nomic-embed-unsupervised-data dataset. Using PyLate's GradCache implementation to scale per-GPU batch size without VRAM constraints, combined with cross-GPU gathering of representations, we reached effective batch sizes of ~16k, required for unsupervised training to produce plausible in-batch hard negatives. Unlike dense training, the multi-vector objective allows the encoder to learn fine-grained token importance from the very first phase.

Phase 2 - Supervised Contrastive Fine-tuning

We refined the model using the nomic-embed-supervised-data. This stage introduced mined hard negatives: documents that are superficially similar to the query but not actually relevant. This allows teaching the model to handle nuance by prioritizing specific keywords and contextual tokens most indicative of a true match.

Phase 3 - Knowledge Distillation (KD)

The final stage used the ms-marco-en-bge dataset. We leveraged a powerful Gemma-based model as a teacher, allowing our student models to learn to replicate complex reasoning scores via the efficient MaxSim operator.
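The MaxSim operator mentioned above is the standard ColBERT late-interaction score: for each query token embedding, take the maximum similarity over all document token embeddings, then sum over query tokens. A minimal NumPy sketch is below; production scoring is handled by PyLate and the PLAID engines rather than code like this.

# Minimal sketch of the MaxSim late-interaction score used by ColBERT-style
# models: best-matching document token per query token, summed over the query.
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """query_tokens: (num_q, dim), doc_tokens: (num_d, dim), both L2-normalized."""
    sims = query_tokens @ doc_tokens.T      # (num_q, num_d) token-level similarities
    return float(sims.max(axis=1).sum())    # max over doc tokens, sum over query tokens

q = np.random.randn(8, 128); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = np.random.randn(300, 128); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim(q, d))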

Key Findings

1. The Standard Recipe Leaves Performance on the Table

The KD-only approach (the current industry standard) scores 54.09, lagging behind full pre-training by 1.3 points. A simple distillation step is insufficient for optimal multi-vector performance.

2. Supervised + KD Is the Efficiency Sweet Spot

By running a supervised contrastive step in the multi-vector setting before distillation, we reach 55.12 nDCG@10, closing most of the gap with the fully pre-trained model (55.43). This costs ~40 GH200-hours instead of ~408: roughly 10× cheaper for 99.4% of the performance.

<div align="center"> <img src="https://cdn-uploads.huggingface.co/production/uploads/609bbe2f4932693ca2009d6a/V1_hTZ0VnJHldfd3Ip-Jm.png" width="600" height="auto"> </div>

3. Prompt Alignment Is Non-Negotiable

Nomic's base models are pre-trained with asymmetric prompts (search_query: and search_document:). While ColBERT has its own asymmetric mechanism via [Q] and [D] markers, we found:

  • Stripping pre-training prompts during fine-tuning causes significant performance degradation.
  • Adding prompts to a model not pre-trained with them also hurts performance.
  • Even with perfect alignment, prompts provide an intrinsic benefit: full ColBERT pre-training with prompts (55.43) vs. without prompts (54.61), no mismatch in either case, shows a meaningful 0.82-point gap.
<div align="center"> <img src="https://cdn-uploads.huggingface.co/production/uploads/609bbe2f4932693ca2009d6a/uZoRA7SwisR-svi4lPDTi.png" width="600" height="auto"> </div>

Why do prompts help? Our leading hypothesis is that prompt tokens act as implicit query expansion: extra slots that don't carry specific meaning but let the model store global information about the sequence. The original ColBERT used [PAD] tokens for this purpose, but modern Flash Attention implementations broke this trick (masked tokens no longer produce usable embeddings). Explicit prompt tokens may be quietly re-enabling it.

Practical takeaway: Always align your prompts with the base model's pre-training setup. Misalignment is one of the easiest ways to silently lose performance. Note that this sensitivity decreases with stronger downstream fine-tuning: with enough training, the model can adapt to an initial mismatch.

Model Lineup

The Main Models (ColBERT-Zero)

ColBERT-Zero utilizes the full 3-phase pipeline with strict prompt alignment, achieving 55.43 nDCG@10 on BEIR, setting a new SOTA for models <150M parameters. We also provide ColBERT-Zero-noprompts, the same pipeline without asymmetric prompts, to study the impact of query expansion on multi-vector performance.

The cheap-to-train ones (ModernColBERT-embed-base)

These models represent the practical sweet spot. By skipping the expensive unsupervised phase, ModernColBERT-embed-base (Supervised + KD) achieves ~97% of the flagship's performance at only ~10% of the compute cost. For reference, ModernColBERT-embed-base-kd performs only the distillation step on a supervised dense base.

Intermediate Checkpoints

For researchers studying the incremental impact of each phase and prompt alignment, we release several ablation variants: ColBERT-Zero-supervised, ColBERT-Zero-unsupervised (and their -noprompts versions), and ModernColBERT-embed-base-supervised.

Full Performance on BEIR

| Model | Avg | FiQA | NFCorpus | TREC-COVID | Touche | ArguAna | Quora | SCIDOCS | SciFact | NQ | ClimateFEVER | HotpotQA | DBPedia | CQADupstack | FEVER | MSMARCO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Baselines** | | | | | | | | | | | | | | | | |
| [ModernBERT-embed-unsupervised](https://huggingface.co/nomic-ai/modernbert-embed-base-unsupervised) | 47.05 | 42.53 | 35.33 | 68.44 | 18.58 | 48.82 | 88.63 | 19.83 | 72.30 | 46.32 | 22.97 | 60.00 | 37.97 | 42.40 | 67.39 | 34.23 |
| [ModernBERT-embed-supervised](https://huggingface.co/nomic-ai/modernbert-embed-base) | 52.89 | 40.59 | 33.40 | **84.15** | 31.91 | 48.96 | **88.85** | 18.59 | 69.63 | 62.15 | 35.67 | 67.11 | 41.50 | 42.08 | 87.35 | 41.47 |
| [GTE-ModernColBERT](https://huggingface.co/lightonai/GTE-ModernColBERT-v1) | 54.67 | 45.28 | **37.93** | 83.59 | 31.23 | 48.51 | 86.61 | 19.06 | 76.34 | 61.80 | 30.62 | 77.32 | 48.03 | 41.00 | 87.44 | 45.32 |
| [gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) | 55.33 | **48.81** | 36.44 | 81.95 | 21.68 | **72.68** | 88.55 | 21.29 | **77.40** | 57.62 | **37.74** | 69.47 | 41.79 | 42.63 | **91.03** | 40.90 |
| **KD from dense supervised** | | | | | | | | | | | | | | | | |
| [ModernColBERT-embed-base-kd-only](https://huggingface.co/lightonai/ModernColBERT-embed-base-kd-only) | 54.09 | 42.51 | 37.01 | 79.52 | 34.58 | 51.75 | 87.67 | 18.15 | 75.04 | 61.45 | 28.31 | 76.70 | 47.54 | 40.68 | 84.82 | 45.57 |
| **Supervised + KD from dense unsupervised** | | | | | | | | | | | | | | | | |
| [ModernColBERT-embed-base-supervised](https://huggingface.co/lightonai/ModernColBERT-embed-base-supervised) | 50.72 | 40.09 | 35.56 | 71.12 | 25.53 | 44.27 | 86.96 | 18.19 | 73.78 | 58.89 | 32.95 | 71.49 | 43.23 | 42.55 | 70.51 | 45.72 |
| [ModernColBERT-embed-base](https://huggingface.co/lightonai/ModernColBERT-embed-base) | 55.12 | 41.50 | 36.51 | 77.46 | 33.77 | 52.45 | 86.26 | 18.66 | 74.90 | 62.24 | 37.27 | **80.07** | **48.27** | 41.60 | 89.71 | **46.17** |
| **ColBERT-Zero** | | | | | | | | | | | | | | | | |
| [Unsupervised](https://huggingface.co/lightonai/ColBERT-Zero-unsupervised) | 51.44 | 45.38 | 36.88 | 67.82 | 22.59 | 51.53 | 87.78 | 22.30 | 76.76 | 58.80 | 24.24 | 68.29 | 43.16 | **45.76** | 81.58 | 38.78 |
| [Supervised](https://huggingface.co/lightonai/ColBERT-Zero-supervised) | 51.81 | 42.45 | 35.60 | 74.72 | 23.83 | 41.81 | 87.19 | 19.85 | 73.71 | 61.95 | 35.01 | 71.37 | 46.20 | 45.16 | 72.61 | 45.68 |
| [Distilled](https://huggingface.co/lightonai/ColBERT-Zero) | **55.43** | 42.62 | 37.28 | 78.69 | 36.13 | 53.07 | 85.24 | 19.88 | 76.50 | 61.66 | 35.72 | 79.41 | 47.48 | 41.34 | 90.59 | 45.80 |
| **ColBERT-Zero-noprompts** | | | | | | | | | | | | | | | | |
| [Unsupervised](https://huggingface.co/lightonai/ColBERT-Zero-unsupervised-noprompts) | 51.70 | 45.31 | 34.72 | 73.55 | 23.26 | 52.56 | 88.15 | **22.63** | 76.10 | 59.18 | 24.24 | 66.66 | 42.61 | 45.56 | 81.88 | 39.15 |
| [Supervised](https://huggingface.co/lightonai/ColBERT-Zero-supervised-noprompts) | 52.39 | 43.36 | 36.01 | 72.42 | 23.79 | 47.42 | 87.79 | 21.30 | 73.85 | **62.25** | 31.61 | 70.32 | 44.07 | 44.03 | 85.54 | 42.11 |
| [Distilled](https://huggingface.co/lightonai/ColBERT-Zero-noprompts) | 54.61 | 43.14 | 36.60 | 78.60 | **36.36** | 49.49 | 88.05 | 19.13 | 76.42 | 61.73 | 32.70 | 76.99 | 47.69 | 40.21 | 85.97 | 46.01 |

Limitations & Discussion

  • Data-specific findings. We deliberately used the Nomic Embed data mixture for controlled comparison. Some observations (particularly around prompt sensitivity) may not generalize to different or stronger training configurations.
  • Scale vs. objective. The gains from multi-vector pre-training likely reflect more training time in the multi-vector setting, rather than the contrastive objective itself. Performing KD alone at a larger scale might yield similar or superior results due to the higher quality of the distillation signal. Our study uses the conventional setup where training scale is inversely proportional to signal quality, reflecting the higher cost of generating high-quality labels.
  • Prompt sensitivity decreases with stronger fine-tuning. When experimenting with stronger fine-tuning data (e.g., NV-Retriever), adding prompts on top of a model pre-trained without them did not degrade results the way it did with ColBERT-Zero. With enough downstream training, the model can adapt to an initial mismatch.

Serving at Scale

For production deployment of ColBERT-Zero and other multi-vector models, check out NextPlaid and FastPlaid, our production-grade engines for multi-vector retrieval.

Resources

Model Details

Model Description

  • Model Type: PyLate model
  • Document Length: 519 tokens
  • Query Length: 39 tokens
  • Output Dimensionality: 128 dimensions
  • Similarity Function: MaxSim
  • Training Dataset:
    • train

Model Sources

Full Model Architecture

ColBERT(
  (0): Transformer({'max_seq_length': 518, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Dense({'in_features': 768, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
)
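
For intuition, the MaxSim similarity listed above scores a query-document pair by taking, for every query token embedding, the maximum dot product over all document token embeddings, and summing these maxima. Below is a minimal NumPy sketch of that scoring rule (illustrative only; PyLate computes this internally, and details such as embedding normalization and padding masks are omitted):

import numpy as np

def maxsim_score(query_embeddings: np.ndarray, document_embeddings: np.ndarray) -> float:
    """Late-interaction (MaxSim) score for one query/document pair.

    query_embeddings: (num_query_tokens, 128) array of token embeddings
    document_embeddings: (num_document_tokens, 128) array of token embeddings
    """
    # Dot product between every query token and every document token
    similarities = query_embeddings @ document_embeddings.T
    # For each query token, keep its best-matching document token, then sum over query tokens
    return float(similarities.max(axis=1).sum())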

Usage

First install the PyLate library:

pip install -U pylate

Retrieval

Use this model with PyLate to index and retrieve documents. The index uses FastPLAID for efficient similarity search.

Indexing documents

Load the ColBERT model and initialize the PLAID index, then encode and index your documents:

from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="pylate_model_id",
)

# Step 2: Initialize the PLAID index
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
    override=True,  # This overwrites the existing index if any
)

# Step 3: Encode the documents
documents_ids = ["1", "2", "3"]
documents = ["document 1 text", "document 2 text", "document 3 text"]

documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # Ensure that it is set to False to indicate that these are documents, not queries
    show_progress_bar=True,
)

# Step 4: Add document embeddings to the index by providing embeddings and corresponding ids
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)

Note that you do not have to recreate the index and encode the documents every time. Once you have created an index and added the documents, you can re-use the index later by loading it:

# To load an index, simply instantiate it with the correct folder/name and without overriding it
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
)

Retrieving top-k documents for queries

Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries. To do so, initialize the ColBERT retriever with the index you want to search in, encode the queries, and then retrieve the top-k documents to get the ids and relevance scores of the top matches:

# Step 1: Initialize the ColBERT retriever
retriever = retrieve.ColBERT(index=index)

# Step 2: Encode the queries
queries_embeddings = model.encode(
    ["query for document 3", "query for document 1"],
    batch_size=32,
    is_query=True,  # Ensure that it is set to True to indicate that these are queries
    show_progress_bar=True,
)

# Step 3: Retrieve top-k documents
scores = retriever.retrieve(
    queries_embeddings=queries_embeddings,
    k=10,  # Retrieve the top 10 matches for each query
)

Reranking

If you only want to use the ColBERT model to perform reranking on top of your first-stage retrieval pipeline without building an index, you can simply use the rank function and pass the queries and documents to rerank:

from pylate import rank, models

queries = [
    "query A",
    "query B",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

model = models.ColBERT(
    model_name_or_path="pylate_model_id",
)

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)

Evaluation

Metrics

PyLate Information Retrieval

  • Dataset: ['NanoClimateFEVER', 'NanoDBPedia', 'NanoFEVER', 'NanoFiQA2018', 'NanoHotpotQA', 'NanoMSMARCO', 'NanoNFCorpus', 'NanoNQ', 'NanoQuoraRetrieval', 'NanoSCIDOCS', 'NanoArguAna', 'NanoSciFact', 'NanoTouche2020']
  • Evaluated with <code>pylate.evaluation.pylate_information_retrieval_evaluator.PyLateInformationRetrievalEvaluator</code>

| Metric | NanoClimateFEVER | NanoDBPedia | NanoFEVER | NanoFiQA2018 | NanoHotpotQA | NanoMSMARCO | NanoNFCorpus | NanoNQ | NanoQuoraRetrieval | NanoSCIDOCS | NanoArguAna | NanoSciFact | NanoTouche2020 |
|:--------------------|:-----------------|:------------|:----------|:-------------|:-------------|:------------|:-------------|:-------|:-------------------|:------------|:------------|:------------|:---------------|
| MaxSim_accuracy@1 | 0.36 | 0.86 | 0.96 | 0.58 | 0.98 | 0.6 | 0.58 | 0.62 | 0.92 | 0.48 | 0.24 | 0.7 | 0.8163 |
| MaxSim_accuracy@3 | 0.68 | 0.94 | 1.0 | 0.66 | 1.0 | 0.68 | 0.68 | 0.84 | 0.98 | 0.74 | 0.64 | 0.82 | 0.9796 |
| MaxSim_accuracy@5 | 0.76 | 0.94 | 1.0 | 0.72 | 1.0 | 0.78 | 0.72 | 0.88 | 1.0 | 0.76 | 0.7 | 0.88 | 0.9796 |
| MaxSim_accuracy@10 | 0.88 | 0.98 | 1.0 | 0.82 | 1.0 | 0.9 | 0.76 | 0.9 | 1.0 | 0.9 | 0.9 | 0.92 | 0.9796 |
| MaxSim_precision@1 | 0.36 | 0.86 | 0.96 | 0.58 | 0.98 | 0.6 | 0.58 | 0.62 | 0.92 | 0.48 | 0.24 | 0.7 | 0.8163 |
| MaxSim_precision@3 | 0.2867 | 0.7333 | 0.3533 | 0.3267 | 0.6 | 0.2267 | 0.4267 | 0.28 | 0.3933 | 0.4067 | 0.2133 | 0.2867 | 0.7279 |
| MaxSim_precision@5 | 0.22 | 0.66 | 0.212 | 0.248 | 0.368 | 0.156 | 0.396 | 0.176 | 0.248 | 0.304 | 0.14 | 0.196 | 0.6653 |
| MaxSim_precision@10 | 0.148 | 0.584 | 0.11 | 0.148 | 0.186 | 0.09 | 0.316 | 0.096 | 0.128 | 0.204 | 0.09 | 0.102 | 0.5388 |
| MaxSim_recall@1 | 0.18 | 0.108 | 0.8967 | 0.3526 | 0.49 | 0.6 | 0.066 | 0.59 | 0.7973 | 0.1027 | 0.24 | 0.675 | 0.0564 |
| MaxSim_recall@3 | 0.36 | 0.2161 | 0.9633 | 0.4742 | 0.9 | 0.68 | 0.1036 | 0.78 | 0.932 | 0.2507 | 0.64 | 0.79 | 0.1493 |
| MaxSim_recall@5 | 0.429 | 0.2933 | 0.9633 | 0.546 | 0.92 | 0.78 | 0.1297 | 0.81 | 0.9627 | 0.3107 | 0.7 | 0.87 | 0.2241 |
| MaxSim_recall@10 | 0.5537 | 0.4273 | 0.98 | 0.6425 | 0.93 | 0.9 | 0.1635 | 0.86 | 0.9727 | 0.4167 | 0.9 | 0.91 | 0.3475 |
| MaxSim_ndcg@10 | 0.4511 | 0.7326 | 0.9624 | 0.5786 | 0.9243 | 0.7242 | 0.4055 | 0.7475 | 0.9376 | 0.4124 | 0.562 | 0.802 | 0.6176 |
| MaxSim_mrr@10 | 0.5353 | 0.8995 | 0.9767 | 0.6434 | 0.99 | 0.671 | 0.6304 | 0.7342 | 0.954 | 0.6184 | 0.4557 | 0.7717 | 0.8776 |
| MaxSim_map@100 | 0.3571 | 0.5806 | 0.9478 | 0.5234 | 0.8945 | 0.6766 | 0.1959 | 0.7036 | 0.9156 | 0.3294 | 0.4584 | 0.7652 | 0.4571 |

NanoBEIR

  • Dataset: NanoBEIR_mean
  • Evaluated with <code>pylate.evaluation.nano_beir_evaluator.NanoBEIREvaluator</code>

| Metric | Value |
|:--------------------|:-------|
| MaxSim_accuracy@1 | 0.6689 |
| MaxSim_accuracy@3 | 0.8184 |
| MaxSim_accuracy@5 | 0.8554 |
| MaxSim_accuracy@10 | 0.9184 |
| MaxSim_precision@1 | 0.6689 |
| MaxSim_precision@3 | 0.4047 |
| MaxSim_precision@5 | 0.3069 |
| MaxSim_precision@10 | 0.2108 |
| MaxSim_recall@1 | 0.3965 |
| MaxSim_recall@3 | 0.5569 |
| MaxSim_recall@5 | 0.6107 |
| MaxSim_recall@10 | 0.6926 |
| MaxSim_ndcg@10 | 0.6814 |
| MaxSim_mrr@10 | 0.7506 |
| MaxSim_map@100 | 0.6004 |


Training Details

Training Dataset

train

  • Dataset: train
  • Size: 640,000 training samples
  • Columns: <code>query_id</code>, <code>document_ids</code>, and <code>scores</code>
  • Approximate statistics based on the first 1000 samples:

| | query_id | document_ids | scores |
|:--------|:---------|:-------------|:-------|
| type | int | list | list |
| details | 836: ~0.10%, 3582: ~0.10%, ... | size: 32 elements | size: 32 elements |

  • Samples:

| query_id | document_ids | scores |
|:---------|:-------------|:-------|
| <code>685613</code> | <code>[7546874, 1176459, 197677, 2306318, 8541504, ...]</code> | <code>[0.9999999992804947, 0.24845418756716053, 0.7594154013647826, 0.26644182105618575, 0.390668914839766, ...]</code> |
| <code>237784</code> | <code>[6366584, 4034101, 2325374, 6914618, 6042146, ...]</code> | <code>[0.9999999991784339, 0.42233632827946693, 0.5956354295491569, 0.12644415907455164, 0.6636713730105909, ...]</code> |
| <code>904294</code> | <code>[448408, 8743975, 49600, 7339401, 2714261, ...]</code> | <code>[0.9999999991841937, 0.877629062381539, 0.8330146583389045, 0.3116634796692611, 0.4633524534142185, ...]</code> |
  • Loss: <code>pylate.losses.distillation.Distillation</code>
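
Concretely, each training example pairs one query with 32 candidate documents and their teacher relevance scores, which the Distillation loss tries to reproduce. A rough sketch of one row, using the illustrative values from the samples above (lists truncated and scores rounded):

# One knowledge-distillation training example: a query id, 32 candidate
# document ids, and one teacher relevance score per document.
sample = {
    "query_id": 685613,
    "document_ids": [7546874, 1176459, 197677, 2306318, 8541504],  # 32 ids in total
    "scores": [1.0, 0.2485, 0.7594, 0.2664, 0.3907],               # one score per document
}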

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • gradient_accumulation_steps: 2
  • learning_rate: 1e-05
  • num_train_epochs: 1.0
  • bf16: True
  • dataloader_num_workers: 4
  • ddp_find_unused_parameters: False
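
For reference, the number of training examples (queries with their 32 scored documents) consumed per optimizer step is the per-device batch size times the gradient accumulation steps times the number of data-parallel workers. The card does not state the GPU count (the local_rank: 3 entry in the full configuration below only implies at least four devices), so the back-of-the-envelope figure below treats it as an assumption:

per_device_train_batch_size = 4
gradient_accumulation_steps = 2
num_gpus = 8  # assumption: not reported in the card; local_rank 3 only guarantees >= 4 devices

examples_per_optimizer_step = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(examples_per_optimizer_step)  # 64 under the assumed GPU count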

All Hyperparameters

<details><summary>Click to expand</summary>
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1.0
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 3
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: False
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}
</details>

Training Logs

<details><summary>Click to expand</summary>

| Epoch | Step | Training Loss | NanoClimateFEVER_MaxSim_ndcg@10 | NanoDBPedia_MaxSim_ndcg@10 | NanoFEVER_MaxSim_ndcg@10 | NanoFiQA2018_MaxSim_ndcg@10 | NanoHotpotQA_MaxSim_ndcg@10 | NanoMSMARCO_MaxSim_ndcg@10 | NanoNFCorpus_MaxSim_ndcg@10 | NanoNQ_MaxSim_ndcg@10 | NanoQuoraRetrieval_MaxSim_ndcg@10 | NanoSCIDOCS_MaxSim_ndcg@10 | NanoArguAna_MaxSim_ndcg@10 | NanoSciFact_MaxSim_ndcg@10 | NanoTouche2020_MaxSim_ndcg@10 | NanoBEIR_mean_MaxSim_ndcg@10 |
|:------:|:-----:|:-------------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| 0.0025 | 50 | 0.0187 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.0275 | 550 | 0.0155 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.0525 | 1050 | 0.0146 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.075 | 1500 | 0.0141 | 0.4530 | 0.7263 | 0.9670 | 0.5786 | 0.9313 | 0.7349 | 0.3994 | 0.7587 | 0.9506 | 0.4292 | 0.5152 | 0.8059 | 0.6139 | 0.6818 |
| 0.0775 | 1550 | 0.0139 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1025 | 2050 | 0.0138 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1275 | 2550 | 0.0132 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.15 | 3000 | 0.0132 | 0.4562 | 0.7260 | 0.9738 | 0.5756 | 0.9221 | 0.7378 | 0.4021 | 0.7555 | 0.9473 | 0.4276 | 0.5376 | 0.8082 | 0.6206 | 0.6839 |
| 0.1525 | 3050 | 0.013 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.1775 | 3550 | 0.0129 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2025 | 4050 | 0.0129 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.225 | 4500 | 0.0126 | 0.4551 | 0.7381 | 0.9624 | 0.5890 | 0.9238 | 0.7381 | 0.3978 | 0.7522 | 0.9400 | 0.4206 | 0.5455 | 0.8141 | 0.6184 | 0.6842 |
| 0.2275 | 4550 | 0.0124 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2525 | 5050 | 0.0126 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.2775 | 5550 | 0.0123 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3 | 6000 | 0.012 | 0.4474 | 0.7375 | 0.9635 | 0.5908 | 0.9282 | 0.7416 | 0.4064 | 0.7551 | 0.9424 | 0.4198 | 0.5592 | 0.8074 | 0.6191 | 0.6860 |
| 0.3025 | 6050 | 0.0125 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3275 | 6550 | 0.012 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.3525 | 7050 | 0.0122 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.375 | 7500 | 0.0123 | 0.4534 | 0.7266 | 0.9631 | 0.5875 | 0.9294 | 0.7349 | 0.4012 | 0.7459 | 0.9417 | 0.4195 | 0.5608 | 0.8060 | 0.6205 | 0.6839 |
| 0.3775 | 7550 | 0.0118 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4025 | 8050 | 0.0118 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4275 | 8550 | 0.0119 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.45 | 9000 | 0.0114 | 0.4537 | 0.7219 | 0.9631 | 0.5837 | 0.9290 | 0.7374 | 0.4032 | 0.7522 | 0.9496 | 0.4134 | 0.5572 | 0.8113 | 0.6190 | 0.6842 |
| 0.4525 | 9050 | 0.0117 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.4775 | 9550 | 0.0119 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5025 | 10050 | 0.0112 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.525 | 10500 | 0.0117 | 0.4541 | 0.7325 | 0.9653 | 0.5803 | 0.9243 | 0.7357 | 0.4092 | 0.7566 | 0.9468 | 0.4169 | 0.5596 | 0.8040 | 0.6177 | 0.6849 |
| 0.5275 | 10550 | 0.0116 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5525 | 11050 | 0.0115 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.5775 | 11550 | 0.0112 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6 | 12000 | 0.0112 | 0.4606 | 0.7310 | 0.9624 | 0.5862 | 0.9243 | 0.7341 | 0.4085 | 0.7523 | 0.9463 | 0.4192 | 0.5708 | 0.8086 | 0.6201 | 0.6865 |
| 0.6025 | 12050 | 0.0116 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6275 | 12550 | 0.0113 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.6525 | 13050 | 0.0115 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.675 | 13500 | 0.0111 | 0.4505 | 0.7294 | 0.9653 | 0.5796 | 0.9289 | 0.7348 | 0.4063 | 0.7553 | 0.9451 | 0.4205 | 0.5627 | 0.8034 | 0.6173 | 0.6845 |
| 0.6775 | 13550 | 0.0112 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7025 | 14050 | 0.0112 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7275 | 14550 | 0.0109 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.75 | 15000 | 0.0113 | 0.4544 | 0.7281 | 0.9624 | 0.5785 | 0.9227 | 0.7241 | 0.4081 | 0.7495 | 0.9391 | 0.4158 | 0.5639 | 0.8020 | 0.6195 | 0.6822 |
| 0.7525 | 15050 | 0.0112 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.7775 | 15550 | 0.011 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8025 | 16050 | 0.0106 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.825 | 16500 | 0.0113 | 0.4520 | 0.7354 | 0.9624 | 0.5784 | 0.9279 | 0.7340 | 0.4042 | 0.7505 | 0.9388 | 0.4117 | 0.5630 | 0.8020 | 0.6204 | 0.6831 |
| 0.8275 | 16550 | 0.0107 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8525 | 17050 | 0.0109 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.8775 | 17550 | 0.011 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9 | 18000 | 0.0109 | 0.4548 | 0.7336 | 0.9624 | 0.5791 | 0.9243 | 0.7313 | 0.4067 | 0.7475 | 0.9376 | 0.4132 | 0.5625 | 0.8094 | 0.6214 | 0.6834 |
| 0.9025 | 18050 | 0.011 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9275 | 18550 | 0.0109 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.9525 | 19050 | 0.0107 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 0.975 | 19500 | 0.0111 | 0.4511 | 0.7326 | 0.9624 | 0.5786 | 0.9243 | 0.7242 | 0.4055 | 0.7475 | 0.9376 | 0.4124 | 0.5620 | 0.8020 | 0.6176 | 0.6814 |
| 0.9775 | 19550 | 0.0112 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |

</details>

Framework Versions

  • Python: 3.13.0
  • Sentence Transformers: 5.1.1
  • PyLate: 1.3.4
  • Transformers: 4.48.3
  • PyTorch: 2.6.0
  • Accelerate: 1.12.0
  • Datasets: 4.4.1
  • Tokenizers: 0.21.0

Citation

BibTeX

ColBERT-Zero

@misc{chaffin2026colbertzeropretrainpretraincolbert,
  title         = {ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models}, 
  author        = {Antoine Chaffin and Luca Arnaboldi and Amélie Chatelain and Florent Krzakala},
  year          = {2026},
  eprint        = {2602.16609},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2602.16609}, 
}

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084"
}

PyLate

@inproceedings{DBLP:conf/cikm/ChaffinS25,
  author       = {Antoine Chaffin and
                  Rapha{\"{e}}l Sourty},
  editor       = {Meeyoung Cha and
                  Chanyoung Park and
                  Noseong Park and
                  Carl Yang and
                  Senjuti Basu Roy and
                  Jessie Li and
                  Jaap Kamps and
                  Kijung Shin and
                  Bryan Hooi and
                  Lifang He},
  title        = {PyLate: Flexible Training and Retrieval for Late Interaction Models},
  booktitle    = {Proceedings of the 34th {ACM} International Conference on Information
                  and Knowledge Management, {CIKM} 2025, Seoul, Republic of Korea, November
                  10-14, 2025},
  pages        = {6334--6339},
  publisher    = {{ACM}},
  year         = {2025},
  url          = {https://github.com/lightonai/pylate},
  doi          = {10.1145/3746252.3761608},
}

Nomic Embed

@article{DBLP:journals/tmlr/NussbaumMMD25,
  author       = {Zach Nussbaum and
                  John Xavier Morris and
                  Andriy Mulyar and
                  Brandon Duderstadt},
  title        = {Nomic Embed: Training a Reproducible Long Context Text Embedder},
  journal      = {Trans. Mach. Learn. Res.},
  volume       = {2025},
  year         = {2025},
  url          = {https://openreview.net/forum?id=IPmzyQSiQE},
  timestamp    = {Fri, 20 Jun 2025 14:19:48 +0200},
  biburl       = {https://dblp.org/rec/journals/tmlr/NussbaumMMD25.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

Author: lightonai

Likes: 8

Downloads: 0

Tags: PyLate, safetensors, modernbert, ColBERT, sentence-transformers, sentence-similarity, feature-extraction, generated_from_trainer, dataset_size:640000, loss:Distillation, en, arxiv:2602.16609, arxiv:2402.01613, arxiv:1908.10084, license:apache-2.0, model-index, text-embeddings-inference, endpoints_compatible, region:eu

vadimbelsky/emirati-vits-male-1.0


language: ar
tags:

  • text-to-speech
  • tts
  • vits
  • emirati
  • gulf-arabic
  • male-voice
  • arabic
  • nemo

library_name: nemo
license: apache-2.0
pipeline_tag: text-to-speech

Emirati VITS Male TTS Model

A bilingual VITS-based text-to-speech model with a male voice for Emirati Arabic and English. Fine-tuned on 70 hours of bilingual audio data, it delivers natural, conversational speech optimized for call center applications and general-purpose TTS in both languages.

Model Overview

Key Features:

  • Bilingual: Native support for both Emirati Arabic and English
  • Single-speaker male voice model
  • 22050 Hz sample rate
  • Emirati Arabic dialect-specific phonemization
  • Seamless Arabic-English code-switching
  • Emirati-specific text normalization (numbers, dates, currencies)
  • Built with NVIDIA NeMo 2.6.1

Model Details

Architecture

  • Model Type: VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)
  • Framework: NVIDIA NeMo 2.6.1
  • Hidden Channels: 192
  • Filter Channels: 768
  • Number of Layers: 6
  • Attention Heads: 2

Training

  • Epochs: 179
  • Final Loss: 20.47
  • Dataset: 70 hours of bilingual audio
  • Use Case: Call center applications, conversational speech
  • Sample Rate: 22050 Hz
  • Mel Channels: 80

Language Support

  • Bilingual Model: Equal support for Emirati Arabic and English
    • Emirati Arabic: Gulf Arabic dialect with native phonemization
    • English: Full English TTS support
    • Code-switching: Seamless mixing of Arabic and English in the same sentence
  • G2P: EmiratiG2P with dialect-specific phonemization (from custom NeMo fork)
  • Text Tokenizer: IPATokenizer (International Phonetic Alphabet)

Quick Start

macOS Users: See INSTALL_MAC.md for detailed installation instructions with quick start examples.

Prerequisites

Python: 3.10 or later (tested with Python 3.14)

# Install UV (modern Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create Python environment (3.10, 3.11, 3.12, 3.13, or 3.14)
uv venv --python 3.14
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Installation

# Install PyTorch (CPU version shown, for GPU use CUDA-enabled version)
uv pip install torch torchvision torchaudio

# Install NeMo with TTS support
uv pip install nemo_toolkit[tts]

# Install audio libraries
uv pip install soundfile librosa

# Install Pynini (required for text normalization)
uv pip install pynini

# Install custom Emirati text normalization
uv pip install git+https://github.com/VadzimBelski-ScienceSoft/NeMo-text-processing.git

Inference

Using the provided inference.py CLI:

# Download the model files first
git clone https://huggingface.co/vadimbelsky/emirati-vits-male-1.0
cd emirati-vits-male-1.0

# Arabic example
python inference.py \
  --text "ู…ุฑุญุจุงุŒ ูƒูŠู ุญุงู„ูƒ ุงู„ูŠูˆู…ุŸ" \
  --out output_ar.wav \
  --ckpt VITS_emirati_v3--loss_gen_all=20.4726-epoch=179-last.ckpt \
  --hparams hparams.yaml \
  --require-normalize

# English example
python inference.py \
  --text "Hello, how are you today?" \
  --out output_en.wav \
  --ckpt VITS_emirati_v3--loss_gen_all=20.4726-epoch=179-last.ckpt \
  --hparams hparams.yaml \
  --require-normalize

Using Python directly:

from pathlib import Path
import torch
from nemo.collections.tts.models import VitsModel
from bilingual_text_normalizer import BilingualTextNormalizer
import soundfile as sf

# Load model
model = VitsModel.restore_from(
    "VITS_emirati_v3--loss_gen_all=20.4726-epoch=179-last.ckpt",
    map_location=torch.device("cpu")  # or "cuda" for GPU
)
model.eval()

# Initialize bilingual normalizer
normalizer = BilingualTextNormalizer(ar_lang="ar_ae", en_lang="en")

# Synthesize speech
text = "ู…ุฑุญุจุงุŒ ูƒูŠู ุญุงู„ูƒุŸ"  # or English text
normalized = normalizer.normalize(text)

with model.nemo_infer():
    tokens = model.parse(normalized)
    audio = model.convert_text_to_waveform(tokens=tokens)

# Save audio
sf.write("output.wav", audio.squeeze().cpu().numpy(), 22050)

Usage Examples

Example 1: Basic Arabic TTS

arabic_text = "ุงู„ูŠูˆู… ุงู„ุฌูˆ ุฌู…ูŠู„ ููŠ ุฏุจูŠ"
normalized = normalizer.normalize(arabic_text)

with model.nemo_infer():
    tokens = model.parse(normalized)
    audio = model.convert_text_to_waveform(tokens=tokens)

sf.write("weather.wav", audio.squeeze().cpu().numpy(), 22050)

Example 2: Numbers in Emirati Dialect

# Numbers are automatically converted to Emirati pronunciation
text_with_numbers = "ุงู„ุณุนุฑ 1500 ุฏุฑู‡ู…"  # Will pronounce as "ุฃู„ู ูˆุฎู…ุณ ู…ูŠุฉ"
normalized = normalizer.normalize(text_with_numbers)

with model.nemo_infer():
    tokens = model.parse(normalized)
    audio = model.convert_text_to_waveform(tokens=tokens)

sf.write("price.wav", audio.squeeze().cpu().numpy(), 22050)

Example 3: Mixed Arabic-English Code-Switching

mixed_text = "ุฃู†ุง ุฃุนู…ู„ ููŠ Microsoft ููŠ Dubai"
normalized = normalizer.normalize(mixed_text)

with model.nemo_infer():
    tokens = model.parse(normalized)
    audio = model.convert_text_to_waveform(tokens=tokens)

sf.write("mixed.wav", audio.squeeze().cpu().numpy(), 22050)

OpenAI-Compatible TTS Server

For easy integration with applications expecting OpenAI-compatible APIs, we provide a FastAPI server that implements the /v1/audio/speech endpoint.

Installation

# Install server dependencies
uv pip install ".[server]"

Running the Server

# Start the server (default: http://localhost:8000)
python openai_tts_server.py

# Or with custom settings
python openai_tts_server.py \
  --checkpoint VITS_emirati_v3--loss_gen_all=20.4726-epoch=179-last.ckpt \
  --device cuda \
  --host 0.0.0.0 \
  --port 8000

Usage Examples

Using curl:

# Generate speech from Arabic text
curl http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "ู…ุฑุญุจุงุŒ ูƒูŠู ุญุงู„ูƒ ุงู„ูŠูˆู…ุŸ",
    "voice": "emirati-male",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3

# Generate speech from English text
curl http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, how are you today?",
    "voice": "emirati-male",
    "response_format": "wav"
  }' \
  --output speech.wav

Using Python (OpenAI client):

from openai import OpenAI

# Point to local server
client = OpenAI(
    api_key="not-needed",  # API key not required for local server
    base_url="http://localhost:8000/v1"
)

# Generate speech
response = client.audio.speech.create(
    model="tts-1",
    voice="emirati-male",
    input="ู…ุฑุญุจุงุŒ ูƒูŠู ุญุงู„ูƒุŸ",
    response_format="mp3",
    speed=1.0
)

# Save to file
response.stream_to_file("output.mp3")

Using Python (requests):

import requests

url = "http://localhost:8000/v1/audio/speech"
data = {
    "model": "tts-1",
    "input": "ุงู„ุณู„ุงู… ุนู„ูŠูƒู…",
    "voice": "emirati-male",
    "response_format": "wav",
    "speed": 1.0
}

response = requests.post(url, json=data)

with open("output.wav", "wb") as f:
    f.write(response.content)

API Parameters

  • model: Model identifier (use "tts-1" or "tts-1-hd", both map to Emirati VITS)
  • input: Text to synthesize (max 4096 characters)
  • voice: Voice name (only "emirati-male" supported)
  • response_format: Audio format - "mp3", "wav", "flac", "opus", "aac", or "pcm"
  • speed: Playback speed (0.25 to 4.0, default: 1.0)

Server Endpoints

  • POST /v1/audio/speech - Text-to-speech synthesis (OpenAI compatible)
  • GET /health - Health check endpoint
  • GET / - API information
  • GET /docs - Interactive API documentation (Swagger UI)
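
For a quick smoke test before wiring the server into an application, the documented /health endpoint can be polled. A minimal sketch using the requests library, assuming the server is running on the default host and port shown above (the exact response body depends on the server implementation):

import requests

# Liveness check against the documented /health endpoint
response = requests.get("http://localhost:8000/health")
print(response.status_code)  # 200 when the server is up
print(response.text)         # payload format depends on openai_tts_server.py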

Audio Samples

Emirati Arabic:

<audio controls src="https://huggingface.co/vadimbelsky/emirati-vits-male-1.0/resolve/main/vits_ar_latest.wav"></audio>

English:

<audio controls src="https://huggingface.co/vadimbelsky/emirati-vits-male-1.0/resolve/main/vits_en_v3.wav"></audio>

Model Configuration

Key parameters from hparams.yaml:

  • Pitch Range: 50-400 Hz (male voice)
  • Segment Size: 12288
  • Mel Frequency Bins: 80
  • FFT Size: 1024
  • Hop Length: 256
  • Window Size: 1024
  • Spectral Normalization: Enabled
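
To put these numbers in context: at a 22050 Hz sample rate, a hop length of 256 samples yields roughly 86 mel frames per second, and each 1024-sample FFT window covers about 46 ms of audio. A quick check, derived purely from the values above:

sample_rate = 22050  # Hz
hop_length = 256     # samples between consecutive mel frames
n_fft = 1024         # FFT / window size in samples

frames_per_second = sample_rate / hop_length      # ~86.1 mel frames per second
window_duration_ms = 1000 * n_fft / sample_rate   # ~46.4 ms of audio per analysis window
print(f"{frames_per_second:.1f} frames/s, {window_duration_ms:.1f} ms window")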

Text Normalization Features

The model uses custom Emirati Arabic text normalization (ar_ae language code) with:

  • Sun letter assimilation: Enabled
  • Vowel insertion: Enabled (vowel: 'a')
  • English G2P fallback: Enabled for code-switching
  • Emirati-specific number forms: e.g., "1500" → "ألف وخمس مية"
  • Currency support: 20+ regional and international currencies
  • Date normalization: Emirati dialect date expressions

The bilingual text normalizer (bilingual_text_normalizer.py) automatically:

  • Detects script type (Arabic, English, or mixed)
  • Routes to appropriate normalizer (ar_ae for Arabic, en for English)
  • Sanitizes Unicode punctuation and special characters

Note: Requires the custom NeMo-text-processing library with ar_ae support.

Limitations & Known Issues

  • Dialect specificity: Optimized for Emirati Arabic; may not generalize well to other Arabic dialects
  • Single speaker: Male voice only, no multi-speaker support
  • Audio quality: 22050 Hz sample rate (standard quality, not high-fidelity)
  • Code-switching: Works best with Arabic primary text and occasional English words
  • Platform support: Pynini/OpenFst installation can be challenging on macOS/Windows (Linux recommended)
  • Performance: CPU inference is slow; GPU significantly improves speed

Technical Requirements

  • Python: 3.10 or later (tested with Python 3.14)
  • RAM: Minimum 4GB, recommended 8GB+
  • GPU: Optional but recommended (NVIDIA GPU with CUDA support)
  • Disk space: ~500MB for model + dependencies
  • Operating System: Linux (recommended), macOS (with caveats), Windows (WSL recommended)

License

This model is released under the Apache 2.0 license.

Citation

If you use this model in your research or applications, please cite the VITS paper:

@inproceedings{kim2021conditional,
  title={Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech},
  author={Kim, Jaehyeon and Kong, Jungil and Son, Juhee},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2021}
}

For this specific model:

@misc{emirati-vits-male-2026,
  author={Belsky, Vadim},
  title={Emirati VITS Male TTS Model},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/vadimbelsky/emirati-vits-male-1.0}}
}

Contact & Consultation

If you're looking for consultation on how to modify and fine-tune this model, I provide training and consultation services. Connect with me on LinkedIn or visit my blog.

Author: vadimbelsky

Likes: 5

Downloads: 0

Tags: nemo, text-to-speech, tts, vits, emirati, gulf-arabic, male-voice, arabic, ar, license:apache-2.0, region:us

KittenML/kitten-tts-nano-0.8-int8


license: apache-2.0

Kitten TTS Nano 0.8 ๐Ÿ˜ป

Kitten TTS is an open-source, realistic text-to-speech model with 15 million parameters and a file size of around 20 MB.

Some users are experiencing minor issues with this model; we are looking into it. Please report any issues you encounter.

๐Ÿš€ Quick Start

Installation

pip install https://github.com/KittenML/KittenTTS/releases/download/0.8/kittentts-0.8.0-py3-none-any.whl

Basic Usage

from kittentts import KittenTTS
import soundfile as sf

m = KittenTTS("KittenML/kitten-tts-nano-0.8-int8")

# Available voices: 'Bella', 'Jasper', 'Luna', 'Bruno', 'Rosie', 'Hugo', 'Kiki', 'Leo'
audio = m.generate("This high quality TTS model works without a GPU", voice='Jasper')

# Save the audio as a 24 kHz WAV file
sf.write('output.wav', audio, 24000)
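
To compare the bundled voices, the same call can be looped over the list above; a minimal sketch using the quick-start API (output file names are arbitrary):

from kittentts import KittenTTS
import soundfile as sf

m = KittenTTS("KittenML/kitten-tts-nano-0.8-int8")
text = "This high quality TTS model works without a GPU"

# Render the same sentence once per voice for a side-by-side comparison.
for voice in ['Bella', 'Jasper', 'Luna', 'Bruno', 'Rosie', 'Hugo', 'Kiki', 'Leo']:
    audio = m.generate(text, voice=voice)
    sf.write(f"output_{voice.lower()}.wav", audio, 24000)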

Acknowledgements

StyleTTS 2 architecture

Author: KittenML

Likes: 5

Downloads: 0

Tags: onnx, license:apache-2.0, region:us

KittenML/kitten-tts-micro-0.8


license: apache-2.0

Kitten TTS Micro 0.8 ๐Ÿ˜ป

Kitten TTS is an open-source, realistic text-to-speech model with 40 million parameters and a file size of around 40 MB.

๐Ÿš€ Quick Start

Installation

pip install https://github.com/KittenML/KittenTTS/releases/download/0.8/kittentts-0.8.0-py3-none-any.whl

Basic Usage

from kittentts import KittenTTS
import soundfile as sf

m = KittenTTS("KittenML/kitten-tts-micro-0.8")

# Available voices: 'Bella', 'Jasper', 'Luna', 'Bruno', 'Rosie', 'Hugo', 'Kiki', 'Leo'
audio = m.generate("This high quality TTS model works without a GPU", voice='Jasper')

# Save the audio as a 24 kHz WAV file
sf.write('output.wav', audio, 24000)
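
To get a feel for CPU speed, you can time a generation and estimate the real-time factor; a minimal sketch, assuming the returned audio is a 1-D array of samples at the 24 kHz rate used above:

import time

from kittentts import KittenTTS
import soundfile as sf

m = KittenTTS("KittenML/kitten-tts-micro-0.8")
text = "This high quality TTS model works without a GPU"

# Time one synthesis call and compare it against the audio duration.
start = time.perf_counter()
audio = m.generate(text, voice='Jasper')
elapsed = time.perf_counter() - start

audio_seconds = len(audio) / 24000  # sample count / sample rate
print(f"{audio_seconds:.2f}s of audio in {elapsed:.2f}s "
      f"(real-time factor {elapsed / audio_seconds:.2f})")

sf.write('output.wav', audio, 24000)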

Acknowledgements

StyleTTS 2 architecture

Author: KittenML

Likes: 4

Downloads: 0

Tags: onnx, license:apache-2.0, region:us

tomngdev/MiniMax-M2.5-REAP-139B-A10B-GGUF


license: other
license_name: modified-mit
license_link: https://github.com/MiniMax-AI/MiniMax-M2.5/blob/main/LICENSE
base_model: cerebras/MiniMax-M2.5-REAP-139B-A10B
pipeline_tag: text-generation

MiniMax-M2.5-REAP-139B-A10B-GGUF

Simple quantizations of cerebras/MiniMax-M2.5-REAP-139B-A10B using the default parameters in llama-quantize. Nothing fancy.

Author: tomngdev

Likes: 4

Downloads: 7

Tags: gguf, text-generation, base_model:cerebras/MiniMax-M2.5-REAP-139B-A10B, base_model:quantized:cerebras/MiniMax-M2.5-REAP-139B-A10B, license:other, endpoints_compatible, region:us, conversational

drbaph/FireRed-Image-Edit-1.0_ComfyUI_Quants


license: apache-2.0
language: en, zh
base_model: FireRedTeam/FireRed-Image-Edit-1.0
pipeline_tag: image-text-to-image
tags: comfyui, comfy, fp8, nvfp4, quants, image, edit, firered

๐Ÿ—œ๏ธ ComfyUI Quantised Models for FireRed-Image-Edit-1.0

This repo contains the quantised, ComfyUI-ready models for FireRed-Image-Edit-1.0.

| Model | Download |
|-------|----------|
| firered_image_edit_1.0_fp8_e4m3fn.safetensors | ⬇️ Download |
| firered_image_edit_1.0_fp8_e5m2.safetensors | ⬇️ Download |
| firered_image_edit_1.0_nvfp4.safetensors | ⬇️ Download |

We've provided a ComfyUI workflow to get you started: โฌ‡๏ธ Download Workflow


๐Ÿ–ผ๏ธ Sample Results

| Input | Output — "make the bear wear a tuxedo a hat and a jacket, holding his flower" | Output — "make the bear into a panda plushie wearing red top, holding his flower" |
|:-----:|:-----:|:-----:|
| Sample Input | Sample Output 0 | Sample Output 1 |


<p align="center"> <img src="./assets/logo.png" width="600"/> <p> <p align="center" style="line-height: 1;"> <a href="https://huggingface.co/FireRedTeam" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-FireRedTeam-ffc107?color=ffc107&logoColor=white" style="display: inline-block;"/></a> <a href="https://huggingface.co/FireRedTeam/FireRed-Image-Edit-1.0" target="_blank"><img alt="Hugging Face Model" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-FireRed--Image--Edit--1.0-red" style="display: inline-block;"/></a> <a href="https://huggingface.co/spaces/FireRedTeam/FireRed-Image-Edit-1.0" target="_blank"><img alt="Demo" src="https://img.shields.io/badge/%F0%9F%92%BB%20Demo-FireRed--Image--Edit--1.0-red" style="display: inline-block;"/></a> </p> <p align="center" style="line-height: 1;"> ๐Ÿค— <a href="https://huggingface.co/FireRedTeam/FireRed-Image-Edit-1.0">HuggingFace</a> | ๐Ÿ–ฅ๏ธ <a href="https://huggingface.co/spaces/FireRedTeam/FireRed-Image-Edit-1.0"> Demo</a> | ๐Ÿ“„ <a href="https://github.com/FireRedTeam/FireRed-Image-Edit/blob/main/assets/FireRed_Image_Edit_1_0_Techinical_Report.pdf">Technical Report</a> </p> <p align="center"> <img src="./assets/teaser.png" width="800"/> <p>

๐Ÿ”ฅ FireRed-Image-Edit

FireRed-Image-Edit is a general-purpose image editing model that delivers high-fidelity and consistent editing across a wide range of scenarios.

โœจ Key Features

  • Strong Editing Performance: FireRed-Image-Edit delivers leading open-source results with accurate instruction following, high image quality, and consistent visual coherence.
  • Native Editing Capability: Built directly on a text-to-image foundation model and endowed with editing capabilities.
  • Text Style Preservation: Maintains text styles with high fidelity, achieving performance comparable to closed-source solutions.
  • Photo Restoration: High-quality old photo restoration and enhancement.
  • Multi-Image Editing: Flexible editing of multiple images such as virtual try-on.

๐Ÿ“ฐ News

  • 2026.02.14: We released FireRed-Image-Edit-1.0 model weights. Check more details in the Model Zoo section.
  • 2026.02.10: We released the Technical Report of FireRed-Image-Edit-1.0.

๐ŸŽจ Showcase

Some real outputs produced by FireRed-Image-Edit across general editing scenarios.

<p align="center"> <img src="./assets/showcase.png" width="800"/> <p>

๐Ÿ—‚๏ธ Model Zoo

<div style="overflow-x: auto; margin-bottom: 16px;"> <table style="border-collapse: collapse; width: 100%;"> <thead> <tr> <th style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Models</th> <th style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Task</th> <th style="padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Description</th> <th style="padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Download Link</th> </tr> </thead> <tbody> <tr> <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">FireRed-Image-Edit-1.0</td> <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Image-Editing</td> <td style="padding: 8px; border: 1px solid #d0d7de;">General-purpose image editing model</td> <td style="padding: 8px; border: 1px solid #d0d7de;"> <span style="white-space: nowrap;">๐Ÿค—&nbsp;<a href="https://huggingface.co/FireRedTeam/FireRed-Image-Edit-1.0">HuggingFace</a></span> </td> </tr> <tr> <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">FireRed-Image-Edit-1.0-Distilled</td> <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Image-Editing</td> <td style="padding: 8px; border: 1px solid #d0d7de;">Distilled version of FireRed-Image-Edit-1.0 for faster inference</td> <td style="padding: 8px; border: 1px solid #d0d7de;"> <span style="white-space: nowrap;">To be released</span> </td> </tr> <tr> <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">FireRed-Image</td> <td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Text-to-Image</td> <td style="padding: 8px; border: 1px solid #d0d7de;">High-quality text-to-image generation model</td> <td style="padding: 8px; border: 1px solid #d0d7de;"> <span style="white-space: nowrap;">To be released</span> </td> </tr> </tbody> </table> </div>

๐Ÿ—๏ธ Model Architecture

<p align="center"> <img src="./assets/architecture.png" width="800"/> <p>

๐Ÿ–Š๏ธ Citation

If you find our work useful, please consider citing it.

@article{firered2026rededit,
      title={FireRed-Image-Edit: A General-Purpose Image Editing Model}, 
      author={Super Intelligence Team},
      year={2026},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/xxxx.xxxxx}, 
}

โš ๏ธ Ethics Statement

FireRed-Image-Edit has not been specifically designed or comprehensively evaluated for every possible downstream application. Users should be aware of the potential risks and ethical considerations when using this project, and should use it responsibly and in compliance with all applicable laws and regulations.

  • Prohibited Use: This project must not be used to generate content that is illegal, defamatory, pornographic, harmful, or that violates the privacy, rights, or interests of individuals or organizations.
  • User Responsibility: Users are solely responsible for any content generated using this project. The authors and contributors assume no responsibility or liability for any misuse of the codebase or for any consequences resulting from its use.

๐Ÿค Acknowledgements

We would like to thank the developers of the amazing open-source projects we build on, including Qwen-Image, Diffusers, and Hugging Face.

โญ Star History

Star History Chart

Author: drbaph

Likes: 3

Downloads: 0

Tags: comfyui, comfy, fp8, nvfp4, quants, image, edit, firered, image-text-to-image, en, zh, base_model:FireRedTeam/FireRed-Image-Edit-1.0, base_model:finetune:FireRedTeam/FireRed-Image-Edit-1.0, license:apache-2.0, region:us

BennyDaBall/MiniMax-M2.5-REAP-139B-A10B-GGUF


license: other
base_model: tomngdev/MiniMax-M2.5-REAP-139B-A10B-GGUF
language: en
tags: gguf, minimax, moe, reap, text-generation
pipeline_tag: text-generation

MiniMax-M2.5-REAP-139B-A10B-GGUF

This is the REAP model in practical pants: high-quality GGUF quants for local inference without setting your workstation on fire.

Built from:

  • Base: MiniMaxAI/MiniMax-M2.5
  • REAP source: tomngdev/MiniMax-M2.5-REAP-139B-A10B-GGUF (BF16 split)
  • Quantized locally with llama.cpp on Strix Halo + high RAM mode.

Available Quants

| Quant | Status | Size (GiB) | Notes |
|---|---|---:|---|
| Q8_0 | uploaded | 137.78 | Highest quality quant in this pack |
| Q5_K_M | uploading | 92.33 | Better quality/size balance |
| Q4_K_M | uploaded | 78.83 | Strong practical default |
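
As a rough guide to what these file sizes mean, the approximate bits per weight can be backed out from the 139B total parameter count in the model name; a quick sketch (numbers are approximate and ignore GGUF metadata overhead):

# Approximate bits per weight implied by the sizes in the table above.
TOTAL_PARAMS = 139e9  # from the model name: 139B total parameters

quants_gib = {"Q8_0": 137.78, "Q5_K_M": 92.33, "Q4_K_M": 78.83}

for name, gib in quants_gib.items():
    bits = gib * 1024**3 * 8  # file size in bits
    print(f"{name}: ~{bits / TOTAL_PARAMS:.2f} bits per weight")

# Works out to roughly 8.5, 5.7, and 4.9 bits per weight respectively.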

File Layout

All quants are split GGUF sets (00001-of-00007 etc.) for safer handling of very large models.

Quality Notes

  • These are generated from BF16 REAP GGUF, not requantized from lower precision.
  • Token embedding and output tensors are kept at Q8_0 during quantization for quality retention.

Usage

Point llama.cpp at the first shard of a quant; it auto-discovers the sibling shards:

llama-cli -m MiniMax-M2.5-REAP-Q4_K_M-00001-of-00007.gguf -ngl 0 -c 8192
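
If you prefer Python over the CLI, the llama-cpp-python bindings wrap the same llama.cpp loader, so pointing them at the first shard should work the same way; a minimal sketch, assuming llama-cpp-python is installed:

from llama_cpp import Llama

# Load the first shard; the llama.cpp loader picks up the sibling shards.
llm = Llama(
    model_path="MiniMax-M2.5-REAP-Q4_K_M-00001-of-00007.gguf",
    n_ctx=8192,      # same context size as the llama-cli example above
    n_gpu_layers=0,  # CPU-only, matching -ngl 0
)

out = llm("Write one sentence about local inference.", max_tokens=64)
print(out["choices"][0]["text"])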

Credits

  • MiniMaxAI for MiniMax-M2.5
  • tomngdev for the BF16 REAP GGUF release
  • BennyDaBall for this quant pack

Disclaimer

You are responsible for your own use, outputs, and compliance with applicable laws and platform policies.

Author: BennyDaBall

Likes: 3

Downloads: 0

Tags: gguf, minimax, moe, reap, text-generation, en, base_model:tomngdev/MiniMax-M2.5-REAP-139B-A10B-GGUF, base_model:quantized:tomngdev/MiniMax-M2.5-REAP-139B-A10B-GGUF, license:other, endpoints_compatible, region:us, conversational