MiMo v2.5

Model Information

Display Name: MiMo v2.5

API Model ID: xiaomi/mimo-v2.5

Category: Image To Text

Description: MiMo v2.5 is Xiaomi's omnimodal reasoning model with 310 billion total parameters and 15 billion active per token. It stands out as the only model in the MiMo family with native multimodal understanding, processing text, images, video, and audio in a unified architecture.

Architecture:

  • 310B total parameters (sparse MoE), 15B active per token
  • 48 layers (1 dense + 47 MoE) with 256 routed experts, 8 selected per token
  • Hybrid attention: 9 full attention + 39 sliding window layers
  • Native Vision Encoder: 729M-param ViT (28 layers: 24 SWA + 4 Full)
  • Native Audio Encoder: 261M-param Audio Transformer (24 layers: 12 SWA + 12 Full)
  • Multi-Token Prediction: 329M parameters across 3 layers
  • Trained on ~48T tokens with FP8 mixed precision

Key Features:

  • 1M token context window
  • Full omnimodal: text, image, video, and audio understanding
  • Function calling, tool use, and structured JSON outputs
  • Agentic RL post-training for reasoning tasks
  • OpenAI-compatible API
  • MIT License (fully open-source)

Best For:

  • Multimodal applications (text + vision + audio)
  • General-purpose text generation at low cost
  • Code assistance and generation
  • Document analysis and summarization
  • High-volume API usage with excellent cost-efficiency
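The architecture figures in the description can be sanity-checked with a little arithmetic. The sketch below only re-uses numbers quoted above; the variable and function names are illustrative and not part of any Xiaomi or HInow.ai tooling.

```python
# Figures copied from the architecture list in the description.
TOTAL_PARAMS_B = 310      # total parameters, in billions
ACTIVE_PARAMS_B = 15      # parameters active per token, in billions
ROUTED_EXPERTS = 256      # routed experts per MoE layer
EXPERTS_PER_TOKEN = 8     # experts selected per token

# Each entry is (first part, second part, stated total).
LAYER_SPLITS = {
    "backbone (dense + MoE)": (1, 47, 48),
    "backbone attention (full + sliding window)": (9, 39, 48),
    "vision encoder (SWA + full)": (24, 4, 28),
    "audio encoder (SWA + full)": (12, 12, 24),
}

def inconsistent_splits(splits):
    """Return the names of splits whose two parts do not add up to the stated total."""
    return [name for name, (a, b, total) in splits.items() if a + b != total]

# Share of all parameters that is active for a single token.
active_param_share = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
# Share of routed experts consulted for a single token.
expert_share = EXPERTS_PER_TOKEN / ROUTED_EXPERTS

print(f"active parameter share: {active_param_share:.2%}")
print(f"routed experts used per token: {expert_share:.3%}")
print("inconsistent layer splits:", inconsistent_splits(LAYER_SPLITS))
```

Roughly 5% of the parameters are active per token, and each token is routed to 8 of 256 experts (3.125%). The two ratios need not match, since non-expert weights such as attention typically run for every token.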

Context Window: 1,000,000 tokens

Max Output: 32,768 tokens

How to Use This Model

To use MiMo v2.5 via the HInow.ai API, use the model ID: xiaomi/mimo-v2.5

API Request Example (Chat/Text)


POST https://api.hinow.ai/v1/chat/completions
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "model": "xiaomi/mimo-v2.5",
  "messages": [
    {"role": "user", "content": "Your message here"}
  ]
}
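The same request can be issued from Python using only the standard library. This is a minimal sketch, not an official client: the helper names are hypothetical, the `HINOW_API_KEY` environment variable follows the curl example on this page, and the response shape (`choices[0].message.content`) is an assumption based on the API being described as OpenAI-compatible.

```python
import json
import os
import urllib.request

API_URL = "https://api.hinow.ai/v1/chat/completions"
MODEL_ID = "xiaomi/mimo-v2.5"

def build_chat_request(prompt, api_key):
    """Build (but do not send) the POST request shown above."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def send(request, timeout=60):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

if __name__ == "__main__":
    key = os.environ.get("HINOW_API_KEY")
    if key:  # only call the API when a key is configured
        reply = send(build_chat_request("Your message here", key))
        # Assumption: OpenAI-style response layout.
        print(reply["choices"][0]["message"]["content"])
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.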
              

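API Request Example (Image Input)


POST https://api.hinow.ai/v1/chat/completions
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "model": "xiaomi/mimo-v2.5",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
      ]
    }
  ]
}
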

Pricing

  • input: $0.182 per 1M tokens
  • output: $0.364 per 1M tokens
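As a rough illustration of what these rates mean per request, the sketch below estimates cost from token counts. It assumes the prices are quoted per 1M tokens, as stated elsewhere on this page, and that billing is linear in token count; the function name is hypothetical.

```python
INPUT_PRICE_PER_M = 0.182    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.364   # USD per 1M output tokens
TOKENS_PER_UNIT = 1_000_000

def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request in USD, assuming linear per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / TOKENS_PER_UNIT

# A 200,000-token prompt with a 4,000-token reply.
print(f"${estimate_cost_usd(200_000, 4_000):.4f}")  # prints $0.0379
```

Under these assumptions, a prompt that fills the entire 1,000,000-token context window costs about $0.182 in input tokens.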

Available Parameters

  • temperature: Controls randomness (0-2). Default: 0.7 (Options: 0, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0)
  • top_p: Nucleus sampling (0-1). Default: 0.9 (Options: 0.1, 0.5, 0.7, 0.9, 0.95, 1.0)
  • max_tokens: Max tokens to generate (1-32768) (Options: 512, 1024, 2048, 4096, 8192, 16384, 32768)
  • frequency_penalty: Reduce token repetition (0-2). Default: 0 (Options: 0, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0)
  • presence_penalty: Penalize repeated topics (0-2). Default: 0 (Options: 0, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0)
  • response_format: Output format (Options: text, json_object)
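The documented ranges can be checked client-side before a request is sent. The sketch below is one possible validator, not part of the HInow.ai API: it assumes the ranges are inclusive and treats the listed "Options" as presets rather than the only accepted values. Numeric strings are accepted because the curl example on this page passes values as strings; that leniency is a choice of this sketch.

```python
# Inclusive ranges taken from the parameter list above.
PARAM_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "max_tokens": (1, 32768),
    "frequency_penalty": (0.0, 2.0),
    "presence_penalty": (0.0, 2.0),
}
RESPONSE_FORMATS = {"text", "json_object"}

def _as_number(value):
    """Return value as a float, or None if it is not numeric (bools are rejected)."""
    if isinstance(value, bool):
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def validate_params(params):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, value in params.items():
        if name == "response_format":
            if value not in RESPONSE_FORMATS:
                problems.append(f"response_format: {value!r} is not text or json_object")
        elif name in PARAM_RANGES:
            low, high = PARAM_RANGES[name]
            number = _as_number(value)
            if number is None:
                problems.append(f"{name}: {value!r} is not a number")
            elif name == "max_tokens" and not number.is_integer():
                problems.append("max_tokens: must be a whole number")
            elif not low <= number <= high:
                problems.append(f"{name}: {value!r} is outside {low}-{high}")
        else:
            problems.append(f"{name}: not in the documented parameter list")
    return problems

print(validate_params({"temperature": 0.7, "top_p": 0.9, "max_tokens": 512}))
print(validate_params({"temperature": 2.5, "response_format": "xml"}))
```

The first call prints an empty list; the second reports one problem for each offending parameter.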

Quick Reference

To use this model, set: "model": "xiaomi/mimo-v2.5"

Featured: No

Documentation: https://hinow.ai/models/xiaomi/mimo-v2.5

API Endpoint: https://api.hinow.ai/v1

Pricing: $0.182 / $0.364 per 1M tokens (input / output)

Capabilities

Image To Text, Text To Text
Context: 1,000,000 tokens
Max Output: 32,768 tokens

Code Examples

curl -X POST https://api.hinow.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $HINOW_API_KEY" \
  -d '{
    "model": "xiaomi/mimo-v2.5",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image"},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ],
    "parameters": {
      "temperature": "0",
      "top_p": "0.1",
      "max_tokens": "512",
      "frequency_penalty": "0",
      "presence_penalty": "0",
      "response_format": "text"
    }
  }'
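For Python callers, the multimodal message from the curl request above can be assembled with a small helper. This sketch reproduces only the `model` and `messages` fields of that request and leaves sampling parameters out; the helper names are hypothetical.

```python
import json

MODEL_ID = "xiaomi/mimo-v2.5"

def build_image_message(prompt, image_url):
    """Build a user message that pairs a text prompt with an image URL."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

def build_payload(messages, model=MODEL_ID):
    """Wrap a list of messages in a request body for /v1/chat/completions."""
    return {"model": model, "messages": list(messages)}

payload = build_payload(
    [build_image_message("Describe this image", "https://example.com/image.jpg")]
)
# Print the JSON body that would be sent with the POST request.
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and posted to the same endpoint with any HTTP client.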