MiMo v2 Flash

Model Information

Display Name: MiMo v2 Flash

API Model ID: xiaomi/mimo-v2-flash

Category: Text To Text

Description: MiMo v2 Flash is Xiaomi's fastest and most cost-effective reasoning model. With 309 billion total parameters and 15 billion active per token, it is optimized for efficient inference through hybrid attention and multi-token prediction, reducing KV-cache usage by ~6x.

Architecture:

  • 309B total parameters (sparse MoE), 15B active per token
  • Hybrid attention with 5:1 SWA-to-Global ratio and 128-token window (~6x KV-cache reduction)
  • Multi-Token Prediction: 0.33B parameters per block with dense FFNs
  • Trained on 27T tokens with FP8 mixed precision
  • Optimized for efficient inference without sacrificing reasoning quality

Key Features:

  • 256K token context window
  • Ultra-fast inference with reduced memory footprint
  • Large-scale agentic RL training (100K+ verifiable code tasks)
  • Function calling, tool use, and structured JSON outputs
  • OpenAI-compatible API
  • MIT License (fully open-source)

Best For:

  • Low-latency applications requiring fast responses
  • High-throughput and high-volume workloads
  • Quick coding assistance and code completion
  • Budget-friendly AI integration
  • Real-time chat and interactive applications
  • Edge and resource-constrained deployments
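
The ~6x KV-cache figure can be sanity-checked from the 5:1 layer ratio and the 128-token window. The sketch below is a back-of-the-envelope estimate, not Xiaomi's implementation: it assumes every layer has the same per-token cache cost and that layers come in groups of five sliding-window layers plus one global layer. The function name kv_cache_reduction is made up for this illustration.

```python
def kv_cache_reduction(seq_len: int, swa_per_global: int = 5, window: int = 128) -> float:
    """Ratio of full-attention KV-cache size to hybrid KV-cache size.

    Assumes a uniform per-token, per-layer cache cost, so only the
    number of cached tokens per layer matters.
    """
    layers = swa_per_global + 1
    # Baseline: every layer in the group caches every token.
    full = layers * seq_len
    # Hybrid: one global layer caches everything, each sliding-window
    # layer caches at most `window` tokens.
    hybrid = seq_len + swa_per_global * min(seq_len, window)
    return full / hybrid


for n in (1_000, 32_000, 256_000):
    print(n, round(kv_cache_reduction(n), 2))
```

Under these assumptions the ratio is 1.0 for sequences shorter than the window and approaches 6 as the sequence grows, which is consistent with the "~6x" claim for long contexts.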

Context Window: 256,000 tokens

Max Output: 32,768 tokens
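
A minimal sketch of how the two limits above might be combined when choosing max_tokens. It assumes the prompt and the completion share the 256,000-token context window, which the listing does not state explicitly. clamp_max_tokens is a hypothetical helper, not part of the HInow.ai API.

```python
CONTEXT_WINDOW = 256_000  # tokens, from the listing
MAX_OUTPUT = 32_768       # tokens, from the listing


def clamp_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Return the largest usable max_tokens for a prompt of this size.

    Assumption: prompt tokens and generated tokens both count against
    the context window.
    """
    if prompt_tokens >= CONTEXT_WINDOW:
        raise ValueError("prompt alone fills or exceeds the context window")
    room = CONTEXT_WINDOW - prompt_tokens
    return max(1, min(requested, MAX_OUTPUT, room))
```

For example, a 250,000-token prompt leaves room for at most 6,000 generated tokens, while a short prompt is capped by the 32,768-token output limit.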

How to Use This Model

To use MiMo v2 Flash via the HInow.ai API, use the model ID: xiaomi/mimo-v2-flash

API Request Example (Chat/Text)


POST https://api.hinow.ai/v1/chat/completions
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "model": "xiaomi/mimo-v2-flash",
  "messages": [
    {"role": "user", "content": "Your message here"}
  ]
}
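
The same request can be built in Python with only the standard library. This is a sketch under the assumption that the endpoint accepts exactly the JSON body shown above. build_request and send are illustrative names, and the HINOW_API_KEY environment variable mentioned in the usage note is a convention, not something the API requires.

```python
import json
import urllib.request

API_URL = "https://api.hinow.ai/v1/chat/completions"
MODEL_ID = "xiaomi/mimo-v2-flash"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Usage: send(build_request("Your message here", os.environ["HINOW_API_KEY"])), after importing os and exporting the key.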
              

Pricing

  • input: $0.13 per 1M tokens
  • output: $0.39 per 1M tokens
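
As a worked example of the pricing arithmetic, the sketch below estimates the cost of a single request. It assumes the listed prices are in US dollars per 1 million tokens, as the per-1M-token figure elsewhere on this page indicates. estimate_cost_usd is a hypothetical helper.

```python
INPUT_USD_PER_M = 0.13   # listed input price
OUTPUT_USD_PER_M = 0.39  # listed output price


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost in USD, assuming per-1M-token pricing."""
    return (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000


print(f"${estimate_cost_usd(200_000, 50_000):.4f}")
```

Here 200,000 input tokens cost $0.026 and 50,000 output tokens cost $0.0195, so the request comes to roughly $0.0455.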

Available Parameters

  • temperature: Controls randomness (0-2). Default: 0.7 (Options: 0, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0)
  • top_p: Nucleus sampling (0-1). Default: 0.9 (Options: 0.1, 0.5, 0.7, 0.9, 0.95, 1.0)
  • max_tokens: Max tokens to generate (1-32768) (Options: 512, 1024, 2048, 4096, 8192, 16384, 32768)
  • frequency_penalty: Reduce token repetition (0-2). Default: 0 (Options: 0, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0)
  • presence_penalty: Penalize repeated topics (0-2). Default: 0 (Options: 0, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0)
  • response_format: Output format (Options: text, json_object)
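
The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch built only from the ranges listed above. validate_params is an illustrative helper, and the server may apply different or additional rules.

```python
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (0.0, 2.0),
    "presence_penalty": (0.0, 2.0),
}
RESPONSE_FORMATS = {"text", "json_object"}
MAX_TOKENS_LIMIT = 32_768


def validate_params(params: dict) -> list[str]:
    """Return a list of human-readable problems (empty if none found)."""
    errors = []
    for name, (low, high) in RANGES.items():
        if name not in params:
            continue
        value = params[name]
        if not isinstance(value, (int, float)) or not low <= value <= high:
            errors.append(f"{name} must be a number in [{low}, {high}]")
    if "max_tokens" in params:
        value = params["max_tokens"]
        if not isinstance(value, int) or not 1 <= value <= MAX_TOKENS_LIMIT:
            errors.append(f"max_tokens must be an integer in [1, {MAX_TOKENS_LIMIT}]")
    if "response_format" in params and params["response_format"] not in RESPONSE_FORMATS:
        errors.append("response_format must be 'text' or 'json_object'")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every invalid setting at once.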

Quick Reference

To use this model, set: "model": "xiaomi/mimo-v2-flash"

Documentation: https://hinow.ai/models/xiaomi/mimo-v2-flash

API Endpoint: https://api.hinow.ai/v1

$0.130 / $0.390
per 1M tokens (in/out)

Code Examples

curl -X POST https://api.hinow.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $HINOW_API_KEY" \
  -d '{
    "model": "xiaomi/mimo-v2-flash",
    "messages": [
      {"role": "user", "content": "Hello! How are you?"}
    ],
    "parameters": {
      "temperature": "0",
      "top_p": "0.1",
      "max_tokens": "512",
      "frequency_penalty": "0",
      "presence_penalty": "0",
      "response_format": "text"
    }
  }'
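
The page does not document the response body. Since the API is described as OpenAI-compatible, a reasonable assumption is that the reply text sits at choices[0].message.content. The sketch below relies on that assumption, and both helper names are hypothetical.

```python
import json


def extract_text(response: dict) -> str:
    """Pull the assistant's reply out of an OpenAI-style response dict.

    Assumption: the response follows the OpenAI chat completions shape.
    """
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    content = choices[0].get("message", {}).get("content")
    if content is None:
        raise ValueError("first choice has no message content")
    return content


def extract_json(response: dict):
    """Decode the reply as JSON, e.g. after requesting json_object output."""
    return json.loads(extract_text(response))
```

extract_json raises json.JSONDecodeError if the model returned something other than valid JSON, so callers may want to catch that and retry.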