MiMo v2 Flash
Model Information
Display Name: MiMo v2 Flash
API Model ID: xiaomi/mimo-v2-flash
Category: Text To Text
Description: MiMo v2 Flash is Xiaomi's fastest and most cost-effective reasoning model. With 309 billion total parameters and 15 billion active per token, it is optimized for efficient inference through hybrid attention and multi-token prediction, reducing KV-cache usage by ~6x.
**Architecture:**
- 309B total parameters (sparse MoE), 15B active per token
- Hybrid attention with 5:1 SWA-to-Global ratio and 128-token window (~6x KV-cache reduction)
- Multi-Token Prediction: 0.33B parameters per block with dense FFNs
- Trained on 27T tokens with FP8 mixed precision
- Optimized for efficient inference without sacrificing reasoning quality
**Key Features:**
- 256K token context window
- Ultra-fast inference with reduced memory footprint
- Large-scale agentic RL training (100K+ verifiable code tasks)
- Function calling, tool use, and structured JSON outputs
- OpenAI-compatible API
- MIT License (fully open-source)
**Best For:**
- Low-latency applications requiring fast responses
- High-throughput and high-volume workloads
- Quick coding assistance and code completion
- Budget-friendly AI integration
- Real-time chat and interactive applications
- Edge and resource-constrained deployments
Context Window: 256,000 tokens
Max Output: 32,768 tokens
How to Use This Model
To use MiMo v2 Flash via the HInow.ai API, use the model ID: xiaomi/mimo-v2-flash
API Request Example (Chat/Text)
POST https://api.hinow.ai/v1/chat/completions
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
{
  "model": "xiaomi/mimo-v2-flash",
  "messages": [
    {"role": "user", "content": "Your message here"}
  ]
}
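The request above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_chat_request` and `send_chat` are hypothetical, and the 60-second timeout is an arbitrary choice. The URL, headers, and body fields are the ones shown in the request example.

```python
import json
import urllib.request

# Endpoint and model ID as given in the request example above.
API_URL = "https://api.hinow.ai/v1/chat/completions"
MODEL_ID = "xiaomi/mimo-v2-flash"


def build_chat_request(api_key: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the example above."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(api_key: str, user_message: str) -> dict:
    """Send the request and return the parsed JSON response body."""
    request = build_chat_request(api_key, user_message)
    # Timeout of 60 seconds is an illustrative choice, not a documented value.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_chat("YOUR_API_KEY", "Your message here")` returns the decoded response. Because the API is described as OpenAI-compatible, the reply text is probably found at `choices[0].message.content`, but that response shape is an assumption and is not stated on this page.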
Pricing
- input: $0.13
- output: $0.39
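The prices above do not state a unit. The sketch below estimates the cost of a request under the assumption that both figures are in USD per 1,000,000 tokens, a common convention that this page does not confirm. The function name `estimate_cost` is hypothetical, and the unit is exposed as a parameter so it can be corrected.

```python
# Prices copied from the list above.
INPUT_PRICE = 0.13
OUTPUT_PRICE = 0.39

# ASSUMPTION: prices are per 1,000,000 tokens. The page does not state the unit.
TOKENS_PER_PRICE_UNIT = 1_000_000


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    tokens_per_unit: int = TOKENS_PER_PRICE_UNIT,
) -> float:
    """Return the estimated cost in USD for one request."""
    total = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
    return total / tokens_per_unit
```

If the actual billing unit differs, pass it as `tokens_per_unit` rather than editing the price constants.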
Available Parameters
- temperature: Controls randomness (0-2). Default: 0.7 (Options: 0, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0)
- top_p: Nucleus sampling (0-1). Default: 0.9 (Options: 0.1, 0.5, 0.7, 0.9, 0.95, 1.0)
- max_tokens: Max tokens to generate (1-32768) (Options: 512, 1024, 2048, 4096, 8192, 16384, 32768)
- frequency_penalty: Reduce token repetition (0-2). Default: 0 (Options: 0, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0)
- presence_penalty: Penalize repeated topics (0-2). Default: 0 (Options: 0, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0)
- response_format: Output format (Options: text, json_object)
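The parameters listed above can be combined into a request body as in the following sketch, which checks each value against the documented range before adding it. The helper `build_payload` is hypothetical. Wrapping `response_format` as `{"type": ...}` follows the OpenAI convention and is an assumption here; the page only lists the option names. Whether the API accepts any value in a range or only the listed options is also not stated, so only the range is checked.

```python
# Ranges copied from the parameter list above.
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_tokens": (1, 32768),
    "frequency_penalty": (0, 2),
    "presence_penalty": (0, 2),
}
RESPONSE_FORMATS = ("text", "json_object")


def build_payload(messages, response_format=None, **params):
    """Build a chat request body, validating optional sampling parameters."""
    payload = {"model": "xiaomi/mimo-v2-flash", "messages": messages}
    for name, value in params.items():
        if name not in RANGES:
            raise ValueError(f"unknown parameter: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        payload[name] = value
    if response_format is not None:
        if response_format not in RESPONSE_FORMATS:
            raise ValueError(f"response_format must be one of {RESPONSE_FORMATS}")
        # ASSUMPTION: OpenAI-style wrapper object; not confirmed by this page.
        payload["response_format"] = {"type": response_format}
    return payload
```

For example, `build_payload([{"role": "user", "content": "Hi"}], temperature=0.3, max_tokens=512)` produces a body with both parameters set, while `temperature=3` raises a `ValueError`.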
Quick Reference
To use this model, set: "model": "xiaomi/mimo-v2-flash"
Featured: No
Documentation: https://hinow.ai/models/xiaomi/mimo-v2-flash
API Endpoint: https://api.hinow.ai/v1


