Moonshot AI · mit
Kimi K2.5
Kimi K2.5 by Moonshot AI appears in 1 source, with a Reasoning index of 49.83. Best read for: Code quality, Multimodal, Open weights.
Creator: Moonshot AI
Release date: 2026-01-27
Knowledge cutoff: Not published
Context: Unknown
Input price: Unknown
Output price: Unknown
Modality: text + vision
Country: CN
Metrics
All source-backed metrics
LLM Stats Rank: 15 (ranking · source_rank)
Reasoning: 49.83 (reasoning · index_reasoning)
Math: 47.55 (math · index_math)
Coding: 31.45 (coding · index_code)
Research: 29.97 (research · index_search)
Vision: 35.3 (multimodal · index_vision)
Tool calling: 13.7 (tool_calling · index_tool_calling)
Long context: 38.17 (long_context · index_long_context)
Finance: 46.8 (domain · index_finance)
Legal: 46.79 (domain · index_legal)
Healthcare: 47.98 (domain · index_healthcare)
GPQA: 87.6% (reasoning · gpqa_score)
AIME 2025: 96.1% (math · aime_2025_score)
SWE-bench Verified: 76.8% (coding · swe_bench_verified_score)
Code Arena: 1,462.16 (coding · coding_arena_score)
Humanity's Last Exam: 50.2% (reasoning · hle_score)
MMMU-Pro: 78.5% (multimodal · mmmu_pro_score)
BrowseComp: 74.9% (research · browsecomp_score)
SciCode: 48.7% (coding · scicode_score)
SWE-bench Pro: 50.7% (coding · swe_bench_pro_score)
CharXiv-R: 77.5% (multimodal · charxiv_r_score)
Parameters: 1,000,000,000,000 params (model · params)
Evidence
Citations and source overlap
FAQ
How should I read this profile?
Treat this as a source-backed model dossier, not an EvalKit-run verification. The public values are replicated from linked sources and kept source-scoped.
Is Kimi K2.5 verified by EvalKit?
No. EvalKit shows 0 verified rows for this model and will keep doing so until real run evidence exists.
Why can metrics disagree?
Different sources test different tasks, dates, prompts, and aggregation methods. EvalKit keeps those differences visible instead of merging them into a fake universal score.
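To make "source-scoped" concrete, here is a minimal Python sketch of how per-source values could be stored side by side without being averaged. It is an illustration only, not EvalKit's actual data model: the `MetricRow` class, the `group_by_metric` helper, and the source labels `source_a` and `source_b` are all hypothetical, and the second GPQA value is invented purely to show a disagreement.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRow:
    """One reported value, tied to the source that published it (hypothetical schema)."""
    source: str
    metric: str
    value: float
    verified: bool = False  # stays False until real run evidence exists


def group_by_metric(rows):
    """Group rows per metric while keeping each source's value separate."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row.metric][row.source] = row.value
    return dict(grouped)


rows = [
    MetricRow("source_a", "gpqa_score", 87.6),       # value listed in this profile
    MetricRow("source_b", "gpqa_score", 85.0),       # invented value, illustration only
    MetricRow("source_a", "aime_2025_score", 96.1),  # value listed in this profile
]

scoped = group_by_metric(rows)
verified_count = sum(row.verified for row in rows)

for metric, by_source in scoped.items():
    print(metric, by_source)
print("verified rows:", verified_count)
```

Both GPQA values survive under their own source label, so a reader can see the disagreement directly instead of a single blended number.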