Alibaba Cloud / Qwen Team · proprietary
Qwen3.7 Max
Qwen3.7 Max by Alibaba Cloud / Qwen Team appears in 1 source, which reports a Reasoning index of 60.25. Best suited for: code quality, LLM, long context.
Creator: Alibaba Cloud / Qwen Team
Release date: 2026-05-19
Knowledge cutoff: Not published
Context: 1M tokens
Input price: $1.25/M tokens
Output price: $3.75/M tokens
Modality: text
Country: CN
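To make the listed rates concrete, here is a minimal sketch that estimates the cost of a single request. It assumes plain linear per-token billing at $1.25 per million input tokens and $3.75 per million output tokens; this profile does not describe caching discounts, batch pricing, or context-length tiers, so real bills may differ. The function name `estimate_cost_usd` is hypothetical.

```python
# Listed prices for Qwen3.7 Max, in US dollars per 1M tokens.
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 3.75


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost of one request in US dollars.

    Assumes simple linear per-token billing at the listed rates, with no
    caching discounts, batch pricing, or context-length tiers.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 200,000-token prompt with a 4,000-token answer.
    print(f"${estimate_cost_usd(200_000, 4_000):.4f}")
```

Under these assumptions the example request costs about $0.27, and filling the entire 1M-token context with input alone would cost $1.25 before any output tokens are billed.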
Metrics
All source-backed metrics (value, then category · metric key)
LLM Stats Rank: 25 (ranking · source_rank)
Reasoning: 60.25 (reasoning · index_reasoning)
Math: 54.6 (math · index_math)
Coding: 47.86 (coding · index_code)
Vision: 33.82 (multimodal · index_vision)
Tool calling: 37.32 (tool_calling · index_tool_calling)
Finance: 60.03 (domain · index_finance)
Legal: 60.04 (domain · index_legal)
Healthcare: 59.46 (domain · index_healthcare)
GPQA: 92.4% (reasoning · gpqa_score)
SWE-bench Verified: 80.4% (coding · swe_bench_verified_score)
Code Arena: 1,212.62 (coding · coding_arena_score)
Humanity's Last Exam: 41.4% (reasoning · hle_score)
MMMLU: 90.3% (reasoning · mmmlu_score)
MCP Atlas: 76.4% (tool_calling · mcp_atlas_score)
SciCode: 53.5% (coding · scicode_score)
SWE-bench Pro: 60.6% (coding · swe_bench_pro_score)
Context: 1,000,000 tokens (context · context)
Speed: 173.63 c/s (performance · throughput)
Latency: 15,443.83 ms (performance · latency)
Input price: $1.25/M tokens (pricing · input_price)
Output price: $3.75/M tokens (pricing · output_price)
Evidence
Citations and source overlap
FAQ
How should I read this profile?
Treat this as a source-backed model dossier, not an EvalKit-run verification. The public values are copied from the linked sources, and each value stays attributed to the source that reported it.
Is Qwen3.7 Max verified by EvalKit?
No. EvalKit currently shows 0 verified rows for this model and will show none until it has evidence from real runs.
Why can metrics disagree?
Different sources test different tasks, on different dates, with different prompts and aggregation methods. EvalKit keeps those differences visible instead of merging them into a single, artificial universal score.