DeepSeek · mit
DeepSeek-V4-Flash-Max
DeepSeek-V4-Flash-Max by DeepSeek appears in 1 source, with a Reasoning index of 51.86. Best suited for: code quality, long context, low cost.
Creator: DeepSeek
Release date: 2026-04-23
Knowledge cutoff: Not published
Context: 1M tokens
Input price: $0.14/M tokens
Output price: $0.28/M tokens
Modality: text
Country: CN
Metrics
All source-backed metrics
LLM Stats Rank: 36 (ranking · source_rank)
Reasoning: 51.86 (reasoning · index_reasoning)
Math: 51.2 (math · index_math)
Coding: 37.6 (coding · index_code)
Research: 23.7 (research · index_search)
Vision: 31.04 (multimodal · index_vision)
Tool calling: 26.87 (tool_calling · index_tool_calling)
Long context: 7.57 (long_context · index_long_context)
Finance: 35 (domain · index_finance)
Legal: 35.14 (domain · index_legal)
Healthcare: 44.17 (domain · index_healthcare)
GPQA: 88.1% (reasoning · gpqa_score)
SWE-bench Verified: 79% (coding · swe_bench_verified_score)
Code Arena: 1,126.85 (coding · coding_arena_score)
Humanity Last Exam: 45.1% (reasoning · hle_score)
SimpleQA: 34.1% (knowledge · simpleqa_score)
BrowseComp: 73.2% (research · browsecomp_score)
Toolathlon: 47.8% (tool_calling · toolathlon_score)
MCP Atlas: 69% (tool_calling · mcp_atlas_score)
SWE-bench Pro: 52.6% (coding · swe_bench_pro_score)
Context: 1,048,576 tokens (context · context)
Speed: 175.82 c/s (performance · throughput)
Latency: 22,592.92 ms (performance · latency)
Input price: 0.14 $/M (pricing · input_price)
Output price: 0.28 $/M (pricing · output_price)
Parameters: 284,000,000,000 params (model · params)
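To make the listed prices concrete, the per-request cost can be worked out from the input and output rates. The sketch below is illustrative only: the function name `estimate_cost` and the example token counts are assumptions, and it assumes simple linear per-token billing with no caching discounts or tiering, which this profile does not state.

```python
# Prices copied from this profile (USD per million tokens).
INPUT_PRICE_PER_M = 0.14
OUTPUT_PRICE_PER_M = 0.28


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumes plain linear billing; real invoices may differ
    (caching, tiering, or minimum charges are not modeled).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: a 200k-token prompt with a 4k-token answer.
    print(f"${estimate_cost(200_000, 4_000):.5f}")
```

Under these assumptions, a request with one million input tokens and one million output tokens would cost about $0.42.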
Evidence
Citations and source overlap
FAQ
How should I read this profile?
Treat this as a source-backed model dossier, not an EvalKit-run verification. The public values are replicated from linked sources and kept source-scoped.
Is DeepSeek-V4-Flash-Max verified by EvalKit?
No. EvalKit currently shows 0 verified rows until real run evidence exists.
Why can metrics disagree?
Different sources test different tasks, dates, prompts, and aggregation methods. EvalKit keeps those differences visible instead of merging them into a fake universal score.
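One way to picture "source-scoped" values is a structure that stores each metric per source instead of averaging them. This is a minimal sketch, not EvalKit's actual data model: `MetricRow`, `group_by_metric`, the source names, and the second GPQA value are all hypothetical placeholders invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRow:
    source: str   # where the value was reported (hypothetical names below)
    key: str      # metric key, e.g. "gpqa_score"
    value: float


def group_by_metric(rows):
    """Group rows by metric key, keeping one value per source (no merging)."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row.key][row.source] = row.value
    return dict(grouped)


rows = [
    MetricRow("source_a", "gpqa_score", 88.1),  # value shown in this profile
    MetricRow("source_b", "gpqa_score", 85.0),  # made-up placeholder, not a real measurement
]

scoped = group_by_metric(rows)
# Both values stay visible side by side rather than being averaged.
print(scoped["gpqa_score"])
```

Keeping the source as part of the key means a disagreement between sources shows up as two entries rather than disappearing into a single blended number.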