Anthropic · proprietary
Claude Sonnet 4.6
Claude Sonnet 4.6 by Anthropic appears in 2 sources, with a Reasoning index of 52.16. Best read for: code quality, coding, high intelligence.
Creator: Anthropic
Release date: 2026-02-17
Knowledge cutoff: Not published
Context: 200K tokens
Input price: $3/M tokens
Output price: $15/M tokens
Modality: text + vision
Country: US
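The listed prices translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: the rates are the ones shown on this profile, while the function name `estimate_cost` and the example token counts are hypothetical and not taken from any source.

```python
# Rates copied from the profile above (USD per million tokens).
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 50,000 input tokens and 2,000 output tokens.
    cost = estimate_cost(50_000, 2_000)
    print(f"Estimated cost: ${cost:.2f}")  # prints "Estimated cost: $0.18"
```

Output tokens cost five times as much as input tokens at these rates, so long generations dominate the bill even when the prompt is large.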
Metrics
All source-backed metrics
LLM Stats Rank: 8 (ranking · source_rank)
Reasoning: 52.16 (reasoning · index_reasoning)
Math: 44.18 (math · index_math)
Coding: 36.51 (coding · index_code)
Research: 24.59 (research · index_search)
Writing: 35.55 (writing · index_communication)
Vision: 33.25 (multimodal · index_vision)
Tool calling: 28.76 (tool_calling · index_tool_calling)
Long context: 26.42 (long_context · index_long_context)
Finance: 37.5 (domain · index_finance)
Legal: 35.41 (domain · index_legal)
Healthcare: 14.15 (domain · index_healthcare)
GPQA: 89.9% (reasoning · gpqa_score)
SWE-bench Verified: 79.6% (coding · swe_bench_verified_score)
Code Arena: 1,700.63 (coding · coding_arena_score)
Humanity's Last Exam: 49% (reasoning · hle_score)
ARC-AGI v2: 58.3% (reasoning · arc_agi_v2_score)
MMMLU: 89.3% (reasoning · mmmlu_score)
MMMU-Pro: 75.6% (multimodal · mmmu_pro_score)
OSWorld: 72.5% (agent · osworld_score)
BrowseComp: 74.7% (research · browsecomp_score)
MCP Atlas: 61.3% (tool_calling · mcp_atlas_score)
Context: 200,000 tokens (context · context)
Speed: 120.6 c/s (performance · throughput)
Latency: 8,045.6 ms (performance · latency)
Input price: $3/M tokens (pricing · input_price)
Output price: $15/M tokens (pricing · output_price)
Arena Rating: 1,453.06 (metric)
Arena Rank: 20 (metric)
Vote Count: 19,224 (metric)
Evidence
Citations and source overlap
FAQ
How should I read this profile?
Treat this as a source-backed model dossier, not an EvalKit-run verification. The public values are replicated from linked sources and kept source-scoped.
Is Claude Sonnet 4.6 verified by EvalKit?
No. EvalKit currently shows 0 verified rows until real run evidence exists.
Why can metrics disagree?
Different sources test different tasks, dates, prompts, and aggregation methods. EvalKit keeps those differences visible instead of merging them into a fake universal score.
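One way to picture "source-scoped" storage is to key every value by both metric and source, so that values on different scales are never averaged together. This is a minimal sketch, not EvalKit's actual data model: the `MetricRow` type, the `group_by_metric` helper, and the labels `source_a` and `source_b` are all hypothetical. The two numbers are this page's Coding index and Code Arena score, and pairing them with those placeholder sources is for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRow:
    source: str   # placeholder source label, not a real source name
    metric: str   # metric category, e.g. "coding"
    value: float  # value as reported by that source, on that source's scale


def group_by_metric(rows: list[MetricRow]) -> dict[str, dict[str, float]]:
    """Group values by metric, keeping each source's value separate."""
    grouped: dict[str, dict[str, float]] = defaultdict(dict)
    for row in rows:
        grouped[row.metric][row.source] = row.value
    return dict(grouped)


if __name__ == "__main__":
    rows = [
        MetricRow("source_a", "coding", 36.51),    # index-style score
        MetricRow("source_b", "coding", 1700.63),  # arena-style rating
    ]
    # Both values stay visible side by side; no merged score is computed.
    for metric, by_source in group_by_metric(rows).items():
        print(metric, by_source)
```

Averaging 36.51 and 1,700.63 would produce a number that means nothing on either scale, which is the kind of merged score the answer above warns against.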