Anthropic · proprietary
Claude Opus 4.5
Claude Opus 4.5 by Anthropic appears in 2 sources, with a Reasoning index of 54.19. Most relevant categories: Code quality, Coding, LLM.
Creator: Anthropic
Release date: 2025-11-24
Knowledge cutoff: Not published
Context: Unknown
Input price: Unknown
Output price: Unknown
Modality: text + vision
Country: US
Metrics
All source-backed metrics
LLM Stats Rank: 9 (ranking · source_rank)
Reasoning: 54.19 (reasoning · index_reasoning)
Math: 43.46 (math · index_math)
Coding: 39.76 (coding · index_code)
Writing: 35.61 (writing · index_communication)
Vision: 29.2 (multimodal · index_vision)
Tool calling: 29.67 (tool_calling · index_tool_calling)
Healthcare: 24.38 (domain · index_healthcare)
GPQA: 87% (reasoning · gpqa_score)
SWE-bench Verified: 80.9% (coding · swe_bench_verified_score)
Code Arena: 1,613.89 (coding · coding_arena_score)
ARC-AGI v2: 37.6% (reasoning · arc_agi_v2_score)
MMMLU: 90.8% (reasoning · mmmlu_score)
OSWorld: 66.3% (agent · osworld_score)
MCP Atlas: 62.3% (tool_calling · mcp_atlas_score)
Speed: 22.77 c/s (performance · throughput)
Latency: 792.75 ms (performance · latency)
SWE-bench Verified: 80.9% (metric)
Evidence
Citations and source overlap
FAQ
How should I read this profile?
Treat this as a source-backed model dossier, not an EvalKit-run verification. The public values are replicated from the linked sources, and each value stays attributed to the source it came from.
Is Claude Opus 4.5 verified by EvalKit?
No. EvalKit currently shows 0 verified rows for this model and will keep doing so until evidence from a real EvalKit run exists.
Why can metrics disagree?
Different sources test different tasks, dates, prompts, and aggregation methods. EvalKit keeps those differences visible instead of merging them into a fake universal score.
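To make the idea of source-scoped values concrete, here is a minimal sketch of one way to store metrics without merging them. It is an illustration under stated assumptions, not EvalKit's actual implementation: the `MetricRow` type, the helper functions, and the source names `source_a` and `source_b` are hypothetical. The two SWE-bench Verified values (80.9 each) and the GPQA value (87) come from this profile. Each value is keyed by metric and then by source, so two sources reporting the same metric remain two entries instead of being averaged into one number.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRow:
    source: str  # hypothetical source identifier
    key: str     # metric key, e.g. "swe_bench_verified_score"
    value: float


def group_by_metric(rows):
    """Group rows by metric key, keeping one value per source (no averaging)."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row.key][row.source] = row.value
    return dict(grouped)


def disagreements(grouped, tol=0.0):
    """Return metric keys whose per-source values differ by more than tol."""
    return sorted(
        key
        for key, by_source in grouped.items()
        if max(by_source.values()) - min(by_source.values()) > tol
    )


# Hypothetical source names; values taken from this profile.
rows = [
    MetricRow("source_a", "swe_bench_verified_score", 80.9),
    MetricRow("source_b", "swe_bench_verified_score", 80.9),
    MetricRow("source_a", "gpqa_score", 87.0),
]
grouped = group_by_metric(rows)
print(grouped["swe_bench_verified_score"])  # both sources kept side by side
print(disagreements(grouped))               # no per-source disagreement here
```

Keeping the per-source mapping, rather than a single merged value, means a later disagreement between sources would surface through `disagreements` instead of being hidden inside an average.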