OpenAI · proprietary

GPT-5.4 mini

GPT-5.4 mini by OpenAI appears in 1 source, with a Reasoning index of 45.79. It is best read as a low-cost, multimodal LLM.

Creator: OpenAI
Release date: 2026-03-17
Knowledge cutoff: Not published
Context: 400K tokens
Input price: $0.75/M tokens
Output price: $4.5/M tokens
Modality: text + vision
Country: US
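
The listed prices are per million tokens, so the cost of a single request follows from simple arithmetic. The sketch below is a minimal illustration that assumes input and output tokens are billed separately at the listed rates with no discounts or other charges; the function name `estimate_cost` and the example token counts are hypothetical.

```python
# Prices taken from the profile above, in USD per million tokens.
INPUT_PRICE_PER_M = 0.75
OUTPUT_PRICE_PER_M = 4.5


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: input and output tokens are billed independently at the
    listed per-million rates, with no caching discounts or extra fees.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 100,000 input tokens and 2,000 output tokens.
example_cost = estimate_cost(100_000, 2_000)
print(f"${example_cost:.3f}")  # $0.084
```

At these rates an output token costs six times as much as an input token, so long generations can outweigh large prompts in the total.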

Metrics

All source-backed metrics

LLM Stats Rank: 17 (ranking · source_rank)
Reasoning: 45.79 (reasoning · index_reasoning)
Math: 35.73 (math · index_math)
Coding: 34 (coding · index_code)
Writing: 22.19 (writing · index_communication)
Vision: 28.58 (multimodal · index_vision)
Tool calling: 23.87 (tool_calling · index_tool_calling)
Long context: 19.84 (long_context · index_long_context)
Healthcare: 37.74 (domain · index_healthcare)
GPQA: 88% (reasoning · gpqa_score)
Code Arena: 1,388.34 (coding · coding_arena_score)
Humanity's Last Exam: 28.2% (reasoning · hle_score)
MMMU-Pro: 76.6% (multimodal · mmmu_pro_score)
Toolathlon: 42.9% (tool_calling · toolathlon_score)
MCP Atlas: 57.7% (tool_calling · mcp_atlas_score)
MRCR v2: 33.6% (long_context · mrcr_v2_score)
SWE-bench Pro: 54.4% (coding · swe_bench_pro_score)
Context: 400,000 tokens (context · context)
Speed: 283.13 c/s (performance · throughput)
Latency: 608.17 ms (performance · latency)
Input price: 0.75 $/M (pricing · input_price)
Output price: 4.5 $/M (pricing · output_price)

Evidence

Citations and source overlap

FAQ

How should I read this profile?

Treat this as a source-backed model dossier, not an EvalKit-run verification. The public values are replicated from linked sources and kept source-scoped.

Is GPT-5.4 mini verified by EvalKit?

No. EvalKit shows 0 verified rows for this model and will continue to do so until real run evidence exists.

Why can metrics disagree?

Different sources test different tasks, dates, prompts, and aggregation methods. EvalKit keeps those differences visible instead of merging them into a fake universal score.
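
One way to picture source-scoped storage is a structure that keys every value by both metric and source, so nothing is averaged across sources. This is only a sketch of the idea, not EvalKit's actual data model: `MetricRow`, `group_by_metric`, and the source label `source-a` are hypothetical names, and the two values are taken from the metrics listed in this profile.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRow:
    """One reported value, tied to the source that published it."""
    source: str
    metric: str
    value: float
    verified: bool = False  # stays False until real run evidence exists


def group_by_metric(rows):
    """Group values per metric, keeping each source's value separate."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row.metric][row.source] = row.value
    return dict(grouped)


# "source-a" is a placeholder label for the single source in this profile.
rows = [
    MetricRow("source-a", "gpqa_score", 88.0),
    MetricRow("source-a", "hle_score", 28.2),
]

by_metric = group_by_metric(rows)
verified_count = sum(row.verified for row in rows)
print(by_metric)
print(verified_count)
```

If a second source reported a different GPQA value, it would appear as another key under `gpqa_score` instead of being merged into a single number.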