OpenAI · Apache 2.0
GPT OSS 120B High
GPT OSS 120B High by OpenAI appears in 1 source, with a Reasoning index of 31.76. It stands out for low cost, math, and open weights.
Creator: OpenAI
Release date: 2025-08-05
Knowledge cutoff: Not published
Context: 131K tokens
Input price: $0.1/M tokens
Output price: $0.5/M tokens
Modality: text
Country: US
Metrics
All source-backed metrics
LLM Stats Rank: 79 (ranking · source_rank)
LLM Stats Code Index (estimated from arena): 27.45 score (metric)
Reasoning: 31.76 (reasoning · index_reasoning)
Math: 28.9 (math · index_math)
Tool calling: 1.76 (tool_calling · index_tool_calling)
Finance: 26.53 (domain · index_finance)
Legal: 26.52 (domain · index_legal)
Healthcare: 26.24 (domain · index_healthcare)
GPQA: 80.9 % (reasoning · gpqa_score)
AIME 2025: 92.5 % (math · aime_2025_score)
Code Arena: 718.47 % (coding · coding_arena_score)
MMMLU: 83.8 % (reasoning · mmmlu_score)
Context: 131,072 tokens (context · context)
Speed: 103.33 c/s (performance · throughput)
Latency: 45,928.57 ms (performance · latency)
Input price: 0.1 $/M (pricing · input_price)
Output price: 0.5 $/M (pricing · output_price)
Parameters: 116,800,000,000 params (model · params)
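The listed per-million-token prices can be turned into a rough per-request cost. The sketch below is illustrative only: the function name `estimate_cost` and the example token counts are assumptions, and it assumes input and output tokens are billed linearly at the listed rates with no caching discounts or minimum charges.

```python
from decimal import Decimal

# Prices copied from the profile, in US dollars per million tokens.
INPUT_PRICE_PER_M = Decimal("0.1")
OUTPUT_PRICE_PER_M = Decimal("0.5")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated request cost in dollars (hypothetical helper).

    Assumes simple linear billing at the listed rates.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = INPUT_PRICE_PER_M * input_tokens + OUTPUT_PRICE_PER_M * output_tokens
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Example request sizes are made up for illustration.
    print(estimate_cost(input_tokens=100_000, output_tokens=20_000))
```

With these rates, 100,000 input tokens and 20,000 output tokens each contribute about $0.01, for roughly $0.02 in total. Output tokens cost five times as much as input tokens, so long generations dominate the bill.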
Evidence
Citations and source overlap
FAQ
How should I read this profile?
Treat this as a source-backed model dossier, not an EvalKit-run verification. The public values are replicated from linked sources and kept source-scoped.
Is GPT OSS 120B High verified by EvalKit?
No. EvalKit currently shows 0 verified rows for this model and will continue to do so until real run evidence exists.
Why can metrics disagree?
Different sources test different tasks, dates, prompts, and aggregation methods. EvalKit keeps those differences visible instead of merging them into a fake universal score.
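One way to picture "source-scoped" values is to store each reported number together with the source it came from, and to group by metric key instead of averaging. The sketch below is not EvalKit's implementation: `MetricRow`, `group_by_key`, `value_range`, and the `source_a` label are all hypothetical names, and the three values are simply copied from this profile.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRow:
    source: str  # hypothetical source label, not a real source name
    key: str     # metric key as shown on the profile, e.g. "gpqa_score"
    value: float
    unit: str


def group_by_key(rows):
    """Group rows by metric key, keeping every source's row separate."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.key].append(row)
    return dict(grouped)


def value_range(rows):
    """Return (min, max) over the rows' values; no merging or averaging."""
    values = [row.value for row in rows]
    return min(values), max(values)


# Values copied from this profile, which lists a single source.
ROWS = [
    MetricRow("source_a", "gpqa_score", 80.9, "%"),
    MetricRow("source_a", "aime_2025_score", 92.5, "%"),
    MetricRow("source_a", "mmmlu_score", 83.8, "%"),
]

if __name__ == "__main__":
    for key, scoped in group_by_key(ROWS).items():
        low, high = value_range(scoped)
        print(key, [row.source for row in scoped], low, high)
```

With only one source, the minimum and maximum coincide for every key. If a second source reported a different figure for the same key, both rows would stay visible and the range would show the disagreement.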