EvalKit

LLM Leaderboard

Compare LLMs on reasoning, coding, speed, context, and price. Every row is citation-backed: sourced from public benchmarks and traceable to its origin.

How EvalKit scores work

The EvalKit Score is a weighted composite of reasoning (40%), coding (35%), and agent capability (15%), blended across public benchmark data. Where a row has no direct public metric, the cell shows an estimate, marked with ~ and derived from provider or model-family medians. Every row carries its source citation.
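The scoring description above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not EvalKit's implementation: the names `WEIGHTS`, `estimate_missing`, and `composite_score` are hypothetical. Because the stated weights add up to 90%, the sketch renormalizes over the three listed components. It will not necessarily reproduce the published Score column, since the remaining weighting and any further normalization are not documented.

```python
from statistics import median
from typing import Optional

# Weights as stated on the page. They sum to 0.90; how the remaining
# 10% is allocated is not documented, so this sketch renormalizes
# over the three stated components (an assumption).
WEIGHTS = {"reasoning": 0.40, "coding": 0.35, "agent": 0.15}


def estimate_missing(value: Optional[float], peers: list[float]) -> tuple[float, bool]:
    """Return (value, is_estimate).

    A missing value is filled with the median of peer values, e.g. other
    models from the same provider or model family (hypothetical helper).
    """
    if value is not None:
        return value, False
    if not peers:
        raise ValueError("no peer values to estimate from")
    return median(peers), True


def composite_score(metrics: dict[str, float]) -> float:
    """Weighted average of the stated components, renormalized so the
    weights sum to 1."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[k] * metrics[k] for k in WEIGHTS)
    return weighted / total_weight


if __name__ == "__main__":
    # Hypothetical row whose agent metric is missing and is filled
    # from peer medians; the estimate is displayed with a "~" prefix.
    agent, approx = estimate_missing(None, [20.0, 26.1, 30.0])
    row = {"reasoning": 60.0, "coding": 50.0, "agent": agent}
    label = f"~{agent}" if approx else str(agent)
    print(label, round(composite_score(row), 2))
```

Renormalizing keeps the composite on the same 0 to 100 scale as its inputs; dropping the division would instead cap the score at 90.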

Weekly refresh · 8 public sources · Citation policy

#	Model	Provider	Score	Reasoning	Coding	Agent	Code arena	Context	Speed	Pricing ($/M tokens)
#1	GPT-5.5	OpenAI	64.1	62.3	51	75.3	1,948	1.1M	175/s	$5
#2	GPT-5.5 Pro	OpenAI	50.4	61.6	~26.1	~44.5	~1,613	~400K	~230/s	~$1.25
#3	GPT-5.5 Instant	OpenAI	48.8	42.5	43.4	~43.7	1,361	400K	227/s	$5
#4	GPT-5.5 (xhigh)	OpenAI	39.7	~38.6	~26.1	~34.1	~1,613	~400K	~230/s	~$1.25
#5	GPT-5.5 (high)	OpenAI	39.7	~38.6	~26.1	~34.1	~1,613	~400K	~230/s	~$1.25
#6	GPT-5.5 Thinking xHigh Effort	OpenAI	39.7	~38.6	~26.1	~34.1	~1,613	~400K	~230/s	~$1.25
#7	gpt-5.5-high	OpenAI	39.7	~38.6	~26.1	~34.1	1,469	~400K	~230/s	~$1.25
#8	gpt-5.5	OpenAI	39.7	~38.6	~26.1	~34.1	1,463	~400K	~230/s	~$1.25
#9	gpt-5.5-instant	OpenAI	39.7	~38.6	~26.1	~34.1	1,418	~400K	~230/s	~$1.25
#10	GPT-5.5	OpenAI	39.7	~38.6	~26.1	~34.1	~1,613	~400K	~230/s	~$1.25
#11	GPT-5.5 Pro	OpenAI	39.7	~38.6	~26.1	~34.1	~1,613	~400K	~230/s	~$1.25
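To work with the table programmatically, each cell has to be split into a number and an estimate flag. The sketch below assumes only the conventions visible in the table (a leading ~ for estimates, K/M magnitude suffixes, a /s rate suffix, a $ prefix) and treats the Pricing column as US dollars per million tokens. `Cell`, `parse_cell`, and `cost_usd` are hypothetical helpers for illustration, not part of any EvalKit API.

```python
from dataclasses import dataclass

# Magnitude suffixes as they appear in the table ("400K", "1.1M").
SUFFIXES = {"K": 1_000, "M": 1_000_000}


@dataclass
class Cell:
    value: float
    estimate: bool  # True when the cell was prefixed with "~"


def parse_cell(text: str) -> Cell:
    """Parse a leaderboard cell such as "~1,613", "1.1M", "175/s" or "~$1.25"."""
    s = text.strip()
    estimate = s.startswith("~")
    if estimate:
        s = s[1:]
    # Drop the currency sign, thousands separators and the "/s" rate unit.
    s = s.replace("$", "").replace(",", "").removesuffix("/s")
    multiplier = 1
    if s and s[-1] in SUFFIXES:
        multiplier = SUFFIXES[s[-1]]
        s = s[:-1]
    return Cell(float(s) * multiplier, estimate)


def cost_usd(price_per_million: float, tokens: int) -> float:
    """Cost of `tokens` tokens, assuming the price is USD per million tokens."""
    return price_per_million * tokens / 1_000_000
```

For example, at $5 per million tokens, `cost_usd(5.0, 200_000)` gives 1.0, i.e. one dollar for 200K tokens. Keeping the estimate flag alongside the value lets downstream code exclude or down-weight ~ cells instead of treating them as measured results.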

#10 GPT-5.5
OpenAI · Created by OpenAI · RAG
Score 39.7 · Reasoning ~38.6 · Coding ~26.1 · Agent ~34.1
Replicated from public source: Vellum

#11 GPT-5.5 Pro
OpenAI · Created by OpenAI · RAG
Score 39.7 · Reasoning ~38.6 · Coding ~26.1 · Agent ~34.1
Replicated from public source: Vellum