LLM intelligence hub

Track the frontier of LLMs without losing the evidence.

EvalKit brings public leaderboards, model profiles, benchmark guides, and curated LLM news into one citation-first workspace.

Leaderboard rows: 464
Model profiles: 410
Public sources: 8
Verified claims: 0

Current leaders

Start with the models people are already comparing.

Open full explorer →

LLM news

Curated releases, research, and benchmark shifts.

Read news →

Infuriating Google commercial imagines the founding fathers embracing AI

"Group project, but make it 1776." That's how a new commercial for Google Workspace opens. And things only get cringier from there. The clip imagines what it would be like if the founding fathers turned to Google's collaboration tools and...

Source: The Verge AI

Alibaba reportedly bans employees from using Claude Code

Alibaba has reportedly classified Claude Code as high-risk software.

Source: TechCrunch AI

Midjourney wants Hollywood studios to reveal the details of their AI usage

As part of an ongoing legal dispute with three Hollywood studios, Midjourney is seeking to compel those studios to reveal how they use AI themselves.

Source: TechCrunch AI

New Google commercial imagines a Declaration of Independence written with help from AI

Two hundred and fifty years after the signing of the Declaration of Independence, a new commercial asks: What if the Founding Fathers had access to Google Workspace?

Source: TechCrunch AI

Benchmark guide

Read scores like a product decision, not a scoreboard.

Open guide →

Trust policy

No fake “tested by us” claims.

Public rows are labeled as replicated public-source data or editorial context. “Verified by EvalKit” stays at 0 until there is real run evidence attached.

Read citation policy