Zhipu AI · MIT
GLM-4.7
GLM-4.7 by Zhipu AI appears in 2 sources, with a Reasoning index of 43.81. Best read for code quality, coding, and high intelligence.
Creator: Zhipu AI
Release date: 2025-12-22
Knowledge cutoff: Not published
Context: Unknown
Input price: Unknown
Output price: Unknown
Modality: text + vision
Country: CN
Metrics
All source-backed metrics
LLM Stats Rank: 40 (ranking · source_rank)
Reasoning: 43.81 (reasoning · index_reasoning)
Math: 43.9 (math · index_math)
Coding: 22.42 (coding · index_code)
Research: 16.86 (research · index_search)
Vision: 29.81 (multimodal · index_vision)
Tool calling: 11.89 (tool_calling · index_tool_calling)
Finance: 36.18 (domain · index_finance)
Legal: 36.16 (domain · index_legal)
Healthcare: 35.94 (domain · index_healthcare)
GPQA: 85.7 % (reasoning · gpqa_score)
AIME 2025: 95.7 % (math · aime_2025_score)
SWE-bench Verified: 73.8 % (coding · swe_bench_verified_score)
Code Arena: 1,064.5 (coding · coding_arena_score)
Humanity's Last Exam: 42.8 % (reasoning · hle_score)
BrowseComp: 52 % (research · browsecomp_score)
Terminal Bench: 33.3 % (coding · terminal_bench_score)
Parameters: 358,000,000,000 params (model · params)
Arena Rating: 1,436.24 (metric)
Arena Rank: 44 (metric)
Vote Count: 12,136 (metric)
Evidence
Citations and source overlap
FAQ
How should I read this profile?
Treat this as a source-backed model dossier, not an EvalKit-run verification. The public values are replicated from linked sources and kept source-scoped.
Is GLM-4.7 verified by EvalKit?
No. EvalKit shows 0 verified rows for GLM-4.7 and will continue to do so until real run evidence exists.
Why can metrics disagree?
Different sources test different tasks, dates, prompts, and aggregation methods. EvalKit keeps those differences visible instead of merging them into a fake universal score.
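To illustrate why merging is avoided, here is a minimal Python sketch that keeps a few of the coding-related values listed above as separate, labeled records. The `Metric` class and both helper functions are hypothetical names chosen for this example; they are not part of EvalKit or any published API. The values and keys are copied from the Metrics list on this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One reported value, kept with its category and source key (hypothetical type)."""
    label: str
    value: float
    category: str
    key: str


# Values copied from the Metrics list above; nothing here is newly measured.
METRICS = [
    Metric("Coding", 22.42, "coding", "index_code"),
    Metric("SWE-bench Verified", 73.8, "coding", "swe_bench_verified_score"),
    Metric("Code Arena", 1064.5, "coding", "coding_arena_score"),
    Metric("Terminal Bench", 33.3, "coding", "terminal_bench_score"),
    Metric("GPQA", 85.7, "reasoning", "gpqa_score"),
]


def group_by_category(metrics):
    """Group records by category without combining their values."""
    grouped = defaultdict(list)
    for m in metrics:
        grouped[m.category].append(m)
    return dict(grouped)


def naive_average(metrics):
    """Average raw values; shown only to demonstrate why this is misleading."""
    return sum(m.value for m in metrics) / len(metrics)


if __name__ == "__main__":
    coding = group_by_category(METRICS)["coding"]
    for m in coding:
        print(f"{m.label}: {m.value} ({m.key})")
    # Mixes an index, two percentages, and an arena-style rating on different scales.
    print("Naive average of coding metrics:", naive_average(coding))
```

The naive average of the four coding records blends an index, two percentages, and a rating that sits on a different scale entirely, so the result describes none of them. Keeping each record next to its key leaves that mismatch visible.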