Benchmark
ARC-Challenge
reasoning
Grade-school science reasoning benchmark. Tests common-sense reasoning and scientific knowledge on multiple-choice questions.
Interpretation
ARC-Challenge evaluates reasoning and problem-solving capabilities. It ranks 13 models, from GPT-5.4 (98.5) at the top to Llama 4 Maverick (95) at the bottom. This benchmark contributes to the reasoning score on model pages and in rankings.
Methodology: 2,590 challenging science questions from grade-school exams. Answering them requires both knowledge retrieval and reasoning.
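As a rough illustration of how a score on a multiple-choice benchmark like this could be computed, the sketch below assumes the reported number is plain accuracy expressed as a percentage. This page does not state the exact metric, so that is an assumption. The `Question` type, the `accuracy_percent` function, and the sample questions are all hypothetical and are not taken from the ARC dataset or from any evaluation harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """Hypothetical container for one multiple-choice item."""
    qid: str
    choices: tuple[str, ...]  # answer labels, e.g. ("A", "B", "C", "D")
    answer_key: str           # label of the correct choice


def accuracy_percent(questions: list[Question], predictions: dict[str, str]) -> float:
    """Return the share of correctly answered questions, as a percentage.

    Assumption: one point per exact label match; a missing prediction
    counts as wrong. Partial credit for ties is not modeled here.
    """
    if not questions:
        raise ValueError("need at least one question to score")
    correct = sum(1 for q in questions if predictions.get(q.qid) == q.answer_key)
    return 100.0 * correct / len(questions)


# Made-up items for illustration only.
questions = [
    Question("q1", ("A", "B", "C", "D"), "B"),
    Question("q2", ("A", "B", "C", "D"), "D"),
    Question("q3", ("A", "B", "C", "D"), "A"),
    Question("q4", ("A", "B", "C", "D"), "C"),
]
predictions = {"q1": "B", "q2": "D", "q3": "C", "q4": "C"}

score = accuracy_percent(questions, predictions)
print(f"{score:.1f}")
```

Under this assumption, a model that answers three of the four sample questions correctly scores 75.0. Scores on the leaderboard would be read the same way, as a percentage of questions answered correctly.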
Source: https://allenai.org/data/arc