Benchmark
GSM8K
reasoning
Grade School Math 8K. Tests basic mathematical reasoning with roughly 8,500 grade-school math word problems.
Interpretation
GSM8K is a reasoning benchmark that evaluates multi-step problem-solving. It ranks 15 models, from GPT-5.4 (98.5) to Command R+ 2026 (88). The benchmark contributes to the reasoning score on model pages and in rankings.
Methodology: 8,500 grade-school math word problems requiring multi-step reasoning. Standard benchmark for basic mathematical capability.
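The page does not say how answers are scored. As a minimal sketch, assuming exact-match scoring on the final numeric answer (a common convention for GSM8K, whose public reference solutions end with a `#### <answer>` line), a score on a 0-100 scale could be computed as below. The function names `extract_final_number` and `accuracy` are hypothetical, chosen for illustration, and are not taken from this site's evaluation code.

```python
import re

# Matches integers or decimals with an optional sign and thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text):
    """Return the last number found in `text` as a float, or None if absent.

    Assumption: if the text contains the '####' marker used by GSM8K
    reference solutions, only the part after the last marker is searched.
    """
    tail = text.split("####")[-1]
    matches = _NUMBER.findall(tail)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def accuracy(predictions, references):
    """Percentage of predictions whose final number equals the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        p = extract_final_number(pred)
        r = extract_final_number(ref)
        if p is not None and r is not None and abs(p - r) < 1e-6:
            correct += 1
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Illustrative strings only, not real benchmark items or model outputs.
    preds = ["The answer is 18.", "So 5 apples"]
    refs = ["16 - 3 - 4 = 9, 9 * 2 = 18 #### 18", "#### 6"]
    print(accuracy(preds, refs))
```

Taking the last number in the model's output is a simple heuristic; stricter harnesses may require a specific answer format, which would change the reported score.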