Benchmark

GSM8K

reasoning

Grade School Math 8K. Tests basic mathematical reasoning with 8,500 grade-school math word problems.

Interpretation

GSM8K evaluates multi-step reasoning and problem-solving on math word problems. It ranks 15 models, from GPT-5.4 (98.5) to Command R+ 2026 (88). Scores on this benchmark contribute to the reasoning score on model pages and in rankings.

Methodology: 8,500 grade-school math word problems requiring multi-step reasoning. Standard benchmark for basic mathematical capability.

Source: https://github.com/openai/grade-school-math