Project Peak
Studies

Controlled comparisons

Each study holds everything fixed except one variable and shows how that variable moves the result. Contenders are ranked by the hardest step each one holds, and each study comes with a written analysis.

4 levels · model · analysis

The GPT-5 family, ranked by capability

How the GPT-5 variants compare on the capability ceiling for long-context code reasoning, from nano up to the full model and the Codex variant. Same instrument and scoring; only the model changes.

4 levels · reasoning effort · analysis

gpt-5-nano · reasoning effort

Same model and task; only the reasoning-effort setting changes. It shows how much long-context capability gpt-5-nano gains from more reasoning, from minimal (which cannot do the task at all) up to high.
