Controlled comparisons
Each study holds everything fixed except one variable and shows how that variable moves the result. Contenders are ranked by the hardest step each one holds, and every study comes with a written analysis.
4 levels model analysis
The GPT-5 family, ranked by capability
How the GPT-5 variants compare on the capability ceiling for long-context code reasoning, from nano up to the full model and the Codex variant. Same instrument and scoring; only the model changes.
Read the study →
4 levels reasoning effort analysis
gpt-5-nano · reasoning effort
Same model and task; only the reasoning-effort setting changes. The study shows how much long-context capability gpt-5-nano gains from more reasoning, from minimal effort (at which it cannot do the task at all) up to high.
Read the study →