The AI Desk
WEDNESDAY, 26 FEBRUARY 2025 From the desk of Amit Singhal Vol. I · The ChatGPT Era
All news RESEARCH

Anthropic introduces Claude 3.7 Sonnet and reasoning stops being a separate product tier

Hybrid-thinking is the design choice that mattered. The model can extend its reasoning at inference time on a setting, and the field's product structure quietly resets.

Anthropic introduces Claude 3.7 Sonnet and reasoning stops being a separate product tier

On 26 February 2025, Anthropic released Claude 3.7 Sonnet. The model upgrade itself was steady-state: incremental gains across coding, mathematics and reasoning benchmarks. The structural feature was new. Claude 3.7 Sonnet was the first frontier model to ship hybrid thinking, a single model with two operating modes: standard inference, and extended thinking, which the API caller (or the Claude.ai user) could toggle. Extended thinking allocated additional inference compute to a chain-of-thought process and produced visibly better answers on hard problems.

The pricing kept the Sonnet middle-tier price unchanged, with extended-thinking outputs billed at the same rate. As reported by The Information and Wired in coverage that week, this was a deliberate choice. The o1-style model split, where reasoning was a separate paid model, was undone in a single product decision.

Why hybrid thinking won

The structural argument, articulated by Anthropic chief executive Dario Amodei in a same-day post, was that reasoning was not a different model so much as a different setting on the same model. Maintaining separate model tiers for reasoning, in this view, was a transitional artifact of how the field had productised the early reinforcement-learning-from-chain-of-thought work. The simpler design was a single model with a depth-of-reasoning toggle.

Coding benchmarks, February 2025
SWE-bench Verified, percent solved
Claude 3.7 Sonnet (extended) 70.3 % Claude 3.7 Sonnet 49 % OpenAI o3-mini (high) 49.3 % Claude 3.5 Sonnet (new) 49 % GPT-4o 33.2 %
SWE-bench Verified scores at launch, public reporting.
Reasoning stopped being a separate tier and started being a setting. The field followed within six months.

The other frontier labs adopted the architecture across the second and third quarters of 2025. OpenAI's GPT-5 in August unified the o-line and GPT-line into a single model with internal reasoning routing. Google's Gemini 2.5 in mid-2025 followed a comparable pattern. By the end of 2025, separate reasoning-tier model SKUs had effectively disappeared from frontier-lab product lines. The Claude 3.7 release, in retrospect, was the moment that consolidation began.

Originally reported by Anthropic (Anthropic) on 26 February 2025. Read the original report →
← Previous
Paris hosts the third AI summit and the regulatory consensus that emerged at Bletchley quietly fractures
Next →
OpenAI releases GPT-4.1 and the model-naming chapter starts to look ridiculous

Discussion

Email used only for your avatar. Never shown, never stored in plain text.