The AI Desk
MONDAY, 4 MARCH 2024 From the desk of Amit Singhal Vol. I · The ChatGPT Era
All news RESEARCH

Anthropic ships the Claude 3 family and reclaims a credible frontier position

Three models, one set of benchmarks, and the first sustained challenge to OpenAI's perceived capability lead since GPT-4. The most consequential model in the line-up turned out not to be the headline one.

Anthropic ships the Claude 3 family and reclaims a credible frontier position

On 4 March 2024, Anthropic released the Claude 3 model family. The line-up was deliberate: Haiku at the speed-and-cost end, Sonnet in the working-model middle, Opus at the capability frontier. The launch post led with benchmark numbers showing Opus competitive with or exceeding GPT-4 on most public evaluations, including MMLU, GSM8K, MATH, HumanEval and the Hellaswag commonsense suite.

What the benchmarks did not capture was the more durable shift. As reported by Wired and The Verge in the days following launch, Sonnet at its midrange price point became the working model for a large set of teams that had been on GPT-3.5 or GPT-4. The price-performance line, not the absolute frontier, was what changed. Opus got the headlines; Sonnet got the deployments.

Three architectural columns, abstract
Haiku, Sonnet, Opus: the three-tier line-up that has since become standard.Photo: An. K. / Unsplash

Benchmark scores at launch

Selected benchmark scores (March 2024)
headline percent / pass-at-1, public reported
Claude 3 Opus 86.8 % GPT-4 (Mar 2023) 86.4 % Gemini 1.0 Ultra 83.7 % Claude 3 Sonnet 79 % Claude 3 Haiku 75.2 %
MMLU 5-shot scores, as reported by Anthropic on the launch page. Comparable but not identical evaluation harnesses.

Claude 3 also introduced what Anthropic called "vision" capabilities across the family. Image inputs worked at a quality similar to GPT-4V's. Tool calling, in the structured-JSON form that had become an emerging API standard, landed in a beta a fortnight later.

Opus got the headlines. Sonnet got the deployments.

The release pattern, three differentiated models on the same architecture rather than a single flagship, has since become the default. OpenAI followed with the GPT-4o-mini split in mid-2024. Google's Gemini 1.5 family adopted the same shape. By 2026, frontier-lab releases without an explicit price-performance tier are the exception, not the rule.

Originally reported by Anthropic (Anthropic) on 4 March 2024. Read the original report →
← Previous
EU negotiators emerge from a 38-hour trilogue with a political agreement on the AI Act
Next →
Meta releases Llama 3 and the open-weight thesis gets its strongest evidence yet

Discussion

Email used only for your avatar. Never shown, never stored in plain text.