All news RESEARCH

Anthropic ships the Claude 3 family and reclaims a credible frontier position

Three models, one set of benchmarks, and the first sustained challenge to OpenAI's perceived capability lead since GPT-4. The most consequential model in the line-up turned out not to be the headline one.

MONDAY, 4 MARCH 2024 By The AI Desk

Anthropic ships the Claude 3 family and reclaims a credible frontier position

On 4 March 2024, Anthropic released the Claude 3 model family. The line-up was deliberate: Haiku at the speed-and-cost end, Sonnet in the working-model middle, Opus at the capability frontier. The launch post led with benchmark numbers showing Opus competitive with or exceeding GPT-4 on most public evaluations, including MMLU, GSM8K, MATH, HumanEval and the Hellaswag commonsense suite.

What the benchmarks did not capture was the more durable shift. As reported by Wired and The Verge in the days following launch, Sonnet at its midrange price point became the working model for a large set of teams that had been on GPT-3.5 or GPT-4. The price-performance line, not the absolute frontier, was what changed. Opus got the headlines; Sonnet got the deployments.

Three architectural columns, abstract — Haiku, Sonnet, Opus: the three-tier line-up that has since become standard.Photo: An. K. / Unsplash

Benchmark scores at launch

Selected benchmark scores (March 2024)

headline percent / pass-at-1, public reported

MMLU 5-shot scores, as reported by Anthropic on the launch page. Comparable but not identical evaluation harnesses.

Claude 3 also introduced what Anthropic called "vision" capabilities across the family. Image inputs worked at a quality similar to GPT-4V's. Tool calling, in the structured-JSON form that had become an emerging API standard, landed in a beta a fortnight later.

Opus got the headlines. Sonnet got the deployments.

The release pattern, three differentiated models on the same architecture rather than a single flagship, has since become the default. OpenAI followed with the GPT-4o-mini split in mid-2024. Google's Gemini 1.5 family adopted the same shape. By 2026, frontier-lab releases without an explicit price-performance tier are the exception, not the rule.

Originally reported by Anthropic (Anthropic) on 4 March 2024. Read the original report →

← Previous

EU negotiators emerge from a 38-hour trilogue with a political agreement on the AI Act

Meta releases Llama 3 and the open-weight thesis gets its strongest evidence yet

Anthropic ships the Claude 3 family and reclaims a credible frontier position

Benchmark scores at launch

Discussion

The AI Desk, in your inbox.

More from RESEARCH