The AI Desk
THURSDAY, 12 SEPTEMBER 2024 · From the desk of Amit Singhal · Vol. I · The ChatGPT Era

OpenAI launches o1 and reasoning becomes a paid product category

A new model family with chain-of-thought reasoning baked in at inference time, priced as a separate tier, and benchmarked on tests no large language model had previously cleared.

On 12 September 2024, OpenAI launched o1, the first model in a new family separate from the GPT line. The model, referred to internally as Strawberry during development, was trained with reinforcement learning to work through a chain of thought before producing an answer. Inference compute cost rose accordingly. Output quality on tests requiring multi-step problem solving rose by a substantially larger factor.

The headline result, reported in the launch post and corroborated independently by the Stanford-led Holistic Evaluation of Language Models update later that month, was on the AIME 2024 mathematics olympiad qualifier. The full o1 model cleared 83 per cent of the problems and o1-preview 57 per cent, against GPT-4o's 13 per cent. On the Codeforces competitive-programming benchmark, o1 reached an Elo rating that put it above the eighty-fifth percentile of human competitors. As Wired and Ars Technica noted in coverage that week, no language model had previously scored at that level on either benchmark.

Chalkboard covered in mathematical equations
o1 cleared problem sets on which GPT-4o solved little more than one problem in ten. Photo: Vitaly Gariev / Unsplash

Pricing and product positioning

AIME 2024 mathematics scores, September 2024
Per-problem pass rate, public reporting

Model                 Pass rate
o1 (full)             83.3%
o1-preview            56.7%
GPT-4o                13.4%
Claude 3.5 Sonnet     16%
Llama 3.1 405B        17.6%

AIME pass rate via OpenAI launch post and independent reproduction. Inference cost on o1 is 10-30x GPT-4o on equivalent prompts.

The pricing was the news, more than the capability. o1-preview was set at $15 per million input tokens and $60 per million output tokens, against GPT-4o's $2.50 and $10. Access at launch was limited to ChatGPT Plus and Team subscribers and to API developers on the highest usage tier. The full o1 model did not reach the API until December 2024, at the same per-token rates. As The Information detailed the following week, the price point was deliberately set to position reasoning as a separate paid tier, not as a free upgrade.
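
The arithmetic behind the pricing gap can be sketched in a few lines of Python. The per-million-token prices are the ones quoted above. The token counts, the `request_cost` helper and the assumption that hidden reasoning tokens are billed at the output rate are illustrative only, and should be checked against current provider documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """List price in US dollars per million tokens, as quoted in the article."""
    input_per_m: float
    output_per_m: float


O1_PREVIEW = Price(input_per_m=15.00, output_per_m=60.00)
GPT_4O = Price(input_per_m=2.50, output_per_m=10.00)


def request_cost(price: Price, input_tokens: int, output_tokens: int,
                 reasoning_tokens: int = 0) -> float:
    """Return the cost of one request in US dollars.

    Assumption: reasoning tokens are hidden from the answer but billed
    at the same rate as visible output tokens.
    """
    billed_output = output_tokens + reasoning_tokens
    total = input_tokens * price.input_per_m + billed_output * price.output_per_m
    return total / 1_000_000


# Hypothetical request: 1,000 prompt tokens and 500 visible answer tokens.
gpt4o_cost = request_cost(GPT_4O, 1_000, 500)

# Same request on o1-preview, first with no reasoning tokens, then with
# a hypothetical 2,000 reasoning tokens spent before the answer.
o1_list_only = request_cost(O1_PREVIEW, 1_000, 500)
o1_with_reasoning = request_cost(O1_PREVIEW, 1_000, 500, reasoning_tokens=2_000)

list_price_ratio = o1_list_only / gpt4o_cost
effective_ratio = o1_with_reasoning / gpt4o_cost

print(f"GPT-4o:                  ${gpt4o_cost:.4f}")
print(f"o1-preview, list only:   ${o1_list_only:.4f} ({list_price_ratio:.1f}x)")
print(f"o1-preview, + reasoning: ${o1_with_reasoning:.4f} ({effective_ratio:.1f}x)")
```

At identical token counts the list prices alone work out to a sixfold gap. In this hypothetical, the 2,000 reasoning tokens push the multiple to about 22, which sits inside the 10-30x range cited in the chart note.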

Inference-time compute as a paid axis is now the field's default pricing model. o1 was the first product to make that explicit.

By December 2024, Google had previewed reasoning capabilities in Gemini. By February 2025, Anthropic's Claude 3.7 Sonnet had folded reasoning into the standard model behind an inference-time extended-thinking toggle. By August 2025, OpenAI's GPT-5 had done the same. The o1 launch is now studied as the start of a year-long transition: reasoning shipped as a paid tier, then as a feature, then as a default.

Originally reported by OpenAI on 12 September 2024.