OpenAI launches o1 and reasoning becomes a paid product category
A new model family with chain-of-thought reasoning baked in at inference time, priced as a separate tier, and benchmarked on tests no large language model had previously cleared.

On 12 September 2024, OpenAI launched o1, the first model in a new family separate from the GPT line, initially as o1-preview and o1-mini. The model, code-named Strawberry during development, was trained with reinforcement learning to work through a chain of thought before producing an answer. The compute cost of inference rose accordingly. Output quality on tests requiring multi-step problem solving rose by a substantially larger factor.
The headline result, reported in the launch post and corroborated independently by the Stanford-led Holistic Evaluation of Language Models update later that month, was on the AIME 2024 mathematics olympiad qualifier. o1 solved 83 per cent of the problems when its answer was chosen by consensus over 64 samples, against GPT-4o's 12 per cent. That figure was for the full o1 model, not the o1-preview released that day. On the Codeforces competitive-programming benchmark, o1 reached an Elo rating of 1,673 in simulated contests, putting it at roughly the eighty-ninth percentile of human competitors. As Wired and Ars Technica noted in coverage that week, no language model had previously broken the human-expert ceiling on either benchmark.
Pricing and product positioning
The pricing was the news, more than the capability. o1-preview was set at US$15 per million input tokens and US$60 per million output tokens, against GPT-4o's US$2.50 and US$10, a sixfold premium on both sides. The full o1 model, available only through ChatGPT Plus and the API at higher rate-limit tiers, was priced higher still. As The Information detailed the following week, the price point was deliberately set to position reasoning as a separate paid tier, not as a free upgrade.
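To make the price gap concrete, the following is a minimal Python sketch of per-request cost at the two list prices quoted above. The token counts are hypothetical, chosen only for illustration. The sketch also assumes that a reasoning model's hidden chain-of-thought tokens are billed at the output rate, a detail the article itself does not state. The names `Pricing` and `request_cost` are invented for this example and belong to no real SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List price in US dollars per million tokens."""
    input_per_million: float
    output_per_million: float


# Launch-era list prices as quoted in the article.
O1_PREVIEW = Pricing(input_per_million=15.00, output_per_million=60.00)
GPT_4O = Pricing(input_per_million=2.50, output_per_million=10.00)


def request_cost(pricing, input_tokens, output_tokens, reasoning_tokens=0):
    """Return the dollar cost of a single request.

    Assumption (not from the article): hidden reasoning tokens are
    billed at the same rate as visible output tokens.
    """
    billed_output = output_tokens + reasoning_tokens
    total = (input_tokens * pricing.input_per_million
             + billed_output * pricing.output_per_million)
    return total / 1_000_000


# Hypothetical request: 2,000 prompt tokens, 500 visible answer tokens,
# plus 4,000 reasoning tokens on the reasoning model only.
gpt_4o_cost = request_cost(GPT_4O, 2_000, 500)
o1_cost = request_cost(O1_PREVIEW, 2_000, 500, reasoning_tokens=4_000)

print(f"GPT-4o:     ${gpt_4o_cost:.2f}")
print(f"o1-preview: ${o1_cost:.2f}")
print(f"ratio:      {o1_cost / gpt_4o_cost:.0f}x")
```

Under these assumed token counts the same request costs about one cent on GPT-4o and thirty cents on o1-preview. Only a factor of six comes from the list price; the rest comes from the reasoning tokens, which is the sense in which inference-time compute becomes something the customer pays for directly.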
Charging for inference-time compute as a separate axis is now the field's default business model. o1 was the first product to make that explicit.
By the end of 2024, Google had previewed reasoning capabilities in Gemini. By February 2025, Anthropic's Claude 3.7 Sonnet had introduced extended thinking and collapsed reasoning into the standard model with an inference-time toggle. By August 2025, OpenAI's GPT-5 had done the same. The o1 launch is now studied as the start of a year-long transition: reasoning shipped as a paid tier, then as a feature, then as a default.


