The AI Desk
THURSDAY, 23 APRIL 2026 From the desk of Amit Singhal Vol. I · The ChatGPT Era
All news PRODUCTS

OpenAI releases GPT-5.5 and the reliability gap to Claude 4.7 finally closes

An incremental version-bump in the naming, a non-incremental shift in what the model can finish without supervision. SWE-bench scores at parity, agentic-task durations comparable, multi-modal capabilities broader.

OpenAI releases GPT-5.5 and the reliability gap to Claude 4.7 finally closes

On 23 April 2026, OpenAI released GPT-5.5. The product naming had, by this point, become so noisy that the launch post was structured primarily around two reliability claims rather than benchmark numbers. The first: SWE-bench Verified at 81.9 per cent, within a fifth of a per cent of Claude 4.7 Opus and effectively the same number to most observers. The second: average successful Operator-task duration of one hour and forty-six minutes, ahead of Anthropic's Dispatch-mediated agentic-task numbers from February.

The release closed the perceived gap that had opened in late January when Claude 4.7 took the public coding frontier above eighty per cent. As Wired and The Verge wrote in coverage that week, the timing was, again, not coincidental. The frontier-AI release calendar of late 2025 and early 2026 had become tightly choreographed, with each major release timed to respond to a specific competitive movement.

Where the actual differences sit, in 2026

Frontier-class reliability metrics, March 2026
Selected agentic and coding benchmarks
Claude 4.7 Opus (SWE-V) 82.4 % GPT-5.5 (SWE-V) 81.9 % Claude 4.7 Sonnet (SWE-V) 81.7 % Gemini 3 Pro DT (SWE-V) 73 % GPT-5.5 (OSWorld) 67 % Claude 4.7 (OSWorld) 71.5 %

On a head-to-head basis, the field had reached a state where capability differences between top models were, by the standards of most enterprise procurement, smaller than the noise from prompt engineering, model-specific routing, and the choice of evaluation harness. The procurement decisions of large customers, by the second half of Q1 2026, were starting to be made on integration, governance, latency-and-throughput, and pricing rather than on capability deltas.

Two-tenths of a per cent on a benchmark that did not exist in 2023. The frontier is now measured in fingernails.

The naming question, increasingly absurd by the time of GPT-5.5, was acknowledged in the launch post. OpenAI committed to retiring all numerical version naming with the next major release, which, as of the writing of this piece, has not yet been announced. Whether the next OpenAI flagship is called GPT-6 or something more product-like is, in mid-2026, a small and revealing window into how frontier-AI marketing is settling.

Originally reported by OpenAI (OpenAI) on 23 April 2026. Read the original report →
← Previous
Anthropic ships Claude Design and the AI-augmented design tool gets its first credible incumbent challenger
Next →
Elon Musk Files Lawsuit Against OpenAI Over Mission Allegations

Discussion

Email used only for your avatar. Never shown, never stored in plain text.