The AI Desk
FRIDAY, 12 JUNE 2026 From the desk of Amit Singhal Vol. I · The ChatGPT Era

Research AI news

AI research news, model releases, papers, and the capability shifts worth tracking.

Anthropic ships Claude 4.7 with a one-million-token context window for Sonnet and Opus
RESEARCH

A million tokens of context across the Sonnet and Opus tiers, agentic-task improvements, and the first publicly available frontier model that can hold an entire mid-sized codebase in working memory.

16 APR 2026 CNBC 7 comments · 12 likes
Anthropic releases Claude 4.5 Sonnet and the working frontier shifts again
RESEARCH

An incremental version bump in the naming, a non-incremental capability shift in the substance. Claude 4.5 Sonnet's coding-benchmark numbers extended Anthropic's lead on agentic software work.

29 SEPT 2025 Anthropic 7 comments · 13 likes
Anthropic ships Claude 4 Opus and Sonnet, and the coding-benchmark frontier moves a non-trivial amount
RESEARCH

A two-model release, headline numbers on SWE-bench Verified above seventy per cent, and a Claude Code product that turned the model into a developer-tool primitive.

22 MAY 2025 Anthropic 7 comments · 30 likes
Anthropic introduces Claude 3.7 Sonnet and reasoning stops being a separate product tier
RESEARCH

Hybrid thinking is the design choice that mattered. The model can extend its reasoning at inference time via a setting, and the field's product structure quietly resets.

26 FEB 2025 Anthropic 6 comments · 27 likes
OpenAI's o3 announcement clears the ARC-AGI benchmark and the field's headline metric is briefly retired
RESEARCH

A model not yet shipping, a benchmark designed to be hard, and a per-query inference cost that became the most-discussed number in AI research that month.

20 DEC 2024 OpenAI 7 comments · 34 likes
OpenAI launches o1 and reasoning becomes a paid product category
RESEARCH

A new model family with chain-of-thought reasoning baked in at inference time, priced as a separate tier, and benchmarked on tests no large language model had previously cleared.

12 SEPT 2024 OpenAI 7 comments · 9 likes
Meta releases Llama 3.1 405B and the open-weight frontier sits, briefly, at parity
RESEARCH

An almost half-trillion-parameter model with downloadable weights, benchmark numbers competitive with GPT-4o and Claude 3.5, and a 92-page paper that left almost nothing about the recipe to the reader's imagination.

23 JUL 2024 Meta AI 7 comments · 15 likes
Meta releases Llama 3 and the open-weight thesis gets its strongest evidence yet
RESEARCH

Two model sizes, a permissive licence, and benchmarks competitive with closed-API leaders from a year earlier. Meta's bet on open weights stops looking eccentric.

18 APR 2024 Meta 7 comments · 9 likes
Anthropic ships the Claude 3 family and reclaims a credible frontier position
RESEARCH

Three models, one set of benchmarks, and the first sustained challenge to OpenAI's perceived capability lead since GPT-4. The most consequential model in the line-up turned out not to be the headline one.

4 MAR 2024 Anthropic 7 comments · 11 likes
GPT-4 arrives with a paper that says less than its capabilities suggest
RESEARCH

OpenAI's flagship model passed the bar exam, scored in the top ten per cent on the LSAT and looked at images. The accompanying technical report declined to say how big it was, what it was trained on, or how the safety work was done.

14 MAR 2023 OpenAI 10 comments · 13 likes