Live translation arrives at OS scale - six years later

Apple's WWDC demo of system-level live translation was the highlight of the announcement. Watching it was strange for me, because six years ago I worked alongside Chenshu Yi and Arpan Nagdeve on the same functional capability at Zendesk. What we shipped then was genuinely useful, narrow in scope, and invisible to most of the world. What Apple is shipping now is technically comparable in its core mechanics and far more visible because of where it sits. The gap between the two tells a story about how technical innovation actually reaches scale.

The Zendesk version, built for customer-support contexts, translated messages in real time between support agents and customers speaking different languages. It worked. It reduced resolution times on multilingual support tickets by a meaningful margin. It was deployed to thousands of organisations. It changed how some customer-service teams operated. It did not make headlines. The people building it were aware of the capability's reach but had no illusion about its cultural visibility. That was fine. The impact was real, even if the press wasn't.

The platform-access difference

The specific difference between what we shipped and what Apple is shipping is not the underlying technical capability. Real-time translation has been commercially reliable for five or six years now. The difference is the platform. A translation feature inside a customer-support workflow serves a specific user in a specific context. A translation feature inside the operating system of a billion devices serves everyone with access to the device in any context. The multiplication is not in the capability. It's in the distribution.

This is the pattern I've watched for two decades. Technical capability is usually invented multiple times in parallel across different companies. Each version operates within the constraints of its deployment environment, its users, and its company's ambitions. The version that reaches mass consciousness is almost never the first or technically best version. It's the one that gets access to the right distribution platform at roughly the right time. For consumer AI capabilities specifically, Apple and Google have a structural advantage that no smaller company can replicate, because they own the phone layer.

What's actually harder at OS scale

The engineering underneath OS-level translation is not trivially the same as application-level translation. At the OS layer, the feature has to handle context-switching between applications, preserve input focus and cursor position, manage privacy boundaries across app domains, integrate with system accessibility APIs, and make latency-sensitive decisions about when to route to cloud inference versus on-device. Each of these is substantial engineering work. The core 'translate this text' capability is the simplest part of the problem.
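One of those decisions, routing between on-device and cloud inference, can be sketched in a few lines. This is a minimal illustration and not how Apple or anyone else actually implements it. Every name here (`TranslationRequest`, `DeviceState`, `choose_route`) and both latency constants are assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    ON_DEVICE = "on_device"
    CLOUD = "cloud"


@dataclass(frozen=True)
class TranslationRequest:
    text: str
    sensitive: bool         # hypothetical flag: content must not leave the device
    latency_budget_ms: int  # how long the UI can wait for a result


@dataclass(frozen=True)
class DeviceState:
    on_device_pair_available: bool  # local model for this language pair is installed
    network_rtt_ms: Optional[int]   # None means the device is offline


# Illustrative constants only, not measured values.
ON_DEVICE_MS_PER_CHAR = 0.5
CLOUD_OVERHEAD_MS = 150


def choose_route(req: TranslationRequest, dev: DeviceState) -> Optional[Route]:
    """Pick a route, or return None when no permitted route exists."""
    local_ok = dev.on_device_pair_available
    # Cloud is ruled out when offline or when the content is marked sensitive.
    cloud_ok = dev.network_rtt_ms is not None and not req.sensitive

    if not local_ok and not cloud_ok:
        return None
    if not cloud_ok:
        return Route.ON_DEVICE
    if not local_ok:
        return Route.CLOUD

    # Both routes are allowed: compare rough latency estimates to the budget.
    local_ms = len(req.text) * ON_DEVICE_MS_PER_CHAR
    cloud_ms = dev.network_rtt_ms + CLOUD_OVERHEAD_MS
    if local_ms <= req.latency_budget_ms:
        return Route.ON_DEVICE  # prefer local whenever it fits the budget
    if cloud_ms <= req.latency_budget_ms:
        return Route.CLOUD
    # Neither fits the budget: take whichever is estimated to be faster.
    return Route.ON_DEVICE if local_ms <= cloud_ms else Route.CLOUD
```

The design choice worth noticing is the ordering: the privacy constraint is checked before any latency estimate, so a sensitive request never reaches the cloud even when the cloud would be faster.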

This is a useful calibration for anyone who watched the demo and concluded translation is a solved problem. The model is good. The integration is the hard part, and the integration is specifically what Apple and Google are positioned to do uniquely well because they own both the device and the operating system. Independent startups will continue to innovate at the application layer; the integration-at-OS-layer innovation will reliably be done by the platform owners. This is also why the most interesting enterprise deployments are moving toward operating-system-level AI access rather than application-layer AI tools.

The privacy dimension

Apple's pitch for OS-level translation is inseparable from its on-device processing story. For European enterprises specifically - banks, healthcare systems, government - cloud-based translation creates a cross-border data transfer problem that is difficult to resolve under current regulation. An on-device translation layer largely dissolves that concern. This is not a coincidence in positioning; it's architecture serving regulatory strategy. Apple has been building toward this kind of architectural position for five years, and the live-translation demo is one visible surface of that longer trajectory.

The practical implication for product teams is that OS-level translation unlocks use cases that could not previously justify cloud-based translation. Customer-service workflows at banks. Healthcare multilingual communication. Legal translation in sensitive contexts. Enterprise call centres. These are substantial markets. They have been underserved by cloud translation specifically because of the regulatory friction, and the OS-level approach changes the calculus for each of them. I expect that adoption wave to arrive over the next eighteen months, and to be broader than the consumer demo suggested.

The capability improvements are visible. The linguistic challenges underneath will not be solved within a single year's roadmap.

What linguistic AI still can't do

The demo glosses over the remaining failure modes, which are specific and important. Japanese-to-English translation still struggles with honorific mismatches and context-dependent pronouns, producing grammatically correct output that is socially inappropriate in subtle ways. Arabic-to-English translation shares many of the same issues, plus complications from right-to-left text interaction. Languages with fewer digital resources - Yoruba, Tamil, Khmer, most minority Indian languages - perform much worse than the demo languages, and this gap is not closing evenly with frontier model improvements.

For professional use of OS-level translation, the interesting product question is therefore when users trust the translation enough to act on it without checking. Research I've seen suggests that users currently trust it immediately in casual contexts (travel, small-talk, simple commerce) and still prefer to verify in professional contexts (contracts, medical information, financial decisions). The split is likely to narrow over time but the two categories are unlikely to converge fully. Product design that ignores this distinction produces tools that are either over-cautious for casual use or over-trusted for critical use.
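The casual-versus-professional split can be made concrete as a presentation policy. The sketch below is only an illustration of the distinction described above. The context categories come from the examples in the previous paragraph, while the `confidence` score, the 0.5 threshold, and the names `Context` and `presentation_mode` are hypothetical.

```python
from enum import Enum


class Context(Enum):
    TRAVEL = "travel"
    SMALL_TALK = "small_talk"
    COMMERCE = "commerce"
    CONTRACT = "contract"
    MEDICAL = "medical"
    FINANCIAL = "financial"


# Contexts where users still prefer to verify before acting.
HIGH_STAKES = {Context.CONTRACT, Context.MEDICAL, Context.FINANCIAL}

# Hypothetical threshold below which even casual output gets flagged.
LOW_CONFIDENCE = 0.5


def presentation_mode(context: Context, confidence: float) -> str:
    """Decide how to present a translation to the user.

    Returns one of:
      "review" - show the original alongside and prompt the user to verify
      "flag"   - show the translation with a visible uncertainty marker
      "plain"  - show the translation with no extra friction
    """
    if context in HIGH_STAKES:
        # High-stakes output is always reviewed, whatever the score says.
        return "review"
    if confidence < LOW_CONFIDENCE:
        return "flag"
    return "plain"
```

A tool that returned "plain" everywhere would be over-trusted for critical use, and one that returned "review" everywhere would be over-cautious for casual use.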

The retrospective satisfaction

On a personal note, watching the demo was complicated. The emotional component is not quite pride - the work we did at Zendesk was narrower and served a specific audience that Apple's version doesn't serve the same way. It's more like validation of direction, tempered by recognition that timing and platform matter as much as insight. The people who ship capabilities to a billion users are not always the ones who first understood the capability. That's neither a complaint nor a regret. It's the shape of how technology actually arrives at scale.

The practical lesson, if there is one: if you're working on something that feels important but invisible today, the work is not wasted. The specific capability you're building is likely part of a pattern that will reach scale through a larger platform in five or ten years. Your technical conviction is probably correct in direction. The timeline is longer, the distribution will probably not be yours, and the visible version will likely belong to a company other than your own. None of this diminishes the value of the work you're doing now. It does change how you relate emotionally to the eventual public recognition of the capability, and it's worth making peace with that early.
