18 FEBRUARY 2024 · 6 MIN

Sora and the quiet economics of generated video

The announcement of Sora, and the broader wave of state-of-the-art video generation models, has been framed primarily as a creativity story. The output is impressive. The first reaction of most viewers is astonishment. Sitting with it for longer, though, I find the creative angle less interesting than the economic one. What's being quietly restructured is the market for visual content, and the specific markets affected are larger than the public discussion acknowledges.

Three markets feel the effect immediately: stock footage, the library of B-roll that sits behind 60-70% of corporate video, advertising, and documentary work; drone operation, the professional services market for aerial imagery of specific locations; and videography, the day-rate work of producing visual content for commercial clients. All three markets have been structured around scarcity. Generative video collapses the scarcity assumption. The market structure that assumes scarcity will therefore either adapt or disappear, and the adaptation is already underway in ways that haven't yet shown up in public announcements.

The stock footage repricing

Stock footage libraries - Getty Images, Shutterstock, Pond5, and a long tail of smaller players - have pricing models that assume the library's content is expensive to produce and therefore scarce enough to charge for. A generic sunset over a city skyline has been a licensable asset with a three-figure price tag. Sora-class models can produce a comparable asset in thirty seconds. The same asset, generated on demand, has a marginal cost of fractions of a penny. The price structure that survives this transition is not the same as the one entering it.

The strategic responses being negotiated behind closed doors are variations on the same theme. Either the libraries become AI-first aggregators - training on their own libraries to produce licensed generative variants that preserve the rights relationships - or they lose substantial market share to users who bypass them entirely. The valuation implications are significant and mostly unpriced. Getty's commercial conversations with AI labs over the last eighteen months have been a leading indicator of how this is actually going. Public market analysts are still catching up.

The training data underbelly

The less-discussed part of the story is the provenance of the training data that makes Sora-class models possible. Training state-of-the-art video generation models requires enormous volumes of labelled video, and the provenance of that data is murkier than it is for text. Several pending copyright cases in the US, UK, and EU are working through what the legal status of video training datasets actually is, and the rulings will reshape which business models are viable.

The interesting outcome is that, whoever wins the legal fights, the model providers that come out ahead will end up with contractual relationships with major video owners: studios, agencies, broadcasters. The cost of training a truly competitive video model will rise accordingly, and the concentration of the market will increase. The open-source equivalent of Sora will arrive, but it will arrive with noticeably smaller training sets and visibly different output quality. The frontier-model providers will consolidate their position specifically through data partnerships, which is the same pattern we're seeing in text-model training.

The trust-critical vs trust-neutral split

User-research data on AI-generated video detection is worth paying attention to. Currently, users distinguish AI video from real video with about 85% accuracy after some exposure. That will decline as models improve. The commercial implication is that generated video is rapidly splitting into two categories with quite different economics: trust-neutral (advertising, entertainment, creative work) where the authenticity of the source doesn't materially affect the value, and trust-critical (news, legal evidence, documentary) where it does.

The trust-neutral market will absorb AI video quickly because there's no structural objection. The trust-critical market is already confronting the question of when AI-generated content is acceptable and what disclosure is required. EU regulators are converging on mandatory labelling for trust-critical uses. US and UK frameworks are slower but moving in the same direction. The specific moment I'm watching for is when a major news organisation uses AI-generated video - labelled or not - in standard journalism. That event will set the norms faster than any regulator. It hasn't happened yet. It will, almost certainly, within the next year.

The creative angle is the headline. The cost-reduction angle is the business model.

The emerging-market step change

Geographic unevenness matters here. For founders and small businesses in markets without affordable professional video production, Sora-class models are a categorically new capability. A product demo video that would have cost several thousand dollars to produce in a Lagos or Manila studio can now be produced for a few dollars of compute. That's not an incremental cost reduction. It's a step change in what's economically viable for a small company with a small budget.

The competitive implication is that markets with weaker traditional production infrastructure will adopt generative video faster, because the alternative wasn't affordable anyway. The early commercial use cases at scale will come from markets that public commentators don't usually watch. This has happened before with mobile payments, with cloud deployment, and with certain kinds of e-commerce infrastructure. AI video is likely to follow the same pattern, and by the time the pattern becomes visible to analysts in the established production markets, the usage base elsewhere will already be large.

The industrial application nobody's discussing

The application I'd pay attention to is not consumer or creative. It's synthetic training data for computer vision. Sora-class models, applied to generating plausible labelled video at scale, change the economics of training perception systems for robotics, autonomous vehicles, industrial inspection, and any domain that requires expensive real-world data collection. The total addressable market for synthetic training data is larger than the creative-industry market by several orders of magnitude. It also has less glamour and therefore less media coverage, which is why it gets mentioned in the twelfth paragraph of most analysis pieces rather than the first.

The calibrated view

The capability deserves the attention it's getting. The specific model will be surpassed within eighteen months. The pattern it enables will be standard within three years. All three are true simultaneously. The 'forever changed' framing is the part I'd hedge against, because it's been deployed for every major AI capability for three years running, and the forever-changed claims have a mixed track record. What's worth watching is the specific markets being restructured underneath, because those restructurings are permanent and they're happening whether or not the hype cycle is accurate about the exact timeline.

If you're in a market that depends on visual content production - or in a market that buys visual content as an input - the next eighteen months are the ones where your cost structure resets. The organisations that plan for this now will reach 2028 with margin intact. The ones that wait for certainty will arrive late, and the adjustment will be more painful than it needed to be. The capability is inevitable. The commercial response is the part that's still open, and the window for planning it is shorter than the announcement cadence suggests.
