The AI quietly transforming video production

Key takeaways
- The next wave of AI-driven automation is reshaping what content teams produce, how they produce it, and who gets to produce it at all.
- Generative tools dominate the headlines. The bigger shift is the AI already embedded in production pipelines between a finished piece of content and the moment a viewer presses play.
- Four capabilities are running at production scale today: AI thumbnail selection, smart cropping for vertical reframing, in-pipeline translation and dubbing, and video intelligence for scene understanding.
- AI applies creative decisions across an entire catalog at a scale and consistency no human team can match. Improvements compound across every title, not just the ones with dedicated marketing teams.
- The competitive divide is shifting. Small platforms with lean teams now compete with global giants on a leveled technical field, and the work that remains is the work that was always supposed to be the job.
The piece below is from CEO Murad Mordukhay’s article “When Infrastructure Thinks,” published in the IAMT Journal Q1 2026 (page 48).
The AI conversation everyone is having is not the one changing production
Most of the conversation about AI in media today centers on generative tools: image creators, LLMs for copywriting, and text-to-video generators. That technology matters, and some of it will land in production within a few years.
The bigger shift is happening inside the infrastructure between a finished piece of content and the moment a viewer presses play. AI is already embedded in production pipelines across thousands of platforms and millions of assets, making creative decisions at a scale that no human team could match. The effects extend well beyond efficiency. AI is changing what is possible to create, who can create it, and where creative energy gets spent.
How AI is making creative decisions
The current generation of AI-driven video automation may be doing the same work faster in some cases. The bigger move happens when it takes your existing context, makes decisions that require creative judgment, and applies them to every asset in a library.
AI thumbnail selection
A model evaluates every candidate frame in a piece of content across composition, facial expression, color contrast, and visual distinctiveness, then ranks a shortlist for a human to approve.
The shift is in which titles get the treatment. A streaming platform with 10,000 titles in its library used to A/B test thumbnails for the top 200 and ship default frames for the rest, because optimizing the long tail by hand was never going to pencil out. Automated AI thumbnail selection extends the same optimization to every title, and back-catalog watch-through moves in ways that show up in the retention dashboards.
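As a rough illustration of the ranking step, the sketch below combines per-frame scores into a single number and returns a shortlist for human approval. It assumes an upstream model has already produced a 0-to-1 score for each of the four criteria; the `FrameScores` type, the weights, and the sample values are hypothetical and not taken from any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameScores:
    """Per-frame scores in [0, 1], assumed to come from an upstream model."""
    timestamp_s: float
    composition: float
    expression: float
    contrast: float
    distinctiveness: float


# Hypothetical weights; a real system would tune these against engagement data.
WEIGHTS = {
    "composition": 0.3,
    "expression": 0.3,
    "contrast": 0.2,
    "distinctiveness": 0.2,
}


def overall(frame: FrameScores) -> float:
    """Weighted sum of the four criteria."""
    return sum(getattr(frame, name) * weight for name, weight in WEIGHTS.items())


def shortlist(frames: list[FrameScores], k: int = 3) -> list[FrameScores]:
    """Return the k highest-scoring frames, best first."""
    return sorted(frames, key=overall, reverse=True)[:k]


# Made-up candidate frames for one title.
frames = [
    FrameScores(12.0, 0.9, 0.8, 0.6, 0.7),
    FrameScores(48.5, 0.4, 0.9, 0.5, 0.3),
    FrameScores(95.2, 0.7, 0.2, 0.9, 0.8),
    FrameScores(130.0, 0.5, 0.5, 0.5, 0.5),
]
top = shortlist(frames, k=2)
print([f.timestamp_s for f in top])
```

Running the same function over every title in a catalog is what turns a top-200 exercise into a default; the human stays in the loop only at the approval step.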
Aspect ratio reframing
Object detection and subject tracking reframe a 16:9 source into 9:16 or 1:1 by following the salient region of each shot. The crop changes at cut boundaries instead of slow-panning across them, which is why the output reads as composed rather than mechanical.
Horizontal libraries convert to vertical short-form catalogs in days. A year ago, the same conversion took months inside the major social platforms, plus a head count of editors to match. A sports rights holder with a 5,000-hour archive can now feed the entire library into a vertical short-form channel without commissioning a manual reframe per clip.
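The geometry behind cut-aligned reframing is simple enough to sketch. The example below assumes shot boundaries and per-frame subject positions are already available from upstream detection and tracking, and it holds one fixed crop per shot so the window only moves at a cut. The function names and sample coordinates are hypothetical.

```python
from statistics import median


def crop_window(src_w: int, src_h: int, ratio_w: int, ratio_h: int,
                center_x: int) -> tuple[int, int]:
    """Return (left, width) of a full-height crop centered near center_x."""
    crop_w = min(src_h * ratio_w // ratio_h, src_w)
    left = center_x - crop_w // 2
    # Clamp so the crop never leaves the source frame.
    left = max(0, min(left, src_w - crop_w))
    return left, crop_w


def reframe_shots(shots: list[list[int]], src_w: int, src_h: int,
                  ratio: tuple[int, int] = (9, 16)) -> list[tuple[int, int]]:
    """One crop per shot, centered on the median subject position in that shot.

    `shots` holds, for each shot, the tracked subject's horizontal center
    (in pixels) on each frame. Using one value per shot keeps the crop
    static within the shot and lets it jump only at cut boundaries.
    """
    return [
        crop_window(src_w, src_h, ratio[0], ratio[1], int(median(xs)))
        for xs in shots
    ]


# Three shots from a 1920x1080 source, with made-up subject positions.
shots = [[400, 420, 410], [1500, 1520, 1490], [1900, 1910, 1905]]
crops = reframe_shots(shots, 1920, 1080)
print(crops)  # [(107, 607), (1197, 607), (1313, 607)]
```

The third shot shows the clamp at work: the subject sits near the right edge, so the crop stops at the frame boundary instead of centering on it. Passing `ratio=(1, 1)` gives the square variant.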
Translation and dubbing
ASR transcribes the source. Neural machine translation handles the script, and a voice synthesis model generates dubbed audio that preserves the prosody and emotional register of the original speaker.
The output is captioned, indexed for search, and ready for international distribution. A localization workflow that used to take months and a roster of voice talent collapses into a pipeline configuration. A platform launching in five new markets ships the same week the master lands, and “is this title worth localizing” stops being a gating decision at the project level and becomes a default at the pipeline level.
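One way to picture "localization as a pipeline configuration" is a function that takes the three stages as pluggable components and a list of target languages. The sketch below is an assumption about how such a pipeline could be wired, not a description of any vendor's API: `localize`, `LocalizedTrack`, and the stand-in stage functions are all hypothetical, and the stubs only mimic the shape of real ASR, translation, and voice synthesis calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LocalizedTrack:
    language: str
    captions: str   # translated script, reusable as captions and search text
    audio: bytes    # dubbed audio


def localize(
    master_audio: bytes,
    source_lang: str,
    target_langs: list[str],
    transcribe: Callable[[bytes, str], str],
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[str, str, bytes], bytes],
) -> list[LocalizedTrack]:
    """Run ASR once, then translate and dub for each target language."""
    transcript = transcribe(master_audio, source_lang)
    tracks = []
    for lang in target_langs:
        script = translate(transcript, source_lang, lang)
        # The master audio is passed along as a reference so a real
        # synthesis stage could match the original speaker's delivery.
        audio = synthesize(script, lang, master_audio)
        tracks.append(LocalizedTrack(lang, script, audio))
    return tracks


# Stand-in stages for illustration only; real stages would call models.
calls = {"asr": 0}


def fake_transcribe(audio: bytes, lang: str) -> str:
    calls["asr"] += 1
    return "hello"


def fake_translate(text: str, src: str, dst: str) -> str:
    return f"[{dst}] {text}"


def fake_synthesize(script: str, lang: str, reference_audio: bytes) -> bytes:
    return script.encode("utf-8")


tracks = localize(b"master-audio", "en", ["es", "de"],
                  fake_transcribe, fake_translate, fake_synthesize)
print([t.language for t in tracks])
```

Adding a market is then one more entry in `target_langs`, which is the sense in which the decision moves from the project level to the pipeline level.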
Video intelligence
The newest layer moves past object detection into scene understanding: tone, narrative beats, scene-to-scene relationships. The outputs are concrete production work, including automated highlight reels with chapter timestamps, content-aware recommendation features, and personalized clip assembly from a single source.
A live sports producer used to staff a clipping team to cut a highlight reel after the final whistle. Video intelligence now assembles the reel as the game runs, with chapter markers ready for the player and the social handoff. A long-form news publisher gets a TikTok-length cut, a six-minute YouTube cut, and a sub-30-second push notification clip from the same source asset, each one cut around the moments most likely to retain that audience.
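Multi-format assembly from one source can be sketched as a selection problem: given scenes with an interest score, pick the best ones that fit a duration budget and play them in their original order. The example below uses a simple greedy pass and assumes the scores come from an upstream scene-understanding model. The `Scene` type, the scores, and the budgets are hypothetical, and a production system would likely also weigh narrative continuity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    start_s: int
    end_s: int
    score: float  # assumed output of a scene-understanding model

    @property
    def duration_s(self) -> int:
        return self.end_s - self.start_s


def assemble_cut(scenes: list[Scene], max_duration_s: int) -> list[Scene]:
    """Greedily keep the highest-scoring scenes that fit the budget,
    then return them in chronological order."""
    chosen: list[Scene] = []
    used = 0
    for scene in sorted(scenes, key=lambda s: s.score, reverse=True):
        if used + scene.duration_s <= max_duration_s:
            chosen.append(scene)
            used += scene.duration_s
    return sorted(chosen, key=lambda s: s.start_s)


# Made-up scene list for a 150-second source.
scenes = [
    Scene(0, 20, 0.4),
    Scene(20, 35, 0.9),
    Scene(35, 80, 0.7),
    Scene(80, 90, 0.8),
    Scene(90, 150, 0.3),
]

push_clip = assemble_cut(scenes, max_duration_s=30)
short_cut = assemble_cut(scenes, max_duration_s=60)
print([s.start_s for s in push_clip])  # [20, 80]
print([s.start_s for s in short_cut])  # [0, 20, 80]
```

The same scene list yields different cuts per format just by changing the budget, and the start times of the chosen scenes double as chapter markers.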
The new competitive divide
The infrastructure between a creative idea and an audience is becoming invisible. What remains is the work that was always supposed to be the job: understanding audiences, building content that resonates, and delivering it to every screen and every market at the quality viewers expect.
Small streaming platforms with lean teams are already competing with global giants for viewer attention, and increasingly they are winning real audience share. Stories of successful small teams will become much more common now that AI has leveled the technical field and their creative vision is strong enough to compete on it.
AI will continue to transform content production well beyond where it stands today. The opportunity is to capture the creative advantage this transformation makes possible.
“The infrastructure between a creative idea and an audience is becoming invisible. The technical field is the part that changed, and it is the part we build.”
Murad Mordukhay, CEO and co-founder, Qencode
Frequently asked questions
What is AI video infrastructure?
It is the layer of automated, AI-driven services that sits between a finished master file and the viewer’s screen. AI video infrastructure handles tasks like thumbnail selection, aspect ratio reframing, translation and dubbing, and scene-level video intelligence at catalog scale. The same creative decisions get applied to every title in the library, including the back catalog that would never have justified a marketing review on its own.
How is AI thumbnail selection different from manual A/B testing?
Manual A/B testing requires a human to generate candidate thumbnails and a marketing team to run the test, which is why most platforms only optimize their top titles. AI thumbnail selection evaluates every candidate frame across composition, expression, color contrast, and distinctiveness, then ranks a shortlist for approval, and it does this across the entire catalog. The long tail of content gets the same optimization treatment as the front page, which is where the unit economics change.
Can AI dubbing actually preserve the original performance?
Modern voice synthesis preserves prosody, emotional register, and timing of the source speaker well enough for the majority of catalog content. A top-tier human voice actor on a tentpole feature is still doing something AI does not match, and that work is unlikely to disappear. For everything else, AI dubbing removes the cost-of-localization gate that used to keep titles from shipping into new markets, and platforms can now launch into a new region the same week the master lands.
What is video intelligence and how is it different from object detection?
Object detection identifies what is in a frame, and video intelligence understands what is happening across frames, including scene tone, narrative beats, and the relationship between scenes. The output is concrete production work: automated highlight reels with chapter timestamps, multi-format clip assembly from a single source, and content-aware recommendation features that move beyond metadata into the content itself.
Why does this matter more than generative AI for video right now?
Generative AI is producing impressive demos and some genuine production use cases. AI video infrastructure is already running at scale in production today and is changing the unit economics of video distribution. Both will matter for the next decade of streaming, and the infrastructure layer is the one moving the business faster and more quietly.
How does Qencode fit into AI video infrastructure?
Qencode is a full-stack video platform with a pipeline that covers transcoding, live streaming, storage, content delivery, player, analytics, automatic subtitles, video intelligence, smart thumbnails, smart video cropping, forensic watermarking, and AI-powered video detection. The piece referenced here is Murad Mordukhay’s argument from the IAMT Journal Q1 2026 on why this layer matters more than the headlines suggest.
Read the full piece
The full article “When Infrastructure Thinks” runs on page 48 of the IAMT Journal Q1 2026 issue. For more from IAMT, the International Association of Media Technology, see their full publication archive.
Author
- Murad Mordukhay, CEO and co-founder, Qencode
