Beyond Generation: The Untold Side of AI Video Production
Hi, long time no see!
I’m Kiki, co-founder of SpaceTime Creative. With a background in animation and product design, I currently work at the intersection of AI video creation and product development. Over the past year, we’ve been running projects while building tools that reimagine how animated videos are produced using AI.
I recently listened to an episode of the podcast Silicon Valley 101 (one of my favorite Chinese podcasts), titled “How AI is Changing the Animation Industry.” The guest, Feb.Tea from the Azuki project, said something that really stuck with me:
“The problem in animation isn’t demand—it’s supply.”
Animation is a labor- and talent-intensive industry. High-quality series often rely on experienced artistry, aesthetic judgment, and tight collaboration. A 12–24 episode series can easily take up to three years to produce.
Feb.Tea visited three types of companies in Japan:
Traditional studios still focused on hand-drawn workflows, but experimenting with AI tools for in-betweening and background generation.
AI-native startups that abandon traditional pipelines entirely—for example, using motion capture combined with AI style transfer.
Tool-based teams building AI assistants to support, not replace, directors and creators.
But across all types, one shared challenge emerged: even a 5% error rate in AI-generated content gets amplified in animation. A crumpled shirt in one frame, an extra wood grain on a clock in the next, a misaligned facial feature after that. Viewers may read these as intentional choices rather than glitches. The result? More time spent fixing errors than if the shots had been drawn by hand from the start.
Our team fits closest to the second group, but we also adopt tool-building approaches from the third. Through our work over the past year, we realized that the bottleneck isn’t just in production—but increasingly in review.
This is especially true for educational videos, where accuracy is paramount. The problems tend to cluster in two places:
On the generation side: a prompt can fail dozens of times before producing the desired output, forcing us to fall back on image-to-image workflows or manual edits.
On the review side: Even after content is generated, figuring out what’s wrong and where becomes a “find the flaw” puzzle.
Just as in production, review has a structural issue: if each frame carries a 5% chance of an error that slips past review, and those are exactly the errors viewers end up noticing, the credibility of the content takes a hit.
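A quick back-of-envelope calculation shows why this is structural rather than anecdotal. The 5% figure is the one quoted above; the shot counts below are my own illustrative numbers. If each generated shot independently carries a 5% chance of a flaw, the probability that at least one shot in a video needs fixing is 1 − 0.95^N, and it climbs fast:

```python
# Back-of-envelope: how a per-shot flaw rate compounds across a video.
# 5% is the error rate quoted above; the shot counts are illustrative
# (about 12 shots corresponds to a one-minute video at ~5 seconds per shot).
p_flaw = 0.05

for shots in (6, 12, 24):
    p_at_least_one = 1 - (1 - p_flaw) ** shots
    print(f"{shots:>2} shots -> {p_at_least_one:.0%} chance of at least one flawed shot")
```

At a dozen shots, nearly half of all videos would contain at least one flaw for someone to catch, which is exactly why review cannot be an afterthought.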
So we started seeing content not as “done when it’s made,” but as “done when it’s approved.”
This insight was reinforced repeatedly in client projects. No matter who the content is for, there’s always a multi-step sign-off chain: directors, QA leads, brand reps. The speed of that chain directly impacts production capacity.
And yet, review remains tedious, repetitive, and heavily reliant on expert knowledge. The more content AI generates, the more pressure there is on human judgment. That’s what led us to build an AI-assisted review system.
This tool isn’t for automatic approval—it’s for catching risks humans might overlook. If clients can review 20 minutes of video in the time they used to review 10, that’s a 100% increase in review efficiency—and a direct boost to overall production output.
Picture this: You’re reviewing a scene set in a science lab from the 1780s. You need to judge whether the air conditioner on the wall is anachronistic, whether the furniture is too modern, whether the microscope is era-appropriate. Then consider the insects being studied—do their proportions, legs, and antennae match real species? And don’t forget details like a suspiciously thin dictionary or a laptop that somehow ended up on the table.
A one-minute video contains at least 12 distinct shots—since most image-to-video tools generate around 5 seconds per image. Each frame needs to be reviewed, which makes the task highly labor-intensive.
That’s why we stepped in with AI—not to replace reviewers, but to support them in surfacing red flags early.
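To make that concrete, here is a minimal sketch of what such a pre-review pass could look like. It is illustrative rather than a description of our actual system: the frame sampling uses OpenCV, and `flag_risks` is a hypothetical stand-in for whatever vision-language model a team would call with the reviewer's checklist.

```python
# Minimal sketch of an AI-assisted pre-review pass (illustrative, not our production system).
# It samples roughly one frame per generated shot and runs each through a checklist check.
import cv2  # OpenCV for frame extraction

# A reviewer-defined checklist for the scene, e.g. the 1780s science lab above.
CHECKLIST = [
    "anachronistic objects (air conditioners, laptops, modern furniture)",
    "era-appropriate instruments (microscope, clock, books)",
    "plausible insect anatomy (proportions, legs, antennae)",
]

def flag_risks(frame, checklist):
    """Hypothetical hook for a vision-language model call.

    A real pipeline would send the frame plus the checklist to a model and
    parse the suspected issues it returns; here the hook is left empty.
    """
    return []

def pre_review(video_path, seconds_per_shot=5):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 24.0
    step = int(fps * seconds_per_shot)  # one sample per ~5-second generated shot
    report, frame_index = [], 0
    while True:
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index)
        ok, frame = cap.read()
        if not ok:
            break
        issues = flag_risks(frame, CHECKLIST)
        if issues:
            report.append((frame_index / fps, issues))  # timestamp in seconds + flags
        frame_index += step
    cap.release()
    return report  # handed to a human reviewer; nothing is auto-approved
```

The shape of the workflow is the point: sample, flag, and hand the human reviewer a short list of timestamps to check, instead of asking them to scrub every frame themselves.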
Of course, even as we focus on solving these real-world challenges, something more imaginative is taking shape.
One of AI’s greatest promises is that it makes the previously impossible… suddenly possible. Which is why the questions people ask often fall into two categories:
Can it replace an existing workflow?
Can it create something new?
When AI video tools are used in industrial production, most scrutiny is on the first. We compare them to what’s already in place. But because many problems remain unsolved, teams tend to be cautious.
One especially critical insight from Feb.Tea: if AI replaces in-betweening, the real cost isn’t measured in efficiency; it’s the loss of entry-level roles.
In-betweening has long been how junior animators learn the craft: motion structure, proportions, rhythm. Without this training ground, the already fragile animation talent pipeline could break entirely—mirroring what’s happening with junior developers in software.
But if we flip the lens—what if we use AI not to replicate the old, but to imagine the new?
Examples include:
Animating historical photos so characters can speak.
Slicing different fruits and matching the texture with hyperrealistic ASMR sound.
Generating fantasy “Animal Olympics” shorts that no amount of cat treats could reproduce in real life.
These types of content would’ve cost a fortune to produce using traditional VFX. Now, it takes a prompt and some creativity. The barrier to entry is lower than ever.
So rather than just worrying about “replacement,” we should ask:
Where do creators begin? How do learning paths evolve? Are we redefining what counts as content?
We’re already seeing more creators who didn’t attend film or art school, but can now independently complete an entire video workflow—from script to visuals to editing. Festivals are introducing AIGC categories. Platforms are testing AI-generated shorts. Creator profiles are shifting. Standards are evolving.
As Feb.Tea put it:
“Technological shifts won’t erase the human urge to create. The impulse to express is coded into us. Art doesn’t disappear—it just transforms.”
AI video tools have only been around for about three years. In that time, we’ve gone from local prototypes to platforms like Runway, Sora, Kling, Dreamina, and Hailuo.
Wilder still: model updates now outpace production cycles. A video that takes three months to deliver might span two full model versions. That kind of speed is unique to this moment.
We’re living through a shift in expressive language—new formats, new syntax, new kinds of stories.
It’s not fully defined yet. And that’s what makes it so exciting.
Thanks for reading—if any of this resonates, let’s connect! :)