
COEY Cast Episode 136
Open Source Ears, Real Time Eyes
Episode Overview
03/20/2026
Runway is pushing AI video toward real-time creation, with reported sub-100-millisecond response that could turn generation from a waiting game into a live creative tool. SkyReels V4 shows a different shift, where video models start looking more like usable software with benchmarks, pricing, multimodal inputs, and native audio. QuarkAudio adds the open-source angle, pointing to a future where audio cleanup, separation, and voice tasks get less fragmented and more flexible. The bigger takeaway is not full autonomy. It is modular workflow design. Faster models move the bottleneck from rendering to judgment, approvals, brand safety, and taste. Human direction still matters most when automation makes endless iteration cheap.


Episode Transcript
Hunter: Happy Friday, March twentieth, twenty twenty-six, and hello from COEY Cast. It is somehow World Storytelling Day, which feels correct, because today we are talking about machines that are trying very hard to become your fastest little story department. I’m Hunter.
Riley: And I’m Riley. Also, this episode was assembled by a small parade of AI tools with absolutely no shame, so if the robot energy gets a little chaotic, that is not a bug, that is, um, a live demo with better branding.
Hunter: Yeah, the digital interns are clocked in. And today’s big thing is speed. Runway showed a real-time video generation model at NVIDIA GTC yesterday, and the spicy part is the reported time to first frame being under one hundred milliseconds. That is not just faster rendering. That is a workflow shift.
Riley: Totally. That’s the part people on X are freaking out about with the whole “new medium” thing. And, like, ok, maybe that phrase is a little dramatic, but also… if video stops feeling like baking and starts feeling like sketching, that does change how people work.
Hunter: Exactly. Most AI video has been prompt, wait, regret, retry. This starts looking more like creative steering. You move the scene, test a different camera angle, swap the vibe, adjust product framing, and you’re not paying the emotional tax of a full rerender every time.
Riley: Prompt, wait, regret is the most accurate genre label for the past year. But let me push you on this, Hunt. Just because you can tweak live doesn’t mean marketers should turn every campaign into an infinite sandbox where nobody ships anything.
Hunter: One hundred percent. Real-time does not remove the need for taste. It actually makes taste more important, because when iteration gets cheap, indecision gets expensive. The teams that win are not the ones making endless options. They’re the ones who know what a good option looks like fast.
Riley: Mmm. That’s the hidden bottleneck. Everybody’s like wow, sub one hundred milliseconds, and I’m like cool, now your Slack thread can be wrong at the speed of light. The machine got faster, but legal did not suddenly become a Formula One pit crew.
Hunter: That’s probably the actual answer to where the bottleneck moves next. Not compute. Judgment. Brand safety. Rights. Review. Human alignment. The render queue is shrinking, but the approval queue is still very much a queue.
Riley: And also, let’s be honest, internal opinions. The creative director wants bold. The paid team wants safe. The founder wants “make it pop,” which should be illegal as feedback. Live video generation means all those debates happen sooner, not less.
Hunter: Right. It compresses the feedback loop. Which is great, if you have a system. Dangerous, if you don’t. I think the smart use case for marketers is not “everyone becomes a live VJ now.” It’s more like rapid pre-vis, ad variant testing, virtual set ideation, reactive social, maybe product loops where you want to iterate scene direction in real time.
Riley: Yes. Also branded content that responds to culture faster. Not fake jumping on every meme in six seconds, but if your team can react to a trend while it still matters, that’s huge. Before this, the trend was already in its flop era by the time your render finished.
Hunter: And this fits something we talked about recently on the show and over on the blog around real-time video and image-to-video becoming workflow primitives. Once it feels interactive, people stop treating it like a special effects moment and start treating it like a production tool.
Riley: Which brings us to SkyReels V4, because that story has a slightly different flavor. Less “look how live this feels,” more “oh wow, video models are starting to look like actual software products.” Benchmarks, pricing, multimodal inputs, native audio, all in one package.
Hunter: Yeah. The buzz around SkyReels V4 is that it hit the top of the Artificial Analysis Video Arena for text-to-video with audio, and people are talking about up to fifteen-second ten eighty p clips with synchronized sound. That matters because audio has been the annoying extra suitcase in a lot of these workflows.
Riley: Thank you. I am so tired of the fake simplicity where the demo looks done but behind the curtain you still have to patch in voice, music, ambience, timing, and then pray the whole thing feels intentional. Native audio is not just a feature, it’s less glue code and less cleanup.
Hunter: That’s the real business story. If you can go from prompt plus references to a clip with sound that is close enough for campaign testing, the unit economics change. Suddenly this isn’t a research toy. It’s closer to production software.
Riley: But benchmark winner does not automatically mean “please represent our brand in public.” That’s where people get goofy. I do not care if a model won the arena if it still turns your product into a cousin of itself halfway through the clip.
Hunter: Exactly. Smart teams should ask very boring questions. Does it keep product identity stable? Does it hold up under revisions? Can we route outputs into review? Is the pricing predictable when you account for retries? Does it have API access that plays nicely with the rest of your stack?
Riley: And can your team control it without becoming prompt shamans. That one matters. If only the one weird genius on your team can get good outputs, that’s not a workflow, that’s a dependency problem in a trench coat.
Hunter: Totally. The other risk is polished mediocrity. Multimodal models can make it very easy to generate lots of decent-looking content. But decent-looking is not the same as persuasive, memorable, or on-brand.
Riley: Wait, yes. That is the entire thing. We are getting dangerously good at making smooth slop. If you feed these systems vague strategy, they’ll hand back beautifully lit nothing. It’s giving premium wallpaper.
Hunter: Which is why automation should own breadth, not final taste. Let the machine make the spread of options. Let humans decide what deserves to exist. For a creator or growth team, that means AI can handle rough concept expansion, alternate hooks, localization variants, filler visuals, maybe even synced social cutdowns. But the core promise, the emotional angle, the truth of the brand, that still needs a person awake at the wheel.
Riley: Awake and maybe slightly opinionated. I don’t want fully autonomous vibes-based brand management. That sounds like how you wake up to an ad campaign that technically followed the prompt and spiritually committed a felony.
Hunter: Fair. And that connects to the audio story, too. Alibaba open-sourced QuarkAudio today, and this one is sneaky important. Because audio is where a lot of real adoption happens quietly.
Riley: Oh, stealth layer for sure. Nobody posts a dramatic teaser trailer because their cleanup pipeline got better. But then all of a sudden their podcast sounds cleaner, their dubbing costs drop, their voice restoration gets easier, and their repurposing pipeline stops being held together with four brittle tools and hope.
Hunter: That’s why I like this story. QuarkAudio is being framed as one open-source model that can handle multiple audio tasks without task-specific prompting. Speech restoration, voice conversion, audio separation. If that holds up, it reduces fragmentation in the stack.
Riley: Which is kind of huge. Audio has had this long history of being weirdly fragmented. You’d have one tool for cleanup, another for stems, another for voice, another for dubbing, and they all kind of hated each other. A unified open-source layer could make the whole thing less annoying.
Hunter: And open source changes the conversation in a big way. People online keep saying open source is winning, and I think the truth is more nuanced. Open source is winning on speed of experimentation and flexibility. It is not automatically winning on governance.
Riley: Yup. Free can absolutely outpace enterprise-ready, until legal asks who owns what, where the model runs, how outputs are logged, and why the intern apparently cloned the brand voice into something cursed at two in the morning.
Hunter: That’s the tradeoff. Flexibility versus accountability. If you adopt open-source audio, video, or agent frameworks, you get more control. But you also inherit more responsibility. Logging, permissions, approval layers, disclosure, storage, rollback plans. You do not get to skip the grown-up part.
Riley: Speaking of the grown-up part, this is where the OpenClaw style agent conversation sneaks back in. Because once you’ve got open video, open audio, and more capable agents, people start fantasizing about a fully autonomous content machine. And, ah, slow down, techno cowboy.
Hunter: Yeah. Open agents are useful when they do scoped operational work. Research prep, asset tagging, file movement, basic routing, draft packaging. They are not strategy. They do not understand nuance. They do not know the difference between bold and career-limiting.
Riley: That line is so real. An agent can absolutely help me assemble a campaign packet. I do not want it deciding the campaign thesis because it read five trend summaries and got overconfident. That’s how you end up with “let’s make the brand more feral” in a board meeting.
Hunter: Although, to be fair, some brands would try it.
Riley: Some brands deserve it.
Hunter: Fair enough. Big picture, if these layers keep accelerating together, real-time video from Runway style systems, practical multimodal packaging from SkyReels type models, open-source audio workhorses like QuarkAudio, and better open agents, then the realistic AI-forward org in two years is not fully autonomous. It’s modular.
Riley: Yes. Modular is the word. Small teams with stronger pipelines. Humans setting direction, guardrails, and taste. Machines handling prep, variants, cleanup, localization, and repetitive execution. Basically fewer people trapped in production sludge.
Hunter: Exactly. And probably more review infrastructure than people expect. Critic layers, approval states, provenance, risk tiers. The companies that move fast safely will look less like “AI replaced the team” and more like “the team finally stopped doing soul-crushing manual glue work.”
Riley: Which is the dream, honestly. More making. Less babysitting. More creative range. Less export gymnastics. And hopefully fewer moments where a model gives your brand six fingers and a lawsuit vibe at the same time.
Hunter: Before we wrap, quick tiny aside. Today also brought news about an app trying to help people talk to their pets with AI, which honestly feels inevitable.
Riley: If AI tells me why my cat keeps knocking things off shelves and the answer is just “for the plot,” I will believe it immediately.
Hunter: Same. Also saw chatter about AI agents making payments on the XRP Ledger, which is either the future of machine commerce or the setup for a very specific kind of headache.
Riley: Cute until your autonomous agent develops a spending habit. Hard pass unless the guardrails are very, very adult.
Hunter: And that’s probably the theme of the whole episode.
Riley: Faster tools, stronger workflows, more reasons to keep humans in the loop.
Hunter: That’s COEY Cast for Friday, March twentieth, twenty twenty-six. Thanks for hanging with us on World Storytelling Day while we talked about the machines getting a lot better at helping make the stories.
Riley: And if today is Absolutely Incredible Kid Day too, shout out to the little future prompt engineers currently asking the family tablet to generate a dragon with a skateboard and somehow having better instincts than half the industry.
Hunter: Please subscribe if you haven’t already, and check out COEY.com slash resources for AI news and updates.
Riley: Thanks for listening. Catch you next time.