COEY Cast Episode 149

Open Voice, Multi Shot, and Google’s AI Music Push

  • Riley Reylers
  • Hunter Glasdow

Episode Overview

04/01/2026

Google’s Lyria 3 Pro, Runway’s Multi-Shot App, and Mistral’s open-weights text-to-speech model all point to the same shift. Audio, video, and voice are becoming programmable workflow layers for creators and marketers. That opens the door to faster campaign concepts, localized narration, branded audio, and more efficient content production. It also raises bigger questions around taste, governance, approvals, and whether teams are making better work or just more of it. The real advantage is not having one more AI toy. It is building a stack that supports strategy, review, and brand consistency while keeping humans in the loop where judgment still matters most.


Episode Transcript

Hunter: Happy Wednesday, April first, twenty twenty-six, and yes, you made it to COEY Cast, the only show that occasionally feels like two humans got locked in a studio with a stack of models and just said, yeah, sure, let the machines help produce the whole thing. I’m Hunter.

Riley: And I’m Riley. Also, it is April Fools’ Day, which is honestly the perfect holiday for AI news because every headline this week sounds fake until you test it.

Hunter: It really does. And apparently it’s also Apple’s fiftieth birthday, which means somewhere a vintage Macintosh is looking at Runway and whispering, oh no, they taught the rectangle how to direct.

Riley: Wait, that’s actually kind of beautiful. Also, quick heads up, this episode was assembled by an extremely online pile of AI tools, automations, and workflow glue. So if anything gets a little weird, that’s not a bug. That’s the art director having a moment.

Hunter: Today we’ve got a really fun cluster of stories because they all point at the same bigger thing. Audio and video generation are not side quests anymore. Google’s Lyria three Pro is pushing AI music deeper into paid Gemini and the API world. Runway just dropped its Multi-Shot App, which turns prompt-based video into something way more like actual scene building. And Mistral has an open-weights text-to-speech model that is suddenly making voice infrastructure look a lot more open and a lot less locked up.

Riley: Translation, your content stack is getting real crowded real fast. Music, voice, video, all trying to become one-click enough that your intern, your founder, and your chaos goblin social lead can all make campaign assets before lunch.

Hunter: That’s the opportunity and the danger. Because the question isn’t just, can AI make more stuff. It can. The question is whether it helps you make better stuff or just more forgettable stuff faster.

Riley: Mm, yes. The great slop acceleration problem.

Hunter: Exactly. Let’s start with Lyria three Pro, because I think this one matters more than people realize. Google rolled it out to paid Gemini users and developers through the API, and the interesting part to me is not just music generation. It’s that branded audio is now becoming another production layer you can automate.

Riley: Yeah, and people on X are treating it like, finally, every brand gets a Hans Zimmer button. And, ah, no. That is not what’s happening. You can generate polished music fast, sure. But if your campaign already has no point of view, now you just have a shinier wallpaper for the same boring ad.

Hunter: Totally. A sane marketing team should use AI audio where music is functional, not sacred. Think product explainers, podcast beds, social teasers, event promos, B-roll soundtrack support, internal rough cuts. Places where speed matters and originality matters up to a point, but you don’t need a composer-led masterpiece every single time.

Riley: Right. If you’re Nike trying to make the anthem of the summer, maybe you still want humans sweating over that. But if you need six regional ad variants, an upbeat retail loop, and some sonic branding experiments by Friday, then yeah, this is suddenly very useful.

Hunter: And that shifts the relationship with agencies and production shops. I don’t think composers disappear. I think taste becomes the premium layer. The value moves from, I can make a track, to, I know what emotional shape this campaign actually needs and I can guide the system there.

Riley: Ooh, say that again. Because ownership of taste is the real fight now. The soundtrack is one prompt away, but the taste isn’t. Somebody still has to know when the “inspiring cinematic build” is giving insurance commercial and not luxury launch.

Hunter: That’s it. Prompting is not taste. Access is not judgment. And we’ve been talking about this on recent episodes too, especially with all the audio infrastructure stuff. Audio is moving from cute feature to real workflow layer. That trend is not slowing down.

Riley: Yeah, last few days have been loud. We were already talking about voice interfaces becoming a real brand layer with Gemini live workflows, and now you’ve got Google making music more accessible, plus Mistral pushing open voice. It’s like every part of the media stack is trying to become programmable.

Hunter: Which is a great segue to Mistral. Their new open-weights TTS model is the kind of release that gets the open-source crowd yelling, the moat is dead, from the rooftops.

Riley: They do love that phrase.

Hunter: They really do. But the more practical read is this: open voice is getting good enough that a lot of teams are going to revisit whether they want to depend entirely on closed vendors for narration, dubbing, support voices, or agent speech.

Riley: And I get why. If it’s cheaper, customizable, multilingual, and you can run it more on your own terms, that is very attractive. Especially if you’re doing localization at scale. Brand explainer in English becomes five other languages without making your ops team cry.

Hunter: Yes, but then comes the compliance meeting.

Riley: The scariest genre.

Hunter: Exactly. Voice is where the ethics conversation stops being abstract. You’re dealing with consent, cloning, disclosure, impersonation risk, and just basic brand trust. So if you’re choosing open versus closed, the real question is not only performance. It’s governance. Who can use it, what voices are approved, how reference audio is stored, how outputs are labeled, and what review process exists before anything public goes live.

Riley: This is where the open-source fantasy hits the enterprise wall. People say, we can build this ourselves. And I’m like, babe, can you maintain it, secure it, document it, and stop Gary from cloning the CEO for a joke video?

Hunter: That is the question. Just because you can self-host part of the stack doesn’t mean you should become your own media infrastructure company. The smartest teams will probably build around open components where it creates leverage, but keep hard guardrails and maybe still use managed layers in public-facing workflows.

Riley: So basically, build where it gives you control, buy where it saves you headaches, and never let the voice model become the most reckless person in the company.

Hunter: Pretty much.

Riley: Okay, but we have to talk about Runway because the Multi-Shot App is the one that made my group chats go feral. This is the first time in a minute that AI video felt less like, wow, neat clip, and more like, oh, this is an actual content product.

Hunter: I agree. That’s what stood out. It’s not just a raw model flex. It packages the workflow. Dialogue, sound effects, framing, multiple shots, easier assembly. That matters because creators and marketing teams don’t just need generation. They need structure.

Riley: Yes. Thank you. The industry spent forever obsessing over single-shot wow moments. But actual work needs scenes. It needs continuity. It needs the thing to feel like a campaign concept, not a random dream sequence from a sleepy algorithm.

Hunter: And for marketers, that means faster ad concepts, faster social cuts, faster product storytelling. Maybe not final for everything, but definitely usable for pitches, concept testing, mood films, rough story versions, and in some cases, polished short-form outputs.

Riley: I do want to challenge you though, Hunt. Because every time a video tool gets easier, people say, this changes everything. But the bottleneck doesn’t disappear. It just moves. So where does it move now?

Hunter: That’s a great question. I think it moves upstream into strategy and downstream into approvals. If making the asset gets easier, then the harder part becomes deciding what should exist, what fits the brand, what legal will allow, and what actually deserves distribution.

Riley: Mm-hmm. Humans pretending they totally would’ve storyboarded it all by hand.

Hunter: Yes, there’s definitely going to be some of that.

Riley: Because let’s be honest, a lot of teams are about to backfill process language around outputs that were created way faster than their old creative rituals. And that’s not even bad. It just means the new premium is creative direction, brand consistency, and taste review.

Hunter: Plus legal. We can’t ignore that. Especially as these systems get more commercially usable. Music provenance, voice consent, likeness boundaries, training concerns, disclosure policies. All of that gets more important, not less.

Riley: Also, what people online are getting wrong about AI video right now is kind of both extremes at once. Some folks still underestimate how fast it’s becoming usable for commercial work. Others think audiences will watch endless glossy nonsense forever just because the lighting is dramatic.

Hunter: Right. Audience tolerance is not infinite. The novelty window closes fast. If everybody can make cinematic-looking content, then cinematic-looking content stops being enough. The bar moves to concept, pacing, brand fit, and whether the piece actually makes someone feel something.

Riley: Or laugh. Or click. Or remember it twelve seconds later. Like, if the vibe is immaculate but the message is oatmeal, who cares?

Hunter: Exactly. And that connects to a bigger week in AI, honestly. We’ve seen this broad shift from flashy demos toward workflow products. Not just here. Across multimodal systems, voice interfaces, campaign operators, agent frameworks. The common thread is simple: the winners are not the tools with the loudest launch video. They’re the ones that remove real production friction without creating new chaos.

Riley: Which is why I’d rank near-term value kind of like this. AI video workflows for concepting and content ops, very immediate. Open voice, super practical, especially for localization and support. Generated music, useful but more situational unless your team actually produces a lot of media. And action-taking agents, still promising, still a little, um, lobster with a clipboard.

Hunter: I basically agree. Agents generate headlines first. Video and voice create value first. Music sits in a very interesting middle because it becomes powerful when it’s embedded in the rest of the stack. If your video tool, your ad workflow, and your publishing system all connect, then AI music stops being a novelty and becomes a production advantage.

Riley: Yeah, if it’s just one more toy, cute. If it’s wired into your process, now we’re talking.

Hunter: And that’s really the takeaway from all three stories. Audio, video, and voice are becoming composable building blocks. The opportunity is huge. But if you don’t have human review, clear taste standards, and workflow discipline, you’re just building a faster machine for average content.

Riley: Which, no thank you. We reject factory-farmed mediocrity.

Hunter: We do. Oh, and since it is April Fools’ Day, we were told to include one fake story and ask if you caught it.

Riley: So here it is. One of today’s claims was not real. Was it that Apple is celebrating its fiftieth birthday, that Runway’s Multi-Shot App is making structured AI video feel more commercially usable, that Google expanded Lyria three Pro into paid Gemini and API workflows, or that Mistral’s open voice model is making teams rethink closed vendors?

Hunter: And the fake one is…

Riley: None of those. Sorry. The joke is that all of this sounds made up, and somehow it’s not. Very on-brand for April first.

Hunter: Honestly, perfect. Thanks for hanging with us on COEY Cast.

Riley: Go check out COEY.com slash resources for AI news, breakdowns, and updates. It’s a good place to keep your brain calibrated while the robots learn cinematography.

Hunter: And subscribe so you don’t miss the next one.

Riley: Have a good April Fools’ Day, have a good Apple fiftieth, and maybe don’t give your voice clone access to the company account.

Hunter: Catch you next time.
