COEY Cast Episode 88

Gemini 3 Ultra, Agent OS, and the War on Cinematic Sludge

  • Riley Reylers
  • Hunter Glasdow

Episode Overview

01/26/2026

Gemini 3.0 Ultra reportedly supports analysis of up to one hundred hours of video per prompt, a shift that could rewrite how marketers and media teams mine their archives. This episode breaks down real workflows for webinar libraries, podcast backlogs, and brand consistency using multimodal analysis. Hunter and Riley dig into where models still fail on nuance and attribution, and why governance, receipts, and routing matter more than hype. They unpack Meta's rumored Manus AI acquisition, agent operating systems, and why boring approvals are the killer agent use case. Finally, they cover Linum v2 and LTX 2 Pro, local text-to-video, audio-to-video storytelling, and how to avoid cinematic sludge while scaling human-plus-machine co-creation.


Episode Transcript

Hunter: It’s Monday, January twenty sixth, twenty twenty six. You’re listening to COEY Cast… and apparently it’s Spouse’s Day. So if you’re partnered up, congrats, you’re legally required to forward this episode and say “this is us.” I’m Hunter.

Riley: I’m Riley. And yes, this episode is made end-to-end by machines. Like, the whole thing. If something gets a little unhinged, that’s not a bug, that’s the vibe.

Hunter: Today’s big story is the internet collectively yelling about “Gemini three point oh Ultra,” and the flex is… it can take, allegedly, like a hundred hours of video in one prompt.

Riley: A hundred hours is… disgusting. That’s like, an entire reality show season. Wait, hold up, are we sure this isn’t just X doing that thing where everyone retweets a screenshot until it becomes law?

Hunter: Totally fair. It’s “launch buzz,” not a crisp spec sheet. But even if it’s half true, the workflow implication is real: long-form multimodal analysis becomes normal. Video, audio, transcript, slides, all together.

Riley: Okay, but give me the first real marketing workflow where that isn’t just a flex.

Hunter: Easiest one? Webinar libraries. You drop in your entire backlog: product demos, onboarding trainings, customer panels, founder AMAs. Then you ask it for a clean content map. Like, “What are the top objections prospects raise?” “Where do we explain pricing?” “What stories convert?” And it spits back time-coded clips, plus suggested hooks, plus a summary that’s actually grounded in the footage.

Riley: Mm. So it’s like, an intern that never sleeps, but with timestamps.

Hunter: Exactly. And it’s not just highlights. It’s “make an editorial brain.” If you have twenty episodes of a podcast, you can ask, “What topics do we keep repeating?” “Where do we contradict ourselves?” That’s brand consistency, but across media.

Riley: Okay but where does it faceplant? Because it always faceplants somewhere.

Hunter: Two places. One, nuance. If the audio’s messy, multiple speakers talking over each other, the model might hallucinate “who said what.” That’s lethal if you’re pulling quotes.

Riley: Yeah, imagine the clip saying your CEO promised something they absolutely did not promise. Instant group chat crisis.

Hunter: Exactly. And two, the “creative drift” problem. If everyone uses the same “find highlights” prompt, we get industrial-scale highlight reels that all sound like the same motivational LinkedIn guy reading your transcript.

Riley: The dreaded “thought leadership voice.” Like, “In today’s fast-moving landscape…” I will throw my phone.

Hunter: Right. The insight moat isn’t “it can summarize.” It’s whether your workflow forces specificity. Like: extract the claims, the objections, the proof moments, the emotional moments. Different buckets. Different outputs.

Riley: Who wins if this is real? Brand team, SEO team, or the intern who just got promoted to AI babysitter?

Hunter: The team that owns the archive. Which is usually nobody, which is the problem. But if someone owns the archive, the winners are brand and lifecycle. Because you can turn one long asset into a full month of coherent, on-message stuff without losing the plot.

Riley: Hot take: SEO also wins, but only if they stop thinking in blog posts and start thinking in “answer inventory.” Because if Gemini can digest the entire library, it can also surface the cleanest, most provable answer moments.

Hunter: Yeah, that ties into this bigger trend: assistants are the new distributors. Your content has to be structured enough to be eligible, not just inspiring.

Riley: Also, we’ve been seeing this shift where governance is the real differentiator. Like, cool, you can ingest a hundred hours of video. Can you do it without accidentally producing a highlight that violates a claim rule, or uses a customer story you don’t have rights for?

Hunter: Exactly. The more you automate, the more you need a control plane. At minimum, you need a “don’t let the agent publish unhinged stuff” setup.

Riley: Okay, define “minimum viable.” Like, if I’m a normal company, what do I actually do without building Skynet?

Hunter: Three things. First, you keep a source of truth for claims. If the model generates a line like “we cut costs by fifty percent,” it needs to cite where that came from, or it gets blocked.

Riley: Receipts culture. I love it.

Hunter: Second, you require structured output. Not a poetic summary. A real object: clip timestamps, speaker attribution, theme tags, risk flags.

Riley: And third?

Hunter: Routing. Low risk stuff can autopublish drafts. Medium risk stuff goes to a human queue. High risk stuff… the system should just refuse. Like, “I’m not touching medical claims, good luck.”

Riley: That’s actually such a vibe. AI should learn to say “no” more often.
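[Editor's note: the three-part setup Hunter describes, receipts for claims, structured output, and risk-based routing, can be sketched in a few lines. Everything here is illustrative: the `Clip` fields, risk tiers, and route names are hypothetical, not any vendor's API.]

```python
from dataclasses import dataclass, field

# Hypothetical structured output object: not a poetic summary,
# but timestamps, attribution, theme tags, and risk flags.
@dataclass
class Clip:
    start: float                # seconds into the source video
    end: float
    speaker: str
    text: str
    theme_tags: list = field(default_factory=list)
    claim_sources: list = field(default_factory=list)  # "receipts" for any claim
    risk: str = "low"           # "low" | "medium" | "high"

def route(clip: Clip) -> str:
    """Decide what happens to a generated clip before anything goes public."""
    # Rule 1: a claim with no cited source gets blocked outright.
    if "claim" in clip.theme_tags and not clip.claim_sources:
        return "blocked: claim without a cited source"
    # Rule 3: everything else routes by risk tier.
    if clip.risk == "low":
        return "autopublish-draft"
    if clip.risk == "medium":
        return "human-review-queue"
    return "refused"            # high risk: the system just says no

quote = Clip(0.0, 42.5, "CEO", "We cut onboarding time in half.",
             theme_tags=["claim"], risk="medium")
print(route(quote))  # blocked: claim without a cited source
```

The point of the sketch is that the risk tier never even gets consulted for an unsourced claim; receipts come first.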

Hunter: Now, speaking of agents: there’s chatter that Meta bought Manus AI for like two billion-ish, and people are calling it an “operating system for agents.”

Riley: The phrase “operating system for agents” makes me feel like I need to update my life. But also, that’s the shift, right? It’s not just who has the best base model. It’s who can execute.

Hunter: Yep. Models are becoming commodities. Execution layers are the leverage: orchestration, memory, tool use, policies, audit trails.

Riley: Where does agent orchestration become boring but valuable in marketing?

Hunter: Approvals and publishing. Everyone wants the sexy part, like “write the campaign.” But the boring win is: briefs come in, drafts generate, critics check them, a human approves diffs, and then the system schedules across channels. That is the unglamorous machine that prints consistency.

Riley: And where is it still science fair nonsense?

Hunter: Fully autonomous “run my brand” agents. Like, “go engage in the comments, be funny, and don’t start discourse.” That’s how you get a brand account accidentally joining a feud between two pop stars at midnight.

Riley: Wait, but I’ve seen people try that. It’s always like, the agent replies with “I appreciate your feedback” to a meme. The internet does not appreciate your feedback.

Hunter: Exactly. Comment sections require taste, timing, and context. Agents are great at workflows. They’re not great at vibes.

Riley: Also the Manus story has this extra spicy angle: people are talking about scrutiny because of China origins. That’s going to make governance and procurement a bigger deal, especially for enterprise.

Hunter: Totally. Geopolitics is now a product feature. Teams will ask, “Where is this hosted?” “Where did the training data come from?” “Can we run it inside our perimeter?” That’s why open and portable stacks are getting more attractive.

Riley: Okay, so if I want the open-source-ish equivalent to avoid vendor lock-in, what’s the move?

Hunter: You basically build a modular stack. A workflow engine, some model gateway or router behavior, a retrieval layer for your truth, and a critic layer for checks. The point isn’t one tool, it’s composability.
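[Editor's note: one way to picture the composability Hunter is pointing at is each layer sitting behind a plain function interface, so swapping the model never touches the workflow. The stubs below are a sketch under that assumption, not a real product stack.]

```python
from typing import Callable

# Each layer is a function behind a stable interface:
# swap the model (or critic) without rewriting the workflow engine.
ModelFn = Callable[[str], str]
CriticFn = Callable[[str], bool]

def make_workflow(model: ModelFn, retrieve: Callable[[str], str],
                  critic: CriticFn) -> Callable[[str], str]:
    def run(brief: str) -> str:
        context = retrieve(brief)               # retrieval layer: your source of truth
        draft = model(f"{context}\n\n{brief}")  # model gateway: swappable vendor
        if not critic(draft):                   # critic layer: policy checks
            return "sent to human review"
        return draft
    return run

# Stand-in stubs; in practice these would wrap real services.
workflow = make_workflow(
    model=lambda prompt: f"DRAFT: {prompt.splitlines()[-1]}",
    retrieve=lambda q: "approved claims: onboarding time halved (case study #12)",
    critic=lambda text: text.startswith("DRAFT"),
)
print(workflow("Write a launch post"))  # DRAFT: Write a launch post
```

The payoff is exactly what Hunter says next: when the model market shifts, only the `model=` line changes.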

Riley: Translation: you’re gonna become a little bit of an ops person. Congrats.

Hunter: A little. But it pays off because you can swap models when the market shifts, which it will, weekly.

Riley: Speaking of weekly chaos: text-to-video is sprinting. Linum v two is getting love because it’s open-weight and small enough to run locally, and then LTX two Pro is popping off with audio-to-video. Like, upload audio and it generates synced visuals.

Hunter: That audio-to-video thing is sneaky powerful for marketers. Because audio is already your script lock. If you have the voiceover, you have the pacing, you have the beats.

Riley: Yes! And it’s the first non-cringe way to “turn every podcast into video.” Not just the static waveform with subtitles. You can do like, animated storyboards, or abstract visuals that actually match the rhythm.

Hunter: Exactly. The best format I’m seeing is “audio-led micro-doc.” You take one strong minute, you generate a tight visual sequence that supports the point, and you still keep it obviously stylized so it doesn’t pretend to be real footage.

Riley: The fastest way it becomes spam is when everyone does the exact same “AI b-roll” prompt and we get the same moody city shots and the same slow zoom on a laptop keyboard.

Hunter: The cinematic sludge returns.

Riley: Not the cinematic sludge. Please.

Hunter: Here’s where Linum v two matters, though. Local text-to-video isn’t just cost-avoidance cosplay. It’s privacy and iteration speed. If you’re a brand with unreleased product visuals or sensitive concepts, running locally or on your own infra is a real advantage.

Riley: But also, it’s prototyping. Like, you can generate rough ad concepts without waiting for a full production cycle. Then you pick the winners and do the human version.

Hunter: That’s the co-creation sweet spot. Humans pick the concept, machines generate variations, humans pick again.

Riley: Okay, quick ecosystem check before we wrap: the bigger pattern across all of this is multimodal plus agents plus open weights. We’ve got long-video understanding getting wild, agent “OS” moves, and then open video models getting good enough to actually ship drafts.

Hunter: And the missing piece across all of it is trust layers. If you don’t have policy, receipts, and routing, you don’t have automation. You have a slot machine with a subscription.

Riley: Wait, that’s… painfully accurate.

Hunter: Also, we’ve talked recently on the show about audio being the control surface, and you can feel that here. Audio-to-video workflows are basically saying: lock audio first, then let visuals follow.

Riley: Which is how humans work anyway. Like, music videos, ads, even comedy timing. Audio leads.

Hunter: So, practical takeaway for listeners: if you’re excited about “hundred hours of video in one prompt,” start smaller. Pick one webinar. Build a repeatable extraction template: hooks, objections, proof moments, and time-coded clips. Then put a human review gate on anything that becomes public.
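[Editor's note: the "repeatable extraction template" can be as simple as a fixed schema the model must fill for every webinar, with a validator that rejects poetic summaries before anything reaches the human review gate. The bucket names mirror the episode; the validation logic is a hypothetical sketch.]

```python
# Fixed buckets: the same template runs against every webinar,
# so outputs stay comparable across the whole library.
TEMPLATE = {
    "hooks": [],            # attention-grabbing moments worth clipping
    "objections": [],       # prospect pushback raised on the call
    "proof_moments": [],    # concrete evidence: numbers, demos, stories
    "clips": [],            # [{"start": s, "end": s, "speaker": str, "quote": str}]
}

def validate(extraction: dict) -> list:
    """Return problems; an empty list means it is safe to queue for human review."""
    problems = [f"missing bucket: {k}" for k in TEMPLATE if k not in extraction]
    for clip in extraction.get("clips", []):
        # No time-coded, attributed clip means no usable receipt.
        if not {"start", "end", "speaker"} <= clip.keys():
            problems.append(f"clip missing timestamp/attribution: {clip}")
    return problems

good = {"hooks": ["the 50-seat pilot story"], "objections": [], "proof_moments": [],
        "clips": [{"start": 61.0, "end": 93.0, "speaker": "PM", "quote": "..."}]}
print(validate(good))  # []
```

Anything that fails validation never reaches the publish step, which is the review gate Hunter is describing.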

Riley: And if you’re excited about agents, make them do boring stuff first. Approvals, tagging, scheduling, versioning. Don’t let them freestyle your brand voice in public. Yet.

Hunter: Yet. Alright, Spouse’s Day people, go be emotionally present for like… a minute.

Riley: Or just send them an AI-generated highlight reel of your relationship. Kidding. Mostly.

Hunter: Thanks for hanging with us on COEY Cast. Subscribe if you want more of this chaos, and check out coey.com slash resources for AI news and updates.

Riley: Catch you next time. And happy Spouse’s Day to the real ones who tolerate your automation projects.
