COEY Cast Episode 115

Open Weights, Long Context, and Real Time Video Chaos

  • Riley Reylers
  • Hunter Glasdow

Episode Overview

02/26/2026

Open-source models are leveling up and crashing straight into real workflows. This episode breaks down Alibaba’s new Qwen 3.5 open weights for agentic workflows, why “smaller model, bigger results” only matters if your loops get cheaper and more reliable, and where governance still fails when agents touch real systems. Then it dives into Llama 4 rumors around huge context windows and local-first setups, and what that actually changes for brand memory and creator ops. Finally, it unpacks PrunaAI p-video on DeepInfra for real-time video plus audio, why it collapses ad production steps, and how to keep taste, truth, and authenticity intact.


Episode Transcript

Hunter: It is Thursday, February twenty sixth, twenty twenty six, and apparently it’s both Pistachio Day and Tell a Fairy Tale Day. Which feels correct, because the AI timeline is fully in “magical nuts and bedtime stories” mode right now. This is COEY Cast. I’m Hunter.

Riley: And I’m Riley. Also, if you’re new here, yes, this episode was assembled by a swarm of automations and AI tools. So if we suddenly start narrating like a Disney villain, that’s not a bug. That’s the experiment.

Hunter: We keep the weird. Today’s big cluster of stories is basically: open weights are leveling up again, Meta’s Llama four rumors are doing laps on X, and then PrunaAI just casually drops real-time video plus audio on DeepInfra like it’s nothing.

Riley: Wait, hold up. Before we go full fairy tale, can we talk about the Qwen thing? Because X is acting like “agentic workflows” is now solved forever.

Hunter: Yeah. Alibaba’s Tongyi team dropped new open-source Qwen three point five weights that people are pointing to, specifically Qwen3.5-122B-A10B and Qwen3.5-35B-A3B. And the pitch is super specific: efficient “Mixture of Experts”-style models that are meant to run agent loops without turning your GPU bill into a horror story.

Riley: Okay, so what’s the first boring real-world task it crushes? Like not “it reasons better,” I mean the stuff creators and marketers actually hate doing.

Hunter: The first boring win is content ops glue. Things like: take a messy campaign brief, pull in brand rules, generate a structured set of deliverables, and prep tool calls. So it can draft the landing page sections, write ad variants, create a metadata bundle for your DAM, and kick off repurposing prompts for video and audio tools.

Riley: So it’s like… the producer. Not the star.

Hunter: Exactly. Agentic models are best when they’re the coordinator. They keep state, they call tools, they format outputs, and they don’t complain. Where it faceplants is when you let it run unbounded actions without guardrails. The “confidently wrong intern” energy gets dangerous when it can email customers or publish to socials.

Riley: Mm-hmm. The moment your agent can press the big red button, it will press the big red button. For fun.

Hunter: Yup. And Qwen’s story right now is “smaller model, bigger results,” so let’s do the BS detector checklist. My list is not sexy, but it works.

Riley: Hit me, Hunt.

Hunter: First, does it actually reduce total workflow cost, not just single-call cost? Agent loops are multiple calls: plan, tool, validate, repair. If the model saves you pennies per call but needs more retries, you lose.
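
A quick way to sanity-check that first checklist item is to model the expected cost per finished task rather than per call. All prices and success rates below are made-up illustrations, not benchmarks of any real model:

```python
# Hypothetical numbers for illustration only. The point: a cheaper
# per-call model that needs more retries can cost more per *finished* task.

def cost_per_completed_task(price_per_call, calls_per_loop, success_rate):
    """Expected cost of one finished task.

    Each attempt runs a full agent loop (plan, tool, validate, repair),
    and attempts repeat until one succeeds (geometric distribution, so the
    expected number of attempts is 1 / success_rate).
    """
    expected_attempts = 1 / success_rate
    return price_per_call * calls_per_loop * expected_attempts

cheap_but_flaky = cost_per_completed_task(0.002, calls_per_loop=4, success_rate=0.4)
pricier_but_solid = cost_per_completed_task(0.004, calls_per_loop=4, success_rate=0.95)

print(f"cheap but flaky:   ${cheap_but_flaky:.4f} per finished task")
print(f"pricier but solid: ${pricier_but_solid:.4f} per finished task")
```

With those illustrative numbers, the model that costs twice as much per call still comes out cheaper per finished task, which is exactly the trap Hunter is describing.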

Riley: Ooh, yes. Cheap model that spirals is like buying a cheap tripod that falls over and breaks your camera.

Hunter: Second, can it reliably produce structured outputs under pressure? Like strict JSON that doesn’t drift. Third, does tool use behave, meaning it chooses the right tool at the right time, and it knows when to stop.

Riley: And fourth is my favorite: does it stay calm when you give it ugly inputs. Like a transcript with typos, a half-finished Notion doc, and a screenshot that looks like it went through a fax machine.

Hunter: Exactly. That’s real life. Benchmarks don’t look like your Google Drive.
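
Hunter’s second checklist item, strict JSON that doesn’t drift, is usually enforced with a validate-and-repair wrapper around the model call. A minimal sketch, where `call_model` and the required keys are assumptions for illustration, not any particular vendor’s API:

```python
import json

def get_structured_output(call_model, prompt, max_retries=3):
    """Ask a model for strict JSON and retry with the error fed back.

    `call_model` is a hypothetical function (prompt -> str); swap in
    whatever client you actually use. The required keys are illustrative.
    """
    required = {"headline", "body", "cta"}
    last_error = ""
    for _ in range(max_retries):
        raw = call_model(prompt + last_error)
        try:
            data = json.loads(raw)
            missing = required - data.keys()
            if missing:
                raise ValueError(f"missing keys: {sorted(missing)}")
            return data
        except (json.JSONDecodeError, ValueError) as err:
            # Feed the failure back so the repair attempt is targeted,
            # not just a blind re-roll.
            last_error = (
                f"\n\nYour last output failed validation: {err}. "
                "Return valid JSON only."
            )
    raise RuntimeError("model never produced valid structured output")
```

The reliability question Hunter raises is really about how often that repair branch fires: a model that needs it constantly fails the cost test from earlier in the checklist.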

Riley: Okay but open weights are great until the lawyers wake up. What is X missing on compliance here?

Hunter: Two things. One, open weights doesn’t mean open risk-free. You still need to know what data goes into prompts, what comes out, and where it’s stored. If you’re in regulated industries, you need retention policies, audit logs, and access control. Two, agentic systems amplify IP risk because they touch more systems. It’s not just “model output,” it’s “model output plus actions plus external content pulled in.”

Riley: So the problem isn’t “it wrote a risky sentence.” The problem is “it wrote a risky sentence and then posted it everywhere.”

Hunter: Exactly. And that’s why the workflow is the product. Open weights help you host privately, but you still need governance.

Riley: Okay, pivot to the other bedtime story: Llama four. X is screaming “Mixture of Experts,” “multimodal,” and then the biggest flex of all, that massive context claim. In practice, what’s the longest-context use case that’s actually useful and not just a measuring contest?

Hunter: The useful version is “single project memory.” Like: your entire brand history, offers, positioning docs, past campaign performance summaries, and your current quarter strategy. Then the model can answer, draft, and critique with the full context without you constantly re-feeding it.

Riley: That’s actually huge for creators too. Like a YouTuber’s whole channel bible, sponsor constraints, audience lore, and the last six months of comments.

Hunter: Exactly. The useless version is: stuffing ten million tokens just because you can, then hoping it magically becomes truth. Context is not correctness. It’s just more stuff to be wrong about.

Riley: Also, I’ve seen people say performance gets weird after a certain point. Like it’s “supported,” but not “stable.”

Hunter: Yeah, and that’s the operational reality. Even if a model claims enormous context, you have to test where it degrades, how retrieval behaves, and whether your infra can even serve that window without timing out.

Riley: So if marketers hear “massive context,” do we actually shift from campaigns to what you called “persistent memory narratives,” or is that just a fancy way to overcomplicate a landing page?

Hunter: I think it’s both. The smart move is: treat your brand as a living narrative, with consistent memory, but keep outputs small and channel-specific. Don’t turn your homepage into a novel. Use the big context to make the small outputs more consistent.

Riley: I love that. Big brain, small deliverables.

Hunter: Exactly. Now, the local-first angle. People are already claiming they’re running Llama four variants locally. The realistic playbook is not “everyone becomes a GPU procurement company.”

Riley: Thank you. Because I do not want marketing ops arguing about rack units.

Hunter: The playbook is hybrid. Run local for privacy-heavy tasks like summarizing internal docs, drafting sensitive emails, first-pass analysis. Then escalate to hosted models for heavy multimodal generation or when you need the best quality. And you route it with policies: cost caps, risk tiers, and logging.
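
Hunter’s hybrid playbook, route by risk, cap cost, log everything, can be sketched as a small policy table. The backend names, tiers, and dollar caps below are all illustrative assumptions, not anything a real vendor ships:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model-router")

# Illustrative policy table: "local-llama" and "hosted-frontier" are
# placeholder backend names, and the caps are made up.
POLICY = {
    "sensitive": {"backend": "local-llama", "max_cost_usd": 0.00},  # never leaves the box
    "standard":  {"backend": "local-llama", "max_cost_usd": 0.01},
    "heavy":     {"backend": "hosted-frontier", "max_cost_usd": 0.25},
}

def route(task_risk, est_cost_usd):
    """Pick a backend by risk tier, enforce the cost cap, log the decision."""
    policy = POLICY[task_risk]
    if est_cost_usd > policy["max_cost_usd"]:
        raise RuntimeError(f"cost cap exceeded for tier {task_risk!r}")
    log.info("routing %s task (est $%.3f) to %s",
             task_risk, est_cost_usd, policy["backend"])
    return policy["backend"]
```

The design point is that the routing decision is a policy lookup, not a vibe: swap the table and the same code governs a different model mix, which is what makes the system survivable when models change.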

Riley: And the “gotchas” with open Mixture of Experts models?

Hunter: Routing and evaluation weirdness. Sometimes quantization can get painful, sometimes performance varies a lot across hardware, and long-context behavior can be unpredictable. Teams think the hard part is downloading weights. The hard part is making it boring and reliable.

Riley: Boring is the dream. Okay. Now the spicy one: PrunaAI “p-video” on DeepInfra. Real-time video, plus audio, and people are saying up to ten eighty at forty eight frames per second. That is… unhinged.

Hunter: This one is super practical because it’s an API deployment story, not a research flex. And that matters for creators and marketers because it collapses steps. You can iterate fast, preview drafts quickly, and generate audio inside the same pipeline.

Riley: So what’s the first marketing workflow that becomes trivially automated?

Hunter: Product animation ads. You start with a product image, you feed a prompt like “clean studio lighting, gentle camera push, premium vibe,” then you generate short clips in different aspect ratios, and you get draft voice or sound to match. That becomes a repeatable variant factory.
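
The variant factory Hunter describes is basically a fan-out over aspect ratios. A sketch that only builds the job payloads; the field names are assumptions for illustration, since the actual p-video API shape isn’t covered in the episode:

```python
# One product image, one base prompt, fanned out across aspect ratios.
# Payload fields are hypothetical -- check DeepInfra's actual p-video
# docs before wiring this up to a real endpoint.

BASE_PROMPT = "clean studio lighting, gentle camera push, premium vibe"
ASPECT_RATIOS = ["16:9", "9:16", "1:1"]  # YouTube, Reels/Shorts, feed

def build_variant_jobs(product_image_url):
    """Build one draft-generation job per aspect ratio."""
    jobs = []
    for ratio in ASPECT_RATIOS:
        jobs.append({
            "image": product_image_url,
            "prompt": BASE_PROMPT,
            "aspect_ratio": ratio,
            "duration_s": 6,
            "audio": True,  # draft voice/sound in the same pipeline
        })
    return jobs

jobs = build_variant_jobs("https://example.com/product.png")
print(len(jobs), "variant jobs queued")
```

Adding a prompt variation loop on top of the ratio loop is what turns this from three clips into a real variant matrix, and that is the point where the taste and truth checks Hunter mentions next have to gate what actually ships.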

Riley: And what stays stubbornly human?

Hunter: Taste and truth. Taste is: is this actually on-brand, does it feel cheap, is the pacing right. Truth is: are we making claims visually that we can’t back up, are we implying features that don’t exist, are we creating misleading demos.

Riley: Also, humor. Like, AI can generate a joke, but it can’t feel the room the way the internet feels the room.

Hunter: Exactly. Now the authenticity crisis question. If video plus audio gets cheap and fast, brands can publish a lot more. But if they publish too much synthetic content without a signal of integrity, audiences just stop believing.

Riley: So what do we do? Watermark everything? Put a label that says “made by robots”?

Hunter: I think the better move is consistency and disclosure in the right places. Use AI for drafts and variants, but keep a recognizable human face, voice, or editorial stance that anchors trust. And keep provenance internally even if you don’t scream it publicly. You want receipts if something goes sideways.

Riley: Receipts are the new brand safety.

Hunter: They really are.

Riley: Okay, quick ecosystem temperature check since last week has been chaos. We’ve had Mercury two pushing reasoning diffusion, Google Lyria three messing with ad music, and the whole “ads inside assistants” thing bubbling. How do these stories connect?

Hunter: The connective tissue is automation economics. Everyone is racing toward models that are cheap enough, fast enough, and controllable enough to sit inside workflows all day. Not “hero prompts.” Infrastructure. And with assistants getting monetized, brands are going to need clean truth packs and structured data so they show up correctly in AI surfaces.

Riley: So the future is less “make one viral video,” and more “build a system that can make, test, and adapt content without melting your team.”

Hunter: Exactly. Now, if I had to place a bet based on what people are saying online: in a year, do open-source models win because they’re better, cheaper, or because enterprises are tired of vendor lock-in?

Riley: I think it’s vendor lock-in rage. Like, people will pay for quality, but they hate being trapped. Also, creators want control. They want their workflows to survive the next platform mood swing.

Hunter: I’m with you. And the planning advice is: build for swapability. Models change. Your system should not.

Riley: Tell a fairy tale day moral: don’t marry the model. Date the model. Marry the workflow.

Hunter: That’s the quote. Also, Pistachio Day moral: crack the shell, don’t eat the whole thing at once. Same advice for automation. Small, controlled rollouts.

Riley: Love that. Alright y’all, thanks for hanging with us on COEY Cast on Thursday, February twenty sixth, twenty twenty six.

Hunter: Subscribe if you want more of this chaos, and go check out COEY.com slash resources for AI news and updates.

Riley: And go celebrate Pistachio Day responsibly. No unbounded agent access. Catch you next time.
