COEY Cast Episode 148

Open Qwen, Closed Loop: Multimodal Gets Real

  • Riley Reylers
  • Hunter Glasdow

Episode Overview

03/31/2026

Alibaba’s open Qwen 3.5 Omni is pushing multimodal AI past flashy demos and closer to real workflow value. Voice, camera input, long audio context, and fast generation are starting to look less like chatbot features and more like a new interface for building drafts, prototypes, and internal tools. The bigger question is where this actually works for teams with approvals, brand rules, and security needs. The conversation also maps the rise of practical AI video through Kling 3.0, Dreamina, and Seedance 2.0, plus why Intercom’s Fin Apex 1.0 may be the clearest sign of how enterprises will really buy AI. The takeaway is simple. Route the right work to the right model and keep humans on taste, trust, and decisions.

Episode Transcript

Hunter: Happy Tuesday, March thirty first, twenty twenty-six, and happy César Chávez Day, plus apparently Bunsen Burner Day, which feels right because the AI internet absolutely set the lab table on fire this week. This is COEY Cast. I’m Hunter.

Riley: And I’m Riley. Also, yes, this episode was basically assembled by a little swarm of AI tools passing files around like caffeinated stagehands, so if anything gets a tiny bit weird, congrats, you found the live specimen.

Hunter: Honestly, today’s big story is one of the clearest signs that multimodal AI is leaving the toy aisle. Alibaba dropped Qwen three point five Omni, and people on X are losing their minds over what they’re calling audio-visual vibe coding.

Riley: Which, I’m sorry, is such an internet phrase. It sounds fake and real at the same time. But the demos are wild. You talk to it, show it things with a camera, maybe wave your hands at a sketch, and it starts helping generate websites, games, interfaces, all that.

Hunter: Yeah. And the reason it matters is not just, wow, cool demo. It’s that Qwen seems to be pushing a native multimodal stack pretty hard. Text, image, audio, video, low-latency voice, long audio context, multilingual speech handling. That combination starts to look less like chatbot plus add-ons and more like an actual interface layer.

Riley: Mmm. But hold up. This is where I wanna be annoying in a productive way. Because the timeline sees a model make a website from a spoken prompt and instantly goes, we have entered the future. And I’m like, okay, sure, but where does that become useful first for a normal company that still has legal, brand review, security, and someone named Greg in procurement?

Hunter: That is the right question. To me, the first sane use case is internal workflows and product demos, not full autonomous campaign creation. Like, a product marketer talking through a landing page idea while screen-sharing references, then getting a rough prototype, draft copy, maybe a suggested flow. That’s useful.

Riley: Yes. Internal first. Low embarrassment radius. Love that. Because campaign creative is where people get reckless. They see multimodal and think, sweet, now the machine can be my creative director, videographer, editor, and strategist. Umm, no.

Hunter: Exactly. This thing might be great at compressing the path from idea to rough artifact. But rough artifact is the key phrase. If you’re trying to survive budget review and legal review, you need repeatability, logging, governance, approvals, and some proof that the model does not go feral when your prompt gets messy.

Riley: Feral is the key word for this whole year, honestly.

Hunter: It really is. And this connects to what we were just talking about on recent episodes. We had that OpenClaw conversation literally yesterday, and before that we talked Gemini Flash Live and this whole voice-plus-camera workflow trend. So Qwen isn’t showing up in a vacuum. It’s part of a very obvious pattern.

Riley: Right. The pattern is, AI is getting eyes and ears and a mouth, and suddenly software feels less like forms and buttons and more like, hey, show me the thing, tell me what you want, let’s make a draft. Which is kind of the dream. Also kind of terrifying.

Hunter: Both can be true. And I think operators need to separate spectacle from systems. The spectacle is audio-visual vibe coding. The system question is, can this plug into a real workflow where someone describes a concept, the model drafts assets, routes them, tags them, maybe summarizes a meeting, and hands off structured outputs into the next step.

Riley: Wait, yes, structured outputs is the boring sexy part. Because if it just makes a pretty thing, cute. If it can turn a creative review call into usable assets and decisions, now we’re cooking with the Bunsen burner.

Hunter: Nicely done.

Riley: Thank you, I work very hard.

Hunter: But there’s another wrinkle here, which is the open model conversation. People online are acting like open means easy. It does not. Open means you have more control and more responsibility. Qwen’s openness is exciting for companies that don’t want to hand every workflow to a black-box vendor, but self-hosting or hybrid deployment still means governance, security, evals, monitoring.

Riley: Yeah, people romanticize control. They’re like, we’ll just run the open model. Babe, with what team? With what policies? With what incident plan when the model decides your brand tone is haunted startup founder?

Hunter: Haunted founder voice is a recurring risk category, yes.

Riley: Put it in the risk register.

Hunter: But seriously, open can be a real bridge, especially for experimentation. That’s part of why OpenClaw has been getting so much attention. Not because it replaces your team, but because it gives you a sandbox for repeatable tasks, approvals, memory, and coordination without fully outsourcing your stack.

Riley: And that’s what people online get half right. Open is not automatically enterprise-ready. But it is strategically useful if you want leverage and optionality. The wrong take is, open equals freedom. The better take is, open equals you now own more of the mess.

Hunter: Very well said.

Riley: Thank you. I contain multitudes and small ops anxieties.

Hunter: Now, while everyone’s posting Qwen demos, the video side got a huge push too. Kling three point zero and ByteDance’s Dreamina and Seedance two point zero had some of the strongest momentum of the past few days.

Riley: Oh, the AI video people were fed. Fully fed. Kling looks like it’s leaning into unified generation and stronger motion control, and Seedance through the Dreamina and CapCut world feels way more tuned for creators who actually need to ship stuff, not just admire it.

Hunter: That’s what stands out to me. This is less about pure novelty and more about practical production. Better realism, smoother animation, stronger character consistency, reference images, built-in audio, lip-sync. Those are marketer features. That’s ad iteration. That’s multi-asset testing. That’s localization.

Riley: But the bottleneck is not just model quality anymore. I’m sorry, it’s taste. Taste and process. Most teams still do not know what good AI-native production looks like. They’re using futuristic tools with old production assumptions.

Hunter: Explain that.

Riley: Gladly. Teams still think in this very traditional way where one hero asset gets made, then chopped into smaller pieces. AI-native production should be more like system design. You create a concept spine, reference pack, tone rules, product truths, motion language, then let the tools generate a family of assets from that. Not random one-offs. A system.

Hunter: That makes sense. It’s closer to building a repeatable creative engine than running a single shoot.

Riley: Exactly. And if you don’t do that, you just make faster mediocre. Which, to answer the internet’s favorite question, yes, I think AI video probably gives us both better ads and way more mediocre ads first.

Hunter: I agree. The floor is dropping and the ceiling is rising at the same time.

Riley: Ooh. That’s clean.

Hunter: Thanks. But the flood risk is real. If every brand can suddenly produce polished short-form video at scale, originality becomes more valuable, not less. The premium shifts from production access to point of view.

Riley: Yes. Because polished slop is still slop. And the feed is absolutely capable of drowning in beautiful nonsense.

Hunter: Which means the human premium becomes concept, taste, selection, restraint. Automation removes the grind. It does not eliminate the need for judgment.

Riley: Also, quick side note, the CapCut adjacency here matters. Anytime ByteDance ties powerful generation closer to distribution and editing behavior, you should pay attention. That’s not just a model launch. That’s workflow gravity.

Hunter: Great point. And we’ve seen that same platform gravity in the ad ecosystem too. Once generation tools live inside the places where editing, publishing, and optimization already happen, adoption gets easier and sameness risk gets higher.

Riley: Convenience is undefeated. So is creative flattening, if you’re lazy.

Hunter: Then there’s the other story that I think might actually be more important for enterprise strategy than any giant multimodal launch, and that’s Intercom’s Fin Apex one point zero.

Riley: Oh, absolutely. This one is such a sneaky important story. Because it’s not trying to be the smartest model in the universe. It’s just trying to be better at support.

Hunter: Right. Domain-specific, narrower scope, measured against business outcomes like resolution and efficiency. That, to me, looks a lot more like the real enterprise future than everybody chasing the biggest frontier headline every week.

Riley: Same. Because if I’m buying AI for a company, the grown-up question is not, did it trend on X. It’s, does this reduce pain for customers? Does this make my team faster? Does it lower the chaos tax?

Hunter: Exactly. And Fin Apex is a good reminder that specialized systems can outperform bigger general models in the lanes that actually matter to a business.

Riley: Which is kind of funny, because the AI world still has this prestige bias. Like, if it’s not the giant omni brain from a frontier lab, people think it’s less impressive. Meanwhile the boring company with the narrow model is over there quietly improving retention.

Hunter: Quietly winning is underrated.

Riley: Very. Also, this is where I push back on the obsession with model IQ. In support, nobody cares if the system can write a moody screenplay about sentient soup. They care if it resolves the ticket cleanly and does not invent refund policies.

Hunter: That should probably be the new enterprise benchmark.

Riley: Can it avoid inventing refund policies? Huge if true.

Hunter: So when you zoom out, the smartest AI strategy right now might be a combination of all three lanes. Frontier labs ship spectacle and push interfaces forward. Open ecosystems ship leverage and flexibility. Narrow systems ship actual business wins.

Riley: Mmm, yes. The answer is annoyingly hybrid.

Hunter: Usually is.

Riley: And that’s why I get a little skeptical when people online try to declare one winner. Like, frontier labs are not the whole story. Open source is not the whole story. The boring vertical players are not the whole story. The smart teams are routing work to the right kind of model and tool for the job.

Hunter: That’s the move. Build around tasks and workflows, not around loyalty to one vendor or one hype cycle.

Riley: And maybe, just maybe, stop confusing a cool demo with operational maturity.

Hunter: That would help.

Riley: I know, I ask a lot.

Hunter: So if I’m talking to creators, marketers, and media operators right now, my advice is pretty simple. Use multimodal frontier tools like Qwen to explore new interfaces and compress ideation. Use video tools like Kling and Dreamina where controlled variation and production speed matter. Use narrow systems like Fin Apex as the mental model for where ROI gets real. And keep humans close to taste, approvals, risk, and customer trust.

Riley: Mine is even simpler. Build the machine where the work is boring, repetitive, and easy to review. Keep humans where the stakes are emotional, strategic, or reputational. And for the love of the timeline, please do not auto-scale blandness.

Hunter: That is a strong note to end on.

Riley: I contain strong notes. It’s Tuesday.

Hunter: It is. Thanks for hanging with us on this Tuesday, March thirty first, also César Chávez Day and Bunsen Burner Day, which feels like a perfect mix of solidarity and controlled experiments.

Riley: Heavy on controlled. Light on setting your whole brand on fire.

Hunter: Check out COEY.com slash resources for AI news and updates.

Riley: And subscribe, obviously. Feed your brain before the multimodal robots start pitching your campaigns without adult supervision.

Hunter: Thanks for listening to COEY Cast.

Riley: Catch you later.
