
COEY Cast Episode 139
Open Source Gets Real with LTX 2.3, Rakuten, and Kitten
Episode Overview
03/22/2026
Open source AI had a very loud week. LTX 2.3 moved closer to real production use with an API for image-to-video and native portrait output, which matters a lot for social teams and ad workflows. Rakuten AI 3.0 added fuel to the regional model conversation with a large Japanese release that raises useful questions about localization, transparency, and what counts as real innovation. Kitten TTS showed how small voice models can bring text-to-speech to browsers, CPUs, and lower-cost products. The bigger takeaway is simple: better models are not the whole game. Workflow design, human review, and operational sanity are what turn open tools into something a team can actually use.


Episode Transcript
Hunter: It is Sunday, March twenty-second, twenty twenty-six, which apparently is International Goof Off Day, so honestly the calendar understood the assignment. You’re listening to COEY Cast, the show assembled by an extremely online pile of AI tools, workflow glue, and whatever digital elbow grease survived the render queue. I’m Hunter.
Riley: And I’m Riley. Happy Goof Off Day, which is so funny because we are absolutely not goofing off. We are making a fully automated podcast about AI automating everybody’s job-adjacent tasks. Very normal. Very chill. Very not a Black Mirror screensaver.
Hunter: Totally normal. And today we’ve got a fun one because the open-source crowd is loud right now. LTX-two point three just got way more real as a production API for video, Rakuten AI three point oh is making noise as this huge Japanese mixture-of-experts release, and Kitten TTS is out here basically saying, hey, what if your voice stack didn’t need a giant cloud bill to function.
Riley: It’s like the whole week’s theme is, oh, you wanted AI infrastructure without renting your soul to somebody else’s platform? Cute. Here you go. But also, um, now you own the mess.
Hunter: Yeah, and that is the trade. Everybody loves open source until somebody has to maintain it at two in the morning.
Riley: Or explain to legal why the “open” thing is actually, like, three licenses, a fine-tune, a maybe, and a Discord server.
Hunter: That too. Let’s start with LTX-two point three, because I think this is the clearest workflow story. The model itself already had buzz, but the big change is the production API. That means teams don’t have to stand up and babysit their own GPU stack just to get vertical video generation into a real pipeline.
Riley: Which is huge. Like, the internet hears “better model” and claps. But operators hear “I no longer need a tiny Kubernetes side quest to make ads.” That’s the real applause line.
Hunter: Exactly. And from what we’re seeing, the upgrades people care about are not abstract benchmark chest-thumping. It’s stronger image-to-video motion, cleaner textures, better prompt adherence, better audio, native portrait video. That last one matters a lot more than people admit because vertical is not some side format anymore. Vertical is the battlefield.
Riley: Thank you. I need every brand to stop acting like portrait video is the weird cousin at Thanksgiving. It is the main room. If your AI video stack still treats nine by sixteen like an afterthought, you’re building for the wrong internet.
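
[Show note: for anyone curious what “brief to motion draft through an API” looks like in practice, here is a minimal sketch of a hosted image-to-video call. The endpoint URL, payload keys, and response shape are illustrative placeholders, not the actual LTX 2.3 API, so check the official docs before wiring anything up.]

```python
# Hypothetical image-to-video request against a hosted LTX-style endpoint.
# Every name here (URL, payload keys, response shape) is a placeholder
# assumption, not the real LTX 2.3 API surface.
import requests

API_URL = "https://api.example.com/v1/image-to-video"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

payload = {
    "image_url": "https://example.com/product-shot.png",
    "prompt": "slow dolly-in, soft studio lighting, product hero shot",
    "aspect_ratio": "9:16",       # native portrait, straight to social
    "duration_seconds": 6,
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,
)
resp.raise_for_status()
print("render job:", resp.json())
```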
Hunter: Right. And the moat question here is interesting. Is the moat the model, the workflow, or who removes the most GPU pain first? My answer is mostly workflow, with a side of operational simplicity.
Riley: I’d go even harder. I think raw model moat is getting cooked. Not dead, but cooked. Because if multiple teams can access good enough video, then the value shifts to who can turn a brief into ten usable, on-brand variants without chaos.
Hunter: Hm, yeah.
Riley: Like, nobody in marketing wakes up and says, wow, I hope we procure more latent space today. They want a system where campaign idea goes to storyboard goes to motion draft goes to approval without three people manually downloading files called final-final-v-seven.
Hunter: That makes sense. And this lines up with something we’ve been talking about on recent episodes. Faster generation moves the bottleneck. It doesn’t remove it. With Runway pushing real-time-ish creation and all these video systems getting quicker, the new bottleneck becomes taste, approvals, brand safety, and deciding what deserves to exist.
Riley: Which, by the way, is why marketers should not use LTX-two point three to flood the web with polished nonsense. We already have enough polished nonsense. The internet is basically shellacked nonsense at scale.
Hunter: So what should they do instead?
Riley: Use it for volume in the draft layer, not the truth layer. That’s the trick. Let the machine make options. Let the human decide which story is actually worth telling. If your workflow is just prompt to publish, congrats, you built a slop cannon with decent lighting.
Hunter: I like that framing. Draft layer, not truth layer. You can use LTX for concept testing, ad angle exploration, shot prototypes, even localized variants, but the human should still be the one judging whether the content says anything real, useful, or brand-safe.
Riley: And honestly, short-form teams should think in systems now. If you can generate portrait video directly, then your pipeline can go from campaign brief to visual concepts to social cuts much faster. But you need critics in the loop. Not just vibes. Actual checks.
Hunter: Yeah, we’ve covered that a lot lately. Automation should own prep work, formatting, routing, maybe first-pass generation. Humans stay close to final judgment, sensitive claims, and anything that can create brand risk.
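
[Show note: “draft layer, not truth layer” translates to code pretty directly. The machine generates variants, cheap automated checks flag the obvious failures, and everything else lands in a human review queue. This is a toy sketch under those assumptions; the check logic and names are hypothetical.]

```python
# Toy sketch of a draft-layer pipeline: generate many, auto-flag the obvious
# failures, and queue the rest for human judgment. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Draft:
    brief_id: str
    video_url: str
    flags: list[str] = field(default_factory=list)

def automated_checks(draft: Draft) -> Draft:
    """First-pass critics: cheap and rule-based, no taste required."""
    if not draft.video_url.endswith(".mp4"):
        draft.flags.append("unexpected-format")
    # Real pipelines would add brand-safety and claim checks here.
    return draft

def route(drafts: list[Draft]) -> tuple[list[Draft], list[Draft]]:
    """Split drafts into auto-flagged rejects and a human review queue."""
    checked = [automated_checks(d) for d in drafts]
    rejected = [d for d in checked if d.flags]
    review_queue = [d for d in checked if not d.flags]
    return rejected, review_queue

drafts = [Draft("spring-campaign", f"https://cdn.example.com/v{i}.mp4") for i in range(10)]
rejected, queue = route(drafts)
print(f"{len(queue)} drafts await a human; {len(rejected)} auto-flagged")
```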
Riley: Also, people on X seem hyped that LTX makes it easier for builders to ditch expensive subscriptions and roll their own thing. That energy is real. But I did see some pushback too, mostly around motion weirdness and the occasional uncanny output. So, not magic. Just more usable.
Hunter: Which, honestly, is usually the real step forward. Not perfect. More deployable.
Riley: There it is. More deployable is the hot girl phrase of AI infrastructure.
Hunter: I don’t think that phrase means what you think it means.
Riley: Hunt, I know exactly what I’m doing.
Hunter: Fair enough. Now, Rakuten AI three point oh is a different story. This one is less about media generation and more about what it signals. Big open-weight, region-specific models are still very much a thing. And in this case, a very large Japanese-focused mixture-of-experts model tied to national AI infrastructure.
Riley: Which I think a lot of U.S.-centric AI people forget. Not every important model story is, like, whichever Bay Area lab dropped a moody benchmark chart this week. Language, regulation, regional enterprise needs, those are real. Localization is not a side quest.
Hunter: Exactly. If you serve a specific market deeply, especially one with language nuance, domain context, and enterprise adoption needs, a regionally strong model can matter more than some generic frontier brag sheet.
Riley: But hold up, the spicy part on X is not just praise. There’s also skepticism around originality. Like, cool, huge release, open-ish energy, national infrastructure story. But people are asking whether some of these giant launches are true innovation or just expensive relabeling with a patriotic ribbon on top.
Hunter: Yeah, and organizations need to get more mature about that. The question shouldn’t be, is this totally original in some pure research sense. It should be, what was actually built, what was adapted, what is licensed, what is transparent, and what performance can you verify in your context.
Riley: Thank you. Because tech history is full of this. New wrapper, new packaging, new keynote music, same bones. Sometimes that’s fine. Productization matters. But don’t sell me a remix as a moon landing.
Hunter: Right. A well-executed adaptation can still be valuable. If a company takes an existing foundation, tunes it properly for a market, releases it under usable terms, and supports enterprise deployment, that’s not nothing.
Riley: No, totally. But be honest about the stack. That’s the key. If your model is built on top of another base, say that. If the innovation is in fine-tuning, infrastructure, localization, or cost profile, say that. The trust hit comes from pretending every release sprang fully formed from the forehead of Zeus.
Hunter: And for buyers, the winners here are probably a mix. National ecosystems win because they reduce dependency and build local capacity. Enterprise buyers win because they get more tailored options. Open-source communities win because they keep forcing transparency and reuse into the conversation.
Riley: Open-source people are basically the unpaid forensic accountants of AI at this point.
Hunter: That’s, uh, actually pretty accurate.
Riley: They are. Every time a big shiny model drops, somebody in open source is already in the files going, wait a second, why does this smell like a previous checkpoint with new merch.
Hunter: Which is healthy. Because over the next two years, we’re probably headed toward more open models, more agents, more automation, and more executives pretending all of this is perfectly on schedule.
Riley: Oh my gosh, yes. The slide deck always says transformation is on track, while in reality Kyle from ops is manually restarting a workflow and whispering, please don’t break in front of the client.
Hunter: So what should people be excited about? I’d say lower cost experimentation, more local control, and more modular stacks. Teams can mix open models, commercial APIs, and automations in ways that actually fit their budget and risk tolerance.
Riley: And what should they worry about? Fake certainty. That’s my answer. Because better tooling can make bad process look slick. A polished wrong answer is still wrong. A beautiful bad video is still bad. A branded uncanny voice still creeps people out.
Hunter: Which brings us perfectly to Kitten TTS.
Riley: The tiniest menace of the week. I love this one.
Hunter: It’s a great story. Tiny open-source text-to-speech variants that can run on CPUs and even in browsers through WebGPU, with a lot of excitement around expressive speech and low infrastructure overhead. That matters because it brings voice generation way downmarket.
Riley: Yeah, this is the part where every scrappy team goes, wait, I can have branded voice features without getting trapped in another SaaS invoice cosplay situation?
Hunter: And for some use cases, yes. If you need local narration, embedded product voice, offline assistants, accessibility features, lightweight localization drafts, this kind of model makes a lot of sense.
Riley: Especially if privacy matters. Or latency. Or you just don’t want every tiny voice interaction to ping some cloud endpoint and rack up cost while your user waits.
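
[Show note: local synthesis really is this short. The sketch below is written from memory of the kitten-tts README, so treat the class name, model id, voice id, and 24 kHz sample rate as assumptions and check the KittenML repo for the current interface.]

```python
# Minimal local, CPU-only speech synthesis with Kitten TTS. Names and the
# 24 kHz sample rate are recalled from the project README, not verified;
# treat them as assumptions.
import soundfile as sf
from kittentts import KittenTTS

model = KittenTTS("KittenML/kitten-tts-nano-0.1")  # tiny model, no GPU needed

audio = model.generate(
    "Your order has shipped and should arrive Thursday.",
    voice="expr-voice-2-f",  # one of the bundled expressive voices
)

# Nothing pings a cloud endpoint: the waveform never leaves the machine.
sf.write("notification.wav", audio, 24000)
```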
Hunter: But not every use case deserves local deployment. If you need enterprise-grade multilingual support, advanced voice governance, and polished production tooling, a managed service may still be the smarter path.
Riley: Yes, thank you. I need the open-source absolutists and the SaaS absolutists to both relax a little. Sometimes local is the move. Sometimes paying for the boring reliable thing is worth it.
Hunter: Exactly. It depends on the workflow. Helpful brand utility looks like quick in-app narration, accessible product guidance, maybe interactive voice notes. The bad version is when customers realize they’re stuck in a creepy phone tree that sounds suspiciously cheerful.
Riley: Ah yes, the uncanny valley receptionist. She knows your order number, but not what empathy is.
Hunter: And this is where human-in-the-loop still matters. Even if the model is tiny and local, someone should be evaluating voice design. Does it sound on-brand? Does it sound trustworthy? Is it actually helping?
Riley: Also, creators should look at this and think beyond “AI voiceover.” Like, could this power local editing tools, fast rough-cuts, character prototyping, browser-based storytelling, interactive demos? Tiny models unlock weird fun product ideas because the friction drops.
Hunter: That’s a great point. Lowering infrastructure burden often expands creativity more than a marginal quality gain does. If a tiny voice model is good enough and easy to ship, it gets used in more places.
Riley: Which is why I think this whole week really points to one thing. The future winners are not the people with access to one magical model. They’re the teams that design sane systems around multiple models.
Hunter: Yep. Model routing, verification, human review, selective automation. We talked about this with governed marketing orchestration, with local OpenClaw setups, with stronger reasoning stacks. The point is not full autonomy. The point is reliable throughput.
Riley: And deciding what gets automated now versus what still needs a very awake human. Automate research prep, tagging, draft generation, formatting, route decisions when the rules are clear. Keep humans on final taste, sensitive claims, nuanced messaging, and any moment where context can blow up the brand.
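
[Show note: “automate when the rules are clear, escalate when they aren’t” can be as simple as an allow list plus a confidence floor. The task categories and the threshold below are illustrative assumptions, not a prescription.]

```python
# Toy router for selective automation: only clear-rule, high-confidence work
# gets automated; sensitive work always goes to a human. Categories and the
# 0.9 threshold are illustrative assumptions.
from enum import Enum

class Route(Enum):
    AUTOMATE = "automate"
    HUMAN_REVIEW = "human_review"

SAFE_TO_AUTOMATE = {"research_prep", "tagging", "draft_generation", "formatting"}
ALWAYS_HUMAN = {"sensitive_claims", "final_taste", "nuanced_messaging"}

def route_task(task_type: str, confidence: float) -> Route:
    """Default to the cautious path unless automation is clearly safe."""
    if task_type in ALWAYS_HUMAN:
        return Route.HUMAN_REVIEW
    if task_type in SAFE_TO_AUTOMATE and confidence >= 0.9:
        return Route.AUTOMATE
    return Route.HUMAN_REVIEW

print(route_task("tagging", 0.95))           # Route.AUTOMATE
print(route_task("sensitive_claims", 0.99))  # Route.HUMAN_REVIEW
```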
Hunter: That’s the playbook. Open source early when it helps you learn and lower costs, but don’t confuse early adoption with maturity. If you bring in open models, own the ops reality too.
Riley: Yeah. Don’t be like, we’re sovereign now, and then realize nobody on the team wants to maintain the stack. Freedom is cute until somebody has to debug the freedom.
Hunter: That might be the episode title.
Riley: Honestly? Put it on a shirt.
Hunter: On that note, thanks for hanging with us on this very International Goof Off Day edition of COEY Cast.
Riley: Thanks, friends. Go do a little goofing off, but, you know, responsibly. Maybe let the bots do the boring stuff first.
Hunter: And make sure you check out COEY.com slash resources for AI news and updates.
Riley: Subscribe too, so your podcast app can lovingly auto-deliver our latest experiment in machine-assisted chaos.
Hunter: Catch you next time.
Riley: Later.




