COEY Cast Episode 145

Open Mic Night for AI: Covo, Cohere, and NotebookLM
  • Riley Reylers

  • Hunter Glasdow

Episode Overview

03/28/2026

Audio just stopped being a side feature and started looking like core workflow infrastructure. This conversation tracks three big signals behind that shift. Tencent’s open Covo-Audio pushes toward more natural voice interaction with lower latency and better interruption handling. Cohere’s open speech recognition model could unlock cheaper, faster transcription for meetings, podcasts, support, and multilingual operations. NotebookLM is also stretching beyond research and into narrated video creation, collapsing steps that used to live across multiple tools. The real question is not which demo looks coolest. It is where automation actually removes friction while keeping humans close to judgment, brand, accuracy, and risk. That is where creators, marketers, and operators get real leverage.


Episode Transcript

Hunter: Happy Saturday, March twenty eighth, twenty twenty six, and hello from COEY Cast, the podcast that was basically assembled by a stack of robots, a few prompts, and whatever chaos was floating through the automation pipes at the time. I’m Hunter.

Riley: And I’m Riley. Also, apparently it is Something on a Stick Day, which feels weirdly correct for the internet right now because every AI demo is like, hey, what if we put another feature on a stick and called it a product?

Hunter: That is painfully accurate. Today we’re digging into a cluster of stories that actually matter if you make content or run marketing workflows. Tencent’s Covo-Audio is making noise because it does voice interaction in a more native way, Cohere dropped an open speech recognition model that people are very hyped about, and NotebookLM is sliding from research tool into, like, wait… are you making the whole video now?

Riley: Yeah, that last one gave me the same feeling as when Canva started doing things that used to require three tabs, a designer, and emotional support. It’s like, oh cool, the workflow just collapsed into one box.

Hunter: Exactly. And the bigger theme under all of this is that audio is no longer the cute side quest. It’s becoming core infrastructure. We talked about that recently with open speech and audio workflows in general, and this week just kind of shoved that point through the wall.

Riley: Mm-hmm. Audio used to be this extra garnish. Now it’s like, no, babe, I am the backbone. Transcription, voice agents, call analysis, localization, podcast indexing, content repurposing. The boring layer is suddenly very hot.

Hunter: Which is always where real workflow value shows up first. So let’s start with Covo-Audio. What’s interesting here is not just that it’s open and local-friendly. It’s that it’s aiming to handle live voice conversation end to end instead of stitching together speech recognition, then a language model, then text to speech, then praying the whole thing doesn’t lag or talk over the user in the dumbest possible way.

Riley: Or do that cursed thing where you interrupt it and it just keeps yapping like it’s trapped in a keynote demo.

Hunter: Right. The promise here is lower latency, better interruption handling, and more natural turn-taking. That matters because inside real companies, the first thing that changes probably isn’t some magical all-purpose AI receptionist. It’s narrower stuff. Support triage. Lead qualification. Internal help desk. Maybe appointment handling.
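
To make “interruption handling” concrete, here is a minimal barge-in sketch in plain Python, with a fake voice-activity detector and print statements standing in for real ASR and TTS. Every name here is a stub for illustration, not Covo-Audio’s actual API:

    import threading
    import time

    # A real voice-activity detector (VAD) would set this from the mic stream
    user_is_speaking = threading.Event()

    def play_tts(sentences):
        # Speak in chunks, checking for barge-in between each one
        for sentence in sentences:
            if user_is_speaking.is_set():
                return  # yield the turn instead of talking over the user
            print(f"agent: {sentence}")
            time.sleep(0.5)  # stand-in for actual audio playback time

    def fake_vad():
        time.sleep(0.8)  # pretend the user interrupts mid-answer
        user_is_speaking.set()

    threading.Thread(target=fake_vad, daemon=True).start()
    play_tts([
        "Our enterprise plan includes single sign-on.",
        "It also includes priority support.",
        "Pricing starts at...",
    ])

The stitched-pipeline problem Hunter describes is that each handoff between components adds delay, so by the time the agent notices the interruption, it has already talked over the user.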

Riley: I agree, but I wanna push on sales a little. Because support is the obvious answer, but sales teams love anything that smells like scale. If a voice agent can qualify leads after hours, route people, answer basic pricing questions, and tee up the rep with clean notes, they are going to try that yesterday.

Hunter: Fair. But I think the best sales use is pre-call and post-call, not full replacement. Have the system answer inbound, gather intent, summarize objections, tag urgency, update CRM notes, draft follow-up. That’s useful. Fully autonomous closer energy? Eh. That still feels like fancy demo territory for a lot of teams.
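
A rough sketch of that pre-call and post-call shape, with a hypothetical llm() helper standing in for whatever model a team actually calls. None of these names come from a real SDK:

    from dataclasses import dataclass

    @dataclass
    class CallRecord:
        caller: str
        transcript: str

    def llm(prompt: str) -> str:
        # Hypothetical model call; swap in a real client here
        return "(model output)"

    def post_call(record: CallRecord) -> dict:
        # Produce clean notes for the rep; the system drafts, a human sends
        summary = llm(f"Summarize intent and objections:\n{record.transcript}")
        urgency = llm(f"Tag urgency as low/medium/high:\n{record.transcript}")
        draft = llm(f"Draft a short follow-up email to {record.caller}:\n{record.transcript}")
        return {"crm_note": summary, "urgency": urgency, "draft_follow_up": draft}

    notes = post_call(CallRecord("Acme Co", "...call transcript..."))
    print(notes["urgency"])

The design choice is the point: every output lands in front of the rep as a draft or a tag, never as an autonomous action toward the prospect.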

Riley: Yeah, because nobody wants to lose a good deal to a robot that confidently misunderstands “enterprise rollout” as “please read me the feature page in a soothing voice.”

Hunter: Exactly. Human in the loop still wins where nuance and stakes go up. But this is closer to useful than a lot of voice AI from the past year. The old pipeline approach always had that weird stitched feeling. You could hear the handoff between components. It was like three interns in a trench coat trying to be a call center.

Riley: Oh my god, yes. And that’s why people on X are so excited. It’s not just open source fandom. It’s that full duplex voice starts to feel less like pressing walkie-talkie buttons and more like an actual conversation. If that gets stable, creators and brands can build voice experiences that don’t instantly feel fake.

Hunter: And local deployment matters too. A lot of orgs want more control over privacy, cost, and lock-in. We’ve been saying that with open weights in general. If your workflow is business critical, you probably don’t want your whole stack hanging on one vendor roadmap.

Riley: But hold up, this is where people get cosplay-y. Not every company needs to self-host everything just because open source is cool and somebody in IT had a Kubernetes phase.

Hunter: Absolutely. “Run it ourselves” is strategic when voice data is sensitive, when cost at scale matters, when you need deep customization, or when uptime and governance need tighter control. If you’re a small team and you barely have clean documentation, self-hosting everything may just be a hobby with invoices.

Riley: Thank you. Some teams need a product, not a side quest. Open source is not automatically the right answer just because the vibes are good.

Hunter: Speaking of open source, Cohere’s speech model is the other big signal. And honestly, I think this one may change work faster than the flashy voice agents.

Riley: Same. Better transcription is not sexy, but it is ridiculously leveraged. It touches meetings, podcasts, webinars, support calls, interviews, research, social clips, multilingual teams. It’s the layer under the layer.

Hunter: Yep. If transcription gets cheaper, faster, and more accurate in open form, startups and internal teams can build on top of it immediately. Meeting capture. Customer call analysis. Searchable media archives. Automatic clipping suggestions. Podcast indexing. Content repurposing pipelines. That’s not hypothetical. That’s this quarter stuff.
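
As one concrete example of “searchable media archives,” a toy inverted index over timestamped transcript segments is about this small once transcription is cheap. The segment tuples mirror a common ASR output shape, not any specific model’s format:

    from collections import defaultdict

    # (start_seconds, end_seconds, text): the shape most ASR tools emit
    segments = [
        (0.0, 4.2, "welcome back to the show"),
        (4.2, 9.8, "today we cover open transcription models"),
        (9.8, 15.1, "transcription is the layer under the layer"),
    ]

    index = defaultdict(list)
    for start, end, text in segments:
        for word in text.lower().split():
            index[word].append((start, end))

    def find(word):
        # Timestamps where a word is spoken: instant clip candidates
        return index.get(word.lower(), [])

    print(find("transcription"))  # [(4.2, 9.8), (9.8, 15.1)]

Swap the print for a clipping tool and you have automatic clip suggestions; point it at support calls and you have call analysis. Same index, different consumers.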

Riley: And this is where humans still matter a ton. Not in typing the transcript, obviously. In deciding what is worth turning into content. In spotting the one sentence that becomes the campaign hook. In editing around tone, context, risk, and all the weirdly confident mistakes models still make.

Hunter: Taste, judgment, and prioritization. Same story we’ve had across image and video too. The machine can help produce the raw material. Humans decide what deserves to survive.

Riley: Also, multilingual ops. We do not talk enough about how useful this is outside Silicon Valley English brain. If an open speech model gets good across languages, that’s huge for global teams trying to localize faster without rebuilding the workflow every time.

Hunter: That’s a great point. Open models matter a lot there because teams can adapt workflows to their own language needs instead of waiting for a closed vendor to maybe care later.

Riley: Okay, now let’s get into NotebookLM because this one is very funny to me. It started as, here is your smart research notebook. And now it’s like, surprise, I made the explainer video too.

Hunter: Yeah. If it can reliably turn notes or source docs into narrated videos with visuals, that compresses a very real workflow. Normally you’d go from brief to script, then voiceover, then slides or visuals, then edit, then publish. If one tool can take a big chunk of that, lean teams are going to pay attention.

Riley: Especially internal teams. Sales enablement, onboarding, training, product explainers, investor update recaps, knowledge base walkthroughs. Not everything needs cinema. Sometimes you just need a clear video by lunch.

Hunter: Exactly. But this is where workflow compression can also become polished nonsense at scale. Just because the tool can make a video does not mean the video should exist.

Riley: Say it louder. We are entering an era of beautifully narrated slop. Like, wow, the transitions are smooth, the voice is crisp, and the actual point is nowhere to be found.

Hunter: Which means organizations need a checkpoint before publish. Not a complicated one. Just a human asking: Is this accurate? Is this on brand? Is the narrative actually useful? Are we saying something real, or did the machine just average our notes into beige competence?
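
That checkpoint really can be this simple. A sketch of a pre-publish gate that just forces a human to answer Hunter’s questions, hypothetical rather than any real tool:

    CHECKLIST = [
        "Is this accurate?",
        "Is this on brand?",
        "Is the narrative actually useful?",
    ]

    def publish_gate(asset_name: str) -> bool:
        # Block publish until a human answers every question with a yes
        for question in CHECKLIST:
            answer = input(f"{asset_name}: {question} (y/n) ")
            if answer.strip().lower() != "y":
                return False  # back to editing, not to the feed
        return True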

Riley: Beige competence is the villain of so much AI content.

Hunter: It really is.

Riley: Also, NotebookLM moving into video is a reminder that these tools are becoming workflow surfaces, not single-purpose apps. Research, synthesis, narration, asset generation, output. The more that happens in one place, the more powerful it gets and the easier it is to accidentally skip judgment.

Hunter: That lines up with a lot of what we’ve been talking about this week. The model race is noisy, leak chatter is noisy, every launch thread is noisy. But the real thing to watch is whether workflows collapse in a useful way. Does it remove friction? Does it create leverage? Does it keep humans near the decisions that matter?

Riley: And can your team actually trust it? Because the realistic path forward on all this agentic stuff is not, cool, let the machine run the company. It’s scoped autonomy. Let it handle constrained tasks. Put guardrails around actions. Keep audit trails. Keep review loops. Stop pretending risk disappears if you call it innovation.
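
The scoped-autonomy pattern Riley describes, an allowlist plus an audit trail plus a review queue, fits in a few lines as a sketch. All names here are hypothetical:

    import json
    import time

    ALLOWED_ACTIONS = {"summarize", "tag", "draft"}  # the agent's narrow lane
    AUDIT_LOG = "audit.jsonl"
    review_queue = []

    def run_action(action: str, payload: dict) -> dict:
        # Log every request first so there is always an audit trail
        entry = {"ts": time.time(), "action": action, "payload": payload}
        with open(AUDIT_LOG, "a") as f:
            f.write(json.dumps(entry) + "\n")
        if action not in ALLOWED_ACTIONS:
            review_queue.append(entry)  # guardrail: escalate, don't act
            return {"status": "queued_for_review"}
        return {"status": "executed", "action": action}

    print(run_action("summarize", {"doc": "q3 notes"}))
    print(run_action("send_refund", {"amount": 500}))  # lands in review_queue

The real version needs proper permissions and infrastructure, but the shape holds: constrained tasks execute, everything else escalates to a human, and every request leaves a record.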

Hunter: Well said. And since people keep asking what changes organizations first over the next year-ish, my bet is still transcription. Then voice agents in narrow lanes. Then AI-generated video for education and enablement. Local agent frameworks are interesting, but for most teams they’re still a little ahead of operations.

Riley: I’m with you, mostly. I think transcription wins first because it’s invisible and insanely useful. Voice agents come next where there’s clear ROI. Video generation lands fast for teams that already have good source material. And local autonomous agent dreams, including all the OpenClaw-style energy, still need the boring grown-up stuff like permissions, governance, and actual workflow design.

Hunter: Hmmm, that makes sense.

Riley: Thank you, Hunt.

Hunter: Don’t get used to it.

Riley: Too late.

Hunter: So the takeaway for creators and marketers is pretty simple. Open voice is getting more natural. Open transcription is becoming a serious infrastructure layer. And research tools are mutating into production tools. The move is not to chase every shiny demo. It’s to build model-agnostic systems, keep humans at the points of taste and risk, and automate the grind around the work.

Riley: Yeah. Let the machine do the setup, the sorting, the summarizing, the first pass, the boring middle. Let people do the deciding, the shaping, the weird creative leap, and the, um, maybe we should not publish this as is.

Hunter: That’s our show. Thanks for hanging with us on this Saturday, March twenty eighth, aka Something on a Stick Day, which honestly feels like a good reminder not to ship every AI feature just because you managed to skewer it onto your stack.

Riley: Please. For the love of workflows.

Hunter: Go check out COEY.com slash resources for AI news and updates.

Riley: And subscribe so you don’t miss the next one.

Hunter: Thanks for listening.

Riley: Catch you next time.

Most Recent Episodes
  • Open Voice, Multi Shot, and Google’s AI Music Push
    04/01/2026
  • Open Qwen, Closed Loop: Multimodal Gets Real
    03/31/2026
  • OpenClaw or Open Chaos? The Open Source Agent Reality
    03/30/2026
  • Gemini Flash Live and the Great AI Workflow Reality Check
    03/29/2026