
Apple's Siri Revamp with Google Gemini and the Future of Voice AI

13 January 2026

ai opinion

Last Year’s Voice AI Predictions

Zendesk publishes an annual CX Trends Report, and I was reviewing last year’s predictions to see how they look in hindsight. One prediction Zendesk went big on was voice AI — they talked extensively about bots and automated processes running off the back of spoken-word conversations with customers.

The fourth prediction on their list was: “Voice AI gains ground as the preferred channel for complex issues.”

And I think you know where I’m going with this. I don’t feel like this has really come to pass.

Voice AI was meant to be the future - Apple's Siri revamp with Google Gemini

(This article is a tidied-up transcription of this video, so give that a watch if that suits you better!)

I was editing a video about the Zendesk CX Trends Report when something significant happened that made me want to share some thoughts immediately.

The Reality of Voice AI Today

There are some limited applications of voice AI, and specific businesses are doing it pretty well. But by and large, voice AI hasn’t become the preferred channel Zendesk predicted, and I think that’s for a range of reasons.

Personally, I find it very frustrating. I’m looking to get to a human ASAP if I’m talking to a simulated voice. Zendesk had a strong sense that people would be phoning in, talking to an AI agent rather than typing to one, and would be happy to have their query resolved by a synthesised voice rather than an actual human. But that hasn’t happened at all, really.

Why Voice AI Has Struggled

The key barrier to uptake of voice-led AI conversations has been simple: people really don’t like talking to these things. It’s quite frustrating, and we know it’s frustrating because we’ve all tried things like Siri and Alexa and found them to be… well, frustrating.

You know how we had voice assistants in our cars 10 years ago? But you never press the button, do you? You never actually want to say “navigate me to this place” because it’s annoying. You’d use your phone instead and tap it. You’d find a place on Google Maps rather than fish around through voice commands.

The iPhone is one of the most popular devices, and we all carry a phone of some sort with us all the time. Everyone has tried these voice assistants and found them frustrating. That’s the right way to describe it.

Apple’s Game-Changing Announcement

But today, Apple announced they’re revamping Siri using what they’re calling a “foundation layer” — a model from Google Gemini for on-device processing. My understanding is it’ll still have the handoff to ChatGPT if you’re asking something that requires that level of processing and thinking.

(Video: MKBHD on Siri)

And this is going to make a big difference.

For those of us in the tech world who’ve tried first-generation smart watches and other early tech, we’re always evaluating: is this the future? We need to stay aware of what’s up and coming, but also what’s still in heavy use. That’s why you’ll see behind me various machines, various operating systems, and each one has all the browsers on it. We’ve got to be aware of the different ways people get in touch.

I’ve had really good conversations about alternative input methods — I’ve used different keyboards like Dasher (controlled by orientation rather than tapping), even the Morse code keyboard which just has one button. That’s a bad way to do things, but you can text people from your pocket. Pretty handy, and I haven’t been able to do that since I had a Nokia 30-something.

What Apple’s Changes Mean

So if Apple are integrating Google’s foundation layer into Siri, what does that mean?

For me, it means Siri is going to have better processing of natural language requests. At the moment, it feels like you’re banging your head against a wall — like when you play old text adventure games, typing over and over: “get sword”, “pick up sword”, “put sword in bag”, trying to find the right syntax that works.

We’ve all got used to doing that with Siri and Alexa — “turn on thing”, “start thing”, “do thing”. It’s just really annoying.

But if they’re going to have this better layer of natural language processing from Google, that might be the thing that moves us into being happier to use voice-based interfaces.

The Privacy Problem

That said, voice interfaces are always going to be something we do in private. I was walking down the road yesterday behind someone having the loudest phone conversation. Dom Joly really showed us back in the late 90s that we’re sick of people with big phones in public spaces.

If Apple are revamping the way Siri handles natural language, a lot of people will give it another go. Some might find it’s quite good, and that might give a bit of momentum towards conversational voice interactions with AI.


Would I bet the house on it? No. There are lots of situations where you don’t want to do that. It’s something I’d use in the car while driving, or maybe with those Ray-Ban Meta glasses when there’s no one around. Sometimes it’s going to feel weird and you’ll want to use a different interface.

By and large, these things are starting to be a bit less horrible to use, and people who try to be near the cutting edge are going to give it another shot. We’re going to see a change over the coming months.

But Voice AI Won’t Replace Humans

I don’t think voice AI is going to eliminate the need to have actual people on the phone. When I get in touch by phone, it’s because I’ve got two things happening:

  1. I’ve got something that needs to be solved quickly. I don’t want to send an email and think “this could be a week before I hear back”. I want to get someone on the phone and sort it out.

  2. I need to feel like someone knows this has been a real pain for me. If I’m calling up saying “hey, this thing isn’t working”, I want someone to go “oh God, no, that really should be working. I’m really sorry.”

That’s the main thing I get from calling support services, and you’re never going to get that from ChatGPT because it’s never going to feel right.

People who’ve used Claude know that when it apologises, it really grates. It fundamentally feels very insincere. “Oh, I’m sorry I deleted the wrong folder” — you’re not sorry, are you? You’re software. You’re not sorry in the same way that the washing machine isn’t sorry if it eats one of my socks. It’s just a thing that happened: a machine has been given a task and hasn’t completed it properly.

What We’ll See Over the Coming Months

So has this changed my view on voice AI? I think it will pick up a bit. Will it be the future? It’ll be in the future for a while.

But long term, I think when we’re having voice conversations, it’s always going to be because we want to hear empathy on the other end and we want to demonstrate that this is a priority issue for us.

If I have to talk to a bot, I’d rather do it with my thumbs. That’s simpler. And I think that’s probably true of most people.

The Real Goal: Better Automation

For me, voice AI is a bit of a red herring. It’s going to be a distraction from what we actually need to achieve: greater automation.

People need to be able to solve their own problems in a natural-language way. “My thing hasn’t arrived” — then the system should work out which order they mean and walk through a proper step-by-step process automation.

You can have a GPT layer on top of that to make it more natural language and conversational, but ultimately the process needs to be step by step by step. You can design that process through seeing how people interact, but it still needs to be a defined process rather than just some AI freestyling — we know that can go horribly wrong.
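To make that concrete, here is a minimal sketch of the shape I mean: a conversational layer that only classifies what the customer wants, sitting on top of a fixed, auditable sequence of steps. The order data, intent names, and keyword matching are all hypothetical stand-ins (a real system would use a GPT/NLU layer for the classification), but the point is that every action is a defined step, so the AI can never freestyle.

```python
# Hypothetical order data standing in for a real order system.
ORDERS = {
    "1001": {"status": "dispatched", "eta": "Thursday"},
    "1002": {"status": "processing", "eta": "next week"},
}

def detect_intent(message: str) -> str:
    """The conversational layer: map free text to a known intent.

    A toy keyword lookup here; in practice this is where a GPT/NLU
    model would sit. Crucially, it can only ever return an intent
    name, never take an action itself.
    """
    text = message.lower()
    if "arrived" in text or "where is" in text or "delivery" in text:
        return "order_not_arrived"
    return "unknown"

def handle_order_not_arrived(order_id: str) -> str:
    """The defined process: look up the order, answer step by step."""
    order = ORDERS.get(order_id)
    if order is None:
        return "I can't find that order. Let me pass you to a human."
    return f"Order {order_id} is {order['status']}, expected {order['eta']}."

def support_bot(message: str, order_id: str) -> str:
    """Route the classified intent into a fixed, pre-designed flow."""
    intent = detect_intent(message)
    if intent == "order_not_arrived":
        return handle_order_not_arrived(order_id)
    # Anything outside the designed flows escalates to a person.
    return "I'm not sure I can help with that. Let me pass you to a human."
```

Note the design choice: the model never decides *what* to do, only *which* predefined flow to enter, and anything unrecognised escalates to a human rather than improvising.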

The Luxury of Human Service

The other side of it is the priority service, the luxury end of service. If you’re a top-tier customer, or if you’re trying to be a brand that differentiates itself in terms of quality proposition, then human people who can actually tell you “I’m really sorry. Let me personally as a human take an individual look and prioritise your precise situation” — that’s what matters.

“Oh, you know, other people have had this. Thanks so much for raising it — gives us a good opportunity to improve.” Having conversations like that is always going to be the premium option, whereas automation is always going to feel budget.

Think of the nicest restaurant you can imagine: someone comes over and talks you through what the ingredients are. Now think of the absolute budget eating option: there’s a screen, you tap it, and something pops out of a machine.

We’ve come to understand that human interaction is the luxury part of whatever your service or product is. Voice AI is always going to, by definition, feel like a cop-out.

Final Thoughts

But let’s see what happens. Good luck to Apple, I suppose — the world’s most cautious and wealthy organisation.

What do you think? If you disagree, or if you’re implementing loads of voice AI stuff for your clients, drop me a line. If you’re using a lot of services that have these things, tell me all about it. I want to try it out — try what you’re working on, try what you’re finding to be good.

I’d like to understand what works for different businesses and different people.