Deep learning is revolutionizing natural language processing. The accuracy of today’s neural networks is an order of magnitude higher than what was delivered with the previous generation of algorithms based on phoneme analysis and probabilistic language modeling.
When communicating with a voice assistant, there is a big difference between having to repeat every other word and not having to repeat anything. Suddenly voice becomes usable. It just works.
As students of technology strategy, we love this kind of discontinuity. Discontinuities signal opportunities, an edge in the contest for attention and funding.
“Is it a platform or a feature?” is the next question that is usually asked. Platforms unlock network effects and signal truly enormous unicorn-worthy opportunities.
So, is voice a platform?
Big tech is certainly betting billions that it is. The race is on to bootstrap voice ecosystems with pleasant-sounding names: Siri, Alexa, Cortana. We even had Samantha, an intelligent OS in Her, a movie that was made in 2013.
Wait, 2013? This was half a decade ago, which might as well be “forever.” After billions of dollars in investment and years of product development, instead of the promised soulmates, many of us got glorified clock radios that we use to check the weather, set an alarm, play a song, or call a friend.
What is going on? This deserves some analysis.
To begin with, in their desire to be the platform, voice assistants seem to be promoting a somewhat unnatural interaction model.
In the real world, if we want an agent X to perform a task T, we rarely turn to an intermediary Y. The more critical the task, the more important it is that we communicate directly. In the voice assistant world, on the other hand, a voice assistant must be in the middle of every interaction.
This creates a slew of configuration, communication, security, and attribution challenges. Suddenly one needs two mobile apps and a website to connect a vacuum cleaner to a smart speaker.
This, of course, is just the tip of the iceberg. Complexity, or scope, is the real problem.
Some voice ecosystems now boast hundreds of thousands of skills, anything from buying groceries to rolling down car windows. Routing requests to these skills requires a voice assistant to recognize millions of user intents, a capability that borders on general intelligence. Training a neural net this big requires an enormous volume of annotated voice data.
This brings us to the big kahuna of issues: end-user privacy. There is only one source of voice data that can feed a network this big – the end users. If we accept that voice is a platform, the lack of privacy on this platform is not a bug, it is a feature.
This is where it gets really exciting. In its pursuit of voice as a platform, big tech appears to be painting itself into a corner and leaving a lot of room for the rest of us.
With voice apps, one can pick any two of the three -- scope, accuracy, and privacy -- but not all three. Choosing voice as the platform forces the platform vendors to pick the first two. It is a strategic choice.
But what if one were to pick the other two, accuracy and privacy? This can be achieved, but only at the expense of scope. In other words, one must accept that voice is a feature and not a platform after all.
So here is my prediction for 2020: we are going to see many more “voice as a feature” startups, startups that help companies add accurate voice processing to their products and services while preserving end-user privacy.