Voice Recognition Technology Challenges in 2020, Possibilities for the Future

Many challenges faced the voice recognition industry in 2020. We look to 2021 and beyond for improved VUI in all sectors.

It’s been a challenging year, to say the least, for all of us globally. COVID-19 upended our lives and radically changed the way we work, communicate, and socialize. These changes caused a sharp increase in demand for voice technology services that can deliver clear, intelligible voice recognition in less than ideal environments. As we continue to struggle with the pandemic, we also struggle to achieve voice technology that meets our work, educational, and social needs.

This article looks at how the increased demand for voice recognition devices has highlighted voice technology challenges that already existed and what is needed to overcome these problems. We also look to the future and explore the possibilities for advances in voice technology in 2021 and beyond.

Audio / Video Conferencing with Background Noise

The pandemic and the stay-at-home orders that followed amplified difficulties that consumers already faced with voice user interface (VUI) devices.

For many, the number one voice tech challenge of 2020 was a familiar household scene: parents working from home on Zoom calls while their children attend classes on separate audio conferencing devices, everyone trying to speak and be understood at the same time.

Whether from the same household or surrounding environments, the background din of multiple speakers and noise impedes the ability to communicate during video or audio conferencing, or while in a car, on your cell phone, or talking to your digital voice assistant.

Accurate voice recognition and voice enhancement technology are necessary for a reliable voice user experience. Companies that build voice-enabled devices, or that integrate VUI technology that complements their existing products, will gain an advantage in every industry that relies on voice technology.

Speech Recognition and Voice Assistant Devices

Although the adoption of voice assistants surged after the pandemic hit, user frustration has always been a problem, especially with digital assistants on smartphones. In a study conducted by PricewaterhouseCoopers (PwC), 62% of respondents expressed frustration with voice assistants’ lack of understanding, reliability, and accuracy.

However, children may encounter the most difficulties when using speech and voice recognition technology, especially in the home learning environment.

Speech recognition devices were not designed with children in mind. Children’s voices, language, and often erratic behavior are far more complex than those of adults. Voice recognition devices need to account for the variability of children’s speech patterns, language structure, and voice pitch (which changes dramatically with age), not to mention their syntax, grammar, and pronunciation. While an adult can rephrase a request by speaking more clearly and changing tone and wording, children, especially younger ones, more often than not receive an error message or an incorrect response from a digital voice assistant.

Add the problem of background noise while schooling from home, and children may simply give up trying to communicate with a voice-enabled device. Even worse, being told they are wrong when they are right, by a machine that did not understand what they meant, can damage a child’s confidence. The opposite can be just as harmful: a false positive, in which a child is told that a wrong answer is correct, risks socio-emotional harm.

The challenge for voice user interface designers is developing voice recognition technology to learn and adapt to the way children speak.

Lack of Trust and Privacy Issues

The pandemic caused a surge in online shopping in 2020, and this growth is expected to continue well into the future. Retailers have seen a 30% to 40% increase in eCommerce sales since March 2020. However, lack of trust is a significant barrier to further growth in voice-based shopping. According to PwC, one in four consumers say they would not consider using a voice assistant to shop, now or in the future, and 46% of those surveyed said they don’t trust their voice assistant to process their order correctly. Distrust of voice assistants for online payments also deters people from using these devices.

Privacy issues are also a determining factor in the adoption of voice-enabled devices. While some teachers value the benefits of using VUI devices in the classroom, many school districts refuse to implement voice technology due to concerns about Children’s Online Privacy Protection Act compliance.

Voice tech also faces privacy issues in other sectors, such as keeping data secure in banking and finance, or simply keeping certain information from ears not meant to hear it.

Voice technology companies need to address these concerns to advance further in these markets. VUI design that provides precise acquisition of speech and moderates the stream of information processed by voice recognition systems can help.

Touchless Screens

The coronavirus has made us far more aware of the things we touch in our everyday lives, including screens. From grocery stores and bank ATMs to airport kiosks and elevator buttons, hygiene has come to the forefront.

While some of these settings already use voice-control technology, many need to catch up with the times. And those that already employ voice recognition and control may perform poorly in noisy environments.

VUI has fallen victim to the poor reliability of speech-to-text technologies, the components responsible for decoding spoken instructions. Every industry that offers interactive screens to the public needs voice enhancement technology that reduces background noise while still delivering clear speech recognition.
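Commercial voice enhancement, including multi-microphone processing, is far more sophisticated than anything that fits here, but a noise gate is one of the simplest ways to illustrate the idea of suppressing background noise before speech reaches a recognizer. The sketch below is a minimal example under stated assumptions: audio is a list of float samples, the first few frames contain only background noise, and the frame size, margin, and attenuation gain are illustrative parameters, not values from any real product. Note that a gate only quiets the gaps between utterances; it cannot separate noise that overlaps speech.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def noise_gate(samples, frame_size=4, noise_frames=2, margin=2.0, floor_gain=0.1):
    """Attenuate frames whose energy stays close to an estimated noise floor.

    Assumption: the first `noise_frames` frames contain background noise only.
    All parameter defaults are illustrative, not tuned values.
    """
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    # Estimate the noise floor as the mean RMS of the leading noise-only frames.
    noise_floor = sum(rms(f) for f in frames[:noise_frames]) / noise_frames
    threshold = noise_floor * margin
    out = []
    for frame in frames:
        # Pass frames that rise clearly above the noise floor; attenuate the rest.
        gain = 1.0 if rms(frame) > threshold else floor_gain
        out.extend(s * gain for s in frame)
    return out


# Two quiet "noise" frames followed by one louder "speech" frame.
samples = [0.01, -0.01, 0.02, -0.02, 0.01, -0.01, 0.02, -0.02, 0.5, -0.5, 0.6, -0.6]
print(noise_gate(samples))
```

In this usage example the two quiet frames are scaled down by the floor gain while the louder frame passes through unchanged.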

The Future of Voice Recognition Technology

Voice enhancement technology is critical, especially during this pandemic, for using platforms like Zoom, dictating to digital assistants, and using online speech transcription services. But voice recognition is also improving rapidly in many other areas.

A Human-Centric Approach

Designing the technology to be more human-centric may address many of the challenges in voice user interfaces and speech recognition.

At the highest level, interfaces should become less strict, or less ‘machine-driven’, and more human-centric, so that people can interact with machines naturally instead of following rigid, unwavering linguistic rules.

This human-centric approach could solve the problems of children interacting with voice recognition devices. Additionally, companies like Google and Amazon are developing deeper conversational skills and technology to discern people’s emotions in their voices. This type of technology could also solve the problem of unexpected variables in voice recognition.

No More Buttons

“The future is clear and simple,” says Alon Slapak, Kardome’s co-founder and director of research and development. “No more buttons around you. Remote controls, keyboards, light switches, touch screens, all will be history. Look at your smartphone and recall the buttons and the keyboards you were using just a decade ago. Your touch will be bestowed to your loved ones.”

Eliminating switches, stalks, buttons, and touch screens, which are more expensive to produce than modern MEMS microphones, is undoubtedly a cost-effective advancement in voice technology that can have a beneficial impact on many private and public business sectors.

Machine Learning and Artificial Intelligence

Machine learning, Artificial Intelligence (AI), and the data that feeds AI are vital factors that will drive improvements in voice recognition.

Machine learning is the linchpin of voice technology: the ever-growing volume of data it learns from makes AI, and the machines that employ AI, smarter. Voice AI is built to learn from experience, identify trends, and provide answers.

In a recent Voice Talks episode, Leslie Pound, CEO of Tada Labs, predicted that “voice connected to real-query data” is the future in voice technology.

“[We] will see more connection to data,” Pound said. “Data is doubling every year. Data is coming from our lights, our phones, our cars. We have this entire infrastructure of data and databases, and we will see people integrate more and more with that infrastructure.”

Individualized Experiences, Including Speaker Verification

We will also see more personalized interaction with voice recognition devices. You can already configure digital voice assistants, like Google Home, to respond only to your voice and to read off a preselected list of items, such as the news, the weather, your schedule, and hand-picked podcasts, when triggered by set voice commands.

Amazon’s Alexa can personalize responses for everyone in a home. Alexa’s voice recognition gets smarter over time, making its personalized answers increasingly accurate.
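The article does not describe how Google or Amazon recognize individual voices. A common general approach to speaker identification is to convert an utterance into a fixed-length “voiceprint” embedding and compare it with enrolled voiceprints using cosine similarity. The minimal sketch below assumes the embeddings already exist (produced by some speaker-embedding model that is not shown); the names, vectors, and 0.8 threshold are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def identify_speaker(embedding, enrolled, threshold=0.8):
    """Return the enrolled name whose voiceprint best matches, or None.

    `enrolled` maps a (hypothetical) household member to a stored embedding.
    Returning None lets the device fall back to a generic, non-personalized reply.
    """
    best_name, best_score = None, -1.0
    for name, voiceprint in enrolled.items():
        score = cosine_similarity(embedding, voiceprint)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hypothetical three-dimensional voiceprints; real embeddings are much larger.
enrolled = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.5]}
print(identify_speaker([0.85, 0.15, 0.25], enrolled))  # alice
print(identify_speaker([0.0, 0.0, 1.0], enrolled))     # None
```

The threshold is what turns identification into verification: a voice that does not resemble any enrolled member closely enough is treated as unknown rather than forced onto the nearest match.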

The growing number of voice assistant skills, which jumped from 10,000 to more than 100,000 in only three years, will continue to expand the possibilities for personalization.

Proactive Voice Assistants

The next stage in voice recognition and personalization is the ability of voice assistants to predict what you might want. In a demo of Alexa Conversations by Rohit Prasad, Alexa’s head scientist, Alexa helped plan a night out instead of waiting for a new request for every part of the evening. A user only needs to start the conversation, for example by asking to book movie tickets. Alexa then takes over, following up by asking whether the user also wants to book dinner reservations or call an Uber.

This ability to engage proactively requires hardware and software that let the voice recognition device listen to and log a vast amount of data from a user’s everyday life. In addition, by learning from billions of user interactions per week, Alexa knows which skills are commonly used together, so it can predict them and intelligently bundle them into a single recommendation.
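Amazon has not published its recommendation method in this article, but the idea of knowing which skills are “commonly used together” can be illustrated with simple co-occurrence counting. The minimal sketch below assumes interaction logs are grouped into sessions; the session data and skill names (“movies”, “dinner”, “rideshare”, and so on) are hypothetical, and a production system would use far richer signals than raw pair counts.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(sessions):
    """Count how often each unordered pair of skills appears in the same session."""
    pairs = Counter()
    for session in sessions:
        # sorted(set(...)) removes repeats and gives each pair one canonical order.
        for a, b in combinations(sorted(set(session)), 2):
            pairs[(a, b)] += 1
    return pairs


def recommend(skill, pairs, k=2):
    """Return up to k skills most often used alongside `skill`."""
    companions = Counter()
    for (a, b), count in pairs.items():
        if a == skill:
            companions[b] += count
        elif b == skill:
            companions[a] += count
    return [name for name, _ in companions.most_common(k)]


# Hypothetical session logs: each list holds the skills used in one session.
sessions = [
    ["movies", "dinner", "rideshare"],
    ["movies", "dinner"],
    ["movies", "rideshare"],
    ["weather", "news"],
    ["movies", "dinner"],
]
pairs = build_cooccurrence(sessions)
print(recommend("movies", pairs))  # ['dinner', 'rideshare']
```

After a user asks for movie tickets, an assistant built this way would offer the dinner skill first, because it appeared alongside movies most often in past sessions.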

Omnipresent Voice Recognition Integration

While it seems the future is already here, voice-enabled smart devices continue to spread, from smart TVs and watches to speakers, car voice assistants, and more.

The auto industry is also ripe for further integration of voice recognition devices. Smart speakers, voice assistants, and voice-controlled navigation all provide a more effortless and safer driving experience. The Capgemini Research Institute expects the consumer use of voice assistants in cars to reach 95% by 2022.

The day will soon come when you can use your voice to open the window, start the car, or turn on the air conditioning, while the smart car assistant identifies each speaker (driver or passenger) and their location in the vehicle, and provides personalized responses.

Kardome is already developing such technology. Renault-Nissan-Mitsubishi’s (RNM’s) Innovation Lab in Tel Aviv, Israel, is currently evaluating Kardome’s smart audio solutions for automotive applications.

From left: Alik Gorenshtein, Data & AI Lead at Renault-Nissan-Mitsubishi Innovation Lab TLV; Kardome Director of R&D Alon Slapak; and Kardome CEO Dani Cherkassky. The VUI technology company is testing its smart audio solution with Renault-Nissan-Mitsubishi’s Innovation Lab.

Voice-enabled smart televisions that work with virtual assistants will become more common with the help of refined microphone arrays.
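Telling a driver from a passenger, or locating a viewer in a living room, depends on microphone arrays. A classic textbook technique for this (not necessarily Kardome’s, which the article does not describe) is time difference of arrival (TDOA): find the delay that best aligns the signals from two microphones using cross-correlation, then convert that delay into an arrival angle. The minimal sketch below assumes two microphones, a far-field source, sound traveling at 343 m/s, and hypothetical values for sample rate and microphone spacing.

```python
import math


def cross_correlation_lag(left, right, max_lag):
    """Return the lag (in samples) that best aligns `right` with `left`.

    A positive lag means the sound reached the left microphone first.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle(lag, sample_rate, mic_spacing, speed_of_sound=343.0):
    """Convert a sample lag into an arrival angle in degrees (far-field model).

    0 degrees is straight ahead; positive angles point toward the left microphone.
    """
    tau = lag / sample_rate  # delay in seconds
    # Clamp to [-1, 1] so measurement noise cannot push asin out of its domain.
    ratio = max(-1.0, min(1.0, speed_of_sound * tau / mic_spacing))
    return math.degrees(math.asin(ratio))


# A click that reaches the right microphone two samples after the left one.
left = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
lag = cross_correlation_lag(left, right, max_lag=3)
print(lag)  # 2
print(arrival_angle(lag, sample_rate=16000, mic_spacing=0.1))
```

Limiting the search to `max_lag` reflects the physical constraint that sound cannot take longer than spacing divided by the speed of sound to cross the array; with more microphones, the same pairwise delays can be combined to locate a speaker in two or three dimensions.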

The gaming industry is ripe for voice technology integration. According to a survey by Adobe, 63% of smart speaker owners have one in their living rooms. This usage creates a significant opportunity for the gaming industry and voice tech worlds to build a voice-enabled experience for friends and family. Already companies are providing voice-controlled table-top gaming. In partnership with Doppio Games, Netflix developed a multiplayer voice-control game, “The 3% Challenge,” based on its popular sci-fi series, 3%. HBO, Lego, Pretzel Lab, and other companies have also developed voice-controlled games.

Pound of Tada Labs sees the expansion of voice recognition technology into several key areas:

Voice in meetings
Voice connected to real data
Voice for Business Intelligence
Voice in construction

Other areas predicted to integrate voice recognition on a more extensive basis are the healthcare and finance industries.

In Summary

The year 2020 pushed the voice recognition industry to rapidly address and improve VUI in many sectors. However, many areas still need improvement. Background noise, multi-speaker environments, inaccurate transcription of voice commands, and other issues dampen the voice interaction experience for many devices and users. The challenges that VUI developers faced this year will only serve to inspire the future of voice technology.

Learn how Kardome can improve voice interaction experiences.
