Will your next Best Friend be a BOT?

With 5G phones on the horizon, state-of-the-art voice technology is being positioned to power the next wave of AI-driven devices that will significantly change every aspect of our personal and professional lives. These disruptive devices will provide us with on-demand, relevant suggestions and viable step-by-step pathways to solving complex problems. In fact, the changes in store, both technological and behavioral, at home and in the workplace will be so extreme that the laptop or mobile phone you are using to read this article is destined to become another glorified doorstop. Here’s what you can expect.

Googling as we know it today will change from a one-line entry to an instantaneous voice exchange between you and a Bot. Rather than a list of links, the output your Bot fires back will be specific to your questions. Downloading Apps onto your smartphone will no longer be necessary because your Bot’s AI will scour every major App available and connect you with the one that best meets your needs. Not only will your Bot deliver optimal answers, but it will also pose insightful questions for you to consider. Usernames and passwords will no longer be necessary. Instead, the voice print you use to speak into a device will vet you seamlessly. HTML-driven web page searching will soon appear primitive and inefficient. Even the ‘water cooler’ chat among peers for cross-pollinating ideas will become a distant memory.

Due to the FOMO (Fear of Missing Out) effect, I decided to attend the Conversational Interaction Conference in San Jose, California this year. The two-day annual event, held in early February, provided me with valuable industry insights, updates on working applications, future blueprints, new industry buzzwords, plus a list of key hurdles confronting voice technology developers today. Below is a brief synopsis of what I saw and heard at the conference. I have included links and capitalized key industry buzzwords for easy identification. After reading this article and accessing the links provided, you should have gained sufficient understanding to ask better questions about the current state and future of voice technology.

Voice Controlled Devices

Let us start with some familiar Voice Controlled Devices that are already in the marketplace, such as Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana, or Google’s Home. These ubiquitous smart devices allow us to use our natural voice to make on-demand requests, such as playing a song, the news, the weather forecast, and a lot more. For the most part, these devices do a decent job at what the industry refers to as “Command and Control” applications. On the backend, a user’s command such as “Alexa! What is the weather?” is converted from speech to text using Speech Recognition (SR), then passed on to a Natural Language Processor (NLP), which invokes Deep Learning (DL) and Neural Network (NN) algorithms to analyze and prioritize keywords. Backed by powerful cloud-based computing, multiple algorithms use these keyword inputs to generate a text response. This text is then converted to speech using Text-To-Speech (TTS) and delivered back to the device. The entire process takes less than half a second, which is within the tolerance levels of a normal conversation between two humans.
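The chain described above (SR, then NLP keyword analysis, then a generated reply, then TTS) can be pictured with a toy sketch. Everything here is an illustrative assumption: the function names are hypothetical, the "audio" is just encoded text, and each stage is a stand-in for a real cloud service rather than any vendor's actual implementation. The only figure taken from the article is the half-second budget.

```python
import time

LATENCY_BUDGET_S = 0.5  # "less than half a second", per the article


def speech_to_text(audio: bytes) -> str:
    # Stand-in for Speech Recognition (SR): the toy "audio" is UTF-8 text.
    return audio.decode("utf-8")


def extract_keywords(text: str) -> list:
    # Stand-in for the NLP step: keep only words this toy system knows.
    known = {"weather", "news", "song"}
    words = [w.strip("!?.,").lower() for w in text.split()]
    return [w for w in words if w in known]


def generate_reply(keywords: list) -> str:
    # Stand-in for the cloud-side algorithms that build a text response.
    replies = {
        "weather": "It is sunny today.",
        "news": "Here are the headlines.",
        "song": "Playing a song.",
    }
    for keyword in keywords:
        if keyword in replies:
            return replies[keyword]
    return "Sorry, I did not catch that."


def text_to_speech(text: str) -> bytes:
    # Stand-in for Text-To-Speech (TTS): encode the reply back to bytes.
    return text.encode("utf-8")


def handle_command(audio: bytes):
    # Run the four stages in order and time the whole round trip.
    start = time.perf_counter()
    text = speech_to_text(audio)
    keywords = extract_keywords(text)
    reply = generate_reply(keywords)
    speech = text_to_speech(reply)
    elapsed = time.perf_counter() - start
    return speech, elapsed


speech, elapsed = handle_command(b"Alexa! What is the weather?")
print(speech.decode("utf-8"), "within budget:", elapsed < LATENCY_BUDGET_S)
```

In a real device each stage is a network call to a cloud service, which is why the half-second budget is hard to meet.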

Every time an individual wakes up a Voice Assistant Smart Device with their voice (e.g., “Alexa!” or “Siri!”), the commands uttered afterwards are stored and scored. These unassuming devices automatically deliver your data to a Cloud-based voice service platform, which appends your data to a Machine Learning training data set used to increase the algorithm’s overall accuracy. Ultimately these devices will mimic human-to-human conversational interactions, similar to two friends chatting casually in the same room. A good visual of this concept is depicted in a movie trailer for 2001: A Space Odyssey, where a talking Voice Assistant called HAL 9000 converses seamlessly with the protagonist.

To get a glimpse of how close we have come to HAL 9000, Aigo, a California-based startup, produced a video comparing its breakthrough technology with Alexa’s. (Be sure to view the entire video.) It reveals what one can expect from these devices going forward. Also on a similar game-changing trajectory is BMW’s Intelligent Personal Assistant, an advanced voice-activated driving experience technology. With this voice package, BMW aims to eliminate dashboard touch-screens, the outcome of which could inspire radically different auto interior designs, in particular for autonomous vehicles.

These Voice Assistant Smart Devices are part of a larger category called Voice User Interfaces, or VUIs. VUIs include any smart device or app that relies on a user’s voice for input commands. These VUIs could be life-size, human-like Robots, Chatbots, Holograms, and more. Let’s take a closer look at who is doing what in this space with the following VUIs.

Robots, Chatbots, Holograms

SoftBank Robotics created a friendly-looking robot called ‘Pepper’ that companies such as HSBC and Carrefour use to greet their customers. Like a dispatcher, ‘Pepper’ directs customers to the appropriate employee or department in the customer’s language of choice. In the event of a communication error, ‘Pepper’ has an integrated, iPad-size screen for optional click inputs. ‘Pepper’s’ disarming, child-like interactions dazzle audiences, especially when it successfully greets a customer by name. Similar to its brethren, ‘Pepper’ becomes smarter by matching images of its visitors with their corresponding Contextual Utterances. ‘Pepper’ has also shown promise with young medical patients, who tend to feel more comfortable chatting about their symptoms with a robot than with an adult.

What ‘Pepper’ does offline, Avatars do online and more. Avatars interact with individuals from a web page or App using voice, text, or both. Just as ringtones gave mobile phones their customized personality, Avatars can help set the tone for interactions by mimicking a celebrity’s voice or image, or simply by projecting a voice with an engaging accent. Sapientx, a San Francisco-based firm, specializes in creative Avatar designs. Their creations help humanize the Chatbot experience by blending entertainment with genuine, positive interactions. They can also inject a visual and voice branding experience that consumers can identify with personally. The firm’s white paper expands upon how their interactive branding power can transcend generations in many positive ways.

A typical Chatbot experience requires either a click or an utterance from the user. Depending upon the App, a Chatbot will reply via voice or text. Users communicate directly with the activated device (via voice or text) and receive near-instantaneous responses, just as though they were interacting directly with a department head.

Bank of America’s Chatbot or Virtual Financial Assistant is called Erica. With Erica, users can learn about their personal spending habits, receive suggestions on driving financial improvements, and learn about ancillary services available at the bank. This solution is ideal for up-selling and cross-selling sales campaigns that can leverage a client’s changing needs.

Workday.com, a cloud-based business service entity, offers a Chatbot to help its employees stay connected. The Chatbot improves efficiency, knowledge sharing, and collaboration. By integrating this platform in the workplace, Workday.com regularly reminds its employees of their fundamental core values, which in turn helps promote management’s engaging culture internally.

Another interesting example came from Murphy Oil, an oil and gas company located in Arkansas. Management hired Alan AI, Inc., a Texas-based enterprise mobile development firm, to design a conversational voice Chatbot for their field engineers to deploy while checking and maintaining the company’s equipment. This ‘supercharged’ Voice Assistant gives their engineers the ability to trigger required workflows and monitor equipment progress in real time. Other Chatbot enterprise developers at the conference included Rulai and Grid Dynamics, which is pending an IPO.

So far, the industries leveraging Chatbots include Entertainment, Healthcare, Financial Services, Cable/Telecommunications, and Utilities. The bulk of applications, however, are for specific internal uses such as HR, expense reports, employee directory assistance, etc. Essentially they address all the little nagging items employees endure on a day-to-day basis (e.g., recording travel receipts). Internal applications enjoy greater success because they can leverage a company’s industry jargon and finite database to achieve higher levels of speech recognition accuracy.
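One way to picture why a finite, company-specific vocabulary helps: a slightly garbled transcription can be snapped to the nearest known phrase. The sketch below uses fuzzy string matching from Python's standard difflib module as a stand-in; it is an assumption for illustration, not how any product mentioned here works, and the phrase list is hypothetical.

```python
import difflib

# Hypothetical internal vocabulary; a real deployment would load company jargon.
COMPANY_PHRASES = [
    "expense report",
    "employee directory",
    "travel receipt",
    "vacation request",
]


def snap_to_vocabulary(heard: str, cutoff: float = 0.6):
    # Return the closest known phrase, or None if nothing is similar enough.
    matches = difflib.get_close_matches(
        heard.lower(), COMPANY_PHRASES, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


print(snap_to_vocabulary("expence repot"))
print(snap_to_vocabulary("xyz"))
```

The smaller and more distinctive the phrase list, the less likely two candidates are to be confused, which is the advantage internal applications enjoy.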

If you are looking for some winning applications for internal use, consider reaching out to Oracle’s Conversational Design Team. They are the group behind Oracle’s Virtual Assistant. What caught my attention at the event was that Oracle deliberately focuses its resources on developing internal applications that deliver 95% or higher levels of voice response accuracy. This ‘high bar’ approach has given their team the bandwidth to address more challenging issues such as multiple requests in the same ‘Utterance’, also referred to in the industry as ‘Intent Handling’.
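To make the idea of multiple requests in one Utterance concrete, here is a minimal sketch that splits an utterance into clauses and maps each clause to an intent by keyword. The intent names and keyword lists are hypothetical, and production systems rely on trained language models rather than keyword lookups; this only shows the shape of the problem.

```python
import re

# Hypothetical intents and trigger keywords for an internal assistant.
INTENT_KEYWORDS = {
    "book_travel": ("flight", "hotel"),
    "file_expense": ("expense", "receipt"),
    "lookup_employee": ("directory", "extension"),
}

# Split on simple connectors and punctuation; longer connectors come first.
CLAUSE_SPLIT = re.compile(r"\b(?:and then|and also|and|then)\b|[,;]")


def split_requests(utterance: str) -> list:
    parts = CLAUSE_SPLIT.split(utterance.lower())
    cleaned = [part.strip(" .!?") for part in parts]
    return [part for part in cleaned if part]


def detect_intents(utterance: str) -> list:
    # Assign at most one intent per clause, in the order spoken.
    intents = []
    for clause in split_requests(utterance):
        for intent, keywords in INTENT_KEYWORDS.items():
            if any(keyword in clause for keyword in keywords):
                intents.append(intent)
                break
    return intents


print(detect_intents("File my taxi receipt and book a flight to Boston"))
```

Even this toy version hints at the difficulty: a connector such as "and" can join two requests or sit inside a single one ("a flight and hotel package").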

Finally, Microsoft delivered an impressive keynote, which included a speech-to-text demo from the very same PowerPoint used to deliver the presentation. As the keynote speaker, Xuedong Huang, Technical Fellow and Chief Speech Scientist for Microsoft, addressed the audience in his Scottish-accented English, a Spanish text version of his words appeared in real time below his slides. Being fluent in Spanish, I could personally verify that the subtitle translation was impressively authentic. This near-flawless application could easily be paired with any two languages from a list of 50 available; hence, a presentation spoken in French could display Chinese subtitles or vice versa. This service is currently embedded in Office 365 and costs about $1 per hour of speech.

As though this feat were yesterday’s achievement, Mr. Huang felt obligated to awe the audience with yet another example of achieving ‘Human Parity’. This time he used his own voice print and intonations to display an image of himself conversing natively in Japanese. According to a Japanese-speaking member of the audience, the Japanese delivery also sounded authentic.

In a successful attempt to leave a lasting impression on the audience, Mr. Huang played a video of Julie White, an actress and global motivational speaker. She used a life-size hologram of herself to deliver a speech in native Japanese to a Japanese audience in Japan, all the while remaining at her home in San Diego, California. The implications of the potential uses of this breakthrough technology, both good and devious, were startling and remain open to debate.

Creating Your Own Chatbot? – Key Design Issues to Consider

Before you decide to launch your own Chatbot, it would be wise to align your lofty expectations with a dose of reality. Voice technology is much harder than it may appear. It is its own worst enemy because incremental successes tend to propagate the need for more backend technology, which in turn unleashes more complexity, often in a geometric progression. Fortunately, as evidenced at the recent Conversational Interaction Conference, the industry has achieved many impressive breakthroughs. However, the battle to achieve ‘Human Parity’ across all platforms and applications on a sustainable basis continues. Here is what keeps developers up at night.

Hardware issues… Noisy Environments can pose serious issues for Ambient Voice Exchanges. Amazon’s Echo units currently include 5 unidirectional mics that can pick up surrounding voice commands from multiple angles. More may be needed… For Chatbot apps, however, the key issue is just the opposite. Background noises, such as traffic or machinery, need to be eliminated. UmeVoice, Inc., a headset manufacturer, offers military-grade, noise-cancelling headsets and earbuds that help drown out surrounding noises, allowing users to provide voice inputs with excellent clarity in practically any situation, including audible whispers.

Software challenges… The backend engines that seamlessly support Voice Controlled Devices from the Cloud depend upon ongoing advancements in Natural Language Processing (NLP), Deep Learning (DL), Neural Networks (NN), Speech Recognition (SR), Text-To-Speech (TTS), and a slew of acronyms yet to be named, plus all of their respective ancillary development tools! The process is never-ending…

UI/UX Design protocols… Humanizing Bots requires a deep understanding of acceptable human behavior. UI/UX (User Interface / User eXperience) designers deliberately include facial expressions to invite positive emotions from the user, especially in the event a voice exchange fails to meet expectations. In the industry this soft yet crucial issue is known as ‘Failing Gracefully’. To appreciate the importance of ‘Failing Gracefully’, imagine the mounting frustration one would experience if the other person not only delayed their response but also repeated the same question multiple times. A natural voice exchange tolerates less than half a second per response. Any longer and a simple dialogue between two individuals risks appearing like two separate conversations, which would breed irreparable frustration, distrust, and confusion.
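Two ingredients of ‘Failing Gracefully’ described above lend themselves to a small sketch: answering within the half-second tolerance even when the backend is slow, and never repeating the same question verbatim. The function names, filler phrase, and reprompt wording are all hypothetical; this is one possible approach under those assumptions, not an industry standard.

```python
import concurrent.futures

RESPONSE_BUDGET_S = 0.5  # tolerance cited in the article
FILLER = "One moment, let me check on that."

# Escalating reprompts so the Bot never asks the same question twice.
REPROMPTS = [
    "Sorry, could you say that again?",
    "I am still not getting it. Could you rephrase?",
    "Let me show you some options on screen instead.",
]


def reply_within_budget(backend, utterance, budget_s=RESPONSE_BUDGET_S):
    # Run the (possibly slow) backend in a worker thread and fall back
    # to a short filler phrase if it misses the conversational budget.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(backend, utterance)
    try:
        return future.result(timeout=budget_s)
    except concurrent.futures.TimeoutError:
        return FILLER
    finally:
        # Do not block the conversation waiting for the slow call to end.
        pool.shutdown(wait=False)


def next_reprompt(failed_attempts: int) -> str:
    # Pick a different reprompt per failure; stay on the last one after that.
    return REPROMPTS[min(failed_attempts, len(REPROMPTS) - 1)]


print(reply_within_budget(lambda text: "It is sunny today.", "weather"))
print(next_reprompt(0))
print(next_reprompt(1))
```

A real assistant would also deliver the late answer once it arrives; the sketch only covers the moment of the miss.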

Even with the best hardware available, voice applications can fail due to poor conversational interaction protocols. For example, the gender, voice, or accent used to interact with a user may come across as unappealing or unfriendly. In the case of Chatbots, replies may be too long, irrelevant, potentially insulting, or even ‘creepy’, especially if the Bot were to divulge personal details unintentionally. Even a user can fill the feedback loop with poor data by inadvertently gaming the system, either by speaking unnaturally slowly or by using specific terms out of context.

Conversational interactions should be fun, entertaining, terse, non-invasive, and to the point. Bots should engage with users to learn more about them without appearing overbearing. Each exchange should build upon the previous one to help profile the user. The more the Bot knows about the user, the more likely its suggestions will resonate and the more the all-encompassing trust factor will increase. In short, the key challenge in designing an application is to find the balance where the Bot can gracefully and gradually extract more information from a user, while simultaneously integrating the aggregate data to provide the user with more relevant and timely suggestions. This fine line of being helpful while remaining invisible and accurate is an ongoing industry challenge, even more so when a growing list of backend technology issues is considered simultaneously. It is at this very tenuous and yet exciting juncture where the industry stands today.

Some of the Industry’s Greatest Challenges

Perhaps the greatest challenge for Bots is the handling of instructions that change midstream. This occurs when a user asks for one thing and then, in the same utterance, suddenly changes his or her mind and asks for something else. On the receiving end, undoing one command in favor of another can cause processing algorithms to break down. A reset routine to handle the new request is a possible workaround; however, the additional processing time could create an extended delay that would exceed the half-second conversational interaction requirement.
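A midstream change can be illustrated with a simple heuristic: look for correction phrases and keep only what follows the last one. The marker list is an assumption chosen for the example, and the approach is deliberately naive (a word like "actually" is not always a correction), which is part of why the problem is hard in practice.

```python
import re

# Hypothetical phrases that signal the speaker is changing the request.
CORRECTION_MARKERS = re.compile(
    r"\b(?:no wait|scratch that|never mind|i mean|actually)\b[,\s]*",
    re.IGNORECASE,
)


def resolve_correction(utterance: str) -> str:
    # Split at every correction marker and keep the last non-empty piece,
    # i.e. the request the speaker settled on.
    pieces = [p.strip(" ,.!?") for p in CORRECTION_MARKERS.split(utterance)]
    non_empty = [p for p in pieces if p]
    return non_empty[-1] if non_empty else ""


print(resolve_correction("Play some jazz, no wait, play classic rock"))
print(resolve_correction("Play some jazz"))
```

Because the whole utterance is resolved before any command runs, nothing has to be undone; the trade-off is that the Bot must wait for the utterance to end, which eats into the half-second budget.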

And if all of these challenges were not enough, developers rightfully complain about the need to code and maintain the same voice applications for each platform: iOS, Android, Siri, etc. More often than not, these platforms will modify their APIs (Application Programming Interfaces) without warning, sending developers into a mad rush to fix each application!
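One common way developers limit this maintenance burden is to keep the conversational logic in one place and wrap each platform in a thin adapter, so an unannounced API change only touches the adapter. The sketch below is a generic illustration: the two platforms and their payload shapes are invented for the example and do not correspond to any real vendor API.

```python
from abc import ABC, abstractmethod


class VoicePlatformAdapter(ABC):
    # Translates between one platform's payloads and plain text.

    @abstractmethod
    def parse_request(self, payload: dict) -> str:
        ...

    @abstractmethod
    def format_response(self, text: str) -> dict:
        ...


class PlatformAAdapter(VoicePlatformAdapter):
    # Hypothetical payload shape: {"query": {"text": ...}}
    def parse_request(self, payload: dict) -> str:
        return payload["query"]["text"]

    def format_response(self, text: str) -> dict:
        return {"speech": text}


class PlatformBAdapter(VoicePlatformAdapter):
    # Hypothetical payload shape: {"utterance": ...}
    def parse_request(self, payload: dict) -> str:
        return payload["utterance"]

    def format_response(self, text: str) -> dict:
        return {"reply": {"say": text}}


def answer(utterance: str) -> str:
    # Shared conversational logic, written once for every platform.
    if "weather" in utterance.lower():
        return "It is sunny today."
    return "Sorry, I did not catch that."


def handle(adapter: VoicePlatformAdapter, payload: dict) -> dict:
    return adapter.format_response(answer(adapter.parse_request(payload)))


print(handle(PlatformAAdapter(), {"query": {"text": "What is the weather?"}}))
print(handle(PlatformBAdapter(), {"utterance": "What is the weather?"}))
```

Adapters do not remove the mad rush when a platform changes its API, but they confine the fix to a few lines per platform.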


Tom Kadala is a technology innovator and freelance writer on topics related to artificial intelligence and machine learning. He is also the founder of RagingFX.com, a first-of-its-kind Autonomous Company.

© 2020 Tom Kadala