Google Introduces Lifelike AI Experience With Google Duplex

At Google’s annual I/O developer conference this year, CEO Sundar Pichai unveiled a new technology called ‘Duplex’. It enables Google Assistant to interact with humans by making phone calls in real time.

What is Google Duplex?

With Duplex, Google Assistant can place calls on your behalf to book a hair appointment or reserve a table at your favorite restaurant, among other things.

Pichai said that Duplex is one of many tools the search giant is building to make it easier than ever to interact with your smart devices. Duplex is another addition that would be released alongside the latest Android mobile operating system, Android P.

On stage at the I/O conference, which was held in Mountain View, California, Pichai demonstrated the Duplex technology. The event started last Tuesday and ran through Thursday. In the demo, Google Assistant dials a local hair salon to schedule an appointment.

Pichai pointed out that the demo was a real call made using Google Assistant. “The amazing thing is that Assistant can actually understand the nuances of conversation,” he said. “We’ve been working on this technology for many years. It’s called Google Duplex.”

Google’s chief executive also said that Duplex is still under development. The search giant plans to conduct early testing of Duplex inside Assistant this summer “to help users make restaurant reservations, schedule hair salon appointments, and get holiday hours over the phone.”

Google Duplex: Incredibly Human-like and Smart Enough to Take Your Job

Google is a master conjurer of innovation, and at its recent developer conference it showed its mettle with Google Duplex, a new technology that can conduct “natural” conversation over the phone through Google Assistant. Duplex is designed to handle mundane tasks like setting up appointments and inquiring about prices. It did this so well that it caused an uncomfortable stir, because it sounded so human. According to The Verge, the assistant will now have a built-in disclosure identifying itself as AI before engaging in a conversation with a human. Duplex uses WaveNet, an audio-generating technology from DeepMind, along with advances in language processing to understand and generate natural speech. In contrast to what we have grown accustomed to, the conversation is not stilted, and that takes some getting used to.

But before we jump the gun and declare that Duplex will take over all front-desk jobs in the future, note that its expertise is limited to small real-world tasks on which it has been deeply trained. For now, Duplex can carry on limited exchanges convincingly but is not suited to lengthy conversations. Duplex, a fully automated system, can initiate and receive calls in a variety of voices. Incredible as it sounds, the voice you hear is computer-generated, even though the accent, context, syntax, and pauses are humanlike. The sample recordings of a hair salon appointment and a restaurant reservation come from the Google Blog post credited below. To hear is to believe:

“Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone” — Yaniv Leviathan, Principal Engineer & Yossi Matias, Vice President, Engineering, Google

Longer conversations with someone who is not familiar with the salon’s booking system or the restaurant’s menu are challenging, if feasible at all. Natural-sounding syntax, intonation, and meaningful pauses are extremely difficult to produce when the level of familiarity is low. These are considered complex conversations, and while Duplex may sound “human-like”, its contextual responses and grasp of nuance are not yet up to par.

According to the Google Blog, the team has yet to fully master interruptions, elaborations, syncs, and pauses. It is relying on advances in Google’s automatic speech recognition (ASR) technology, a recurrent neural network (RNN), and TensorFlow Extended (TFX) to improve “understanding, interacting, timing, and speaking”. Meaningful conversation results from a sequence of processes:

  1. The ASR system processes the incoming sound.
  2. The resulting text is run against the conversation context and other inputs.
  3. The response text is created.
  4. The TTS (text-to-speech) system reads the response aloud.
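The four steps above can be pictured as a simple loop that runs once per conversational turn. The sketch below is only an illustration and makes several assumptions. Every name in it (`CallContext`, `asr`, `respond`, `tts`, `handle_turn`) is hypothetical. The "audio" is plain UTF-8 text. The reply logic is a pair of keyword rules standing in for the RNN that Google describes. None of this is Google’s actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CallContext:
    """Hypothetical conversation state: the task and every turn so far."""
    task: str
    history: list[str] = field(default_factory=list)


def asr(audio: bytes) -> str:
    # Step 1 (stub): real speech recognition would run here.
    # For illustration, the "audio" is simply UTF-8 encoded text.
    return audio.decode("utf-8").strip().lower()


def respond(text: str, ctx: CallContext) -> str:
    # Steps 2-3 (stub): combine the transcript with the context to create a reply.
    # Duplex reportedly uses a recurrent neural network; these rules are a stand-in.
    if "what time" in text:
        return "Around 10 a.m., if that's available."
    if "name" in text:
        return "It's for a customer of yours."
    return f"Hi, I'm calling to book {ctx.task}."


def tts(text: str) -> bytes:
    # Step 4 (stub): a real system would synthesize speech (e.g. with WaveNet).
    return text.encode("utf-8")


def handle_turn(audio: bytes, ctx: CallContext) -> bytes:
    """Run one turn of the pipeline: ASR -> context -> response -> TTS."""
    transcript = asr(audio)
    ctx.history.append(transcript)
    reply = respond(transcript, ctx)
    ctx.history.append(reply)
    return tts(reply)


if __name__ == "__main__":
    ctx = CallContext(task="a haircut")
    for line in [b"Hello, how can I help you?", b"What time works for you?"]:
        print(handle_turn(line, ctx).decode("utf-8"))
```

The point of this structure is that each stage has a narrow input and output. Any stage could therefore be swapped for a real model without touching the others. Keeping the history in one context object also mirrors step 2, where each new transcript is interpreted against everything said before.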

According to Google, this will greatly help businesses, because information and the appropriate reply are available 24/7. There will also be less “downtime”, which can be considerable and expensive when training and nesting frontline staff. On the user end, you can book, search, and get information asynchronously, effortlessly, and in the background. How soon can Google’s deep learning and AI bring this into the mainstream and threaten customer service jobs globally? Hopefully, it will make life easier without becoming smarter than humans.