Google is a master conjurer of the innovative, and at its recent developer conference it showed its mettle with Google Duplex, a new technology that can conduct "natural" conversations over the phone through Google Assistant. It is designed to handle mundane tasks like setting up appointments and inquiring about prices, something it did so well that it caused an uncomfortable stir: it sounded remarkably human. According to The Verge, the assistant will now include a built-in disclosure identifying itself as AI before engaging a human in conversation. Built on WaveNet, an audio-generating technology from DeepMind, Duplex also draws on advances in natural language processing to understand and generate natural speech. In contrast to the synthetic voices we have grown accustomed to, the conversation is not stilted and needs no adjusting to.
But before we jump the gun on Google and declare that Duplex will take over every front-desk job, note that its expertise is limited to narrow real-world tasks on which it has been deeply trained. For now, Duplex can carry on limited exchanges convincingly but is not suited to lengthy conversations. A fully automated system, Duplex can both initiate and receive calls in a variety of voices. Incredible as it sounds, the voice you hear is computer generated, even though the accent, context, syntax, and pauses are humanlike. The audio clips on the Google Blog, in which appointments are made with a hair salon and a restaurant, are worth a listen. To hear is to believe:
“Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone” — Yaniv Leviathan, Principal Engineer & Yossi Matias, Vice President, Engineering, Google
Longer conversations with someone who is not very familiar with the salon's booking system or the restaurant's menu are challenging, if feasible at all. Natural-sounding syntax, intonation, and meaningful pauses are extremely difficult to produce when the level of familiarity is low. These are deemed complex conversations, and while Duplex may sound "human-like" in them, its contextual responses and grasp of nuance are not up to par.
According to the Google Blog, Duplex has yet to fully master interruptions, elaborations, syncs, and pauses, but it relies on advances in Google's automatic speech recognition (ASR) technology, recurrent neural networks (RNNs), and TensorFlow Extended (TFX) to improve "understanding, interacting, timing, and speaking". Meaningful conversation is the result of a sequence of processes:
- The ASR processes incoming sound.
- The resulting text is interpreted against the conversation context and other inputs.
- The response text is created.
- The TTS (text to speech) system reads the response aloud.
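The four-step loop above can be sketched as a minimal pipeline. This is purely illustrative: every function name below is a hypothetical placeholder, and the ASR, understanding, and TTS stages are stubbed out, since Duplex's actual components (ASR, RNN, WaveNet) are not public APIs.

```python
# A minimal sketch of the ASR -> understanding -> response -> TTS loop
# described above. All names here are hypothetical placeholders, not
# Duplex's real interfaces.

def recognize_speech(audio_chunk):
    """Step 1 (ASR): turn incoming audio into text. Stubbed for illustration."""
    return "do you have a table for four at seven"

def understand(text, context):
    """Step 2: run the transcript against the conversation context."""
    context["history"].append(text)
    return {"intent": "availability_query", "slots": {"party": 4, "time": "19:00"}}

def generate_response(parsed):
    """Step 3: create the response text from the parsed intent."""
    if parsed["intent"] == "availability_query":
        return "Yes, for how long would you like the table?"
    return "Could you repeat that, please?"

def speak(text):
    """Step 4 (TTS): read the response aloud. Here we just return the text."""
    return text

def handle_turn(audio_chunk, context):
    """One conversational turn: chain the four stages together."""
    text = recognize_speech(audio_chunk)
    parsed = understand(text, context)
    reply = generate_response(parsed)
    return speak(reply)

context = {"history": []}
print(handle_turn(b"<audio bytes>", context))
```

The point of the sketch is the shape of the loop, not the stubs: each turn flows strictly from recognition through contextual understanding to response generation and synthesis, with the shared `context` accumulating state across turns.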
According to Google, this will greatly help businesses because information and the appropriate replies become available 24/7. It also avoids "downtime", which can be considerable and expensive when training and nesting front-line staff. On the user's end, you can book, search, and get information asynchronously, effortlessly, and in the background. How soon can Google's deep learning and AI bring this into the mainstream and threaten customer-service jobs globally? Hopefully it will make life easier without becoming smarter than humans.