Introduction
When I was a teenager, I used to enjoy watching a weekly television programme called “Lost in Space”. One of the main characters in this science fiction series was a robot with a male identity. He spoke perfect English, but his speech was monotonous and dull, devoid of inflexion and of any variation in volume or tone. In short, he sounded unnatural. This depiction of robots as devoid of human emotion was not uncommon at that time – perhaps film producers felt the need to reinforce the differences between humans and machines. However, in the more recent film Her (released in 2014), the main character, Theodore, uses an AI operating system that speaks conversational language. Theodore assigns a female gender to his operating system, which goes by the name Samantha. She talks with a sensitive, warm voice that is indistinguishable from a human's. The machine interacts with Theodore on a very personal level and they develop an emotionally intimate bond, so much so that he eventually falls in love with Samantha, who is nothing more than a computerised voice.
This change in the way machines are anthropomorphised on film reflects the way that AI designers are incorporating emotions into new applications, both in detecting and in expressing human emotions. Some thought that simulating emotions would be one of the more challenging aspects of AI, believing that replicating and understanding the range of human emotions would be very difficult. Many current AI projects are proving otherwise [1,2]. For example, at its I/O developer conference in May 2018, Google demonstrated Duplex, an extension to its Google Assistant. In certain situations, this extension can make phone calls that sound convincingly like a conversation with a real human being. The company has since said that, when the system is rolled out, it will identify itself as an AI program rather than pass itself off as a human voice.
In this article, I briefly describe the concept of emotions from both the human and the machine perspective, and review some of the applications and technologies that are used to replicate and understand human emotions.
Human Emotions
Human emotions have long served an evolutionary purpose in our survival as a species. They are either a reaction to an external stimulus or a spontaneous expression of an internal thought process. Emotions like fear are often a reaction to an external stimulus: when we cross a busy road, the fear of getting run over causes our evolutionary survival mechanism to take effect. These are external causes that trigger emotions inside our brain. However, emotions can also be evoked by an internal thought process. For example, if I managed to find a solution to a complicated differential equation, the feeling of personal satisfaction could make me happy. It may be a purely introspective action with no external cause, but solving it still triggers emotions.
In the same way, AI designers could simulate this kind of emotion from the machine’s internal logic, for example the joy that comes from solving a differential equation. Furthermore, emotions such as joy, sadness, surprise, disappointment, fear, and anger could be simulated in response to external stimuli arriving through written language, sensors, and so on. Computational methods would then be required to process and express the emotions that occur in human interaction.
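The distinction between internally and externally triggered emotions can be illustrated with a minimal Python sketch. It is not taken from any real system: the event names, the trigger tables, and the `EmotionState` class are all hypothetical, and a simple lookup table stands in for whatever computational method a designer might actually use.

```python
from dataclasses import dataclass, field

# Hypothetical trigger tables (illustrative only, not from any real system).
# Internal events come from the machine's own reasoning; external events
# come from language input, sensors, and so on.
INTERNAL_TRIGGERS = {"problem_solved": "joy", "goal_failed": "disappointment"}
EXTERNAL_TRIGGERS = {
    "threat_detected": "fear",
    "unexpected_input": "surprise",
    "insult_received": "anger",
    "loss_reported": "sadness",
}


@dataclass
class EmotionState:
    """Holds a simulated intensity level for each emotion."""
    levels: dict = field(default_factory=dict)

    def trigger(self, emotion, intensity=1.0):
        # Raise the level of one emotion.
        self.levels[emotion] = self.levels.get(emotion, 0.0) + intensity

    def decay(self, rate=0.5):
        # Let every emotion fade over time.
        self.levels = {e: v * rate for e, v in self.levels.items()}

    def dominant(self):
        # The emotion the machine would currently express.
        if not self.levels:
            return "neutral"
        return max(self.levels, key=self.levels.get)


def react(state, event, source):
    """Update the state for an event and return the dominant emotion."""
    table = INTERNAL_TRIGGERS if source == "internal" else EXTERNAL_TRIGGERS
    emotion = table.get(event)
    if emotion is not None:
        state.trigger(emotion)
    return state.dominant()
```

In this sketch, calling `react(state, "problem_solved", "internal")` on a fresh state simulates joy from the machine's own logic, while a later `"threat_detected"` event from a sensor can displace it once the earlier emotion has decayed.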
Machine emotions
One of the founding fathers of AI, Marvin Minsky [3], was once questioned about machine emotions and said: “The question is not whether intelligent machines can have any emotions, but whether machines can be intelligent without any emotions”. Indeed, without emotions we would not have survived as a species, and our intelligence has improved as a result of our emotions. Furthermore, we cannot detach our emotions from the way in which we apply our intelligence. For example, a clinician may decide on medical grounds that the best treatment option for a very elderly hospital patient would be a surgical procedure. However, the clinician’s empathy with the patient might override this view. Taking the age of the patient into account, he or she may decide that the emotional stress likely to be incurred by the patient is not worth the risk of the operation, and therefore rule it out. Emotional intelligence, as well as technical knowledge, is used to decide the treatment options. Of course, machines could never feel emotions as we humans do. Nevertheless, they could simulate emotions that enable them to interact with humans in more appropriate ways.
Ray Kurzweil explains in his book “How to Create a Mind” [4] that, in theory, any neural process can be reproduced digitally in a computer. For example, sensory feelings such as feeling hot or cold could be simulated from the environment if the machine is equipped with the appropriate sensors. However, it does not always make sense to try to replicate in a machine everything that a human being feels. For example, physiological feelings like hunger and tiredness alert us to the state of our body and are normally triggered by hormones and our digestive system. A distinction should also be made between mobile robots and disembodied computers. The latter would have a far more limited range of emotions, as they would not be able to interact physically with their environment as a robot would. The more sensory feedback a machine receives, the wider the range of feelings and emotions it will be able to experience.
Replicating and Humanising AI speech
The ability to generate natural-sounding speech has long been a challenge for AI programs that transform text into spoken words. AI personal assistants such as Siri (Apple’s natural language understanding program for the iPhone), Alexa (Amazon’s virtual personal assistant), and Google Assistant (mentioned earlier) all use text-to-speech software to create a more convenient interface with their users. These systems work by stitching together words and phrases from prerecorded files of one particular voice. Switching to a different voice, such as having Alexa sound like a boy, requires a new audio file containing every possible word the device might need to communicate with users.
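The stitching approach described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any of the named assistants is actually implemented: `VOICE_BANK`, `SILENCE`, and `synthesize` are hypothetical names, and short lists of numbers stand in for the recorded audio of each word.

```python
# Hypothetical "recordings" for one voice: each word maps to a short list
# of audio samples. A real system would store far longer recordings.
VOICE_BANK = {
    "hello": [0.1, 0.3, 0.2],
    "world": [0.4, 0.0, -0.2],
}

# A brief pause inserted between words (an assumption of this sketch).
SILENCE = [0.0, 0.0]


def synthesize(text, voice_bank, gap=SILENCE):
    """Build an utterance by joining the prerecorded samples of each word."""
    samples = []
    for i, word in enumerate(text.lower().split()):
        if word not in voice_bank:
            # The voice has no recording of this word, so it cannot say it.
            raise KeyError(f"no recording for {word!r} in this voice")
        if i > 0:
            samples.extend(gap)
        samples.extend(voice_bank[word])
    return samples
```

The `KeyError` branch shows why a new voice is costly in this design: every word the device might need must first be recorded in that voice before it can be spoken.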
However, humanising the unique sound of an individual’s voice, emotions included, is relatively new and is beginning to make an impact. For example, a Canadian-based company called Lyrebird [5] has created an AI system that learns to mimic a person’s voice by analysing speech recordings and the corresponding text transcripts. Lyrebird says that its software can “create the most realistic artificial voices in the world” and mimic almost any voice. By listening at length to spoken audio, it can extrapolate to generate completely new sentences that include the different intonations and emotions of each voice. Like much voice recognition software, Lyrebird uses artificial neural networks, in its case to learn how to transform bits of sound into speech.
Understanding human emotions using AI
In recent years, AI has improved significantly at detecting emotions in humans through voice, body language, facial expressions, and so on. For example, voice recognition systems are learning to detect human emotions through speech intonation, speech pauses, and so on, in much the same way that we detect changes in the moods of our loved ones, friends, or work colleagues. Recently, researchers [6] reported a deep learning program that, they claim, can tell whether a person is a criminal just by looking at their facial features, with an accuracy rate of about 90%. In 2016, Apple bought a start-up company called Emotient [7], which created software that can read facial expressions. This could be used to help AI programs like Siri and Alexa understand the moods of their owners. Another application of this kind of software could be in retailing: with support from in-store CCTV cameras, it could infer what customers are thinking from their body language. For example, a customer who returns to the same item, or studies it with concentration, might be showing strong interest, which could trigger an approach from a store assistant.
The Future of AI emotions
There are several potential benefits of using AI programs to detect human emotions. They don’t need to be paid, they don’t get tired, and they can operate 24 hours a day, making consistent decisions. Furthermore, the view that human emotions are off limits to machines that work in logic no longer carries weight. In his book Homo Deus [8], Yuval Noah Harari asserts that humans are essentially a collection of biological algorithms shaped by millions of years of evolution. On this view, non-organic algorithms could replicate, and even surpass, everything that organic algorithms can do in human beings. We can expect to hear more about emotional AI in the future.
Keith Darlington
References
- [1] https://mysteriousuniverse.org/2018/03/job-interviews-will-soon-be-conducted-by-this-emotion-reading-russian-robot/
- [2] http://www.iflscience.com/technology/emotional-ai-developed-team-russian-researchers/all/
- [3] Minsky, M. L. The Society of Mind. Simon and Schuster, New York, 1986.
- [4] Kurzweil, R. How to Create a Mind. Viking Penguin Press, 2012.
- [5] https://lyrebird.ai/
- [6] https://www.newscientist.com/article/2114900-concerns-as-face-recognition-tech-used-to-identify-criminals/
- [7] https://www.devicedaily.com/pin/apple-buys-a-startup-that-may-help-it-read-your-facial-expressions/
- [8] Harari, Y. N. Homo Deus: A Brief History of Tomorrow. Harvill Secker Publishers, 2015.