Today any child can play with a mobile phone or tablet that shows, in real time, a distorted version of her own face, or converts it into an animated cartoon that follows her expressions and lip movements with amazing precision. On the Internet, apps like Face Swap, which exchanges the faces of its users, or FaceApp, which retouches, rejuvenates or ages a face instantly, have proliferated. And big Hollywood productions have shown us how dead actors and actresses can be brought back to life, like Carrie Fisher in the Star Wars saga.
All this is part of the new generation of audiovisual editing tools, which take advantage of advances in Artificial Intelligence (AI) and achieve incredible levels of realism in digital manipulation. But beyond their recreational use, these technologies are taking the concern over fake news to a whole new level, one that challenges the credibility that is traditionally granted to a video clip. In December 2017, the website Motherboard reported that a Reddit user with the alias Deepfakes had inserted the faces of actresses like Gal Gadot or Scarlett Johansson into pornographic videos.
The creation of these demeaning hoaxes was based on open source deep-learning software such as Google’s TensorFlow. But this year’s launch of the FakeApp application has made the creation of these “deepfakes” available to anyone with a home computer. However, despite their amazing results, these homemade creations have not yet reached perfection in the movement of the face or the naturalness of the expressions.
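The trick behind these tools can be illustrated in miniature. Face-swap systems typically train a single shared encoder that reduces any aligned face to a compact code, plus one decoder per identity that renders that code back into a specific face; swapping means decoding one person's code with the other person's decoder. Below is a deliberately simplified, linear (eigenface-style) sketch of that idea in Python with NumPy. The data, dimensions and helper names are all invented for illustration; real deepfake software uses deep convolutional networks trained on thousands of face images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "faces": identity mean + shared expression deviations
# (16-dim vectors standing in for aligned face images).
n, d, k = 200, 16, 4
expressions = rng.normal(0.0, 1.0, (n, k)) @ rng.normal(0.0, 1.0, (k, d))
mean_a = rng.normal(0.0, 5.0, d)          # identity A's average face
mean_b = rng.normal(0.0, 5.0, d)          # identity B's average face
faces_a = mean_a + expressions
faces_b = mean_b + expressions[::-1]      # B shows the same expressions, reordered

# "Shared encoder": principal components of the pooled, identity-centred faces.
centred = np.vstack([faces_a - mean_a, faces_b - mean_b])
_, _, vt = np.linalg.svd(centred, full_matrices=False)
encoder = vt[:k].T                        # d x k projection onto expression space

def encode(face, identity_mean):
    """Strip the identity and keep a compact expression code."""
    return (face - identity_mean) @ encoder

def decode(code, identity_mean):
    """Per-identity 'decoder': render the code onto a specific face."""
    return identity_mean + code @ encoder.T

# Face swap: A's expression rendered on B's identity.
code = encode(faces_a[0], mean_a)
swapped = decode(code, mean_b)

# The swap keeps A's expression but sits in B's identity region.
print(np.allclose(swapped - mean_b, faces_a[0] - mean_a, atol=1e-6))  # True
```

Because the code captures only the identity-independent deviation (the "expression"), decoding it with the other identity's parameters moves the expression onto the new face — in the nonlinear deep-learning version, the decoders additionally learn each identity's texture and lighting, which is what makes the results photorealistic.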
Directing the expressions of Trump
A different situation altogether is that of the algorithms now emerging from computer science laboratories. In this case, the results can deceive even the sharpest of eyes. In 2016, a team led by Matthias Niessner of the Technical University of Munich (Germany) published the results of its Face2Face tool, which captures the live facial expressions of a model and transplants them in real time onto another person's face in a recorded video. The researchers used their own faces to direct the expressions of George W. Bush, Vladimir Putin and Donald Trump as if they were digital puppets, with an impressive level of realism. "Our resulting synthesized model is so close to the input that it is hard to distinguish between the synthesized and the real face," Niessner tells OpenMind.
Face2Face captures facial expressions to be transplanted to another person’s face. Credit: Matthias Niessner
Equally spectacular are the results achieved in 2017 by a team from the Paul G. Allen School of Computer Science & Engineering at the University of Washington. The neural network they developed analyses hours of video of a person to learn their vocalization gestures. Then, starting from an audio clip alone, the system generates a video in which the person appears to speak, the movement of their lips synchronized with the words. In the sample video it is practically impossible to guess that the image of Barack Obama delivering a speech is a digital creation.
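The mechanism can be caricatured as a learned mapping from audio features to mouth shapes. The University of Washington system used a recurrent neural network for this step; the toy sketch below replaces it with simple nearest-neighbour retrieval over synthetic data (all names, dimensions and data are invented) just to show the shape of the pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training corpus: audio feature vectors (MFCC-like) paired with the
# mouth-shape parameters observed in the video at the same instant.
n_frames, audio_dim, mouth_dim = 500, 13, 6
train_audio = rng.normal(0.0, 1.0, (n_frames, audio_dim))
proj = rng.normal(0.0, 1.0, (audio_dim, mouth_dim))
train_mouth = train_audio @ proj          # pretend mouth shape follows the audio

def lips_from_audio(audio_clip, k=5):
    """For each new audio frame, retrieve the k nearest training frames and
    average their mouth shapes -- a toy stand-in for the learned mapping."""
    out = []
    for frame in audio_clip:
        dists = np.linalg.norm(train_audio - frame, axis=1)
        nearest = np.argsort(dists)[:k]
        out.append(train_mouth[nearest].mean(axis=0))
    return np.array(out)

new_audio = rng.normal(0.0, 1.0, (10, audio_dim))
mouth_track = lips_from_audio(new_audio)
print(mouth_track.shape)   # (10, 6): one mouth shape per audio frame
```

The real system then renders photorealistic mouth texture from these parameters and composites it into target footage — the step that makes the Obama video so convincing, and the step this sketch deliberately omits.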
According to study co-author Steve Seitz, the system could be used to generate realistic videos that avoid the large bandwidth consumption in current videoconferences. Co-author Ira Kemelmacher-Shlizerman adds another possible more recreational use: “being able to hold a conversation with a historical figure in virtual reality by creating visuals just from audio.” However, Seitz clarifies that his purpose has not been to distort reality: “We very consciously decided against going down the path of putting other people’s words into someone’s mouth.”
Recreating voices digitally
But if someone wants to create malicious fake audio or video, the truth is that they can find the necessary tools. In 2016, the firm Adobe introduced VoCo, a sound-editing platform (not yet released to the market) which, after listening to just 20 minutes of a person's speech, learns to simulate their voice, so that any phrase can be put into their mouth simply by typing it. Since then, other similar and increasingly powerful tools have appeared, such as that of the Canadian AI startup Lyrebird, which learns from just one minute of audio and allows any Internet user to digitally recreate their own voice. Recently, researchers from the Chinese search engine Baidu have published the results of a neural network that clones a person's voice from just a few seconds of material.
Lyrebird created a digital copy of Barack Obama’s voice. Credit: Lyrebird
Since Adobe launched the first version of Photoshop in 1990, retouching photos for malicious or fraudulent purposes has become a concern. Nowadays, the new AI-based systems are improving at such a meteoric pace that it is becoming increasingly difficult to distinguish the authentic from the fake in audio and video documents as well.
Faced with this concern, the responses from the inventors of these tools have varied. Lyrebird, for example, maintains that its intentions are honest and that public access to its product will prevent misuse: "We are making the technology available to anyone and we are introducing it incrementally so that society can adapt to it, leverage its positive aspects for good, while preventing potentially negative applications," the company says on its website. For its part, at the presentation of VoCo, a spokesperson for Adobe said that the company was working on a watermark-style system to ensure that fraudulent audio clips remain detectable.
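Adobe has not published how such a safeguard would work, but a classic way to mark audio is spread-spectrum watermarking: add a pseudorandom, inaudibly quiet signature derived from a secret key, then later detect it by correlating the clip with that same signature. A minimal illustrative sketch, with synthetic "audio" and arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(2)

def embed_watermark(audio, key, strength=0.005):
    """Add a low-amplitude pseudorandom +/-1 signature derived from `key`."""
    sig = np.random.default_rng(key).choice([-1.0, 1.0], size=len(audio))
    return audio + strength * sig

def detect_watermark(audio, key, threshold=0.0025):
    """Correlate the clip with the key's signature; only watermarked audio
    correlates noticeably above zero."""
    sig = np.random.default_rng(key).choice([-1.0, 1.0], size=len(audio))
    score = float(np.mean(audio * sig))
    return score > threshold

audio = rng.normal(0.0, 0.1, 48_000)          # one second of noise-like "speech"
marked = embed_watermark(audio, key=1234)

print(detect_watermark(marked, key=1234))     # True: signature present
print(detect_watermark(audio, key=1234))      # False: clean audio
```

Without the key, the signature is statistically indistinguishable from noise, which is why a forger cannot easily strip it; the weakness, as with any watermark, is that it only flags clips the legitimate tool itself produced.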
Detection of fake videos
The truth is that progress in audiovisual falsification technology will simultaneously lead to the development of better techniques to uncover fraud. Seitz and his collaborators note that their system could also be adapted to the detection of fake videos. For his part, Niessner explains that part of his job is to manufacture the antidote at the same time as the poison: “Our efforts include the detection of edits in video footage in order to verify a clip’s authenticity.”
The researcher explains that facial expressions and their transitions are as unique to each person as their handwriting, and that analysing these facial traits allows a suspicious video to be compared with an authentic one of the same person, to detect inconsistencies that reveal manipulation. At the end of the day, as Niessner suggests, computer-based audiovisual recreation has been with us for decades, and the possible perversion of its ends should not obscure the promise of these technologies. "We hope to provide a positive takeaway," he concludes.
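As a toy illustration of this kind of consistency check (not Niessner's actual method), one can summarize each video's per-frame expression parameters by the statistics of their frame-to-frame transitions and compare a suspect clip against authentic footage of the same person. Everything below, including the "videos", is synthetic and the thresold-free comparison is only meant to show the principle:

```python
import numpy as np

rng = np.random.default_rng(3)

def transition_stats(expressions):
    """Mean and spread of frame-to-frame expression changes -- a crude
    'facial handwriting' signature for one video."""
    deltas = np.diff(expressions, axis=0)
    return deltas.mean(axis=0), deltas.std(axis=0)

def inconsistency(video, reference):
    """Distance between two videos' transition signatures."""
    m1, s1 = transition_stats(video)
    m2, s2 = transition_stats(reference)
    return float(np.linalg.norm(m1 - m2) + np.linalg.norm(s1 - s2))

# Synthetic expression tracks (e.g., blend-shape coefficients per frame),
# modelled as random walks whose step size plays the role of personal style.
def person(scale, n=2000):
    return np.cumsum(rng.normal(0.0, scale, (n, 5)), axis=0)

authentic = person(0.10)       # reference footage of the real person
same_person = person(0.10)     # another genuine clip: similar dynamics
manipulated = person(0.25)     # a forgery with different motion statistics

print(inconsistency(same_person, authentic) <
      inconsistency(manipulated, authentic))   # True: the forgery stands out
```

Real forensic systems use far richer features (3D face models, blink rates, head-pose dynamics), but the underlying logic is the same: a forger can copy a face far more easily than the statistics of how that face moves.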