Deepfake Videos Are Getting Scary Good

By: John Donovan
If technology continues on its current trajectory, it will become impossible to detect AI-assisted deepfake videos. Plume Creative/Getty Images

At this stage of its development, something undeniably creepy still runs through deepfakes, a catch-all but misleading label for fake videos created and manipulated with the aid of artificial intelligence (AI) and deep machine learning.

It's not just the weird, little-bit-off, not-quite-right videos produced by these increasingly sophisticated software programs. Although, yeah, they can be unsettling. And it's not just the ethical dilemma in altering original photos and videos, either. Though that's definitely poking a hornet's nest.


Mostly, it's the whole idea that we are rapidly closing in on a point where we simply may not be able to trust our own eyes. Is that photo a true depiction of its subject? Is that video? Does that face go with that body? Do those words go with that face?

Can that guy really dance like that?


The Computer Science Behind Deepfakes

Way back in late 2017, a Reddit user known as Deepfakes, according to Know Your Meme, unveiled some face-swapping pornographic videos — it's exactly as sad and lame as it sounds; someone's face, often a public figure, superimposed onto someone else's head — and the deepfakes frenzy began.

Shortly after, Deepfakes launched an app, FakeApp, and people jumped all over it. All sorts of memes from that and other programs — some funny, some just plain creepy, some worse — have been produced since. They include Nicolas Cage's face on Amy Adams's body (playing Lois Lane in a Superman movie) and a great BuzzFeed production featuring comedian Jordan Peele, as former President Barack Obama, who warns of some of the more possibly sinister uses of the tech in a slightly NSFW video (which ends with the Fauxbama saying, "Stay woke, bi$%*es!").


The latest deepfake video came courtesy of a TikToker impersonating Tom Cruise. Three videos are shockingly real and show Cruise, among other things, hitting a golf ball. The videos were created by Chris Ume, a visual effects specialist from Belgium.

The computer science used to create the programs behind these videos can be extremely complex, much more intense than what is used for simple deepfakes. Intricate algorithms and computer science terms like generative adversarial networks (GANs) and deep neural networks pepper the academic papers on the more advanced video-editing techniques.

Generally, what these programs do is examine the video of a subject frame by frame and "learn" the subject's size and shape and movements so that they can be transferred to another subject on video. Whereas deepfakes have been limited mainly to swapping out the subjects' faces, the more advanced programs can transfer full 3D head positions, including things like a head tilt or a raised eyebrow or a set of pursed lips. Some work has been done on entire body movements.

The more these programs detect, and the more variables these networks are fed and "learn," the more efficient, effective and realistic the videos become.


Beyond Deepfakes

It's important to note that not all video and photo editing techniques based in artificial intelligence and machine learning are deepfakes. Those in academia who work in the field see deepfakes as amateurish, relegated to mere face-swapping.

A group at the University of California, Berkeley, is working on a technique that takes an entire body in motion (a professional dancer's, for example) and swaps it onto an amateur's body on video. With a little AI wizardry, then, even someone with two left feet can at least appear to move like Baryshnikov. The Berkeley group details its work in the paper Everybody Dance Now.


The technique is not perfect, of course. But this is tricky stuff. Even pulling off a computer-generated moving face is difficult. As of now, most AI-generated faces, even on deepfakes — especially on deepfakes — are obvious forgeries. Something, almost invariably, seems a little off.

"I think that one thing is the shadow details of the faces," says Tinghui Zhou, a grad student in computer science at Berkeley and one of the authors of Everybody Dance Now. "We [humans] are very good at identifying whether a face is real or not — the shadow details, how the wrinkles move, how the eyes move — all those kind of details need to be exactly right. I think the machine-learning system these days is still not able to capture all those details."

Another new AI video-manipulation system — or, as its architects call it, a "photo-realistic re-animation of portrait videos" — actually uses one "source" actor that can alter the face on a "target" actor.

You, the "source" (for example), move your mouth a certain way, computers map the movement, feed it into the learning program and the program translates it to a video in which Obama mouths your words. You laugh, or raise your eyebrow, and Obama does, too.
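To make the source-to-target idea concrete, here is a toy sketch in Python. It is not the Deep Video Portraits method, which relies on far richer models. It only assumes that each face has been reduced to a short list of 2D landmark points (a hypothetical simplification), measures how far the source's landmarks have moved from a neutral pose, and applies the same displacement to the target's neutral landmarks. The function name transfer_motion and the sample coordinates are invented for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def transfer_motion(
    source_neutral: List[Point],
    source_frame: List[Point],
    target_neutral: List[Point],
) -> List[Point]:
    """Move each target landmark by the displacement of the matching source landmark.

    Toy illustration only: assumes the three lists hold the same landmarks
    in the same order (a hypothetical setup, not a real tracker's output).
    """
    if not (len(source_neutral) == len(source_frame) == len(target_neutral)):
        raise ValueError("all landmark lists must have the same length")

    moved = []
    for (sx0, sy0), (sx1, sy1), (tx0, ty0) in zip(
        source_neutral, source_frame, target_neutral
    ):
        dx, dy = sx1 - sx0, sy1 - sy0          # how far the source point moved
        moved.append((tx0 + dx, ty0 + dy))     # apply the same move to the target
    return moved


# Two made-up "mouth corner" points; the source raises both by 2 units.
source_neutral = [(0, 0), (10, 0)]
source_frame = [(0, -2), (10, -2)]
target_neutral = [(100, 50), (112, 50)]

print(transfer_motion(source_neutral, source_frame, target_neutral))
# prints [(100, 48), (112, 48)]
```

In this example the target's mouth corners rise by the same two units the source's did. A real system would also have to turn the moved geometry into convincing pixels, which is the much harder part.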

A paper on that process, known as Deep Video Portraits, was presented at a computer graphics and interactive techniques conference in Vancouver in mid-August 2018, and reveals a place for the program: Hollywood.

"[C]omputer-generated videos have been an integral part of feature-film movies for over 30 years. Virtually every high-end movie production contains a significant percentage of computer-generated imagery, or CGI, from Lord of the Rings to Benjamin Button," the authors write. "These results are hard to distinguish from reality and it often goes unnoticed that this content is not real ... but the process was time-consuming and required domain experts. The production of even a short synthetic video clip costs millions in budget and multiple months of work, even for professionally trained artists, since they have to manually create and animate vast amounts of 3D content."

Thanks to AI, we can now produce the same imagery in a lot less time. And cheaper. And — if not now, soon — just as convincingly.


Walking an Ethical Tightrope

The process of manipulating existing video, or creating a new video with false images, as comedian Peele and others warn, can be downright dangerous in the wrong hands. Some prominent actresses and entertainers had their faces stolen and woven into porn videos in the most disturbing early examples of deepfakes. Using images to, as Peele warned with his Obama video, produce "fake news" is a very real possibility.

Many outlets already have taken steps to stop deepfakes. Reddit, in fact, shut down the deepfakes subreddit. Pornhub vows to ban AI-generated porn. Tumblr and Twitter are among other sites to ban pornographic deepfakes.


But these videos might not be particularly easy to police, especially as the programs to create them improve. Michael Zollhöfer, a computer science professor at Stanford and one of the minds behind Deep Video Portraits, says those in the academic community are aware of the ethics involved. From Zollhöfer, in a press release announcing his project:

"The media industry has been touching up photos with photo-editing software for many years, meaning most of us have learned to take what we see in photos with a pinch of salt. With ever improving video editing technology, we must also start being more critical about the video content we consume every day, especially if there is no proof of origin."

Everyone involved in building this technology, Zhou says, needs to take proper steps to ensure it's not misused. Developing software to detect computer-enhanced or altered videos, and marking the videos with invisible "watermarks" to show, under forensic evaluation, that they're computer-generated, will help. Again, from Deep Video Portraits:

"It is important to note that the detailed research and understanding of the algorithms and principles behind state-of-the-art video editing tools, as we conduct it, is also the key to develop technologies which enable the detection of their use ... The methods to detect video manipulations and the methods to perform video editing rest on very similar principles."
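The invisible "watermark" idea mentioned above can be illustrated with a deliberately simple sketch. It assumes a frame is just a list of 8-bit grayscale pixel values and hides a short bit pattern in the least significant bit of each pixel, which changes each value by at most 1. This is a textbook toy, not the scheme any of the researchers quoted here propose, and the function names are made up for the example. A mark like this would not survive lossy video compression, so a practical forensic watermark would need to be far more robust.

```python
from typing import List


def embed_watermark(pixels: List[int], bits: List[int]) -> List[int]:
    """Hide one bit in the least significant bit of each leading pixel."""
    if len(bits) > len(pixels):
        raise ValueError("not enough pixels to hold the watermark")
    marked = list(pixels)                       # leave the input untouched
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit      # clear the lowest bit, then set it
    return marked


def extract_watermark(pixels: List[int], n_bits: int) -> List[int]:
    """Read the hidden bits back out of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


frame = [200, 201, 54, 77, 128, 255]    # made-up grayscale values (0-255)
mark = [1, 0, 1, 1]                     # made-up "computer-generated" tag

marked_frame = embed_watermark(frame, mark)
print(marked_frame)                         # prints [201, 200, 55, 77, 128, 255]
print(extract_watermark(marked_frame, 4))   # prints [1, 0, 1, 1]
```

No pixel changes by more than one brightness level, which is why the mark is invisible to a viewer but recoverable by software that knows where to look.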

Says Zhou: "I think we, as researchers, we definitely have a responsibility to sort of raise public awareness in terms of the abuse of these technologies. But I want to emphasize: There are many positive uses of this research. We've had requests from dancers to use our research to help their dancing. There are positive aspects of this technology."


What Lies Ahead

The field continues to improve as programs become more sophisticated and machines better learn how to overcome the obvious and less-obvious faults in these computer-generated videos and photos. Where it can go is anybody's guess. But many worry that improvements in the technology could come so far, so fast, that we could be entering an era in which we can no longer trust what we see and hear.

And that brings us to another type of fake video that could also cause major trouble, especially for the upcoming 2020 presidential election: dumbfakes. In May 2019, for example, a distorted video of Speaker Nancy Pelosi spread like wildfire on social media. The video appeared to show Pelosi slur and stumble through a speech. In reality, the video was digitally altered by a sports blogger and "Trump superfan" from New York, who then uploaded it to Facebook. The video was quickly debunked, but by then it had already been viewed millions of times. YouTube removed it, saying the video violated its standards. However, Facebook kept it on the site, saying only that the video was "false" and that it would try to limit how much it could be shared.


While this altered video of Pelosi isn't as technical as a deepfake, Hany Farid, a digital forensics expert at the University of California, Berkeley, told NPR that's what makes it even more concerning. These are labeled dumbfakes because they're easier and cheaper to produce than deepfakes. Typically, it takes only a change in the video's speed or some basic editing to produce a persuasive new video.
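To show just how little machinery a speed change needs, here is a minimal Python sketch. It treats a video as a plain list of frames and resamples it: a factor below 1 repeats frames (slower playback at the same frame rate), and a factor above 1 drops them. The function name change_speed and the letter "frames" are stand-ins invented for this example, and audio, which real editing software would also stretch, is ignored entirely.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def change_speed(frames: Sequence[Frame], factor: float) -> List[Frame]:
    """Resample a frame sequence; factor 0.5 is half speed, 2.0 is double speed."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(frames) / factor)           # the new clip length in frames
    last = len(frames) - 1
    # For each output frame, pick the nearest earlier input frame.
    return [frames[min(int(i * factor), last)] for i in range(n_out)]


clip = ["A", "B", "C", "D"]         # stand-ins for four video frames

print(change_speed(clip, 0.5))      # prints ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D']
print(change_speed(clip, 2.0))      # prints ['A', 'C']
```

No machine learning is involved at any step, which is the point Farid makes about why such edits are cheap to produce.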

"The clock is ticking," Farid told NPR. "The Nancy Pelosi video was a canary in a coal mine."


Originally Published: Sep 5, 2018