DeepFakes Detection for Audio and Video

Artificial intelligence (AI) offers many opportunities, such as improved health care, more efficient energy consumption, or more durable products. However, AI also brings new risks. "Deepfakes" has become an important buzzword. It is reminiscent of "fake news": deliberately false reports spread on social networks to distort public opinion. "Deepfakes," by contrast, are deceptively realistic video and audio manipulations that can only be created with AI. The risks and challenges that "deepfakes" entail are significant - not only for the media landscape, but also for companies and individuals. At the same time, AI also provides the tools to reliably expose "deepfakes."

AI-based systems can learn to imitate a voice or body language.

In the past, producing high-quality manipulations of video or audio material was almost impossible. Because the content is dynamic, the challenge is to falsify at least 16,000 data points per second consistently. AI methods now master this with ease. The necessary software is freely available on the net as open source, and convincing manipulations can be created automatically.
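The figure of 16,000 data points per second corresponds to a typical speech sample rate of 16 kHz: every second of forged audio consists of 16,000 individual values that must all fit together. A quick back-of-the-envelope calculation (the sample rate is an assumption for illustration):

```python
# Number of audio samples an attacker must synthesize consistently,
# assuming a typical speech sample rate of 16 kHz.
SAMPLE_RATE_HZ = 16_000  # samples per second (common for speech models)

def samples_to_forge(duration_seconds: float) -> int:
    """Total data points that must be generated for a clip of this length."""
    return int(SAMPLE_RATE_HZ * duration_seconds)

print(samples_to_forge(1))   # one second of speech: 16000
print(samples_to_forge(60))  # a one-minute fake message: 960000
```

Forging nearly a million mutually consistent values per minute by hand is hopeless, which is why convincing audio manipulation only became practical with generative AI.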

How exactly does this work? As with comparable machine learning models, deepfake systems are trained on data from the net. Architectures such as Tacotron and Wav2Lip (sources: Wang, Yuxuan, et al.; Shen, Jonathan, et al.; Prajwal, K. R., et al.) make it possible to build neural networks that combine any sentence with a target person's matching facial expressions and typical speech melody. It is precisely these neural networks that the "deep" in "deepfakes" alludes to. About 30 minutes of suitable audio and video material are enough.

Neural networks can also be trained to recognize manipulated media content as counterfeits.

"Deepfakes" go hand in hand with new risks

The risks that "deepfakes" entail are considerable. In theory, each of us runs the risk that transfers or contracts are concluded online in our name by means of faked voice or video recordings - provided that sufficient audio and video material is available. Companies can also be harmed if, for example, employees are tricked into fraudulent actions with faked audio messages. This is what happened to a UK-based energy company whose CEO was apparently asked by the CEO of the German parent company - in reality by a machine-cloned voice - to transfer a six-figure sum (source: Forbes).

For the media landscape, the possibility of manipulating statements by politicians or influential decision-makers is a particular challenge, because public figures usually have extensive audio and video material available - and thus sufficient AI training material for producing "deepfakes". In this way, virtually any sentence can be put into the mouths of high-ranking politicians worldwide, seeming deceptively real in both image and sound (example with Angela Merkel).

Beating "deepfakes" with their own weapons

AI makes "deepfakes" possible, but it can also help significantly to reliably expose audio and video manipulations. This is exactly where the Fraunhofer Institute for Applied and Integrated Security AISEC comes in. The IT security experts of the Cognitive Security Technologies research department design systems that reliably and automatically detect "deepfakes" as forgeries. They are also researching methods to strengthen the robustness of systems that evaluate video and audio material.

In order to fully understand the technology behind the fraud, identify possible weak points, and design protective measures, the scientists at Fraunhofer AISEC first switch to the counterfeiters' side in simulations. There they generate convincingly faked audio and video data, on the basis of which they then develop algorithms for detecting forgeries. The use of AI is crucial here: just as neural networks can learn to create media content, they can also be trained to detect counterfeit material. For this purpose, they are presented with a series of real and manipulated recordings, from which the networks learn to recognize the smallest discrepancies - ones that a person cannot perceive. Such AI algorithms are then able to automatically decide whether an audio or video file is genuine or fake.
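The training loop described above can be sketched in miniature. This is not Fraunhofer's detector; it is a toy stand-in in which each recording is reduced to two hypothetical artifact features (say, spectral smoothness and phase irregularity), fakes are assumed to differ by a small consistent offset that a human would not perceive, and a simple logistic-regression classifier learns the boundary from labeled examples:

```python
import math
import random

random.seed(0)

def make_recording(is_fake: bool):
    """Simulated feature extraction: fakes carry a subtle, consistent
    artifact offset in both (hypothetical) feature dimensions."""
    offset = 2.0 if is_fake else 0.0
    features = [random.gauss(offset, 1.0), random.gauss(offset, 1.0)]
    return features, 1 if is_fake else 0

# A labeled corpus of real and manipulated "recordings".
data = [make_recording(i % 2 == 0) for i in range(2000)]

# Minimal logistic-regression "detector" trained by stochastic gradient descent.
w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(200):
    for x, y in data:
        p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        g = p - y  # gradient of the log loss w.r.t. the logit
        w[0] -= lr * g * x[0]
        w[1] -= lr * g * x[1]
        b -= lr * g

def is_fake(features) -> bool:
    """Automatic genuine-vs-fake decision, as described in the text."""
    return w[0] * features[0] + w[1] * features[1] + b > 0

acc = sum(is_fake(x) == bool(y) for x, y in data) / len(data)
print(f"training accuracy: {acc:.2f}")
```

A real detector would of course use a deep network over raw audio or video frames rather than two hand-made features, but the principle is the same: show the model real and manipulated examples until it separates them.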

With practice, users can recognize "deepfakes" more reliably, but they do not reach the hit rates of AI security systems.

In addition, the cyber security experts at Fraunhofer AISEC subject AI systems - used, for example, in facial recognition or speech processing - to rigorous security checks. Using penetration tests, they analyze weak points and develop "hardened" security solutions that withstand deception attempts with "deepfakes." With methods such as "robust learning" and "adversarial learning", Fraunhofer AISEC gives the AI algorithms thicker armor, so to speak, making them more resilient, for example through a more complex program design.

Use case from the insurance industry: "deepfakes" deceive a voice ID system

Banks, insurance companies, and mobile operators increasingly offer customers the option of identifying themselves by voice during a call. The voice thus takes on the role of a password. Authentication by voice recognition may be more convenient than conventional methods such as a PIN or password. However, to serve as a trustworthy and reliable alternative, the voice ID system must be robust and secure.
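A typical voice ID system works roughly as follows: each utterance is mapped to a fixed-length "voiceprint" embedding, and a caller is accepted if the similarity between the live utterance and the enrolled reference exceeds a threshold. The sketch below simulates the embeddings with random vectors; the embedding model, vector size, and threshold value are all illustrative assumptions, not the insurer's actual system:

```python
import math
import random

random.seed(2)

def cosine(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def fake_embedding(speaker_seed: int, noise: float = 0.1):
    """Simulated voiceprint: a speaker-specific base vector plus
    per-session noise (stands in for a real speaker-encoder network)."""
    rng = random.Random(speaker_seed)
    base = [rng.gauss(0, 1) for _ in range(64)]
    return [v + random.gauss(0, noise) for v in base]

THRESHOLD = 0.7  # assumed acceptance threshold

enrolled = fake_embedding(speaker_seed=42)     # stored at enrollment
same_caller = fake_embedding(speaker_seed=42)  # same speaker, new call
impostor = fake_embedding(speaker_seed=7)      # different speaker

print(cosine(enrolled, same_caller) > THRESHOLD)  # True: caller accepted
print(cosine(enrolled, impostor) > THRESHOLD)     # False: impostor rejected
```

The weak point exploited in the penetration test described below is exactly this comparison: a sufficiently good voice clone produces an embedding close enough to the enrolled reference to clear the threshold.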

In their latest use case, the scientists at Fraunhofer showed that there is still some catching up to do in terms of security: in a penetration test, they successfully bypassed the voice recognition system (a so-called voice ID system) of a large German insurance company. Using roughly ten minutes of a publicly available speech by the target person as training material, a high-quality audio "deepfake" was produced at Fraunhofer that deceived the security system and granted access to the target person's personal account.

Building competence and awareness for recognizing and defending against "deepfakes"

"Deepfakes" will become ever easier to produce in the future. It is therefore important to build up competencies and tools, and to take measures that make the use of AI in data generation traceable and allow forgeries to be recognized as such.

The experts at Fraunhofer already have methods that detect manipulations far more reliably than humans can. At the same time, it is advisable to label "deepfakes" as such and to regulate their use. An appropriate legal framework that provides for punitive measures against the covert use of AI could help here. The traceability of unaltered information should also be strengthened: in addition to detecting "deepfakes", unaltered original recordings can be verifiably marked as such.
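Verifiably marking an unaltered original is a standard cryptographic task. A minimal sketch, assuming a shared secret key (the key, the recording bytes, and the function names are illustrative; production systems would typically use asymmetric digital signatures so that anyone can verify without holding a secret):

```python
import hashlib
import hmac

# Assumed secret held by the publisher of the original recording.
SECRET_KEY = b"publisher-signing-key"

def mark_original(audio_bytes: bytes) -> str:
    """Compute an authentication tag over the raw audio bytes."""
    return hmac.new(SECRET_KEY, audio_bytes, hashlib.sha256).hexdigest()

def verify_original(audio_bytes: bytes, tag: str) -> bool:
    """Check that the file is byte-for-byte the marked original."""
    return hmac.compare_digest(mark_original(audio_bytes), tag)

recording = b"\x00\x01\x02\x03"  # stand-in for raw audio samples
tag = mark_original(recording)

print(verify_original(recording, tag))            # True: untouched original
print(verify_original(recording + b"\xff", tag))  # False: file was altered
```

Any manipulation of the audio, however small, changes the tag and is immediately detectable - which complements deepfake detection by letting genuine material prove its own integrity.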

In addition, security systems can be tested for their susceptibility to deception and protected accordingly. Using the Fraunhofer researchers' own "deepfakes", voice ID and other security systems are comprehensively examined in penetration tests in order to uncover vulnerabilities before an attacker does.

Last but not least, given the importance of AI-supported systems, it is necessary to raise risk awareness. Users must learn to question and examine media material - be it a politician's speech or a superior's surprising telephone request to transfer money to an unknown account. Evaluations by the security researchers suggest that with growing awareness, users find it easier to recognize "deepfakes" as such. The "deepfakes" produced at Fraunhofer AISEC are therefore also used for training and educational purposes.


Angela Merkel Poems

Angela Merkel supposedly recites a poem: the demonstrator illustrates the maturity of German-language "deepfakes".


Can you spot the audio deepfake?

Human versus machine: who recognizes manipulated audio more reliably?


Clone of Angela Merkel

A deepfake AI synthesizes the voice of former Chancellor Angela Merkel.