Methods for manipulating media identities have existed for many years now. It is common knowledge that images can be manipulated by a variety of methods. For a long time, however, it was very time-consuming to produce high-quality manipulations of dynamic media such as videos or audio recordings. Methods from the field of artificial intelligence (AI) have now made this much easier, and high-quality fakes can be created with comparatively little effort and expertise. Because they use deep neural networks, such methods are often referred to as 'deep fakes'.
Methods for manipulating media identities can be divided into three forms of media: video/images, audio and text. The sections below explain the attack methods that exist according to the current state of the art, which data is required for a successful attack and the effort that is necessary to create forgeries using deep-fake methods.
To manipulate faces in videos, several AI-based processes have been developed in recent years. These either attempt to exchange faces in a video ('face swapping'), control the facial expressions/head movements of a person in a video ('face reenactment'), or synthesise new (pseudo) identities.
Source: brgfx / Freepik, compilation: BSI
In the face swap process (shown in the figure above), the aim is to input the face of one person and create a facial image of another person with the same expression, illumination and gaze direction. The model for this uses an autoencoder; such models are available in common public software libraries. The resulting neural networks learn to extract the relevant facial expression and illumination information from a facial image in coded form and to generate a corresponding facial image from the coded information. Meanwhile, commercially available graphics cards can be used to train high-resolution models that can handle close-ups of faces in full HD videos. Some of these models also support face swapping in real time (or with only a slight delay). Only a few minutes of video of a target person are required as training material. However, the video must be of high quality and contain as many different facial expressions and perspectives as possible so that the model can learn to manipulate them.
Face reenactment involves manipulating a person's head movement, facial expressions or lip movement. This makes it possible to create visually deceptive videos in which a person makes statements that they never made in reality. Popular techniques achieve this by creating a 3D model of the target's face from a video stream. The manipulator can then control this model with their own video stream and create deceptively real facial expressions on the target person.
Synthesising facial images can also create new people who do not exist in reality. Current methods are still limited to single images, but these can already produce close-ups with a high image resolution and depth of detail.
In creating manipulated voices, text-to-speech (TTS) and voice conversion (VC) procedures are particularly significant.
Source: brgfx / Macrovector / Freepik, compilation: BSI
The basic functioning of the text-to-speech procedure is outlined in the figure above. Here, a user can enter a text for the TTS system to process and convert into an audio signal.
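The text above only says that the face-swap model uses an autoencoder. A common layout for this, assumed here and not spelled out in the source, is one encoder shared by both people plus one decoder per person: each decoder is trained to rebuild its own person's faces from the shared code, and a swap encodes a face of person A and decodes it with person B's decoder. The minimal Python sketch below shows only that data flow. The layer sizes, the `Dense` and `FaceSwapAutoencoder` classes, the input data and the random, untrained weights are all hypothetical, and training is omitted.

```python
import math
import random

# Hypothetical sizes: a "face" is a flattened vector of FACE_DIM pixel values,
# compressed into a LATENT_DIM code meant to hold expression, illumination
# and gaze rather than identity.
FACE_DIM = 16
LATENT_DIM = 4


class Dense:
    """Hypothetical fully connected layer y = act(W x + b), random init."""

    def __init__(self, n_in, n_out, rng, activation=math.tanh):
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.bias = [0.0] * n_out
        self.activation = activation

    def __call__(self, x):
        return [
            self.activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.weights, self.bias)
        ]


class FaceSwapAutoencoder:
    """Assumed layout: one shared encoder, one decoder per identity (A, B)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.encoder = Dense(FACE_DIM, LATENT_DIM, rng)
        self.decoders = {
            "A": Dense(LATENT_DIM, FACE_DIM, rng),
            "B": Dense(LATENT_DIM, FACE_DIM, rng),
        }

    def reconstruct(self, face, identity):
        # Training target (not implemented here): each decoder should
        # reproduce faces of its own identity from the shared code.
        return self.decoders[identity](self.encoder(face))

    def swap(self, face, target_identity):
        # Face swap: encode a face of one person, then decode the code with
        # the other person's decoder.
        code = self.encoder(face)
        return self.decoders[target_identity](code)


model = FaceSwapAutoencoder(seed=42)
# Stand-in for a normalised, flattened face crop of person A (hypothetical data).
sample_rng = random.Random(1)
face_of_a = [sample_rng.uniform(-1.0, 1.0) for _ in range(FACE_DIM)]
swapped = model.swap(face_of_a, target_identity="B")
print(len(model.encoder(face_of_a)), len(swapped))  # prints "4 16"
```

The intuition behind sharing the encoder is that it must serve both decoders, so it tends to capture what the two people's images have in common (expression, lighting, gaze), while identity-specific appearance is left to the decoders. With the untrained weights in this sketch, the swapped output is only a vector of the right shape, not a plausible face.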