    Facial re-enactment, speech synthesis and the rise of the Deepfake

    Emergent technologies in the fields of audio speech synthesis and video facial manipulation have the potential to drastically impact our societal patterns of multimedia consumption. At a time when social media and internet culture are plagued by misinformation, propaganda and “fake news”, their latent misuse represents a looming threat to fragile systems of information sharing and social democratic discourse. It has thus become increasingly recognised in both academic and mainstream journalism that the ramifications of these tools must be examined to determine what they are and how their widespread availability can be managed. This research project examines four emerging software programs – Face2Face, FakeApp, Adobe VoCo and Lyrebird – that are designed to facilitate the synthesis of speech and the manipulation of facial features in videos. I will explore their positive industry applications and the potentially negative consequences of their release into the public domain. Consideration will be directed to how such consequences and risks can be ameliorated through detection, regulation and education. A final analysis of these three competing threads will then attempt to address whether the practical and commercial applications of these technologies are outweighed by the inherently unethical or illegal uses they engender, and, if so, what we can do in response.

    Anti-Forensics: The Tampering of Media

    In the context of forensic investigations, the traditional understanding of evidence is changing: nowadays most prosecutors, lawyers and judges rely heavily on multimedia evidence. This modern shift has allowed law enforcement to better reconstruct crime scenes and reveal the truth of critical events. In this paper we shed light on the role of video, audio and photos as forensic evidence, presenting the possibility of their tampering by various readily available, easy-to-use anti-forensics software tools. We show that, through forensic analysis, digital processing, enhancement, and authentication via forgery-detection algorithms that testify to the integrity of the content and to its source, differentiating between original and altered evidence is now feasible. These operations help the court attain a higher degree of intelligibility of the multimedia data handled and assert the information retrieved from each item, supporting the success of the investigation process.
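    A first line of defence against tampered evidence is verifying that a file has not changed since it was collected. As a minimal illustrative sketch (not a method from the paper), a cryptographic digest recorded at seizure can later testify to a file's bit-level integrity:

```python
import hashlib

def sha256_of_file(path: str) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_evidence(path: str, recorded_digest: str) -> bool:
    """True only if the file still matches the digest taken at seizure."""
    return sha256_of_file(path) == recorded_digest
```

    Content-based forgery detection goes well beyond this, of course: a matching hash only proves the bits are unchanged, not that the original recording was authentic.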

    A Framework for the Measurement of Simulated Behavior Performance

    Recent developments in video games, simulation, training, and robotics have seen a push for greater visual and behavioral realism. As the education, training, and simulation communities increasingly rely on high-fidelity models to inform strategic and tactical decisions, the accuracy and credibility of simulated behavior grow in importance. Credibility is typically established through verification and validation techniques, and there is increased interest in bringing behavioral realism to the same level as the visual. Thus far, however, the validation process for behavioral models remains unclear. With real-world behavior as a major goal, this research investigates the validation problem and provides a process for quantifying behavioral correctness. We design a representation of behavior based on kinematic features capturable from persistent sensors and develop a domain-independent classification framework for measuring behavior replication. We demonstrate its functionality through correct behavior comparison and the evaluation of sample simulated behaviors.
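    Kinematic features of the kind described can be as simple as per-step speed and heading change computed from sensor tracks. A minimal sketch, assuming tracks arrive as timestamped 2D positions (the paper's exact feature set may differ):

```python
import math

def kinematic_features(track):
    """track: list of (t, x, y) samples from a persistent sensor.
    Returns per-step speeds and signed heading changes (radians)."""
    speeds, turns = [], []
    prev_heading = None
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        dx, dy = x1 - x0, y1 - y0
        speeds.append(math.hypot(dx, dy) / dt)
        heading = math.atan2(dy, dx)
        if prev_heading is not None:
            # wrap the turn angle into (-pi, pi]
            turns.append((heading - prev_heading + math.pi) % (2 * math.pi) - math.pi)
        prev_heading = heading
    return speeds, turns
```

    Feature vectors like these can then be fed to any off-the-shelf classifier to compare a simulated track against real-world reference behavior.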

    Automatic 3D Facial Expression Analysis in Videos

    We introduce a novel framework for automatic 3D facial expression analysis in videos. Preliminary results demonstrate facial expression editing alongside facial expression recognition. We first build a 3D expression database to learn the expression space of a human face. The real-time 3D video data were captured by a camera/projector scanning system. From this database, we extract the geometry deformation independent of pose and illumination changes. All possible facial deformations of an individual form a nonlinear manifold embedded in a high-dimensional space. To combine the manifolds of different subjects, which vary significantly and are usually hard to align, we transfer the facial deformations in all training videos to one standard model. Lipschitz embedding then embeds the normalized deformation of the standard model in a low-dimensional generalized manifold, on which we learn a probabilistic expression model. To edit a facial expression of a new subject in 3D videos, the system searches over the generalized manifold for an optimal replacement with the 'target' expression, which is blended with the deformation in the previous frames to synthesize images of the new expression under the current head pose. Experimental results show that our method works effectively.
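    Lipschitz embedding, as used above, represents each point by its distances to a few fixed reference sets, mapping a high-dimensional deformation into a low-dimensional coordinate vector. A generic sketch (the reference sets and metric here are illustrative, not those chosen in the paper):

```python
import math

def lipschitz_embedding(points, reference_sets, dist):
    """Coordinate j of each embedded point is its distance to the
    nearest element of reference_sets[j]."""
    return [
        [min(dist(p, r) for r in ref) for ref in reference_sets]
        for p in points
    ]

def euclid(a, b):
    """Euclidean distance between two coordinate tuples."""
    return math.dist(a, b)
```

    With k reference sets, every input point lands in a k-dimensional space, and the mapping is 1-Lipschitz, so nearby points in the original space stay nearby after embedding.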

    Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion

    Speech anonymisation prevents misuse of spoken data by removing any personal identifier while preserving at least the linguistic content. However, emotion preservation is crucial for natural human-computer interaction. The well-known voice conversion technique StarGANv2-VC achieves anonymisation but fails to preserve emotion. This work presents an any-to-many semi-supervised StarGANv2-VC variant trained on partially emotion-labelled non-parallel data. We propose emotion-aware losses computed on the emotion embeddings and on acoustic features correlated with emotion. Additionally, we use an emotion classifier to provide direct emotion supervision. Objective and subjective evaluations show that the proposed approach significantly improves emotion preservation over the vanilla StarGANv2-VC. This considerable improvement is seen across diverse datasets, emotions, target speakers, and inter-group conversions without compromising intelligibility and anonymisation.
    Comment: Accepted in Interspeech 202
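    An emotion-aware loss of the kind described can be sketched as a distance between the emotion embeddings of the source utterance and the converted one; the cosine distance below is an illustrative choice, not necessarily the loss used in the paper:

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two nonzero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def emotion_preservation_loss(emb_source, emb_converted):
    """Penalise drift between the emotion embedding of the source
    speech and that of the anonymised output; zero when they align."""
    return cosine_distance(emb_source, emb_converted)
```

    In training, such a term would be added to the usual StarGANv2-VC objectives so the converter is rewarded for keeping the emotional content while changing speaker identity.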