Search CORE

1,517 research outputs found

Datasets, Clues and State-of-the-Arts for Multimedia Forensics: An Extensive Review

Author: Vishwakarma Dinesh Kumar
Yadav Ankit
Publication venue
Publication date: 13/01/2024
Field of study

With the large chunks of social media data being created daily and the parallel rise of realistic multimedia tampering methods, detecting and localising tampering in images and videos has become essential. This survey focusses on approaches for tampering detection in multimedia data using deep learning models. Specifically, it presents a detailed analysis of benchmark datasets for malicious manipulation detection that are publicly available. It also offers a comprehensive list of tampering clues and commonly used deep learning architectures. Next, it discusses the current state-of-the-art tampering detection methods, categorizing them into meaningful types such as deepfake detection methods, splice tampering detection methods, copy-move tampering detection methods, etc. and discussing their strengths and weaknesses. Top results achieved on benchmark datasets, comparison of deep learning approaches against traditional methods and critical insights from the recent tampering detection methods are also discussed. Lastly, the research gaps, future direction and conclusion are discussed to provide an in-depth understanding of the tampering detection research arena

arXiv.org e-Print Archive

Investigating Social Interactions Using Multi-Modal Nonverbal Features

Author: Carissimi Nicolo'
Publication venue: Universit\ue0 degli studi di Genova
Publication date: 25/02/2019
Field of study

Every day, humans are involved in social situations and interplays, with the goal of sharing emotions and thoughts, establishing relationships with or acting on other human beings. These interactions are possible thanks to what is called social intelligence, which is the ability to express and recognize social signals produced during the interactions. These signals aid the information exchange and are expressed through verbal and non-verbal behavioral cues, such as facial expressions, gestures, body pose or prosody. Recently, many works have demonstrated that social signals can be captured and analyzed by automatic systems, giving birth to a relatively new research area called social signal processing, which aims at replicating human social intelligence with machines. In this thesis, we explore the use of behavioral cues and computational methods for modeling and understanding social interactions. Concretely, we focus on several behavioral cues in three specic contexts: rst, we analyze the relationship between gaze and leadership in small group interactions. Second, we expand our analysis to face and head gestures in the context of deception detection in dyadic interactions. Finally, we analyze the whole body for group detection in mingling scenarios

Archivio istituzionale della ricerca - Università di Genova

Bimodal Emotion Classification Using Deep Learning

Author: Singh Ashutosh Kumar
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2020
Field of study

Multimodal Emotion Recognition is an emerging associative field in the area of Human Computer Interaction and Sentiment Analysis. It extracts information from each modality to predict the emotions accurately. In this research, Bimodal Emotion Recognition framework is developed with the decision-level fusion of Audio and Video modality using RAVDES dataset. Designing such frameworks are computationally expensive and require more time to train the network. Thus, a relatively small dataset has been used for the scope of this research. The conducted research is inspired by the use of neural networks for emotion classification from multimodal data. The developed framework further confirmed the fact that merging modality can enhance the accuracy in classifying emotions. Later, decision-level fusion is further explored with changes in the architecture of the Unimodal networks. The research showed that the Bimodal framework formed with the fusion of unimodal networks having wide layer with more nodes outperformed the framework designed with the fusion of narrow unimodal networks having lesser nodes

Arrow@TUDublin

Multi-modal video analysis for early fire detection

Author: Verstockt Steven
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2011
Field of study

In dit proefschrift worden verschillende aspecten van een intelligent videogebaseerd branddetectiesysteem onderzocht. In een eerste luik ligt de nadruk op de multimodale verwerking van visuele, infrarood en time-of-flight videobeelden, die de louter visuele detectie verbetert. Om de verwerkingskost zo minimaal mogelijk te houden, met het oog op real-time detectie, is er voor elk van het type sensoren een set ’low-cost’ brandkarakteristieken geselecteerd die vuur en vlammen uniek beschrijven. Door het samenvoegen van de verschillende typen informatie kunnen het aantal gemiste detecties en valse alarmen worden gereduceerd, wat resulteert in een significante verbetering van videogebaseerde branddetectie. Om de multimodale detectieresultaten te kunnen combineren, dienen de multimodale beelden wel geregistreerd (~gealigneerd) te zijn. Het tweede luik van dit proefschrift focust zich hoofdzakelijk op dit samenvoegen van multimodale data en behandelt een nieuwe silhouet gebaseerde registratiemethode. In het derde en tevens laatste luik van dit proefschrift worden methodes voorgesteld om videogebaseerde brandanalyse, en in een latere fase ook brandmodellering, uit te voeren. Elk van de voorgestelde technieken voor multimodale detectie en multi-view lokalisatie zijn uitvoerig getest in de praktijk. Zo werden onder andere succesvolle testen uitgevoerd voor de vroegtijdige detectie van wagenbranden in ondergrondse parkeergarages

Ghent University Academic Bibliography

Unmasking the imposters: towards improving the generalisation of deep learning methods for face presentation attack detection.

Author: Abdullakutty Faseela Chakkalakkal
Publication venue
Publication date: 31/10/2023
Field of study

Identity theft has had a detrimental impact on the reliability of face recognition, which has been extensively employed in security applications. The most prevalent are presentation attacks. By using a photo, video, or mask of an authorized user, attackers can bypass face recognition systems. Fake presentation attacks are detected by the camera sensors of face recognition systems using face presentation attack detection. Presentation attacks can be detected using convolutional neural networks, commonly used in computer vision applications. An in-depth analysis of current deep learning methods is used in this research to examine various aspects of detecting face presentation attacks. A number of new techniques are implemented and evaluated in this study, including pre-trained models, manual feature extraction, and data aggregation. The thesis explores the effectiveness of various machine learning and deep learning models in improving detection performance by using publicly available datasets with different dataset partitions than those specified in the official dataset protocol. Furthermore, the research investigates how deep models and data aggregation can be used to detect face presentation attacks, as well as a novel approach that combines manual features with deep features in order to improve detection accuracy. Moreover, task-specific features are also extracted using pre-trained deep models to enhance the performance of detection and generalisation further. This problem is motivated by the need to achieve generalization against new and rapidly evolving attack variants. It is possible to extract identifiable features from presentation attack variants in order to detect them. However, new methods are needed to deal with emerging attacks and improve the generalization capability. This thesis examines the necessary measures to detect face presentation attacks in a more robust and generalised manner

Open Access Institutional Repository at Robert Gordon University