Datasets, Clues and State-of-the-Arts for Multimedia Forensics: An Extensive Review
With the large chunks of social media data being created daily and the
parallel rise of realistic multimedia tampering methods, detecting and
localising tampering in images and videos has become essential. This survey
focusses on approaches for tampering detection in multimedia data using deep
learning models. Specifically, it presents a detailed analysis of benchmark
datasets for malicious manipulation detection that are publicly available. It
also offers a comprehensive list of tampering clues and commonly used deep
learning architectures. Next, it discusses the current state-of-the-art
tampering detection methods, categorizing them into meaningful types such as
deepfake detection methods, splice tampering detection methods, copy-move
tampering detection methods, etc. and discussing their strengths and
weaknesses. Top results achieved on benchmark datasets, comparison of deep
learning approaches against traditional methods and critical insights from the
recent tampering detection methods are also discussed. Lastly, the research
gaps, future direction and conclusion are discussed to provide an in-depth
understanding of the tampering detection research arena.
Investigating Social Interactions Using Multi-Modal Nonverbal Features
Every day, humans are involved in social situations and interplays, with the goal of
sharing emotions and thoughts, establishing relationships with or acting on other
human beings. These interactions are possible thanks to what is called social intelligence,
which is the ability to express and recognize social signals produced during
the interactions. These signals aid the information exchange and are expressed
through verbal and non-verbal behavioral cues, such as facial expressions, gestures,
body pose or prosody. Recently, many works have demonstrated that social signals
can be captured and analyzed by automatic systems, giving birth to a relatively
new research area called social signal processing, which aims at replicating human
social intelligence with machines. In this thesis, we explore the use of behavioral
cues and computational methods for modeling and understanding social interactions.
Concretely, we focus on several behavioral cues in three specific contexts:
first, we analyze the relationship between gaze and leadership in small group interactions.
Second, we expand our analysis to face and head gestures in the context of
deception detection in dyadic interactions. Finally, we analyze the whole body for
group detection in mingling scenarios.
Bimodal Emotion Classification Using Deep Learning
Multimodal Emotion Recognition is an emerging field associated with Human-Computer Interaction and Sentiment Analysis. It extracts information from each modality to predict emotions accurately. In this research, a Bimodal Emotion Recognition framework is developed with decision-level fusion of the audio and video modalities using the RAVDESS dataset. Designing such frameworks is computationally expensive, and training the networks is time-consuming. Thus, a relatively small dataset has been used for the scope of this research. The research is inspired by the use of neural networks for emotion classification from multimodal data. The developed framework further confirmed that merging modalities can enhance the accuracy of emotion classification. Later, decision-level fusion is further explored with changes in the architecture of the unimodal networks. The research showed that the bimodal framework formed by fusing wide unimodal networks, whose layers have more nodes, outperformed the framework formed by fusing narrow unimodal networks with fewer nodes.
Multi-modal video analysis for early fire detection
This dissertation investigates several aspects of an intelligent video-based fire detection system. The first part focuses on the multimodal processing of visual, infrared and time-of-flight video images, which improves on purely visual detection. To keep the processing cost as low as possible, with a view to real-time detection, a set of 'low-cost' fire characteristics that uniquely describe fire and flames has been selected for each type of sensor. By merging the different types of information, the number of missed detections and false alarms can be reduced, resulting in a significant improvement in video-based fire detection. To combine the multimodal detection results, however, the multimodal images must be registered (i.e., aligned). The second part of this dissertation focuses mainly on this merging of multimodal data and presents a new silhouette-based registration method. The third and final part proposes methods for performing video-based fire analysis and, at a later stage, fire modelling. Each of the proposed techniques for multimodal detection and multi-view localisation has been tested extensively in practice. Among other things, successful tests were carried out for the early detection of car fires in underground car parks.
Unmasking the imposters: towards improving the generalisation of deep learning methods for face presentation attack detection.
Identity theft has had a detrimental impact on the reliability of face recognition, which has been extensively employed in security applications. The most prevalent threats are presentation attacks: by using a photo, video, or mask of an authorized user, attackers can bypass face recognition systems. Face presentation attack detection aims to detect such fake presentations made to the camera sensors of face recognition systems. Presentation attacks can be detected using convolutional neural networks, which are commonly used in computer vision applications. This research uses an in-depth analysis of current deep learning methods to examine various aspects of detecting face presentation attacks. A number of new techniques are implemented and evaluated in this study, including pre-trained models, manual feature extraction, and data aggregation. The thesis explores the effectiveness of various machine learning and deep learning models in improving detection performance by using publicly available datasets with different dataset partitions than those specified in the official dataset protocol. Furthermore, the research investigates how deep models and data aggregation can be used to detect face presentation attacks, as well as a novel approach that combines manual features with deep features in order to improve detection accuracy. Moreover, task-specific features are also extracted using pre-trained deep models to further enhance detection performance and generalisation. This problem is motivated by the need to achieve generalisation against new and rapidly evolving attack variants. It is possible to extract identifiable features from presentation attack variants in order to detect them. However, new methods are needed to deal with emerging attacks and improve the generalisation capability. This thesis examines the necessary measures to detect face presentation attacks in a more robust and generalised manner.