Towards Effective Image Forensics via A Novel Computationally Efficient Framework and A New Image Splice Dataset
Splice detection models are urgently needed, since splice manipulations
can be used to mislead, spread rumors and create disharmony in society.
However, there is a severe lack of image splicing datasets, which restricts the
capabilities of deep learning models to extract discriminative features without
overfitting. This manuscript presents two-fold contributions toward splice
detection. Firstly, a novel splice detection dataset is proposed having two
variants. The two variants include spliced samples generated from code and
through manual editing. Spliced images in both variants have corresponding
binary masks to aid localization approaches. Secondly, a novel
Spatio-Compression Lightweight Splice Detection Framework is proposed for
accurate splice detection with minimum computational cost. The proposed
dual-branch framework extracts discriminative spatial features from a
lightweight spatial branch. It uses original resolution compression data to
extract double compression artifacts from the second branch, thereby making it
'information preserving.' Several CNNs are tested in combination with the
proposed framework on a composite dataset of images from the proposed dataset
and the CASIA v2.0 dataset. The best model achieves an accuracy of 0.9382, and
a comparison with similar state-of-the-art methods demonstrates the superiority
of the proposed framework.
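To illustrate the kind of double compression artifact the abstract refers to, the following is a minimal sketch of the well-known double quantization effect. It is not the authors' framework. The quantization steps `Q1` and `Q2`, the coefficient range, and the helper names are assumptions chosen only for illustration.

```python
# Illustrative sketch (not the paper's method): quantizing a coefficient
# twice with different steps leaves periodic empty bins in the histogram,
# whereas a single quantization fills every bin.
Q1, Q2 = 5, 3  # hypothetical first and second quantization steps


def single_quantize(c, q2=Q2):
    """Quantize a coefficient once with step q2."""
    return round(c / q2)


def double_quantize(c, q1=Q1, q2=Q2):
    """Quantize with step q1, dequantize, then quantize again with step q2."""
    dequantized = round(c / q1) * q1
    return round(dequantized / q2)


def empty_bins(values, max_bin):
    """Return the bins in 0..max_bin that no value falls into."""
    occupied = set(values)
    return [b for b in range(max_bin + 1) if b not in occupied]


coefficients = range(0, 61)  # toy stand-in for DCT coefficient magnitudes
single_empty = empty_bins([single_quantize(c) for c in coefficients], 20)
double_empty = empty_bins([double_quantize(c) for c in coefficients], 20)
print("empty bins after single compression:", single_empty)
print("empty bins after double compression:", double_empty)
```

A compression-domain branch could, in principle, learn to pick up on such periodic gaps; how the proposed framework actually does so is not specified in the abstract.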
A Deep Multi-Level Attentive network for Multimodal Sentiment Analysis
Multimodal sentiment analysis has attracted increasing attention with broad
application prospects. Existing methods focus on a single modality and thus
fail to capture social media content that spans multiple modalities. Moreover, in
multi-modal learning, most of the works have focused on simply combining the
two modalities, without exploring the complicated correlations between them.
This results in unsatisfactory performance for multimodal sentiment
classification. Motivated by the status quo, we propose a Deep Multi-Level
Attentive network, which exploits the correlation between image and text
modalities to improve multimodal learning. Specifically, we generate the
bi-attentive visual map along the spatial and channel dimensions to magnify
the CNN's representational power. Then we model the correlation between the image
regions and semantics of the word by extracting the textual features related to
the bi-attentive visual features by applying semantic attention. Finally,
self-attention is employed to automatically fetch the sentiment-rich multimodal
features for the classification. We conduct extensive evaluations on four
real-world datasets, namely, MVSA-Single, MVSA-Multiple, Flickr, and Getty
Images, which verify the superiority of our method.
Comment: 11 pages, 7 figures
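The idea of attending along both the channel and spatial dimensions can be sketched as follows. This is a simplified illustration that assumes parameter-free pooling followed by a sigmoid; the network described in the abstract presumably uses learned layers, and the function name `bi_attention` and the tensor shapes are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bi_attention(features):
    """Apply channel attention, then spatial attention, to a (C, H, W) map.

    Assumption: weights come from simple average pooling, with no learned
    parameters, purely to show where each attention map acts.
    """
    # Channel attention: one weight per channel from global average pooling.
    channel_w = sigmoid(features.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    refined = features * channel_w
    # Spatial attention: one weight per location from the channel-wise mean.
    spatial_w = sigmoid(refined.mean(axis=0, keepdims=True))  # (1, H, W)
    return refined * spatial_w, channel_w, spatial_w


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((4, 5, 5))  # toy CNN feature map
attended, channel_w, spatial_w = bi_attention(feature_map)
print(attended.shape, channel_w.shape, spatial_w.shape)
```

The output keeps the shape of the input feature map, so the attended features can be passed on to later stages such as the semantic attention over text.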
A Visually Attentive Splice Localization Network with Multi-Domain Feature Extractor and Multi-Receptive Field Upsampler
Image splice manipulation presents a severe challenge in today's society.
With easy access to image manipulation tools, it is easier than ever to modify
images in ways that can mislead individuals, organizations or society. In this
work, a novel "Visually Attentive Splice Localization Network with Multi-Domain
Feature Extractor and Multi-Receptive Field Upsampler" is proposed. It
contains a unique "visually attentive multi-domain feature extractor" (VA-MDFE)
that extracts attentional features from the RGB, edge and depth domains. Next,
a "visually attentive downsampler" (VA-DS) is responsible for fusing and
downsampling the multi-domain features. Finally, a novel "visually attentive
multi-receptive field upsampler" (VA-MRFU) module employs multiple receptive
field-based convolutions to upsample attentional features by focussing on
different information scales. Experiments conducted on the public
benchmark dataset CASIA v2.0 demonstrate the effectiveness of the proposed
model. It outperforms existing state-of-the-art methods, achieving an IoU score
of 0.851, a pixel F1 score of 0.9195 and a pixel AUC score of 0.8989.
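For reference, the IoU and pixel F1 scores reported above are standard measures of overlap between a predicted binary mask and the ground-truth mask. The sketch below computes both on a toy 3x3 example; the masks and function names are made up for illustration and are unrelated to the paper's data or code.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives per pixel."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred, truth):
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0  # two empty masks agree fully


def pixel_f1(pred, truth):
    """Pixel-level F1: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


truth_mask = [[0, 1, 1],
              [0, 1, 1],
              [0, 0, 0]]
pred_mask = [[0, 1, 1],
             [0, 0, 1],
             [0, 1, 0]]
print(iou(pred_mask, truth_mask))       # 3 / 5 = 0.6
print(pixel_f1(pred_mask, truth_mask))  # 6 / 8 = 0.75
```

Because F1 counts the overlap twice in both numerator and denominator, it is never lower than IoU on the same pair of masks, which is consistent with the reported F1 exceeding the reported IoU.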
Datasets, Clues and State-of-the-Arts for Multimedia Forensics: An Extensive Review
With large volumes of social media data being created daily and the
parallel rise of realistic multimedia tampering methods, detecting and
localising tampering in images and videos has become essential. This survey
focusses on approaches for tampering detection in multimedia data using deep
learning models. Specifically, it presents a detailed analysis of benchmark
datasets for malicious manipulation detection that are publicly available. It
also offers a comprehensive list of tampering clues and commonly used deep
learning architectures. Next, it discusses the current state-of-the-art
tampering detection methods, categorizing them into meaningful types such as
deepfake detection methods, splice tampering detection methods, copy-move
tampering detection methods, etc. and discussing their strengths and
weaknesses. Top results achieved on benchmark datasets, comparison of deep
learning approaches against traditional methods and critical insights from the
recent tampering detection methods are also discussed. Lastly, the research
gaps, future directions and conclusions are presented to provide an in-depth
understanding of the tampering detection research arena.