67 research outputs found
Localization of JPEG double compression through multi-domain convolutional neural networks
When an attacker wants to falsify an image, in most cases he or she will perform a JPEG recompression. Different techniques have been developed based on diverse theoretical assumptions, but truly effective solutions are still lacking. Recently, machine learning based approaches have started to appear in the field of image forensics to solve diverse tasks such as acquisition source identification and forgery detection. In the latter case, the ultimate goal would be a trained neural network able, given a to-be-checked image, to reliably localize the forged areas. With this in mind, our paper takes a step forward in this direction by analyzing how a single or double JPEG compression can be revealed and localized using convolutional neural networks (CNNs). Different kinds of input to the CNN have been taken into consideration, and various experiments have been carried out, also highlighting potential issues to be further investigated.
Comment: Accepted to CVPRW 2017, Workshop on Media Forensics
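The abstract does not spell out which CNN inputs were used; in the frequency domain, a common choice for double-JPEG analysis is the histogram of quantized block-DCT coefficients, whose shape is distorted by a second compression. A minimal numpy sketch of such a feature extractor (the function names and bin choices are illustrative, not the paper's):

```python
import numpy as np

def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (the JPEG transform)."""
    n = np.arange(8)
    C = np.sqrt(2.0 / 8.0) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / 16.0)
    C[0] /= np.sqrt(2.0)
    return C @ block @ C.T

def dct_coefficient_histogram(image, bins=21, clip=10):
    """Normalized histogram of rounded 8x8 DCT coefficients over a grayscale
    image; double compression leaves periodic artifacts in this distribution,
    which a CNN can learn to recognize."""
    h, w = image.shape
    coeffs = []
    for i in range(0, h - h % 8, 8):
        for j in range(0, w - w % 8, 8):
            block = image[i:i + 8, j:j + 8].astype(np.float64) - 128.0
            coeffs.append(np.round(dct2_8x8(block)).ravel())
    c = np.clip(np.concatenate(coeffs), -clip, clip)
    hist, _ = np.histogram(c, bins=bins, range=(-clip - 0.5, clip + 0.5))
    return hist / hist.sum()
```

Each integer coefficient value in [-10, 10] gets its own bin, so the feature is a fixed-length vector regardless of image size.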
Real-time monocular depth estimation on embedded devices: challenges and performances in terrestrial and underwater scenarios
The knowledge of the environmental depth is essential in multiple robotics and computer vision tasks for both terrestrial and underwater scenarios.
Recent works aim at enabling depth perception using single RGB images on deep architectures, such as convolutional neural networks and vision transformers, which are generally unsuitable for real-time inference on low-power embedded hardware. Moreover, such architectures are trained to estimate depth maps mainly on terrestrial scenarios, due to the scarcity of underwater depth data.
To this end, we present two lightweight architectures based on optimized MobileNetV3 encoders and a specifically designed decoder to achieve fast inference and accurate estimations on embedded devices, together with a feasibility study on predicting depth maps in underwater scenarios.
Precisely, we propose the MobileNetV3_S75 configuration to infer on 32-bit ARM CPUs and the MobileNetV3_LMin for 8-bit Edge TPU hardware.
In underwater settings, the proposed designs achieve estimations comparable to state-of-the-art methods while maintaining fast inference.
The proposed architectures can be considered a promising approach for real-time monocular depth estimation, with the aim of improving environment perception for underwater drones, lightweight robots and internet-of-things
Tracing images back to their social network of origin: A CNN-based approach
Recovering information about the history of a digital content, such as an image or a video, can be strategic to address an investigation from its early stages. Storage devices such as smart-phones and PCs belonging to a suspect are usually confiscated as soon as a warrant is issued, and any multimedia content found is analyzed in depth in order to trace back its provenance and, if possible, its original source. This is particularly important when dealing with social networks, where most user-generated photos and videos are uploaded and shared daily. Being able to discern whether images were downloaded from a social network or directly captured by a digital camera can be crucial in guiding subsequent investigations. In this paper, we propose a novel method based on convolutional neural networks (CNNs) to determine image provenance, i.e., whether an image originates from a social network, a messaging application or directly from a photo-camera. By considering only the visual content, the method works irrespective of any manipulation of metadata performed by an attacker. We have tested the proposed technique on three publicly available datasets of images downloaded from seven popular social networks, obtaining state-of-the-art results
Counter-forensics of SIFT-based copy-move detection by means of keypoint classification
Copy-move forgeries are very common image manipulations that are often carried out with malicious intent. Among the techniques devised by the image forensics community, those relying on scale invariant feature transform (SIFT) features are the most effective. In this paper, we approach the copy-move scenario from the perspective of an attacker whose goal is to remove such features. The attacks conceived so far against SIFT-based forensic techniques implicitly assume that all SIFT keypoints have similar properties. On the contrary, we base our attacking strategy on the observation that it is possible to classify them into different typologies and to devise attacks tailored to each specific SIFT class, thus improving performance in terms of removal rate and visual quality. To validate our ideas, we propose a SIFT classification scheme based on the gray scale histogram of the neighborhood of SIFT keypoints. Once the classification is performed, we attack the different classes by means of class-specific methods. Our experiments lead to three interesting results: (1) there is a significant advantage in using SIFT classification, (2) the classification-based attack is robust against different SIFT implementations, and (3) we are able to impair a state-of-the-art SIFT-based copy-move detector in realistic cases
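The abstract does not give the exact classes or thresholds of the histogram-based scheme; as a hedged sketch of the underlying idea, a keypoint's neighborhood can be summarized by its gray-level distribution and split into coarse typologies (the class names and the dominance threshold below are illustrative assumptions):

```python
import numpy as np

def neighborhood_histogram(image, keypoint, radius=8, bins=16):
    """Normalized gray-scale histogram of the square patch around a keypoint."""
    y, x = keypoint
    patch = image[max(0, y - radius):y + radius + 1,
                  max(0, x - radius):x + radius + 1]
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return hist / hist.sum()

def classify_keypoint(hist, dominance=0.5):
    """Illustrative two-class split: a single dominant gray bin suggests a
    low-contrast neighborhood; otherwise the patch is textured."""
    return "low-contrast" if hist.max() > dominance else "textured"
```

Once keypoints are bucketed this way, a class-specific removal attack can be applied to each bucket.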
DepthFake: a depth-based strategy for detecting Deepfake videos
Fake content has grown at an incredible rate over the past few years. The
spread of social media and online platforms makes their dissemination on a
large scale increasingly accessible by malicious actors. In parallel, due to
the growing diffusion of fake image generation methods, many Deep
Learning-based detection techniques have been proposed. Most of those methods
rely on extracting salient features from RGB images to detect through a binary
classifier whether the image is fake or real. In this paper, we propose DepthFake,
a study on how to improve classical RGB-based approaches with depth-maps. The
depth information is extracted from RGB images with recent monocular depth
estimation techniques. Here, we demonstrate the effective contribution of
depth-maps to the deepfake detection task on robust pre-trained architectures.
The proposed RGBD approach is in fact able to achieve an average improvement of
3.20% and up to 11.7% for some deepfake attacks with respect to standard RGB
architectures over the FaceForensics++ dataset.
Comment: 2022 ICPR Workshop on Artificial Intelligence for Multimedia Forensics and Disinformation Detection
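The depth estimator itself is assumed to be an off-the-shelf monocular model; as a sketch, the fusion step that builds the RGBD input can be as simple as channel concatenation (this is an illustrative reading of the approach, not the paper's exact pipeline):

```python
import numpy as np

def make_rgbd(rgb, depth):
    """Stack a per-pixel depth map onto an HxWx3 RGB image as a fourth
    channel, producing the HxWx4 input of an RGBD detector backbone."""
    assert rgb.shape[:2] == depth.shape, "depth map must match image size"
    span = depth.max() - depth.min()
    d = (depth - depth.min()) / (span if span > 0 else 1.0)  # normalize to [0, 1]
    return np.concatenate(
        [rgb.astype(np.float32) / 255.0, d[..., None].astype(np.float32)],
        axis=-1,
    )
```

A pre-trained RGB backbone can then be adapted by widening its first convolution to accept the extra channel.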
A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking
Vision Transformer (ViT) architectures are becoming increasingly popular and
widely employed to tackle computer vision applications. Their main feature is
the capacity to extract global information through the self-attention
mechanism, outperforming earlier convolutional neural networks. However, the
deployment cost of ViTs grows steadily with their size, number of trainable
parameters, and operations. Furthermore, the computational and memory cost of
self-attention increases quadratically with the image resolution. Generally
speaking, it is challenging to employ these architectures in real-world
applications due to hardware and environmental restrictions, such as limited
processing and computational capabilities. Therefore, this survey investigates
the most efficient methodologies that trade a limited loss in estimation
performance for substantially lower resource demands. More in detail, four
categories of efficient techniques will be analyzed:
compact architecture, pruning, knowledge distillation, and quantization
strategies. Moreover, a new metric called Efficient Error Rate has been
introduced in order to normalize and compare models' features that affect
hardware devices at inference time, such as the number of parameters, bits,
FLOPs, and model size. In summary, this paper first mathematically defines the
strategies used to make Vision Transformers efficient, then describes and
discusses state-of-the-art methodologies, and analyzes their performance across
different application scenarios. Toward the end of the paper, we also discuss
open challenges and promising research directions
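The Efficient Error Rate is only named in the abstract, not defined; as a hedged sketch of what such a normalized metric could look like, one plausible reading is to weight a model's task error by its resource costs relative to a reference model (the formula below is an assumption, not the survey's definition):

```python
def efficient_error_rate(error, params, flops, size_mb, reference):
    """Illustrative normalization (NOT the survey's exact formula): weight
    the task error by the average resource-cost ratio against a reference
    model, so cheaper models with similar accuracy score lower."""
    cost = (params / reference["params"]
            + flops / reference["flops"]
            + size_mb / reference["size_mb"]) / 3.0
    return error * cost
```

Under this reading, a model half the cost of the reference in every dimension halves its effective error score.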
Removal and injection of keypoints for SIFT-based copy-move counter-forensics
Recent studies exposed the weaknesses of scale-invariant feature transform (SIFT)-based analysis by removing keypoints without significantly deteriorating the visual quality of the counterfeited image. As a consequence, an attacker can leverage such weaknesses to impair or directly bypass, with alarming efficacy, some applications that rely on SIFT. In this paper, we further investigate this topic by addressing the dual problem of keypoint removal, i.e., the injection of fake SIFT keypoints into an image whose authentic keypoints have been previously deleted. Our interest stems from the consideration that an image with too few keypoints is per se a clue of counterfeiting, which can be used by the forensic analyst to reveal the removal attack. Therefore, we analyse five injection tools that reduce the perceptibility of keypoint removal and compare them experimentally. The results are encouraging and show that injection is feasible without triggering subsequent detection at the SIFT matching level. To demonstrate the practical effectiveness of our procedure, we apply the best performing tool to create a forensically undetectable copy-move forgery, whereby traces of keypoint removal are hidden by means of keypoint injection
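The analyst-side clue mentioned in the abstract (an image with too few keypoints) can be sketched as a simple density test over the detector's output; the threshold below is purely illustrative and would in practice be calibrated on natural images:

```python
def keypoint_removal_clue(num_keypoints, height, width, min_density=5e-4):
    """Illustrative forensic check: natural images typically expose many
    SIFT keypoints, so an unusually low keypoint density is a clue that a
    removal attack may have been applied. The threshold is an assumption."""
    density = num_keypoints / float(height * width)
    return density < min_density
```

It is precisely this check that keypoint injection is designed to defeat, by restoring a plausible density after removal.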
Lightweight and Energy-Aware Monocular Depth Estimation Models for IoT Embedded Devices: Challenges and Performances in Terrestrial and Underwater Scenarios
The knowledge of the environmental depth is essential in multiple robotics and computer vision tasks for both terrestrial and underwater scenarios.
Moreover, the hardware on which this technology runs, generally IoT and embedded devices, is limited in terms of power consumption; therefore, models with a low energy footprint need to be designed.
Recent works aim at enabling depth perception using single RGB images on deep architectures, such as convolutional neural networks and vision transformers, which are generally unsuitable for real-time inference on low-power embedded hardware.
Moreover, such architectures are trained to estimate depth maps mainly on terrestrial scenarios, due to the scarcity of underwater depth data.
To this end, we present two lightweight architectures based on optimized MobileNetV3 encoders and a specifically designed decoder to achieve fast inference and accurate estimations on embedded devices, a feasibility study on predicting depth maps in underwater scenarios, and an energy assessment to quantify the actual energy consumption during inference.
Precisely, we propose the MobileNetV3_S75 configuration to infer on the 32-bit ARM CPU and the MobileNetV3_LMin for the 8-bit Edge TPU hardware.
In underwater settings, the proposed designs achieve estimations comparable to state-of-the-art methods while maintaining fast inference.
Moreover, we statistically demonstrate that the model architecture has an impact on the energy footprint, in terms of the watts drawn by the device during inference.
The proposed architectures can thus be considered a promising approach for real-time monocular depth estimation, offering the best trade-off between inference speed, estimation error and energy consumption, with the aim of improving environment perception for underwater drones, lightweight robots and internet-of-things
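The abstract reports the energy footprint in watts drawn during inference; a minimal sketch of turning measured power and latency into the per-frame energy budget relevant for drones and IoT nodes (the helper names are illustrative):

```python
def energy_per_inference_j(avg_power_w, latency_s):
    """Energy drawn by one forward pass: power (W) x time (s) = joules."""
    return avg_power_w * latency_s

def inferences_per_battery(battery_wh, avg_power_w, latency_s):
    """How many inferences a battery of the given capacity can sustain,
    assuming the measured power draw is representative of the whole run."""
    return (battery_wh * 3600.0) / energy_per_inference_j(avg_power_w, latency_s)
```

For example, a model drawing 5 W for 100 ms per frame costs 0.5 J per inference, so a 10 Wh battery sustains 72,000 frames.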
Diffusion Models for Earth Observation Use-cases: from cloud removal to urban change detection
The advancements in the state of the art of generative Artificial
Intelligence (AI) brought by diffusion models can be highly beneficial in novel
contexts involving Earth observation data. After introducing this new family of
generative models, this work proposes and analyses three use cases which
demonstrate the potential of diffusion-based approaches for satellite image
data. Namely, we tackle cloud removal and inpainting, dataset generation for
change-detection tasks, and urban replanning.
Comment: Presented at Big Data from Space 2023 (BiDS