343 research outputs found
CNN-based fast source device identification
Source identification is an important topic in image forensics, since it
allows to trace back the origin of an image. This represents a precious
information to claim intellectual property but also to reveal the authors of
illicit materials. In this paper we address the problem of device
identification based on sensor noise and propose a fast and accurate solution
using convolutional neural networks (CNNs). Specifically, we propose a
2-channel-based CNN that learns a way of comparing camera fingerprint and image
noise at patch level. The proposed solution turns out to be much faster than
the conventional approach and to ensure an increased accuracy. This makes the
approach particularly suitable in scenarios where large databases of images are
analyzed, like over social networks. In this vein, since images uploaded on
social media usually undergo at least two compression stages, we include
investigations on double JPEG compressed images, always reporting higher
accuracy than standard approaches
Coding local and global binary visual features extracted from video sequences
Binary local features represent an effective alternative to real-valued
descriptors, leading to comparable results for many visual analysis tasks,
while being characterized by significantly lower computational complexity and
memory requirements. When dealing with large collections, a more compact
representation based on global features is often preferred, which can be
obtained from local features by means of, e.g., the Bag-of-Visual-Word (BoVW)
model. Several applications, including for example visual sensor networks and
mobile augmented reality, require visual features to be transmitted over a
bandwidth-limited network, thus calling for coding techniques that aim at
reducing the required bit budget, while attaining a target level of efficiency.
In this paper we investigate a coding scheme tailored to both local and global
binary features, which aims at exploiting both spatial and temporal redundancy
by means of intra- and inter-frame coding. In this respect, the proposed coding
scheme can be conveniently adopted to support the Analyze-Then-Compress (ATC)
paradigm. That is, visual features are extracted from the acquired content,
encoded at remote nodes, and finally transmitted to a central controller that
performs visual analysis. This is in contrast with the traditional approach, in
which visual content is acquired at a node, compressed and then sent to a
central unit for further processing, according to the Compress-Then-Analyze
(CTA) paradigm. In this paper we experimentally compare ATC and CTA by means of
rate-efficiency curves in the context of two different visual analysis tasks:
homography estimation and content-based retrieval. Our results show that the
novel ATC paradigm based on the proposed coding primitives can be competitive
with CTA, especially in bandwidth limited scenarios.Comment: submitted to IEEE Transactions on Image Processin
Reduced Memory Region Based Deep Convolutional Neural Network Detection
Accurate pedestrian detection has a primary role in automotive safety: for
example, by issuing warnings to the driver or acting actively on car's brakes,
it helps decreasing the probability of injuries and human fatalities. In order
to achieve very high accuracy, recent pedestrian detectors have been based on
Convolutional Neural Networks (CNN). Unfortunately, such approaches require
vast amounts of computational power and memory, preventing efficient
implementations on embedded systems. This work proposes a CNN-based detector,
adapting a general-purpose convolutional network to the task at hand. By
thoroughly analyzing and optimizing each step of the detection pipeline, we
develop an architecture that outperforms methods based on traditional image
features and achieves an accuracy close to the state-of-the-art while having
low computational complexity. Furthermore, the model is compressed in order to
fit the tight constrains of low power devices with a limited amount of embedded
memory available. This paper makes two main contributions: (1) it proves that a
region based deep neural network can be finely tuned to achieve adequate
accuracy for pedestrian detection (2) it achieves a very low memory usage
without reducing detection accuracy on the Caltech Pedestrian dataset.Comment: IEEE 2016 ICCE-Berli
Subpixel Edge Localization with Statistical Error Compensation
Subpixel Edge Localization (EL) techniques are often affected by an error that exhibits a systematic character
When this happens their performance can be improved
through compensation of the systematic portion of the
localization error In this paper we propose and analyze
a method for estimating the EL characteristic of subpixel EL techniques through statistical analysis of appropriate test images The impact of the compensation method on the accuracy of a camera calibration procedure has been proven to be quite signicant, which can be crucial especially in applications of low-cost photogrammetry and 3D reconstruction from multiple views
An In-Depth Study on Open-Set Camera Model Identification
Camera model identification refers to the problem of linking a picture to the
camera model used to shoot it. As this might be an enabling factor in different
forensic applications to single out possible suspects (e.g., detecting the
author of child abuse or terrorist propaganda material), many accurate camera
model attribution methods have been developed in the literature. One of their
main drawbacks, however, is the typical closed-set assumption of the problem.
This means that an investigated photograph is always assigned to one camera
model within a set of known ones present during investigation, i.e., training
time, and the fact that the picture can come from a completely unrelated camera
model during actual testing is usually ignored. Under realistic conditions, it
is not possible to assume that every picture under analysis belongs to one of
the available camera models. To deal with this issue, in this paper, we present
the first in-depth study on the possibility of solving the camera model
identification problem in open-set scenarios. Given a photograph, we aim at
detecting whether it comes from one of the known camera models of interest or
from an unknown one. We compare different feature extraction algorithms and
classifiers specially targeting open-set recognition. We also evaluate possible
open-set training protocols that can be applied along with any open-set
classifier, observing that a simple of those alternatives obtains best results.
Thorough testing on independent datasets shows that it is possible to leverage
a recently proposed convolutional neural network as feature extractor paired
with a properly trained open-set classifier aiming at solving the open-set
camera model attribution problem even to small-scale image patches, improving
over state-of-the-art available solutions.Comment: Published through IEEE Access journa
Discriminating multiple JPEG compression using first digit features
The analysis of JPEG double-compressed images is a problem largely studied by the multimedia forensics community, as it might be exploited, e.g., for tampering localization or source device identification. In many practical scenarios, like photos uploaded on blogs, on-line albums, and photo sharing web sites, images might be JPEG compressed several times. However, the identification of the number of compression stages applied to an image remains an open issue. We proposes a forensic method based on the analysis of the distribution of the first significant digits of the discrete cosine transform coefficients, which follow Benford's law in images compressed just once. Then, the detector is optimized and extended in order to identify accurately the number of compression stages applied to an image. The experimental validation considers up to four consecutive compression stages and shows that the proposed approach extends and outperforms the previously-published algorithms for double JPEG compression detection
Music genre visualization and classification exploiting a small set of high-level semantic features
In this paper a system for continuous analysis, visualization and classification of musical streams is proposed. The system performs visualization and classification task by means of three high-level, semantic features extracted computing a reduction on a multidimensional low-level feature vector through the usage of Gaussian Mixture Models. The visualization of the semantic characteristics of the audio stream has been implemented by mapping the value of the high-level features on a triangular plot and by assigning to each feature a primary color. In this manner, besides having the representation of musical evolution of the signal, we have also obtained representative colors for each musical part of the analyzed streams. The classification exploits a set of one-against-one threedimensional Support Vector Machines trained on some target genres.
The obtained results on visualization and classification tasks are very encouraging: our tests on heterogeneous genre streams have shown the validity of proposed approac
- …