Comparative study of motion detection methods for video surveillance systems
The objective of this study is to compare several change detection methods
for a single static camera and to identify the best method for different complex
environments and backgrounds in indoor and outdoor scenes. To this end, we used
the CDnet video dataset as a benchmark that consists of many challenging
problems, ranging from basic simple scenes to complex scenes affected by bad
weather and dynamic backgrounds. Twelve change detection methods, ranging from
simple temporal differencing to more sophisticated methods, were tested and
several performance metrics were used to precisely evaluate the results.
Because most of the considered methods have not previously been evaluated on
this recent large-scale dataset, this work compares these methods to fill a
gap in the literature, and thus complements the previous comparative
evaluations. Our experimental results show that there is no perfect method for
all challenging cases: each method performs well in certain cases and fails in
others. Nevertheless, this study enables users to identify the most suitable
method for their needs.
Comment: 69 pages, 18 figures, journal paper
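The simplest family of compared methods, temporal differencing, thresholds the absolute difference between consecutive frames. A minimal sketch (not any of the twelve evaluated implementations; the threshold of 30 is an arbitrary assumption):

```python
import numpy as np

def temporal_difference_mask(prev_frame, curr_frame, threshold=30):
    """Flag pixels whose intensity changed by more than `threshold`
    between two consecutive grayscale frames."""
    # Widen dtype before subtracting so uint8 values do not wrap around.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# Toy example: a bright 2x2 "object" appears in an otherwise static frame.
prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[1:3, 1:3] = 200
mask = temporal_difference_mask(prev, curr)
```

Such a detector is cheap but, as the comparison above notes, each method fails in some scenarios; pure frame differencing misses stationary foreground, for example.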
Audio Surveillance: a Systematic Review
Although surveillance systems are becoming increasingly ubiquitous in our
living environment, automated surveillance, currently based on the video
sensory modality and machine intelligence, often lacks the robustness and
reliability required in several real applications. To tackle this issue, audio
sensory devices have been taken into account, either alone or in combination
with video, giving birth, in the last decade, to a considerable amount of
research. In this paper, audio-based automated surveillance methods are
organized into a comprehensive survey: a general taxonomy, inspired by the more
widespread video surveillance field, is proposed in order to systematically
describe methods covering background subtraction, event classification, object
tracking and situation analysis. For each of these tasks, all the significant
works are reviewed, detailing their pros and cons and the context for which
they have been proposed. Moreover, a specific section is devoted to audio
features, discussing their expressiveness and their employment in the
above-described tasks. Differently from other surveys on audio processing and
analysis, the present one is specifically targeted at automated surveillance,
highlighting the target applications of each described method and providing the
reader with tables and schemes useful for retrieving the most suitable
algorithms for a specific requirement.
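As a toy illustration of what background subtraction means in the audio domain, one can flag frames whose short-time energy stands out against an estimated background level. This is a generic sketch, not a method from the survey; the frame length and factor `k` are assumptions:

```python
import numpy as np

def detect_audio_events(signal, frame_len=400, k=3.0):
    """Flag frames whose short-time energy exceeds k times the median
    frame energy (a crude stand-in for audio background subtraction)."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    background = np.median(energy)   # robust background-energy estimate
    return energy > k * background

rng = np.random.default_rng(1)
sig = 0.01 * rng.standard_normal(8000)          # quiet background noise
sig[3000:3400] += np.sin(np.arange(400) * 0.3)  # loud burst ("event")
events = detect_audio_events(sig)               # True on frames covering the burst
```

Real systems then pass the flagged segments to the classification and situation-analysis stages the survey describes.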
Background subtraction using the factored 3-way restricted Boltzmann machines
In this paper, we propose a method for reconstructing a 3D model based on
continuous sensory input. A robot can draw on extremely large amounts of data
from the real world using various sensors. However, the sensory inputs are
usually noisy and high-dimensional, making such raw data very difficult and
time-consuming for the robot to process when constructing a 3D model. Hence, a
method is needed that can extract useful information from such sensory inputs.
To address this problem, our method utilizes the concept of the Object Semantic
Hierarchy (OSH). Different from previous work that used this hierarchy
framework, we extract the motion information using the Deep Belief Network
technique instead of applying classical computer vision approaches. We trained
on two large sets of random-dot images (10,000) which are translated and
rotated, respectively, and successfully extracted several bases that explain
the translation and rotation motion. Based on these translation and rotation
bases, background subtraction becomes possible using the Object Semantic
Hierarchy.
Comment: EECS545 (2011 Winter) class project report at the University of
Michigan. This is for archiving purposes.
Learning to Detect Instantaneous Changes with Retrospective Convolution and Static Sample Synthesis
Change detection has been a challenging visual task due to the dynamic nature
of real-world scenes. Good performance of existing methods depends largely on
prior background images or long-term observation. These methods, however,
suffer severe degradation when applied to detecting changes that occur
instantaneously with only a few preceding frames provided. In this paper, we
exploit spatio-temporal convolutional networks to address this challenge, and
propose a novel retrospective convolution, which features efficient change
information extraction between the current frame and frames from historical
observation. To address the problem of foreground-specific over-fitting in
learning-based methods, we further propose a data augmentation method, named
static sample synthesis, to guide the network to focus on learning change-cued
information rather than specific spatial features of foreground. Trained
end-to-end with complex scenarios, our framework proves to be accurate in
detecting instantaneous changes and robust in combating diverse noises.
Extensive experiments demonstrate that our proposed method significantly
outperforms existing methods.
Comment: 10 pages, 9 figures
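The static sample synthesis idea can be illustrated by pairing a frame with a copy of itself under an all-zero change label, pushing a detector to learn change cues rather than foreground appearance. A hypothetical sketch, not the authors' code:

```python
import numpy as np

def synthesize_static_sample(frame):
    """Pair a frame with itself; the ground-truth change mask is all
    zeros, so identical inputs must be labeled as 'no change' even
    when foreground objects are present in the frame."""
    pair = np.stack([frame, frame])            # (2, H, W): "previous" and "current"
    label = np.zeros(frame.shape, dtype=bool)  # no pixel changed
    return pair, label

frame = np.random.randint(0, 256, size=(8, 8), dtype=np.uint8)
pair, label = synthesize_static_sample(frame)
```

Mixing such pairs into training batches is one way to counter the foreground-specific over-fitting the abstract mentions; the exact mixing ratio would be a tuning choice.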
From Brain Imaging to Graph Analysis: a study on ADNI's patient cohort
In this paper, we studied the association between changes in structural
brain volumes and the potential development of Alzheimer's disease (AD). Using a
simple abstraction technique, we converted regional cortical and subcortical
volume differences over two time points for each study subject into a graph. We
then obtained substructures of interest using a graph decomposition algorithm
in order to extract pivotal nodes via multi-view feature selection. Intensive
experiments using robust classification frameworks were conducted to evaluate
the performance of using the brain substructures obtained under different
thresholds. The results indicated that compact substructures acquired by
examining the differences between patient groups were sufficient to
discriminate between AD and healthy controls with an area under the receiver
operating characteristic curve of 0.72.
Quantification of MagLIF morphology using the Mallat Scattering Transformation
The morphology of the stagnated plasma resulting from Magnetized Liner
Inertial Fusion (MagLIF) is measured by imaging the self-emission x-rays coming
from the multi-keV plasma. Equivalent diagnostic response can be generated by
integrated radiation-magnetohydrodynamic (rad-MHD) simulations from programs
such as HYDRA and GORGON. There have been only limited quantitative ways to
compare the image morphology, that is, the texture, of simulations and
experiments. We have developed a metric of image morphology based on the Mallat
Scattering Transformation (MST), a transformation that has proved to be
effective at distinguishing textures, sounds, and written characters. This
metric is designed, demonstrated, and refined by classifying ensembles (i.e.,
classes) of synthetic stagnation images, and by regressing an ensemble of
synthetic stagnation images to the morphology (i.e., model) parameters used to
generate the synthetic images. We use this metric to quantitatively compare
simulations to experimental images, experimental images to each other, and to
estimate the morphological parameters of the experimental images with
uncertainty. The MST coordinate space has also proved very adept at performing
a sophisticated relative background subtraction, which was needed to compare
the experimental self-emission images to the rad-MHD simulation images.
Comment: 19 pages, 18 figures, 3 tables, 4 animations, accepted for
publication in Physics of Plasmas; arXiv admin note: substantial text overlap
with arXiv:1911.0235
A Deep Convolutional Neural Network to Analyze Position Averaged Convergent Beam Electron Diffraction Patterns
We establish a series of deep convolutional neural networks to automatically
analyze position averaged convergent beam electron diffraction patterns. The
networks first calibrate the zero-order disk size, center position, and
rotation without the need for pretreating the data. With the aligned data,
additional networks then measure the sample thickness and tilt. The performance
of the network is explored as a function of a variety of variables including
thickness, tilt, and dose. A methodology to explore the response of the neural
network to various pattern features is also presented. Processing patterns at a
rate of 0.1 s/pattern, the network is shown to be orders of magnitude
faster than a brute-force method while maintaining accuracy. The approach is
thus suitable for automatically processing large 4D STEM datasets. We also discuss
the generality of the method to other materials/orientations as well as a
hybrid approach that combines the features of the neural network with least
squares fitting for even more robust analysis. The source code is available at
https://github.com/subangstrom/DeepDiffraction
DeepPBM: Deep Probabilistic Background Model Estimation from Video Sequences
This paper presents a novel unsupervised probabilistic model estimation of
visual background in video sequences using a variational autoencoder framework.
Due to the redundant nature of the backgrounds in surveillance videos, visual
information of the background can be compressed into a low-dimensional subspace
in the encoder part of the variational autoencoder, while the highly variant
information of its moving foreground gets filtered throughout its
encoding-decoding process. Our deep probabilistic background model (DeepPBM)
estimation approach is enabled by the power of deep neural networks in learning
compressed representations of video frames and reconstructing them back to the
original domain. We evaluated the performance of our DeepPBM in background
subtraction on 9 surveillance videos from the background model challenge
(BMC2012) dataset, and compared that with a standard subspace learning
technique, robust principal component analysis (RPCA), which similarly
estimates a deterministic low-dimensional representation of the background in
videos and is widely used for this application. Our method outperforms RPCA on
the BMC2012 dataset by 23% on average in F-measure score, while background
subtraction using the trained model runs more than 10 times faster.
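The subspace intuition shared by DeepPBM and the RPCA baseline is that stacked frames of a mostly static scene are approximately low-rank, so a low-rank fit recovers the background while transient foreground lands in the residual. A toy illustration using a plain truncated SVD (neither method's actual algorithm):

```python
import numpy as np

def lowrank_background(frames, rank=1):
    """Estimate a background by projecting vectorized frames onto
    their top singular subspace; transient foreground is largely
    filtered out of the reconstruction."""
    X = frames.reshape(len(frames), -1).astype(float)  # one row per frame
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_bg = (U[:, :rank] * s[:rank]) @ Vt[:rank]        # rank-`rank` approximation
    return X_bg.reshape(frames.shape)

# Static scene of intensity 10 with a transient blip in one frame.
frames = np.full((10, 4, 4), 10.0)
frames[5, 1, 1] += 100.0
bg = lowrank_background(frames, rank=1)
residual = np.abs(frames - bg)   # largest residual sits on the foreground blip
```

RPCA refines this by explicitly modeling the residual as sparse, and DeepPBM replaces the linear subspace with a variational autoencoder's learned latent space.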
Spoofing Detection Goes Noisy: An Analysis of Synthetic Speech Detection in the Presence of Additive Noise
Automatic speaker verification (ASV) technology is recently finding its way
to end-user applications for secure access to personal data, smart services or
physical facilities. Similar to other biometric technologies, speaker
verification is vulnerable to spoofing attacks where an attacker masquerades as
a particular target speaker via impersonation, replay, text-to-speech (TTS) or
voice conversion (VC) techniques to gain illegitimate access to the system. We
focus on TTS and VC that represent the most flexible, high-end spoofing
attacks. Most of the prior studies on synthesized or converted speech detection
report their findings using high-quality clean recordings. Meanwhile, the
performance of spoofing detectors in the presence of additive noise, an
important consideration in practical ASV implementations, remains largely
unknown. To this end, we analyze the suitability of state-of-the-art synthetic
speech detectors under additive noise with a special focus on front-end
features. Our comparison includes eight acoustic feature sets, five related to
spectral magnitude and three to spectral phase information. Our extensive
experiments on the ASVspoof 2015 corpus reveal several important findings. Firstly,
all the countermeasures break down even at relatively high signal-to-noise
ratios (SNRs) and fail to generalize to noisy conditions. Secondly, speech
enhancement is not found helpful. Thirdly, GMM back-end generally outperforms
the more involved i-vector back-end. Fourthly, concerning the compared
features, the Mel-frequency cepstral coefficients (MFCCs) and subband spectral
centroid magnitude coefficients (SCMCs) perform best on average, though the
winning method depends on SNR and noise type. Finally, a study with two score
fusion strategies shows that combining different feature based systems improves
recognition accuracy for known and unknown attacks in both clean and noisy
conditions.
Comment: 23 pages, 7 figures
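Evaluations under additive noise typically mix noise into clean speech at a controlled signal-to-noise ratio. A generic utility for doing so (an assumed setup, not the paper's exact pipeline):

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio equals
    `snr_db` decibels, then mix it into the speech."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone
noise = rng.standard_normal(16000)                           # white Gaussian noise
noisy = add_noise_at_snr(speech, noise, snr_db=10)
```

Sweeping `snr_db` over a grid (e.g. 20 dB down to 0 dB) is how breakdown points like those reported above are usually located.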
Joint Background Reconstruction and Foreground Segmentation via A Two-stage Convolutional Neural Network
Foreground segmentation in video sequences is a classic topic in computer
vision. Due to the lack of semantic and prior knowledge, it is difficult for
existing methods to deal with sophisticated scenes well. Therefore, in this
paper, we propose an end-to-end two-stage deep convolutional neural network
(CNN) framework for foreground segmentation in video sequences. In the first
stage, a convolutional encoder-decoder sub-network is employed to reconstruct
the background images and encode rich prior knowledge of background scenes. In
the second stage, the reconstructed background and current frame are input into
a multi-channel fully-convolutional sub-network (MCFCN) for accurate foreground
segmentation. In the two-stage CNN, the reconstruction loss and segmentation
loss are jointly optimized. The background images and foreground objects are
output simultaneously in an end-to-end way. Moreover, by incorporating the
prior semantic knowledge of foreground and background in the pre-training
process, our method could restrain the background noise and keep the integrity
of foreground objects at the same time. Experiments on CDNet 2014 show that our
method outperforms the state-of-the-art by 4.9%.
Comment: ICME 201
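The two-stage idea of reconstructing a background first and then segmenting the current frame against it can be caricatured with a per-pixel temporal median standing in for the encoder-decoder sub-network (a classical baseline, not the proposed CNN; the threshold is an assumption):

```python
import numpy as np

def two_stage_segment(frames, threshold=25):
    """Stage 1: reconstruct a background as the per-pixel temporal
    median of past frames (stand-in for the reconstruction stage).
    Stage 2: segment the current frame against that background."""
    background = np.median(frames[:-1], axis=0)
    current = frames[-1]
    mask = np.abs(current - background) > threshold
    return background, mask

# Static gray scene; an object appears only in the last frame.
frames = np.full((6, 5, 5), 50.0)
frames[-1, 2:4, 2:4] = 200.0
bg, mask = two_stage_segment(frames)
```

The paper's contribution is to learn both stages jointly end-to-end, so the reconstruction and segmentation losses regularize each other, which this fixed baseline cannot do.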