14,822 research outputs found

    Comparative study of motion detection methods for video surveillance systems

    Full text link
    The objective of this study is to compare several change detection methods for a single static camera and to identify the best method for different complex environments and backgrounds in indoor and outdoor scenes. To this end, we used the CDnet video dataset as a benchmark; it comprises many challenging problems, ranging from basic simple scenes to complex scenes affected by bad weather and dynamic backgrounds. Twelve change detection methods, ranging from simple temporal differencing to more sophisticated approaches, were tested, and several performance metrics were used to evaluate the results precisely. Because most of the considered methods have not previously been evaluated on this recent large-scale dataset, this work fills a gap in the literature and complements previous comparative evaluations. Our experimental results show that there is no perfect method for all challenging cases; each method performs well in certain cases and fails in others. However, this study enables the user to identify the most suitable method for his or her needs. Comment: 69 pages, 18 figures, journal paper
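
    The simplest baseline in such comparisons, plain temporal differencing, is easy to illustrate. The following is a minimal sketch (not the study's code; the file name, threshold, and kernel size are illustrative assumptions) using OpenCV:

    import cv2
    import numpy as np

    # Hypothetical input video; a threshold of 25 intensity levels is illustrative.
    cap = cv2.VideoCapture("surveillance.avi")
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev)                    # frame-to-frame change
        _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        # Morphological opening removes isolated noise pixels from the change map.
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
        prev = gray                                       # mask is the foreground estimate
    cap.release()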

    Audio Surveillance: a Systematic Review

    Full text link
    Although surveillance systems are becoming increasingly ubiquitous in our living environment, automated surveillance, currently based on the video sensory modality and machine intelligence, often lacks the robustness and reliability required in several real applications. To tackle this issue, audio sensory devices have been taken into account, either alone or in combination with video, giving birth in the last decade to a considerable amount of research. In this paper, audio-based automated surveillance methods are organized into a comprehensive survey: a general taxonomy, inspired by the more widespread video surveillance field, is proposed in order to systematically describe methods covering background subtraction, event classification, object tracking, and situation analysis. For each of these tasks, all the significant works are reviewed, detailing their pros and cons and the context for which they were proposed. Moreover, a specific section is devoted to audio features, discussing their expressiveness and their employment in the above-described tasks. Differently from other surveys on audio processing and analysis, the present one is specifically targeted at automated surveillance, highlighting the target applications of each described method and providing the reader with tables and schemes useful for retrieving the most suitable algorithms for a specific requirement.
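
    As a concrete illustration of the audio-feature side of this survey, a common first step for event classification is extracting MFCCs per clip. A minimal sketch, assuming librosa is available (the file name and parameter values are illustrative, not taken from the survey):

    import numpy as np
    import librosa

    # Hypothetical sound clip; 16 kHz is a typical rate for surveillance audio.
    y, sr = librosa.load("clip.wav", sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 cepstral coefficients per frame
    # Fixed-length clip descriptor for an event classifier:
    # per-coefficient mean and standard deviation over time.
    descriptor = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])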

    Background subtraction using the factored 3-way restricted Boltzmann machines

    Full text link
    In this paper, we propose a method for reconstructing a 3D model from continuous sensory input. A robot can draw on extremely large amounts of data from the real world using various sensors; however, the sensory inputs are usually noisy and high-dimensional, making it very difficult and time-consuming for the robot to construct a 3D model from such raw data. Hence, a method is needed that can extract useful information from these sensory inputs. To address this problem, our method utilizes the concept of the Object Semantic Hierarchy (OSH). Unlike previous work that used this hierarchy framework, we extract the motion information using the Deep Belief Network technique instead of applying classical computer vision approaches. We trained on two large sets of random dot images (10,000) which are translated and rotated, respectively, and successfully extracted several bases that explain the translation and rotation motion. Based on these translation and rotation bases, background subtraction becomes possible using the Object Semantic Hierarchy. Comment: EECS545 (2011 Winter) class project report at the University of Michigan. This is for archiving purposes
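
    The training data described here, pairs of random dot images related by a translation or a rotation, can be generated straightforwardly. This is only a sketch of that data step, not the project's code; image size, shift range, and angle range are invented for illustration:

    import numpy as np
    from scipy.ndimage import rotate, shift

    rng = np.random.default_rng(0)

    def make_pair(kind="translate", size=32):
        # Sparse random-dot image: roughly 20% of pixels are on.
        img = (rng.random((size, size)) > 0.8).astype(np.float32)
        if kind == "translate":
            out = shift(img, shift=rng.integers(-3, 4, size=2), order=0)
        else:
            out = rotate(img, angle=rng.uniform(-15.0, 15.0), reshape=False, order=0)
        return img, out  # input/target pair for learning motion bases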

    Learning to Detect Instantaneous Changes with Retrospective Convolution and Static Sample Synthesis

    Full text link
    Change detection has been a challenging visual task due to the dynamic nature of real-world scenes. Good performance of existing methods depends largely on prior background images or a long-term observation. These methods, however, suffer severe degradation when applied to detecting instantaneously occurring changes with only a few preceding frames provided. In this paper, we exploit spatio-temporal convolutional networks to address this challenge, and propose a novel retrospective convolution, which features efficient change information extraction between the current frame and frames from historical observation. To address the problem of foreground-specific over-fitting in learning-based methods, we further propose a data augmentation method, named static sample synthesis, to guide the network to focus on learning change-cued information rather than specific spatial features of the foreground. Trained end-to-end on complex scenarios, our framework proves to be accurate in detecting instantaneous changes and robust in combating diverse noises. Extensive experiments demonstrate that our proposed method significantly outperforms existing methods. Comment: 10 pages, 9 figures
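
    The core idea, comparing the current frame against a few preceding frames through a shared convolution, can be sketched as a PyTorch module. This is a hedged reading of the abstract, not the authors' architecture; the layer widths and the max aggregation are illustrative choices:

    import torch
    import torch.nn as nn

    class RetrospectiveConv(nn.Module):
        def __init__(self, in_ch=3, feat_ch=32):
            super().__init__()
            self.embed = nn.Conv2d(in_ch, feat_ch, 3, padding=1)   # shared frame encoder
            self.change = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)

        def forward(self, current, history):
            # current: (B, C, H, W); history: (B, T, C, H, W), a few preceding frames
            f_cur = self.embed(current)
            changes = []
            for t in range(history.shape[1]):
                f_past = self.embed(history[:, t])
                # Convolve over the feature difference to extract change cues.
                changes.append(self.change(f_cur - f_past))
            # Aggregate change evidence across the short temporal window.
            return torch.stack(changes, dim=1).max(dim=1).values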

    From Brain Imaging to Graph Analysis: a study on ADNI's patient cohort

    Full text link
    In this paper, we studied the association between changes in structural brain volumes and the potential development of Alzheimer's disease (AD). Using a simple abstraction technique, we converted regional cortical and subcortical volume differences over two time points for each study subject into a graph. We then obtained substructures of interest using a graph decomposition algorithm in order to extract pivotal nodes via multi-view feature selection. Intensive experiments using robust classification frameworks were conducted to evaluate the performance of the brain substructures obtained under different thresholds. The results indicated that compact substructures acquired by examining the differences between patient groups were sufficient to discriminate between AD patients and healthy controls with an area under the receiver operating characteristic curve of 0.72.
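
    The graph abstraction step can be illustrated with networkx: regional volume differences between the two time points become node attributes, and edges connect regions whose changes co-vary. The region names, volumes, and edge threshold below are invented for illustration and are not from the study:

    import numpy as np
    import networkx as nx

    regions = ["hippocampus_L", "hippocampus_R", "entorhinal_L"]  # hypothetical labels
    vol_t0 = np.array([3.9, 4.0, 1.9])   # baseline volumes (invented values)
    vol_t1 = np.array([3.6, 3.8, 1.7])   # follow-up volumes
    delta = vol_t1 - vol_t0              # per-region change over the two time points

    G = nx.Graph()
    for name, d in zip(regions, delta):
        G.add_node(name, change=float(d))
    # Connect regions with similar volume changes (threshold is arbitrary here).
    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            if abs(delta[i] - delta[j]) < 0.15:
                G.add_edge(regions[i], regions[j])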

    Quantification of MagLIF morphology using the Mallat Scattering Transformation

    Full text link
    The morphology of the stagnated plasma resulting from Magnetized Liner Inertial Fusion (MagLIF) is measured by imaging the self-emission x-rays coming from the multi-keV plasma. Equivalent diagnostic response can be generated by integrated radiation-magnetohydrodynamic (rad-MHD) simulations from programs such as HYDRA and GORGON. There have been only limited quantitative ways to compare the image morphology, that is, the texture, of simulations and experiments. We have developed a metric of image morphology based on the Mallat Scattering Transformation (MST), a transformation that has proved effective at distinguishing textures, sounds, and written characters. This metric is designed, demonstrated, and refined by classifying ensembles (i.e., classes) of synthetic stagnation images, and by regressing an ensemble of synthetic stagnation images to the morphology (i.e., model) parameters used to generate them. We use this metric to quantitatively compare simulations to experimental images, to compare experimental images to each other, and to estimate the morphological parameters of the experimental images with uncertainty. This coordinate space has also proved very adept at supporting a sophisticated relative background subtraction in the MST space, which was needed to compare the experimental self-emission images to the rad-MHD simulation images. Comment: 19 pages, 18 figures, 3 tables, 4 animations, accepted for publication in Physics of Plasmas; arXiv admin note: substantial text overlap with arXiv:1911.0235
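
    For readers who want to experiment with scattering coefficients as a texture descriptor, the kymatio package provides a 2D scattering transform. A minimal sketch, assuming kymatio is installed (the image is a random stand-in and the scale J is illustrative, not the paper's setting):

    import numpy as np
    from kymatio.numpy import Scattering2D

    img = np.random.rand(128, 128).astype(np.float32)  # stand-in for a self-emission image
    S = Scattering2D(J=3, shape=(128, 128))            # scattering up to scale 2**3
    coeffs = S(img)                                    # translation-stable texture coefficients
    descriptor = coeffs.reshape(-1)                    # flat vector for classification/regression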

    A Deep Convolutional Neural Network to Analyze Position Averaged Convergent Beam Electron Diffraction Patterns

    Full text link
    We establish a series of deep convolutional neural networks to automatically analyze position averaged convergent beam electron diffraction (PACBED) patterns. The networks first calibrate the zero-order disk size, center position, and rotation without the need for pretreating the data. With the aligned data, additional networks then measure the sample thickness and tilt. The performance of the networks is explored as a function of a variety of variables, including thickness, tilt, and dose. A methodology to explore the response of the neural network to various pattern features is also presented. Processing patterns at a rate of ~0.1 s/pattern, the network is shown to be orders of magnitude faster than a brute-force method while maintaining accuracy. The approach is thus suitable for automatically processing large 4D STEM datasets. We also discuss the generality of the method to other materials/orientations, as well as a hybrid approach that combines the features of the neural network with least-squares fitting for even more robust analysis. The source code is available at https://github.com/subangstrom/DeepDiffraction
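
    The repository linked above contains the actual networks; as a hedged sketch of the general idea, a small PyTorch regressor mapping an aligned PACBED pattern to a thickness estimate might look like this (architecture and sizes are illustrative, not those of DeepDiffraction):

    import torch
    import torch.nn as nn

    class PACBEDRegressor(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4),          # fixed-size feature map regardless of input
            )
            self.head = nn.Linear(32 * 4 * 4, 1)  # predicted thickness (e.g., nm)

        def forward(self, pattern):               # pattern: (B, 1, H, W)
            f = self.features(pattern)
            return self.head(f.flatten(1))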

    DeepPBM: Deep Probabilistic Background Model Estimation from Video Sequences

    Full text link
    This paper presents a novel unsupervised probabilistic model estimation of the visual background in video sequences using a variational autoencoder framework. Due to the redundant nature of the backgrounds in surveillance videos, visual information of the background can be compressed into a low-dimensional subspace in the encoder part of the variational autoencoder, while the highly variant information of its moving foreground is filtered out throughout the encoding-decoding process. Our deep probabilistic background model (DeepPBM) estimation approach is enabled by the power of deep neural networks to learn compressed representations of video frames and reconstruct them back in the original domain. We evaluated the performance of DeepPBM in background subtraction on 9 surveillance videos from the background models challenge (BMC2012) dataset and compared it with a standard subspace learning technique, robust principal component analysis (RPCA), which similarly estimates a deterministic low-dimensional representation of the background in videos and is widely used for this application. Our method outperforms RPCA on the BMC2012 dataset by 23% on average in F-measure score, while background subtraction using the trained model runs more than 10 times faster.
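
    A compact sketch of the DeepPBM idea in PyTorch follows: a variational autoencoder whose low-dimensional bottleneck retains the redundant background while filtering the fast-moving foreground. Layer sizes and the latent dimension are illustrative, not the paper's configuration:

    import torch
    import torch.nn as nn

    class BackgroundVAE(nn.Module):
        def __init__(self, n_pixels, z_dim=32):
            super().__init__()
            self.enc = nn.Linear(n_pixels, 256)
            self.mu = nn.Linear(256, z_dim)
            self.logvar = nn.Linear(256, z_dim)
            self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, n_pixels), nn.Sigmoid())

        def forward(self, x):                     # x: (B, n_pixels) flattened frames
            h = torch.relu(self.enc(x))
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
            recon = self.dec(z)                   # reconstruction approximates the background
            # Foreground: pixels with large reconstruction error |x - recon|.
            return recon, mu, logvar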

    Spoofing Detection Goes Noisy: An Analysis of Synthetic Speech Detection in the Presence of Additive Noise

    Full text link
    Automatic speaker verification (ASV) technology is recently finding its way into end-user applications for secure access to personal data, smart services, or physical facilities. Similar to other biometric technologies, speaker verification is vulnerable to spoofing attacks, where an attacker masquerades as a particular target speaker via impersonation, replay, text-to-speech (TTS), or voice conversion (VC) techniques to gain illegitimate access to the system. We focus on TTS and VC, which represent the most flexible, high-end spoofing attacks. Most of the prior studies on synthesized or converted speech detection report their findings using high-quality clean recordings. Meanwhile, the performance of spoofing detectors in the presence of additive noise, an important consideration in practical ASV implementations, remains largely unknown. To this end, we analyze the suitability of state-of-the-art synthetic speech detectors under additive noise, with a special focus on front-end features. Our comparison includes eight acoustic feature sets, five related to spectral magnitude and three to spectral phase information. Our extensive experiments on the ASVspoof 2015 corpus reveal several important findings. Firstly, all the countermeasures break down even at relatively high signal-to-noise ratios (SNRs) and fail to generalize to noisy conditions. Secondly, speech enhancement is not found to be helpful. Thirdly, the GMM back-end generally outperforms the more involved i-vector back-end. Fourthly, concerning the compared features, the Mel-frequency cepstral coefficients (MFCCs) and subband spectral centroid magnitude coefficients (SCMCs) perform the best on average, though the winning method depends on SNR and noise type. Finally, a study with two score fusion strategies shows that combining different feature-based systems improves recognition accuracy for known and unknown attacks in both clean and noisy conditions. Comment: 23 pages, 7 figures
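
    The GMM back-end referenced above follows a standard two-model log-likelihood-ratio recipe, sketched here with scikit-learn. The feature matrices are random stand-ins (in practice, e.g., per-frame MFCCs), and the mixture size is illustrative:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    genuine_feats = np.random.randn(5000, 13)    # stand-in training features (frames x dims)
    spoofed_feats = np.random.randn(5000, 13)

    gmm_gen = GaussianMixture(n_components=64, covariance_type="diag").fit(genuine_feats)
    gmm_spf = GaussianMixture(n_components=64, covariance_type="diag").fit(spoofed_feats)

    def llr_score(utterance_feats):
        # Average per-frame log-likelihood ratio; positive favors genuine speech.
        return gmm_gen.score(utterance_feats) - gmm_spf.score(utterance_feats)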

    Joint Background Reconstruction and Foreground Segmentation via A Two-stage Convolutional Neural Network

    Full text link
    Foreground segmentation in video sequences is a classic topic in computer vision. Due to the lack of semantic and prior knowledge, it is difficult for existing methods to deal with sophisticated scenes well. Therefore, in this paper, we propose an end-to-end two-stage deep convolutional neural network (CNN) framework for foreground segmentation in video sequences. In the first stage, a convolutional encoder-decoder sub-network is employed to reconstruct the background images and encode rich prior knowledge of background scenes. In the second stage, the reconstructed background and the current frame are fed into a multi-channel fully-convolutional sub-network (MCFCN) for accurate foreground segmentation. In the two-stage CNN, the reconstruction loss and segmentation loss are jointly optimized, and the background images and foreground objects are output simultaneously in an end-to-end way. Moreover, by incorporating prior semantic knowledge of foreground and background in the pre-training process, our method can suppress background noise while preserving the integrity of foreground objects. Experiments on CDnet 2014 show that our method outperforms the state of the art by 4.9%. Comment: ICME 201
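
    The two-stage layout can be sketched as a single PyTorch module: stage one reconstructs the background, stage two segments the foreground from the current frame stacked with that reconstruction. Channel counts and depths are illustrative, not the paper's:

    import torch
    import torch.nn as nn

    class TwoStageNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.bg_net = nn.Sequential(          # stage 1: background reconstruction
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
            )
            self.seg_net = nn.Sequential(         # stage 2: multi-channel segmentation
                nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
            )

        def forward(self, frame):                 # frame: (B, 3, H, W)
            bg = self.bg_net(frame)
            mask = self.seg_net(torch.cat([frame, bg], dim=1))
            return bg, mask                       # both losses are optimized jointly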