Moving Object Detection by Detecting Contiguous Outliers in the Low-Rank Representation
Object detection is a fundamental step for automated video analysis in many
vision applications. Object detection in a video is usually performed by object
detectors or background subtraction techniques. Often, an object detector
requires manually labeled examples to train a binary classifier, while
background subtraction needs a training sequence that contains no objects to
build a background model. To automate the analysis, object detection without a
separate training phase becomes a critical task. People have tried to tackle
this task by using motion information. But existing motion-based methods are
usually limited when coping with complex scenarios such as nonrigid motion and
dynamic background. In this paper, we show that the above challenges can be
addressed in a unified framework named DEtecting Contiguous Outliers in the
LOw-rank Representation (DECOLOR). This formulation integrates object detection
and background learning into a single process of optimization, which can be
solved by an alternating algorithm efficiently. We explain the relations
between DECOLOR and other sparsity-based methods. Experiments on both simulated
data and real sequences demonstrate that DECOLOR outperforms the
state-of-the-art approaches and it can work effectively on a wide range of
complex scenarios. Comment: 30 pages
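The alternation between background learning and outlier detection described above can be illustrated with a much simpler stand-in. The sketch below is not DECOLOR: it assumes a rank-1 background (one constant value per pixel) and finds the outlier support by plain thresholding, whereas the abstract describes a general low-rank representation and contiguous outliers. The function name, the `threshold` parameter, and the stopping rule are all hypothetical choices made for illustration.

```python
# Simplified alternating scheme (an illustration, not DECOLOR itself):
# the background is a rank-1 model (one constant value per pixel) and the
# outlier support is found by thresholding residuals. No contiguity prior
# is used here, which is an assumption made to keep the sketch short.

def alternate_background_outliers(frames, threshold, n_iters=10):
    """frames: list of frames, each a flat list of pixel intensities."""
    n_frames = len(frames)
    n_pixels = len(frames[0])
    # support[t][p] is True when pixel p of frame t is flagged as foreground.
    support = [[False] * n_pixels for _ in range(n_frames)]
    background = [0.0] * n_pixels
    for _ in range(n_iters):
        # Step 1: fit the background using only pixels not flagged as outliers.
        for p in range(n_pixels):
            inliers = [frames[t][p] for t in range(n_frames) if not support[t][p]]
            if inliers:
                background[p] = sum(inliers) / len(inliers)
        # Step 2: re-estimate the outlier support from the residuals.
        new_support = [
            [abs(frames[t][p] - background[p]) > threshold for p in range(n_pixels)]
            for t in range(n_frames)
        ]
        if new_support == support:
            break  # the support stopped changing, so the alternation has converged
        support = new_support
    return background, support
```

Each pass refits the background while ignoring the pixels currently flagged as foreground, then flags again using the improved background, which is the general idea of solving both subproblems in one optimization loop.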
Convex and Network Flow Optimization for Structured Sparsity
We consider a class of learning problems regularized by a structured
sparsity-inducing norm defined as the sum of l_2- or l_infinity-norms over
groups of variables. Whereas much effort has been put into developing fast
optimization techniques when the groups are disjoint or embedded in a
hierarchy, we address here the case of general overlapping groups. To this end,
we present two different strategies: On the one hand, we show that the proximal
operator associated with a sum of l_infinity-norms can be computed exactly in
polynomial time by solving a quadratic min-cost flow problem, allowing the use
of accelerated proximal gradient methods. On the other hand, we use proximal
splitting techniques, and address an equivalent formulation with
non-overlapping groups, but in higher dimension and with additional
constraints. We propose efficient and scalable algorithms exploiting these two
strategies, which are significantly faster than alternative approaches. We
illustrate these methods with several problems such as CUR matrix
factorization, multi-task learning of tree-structured dictionaries, background
subtraction in video sequences, image denoising with wavelets, and topographic
dictionary learning of natural image patches. Comment: to appear in the Journal of Machine Learning Research (JMLR)
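For orientation, the easy case that the abstract contrasts itself with (disjoint groups, l_2-norms) has a closed-form proximal operator, usually called block soft-thresholding. The sketch below covers only that case; it is not the min-cost flow algorithm for overlapping l_infinity groups proposed in the paper. The function name and argument layout are hypothetical.

```python
import math

def prox_group_l2(x, groups, lam):
    """Proximal operator of lam * sum_g ||x_g||_2 for DISJOINT groups.

    x: list of floats; groups: list of index lists that do not overlap.
    Each group is shrunk toward zero and is set exactly to zero when its
    l_2-norm is at most lam.
    """
    out = list(x)
    for g in groups:
        norm = math.sqrt(sum(x[i] ** 2 for i in g))
        # Shrinkage factor max(0, 1 - lam / ||x_g||); a zero group stays zero.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * x[i]
    return out
```

With overlapping groups the per-group updates interact, so this closed form no longer applies; that is the situation the two strategies in the abstract are designed for.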
Structured random measurements in signal processing
Compressed sensing and its extensions have recently triggered interest in
randomized signal acquisition. A key finding is that random measurements
provide sparse signal reconstruction guarantees for efficient and stable
algorithms with a minimal number of samples. While this was first shown for
(unstructured) Gaussian random measurement matrices, applications require
certain structure of the measurements leading to structured random measurement
matrices. Near-optimal recovery guarantees for such structured measurements
have been developed over the past years in a variety of contexts. This article
surveys the theory in three scenarios: compressed sensing (sparse recovery),
low rank matrix recovery, and phaseless estimation. The random measurement
matrices to be considered include random partial Fourier matrices, partial
random circulant matrices (subsampled convolutions), matrix completion, and
phase estimation from magnitudes of Fourier type measurements. The article
concludes with a brief discussion of the mathematical techniques for the
analysis of such structured random measurements. Comment: 22 pages, 2 figures
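As a concrete picture of one of the measurement models named above, a partial random circulant matrix applies a circular convolution with a random generator vector and keeps only a subset of the outputs. The sketch below assumes a Rademacher (random ±1) generator and uniformly sampled output indices; both are illustrative assumptions, and the function names are hypothetical.

```python
import random

def random_partial_circulant(n, m, seed=0):
    """Draw a random generator of length n and m distinct sample indices."""
    rng = random.Random(seed)
    # Assumption: Rademacher entries; other random generators are possible.
    generator = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    sample_indices = sorted(rng.sample(range(n), m))
    return generator, sample_indices

def partial_circulant_measure(x, generator, sample_indices):
    """Entries of the circular convolution of generator and x at sample_indices."""
    n = len(x)
    return [
        sum(generator[(k - j) % n] * x[j] for j in range(n))
        for k in sample_indices
    ]
```

Measuring the first unit vector returns the generator entries at the sampled indices, which is a quick sanity check that the operator is a subsampled convolution.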
Object detection in videos using principal component pursuit and convolutional neural networks
Object recognition in videos is one of the main challenges in computer vision. Several methods have been proposed for this task, such as background subtraction, temporal differencing, optical
flow, and particle filtering, among others. Since the introduction of Convolutional Neural Networks (CNNs)
for object detection in the ImageNet Large Scale Visual Recognition Competition (ILSVRC), their use
for image detection and classification has increased, becoming the state of the art for such tasks, with
Faster R-CNN being the preferred model in the latest ILSVRC challenges. Moreover, the Faster R-CNN
model, with minimal modifications, has been successfully used to detect and classify objects (either
static or dynamic) in video sequences; in such a setup, the frames of the video are input “as is”, i.e.,
without any pre-processing. In this thesis work we propose to use Robust PCA (RPCA, a.k.a. Principal Component Pursuit, PCP) as a video background modeling pre-processing step before applying the Faster R-CNN model, in order to improve the overall detection and classification performance for, specifically, the moving objects. We hypothesize that such a pre-processing step, which segments the moving objects from the background, would reduce the number of regions to be analyzed in a given frame and thus (i) improve the classification time and (ii) reduce the classification error for the dynamic objects present in the video. In particular, we use a fully incremental RPCA / PCP algorithm that is suitable for real-time or online processing. Furthermore, we present extensive computational results obtained on three different platforms: a high-end server with a Tesla K40m GPU, a desktop with a Tesla K10m GPU, and the Jetson TK1 embedded system. Our classification results attain competitive or superior performance in terms of F-measure, achieving an improvement ranging from 3.7% to 97.2%, with a mean improvement of 22%, when the sparse image was used to detect and classify the object with the neural network, while at the same time reducing the classification time on all architectures by a factor ranging between 2% and 25%. Thesis
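One way to picture how a sparse (foreground) component could shrink the area a detector has to examine is to threshold it and take the bounding box of what remains. The abstract does not say the thesis does this; the sketch below is a hypothetical illustration, and the function name, the `threshold` parameter, and the box convention are assumptions.

```python
def foreground_bounding_box(sparse_frame, threshold):
    """Bounding box of pixels whose sparse-component magnitude exceeds threshold.

    sparse_frame: 2-D list (rows of pixel values) from a background/foreground
    decomposition such as RPCA. Returns (top, left, bottom, right) with
    inclusive indices, or None when no pixel exceeds the threshold.
    """
    rows = [
        r for r, row in enumerate(sparse_frame)
        if any(abs(v) > threshold for v in row)
    ]
    cols = [
        c for row in sparse_frame
        for c, v in enumerate(row) if abs(v) > threshold
    ]
    if not rows:
        return None  # no moving object detected in this frame
    return min(rows), min(cols), max(rows), max(cols)
```

A frame with no foreground returns `None`, so a caller could skip the detector for that frame entirely.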