Efficient Discriminative Nonorthogonal Binary Subspace with its Application to Visual Tracking
One of the crucial problems in visual tracking is how the object is
represented. Conventional appearance-based trackers use increasingly complex
features in order to be robust. However, complex representations typically
not only require more computation for feature extraction but also complicate
state inference. We show that with a careful feature
selection scheme, extremely simple yet discriminative features can be used for
robust object tracking. The central component of the proposed method is a
succinct and discriminative representation of the object using discriminative
non-orthogonal binary subspace (DNBS) which is spanned by Haar-like features.
The DNBS representation inherits the merits of the original NBS in that it
efficiently describes the object. It also incorporates the discriminative
information to distinguish foreground from background. However, the problem of
finding the DNBS bases from an over-complete dictionary is NP-hard. We propose
a greedy algorithm called discriminative optimized orthogonal matching pursuit
(D-OOMP) to solve this problem. An iterative formulation, named iterative
D-OOMP, is further developed to drastically reduce redundant computation
between iterations, and a hierarchical selection strategy is integrated to
reduce the search space of features. The proposed DNBS representation is applied to object
tracking through SSD-based template matching. We validate the effectiveness of
our method through extensive experiments on challenging videos with comparisons
against several state-of-the-art trackers and demonstrate its capability to
track objects in cluttered scenes and against moving backgrounds. Comment: 15 pages
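The greedy selection of box-shaped (Haar-like) bases described above belongs to the matching pursuit family. As a minimal sketch, assuming a 1-D signal and a dictionary of unit-norm box functions, plain orthogonal matching pursuit (OMP) picks one atom per step and then jointly re-fits all selected coefficients. This is standard OMP, not the authors' D-OOMP, which additionally adds a discriminative term and avoids redundant recomputation; `box_dictionary` and `omp` are hypothetical names.

```python
import numpy as np


def box_dictionary(n):
    """All unit-norm 1-D box atoms on [a, b) for a signal of length n."""
    boxes = [(a, b) for a in range(n) for b in range(a + 1, n + 1)]
    D = np.zeros((n, len(boxes)))
    for j, (a, b) in enumerate(boxes):
        D[a:b, j] = 1.0 / np.sqrt(b - a)
    return D, boxes


def omp(D, x, k):
    """Greedily select k columns of D; return indices, coefficients, residual."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    selected = []
    coeffs = np.zeros(0)
    for _ in range(k):
        scores = np.abs(D.T @ residual)      # correlation with each atom
        scores[selected] = -np.inf           # never reselect an atom
        selected.append(int(np.argmax(scores)))
        # "Orthogonal" step: least-squares re-fit over all selected atoms.
        coeffs, *_ = np.linalg.lstsq(D[:, selected], x, rcond=None)
        residual = x - D[:, selected] @ coeffs
    return selected, coeffs, residual


D, boxes = box_dictionary(10)
x = np.zeros(10)
x[2:5] = 3.0
x[6:9] = 1.0
selected, coeffs, residual = omp(D, x, k=2)
print([boxes[i] for i in selected])   # [(2, 5), (6, 9)]
```

Once a template is approximated by a few such bases, SSD-based matching compares candidate windows against that approximation; in 2-D, box sums would typically be evaluated in constant time with an integral image.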
Decomposition into Low-rank plus Additive Matrices for Background/Foreground Separation: A Review for a Comparative Evaluation with a Large-Scale Dataset
Recent research on problem formulations based on decomposition into low-rank
plus sparse matrices has shown that they provide a suitable framework for
separating moving objects from the background. The most representative problem
formulation is the Robust Principal Component Analysis (RPCA) solved via
Principal Component Pursuit (PCP), which decomposes a data matrix into a
low-rank matrix and a sparse matrix.
However, similar robust implicit or explicit decompositions can be made in the
following problem formulations: Robust Non-negative Matrix Factorization
(RNMF), Robust Matrix Completion (RMC), Robust Subspace Recovery (RSR), Robust
Subspace Tracking (RST) and Robust Low-Rank Minimization (RLRM). The main goal
of these similar problem formulations is to obtain explicitly or implicitly a
decomposition into a low-rank matrix plus additive matrices. In this context,
this work aims to initiate a rigorous and comprehensive review of the similar
problem formulations in robust subspace learning and tracking based on
decomposition into low-rank plus additive matrices for testing and ranking
existing algorithms for background/foreground separation. For this, we first
provide a preliminary review of the recent developments in the different
problem formulations, which allows us to define a unified view that we call
Decomposition into Low-rank plus Additive Matrices (DLAM). Then, we examine
carefully each method in each robust subspace learning/tracking framework, with
their decomposition, their loss functions, their optimization problem and their
solvers. Furthermore, we investigate whether incremental algorithms and real-time
implementations can be achieved for background/foreground separation. Finally,
experimental results on a large-scale dataset called Background Models
Challenge (BMC 2012) show the comparative performance of 32 different robust
subspace learning/tracking methods. Comment: 121 pages, 5 figures, submitted to Computer Science Review. arXiv
admin note: text overlap with arXiv:1312.7167, arXiv:1109.6297,
arXiv:1207.3438, arXiv:1105.2126, arXiv:1404.7592, arXiv:1210.0805,
arXiv:1403.8067 by other authors, Computer Science Review, November 201
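The RPCA/PCP formulation above solves min ||L||_* + lambda ||S||_1 subject to M = L + S. A minimal sketch, assuming the inexact augmented Lagrange multiplier (ALM) scheme commonly used for PCP and the usual default lambda = 1/sqrt(max(m, n)): singular value thresholding updates L, and entrywise soft thresholding updates S. For background/foreground separation, each column of M would be one vectorized frame, so L holds the background and S the moving objects. The step-size schedule (rho = 1.5) and the function names are assumptions, not taken from the review.

```python
import numpy as np


def soft_threshold(X, tau):
    """Entrywise shrinkage: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * ||X||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def pcp(M, lam=None, tol=1e-7, max_iter=500):
    """Split M into low-rank L and sparse S with inexact ALM."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    norm_two = np.linalg.norm(M, 2)                 # largest singular value
    Y = M / max(norm_two, np.abs(M).max() / lam)    # dual variable init
    mu, rho = 1.25 / norm_two, 1.5
    mu_max = mu * 1e7
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        Z = M - L - S                               # constraint violation
        Y = Y + mu * Z
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(Z) / np.linalg.norm(M) < tol:
            break
    return L, S


rng = np.random.default_rng(0)
L0 = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))   # rank 2
S0 = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05                                 # 5% outliers
S0[mask] = rng.choice([-10.0, 10.0], size=mask.sum())
L, S = pcp(L0 + S0)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))   # relative recovery error
```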
Unsupervised Deep Context Prediction for Background Foreground Separation
In many advanced video-based applications, such as tracking or video
surveillance, background modeling is a pre-processing step used to eliminate
redundant data. Over the past years, background subtraction has usually been
based on low-level or hand-crafted features such as raw color components,
gradients, or local binary patterns. The performance of background subtraction
algorithms suffers in the presence of various challenges such as dynamic
backgrounds, photometric variations, camera jitter, and shadows. To handle
these challenges and achieve accurate background modeling, we propose a
unified framework based on image inpainting: a hybrid Generative Adversarial
algorithm for unsupervised visual feature learning, based on context
prediction. We also present a solution for random region inpainting that fuses
center region inpainting and random region inpainting using the Poisson
blending technique. Furthermore, we evaluate foreground object detection by
fusing our proposed method with morphological operations. A comparison of our
proposed method with 12 state-of-the-art methods shows its stability in
background estimation and foreground detection. Comment: 17 pages
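The last step above, turning a background estimate into a foreground mask and cleaning it with morphological operations, can be sketched in NumPy alone. This minimal sketch assumes grayscale frames, a background image already produced by some estimator (here simply given), a hypothetical difference threshold of 25 gray levels, and a 3x3 opening; it does not reproduce the paper's GAN-based inpainting or Poisson blending.

```python
import numpy as np


def _neighbourhoods(mask):
    """Stack the 3x3 neighbourhood of each pixel, treating the outside as False."""
    padded = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    return np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])


def erode(mask):
    return _neighbourhoods(mask).all(axis=0)


def dilate(mask):
    return _neighbourhoods(mask).any(axis=0)


def foreground_mask(frame, background, thresh=25.0):
    """Threshold |frame - background|, then apply a 3x3 morphological opening."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    raw = diff > thresh
    return dilate(erode(raw))   # opening removes specks smaller than 3x3


background = np.zeros((10, 10))
frame = background.copy()
frame[2:6, 2:6] = 100.0   # a 4x4 object
frame[8, 8] = 100.0       # an isolated noisy pixel
mask = foreground_mask(frame, background)
print(mask.sum())          # 16: the object survives, the speck does not
```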
Unsupervised learning of foreground object detection
Unsupervised learning poses one of the most difficult challenges in computer
vision today. The task has an immense practical value with many applications in
artificial intelligence and emerging technologies, as large quantities of
unlabeled videos can be collected at relatively low cost. In this paper, we
address the unsupervised learning problem in the context of detecting the main
foreground objects in single images. We train a student deep network to predict
the output of a teacher pathway that performs unsupervised object discovery in
videos or large image collections. Our approach is different from published
methods on unsupervised object discovery. We move the unsupervised learning
phase to training time; at test time, we apply the standard
feed-forward processing along the student pathway. This strategy has the
benefit of allowing increased generalization possibilities during training,
while remaining fast at testing. Our unsupervised learning algorithm can run
over several generations of student-teacher training. Thus, a group of student
networks trained in the first generation collectively create the teacher at the
next generation. In experiments, our method achieves top results on three
current datasets for object discovery in video, unsupervised image segmentation
and saliency detection. At test time, the proposed system is one to two
orders of magnitude faster than published unsupervised methods. Comment: International Journal of Computer Vision (IJCV), 201
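The generational scheme above, where the students trained in one generation jointly act as the next teacher, can be outlined as follows. This is a minimal sketch under assumptions: the teacher combines students by a per-pixel majority vote over their masks, and `train_student` is a hypothetical callable standing in for training a deep network on (image, pseudo-mask) pairs; neither detail is taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

Predictor = Callable[[np.ndarray], np.ndarray]   # image -> mask in [0, 1]


def ensemble_teacher(students: List[Predictor], agree: float = 0.5) -> Predictor:
    """Teacher = per-pixel average of the students' masks, thresholded."""
    def teacher(image: np.ndarray) -> np.ndarray:
        votes = np.mean([student(image) for student in students], axis=0)
        return (votes > agree).astype(float)
    return teacher


def run_generations(images: Sequence[np.ndarray], teacher: Predictor,
                    train_student: Callable[..., Predictor],
                    num_students: int = 3,
                    num_generations: int = 2) -> Tuple[List[Predictor], Predictor]:
    students: List[Predictor] = []
    for gen in range(num_generations):
        # The current teacher labels the unlabeled data (training time only).
        pseudo_masks = [teacher(image) for image in images]
        students = [train_student(images, pseudo_masks, seed=gen * num_students + k)
                    for k in range(num_students)]
        # Students of this generation collectively form the next teacher.
        teacher = ensemble_teacher(students)
    return students, teacher
```

At test time, only one student's feed-forward pass would be run on each image; the ensemble teacher is needed only to produce pseudo-labels during training.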
Stories in the Eye: Contextual Visual Interactions for Efficient Video to Language Translation
Integrating higher level visual and linguistic interpretations is at the
heart of human intelligence. Although automatic visual category recognition in
images is approaching human performance, high-level understanding in the
dynamic spatiotemporal domain of videos, and its translation into natural
language, is still far from solved. While most works on vision-to-text
translation use pre-learned or pre-established computational linguistic models,
in this paper we present an approach that uses vision alone to efficiently learn
how to translate video content into language. We discover, in simple form, the
story played out by the main actors, using only visual cues to represent
objects and their interactions. Our method hierarchically learns higher-level
representations for recognizing the subjects, actions, and objects involved,
their relevant contextual background, and their interactions with one another
over time. Our approach has three stages: first, we consider features of the
individual entities at the local level of appearance; second, we consider the
relationships between these objects and actions and their video background; and
third, we use their spatiotemporal relations as inputs to classifiers at the
highest level of interpretation.
Thus, our approach finds a coherent linguistic description of videos in the
form of a subject, verb and object based on their role played in the overall
visual story learned directly from training data, without using a known
language model. We test the efficiency of our approach on a large scale dataset
containing YouTube clips taken in the wild and demonstrate state-of-the-art
performance, often superior to current approaches that use more complex,
pre-learned linguistic knowledge.
Efficient Image Splicing Localization via Contrastive Feature Extraction
In this work, we propose a new data visualization and clustering technique
for discovering discriminative structures in high-dimensional data. This
technique, referred to as cPCA++, utilizes the fact that the interesting
features of a "target" dataset may be obscured by high variance components
during traditional PCA. By analyzing what is referred to as a "background"
dataset (i.e., one that exhibits the high variance principal components but not
the interesting structures), our technique is capable of efficiently
highlighting the structure that is unique to the "target" dataset. Similar to
another recently proposed algorithm called "contrastive PCA" (cPCA), the
proposed cPCA++ method identifies important dataset specific patterns that are
not detected by traditional PCA in a wide variety of settings. However, the
proposed cPCA++ method is significantly more efficient than cPCA, because it
does not require the parameter sweep in the latter approach. We applied the
cPCA++ method to the problem of image splicing localization. In this
application, we utilize authentic edges as the background dataset and the
spliced edges as the target dataset. The proposed method is significantly more
efficient than state-of-the-art methods, as it requires neither
iterative updates of filter weights via stochastic gradient descent and
backpropagation nor the training of a classifier. Furthermore, the cPCA++
method is shown to provide performance scores comparable to the
state-of-the-art Multi-task Fully Convolutional Network (MFCN). Comment: This manuscript was submitted for publication.
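As we understand it, cPCA++ avoids cPCA's sweep over a contrast parameter by solving the generalized eigenproblem Rx v = lambda Ry v, where Rx and Ry are the target and background covariance matrices; the top eigenvectors are the directions with the largest target-to-background variance ratio. The sketch below assumes Ry is positive definite and solves the problem with a Cholesky whitening step in NumPy; the function name `cpca_pp` and the synthetic data are illustrative only.

```python
import numpy as np


def cpca_pp(target, background, k=1):
    """Top-k directions maximising target variance relative to background variance."""
    Xc = target - target.mean(axis=0)
    Yc = background - background.mean(axis=0)
    Rx = Xc.T @ Xc / len(Xc)            # target covariance
    Ry = Yc.T @ Yc / len(Yc)            # background covariance (assumed positive definite)
    C_inv = np.linalg.inv(np.linalg.cholesky(Ry))   # Ry = C C^T
    A = C_inv @ Rx @ C_inv.T            # symmetric; same eigenvalues as Ry^{-1} Rx
    eigvals, W = np.linalg.eigh(A)      # ascending order
    order = np.argsort(eigvals)[::-1][:k]
    V = C_inv.T @ W[:, order]           # solutions of Rx v = lambda Ry v
    return V / np.linalg.norm(V, axis=0), eigvals[order]


rng = np.random.default_rng(0)
# Both sets share a dominant axis 0; only the target has extra spread on axis 2.
background = rng.standard_normal((2000, 3)) * [10.0, 1.0, 1.0]
target = rng.standard_normal((2000, 3)) * [10.0, 1.0, 5.0]
V, ratios = cpca_pp(target, background)
print(np.round(np.abs(V[:, 0]), 2))   # dominated by the third coordinate
```

Plain PCA on the target alone would return axis 0, the high-variance direction shared with the background; dividing out the background covariance is what exposes the target-specific axis.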
Unsupervised learning from video to detect foreground objects in single images
Unsupervised learning from visual data is one of the most difficult
challenges in computer vision, being a fundamental task for understanding how
visual recognition works. From a practical point of view, learning from
unsupervised visual input is immensely valuable, as very large
quantities of unlabeled videos can be collected at low cost. In this paper, we
address the task of unsupervised learning to detect and segment foreground
objects in single images. We achieve our goal by training a student pathway,
consisting of a deep neural network, that learns to predict, from a single
input image (a video frame), the output that a teacher pathway performing
unsupervised object discovery in video produces for that frame. Our approach is
different from the published literature that performs unsupervised discovery in
videos or in collections of images at test time. We move the unsupervised
discovery phase to the training stage, while at test time we apply the
standard feed-forward processing along the student pathway. This has a dual
benefit: first, it allows, in principle, unlimited possibilities for learning
and generalization during training, while remaining very fast at testing.
Second, the student not only detects objects in single images significantly
better than its unsupervised video discovery teacher, but also achieves
state-of-the-art results on two important current benchmarks, the YouTube
Objects and Object Discovery datasets. Moreover, at test time, our system is at
least two orders of magnitude faster than previous methods.
Background Subtraction using Compressed Low-resolution Images
Image processing and recognition are an important part of modern society,
with applications in fields such as advanced artificial intelligence, smart
assistants, and security surveillance. The essential first step in almost all
visual tasks with a static camera is background subtraction. Ensuring that this
critical step is performed as efficiently as possible would therefore improve
all aspects related to object recognition and tracking, behavior comprehension,
etc. Although background subtraction has been applied for many years, meeting
real-time requirements remains a challenge. In this letter, we present a novel
approach to implementing background subtraction. The proposed method uses
compressed, low-resolution grayscale images for background subtraction. These
low-resolution grayscale images were found to preserve the salient information
very well. To verify the feasibility of our methodology, two prevalent methods,
ViBe and GMM, are used in the experiments. The results confirm the
effectiveness of our approach. Comment: 4 pages, 36 figures
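To illustrate background subtraction on small grayscale frames, the sketch below converts an RGB frame to grayscale, shrinks it by block averaging, and feeds it to a simple running-average background model. It is a minimal sketch, assuming BT.601 luma weights, a hypothetical downsampling factor of 4, and hypothetical `alpha` and `thresh` values; the running average stands in for ViBe or GMM, which the letter actually uses, and no image compression is modeled.

```python
import numpy as np


def to_low_res_gray(rgb, factor=4):
    """RGB (H, W, 3) -> grayscale, then block-average downsampling by `factor`."""
    gray = rgb.astype(float) @ np.array([0.299, 0.587, 0.114])   # BT.601 luma
    h, w = gray.shape
    h, w = h - h % factor, w - w % factor                       # crop to a multiple
    blocks = gray[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


class RunningAverageBackground:
    """Very simple background model: exponential running average plus threshold."""

    def __init__(self, first_frame, alpha=0.05, thresh=20.0):
        self.bg = first_frame.astype(float)
        self.alpha = alpha
        self.thresh = thresh

    def apply(self, frame):
        fg = np.abs(frame - self.bg) > self.thresh
        # Update the model only where the pixel looks like background.
        blended = (1 - self.alpha) * self.bg + self.alpha * frame
        self.bg = np.where(fg, self.bg, blended)
        return fg


empty = np.zeros((16, 16, 3), dtype=np.uint8)
model = RunningAverageBackground(to_low_res_gray(empty))
frame = empty.copy()
frame[0:4, 0:4] = 255                    # a bright 4x4 object in the top-left block
mask = model.apply(to_low_res_gray(frame))
print(mask.shape, int(mask.sum()))        # (4, 4) 1
```

Working on the 4x4 grid instead of the 16x16 frame cuts the per-pixel work by a factor of 16, which is the kind of saving the letter targets.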
Screen Content Image Segmentation Using Robust Regression and Sparse Decomposition
This paper considers how to separate text and/or graphics from smooth
background in screen content and mixed document images and proposes two
approaches to perform this segmentation task. The proposed methods make use of
the fact that the background in each block is usually smoothly varying and can
be modeled well by a linear combination of a few smoothly varying basis
functions, while the foreground text and graphics create sharp discontinuities.
The algorithms separate background and foreground pixels by trying to fit the
background pixel values in each block with a smooth function, using two
different schemes. One is based on robust regression: inlier pixels are
considered background, while the remaining outlier pixels are considered
foreground. The second approach uses a sparse decomposition framework in which
the background and foreground layers are modeled as smooth and sparse
components, respectively. These algorithms have been tested on images extracted
from HEVC standard test sequences for screen content coding and are shown to
outperform previous approaches. The proposed methods can be used in different
applications such as text extraction, separate coding of background and
foreground for compression of screen content, and medical image segmentation.
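The robust-regression scheme can be sketched on a single square block: fit the pixel values with a few smooth basis functions, treat the pixels the fit explains as background, and re-fit on those inliers only. This minimal sketch assumes a quadratic 2-D polynomial basis and a hypothetical residual threshold of 10 gray levels, and uses simple iterative outlier rejection as the robust fitting step; the paper's actual basis functions and robust estimator may differ.

```python
import numpy as np


def smooth_basis(n):
    """Quadratic 2-D polynomial basis (1, x, y, x^2, xy, y^2) on an n x n block."""
    y, x = np.mgrid[0:n, 0:n] / max(n - 1, 1)
    cols = [np.ones_like(x), x, y, x * x, x * y, y * y]
    return np.stack(cols, axis=-1).reshape(-1, len(cols))


def segment_block(block, thresh=10.0, iters=5):
    """Return a boolean mask that is True for foreground (outlier) pixels."""
    n = block.shape[0]
    A = smooth_basis(n)
    b = block.reshape(-1).astype(float)
    inlier = np.ones(b.size, dtype=bool)
    for _ in range(iters):
        # Least-squares fit of the smooth model using current inliers only.
        coef, *_ = np.linalg.lstsq(A[inlier], b[inlier], rcond=None)
        inlier = np.abs(A @ coef - b) <= thresh
    return ~inlier.reshape(n, n)


n = 16
yy, xx = np.mgrid[0:n, 0:n]
block = 100.0 + 2.0 * xx + 1.0 * yy    # smoothly varying background
block[5:7, 3:12] = 0.0                 # a dark text stroke
mask = segment_block(block)
print(mask.sum())                      # 18 stroke pixels flagged as foreground
```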
The DDO IVC Distance Project
We present the first set of distance limits from the David Dunlap Observatory
Intermediate Velocity Cloud (DDO IVC) distance project. Such distance measures
are crucial to understanding the origins and dynamics of IVCs, as the distances
set most of the basic physical parameters for the clouds. Currently there are
very few IVCs with reliably known distances. This paper describes in some
detail the basic techniques used to measure distances, with particular emphasis
on the analysis of interstellar absorption line data, which forms the basis
of our distance determinations. As an example, we provide a detailed
description of our distance determination for the Draco Cloud. Preliminary
distance limits for a total of eleven clouds are provided. Comment: 11 pages, 5 figures, to appear in 'High-Velocity Clouds', ASP
Conference Series
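The distance limits rest on a bracketing argument that fits in a few lines: if the cloud's absorption line appears in a star's spectrum, the cloud lies in front of that star (an upper limit), while a sufficiently sensitive non-detection toward a star places the cloud beyond it (a lower limit). This minimal sketch assumes each probe star has a known distance and that every non-detection is sensitive enough to count; the `Probe` class and its fields are hypothetical, and a real analysis must also weigh line strengths and distance uncertainties.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Probe:
    distance_pc: float   # distance to the background star, in parsecs
    detected: bool       # is the cloud's absorption seen toward this star?


def distance_bracket(probes: List[Probe]) -> Tuple[Optional[float], Optional[float]]:
    """Return (lower_limit, upper_limit) on the cloud distance; None if unconstrained."""
    # Nearest star showing absorption: the cloud must be closer than it.
    upper = min((p.distance_pc for p in probes if p.detected), default=None)
    # Farthest star without absorption: the cloud must lie beyond it.
    lower = max((p.distance_pc for p in probes if not p.detected), default=None)
    return lower, upper


stars = [Probe(300, False), Probe(800, False), Probe(1200, True), Probe(2000, True)]
print(distance_bracket(stars))   # (800, 1200)
```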