5,853 research outputs found
A Deep-structured Conditional Random Field Model for Object Silhouette Tracking
In this work, we introduce a deep-structured conditional random field
(DS-CRF) model for the purpose of state-based object silhouette tracking. The
proposed DS-CRF model consists of a series of state layers, where each state
layer spatially characterizes the object silhouette at a particular point in
time. The interactions between adjacent state layers are established by
inter-layer connectivity dynamically determined based on inter-frame optical
flow. By incorporate both spatial and temporal context in a dynamic fashion
within such a deep-structured probabilistic graphical model, the proposed
DS-CRF model allows us to develop a framework that can accurately and
efficiently track object silhouettes that can change greatly over time, as well
as under different situations such as occlusion and multiple targets within the
scene. Experiment results using video surveillance datasets containing
different scenarios such as occlusion and multiple targets showed that the
proposed DS-CRF approach provides strong object silhouette tracking performance
when compared to baseline methods such as mean-shift tracking, as well as
state-of-the-art methods such as context tracking and boosted particle
filtering.Comment: 17 page
Online Domain Adaptation for Multi-Object Tracking
Automatically detecting, labeling, and tracking objects in videos depends
first and foremost on accurate category-level object detectors. These might,
however, not always be available in practice, as acquiring high-quality large
scale labeled training datasets is either too costly or impractical for all
possible real-world application scenarios. A scalable solution consists in
re-using object detectors pre-trained on generic datasets. This work is the
first to investigate the problem of on-line domain adaptation of object
detectors for causal multi-object tracking (MOT). We propose to alleviate the
dataset bias by adapting detectors from category to instances, and back: (i) we
jointly learn all target models by adapting them from the pre-trained one, and
(ii) we also adapt the pre-trained model on-line. We introduce an on-line
multi-task learning algorithm to efficiently share parameters and reduce drift,
while gradually improving recall. Our approach is applicable to any linear
object detector, and we evaluate both cheap "mini-Fisher Vectors" and expensive
"off-the-shelf" ConvNet features. We quantitatively measure the benefit of our
domain adaptation strategy on the KITTI tracking benchmark and on a new dataset
(PASCAL-to-KITTI) we introduce to study the domain mismatch problem in MOT.Comment: To appear at BMVC 201
Online learning and detection of faces with low human supervision
The final publication is available at link.springer.comWe present an efficient,online,and interactive approach for computing a classifier, called Wild Lady Ferns (WiLFs), for face learning and detection using small human supervision. More precisely, on the one hand, WiLFs combine online boosting and extremely randomized trees (Random Ferns) to compute progressively an efficient and discriminative classifier. On the other hand, WiLFs use an interactive human-machine approach that combines two complementary learning strategies to reduce considerably the degree of human supervision during learning. While the first strategy corresponds to query-by-boosting active learning, that requests human assistance over difficult samples in function of the classifier confidence, the second strategy refers to a memory-based learning which uses ¿ Exemplar-based Nearest Neighbors (¿ENN) to assist automatically the classifier. A pre-trained Convolutional Neural Network (CNN) is used to perform ¿ENN with high-level feature descriptors. The proposed approach is therefore fast (WilFs run in 1 FPS using a code not fully optimized), accurate (we obtain detection rates over 82% in complex datasets), and labor-saving (human assistance percentages of less than 20%).
As a byproduct, we demonstrate that WiLFs also perform semi-automatic annotation during learning, as while the classifier is being computed, WiLFs are discovering faces instances in input images which are used subsequently for training online the classifier. The advantages of our approach are demonstrated in synthetic and publicly available databases, showing comparable detection rates as offline approaches that require larger amounts of handmade training data.Peer ReviewedPostprint (author's final draft
Face analysis using curve edge maps
This paper proposes an automatic and real-time system for face analysis, usable in visual communication applications. In this approach, faces are represented with Curve Edge Maps, which are collections of polynomial segments with a convex region. The segments are extracted from edge pixels using an adaptive incremental linear-time fitting algorithm, which is based on constructive polynomial fitting. The face analysis system considers face tracking, face recognition and facial feature detection, using Curve Edge Maps driven by histograms of intensities and histograms of relative positions. When applied to different face databases and video sequences, the average face recognition rate is 95.51%, the average facial feature detection rate is 91.92% and the accuracy in location of the facial features is 2.18% in terms of the size of the face, which is comparable with or better than the results in literature. However, our method has the advantages of simplicity, real-time performance and extensibility to the different aspects of face analysis, such as recognition of facial expressions and talking
Good Features to Correlate for Visual Tracking
During the recent years, correlation filters have shown dominant and
spectacular results for visual object tracking. The types of the features that
are employed in these family of trackers significantly affect the performance
of visual tracking. The ultimate goal is to utilize robust features invariant
to any kind of appearance change of the object, while predicting the object
location as properly as in the case of no appearance change. As the deep
learning based methods have emerged, the study of learning features for
specific tasks has accelerated. For instance, discriminative visual tracking
methods based on deep architectures have been studied with promising
performance. Nevertheless, correlation filter based (CFB) trackers confine
themselves to use the pre-trained networks which are trained for object
classification problem. To this end, in this manuscript the problem of learning
deep fully convolutional features for the CFB visual tracking is formulated. In
order to learn the proposed model, a novel and efficient backpropagation
algorithm is presented based on the loss function of the network. The proposed
learning framework enables the network model to be flexible for a custom
design. Moreover, it alleviates the dependency on the network trained for
classification. Extensive performance analysis shows the efficacy of the
proposed custom design in the CFB tracking framework. By fine-tuning the
convolutional parts of a state-of-the-art network and integrating this model to
a CFB tracker, which is the top performing one of VOT2016, 18% increase is
achieved in terms of expected average overlap, and tracking failures are
decreased by 25%, while maintaining the superiority over the state-of-the-art
methods in OTB-2013 and OTB-2015 tracking datasets.Comment: Accepted version of IEEE Transactions on Image Processin
Data association and occlusion handling for vision-based people tracking by mobile robots
This paper presents an approach for tracking multiple persons on a mobile robot with a combination of colour and thermal vision sensors, using several new techniques. First, an adaptive colour model is incorporated into the measurement model of the tracker. Second, a new approach for detecting occlusions is introduced, using a machine learning classifier for pairwise comparison of persons (classifying which one is in front of the other). Third, explicit occlusion handling is incorporated into the tracker. The paper presents a comprehensive, quantitative evaluation of the whole system and its different components using several real world data sets
- …