Robust Visual Tracking via Convolutional Networks
Deep networks have been successfully applied to visual tracking by learning a
generic representation offline from numerous training images. However, the
offline training is time-consuming and the learned generic representation may
be less discriminative for tracking specific objects. In this paper we show
that, even without offline training with a large amount of auxiliary data,
simple two-layer convolutional networks can be powerful enough to develop a
robust representation for visual tracking. In the first frame, we employ the
k-means algorithm to extract a set of normalized patches from the target region
as fixed filters, which integrate a series of adaptive contextual filters
surrounding the target to define a set of feature maps in the subsequent
frames. These maps measure similarities between each filter and the useful
local intensity patterns across the target, thereby encoding its local
structural information. Furthermore, all the maps form together a global
representation, which is built on mid-level features, thereby remaining close
to image-level information, and hence the inner geometric layout of the target
is also well preserved. A simple soft shrinkage method with an adaptive
threshold is employed to de-noise the global representation, resulting in a
robust sparse representation. The representation is updated via a simple and
effective online strategy, allowing it to robustly adapt to target appearance
variations. Our convolutional networks have a surprisingly lightweight structure,
yet perform favorably against several state-of-the-art methods on the CVPR2013
tracking benchmark dataset with 50 challenging videos.
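The soft shrinkage step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the adaptive threshold is chosen, so the median-of-magnitudes rule below is an assumption made for illustration, and the function names `soft_shrink` and `median_threshold` are hypothetical rather than taken from the paper.

```python
def soft_shrink(values, threshold):
    """Soft shrinkage: pull each value toward zero by `threshold`.

    Values whose magnitude does not exceed the threshold become exactly 0,
    which is what makes the resulting representation sparse.
    """
    out = []
    for v in values:
        magnitude = abs(v) - threshold
        if magnitude <= 0:
            out.append(0.0)
        else:
            out.append(magnitude if v > 0 else -magnitude)
    return out


def median_threshold(values):
    """Assumed adaptive rule (not from the paper): median absolute value."""
    mags = sorted(abs(v) for v in values)
    n = len(mags)
    mid = n // 2
    return mags[mid] if n % 2 else 0.5 * (mags[mid - 1] + mags[mid])


# Toy flattened feature map; real maps would be 2-D filter responses.
feature_map = [0.9, -0.1, 0.05, -0.7, 0.2]
tau = median_threshold(feature_map)
sparse_map = soft_shrink(feature_map, tau)
print(tau, sparse_map)
```

With a threshold tied to the data, weak responses are zeroed while strong ones survive with reduced magnitude, which is the de-noising behaviour the abstract describes.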
Enforcing Template Representability and Temporal Consistency for Adaptive Sparse Tracking
Sparse representation has been widely studied in visual tracking and has
shown promising tracking performance. Despite considerable progress, visual
tracking remains a challenging task due to appearance variations over
time. In this paper, we propose a novel sparse tracking algorithm that
addresses temporal appearance changes, by enforcing template representability
and temporal consistency (TRAC). By modeling temporal consistency, our
algorithm addresses the issue of drifting away from a tracking target. By
exploring the templates' long-term-short-term representability, the proposed
method adaptively updates the dictionary using the most descriptive templates,
which significantly improves the robustness to target appearance changes. We
compare our TRAC algorithm against the state-of-the-art approaches on 12
challenging benchmark image sequences. Both qualitative and quantitative
results demonstrate that our algorithm significantly outperforms previous
state-of-the-art trackers.
Comment: 8 pages. Accepted for publication at the 25th International
Joint Conference on Artificial Intelligence (IJCAI-16)
Robust Visual Tracking Using Dynamic Classifier Selection with Sparse Representation of Label Noise
Recently, a category of tracking methods based on "tracking-by-detection" has
become widely used for the visual tracking problem. Most of these methods update
the classifier online using the samples generated by the tracker to handle
appearance changes. However, this self-updating scheme makes these methods
suffer from the drifting problem because of the incorrect labels of weak
classifiers in training samples. In this paper, we split the class labels into
true labels and noise labels and model them by sparse representation. A novel
dynamic classifier selection method, robust to noisy training data, is
proposed. Moreover, we apply the proposed classifier selection algorithm to
visual tracking by integrating a part based online boosting framework. We have
evaluated our proposed method on 12 challenging sequences involving severe
occlusions, significant illumination changes and large pose variations. Both
the qualitative and quantitative evaluations demonstrate that our approach
tracks objects accurately and robustly and outperforms state-of-the-art
trackers.
Comment: Accepted at ACCV2012, Oral
Robust Structured Group Local Sparse Tracker Using Deep Features
Sparse representation has recently been successfully applied in visual
tracking. It utilizes a set of templates to represent target candidates and
finds the best one with the minimum reconstruction error as the tracking result.
In this paper, we propose a robust deep features-based structured group local
sparse tracker (DF-SGLST), which exploits the deep features of local patches
inside target candidates and represents them by a set of templates in the
particle filter framework. Unlike the conventional local sparse trackers, the
proposed optimization model in DF-SGLST employs a group-sparsity regularization
term to seamlessly adopt local and spatial information of the target candidates
and attain the spatial layout structure among them. To solve the optimization
model, we propose an efficient and fast numerical algorithm that consists of
two subproblems with closed-form solutions. Different evaluations in terms
of success and precision on the benchmarks of challenging image sequences
(e.g., OTB50 and OTB100) demonstrate the superior performance of the proposed
tracker against several state-of-the-art trackers.
Comment: This submission is a similar version of Structured Group Local Sparse
Tracker arXiv:1902.0618
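The idea of picking the candidate with the minimum reconstruction error can be sketched as follows. This is not the DF-SGLST optimization model: as a stand-in for sparse coding, each candidate is coded with only its single best template (one step of matching pursuit), and all names and the toy vectors are hypothetical. Templates are assumed to be non-zero vectors.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def one_atom_error(candidate, templates):
    """Squared error after coding `candidate` with its single best template.

    Simplified stand-in for sparse coding: one non-zero coefficient only.
    """
    best = None
    for t in templates:
        coeff = dot(candidate, t) / dot(t, t)  # least-squares coefficient
        residual = [c - coeff * x for c, x in zip(candidate, t)]
        err = dot(residual, residual)
        if best is None or err < best:
            best = err
    return best


def select_candidate(candidates, templates):
    """Return the index of the candidate with the smallest error, plus all errors."""
    errors = [one_atom_error(c, templates) for c in candidates]
    best_index = min(range(len(candidates)), key=errors.__getitem__)
    return best_index, errors


# Toy feature vectors standing in for templates and target candidates.
templates = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
candidates = [[0.5, 0.5, 0.0], [0.0, 2.0, 2.0], [1.0, 1.0, 0.0]]
best_index, errors = select_candidate(candidates, templates)
print(best_index, errors)
```

A full tracker would replace `one_atom_error` with a proper sparse solver and draw the candidates from a particle filter; the selection rule itself stays the same.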
Hierarchical Spatial-aware Siamese Network for Thermal Infrared Object Tracking
Most thermal infrared (TIR) tracking methods are discriminative, treating the
tracking problem as a classification task. However, the objective of the
classifier (label prediction) is not coupled to the objective of the tracker
(location estimation). The classification task focuses on the between-class
difference of the arbitrary objects, while the tracking task mainly deals with
the within-class difference of the same objects. In this paper, we cast the TIR
tracking problem as a similarity verification task, which is coupled well to
the objective of the tracking task. We propose a TIR tracker via a Hierarchical
Spatial-aware Siamese Convolutional Neural Network (CNN), named HSSNet. To
obtain both spatial and semantic features of the TIR object, we design a
Siamese CNN that coalesces the multiple hierarchical convolutional layers.
Then, we propose a spatial-aware network to enhance the discriminative ability
of the coalesced hierarchical feature. Subsequently, we train this network end
to end on a large visible video detection dataset to learn the similarity
between paired objects before we transfer the network into the TIR domain.
Next, this pre-trained Siamese network is used to evaluate the similarity
between the target template and target candidates. Finally, we locate the
candidate that is most similar to the tracked target. Extensive experimental
results on the benchmarks VOT-TIR 2015 and VOT-TIR 2016 show that our proposed
method achieves favourable performance compared to the state-of-the-art
methods.
Comment: 20 pages, 7 figures
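The final localization step described above, choosing the candidate most similar to the target template, can be sketched under simplifying assumptions. Here the features are assumed to be already extracted as plain non-zero vectors, and cosine similarity stands in for the learned Siamese similarity; neither choice is taken from the paper, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def most_similar(template_feat, candidate_feats):
    """Return the index of the best-matching candidate and all scores."""
    scores = [cosine_similarity(template_feat, c) for c in candidate_feats]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy 2-D features; a real tracker would use CNN embeddings of image patches.
template_feat = [1.0, 0.0]
candidate_feats = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
best, scores = most_similar(template_feat, candidate_feats)
print(best, scores)
```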
CARRADA Dataset: Camera and Automotive Radar with Range-Angle-Doppler Annotations
High quality perception is essential for autonomous driving (AD) systems. To
reach the accuracy and robustness that are required by such systems, several
types of sensors must be combined. Currently, mostly cameras and laser scanners
(lidar) are deployed to build a representation of the world around the vehicle.
While radar sensors have been used for a long time in the automotive industry,
they are still under-used for AD despite their appealing characteristics
(notably, their ability to measure the relative speed of obstacles and to
operate even in adverse weather conditions). To a large extent, this situation
is due to the relative lack of automotive datasets with real radar signals that
are both raw and annotated. In this work, we introduce CARRADA, a dataset of
synchronized camera and radar recordings with range-angle-Doppler annotations.
We also present a semi-automatic annotation approach, which was used to
annotate the dataset, and a radar semantic segmentation baseline, which we
evaluate on several metrics. Both our code and dataset are available online.
Comment: 8 pages, 5 figures. Accepted at ICPR 2020. Erratum: results in Table
III have been updated since the ICPR proceedings, models are selected using
the PP metric instead of the previously used PR metric
A Collaborative Computer Aided Diagnosis (C-CAD) System with Eye-Tracking, Sparse Attentional Model, and Deep Learning
There are at least two categories of errors in radiology screening that can
lead to suboptimal diagnostic decisions and interventions: (i) human fallibility
and (ii) complexity of visual search. Computer aided diagnostic (CAD) tools are
developed to help radiologists to compensate for some of these errors. However,
despite their significant improvements over conventional screening strategies,
most CAD systems do not go beyond their use as second opinion tools due to
producing a high number of false positives, which human interpreters need to
correct. In parallel with efforts in computerized analysis of radiology scans,
several researchers have examined behaviors of radiologists while screening
medical images to better understand how and why they miss tumors, how they
interact with the information in an image, and how they search for unknown
pathology in the images. Eye-tracking tools have been instrumental in exploring
answers to these fundamental questions. In this paper, we aim to develop a
paradigm shift CAD system, called collaborative CAD (C-CAD), that unifies both
of the above mentioned research lines: CAD and eye-tracking. We design an
eye-tracking interface providing radiologists with a real radiology reading
room experience. Then, we propose a novel algorithm that unifies eye-tracking
data and a CAD system. Specifically, we present a new graph based clustering
and sparsification algorithm to transform eye-tracking data (gaze) into a
signal model to interpret gaze patterns quantitatively and qualitatively. The
proposed C-CAD collaborates with radiologists via eye-tracking technology and
helps them to improve diagnostic decisions. The C-CAD learns radiologists'
search efficiency by processing their gaze patterns. To do this, the C-CAD uses
a deep learning algorithm in a newly designed multi-task learning platform to
segment and diagnose cancers simultaneously.
Comment: Submitted to Medical Image Analysis Journal (MedIA)
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at CVPR2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we propose
"DeepSurvey" as a mechanism embodying the entire process, from reading
through all the papers and generating ideas to writing the paper.
Comment: Survey Paper
Unsupervised Object-Level Video Summarization with Online Motion Auto-Encoder
Unsupervised video summarization plays an important role in digesting,
browsing, and searching the ever-growing volume of videos, yet the underlying
fine-grained semantic and motion information (i.e., objects of interest and
their key motions) in online videos has barely been touched. In this paper, we
investigate a pioneering research direction towards fine-grained unsupervised
object-level video summarization. It can be distinguished from existing
pipelines in two aspects: extracting key motions of participating objects, and
learning to summarize in an unsupervised and online manner. To achieve this
goal, we propose a novel online motion Auto-Encoder (online motion-AE)
framework that functions on the super-segmented object motion clips.
Comprehensive experiments on a newly-collected surveillance dataset and public
datasets have demonstrated the effectiveness of our proposed method.
Unsupervised Person Re-identification by Deep Learning Tracklet Association
Most existing person re-identification (re-id) methods rely on supervised model
learning on per-camera-pair manually labelled pairwise training data. This
leads to poor scalability in practical re-id deployment due to the lack of
exhaustive identity labelling of image positive and negative pairs for every
camera pair. In this work, we address this problem by proposing an unsupervised
re-id deep learning approach capable of incrementally discovering and
exploiting the underlying re-id discriminative information from automatically
generated person tracklet data from videos in an end-to-end model optimisation.
We formulate a Tracklet Association Unsupervised Deep Learning (TAUDL)
framework characterised by jointly learning per-camera (within-camera) tracklet
association (labelling) and cross-camera tracklet correlation by maximising the
discovery of most likely tracklet relationships across camera views. Extensive
experiments demonstrate the superiority of the proposed TAUDL model over the
state-of-the-art unsupervised and domain adaptation re-id methods using six
person re-id benchmarking datasets.
Comment: ECCV 2018 Oral