Discriminative Scale Space Tracking
Accurate scale estimation of a target is a challenging research problem in
visual object tracking. Most state-of-the-art methods employ an exhaustive
scale search to estimate the target size. The exhaustive search strategy is
computationally expensive and struggles when faced with large scale
variations. This paper investigates the problem of accurate and robust scale
estimation in a tracking-by-detection framework. We propose a novel scale
adaptive tracking approach by learning separate discriminative correlation
filters for translation and scale estimation. The explicit scale filter is
learned online using the target appearance sampled at a set of different
scales. Contrary to standard approaches, our method directly learns the
appearance change induced by variations in the target scale. Additionally, we
investigate strategies to reduce the computational cost of our approach.
Extensive experiments are performed on the OTB and the VOT2014 datasets.
Compared to the standard exhaustive scale search, our approach achieves a gain
of 2.5% in average overlap precision on the OTB dataset. Additionally, our
method is computationally efficient, operating at a 50% higher frame rate
compared to the exhaustive scale search. Our method obtains the top rank in
performance by outperforming 19 state-of-the-art trackers on OTB and 37
state-of-the-art trackers on VOT2014.
Comment: To appear in TPAMI. This is the journal extension of the
VOT2014-winning DSST tracking metho
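The idea of a separate filter over the scale dimension can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a MOSSE-style closed-form correlation filter in the Fourier domain, a Gaussian desired response, and hypothetical parameter values (`num_scales=9`, `step=1.02`, `sigma=1.0`, `lam=1e-2`) that the abstract does not specify. Feature extraction at each scale is left out; `features[l][s]` stands for feature channel `l` sampled at scale index `s`.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a short sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def scale_factors(num_scales=9, step=1.02):
    """Relative target sizes step**n, centred on n = 0 (no scale change)."""
    half = (num_scales - 1) // 2
    return [step ** n for n in range(-half, half + 1)]


def train_scale_filter(features, sigma=1.0):
    """Learn numerator A and denominator B of a 1-D filter over scales.

    features[l][s] is the value of feature channel l at scale index s.
    """
    n = len(features[0])
    centre = n // 2
    # Desired response: a Gaussian peaked at the current (centre) scale.
    g = [math.exp(-0.5 * ((s - centre) / sigma) ** 2) for s in range(n)]
    G = dft(g)
    F = [dft(channel) for channel in features]
    A = [[G[k] * F_l[k].conjugate() for k in range(n)] for F_l in F]
    B = [sum(abs(F_l[k]) ** 2 for F_l in F) for k in range(n)]
    return A, B


def scale_response(A, B, features, lam=1e-2):
    """Correlation response over scale indices for a new set of samples."""
    n = len(B)
    Z = [dft(channel) for channel in features]
    Y = [
        sum(A[l][k] * Z[l][k] for l in range(len(Z))) / (B[k] + lam)
        for k in range(n)
    ]
    return [v.real for v in dft(Y, inverse=True)]
```

In this sketch, the index of the largest response selects an entry of `scale_factors()`, which would then multiply the current target size. A single response over all scale indices is computed from one set of samples, rather than re-running the translation filter once per candidate scale.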
Selective Deep Convolutional Features for Image Retrieval
Convolutional Neural Networks (CNNs) are a powerful approach for extracting
discriminative local descriptors for effective image search. Recent work adopts
fine-tuning strategies to further improve the discriminative power of the
descriptors. Taking a different approach, in this paper, we propose a novel
framework to achieve competitive retrieval performance. Firstly, we propose
various masking schemes, namely SIFT-mask, SUM-mask, and MAX-mask, to select a
representative subset of local convolutional features and remove a large number
of redundant features. We demonstrate that this can effectively address the
burstiness issue and improve retrieval accuracy. Secondly, we propose to employ
recent embedding and aggregating methods to further enhance feature
discriminability. Extensive experiments demonstrate that our proposed framework
achieves state-of-the-art retrieval accuracy.
Comment: Accepted to ACM MM 201
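The abstract names the masking schemes but does not define them. The sketch below shows one plausible reading of SUM-mask and MAX-mask, under stated assumptions: a location is kept by the SUM-mask when its activation summed over channels exceeds the median of those sums, and by the MAX-mask when it holds the maximum activation of at least one channel. The function names and the `fmap[c][y][x]` layout are hypothetical, and the SIFT-mask is omitted because it needs a keypoint detector.

```python
from statistics import median


def sum_mask(fmap):
    """Keep locations whose channel-summed activation exceeds the median.

    fmap[c][y][x] is the activation of channel c at location (y, x).
    """
    h, w = len(fmap[0]), len(fmap[0][0])
    sums = [[sum(ch[y][x] for ch in fmap) for x in range(w)] for y in range(h)]
    threshold = median(v for row in sums for v in row)
    return [[sums[y][x] > threshold for x in range(w)] for y in range(h)]


def max_mask(fmap):
    """Keep locations that hold the maximum of at least one channel."""
    h, w = len(fmap[0]), len(fmap[0][0])
    mask = [[False] * w for _ in range(h)]
    for ch in fmap:
        y, x = max(
            ((y, x) for y in range(h) for x in range(w)),
            key=lambda p: ch[p[0]][p[1]],
        )
        if ch[y][x] > 0:  # skip channels that never fire
            mask[y][x] = True
    return mask


def select_descriptors(fmap, mask):
    """Return the C-dimensional local descriptors at the kept locations."""
    h, w = len(mask), len(mask[0])
    return [
        [ch[y][x] for ch in fmap]
        for y in range(h)
        for x in range(w)
        if mask[y][x]
    ]
```

Either mask reduces the set of local descriptors passed on to the embedding and aggregation stage; locations that no mask keeps are simply dropped.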
Cross-dimensional Weighting for Aggregated Deep Convolutional Features
We propose a simple and straightforward way of creating powerful image
representations via cross-dimensional weighting and aggregation of deep
convolutional neural network layer outputs. We first present a generalized
framework that encompasses a broad family of approaches and includes
cross-dimensional pooling and weighting steps. We then propose specific
non-parametric schemes for both spatial- and channel-wise weighting that boost
the effect of highly active spatial responses and at the same time regulate
burstiness effects. We experiment on different public datasets for image search
and show that our approach outperforms the current state-of-the-art for
approaches based on pre-trained networks. We also provide an easy-to-use, open
source implementation that reproduces our results.
Comment: Accepted for publication at the 4th Workshop on Web-scale Vision and
Social Media (VSM), ECCV 201
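A minimal sketch of spatial- and channel-wise weighting before sum-pooling is given below. The specific formulas are assumptions, since the abstract does not state them: spatial weights are taken as the square root of the L2-normalised channel sum at each location, and channel weights as a log inverse-sparsity term that boosts rarely active channels. The function name, the `eps` constant, and the `fmap[c][y][x]` layout are hypothetical, and activations are assumed non-negative (post-ReLU).

```python
import math


def weighted_descriptor(fmap, eps=1e-6):
    """Aggregate a C x H x W feature map into one L2-normalised vector.

    fmap[c][y][x] must be non-negative.
    """
    h, w = len(fmap[0]), len(fmap[0][0])

    # Spatial weights: strong total response gives a larger weight.
    sums = [[sum(ch[y][x] for ch in fmap) for x in range(w)] for y in range(h)]
    norm = math.sqrt(sum(v * v for row in sums for v in row)) or 1.0
    spatial = [[math.sqrt(v / norm) for v in row] for row in sums]

    # Weighted sum-pooling per channel.
    pooled = [
        sum(spatial[y][x] * ch[y][x] for y in range(h) for x in range(w))
        for ch in fmap
    ]

    # Channel weights: channels that fire at few locations are boosted,
    # which damps the influence of frequently firing (bursty) channels.
    density = [sum(1 for row in ch for v in row if v > 0) / (h * w) for ch in fmap]
    total = sum(density)
    channel = [math.log((eps + total) / (eps + d)) for d in density]

    weighted = [p * c for p, c in zip(pooled, channel)]
    length = math.sqrt(sum(v * v for v in weighted)) or 1.0
    return [v / length for v in weighted]
```

For a map with one dense channel and one sparse channel, the sparse channel receives the larger channel weight, so it dominates the final descriptor even if its pooled response is comparable.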
Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking
Discriminative Correlation Filters (DCF) have demonstrated excellent
performance for visual object tracking. The key to their success is the ability
to efficiently exploit available negative data by including all shifted
versions of a training sample. However, the underlying DCF formulation is
restricted to single-resolution feature maps, significantly limiting its
potential. In this paper, we go beyond the conventional DCF framework and
introduce a novel formulation for training continuous convolution filters. We
employ an implicit interpolation model to pose the learning problem in the
continuous spatial domain. Our proposed formulation enables efficient
integration of multi-resolution deep feature maps, leading to superior results
on three object tracking benchmarks: OTB-2015 (+5.1% in mean OP), Temple-Color
(+4.6% in mean OP), and VOT2015 (20% relative reduction in failure rate).
Additionally, our approach is capable of sub-pixel localization, crucial for
the task of accurate feature point tracking. We also demonstrate the
effectiveness of our learning formulation in extensive feature point tracking
experiments. Code and supplementary material are available at
http://www.cvl.isy.liu.se/research/objrec/visualtracking/conttrack/index.html.
Comment: Accepted at ECCV 201
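Sub-pixel localization from a continuous response can be illustrated in one dimension. The sketch below does not reproduce the paper's learned continuous filters or its interpolation model; it substitutes plain trigonometric (Fourier-series) interpolation of a sampled periodic response, followed by a ternary search around the best integer position. Both function names are hypothetical, and the search assumes the response has a single peak within one sample of the discrete maximum.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a short sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def continuous_response(samples):
    """Return y(t), the trigonometric interpolant of periodic samples."""
    n = len(samples)
    X = dft(samples)
    # Map DFT bins to symmetric frequencies so the interpolant is smooth.
    freqs = [k if k <= n // 2 else k - n for k in range(n)]

    def y(t):
        total = sum(
            X[k] * cmath.exp(2j * math.pi * freqs[k] * t / n) for k in range(n)
        )
        return total.real / n

    return y


def subpixel_peak(samples, iters=40):
    """Refine the discrete argmax to a real-valued peak position."""
    n = len(samples)
    y = continuous_response(samples)
    start = max(range(n), key=lambda i: samples[i])
    lo, hi = start - 1.0, start + 1.0
    for _ in range(iters):  # ternary search on a unimodal interval
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if y(m1) < y(m2):
            lo = m1
        else:
            hi = m2
    return ((lo + hi) / 2) % n
```

For example, sampling `cos(2 * pi * (t - 3.3) / 9)` at `t = 0..8` gives a discrete maximum at index 3, while `subpixel_peak` recovers a position close to 3.3.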
Deformable GANs for Pose-based Human Image Generation
In this paper we address the problem of generating person images conditioned
on a given pose. Specifically, given an image of a person and a target pose, we
synthesize a new image of that person in the novel pose. In order to deal with
pixel-to-pixel misalignments caused by the pose differences, we introduce
deformable skip connections in the generator of our Generative Adversarial
Network. Moreover, a nearest-neighbour loss is proposed instead of the common
L1 and L2 losses in order to match the details of the generated image with the
target image. We test our approach on photos of people in different poses
and compare it with previous work in this area, showing state-of-the-art
results on two benchmarks. Our method can be applied to the
wider field of deformable object generation, provided that the pose of the
articulated object can be extracted using a keypoint detector.
Comment: CVPR 2018 versio
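The motivation for a nearest-neighbour loss can be shown with a simplified version. This sketch is an assumption-laden illustration rather than the paper's loss: it works on single-channel images stored as nested lists, compares raw pixel intensities instead of patch features, and uses a hypothetical `radius` parameter for the search window. Each generated pixel is matched to the closest-valued target pixel in its neighbourhood, so small misalignments are not penalised.

```python
def nearest_neighbour_loss(generated, target, radius=1):
    """Mean over pixels of the smallest absolute difference to any target
    pixel within `radius` of the same location.

    With radius=0 this reduces to the ordinary mean L1 loss.
    """
    h, w = len(generated), len(generated[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            total += min(
                abs(generated[y][x] - target[qy][qx])
                for qy in range(max(0, y - radius), min(h, y + radius + 1))
                for qx in range(max(0, x - radius), min(w, x + radius + 1))
            )
    return total / (h * w)
```

If the generated image contains the right detail one pixel away from where it sits in the target, the plain L1 loss (`radius=0`) is positive, while the loss with `radius=1` is zero.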