Good Features to Correlate for Visual Tracking
In recent years, correlation filters have shown dominant and
spectacular results for visual object tracking. The types of features that
are employed in this family of trackers significantly affect the performance
of visual tracking. The ultimate goal is to utilize robust features invariant
to any kind of appearance change of the object, while predicting the object
location as accurately as in the case of no appearance change. As deep
learning based methods have emerged, the study of learning features for
specific tasks has accelerated. For instance, discriminative visual tracking
methods based on deep architectures have been studied with promising
performance. Nevertheless, correlation filter based (CFB) trackers confine
themselves to pre-trained networks that were trained for the object
classification problem. To this end, in this manuscript the problem of learning
deep fully convolutional features for the CFB visual tracking is formulated. In
order to learn the proposed model, a novel and efficient backpropagation
algorithm is presented based on the loss function of the network. The proposed
learning framework enables the network model to be flexible for a custom
design. Moreover, it alleviates the dependency on the network trained for
classification. Extensive performance analysis shows the efficacy of the
proposed custom design in the CFB tracking framework. By fine-tuning the
convolutional parts of a state-of-the-art network and integrating this model into
a CFB tracker, which is the top performing one of VOT2016, an 18% increase is
achieved in terms of expected average overlap, and tracking failures are
decreased by 25%, while maintaining the superiority over the state-of-the-art
methods on the OTB-2013 and OTB-2015 tracking datasets.
Comment: Accepted version of IEEE Transactions on Image Processin
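As background for the abstract above, the following is a minimal sketch of how a generic single-channel correlation filter localizes a target. It is not the paper's learned-feature method: it swaps in a standard ridge-regression (MOSSE-style) filter on a raw 1-D signal, and every name in it (`dft`, `train_filter`, `respond`, the toy `patch`, the regularizer `lam`) is a hypothetical choice made for illustration.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (dependency-free on purpose)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gaussian_target(n, center, sigma=2.0):
    """Desired response: a circular Gaussian peaked at `center`."""
    def circ_dist(i):
        d = abs(i - center)
        return min(d, n - d)
    return [math.exp(-0.5 * (circ_dist(i) / sigma) ** 2) for i in range(n)]


def train_filter(signal, target, lam=1e-2):
    """Closed-form ridge regression in the Fourier domain (assumed form):
    H = G * conj(F) / (F * conj(F) + lam), element-wise."""
    F = dft(signal)
    G = dft(target)
    return [g * f.conjugate() / (f * f.conjugate() + lam) for f, g in zip(F, G)]


def respond(filt, signal):
    """Apply the filter to a new signal and return the real-valued response."""
    Z = dft(signal)
    resp = dft([h * z for h, z in zip(filt, Z)], inverse=True)
    return [r.real for r in resp]


def circular_shift(values, s):
    """Move the content of `values` to the right by `s` samples, wrapping around."""
    n = len(values)
    return [values[(i - s) % n] for i in range(n)]


def peak_index(response):
    return max(range(len(response)), key=response.__getitem__)


# Toy "object": a small bump in an otherwise empty 1-D patch.
n = 32
patch = [0.0] * n
patch[10:14] = [1.0, 2.0, 2.0, 1.0]

filt = train_filter(patch, gaussian_target(n, center=12))

# The object moves 5 samples; the response peak should move with it.
moved = circular_shift(patch, 5)
peak = peak_index(respond(filt, moved))
print("estimated location:", peak)
```

In this toy setting the response peak follows the circular shift of the input, which is the property CFB trackers rely on. In a tracker of the kind the abstract describes, the raw signal would be replaced by multi-channel feature maps.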
Deep Attributes Driven Multi-Camera Person Re-identification
The visual appearance of a person is easily affected by many factors like
pose variations, viewpoint changes and camera parameter differences. This makes
person Re-Identification (ReID) among multiple cameras a very challenging task.
This work is motivated to learn mid-level human attributes which are robust to
such visual appearance variations. We propose a semi-supervised attribute
learning framework which progressively boosts the accuracy of attributes using
only a limited amount of labeled data. Specifically, this framework involves
three-stage training. A deep Convolutional Neural Network (dCNN) is first
trained on an independent dataset labeled with attributes. Then it is
fine-tuned on another dataset labeled only with person IDs using our defined
triplet loss. Finally, the updated dCNN predicts attribute labels for the
target dataset, which is combined with the independent dataset for the final
round of fine-tuning. The predicted attributes, namely deep attributes,
exhibit superior generalization ability across different datasets. By directly
using the deep attributes with simple cosine distance, we have obtained
surprisingly good accuracy on four person ReID datasets. Experiments also show
that a simple metric learning module further boosts our method, making it
significantly outperform many recent works.
Comment: Person Re-identification; 17 pages; 5 figures; In IEEE ECCV 201
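The matching step mentioned in the abstract above, comparing attribute vectors with cosine distance, might look roughly like the sketch below. The attribute vectors, the person IDs, and the helper names `cosine_distance` and `rank_gallery` are all made up for illustration. In the described system the vectors would come from the fine-tuned dCNN.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return gallery IDs sorted from most to least similar to the query."""
    return sorted(gallery, key=lambda pid: cosine_distance(query, gallery[pid]))


# Hypothetical attribute scores (for example: long hair, backpack, dark top, shorts).
gallery = {
    "person_a": [0.1, 0.9, 0.2, 0.7],
    "person_b": [0.8, 0.2, 0.9, 0.1],
    "person_c": [0.5, 0.5, 0.5, 0.5],
}
query = [0.9, 0.1, 0.8, 0.2]

ranking = rank_gallery(query, gallery)
print(ranking)
```

Because cosine distance ignores vector magnitude, it compares the pattern of attribute scores rather than their overall scale, which is one plausible reason it works without a learned metric.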