Self-tuned Visual Subclass Learning with Shared Samples: An Incremental Approach
Computer vision tasks are traditionally defined and evaluated using semantic
categories. However, it is well known in the field that a semantic class does
not necessarily correspond to a single visual class (e.g. the inside and the
outside of a car). Furthermore, many of the tractable learning techniques at
hand cannot model such a semantic class even though it appears consistent to
the human eye. These problems have motivated the use of: 1) unsupervised or
supervised clustering as a preprocessing step to identify the visual subclasses
used in a mixture-of-experts learning regime; 2) latent-variable mixture
assignments that are optimized during learning, as in the part model of
Felzenszwalb et al.; and 3) highly non-linear classifiers that are inherently
capable of modelling a multi-modal input space but are inefficient at test
time. In this work, we promote an incremental view of the recognition of
semantic classes with varied appearances. We propose an optimization technique
that incrementally finds maximal visual subclasses in a regularized risk
minimization framework. Our proposed approach unifies the clustering and
classification steps in a single algorithm. Its importance lies in its
compatibility with classification: unlike pre-processing clustering methods, it
does not need to know the number of clusters, the representation, or the
similarity measure a priori. Following this approach, we show significant
results both qualitatively and quantitatively. We show that the visual
subclasses exhibit a long-tail distribution. Finally, we show that
state-of-the-art object detection methods (e.g. DPM) are unable to use the tail
of this distribution, which comprises 50% of the training samples. In fact, we
show that DPM performance slightly increases on average when this half of the
data is removed.
Comment: Updated ICCV 2013 submission
Unsupervised Domain Adaptation: A Multi-task Learning-based Method
This paper presents a novel multi-task learning-based method for unsupervised
domain adaptation. Specifically, the source and target domain classifiers are
jointly learned by considering the geometry of the target domain and the
divergence between the source and target domains, based on the concept of
multi-task learning. Two novel algorithms are derived from the method, using
Regularized Least Squares and Support Vector Machines respectively. Experiments
on both synthetic and real-world cross-domain recognition tasks show that the
proposed methods outperform several state-of-the-art domain adaptation methods.
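The joint learning idea above can be sketched as a regularized least squares variant. The objective below — a divergence term tying the two classifiers together, a graph-Laplacian term encoding target-domain geometry, and the specific hyper-parameters — is an illustrative assumption, not the paper's exact formulation:

```python
import numpy as np

def rbf_laplacian(X, sigma=1.0):
    """Unnormalized graph Laplacian from an RBF affinity (target geometry)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(1)) - W

def joint_rls(Xs, ys, Xt, lam_div=1.0, lam_geo=0.1, gamma=0.01):
    """Jointly learn source/target linear classifiers (hypothetical objective):
       ||Xs ws - ys||^2 + lam_div ||ws - wt||^2
       + lam_geo wt' Xt' L Xt wt + gamma (||ws||^2 + ||wt||^2).
    Setting the gradients to zero yields one block linear system in (ws, wt)."""
    d = Xs.shape[1]
    L = rbf_laplacian(Xt)
    I = np.eye(d)
    A = np.block([
        [Xs.T @ Xs + (lam_div + gamma) * I, -lam_div * I],
        [-lam_div * I, lam_div * I + lam_geo * Xt.T @ L @ Xt + gamma * I],
    ])
    b = np.concatenate([Xs.T @ ys, np.zeros(d)])
    w = np.linalg.solve(A, b)
    return w[:d], w[d:]
```

The closed-form block solve corresponds to the Regularized Least Squares instantiation; the SVM variant would replace the squared loss with a hinge loss and require an iterative solver.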
Adaptive Locality Preserving Regression
This paper proposes a novel discriminative regression method, called adaptive
locality preserving regression (ALPR), for classification. In particular, ALPR
aims to learn a more flexible and discriminative projection that not only
preserves the intrinsic structure of the data, but also possesses the
properties of feature selection and interpretability. To this end, we introduce
a target learning technique that adaptively learns a more discriminative and
flexible target matrix for regression, rather than the pre-defined strict
zero-one label matrix. A locality preserving constraint, regularized by the
adaptively learned weights, is then introduced to guide the projection
learning, which helps learn a more discriminative projection and avoid
overfitting. Moreover, we replace the conventional Frobenius norm with the
l2,1 norm to constrain the projection, which enables the method to adaptively
select the most important features of the original high-dimensional data for
feature extraction. In this way, the negative influence of the redundant
features and noise residing in the original data is greatly reduced. Besides,
the proposed method has good interpretability for features owing to the
row-sparsity property of the l2,1 norm. Extensive experiments on a synthetic
database with manifold structure and many real-world databases prove the
effectiveness of the proposed method.
Comment: The paper has been accepted by IEEE Transactions on Circuits and
Systems for Video Technology (TCSVT), and the code can be found at
https://drive.google.com/file/d/1iNzONkRByIaUhXwdEhOkkh_0d2AAXNE8/vie
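An l2,1-constrained regression of the kind described above is commonly solved by iteratively reweighted least squares. The sketch below shows only that ingredient — the paper's adaptive target matrix and locality preserving term are omitted, and T is a fixed target matrix:

```python
import numpy as np

def alpr_l21_sketch(X, T, lam=1.0, n_iter=30, eps=1e-8):
    """Row-sparse projection via iteratively reweighted least squares:
       min_W ||X W - T||_F^2 + lam * ||W||_{2,1}.
    Each step solves a weighted ridge problem; rows of W with small norm
    receive a large penalty weight and are driven toward zero."""
    d = X.shape[1]
    XtX, XtT = X.T @ X, X.T @ T
    W = np.linalg.solve(XtX + lam * np.eye(d), XtT)  # ridge warm start
    for _ in range(n_iter):
        row_norms = np.sqrt((W ** 2).sum(1)) + eps   # eps guards division by 0
        D = np.diag(1.0 / (2.0 * row_norms))         # reweighting from the l2,1 subgradient
        W = np.linalg.solve(XtX + lam * D, XtT)
    return W
```

The row-sparsity this produces is what gives the selected-feature interpretability mentioned in the abstract: features whose rows of W collapse to (near) zero are effectively discarded.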
Deep Motion Features for Visual Tracking
Robust visual tracking is a challenging computer vision problem with many
real-world applications. Most existing approaches employ hand-crafted
appearance features, such as HOG or Color Names. Recently, deep RGB features
extracted from convolutional neural networks have been successfully applied for
tracking. Despite their success, these features only capture appearance
information. Motion cues, on the other hand, provide discriminative and
complementary information that can improve tracking performance. In contrast to
visual tracking, deep motion features have been successfully applied to action
recognition and video classification tasks. Typically, the motion features are
learned by training a CNN on optical flow images extracted from large amounts
of labeled videos.
This paper investigates the impact of deep motion features in a
tracking-by-detection framework. We further show that hand-crafted, deep RGB,
and deep motion features contain complementary information. To the best of our
knowledge, we are the first to propose fusing appearance information with deep
motion features for visual tracking. Comprehensive experiments clearly show
that our fusion approach with deep motion features outperforms standard methods
relying on appearance information alone.
Comment: ICPR 2016. Best paper award in the "Computer Vision and Robot Vision"
track
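A minimal sketch of late fusion in a tracking-by-detection setting: per-feature correlation response maps (e.g. from hand-crafted, deep RGB, and deep motion channels) are normalized and combined before localizing the target. The equal weights and the simple weighted sum are assumptions for illustration, not the paper's exact fusion scheme:

```python
import numpy as np

def fuse_responses(responses, weights=None):
    """Fuse per-feature-channel correlation response maps and localize the
    target at the peak of the fused map."""
    responses = [r / (np.abs(r).max() + 1e-12) for r in responses]  # match scales
    if weights is None:
        weights = [1.0 / len(responses)] * len(responses)           # equal weighting
    fused = sum(w * r for w, r in zip(weights, responses))
    loc = np.unravel_index(np.argmax(fused), fused.shape)           # peak = target
    return loc, fused
```

When two of the three channels agree on a peak location, the fused map selects it even if the third channel fires elsewhere, which is one way complementary cues can stabilize tracking.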
Learning joint feature adaptation for zero-shot recognition
Zero-shot recognition (ZSR) aims to recognize target-domain data instances of unseen classes based on models learned from associated pairs of seen-class source and target domain data. One of the key challenges in ZSR is the relative scarcity of source-domain features (e.g. one feature vector per class), which do not fully account for the wide variability of target-domain instances. In this paper we propose a novel framework that learns data-dependent feature transforms for scoring the similarity between an arbitrary pair of source and target data instances, so as to account for this variability. Our approach optimizes over a parameterized family of local feature displacements that maximize source-target adaptive similarity functions. Accordingly, we formulate zero-shot learning (ZSL) using latent structural SVMs to learn our similarity functions from training data. As a demonstration, we design a specific algorithm under the proposed framework with bilinear similarity functions and regularized least squares penalties on the feature displacement. We test our approach on several benchmark datasets for ZSR and show significant improvement over the state-of-the-art. For instance, on the aP&Y dataset we achieve 80.89% recognition accuracy, outperforming the state-of-the-art by 11.15%.
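Combining a bilinear similarity with a regularized-least-squares penalty on the feature displacement makes the inner maximization a concave quadratic with a closed form. The sketch below illustrates that step only; W, a_c, and mu are illustrative placeholders, not the parameters learned by the latent structural SVM:

```python
import numpy as np

def adapted_score(x, a_c, W, mu=10.0):
    """Adaptive similarity with an optimal local feature displacement:
       f(x, c) = max_delta (x + delta)' W a_c - (mu/2) ||delta||^2.
    Setting the gradient to zero gives delta* = (1/mu) W a_c, hence
       f(x, c) = x' W a_c + ||W a_c||^2 / (2 mu)."""
    v = W @ a_c                      # class-conditional direction in feature space
    return x @ v + (v @ v) / (2.0 * mu)
```

Because the displacement term is non-negative, the adapted score never falls below the plain bilinear score x' W a_c, and the strength of the adaptation is controlled by the penalty weight mu.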