3,050 research outputs found
Regularized Discriminant Embedding for Visual Descriptor Learning
Images can vary according to changes in viewpoint, resolution, noise, and
illumination. In this paper, we aim to learn representations for an image,
which are robust to wide changes in such environmental conditions, using
training pairs of matching and non-matching local image patches that are
collected under various environmental conditions. We present a regularized
discriminant analysis that emphasizes two challenging categories among the
given training pairs: (1) matching, but far apart pairs and (2) non-matching,
but close pairs in the original feature space (e.g., SIFT feature space).
Compared to existing work on metric learning and discriminant analysis, our
method can better distinguish relevant images from irrelevant, but look-alike
images.
Comment: 3 pages + 1 additional page containing only cited references. The full version of this manuscript is currently under review in an international journal.
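The hard-pair emphasis described in this abstract can be sketched roughly as follows. All names (`hard_pair_weights`, `discriminant_projection`), the `top_frac` fraction, and the 2x weight are illustrative assumptions, not the paper's actual formulation:

```python
# Sketch: upweight the two hard pair categories, then solve a regularized
# generalized eigenproblem over weighted pair-difference scatter matrices.
import numpy as np
from scipy.linalg import eigh

def hard_pair_weights(X1, X2, labels, top_frac=0.25):
    """Upweight the two hard categories: matching-but-far and
    non-matching-but-close pairs in the original feature space."""
    d = np.linalg.norm(X1 - X2, axis=1)
    w = np.ones(len(d))
    k = max(1, int(top_frac * len(d)))
    match = labels == 1
    # matching pairs with the largest distances are "hard"
    hard_pos = np.argsort(np.where(match, -d, 0.0))[:k]
    # non-matching pairs with the smallest distances are "hard"
    hard_neg = np.argsort(np.where(match, np.inf, d))[:k]
    w[hard_pos] = 2.0
    w[hard_neg] = 2.0
    return w

def discriminant_projection(X1, X2, labels, w, dim=2, reg=1e-3):
    """Weighted scatter of pair differences; the regularized generalized
    eigenproblem separates matching from non-matching pairs."""
    diff = X1 - X2
    pos, neg = labels == 1, labels == 0
    Sw = (w[pos, None] * diff[pos]).T @ diff[pos] + reg * np.eye(X1.shape[1])
    Sb = (w[neg, None] * diff[neg]).T @ diff[neg]
    vals, vecs = eigh(Sb, Sw)                  # eigenvalues ascending
    return vecs[:, np.argsort(vals)[::-1][:dim]]
```

The regularization term keeps the within scatter invertible when training pairs are scarce, which is one plausible role of the "regularized" in the title.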
Soft Locality Preserving Map (SLPM) for Facial Expression Recognition
For image recognition, an extensive number of methods have been proposed to
overcome the high-dimensionality problem of the feature vectors used. These
methods vary from unsupervised to supervised, and from statistics-based to
graph-theory-based. In this paper, the most popular and the state-of-the-art
methods for dimensionality reduction are firstly reviewed, and then a new and
more efficient manifold-learning method, named Soft Locality Preserving Map
(SLPM), is presented. Furthermore, feature generation and sample selection are
proposed to achieve better manifold learning. SLPM is a graph-based
subspace-learning method, with the use of k-neighbourhood information and the
class information. The key feature of SLPM is that it aims to control the level
of spread of the different classes, because the spread of the classes in the
underlying manifold is closely connected to the generalizability of the learned
subspace. Our proposed manifold-learning method can be applied to various
pattern recognition applications, and we evaluate its performance on facial
expression recognition. Experiments on databases, such as the Bahcesehir
University Multilingual Affective Face Database (BAUM-2), the Extended
Cohn-Kanade (CK+) Database, the Japanese Female Facial Expression (JAFFE)
Database, and the Taiwanese Facial Expression Image Database (TFEID), show that
SLPM can effectively reduce the dimensionality of the feature vectors and
enhance the discriminative power of the extracted features for expression
recognition. Furthermore, the proposed feature-generation method can improve
the generalizability of the underlying manifolds for facial expression
recognition.
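A minimal sketch of a supervised locality-preserving projection in the spirit of SLPM: within-class k-NN pairs are pulled together while between-class k-NN pairs are pushed apart. The function name and unit graph weights are assumptions, and SLPM's explicit control over class spread is not reproduced here:

```python
# Sketch: supervised locality graphs + generalized eigenproblem.
import numpy as np
from scipy.linalg import eigh

def slpm_like_projection(X, y, k=5, dim=2, reg=1e-4):
    # Two k-NN graphs with class information: same-class neighbor pairs
    # (to keep close) and different-class neighbor pairs (to spread apart).
    n, d = X.shape
    D2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    Ww = np.zeros((n, n))
    Wb = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(D2[i])[1:k + 1]:   # skip self at index 0
            if y[i] == y[j]:
                Ww[i, j] = Ww[j, i] = 1.0
            else:
                Wb[i, j] = Wb[j, i] = 1.0
    Lw = np.diag(Ww.sum(1)) - Ww               # within-class graph Laplacian
    Lb = np.diag(Wb.sum(1)) - Wb               # between-class graph Laplacian
    A = X.T @ Lw @ X + reg * np.eye(d)         # minimize within-class locality
    B = X.T @ Lb @ X + reg * np.eye(d)         # maximize between-class spread
    vals, vecs = eigh(A, B)                    # ascending eigenvalues
    return vecs[:, :dim]                       # smallest ratio directions
```

The small ridge term `reg` keeps both scatter matrices well conditioned, a common device when the number of samples is below the feature dimensionality.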
Face Recognition: From Traditional to Deep Learning Methods
Starting in the seventies, face recognition has become one of the most
researched topics in computer vision and biometrics. Traditional methods based
on hand-crafted features and traditional machine learning techniques have
recently been superseded by deep neural networks trained with very large
datasets. In this paper we provide a comprehensive and up-to-date literature
review of popular face recognition methods including both traditional
(geometry-based, holistic, feature-based and hybrid methods) and deep learning
methods.
Learning Expressionlets via Universal Manifold Model for Dynamic Facial Expression Recognition
Facial expression is a temporally dynamic event which can be decomposed into a
set of muscle motions occurring in different facial regions over various time
intervals. For dynamic expression recognition, two key issues, temporal
alignment and semantics-aware dynamic representation, must be taken into
account. In this paper, we attempt to solve both problems via manifold modeling
of videos based on a novel mid-level representation, i.e., the expressionlet.
Specifically, our method contains three key stages: 1)
each expression video clip is characterized as a spatial-temporal manifold
(STM) formed by dense low-level features; 2) a Universal Manifold Model (UMM)
is learned over all low-level features and represented as a set of local modes
to statistically unify all the STMs; 3) the local modes on each STM can be
instantiated by fitting to the UMM, and the corresponding expressionlet is
constructed by modeling the variations in each local mode. With the above strategy,
expression videos are naturally aligned both spatially and temporally. To
enhance the discriminative power, the expressionlet-based STM representation is
further processed with discriminant embedding. Our method is evaluated on four
public expression databases, CK+, MMI, Oulu-CASIA, and FERA. In all cases, our
method outperforms the known state-of-the-art by a large margin.
Comment: 12 pages.
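The three-stage pipeline can be illustrated with a toy stand-in: k-means modes in place of the GMM-style Universal Manifold Model, and a per-mode variance vector as the expressionlet. All names and these simplifications are assumptions for illustration, not the paper's method:

```python
# Sketch: (1) pool low-level features, (2) learn shared local modes,
# (3) describe each video by per-mode variation.
import numpy as np

def learn_modes(all_feats, k=4, iters=20, seed=0):
    # Stand-in for the UMM: k-means modes over features pooled from all
    # videos (the paper learns a statistical model, not plain k-means).
    rng = np.random.default_rng(seed)
    modes = all_feats[rng.choice(len(all_feats), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((all_feats[:, None] - modes) ** 2).sum(-1), 1)
        for j in range(k):
            pts = all_feats[assign == j]
            if len(pts):
                modes[j] = pts.mean(0)
    return modes

def expressionlet(video_feats, modes):
    # One block per shared mode: the variation (here simply the per-dimension
    # variance, for brevity) of the video's features assigned to that mode.
    # Using the same modes for every video aligns videos spatio-temporally.
    assign = np.argmin(((video_feats[:, None] - modes) ** 2).sum(-1), 1)
    blocks = []
    for j in range(len(modes)):
        pts = video_feats[assign == j]
        blocks.append(pts.var(0) if len(pts) else np.zeros(modes.shape[1]))
    return np.concatenate(blocks)
```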
Kernel Methods on Riemannian Manifolds with Gaussian RBF Kernels
In this paper, we develop an approach to exploiting kernel methods with
manifold-valued data. In many computer vision problems, the data can be
naturally represented as points on a Riemannian manifold. Due to the
non-Euclidean geometry of Riemannian manifolds, usual Euclidean computer vision
and machine learning algorithms yield inferior results on such data. In this
paper, we define Gaussian radial basis function (RBF)-based positive definite
kernels on manifolds that permit us to embed a given manifold with a
corresponding metric in a high dimensional reproducing kernel Hilbert space.
These kernels make it possible to utilize algorithms developed for linear
spaces on nonlinear manifold-valued data. Since the Gaussian RBF defined with
any given metric is not always positive definite, we present a unified
framework for analyzing the positive definiteness of the Gaussian RBF on a
generic metric space. We then use the proposed framework to identify positive
definite kernels on two specific manifolds commonly encountered in computer
vision: the Riemannian manifold of symmetric positive definite matrices and the
Grassmann manifold, i.e., the Riemannian manifold of linear subspaces of a
Euclidean space. We show that many popular algorithms designed for Euclidean
spaces, such as support vector machines, discriminant analysis and principal
component analysis can be generalized to Riemannian manifolds with the help of
such positive definite Gaussian kernels.
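The SPD-manifold case can be sketched concretely: with the log-Euclidean metric the squared distance becomes Euclidean in log-space, so the Gaussian RBF built on it is positive definite. A minimal sketch (function names are illustrative):

```python
# Sketch: Gaussian RBF kernel on SPD matrices via the log-Euclidean metric.
import numpy as np

def spd_log(M):
    # Matrix logarithm of an SPD matrix via its eigendecomposition.
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def gaussian_spd_kernel(mats, gamma=0.5):
    # k(A, B) = exp(-gamma * ||log A - log B||_F^2). Because the distance
    # is Euclidean after the matrix log, the kernel matrix is positive
    # definite, so SVM/kernel-PCA/kernel-LDA apply directly.
    logs = [spd_log(M) for M in mats]
    n = len(mats)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            d2 = ((logs[i] - logs[j]) ** 2).sum()
            K[i, j] = np.exp(-gamma * d2)
    return K
```

Such a Gram matrix can be handed to any kernelized Euclidean algorithm unchanged, which is the point the abstract makes.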
MKL-RT: Multiple Kernel Learning for Ratio-trace Problems via Convex Optimization
In the recent past, automatic selection or combination of kernels (or
features) based on multiple kernel learning (MKL) approaches has been receiving
significant attention from various research communities. Though MKL has been
extensively studied in the context of support vector machines (SVM), it is
relatively less explored for ratio-trace problems. In this paper, we show that
MKL can be formulated as a convex optimization problem for a general class of
ratio-trace problems that encompasses many popular algorithms used in various
computer vision applications. We also provide an optimization procedure that is
guaranteed to converge to the global optimum of the proposed optimization
problem. We experimentally demonstrate that the proposed MKL approach, which we
refer to as MKL-RT, can be successfully used to select features for
discriminative dimensionality reduction and cross-modal retrieval. We also show
that the proposed convex MKL-RT approach performs better than the recently
proposed non-convex MKL-DR approach.
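A hedged sketch of the two ingredients: a ratio-trace subproblem is solved by a generalized eigendecomposition, and base kernels are combined with weights. Here the weights are fixed rather than optimized convexly as in MKL-RT, and all names are illustrative:

```python
# Sketch: ratio-trace directions + fixed-weight kernel combination.
import numpy as np
from scipy.linalg import eigh

def ratio_trace_directions(A, B, dim=2, reg=1e-6):
    # Ratio-trace problem max_W tr((W^T B W)^{-1} W^T A W) with symmetric A
    # and B: solved by the top generalized eigenvectors of the pencil (A, B).
    B = B + reg * np.eye(B.shape[0])
    vals, vecs = eigh(A, B)                    # ascending eigenvalues
    return vecs[:, np.argsort(vals)[::-1][:dim]]

def combine_kernels(kernels, betas):
    # Convex combination of base Gram matrices. MKL-RT would optimize the
    # betas jointly with the projection (convexly, per the abstract);
    # here they are simply normalized and fixed.
    betas = np.asarray(betas, float)
    betas = betas / betas.sum()
    return sum(b * K for b, K in zip(betas, kernels))
```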
Scalable Similarity Learning using Large Margin Neighborhood Embedding
Classifying large-scale image data into object categories is an important
problem that has received increasing research attention. Given the huge amount
of data, non-parametric approaches such as nearest neighbor classifiers have
shown promising results, especially when they are underpinned by a learned
distance or similarity measurement. Although metric learning has been well
studied in the past decades, most existing algorithms are impractical to handle
large-scale data sets. In this paper, we present an image similarity learning
method that can scale well in both the number of images and the dimensionality
of image descriptors. To this end, similarity comparison is restricted to each
sample's local neighbors and a discriminative similarity measure is induced
from large margin neighborhood embedding. We also exploit the ensemble of
projections so that high-dimensional features can be processed in a set of
lower-dimensional subspaces in parallel without much performance compromise.
The similarity function is learned online using a stochastic gradient descent
algorithm in which the triplet sampling strategy is customized for quick
convergence of classification performance. The effectiveness of our proposed
model is validated on several data sets with scales varying from tens of
thousands to one million images. Recognition accuracies competitive with the
state-of-the-art performance are achieved with much higher efficiency and
scalability.
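The online learning step can be sketched as hinge-loss SGD over triplets under a linear embedding. The function and its parameters are illustrative assumptions; the paper's customized triplet sampling, neighborhood restriction, and ensemble of projections are omitted:

```python
# Sketch: stochastic triplet hinge loss for a linear similarity embedding.
import numpy as np

def triplet_sgd_metric(X, y, dim=2, lr=0.01, margin=1.0, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], dim))
    idx = np.arange(len(X))
    for _ in range(steps):
        a = rng.integers(len(X))
        pos = np.flatnonzero((y == y[a]) & (idx != a))   # same class
        neg = np.flatnonzero(y != y[a])                  # other classes
        if not len(pos) or not len(neg):
            continue
        p, n = rng.choice(pos), rng.choice(neg)
        za, zp, zn = X[a] @ W, X[p] @ W, X[n] @ W
        # hinge: positive must be closer than negative by `margin`
        if margin + ((za - zp) ** 2).sum() - ((za - zn) ** 2).sum() > 0:
            # active constraint: pull the positive in, push the negative out
            W -= lr * (2 * np.outer(X[a] - X[p], za - zp)
                       - 2 * np.outer(X[a] - X[n], za - zn))
    return W
```

Because each update touches only three samples and one low-rank matrix, the per-step cost is independent of the dataset size, which is what makes this style of learning scale.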
Deep Linear Discriminant Analysis on Fisher Networks: A Hybrid Architecture for Person Re-identification
Person re-identification seeks a correct match for a person of interest
across views among a large number of imposters. It typically involves two
procedures: non-linear feature extraction against dramatic appearance
changes, and subsequent discriminative analysis to reduce intra-personal
variations while enlarging inter-personal differences. In this paper,
we introduce a hybrid architecture which combines Fisher vectors and deep
neural networks to learn non-linear representations of person images to a space
where the data can be linearly separable. We enforce a Linear Discriminant
Analysis (LDA) objective on top of the deep neural network such that linearly separable
latent representations can be learnt in an end-to-end fashion. By optimizing an
objective function modified from LDA, the network is enforced to produce
feature distributions which have a low variance within the same class and high
variance between classes. The objective is essentially derived from the general
LDA eigenvalue problem and allows the network to be trained with stochastic
gradient descent, back-propagating LDA gradients to compute the gradients involved in
Fisher vector encoding. For evaluation we test our approach on four benchmark
data sets in person re-identification (VIPeR [1], CUHK03 [2], CUHK01 [3], and
Market1501 [4]). Extensive experiments on these benchmarks show that our model
can achieve state-of-the-art results.
Comment: 12 pages.
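The LDA-derived objective can be illustrated by evaluating the generalized eigenvalues of the between/within scatter pair on a batch of feature vectors. Here the features are plain arrays, back-propagation through the network is omitted, and all names are illustrative:

```python
# Sketch: the LDA eigenvalue quantity the abstract's objective is built from.
import numpy as np
from scipy.linalg import eigh

def lda_eigen_objective(F, y, reg=1e-3):
    # F: (n, d) batch of network outputs, y: class labels.
    # Generalized eigenvalues of (Sb, Sw): large values mean low variance
    # within each class and high variance between classes, which is exactly
    # the distribution property the network is pushed toward.
    classes = np.unique(y)
    mu = F.mean(0)
    d = F.shape[1]
    Sw = reg * np.eye(d)                       # regularized within scatter
    Sb = np.zeros((d, d))
    for c in classes:
        Fc = F[y == c]
        mc = Fc.mean(0)
        Sw += (Fc - mc).T @ (Fc - mc)
        Sb += len(Fc) * np.outer(mc - mu, mc - mu)
    return eigh(Sb, Sw, eigvals_only=True)     # ascending, all >= 0
```

Maximizing (a robust function of) these eigenvalues is differentiable in `F`, which is what lets LDA sit on top of a network trained with SGD.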
Optimized Kernel-based Projection Space of Riemannian Manifolds
It has been shown that encoding images and videos through Symmetric Positive
Definite (SPD) matrices, and considering the Riemannian geometry of the
resulting space, can lead to increased classification performance. Taking into
account manifold geometry is typically done via embedding the manifolds in
tangent spaces, or Reproducing Kernel Hilbert Spaces (RKHS). Recently, it was
shown that embedding such manifolds into a Random Projection Space (RPS),
rather than RKHS or tangent space, leads to higher classification and
clustering performance. However, based on structure and dimensionality of the
randomly generated hyperplanes, the classification performance over RPS may
vary significantly. In addition, fine-tuning RPS is data-expensive (as it
requires validation data), time-consuming, and resource-demanding. In this
paper, we introduce an approach to learn an optimized kernel-based projection
(with fixed dimensionality), by employing the concept of subspace clustering.
As such, we encode the association of data points to the underlying subspace of
each point, to generate meaningful hyperplanes. Further, we adopt the concepts
of dictionary learning, sparse coding, and discriminant analysis for the
optimized kernel-based projection space (OPS) on SPD manifolds. We validate our
algorithm on several classification tasks. The experimental results
demonstrate that the proposed method outperforms state-of-the-art methods on
such manifolds.
Comment: 14 pages, 6 figures, conference.
Enhancing Person Re-identification in a Self-trained Subspace
Despite the promising progress made in recent years, person re-identification
(re-ID) remains a challenging task due to the complex variations in human
appearances from different camera views. For this challenging problem, a large
variety of algorithms have been developed in the fully-supervised setting,
requiring access to a large amount of labeled training data. However, the main
bottleneck for fully-supervised re-ID is the limited availability of labeled
training samples. To address this problem, in this paper, we propose a
self-trained subspace learning paradigm for person re-ID which effectively
utilizes both labeled and unlabeled data to learn a discriminative subspace
where person images across disjoint camera views can be easily matched. The
proposed approach first constructs pseudo pairwise relationships among
unlabeled persons using the k-nearest neighbors algorithm. Then, with the
pseudo pairwise relationships, the unlabeled samples can be easily combined
with the labeled samples to learn a discriminative projection by solving an
eigenvalue problem. In addition, we refine the pseudo pairwise relationships
iteratively, which further improves the learning performance. A multi-kernel
embedding strategy is also incorporated into the proposed approach to cope with
the non-linearity in a person's appearance and exploit the complementarity of
multiple kernels. In this way, the performance of person re-ID can be greatly
enhanced when training data are insufficient. Experimental results on six
widely-used datasets demonstrate the effectiveness of our approach and its
performance can be comparable to the reported results of most state-of-the-art
fully-supervised methods while using much fewer labeled data.Comment: Accepted by ACM Transactions on Multimedia Computing, Communications,
and Applications (TOMM
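The pseudo pairwise construction can be sketched with mutual k-nearest neighbors among unlabeled samples; this reading of the abstract's k-NN step, and all names, are assumptions for illustration:

```python
# Sketch: pseudo positive pairs from mutual k-NN among unlabeled samples.
import numpy as np

def pseudo_pairs(X_unlabeled, k=3):
    n = len(X_unlabeled)
    D2 = ((X_unlabeled[:, None] - X_unlabeled[None]) ** 2).sum(-1)
    np.fill_diagonal(D2, np.inf)               # exclude self-matches
    nbrs = np.argsort(D2, 1)[:, :k]            # k nearest neighbors per sample
    pairs = []
    for i in range(n):
        for j in nbrs[i]:
            if i in nbrs[j] and i < j:         # mutual neighbors only
                pairs.append((i, int(j)))
    return pairs
```

These pseudo pairs can then be merged with the labeled pairs so that the discriminative projection is still obtained from a single eigenvalue problem; refining the pairs after each projection update gives the iterative scheme the abstract describes.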