Canonical Correlation Analysis (CCA) Based Multi-View Learning: An Overview
Multi-view learning (MVL) is a strategy for fusing data from different
sources or subsets. Canonical correlation analysis (CCA) plays a central role in
MVL; its main idea is to map data from different views onto a common space in
which their correlation is maximized. Traditional CCA can only capture the
linear correlation between two views. Moreover, it is unsupervised, so label
information is wasted in supervised learning tasks. Many nonlinear,
supervised, or generalized extensions have been proposed to overcome these
limitations. However, to our knowledge, there is no up-to-date overview of
these approaches. This paper fills this gap by providing a comprehensive
overview of many classical and recent CCA approaches and describing their
typical applications in pattern recognition, multi-modal retrieval and
classification, and multi-view embedding.
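For readers new to the area, the following is a minimal sketch of the classical two-view CCA that these extensions build on, using scikit-learn's CCA; the synthetic data and all dimensions here are illustrative only.

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=(n, 2))                                        # shared latent signal
X = z @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(n, 10))  # view 1
Y = z @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(n, 8))    # view 2

cca = CCA(n_components=2)
X_c, Y_c = cca.fit(X, Y).transform(X, Y)     # project both views to a common space

# Canonical correlations: per-component correlation between the projections.
for k in range(2):
    r = np.corrcoef(X_c[:, k], Y_c[:, k])[0, 1]
    print(f"component {k}: correlation {r:.3f}")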
Embedded Deep Bilinear Interactive Information and Selective Fusion for Multi-view Learning
As a concrete application of multi-view learning, multi-view classification
improves the traditional classification methods significantly by integrating
various views optimally. Although most previous efforts have demonstrated the
superiority of multi-view learning, it can be further improved by embedding
more powerful cross-view interactive information and a more reliable multi-view
fusion strategy. To fulfill this goal, we propose a novel multi-view learning
framework that improves multi-view classification along these two aspects. That
is, we seamlessly embed various intra-view information, cross-view
multi-dimension bilinear interactive information, and a new view ensemble
mechanism into a unified framework to make a decision via optimization. In
particular, we train different deep neural networks to learn various intra-view
representations, and then dynamically learn multi-dimension bilinear
interactive information from different bilinear similarities via the bilinear
function between views. After that, we adaptively fuse the representations of
multiple views by flexibly tuning the view-weight parameters, which not only
avoids trivial weight solutions but also provides a new way to select a few
discriminative views that benefit the final decision in multi-view
classification. Extensive experiments on six publicly available datasets
demonstrate the effectiveness of the proposed method.
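As a toy illustration of a cross-view bilinear similarity of the kind this abstract describes, s(x, y) = x^T W y, and not the paper's exact architecture: the weight matrix and dimensions below are invented for illustration.

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=64)          # intra-view representation, view A
y = rng.normal(size=32)          # intra-view representation, view B
W = rng.normal(size=(64, 32))    # bilinear weight; learned in practice

score = x @ W @ y                # cross-view bilinear similarity
print(score)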
A Comprehensive Survey on Cross-modal Retrieval
In recent years, cross-modal retrieval has drawn much attention due to the
rapid growth of multimodal data. It takes one type of data as the query to
retrieve relevant data of another type. For example, a user can use a text to
retrieve relevant pictures or videos. Since the query and its retrieved results
can be of different modalities, how to measure the content similarity between
different modalities of data remains a challenge. Various methods have been
proposed to deal with such a problem. In this paper, we first review a number
of representative methods for cross-modal retrieval and classify them into two
main groups: 1) real-valued representation learning, and 2) binary
representation learning. Real-valued representation learning methods aim to
learn real-valued common representations for different modalities of data. To
speed up cross-modal retrieval, a number of binary representation learning
methods have been proposed to map different modalities of data into a common Hamming
space. Then, we introduce several multimodal datasets in the community, and
show the experimental results on two commonly used multimodal datasets. The
comparison reveals the characteristics of different kinds of cross-modal
retrieval methods, which is expected to benefit both practical applications and
future research. Finally, we discuss open problems and future research
directions.
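As a rough sketch of the real-valued common-representation idea surveyed here, with random projections standing in for learned ones: each modality is mapped into a shared space and results are ranked by cosine similarity.

import numpy as np

rng = np.random.default_rng(2)
P_txt = rng.normal(size=(300, 64))          # hypothetical text projection
P_img = rng.normal(size=(512, 64))          # hypothetical image projection

query_txt = rng.normal(size=300)            # one text query
gallery_img = rng.normal(size=(1000, 512))  # image database

q = query_txt @ P_txt                       # text -> common space
G = gallery_img @ P_img                     # images -> common space
sims = (G @ q) / (np.linalg.norm(G, axis=1) * np.linalg.norm(q) + 1e-12)
print(np.argsort(-sims)[:10])               # indices of the best-matching images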
A Multi-view Dimensionality Reduction Algorithm Based on Smooth Representation Model
Over the past few decades, we have witnessed a large family of algorithms
that have been designed to provide different solutions to the problem of
dimensionality reduction (DR). DR is an essential tool for extracting important
information from high-dimensional data by mapping it to a low-dimensional
subspace. Furthermore, given the diversity of high-dimensional data, multi-view
features can be utilized to improve learning performance. However, many DR
methods fail to integrate multiple
views. Although the features from different views are extracted in different
manners, they describe the same sample, which implies that they are highly
related. Therefore, how to learn a subspace for high-dimensional features by
exploiting the consistency and complementarity of multi-view features is an
important open problem. In this paper, we propose an
effective multi-view dimensionality reduction algorithm named Multi-view Smooth
Preserve Projection. Firstly, we construct a single view DR method named Smooth
Preserve Projection based on the Smooth Representation model. The proposed
method aims to find a subspace for the high-dimensional data, in which the
smooth reconstructive weights are preserved as much as possible. Then, we
extend it to a multi-view version in which we exploit the Hilbert-Schmidt
Independence Criterion to jointly learn one common subspace for all views.
Extensive experiments on multi-view datasets show the excellent performance of
the proposed method.
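For reference, a minimal sketch of the biased empirical HSIC estimate, tr(KHLH)/(n-1)^2, which methods like the one above use to couple the subspaces of different views; the linear kernels and synthetic data below are illustrative.

import numpy as np

def hsic(X, Y):
    n = X.shape[0]
    K = X @ X.T                           # linear kernel on view 1
    L = Y @ Y.T                           # linear kernel on view 2
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(3)
A = rng.normal(size=(100, 5))
print(hsic(A, A + 0.1 * rng.normal(size=(100, 5))))  # dependent views: larger value
print(hsic(A, rng.normal(size=(100, 5))))            # independent views: much smaller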
Multi-view Locality Low-rank Embedding for Dimension Reduction
During the last decades, we have witnessed a surge of interest in learning a
low-dimensional space with discriminative information from a single view.
Even though most such methods achieve satisfactory performance in certain
situations, they fail to fully consider information from multiple views,
which are highly relevant but sometimes look different from each other.
Besides, correlations between features from multiple views always vary greatly,
which challenges multi-view subspace learning. Therefore, how to learn an
appropriate subspace which can maintain valuable information from multi-view
features is of vital importance but challenging. To tackle this problem, this
paper proposes a novel multi-view dimension reduction method named Multi-view
Locality Low-rank Embedding for Dimension Reduction (MvL2E). MvL2E makes full
use of correlations between multi-view features by adopting low-rank
representations. Meanwhile, it aims to maintain the correlations and construct
a suitable manifold space to capture the low-dimensional embedding for
multi-view features. A centroid-based scheme is designed to force multiple
views to learn from each other, and an iterative alternating strategy is
developed to obtain the optimal solution of MvL2E. The proposed method is
evaluated on 5 benchmark datasets. Comprehensive experiments show that our
proposed MvL2E achieves performance comparable to that of approaches proposed
in the recent literature.
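As background for the low-rank machinery MvL2E relies on, here is a sketch of singular value thresholding (SVT), the proximal operator of the nuclear norm that appears in most low-rank representation solvers; this is generic, not the paper's exact update.

import numpy as np

def svt(M, tau):
    """Soft-threshold the singular values of M by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

rng = np.random.default_rng(4)
low_rank = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))  # rank-3 signal
noisy = low_rank + 0.5 * rng.normal(size=(50, 40))
denoised = svt(noisy, tau=7.0)
print(np.linalg.matrix_rank(denoised, tol=1e-6))  # far below min(50, 40)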
Marrying Tracking with ELM: A Metric Constraint Guided Multiple Feature Fusion Method
Object tracking is an important problem in computer vision and surveillance
systems. Existing models mainly exploit single-view features (i.e., color,
texture, shape) to solve the problem, failing to describe objects
comprehensively. In this paper, we solve the problem from a multi-view
perspective by leveraging complementary and latent multi-view information, so
as to be robust to partial occlusion and background clutter, especially when
surrounding objects are similar to the target, while also addressing tracking drift.
However, a major problem is that a multi-view fusion strategy can make tracking
inefficient. To this end, we propose to marry the extreme learning machine
(ELM) to multi-view fusion, training the global hidden output weights so as to
effectively exploit the local information from each view.
Following this principle, we propose a novel method to obtain the optimal
sample as the target object, which avoids tracking drift resulting from noisy
samples. Our method is evaluated over 12 challenging image sequences with
different attributes including illumination, occlusion, and deformation, and
demonstrates better performance than several state-of-the-art methods in
terms of effectiveness and robustness.
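For context, a bare-bones ELM: the hidden weights are random and only the output weights are solved in closed form with a pseudoinverse, which is what makes ELM-based fusion fast. The data here is synthetic and the setup is illustrative.

import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 20))               # training features
t = (X[:, 0] + X[:, 1] > 0).astype(float)    # toy binary targets

W = rng.normal(size=(20, 100))               # random input-to-hidden weights
b = rng.normal(size=100)                     # random hidden biases
H = np.tanh(X @ W + b)                       # hidden-layer outputs
beta = np.linalg.pinv(H) @ t                 # closed-form output weights

pred = (H @ beta > 0.5).astype(float)
print("train accuracy:", (pred == t).mean())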
Unsupervised Multi-modal Hashing for Cross-modal retrieval
With the advantage of low storage cost and high efficiency, hashing learning
has received much attention in the domain of Big Data. In this paper, we
propose a novel unsupervised hashing learning method that directly preserves
the manifold structure through hashing. To this end, both the semantic
correlation in the textual space and the local geometric structure in the
visual space are explored simultaneously in our framework. Besides, an
ℓ2,1-norm constraint is imposed on the projection matrices to learn a
discriminative hash function for each modality. Extensive
experiments are performed to evaluate the proposed method on three publicly
available datasets, and the results show that our method achieves superior
performance over state-of-the-art methods.
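A toy sketch of the binary hashing idea described above, with random projections standing in for the learned, ℓ2,1-regularized ones: each modality is binarized into a common Hamming space, after which retrieval reduces to Hamming distance.

import numpy as np

rng = np.random.default_rng(6)
P_txt = rng.normal(size=(300, 32))          # hypothetical text hash projection
P_img = rng.normal(size=(512, 32))          # hypothetical image hash projection

txt = rng.normal(size=(1, 300))             # one text query
imgs = rng.normal(size=(1000, 512))         # image database

h_txt = np.sign(txt @ P_txt)                # 32-bit text code in {-1, +1}
h_img = np.sign(imgs @ P_img)               # 32-bit image codes

hamming = (h_img != h_txt).sum(axis=1)      # Hamming distances to the query
print(np.argsort(hamming)[:10])             # ten nearest image codes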
Pose and Shape Estimation with Discriminatively Learned Parts
We introduce a new approach for estimating the 3D pose and the 3D shape of an
object from a single image. Given a training set of view exemplars, we learn
and select appearance-based discriminative parts which are mapped onto the 3D
model from the training set through a facility location optimization. The
training set of 3D models is summarized into a sparse set of shapes from which
we can generalize by linear combination. Given a test picture, we detect
hypotheses for each part. The main challenge is to select from these hypotheses
and compute the 3D pose and shape coefficients at the same time. To achieve
this, we optimize a function that minimizes simultaneously the geometric
reprojection error as well as the appearance matching of the parts. We apply
the alternating direction method of multipliers (ADMM) to minimize the
resulting convex function. We evaluate our approach on the Fine Grained 3D Car
dataset with superior performance in shape and pose errors. Our main and novel
contribution is the simultaneous solution for part localization, 3D pose, and
shape by maximizing both geometric and appearance compatibility.
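To illustrate the ADMM machinery the paper applies to its own (different) convex objective, here is a generic ADMM solver for the lasso problem min_x 0.5||Ax - b||^2 + lam*||x||_1; the data and parameters are illustrative.

import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    n = A.shape[1]
    x = z = u = np.zeros(n)
    AtA_inv = np.linalg.inv(A.T @ A + rho * np.eye(n))  # cached factor
    Atb = A.T @ b
    for _ in range(iters):
        x = AtA_inv @ (Atb + rho * (z - u))             # quadratic x-update
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0)  # shrinkage
        u = u + x - z                                   # dual ascent
    return z

rng = np.random.default_rng(7)
A = rng.normal(size=(100, 30))
x_true = np.zeros(30); x_true[:3] = [2.0, -1.5, 1.0]    # sparse ground truth
b = A @ x_true + 0.01 * rng.normal(size=100)
print(np.round(admm_lasso(A, b), 2)[:6])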
Graph Multiview Canonical Correlation Analysis
Multiview canonical correlation analysis (MCCA) seeks latent low-dimensional
representations from multiview data of shared entities (a.k.a.
common sources). However, existing MCCA approaches do not exploit the geometry
of the common sources, which may be available \emph{a priori}, or can be
constructed using certain domain knowledge. This prior information about the
common sources can be encoded by a graph, and be invoked as a regularizer to
enrich the maximum variance MCCA framework. In this context, the present
paper's novel graph-regularized (G) MCCA approach minimizes the distance
between the wanted canonical variables and the common low-dimensional
representations, while accounting for graph-induced knowledge of the common
sources. Relying on a function capturing the extent to which the
low-dimensional representations of the multiple views are similar, a
generalization bound for GMCCA is established based on Rademacher complexity.
Tailored for setups
where the number of data pairs is smaller than the data vector dimensions, a
graph-regularized dual MCCA approach is also developed. To further deal with
nonlinearities present in the data, graph-regularized kernel MCCA variants are
put forward too. Interestingly, the solutions of graph-regularized linear,
dual, and kernel MCCA are all obtained via generalized eigenvalue
decomposition. Several corroborating numerical tests using real datasets are
provided to showcase the merits of the graph-regularized MCCA variants relative
to several competing alternatives, including MCCA, Laplacian-regularized MCCA,
and (graph-regularized) PCA.
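As a heavily simplified sketch of how a graph Laplacian can regularize a MAXVAR-style eigenproblem: the common representation is taken from the top eigenvectors of a similarity-derived matrix penalized by gamma times the Laplacian. The matrix C below is a generic stand-in, not GMCCA's exact construction.

import numpy as np

rng = np.random.default_rng(8)
n, k, gamma = 100, 5, 0.5

Wg = rng.random((n, n)); Wg = (Wg + Wg.T) / 2   # symmetric graph weights
L = np.diag(Wg.sum(axis=1)) - Wg                # graph Laplacian

X = rng.normal(size=(n, 30))
C = X @ X.T                                     # stand-in for the data-similarity term

vals, vecs = np.linalg.eigh(C - gamma * L)      # symmetric eigendecomposition
S = vecs[:, -k:]                                # common low-dimensional representation
print(S.shape)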
Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review
Recently, the advancement of deep learning in discriminative feature learning
from 3D LiDAR data has led to rapid development in the field of autonomous
driving. However, the automated processing of uneven, unstructured, noisy, and
massive 3D point clouds is a challenging and tedious task. In this paper, we
provide a systematic review of compelling existing deep learning architectures
applied to LiDAR point clouds, detailing specific tasks in autonomous driving such as
segmentation, detection, and classification. Although several published
research papers focus on specific topics in computer vision for autonomous
vehicles, to date, no general survey on deep learning applied in LiDAR point
clouds for autonomous vehicles exists. Thus, the goal of this paper is to
narrow the gap in this topic. More than 140 key contributions from the past
five years are summarized in this survey, including the milestone 3D deep
architectures; the remarkable deep learning applications in 3D semantic
segmentation, object detection, and classification; and specific datasets,
evaluation metrics, and state-of-the-art performance. Finally, we summarize
the remaining challenges and future research directions.