Multi-Label Dimensionality Reduction
Abstract: Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, this thesis studies multi-label dimensionality reduction, which extracts a small number of features by removing irrelevant, redundant, and noisy information while accounting for the correlation among different labels. Specifically, I propose Hypergraph Spectral Learning (HSL), which performs dimensionality reduction for multi-label data by capturing correlations among labels with a hypergraph. The thesis elucidates the regularization effect on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) and investigates the relationship between CCA and Orthonormalized Partial Least Squares (OPLS). To perform dimensionality reduction efficiently for large-scale problems, two efficient implementations are proposed for a class of dimensionality reduction algorithms, including CCA, OPLS, linear discriminant analysis, and hypergraph spectral learning. The first is a direct least squares approach that allows different regularization penalties but is applicable only under a certain assumption; the second is a two-stage approach that can be applied in the regularized setting without any assumption. Furthermore, an online implementation of the same class of algorithms is proposed for data that arrive sequentially. A Matlab toolbox for multi-label dimensionality reduction has been developed and released. The proposed algorithms have been applied successfully to Drosophila gene expression pattern image annotation.
The experimental results on benchmark multi-label data sets also demonstrate the effectiveness and efficiency of the proposed algorithms.
Dissertation/Thesis, Ph.D. Computer Science, 201
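As a rough illustration of the CCA-based dimensionality reduction discussed above, the following is a minimal numpy sketch of regularized CCA on synthetic multi-label-style data. The toy data, matrix names, and the regularization value are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 10, 3              # samples, input features, labels (toy sizes)
X = rng.standard_normal((n, d))
# binary labels correlated with the first k features (synthetic assumption)
Y = (X[:, :k] + 0.5 * rng.standard_normal((n, k)) > 0).astype(float)

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
reg = 1e-3                        # ridge-style regularization (assumed value)
Cxx = Xc.T @ Xc / n + reg * np.eye(d)
Cyy = Yc.T @ Yc / n + reg * np.eye(k)
Cxy = Xc.T @ Yc / n

# Whiten both blocks and take the SVD of the whitened cross-covariance;
# its left singular vectors yield the CCA projection for X.
Lx = np.linalg.cholesky(np.linalg.inv(Cxx))   # Lx @ Lx.T == inv(Cxx)
Ly = np.linalg.cholesky(np.linalg.inv(Cyy))
U, s, Vt = np.linalg.svd(Lx.T @ Cxy @ Ly)
Wx = Lx @ U[:, :k]                # d -> k linear dimensionality reduction
Z = Xc @ Wx                       # reduced representation of the inputs
print(Z.shape)                    # (200, 3)
```

The singular values `s` are the canonical correlations, and the projection satisfies the CCA constraint `Wx.T @ Cxx @ Wx = I`.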
A Generalized EigenGame with Extensions to Multiview Representation Learning
Generalized Eigenvalue Problems (GEPs) encompass a range of interesting
dimensionality reduction methods. Development of efficient stochastic
approaches to these problems would allow them to scale to larger datasets.
Canonical Correlation Analysis (CCA) is one example of a GEP for dimensionality
reduction which has found extensive use in problems with two or more views of
the data. Deep learning extensions of CCA require large mini-batch sizes, and
therefore large memory consumption, to achieve good performance in the
stochastic setting, which has limited their application in practice. Inspired by the
Generalized Hebbian Algorithm, we develop an approach to solving stochastic
GEPs in which all constraints are softly enforced by Lagrange multipliers. Then
by considering the integral of this Lagrangian function, its pseudo-utility,
and inspired by recent formulations of Principal Components Analysis and GEPs
as games with differentiable utilities, we develop a game-theory inspired
approach to solving GEPs. We show that our approaches share much of the
theoretical grounding of the previous Hebbian and game-theoretic approaches in
the linear case, but our method permits extension to general function
approximators such as neural networks for certain GEPs for dimensionality
reduction, including CCA, which means it can be used for deep multiview
representation learning. We demonstrate the effectiveness of our method for
solving GEPs in the stochastic setting using canonical multiview datasets and
demonstrate state-of-the-art performance for optimizing Deep CCA.
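In the batch linear case, the GEPs underlying these methods take the form A v = λ B v with B symmetric positive definite. A minimal numpy sketch of the classical (non-stochastic) solution via whitening is below; the matrices are random toy data, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
M = rng.standard_normal((d, d))
A = M @ M.T                          # symmetric "objective" matrix (toy)
N = rng.standard_normal((d, d))
B = N @ N.T + d * np.eye(d)          # symmetric positive-definite matrix (toy)

# Whitening by B^{-1/2} turns A v = lam B v into the ordinary symmetric
# eigenproblem (B^{-1/2} A B^{-1/2}) u = lam u, with v = B^{-1/2} u.
w, Q = np.linalg.eigh(B)
B_inv_sqrt = Q @ np.diag(w ** -0.5) @ Q.T
lam, U = np.linalg.eigh(B_inv_sqrt @ A @ B_inv_sqrt)
V = B_inv_sqrt @ U

# Verify the generalized eigen-equation and B-orthonormality of solutions
print(np.allclose(A @ V, B @ V @ np.diag(lam)))   # True
print(np.allclose(V.T @ B @ V, np.eye(d)))        # True
```

The stochastic methods in the abstract avoid forming B^{-1/2} explicitly, which is what makes them scale; this sketch only shows the target problem they approximate.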
Canonical correlation analysis and DEA for azorean agriculture efficiency
In this paper we document the application of canonical correlation analysis to variable aggregation, using the correlations of the original variables with the canonical variates. A case study of farms on Terceira Island with a small data set is presented. In this data set of 30 farms we use 17 input variables and 2 output variables to measure DEA efficiency. Without any data reduction procedure, several problems known as the "curse of dimensionality" are expected. With the suggested data reduction procedures it was possible to draw acceptable and domain-consistent conclusions.
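The aggregation idea can be sketched as follows: compute a canonical variate for the input block, then correlate each original input variable with it; variables with large loadings of the same sign are candidates for aggregation into one index. This is a hedged illustration on synthetic data, not the Terceira farm data, and the dimensions and regularization are assumed values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30                              # small sample, mimicking the case study size
latent = rng.standard_normal((n, 1))
X = latent @ rng.standard_normal((1, 4)) + 0.3 * rng.standard_normal((n, 4))
Y = latent @ rng.standard_normal((1, 2)) + 0.3 * rng.standard_normal((n, 2))

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
reg = 1e-2                          # small ridge term for a stable inverse
Cxx = Xc.T @ Xc / n + reg * np.eye(4)
Cyy = Yc.T @ Yc / n + reg * np.eye(2)
Cxy = Xc.T @ Yc / n
Lx = np.linalg.cholesky(np.linalg.inv(Cxx))
Ly = np.linalg.cholesky(np.linalg.inv(Cyy))
U, s, Vt = np.linalg.svd(Lx.T @ Cxy @ Ly)
u1 = Xc @ (Lx @ U[:, 0])            # first canonical variate of the inputs

# Structure correlations ("loadings"): correlation of each original input
# variable with the canonical variate, used to group variables for aggregation.
loadings = np.array([np.corrcoef(Xc[:, j], u1)[0, 1] for j in range(4)])
print(loadings.shape)               # (4,)
```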
A Comparison of Relaxations of Multiset Canonical Correlation Analysis and Applications
Canonical correlation analysis is a statistical technique that is used to
find relations between two sets of variables. An important extension in pattern
analysis is to consider more than two sets of variables. This problem can be
expressed as a quadratically constrained quadratic program (QCQP), commonly
referred to as Multi-set Canonical Correlation Analysis (MCCA). This is a
non-convex problem and so greedy algorithms converge to local optima without
any guarantees on global optimality. In this paper, we show that despite being
highly structured, finding the optimal solution is NP-Hard. This motivates our
relaxation of the QCQP to a semidefinite program (SDP). The SDP is convex, can
be solved reasonably efficiently, and comes with both absolute and
output-sensitive bounds on approximation quality. In addition to these theoretical guarantees,
we do an extensive comparison of the QCQP method and the SDP relaxation on a
variety of synthetic and real-world data. Finally, we present two useful
extensions: incorporating kernel methods and computing multiple sets of
canonical vectors.
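The greedy, non-convex side of this comparison can be illustrated with a Horst-style fixed-point iteration for the sum-of-correlations MCCA objective, which enforces each QCQP constraint by renormalization and converges only to a local optimum, as the abstract notes. The synthetic three-view data and iteration count are assumptions for the sketch; this is not the paper's SDP method.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 300, 4, 3                       # samples, dims per view, views (toy)
z = rng.standard_normal((n, 1))           # shared latent signal across views
views = [z @ rng.standard_normal((1, d)) + 0.5 * rng.standard_normal((n, d))
         for _ in range(m)]
Xs = [V - V.mean(0) for V in views]
C = [[Xs[i].T @ Xs[j] / n for j in range(m)] for i in range(m)]

# Horst-style iteration: update each w_i toward sum_j C_ij w_j, then rescale
# so that w_i' C_ii w_i = 1 (the QCQP constraint). A local optimum only.
ws = [np.ones(d) for _ in range(m)]
for _ in range(200):
    for i in range(m):
        g = sum(C[i][j] @ ws[j] for j in range(m) if j != i)
        ws[i] = g / np.sqrt(g @ C[i][i] @ g)

# Sum-of-correlations objective over all ordered pairs of distinct views
corr = sum(ws[i] @ C[i][j] @ ws[j] for i in range(m) for j in range(m) if i != j)
print(round(corr, 3))
```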
Bayesian Inference on Matrix Manifolds for Linear Dimensionality Reduction
We reframe linear dimensionality reduction as a problem of Bayesian inference
on matrix manifolds. This natural paradigm extends the Bayesian framework to
dimensionality reduction tasks in higher dimensions with simpler models at
greater speeds. Here an orthogonal basis is treated as a single point on a
manifold and is associated with a linear subspace on which observations vary
maximally. Throughout this paper, we employ the Grassmann and Stiefel manifolds
for various dimensionality reduction problems, explore the connection between
the two manifolds, and use Hybrid Monte Carlo for posterior sampling on the
Grassmannian for the first time. We delineate in which situations either
manifold should be considered. Further, matrix manifold models are used to
yield scientific insight in the context of cognitive neuroscience, and we
conclude that our methods are suitable for basic inference as well as accurate
prediction.
Comment: All datasets and computer programs are publicly available at
http://www.ics.uci.edu/~babaks/Site/Codes.htm
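The two manifolds mentioned above can be illustrated with a short numpy sketch: a point on the Stiefel manifold is a matrix with orthonormal columns, while a Grassmannian point is the subspace those columns span, invariant to rotation of the basis. The dimensions and the polar-projection step are illustrative assumptions, not the paper's sampler.

```python
import numpy as np

rng = np.random.default_rng(3)
d, p = 6, 2                       # ambient and subspace dimensions (toy)

# A point on the Stiefel manifold St(d, p) is a d x p matrix with orthonormal
# columns; project an arbitrary matrix onto it via the polar decomposition,
# which gives the closest orthonormal basis in Frobenius norm.
M = rng.standard_normal((d, p))
U, _, Vt = np.linalg.svd(M, full_matrices=False)
Q = U @ Vt                        # Stiefel point nearest to M
print(np.allclose(Q.T @ Q, np.eye(p)))   # True

# On the Grassmannian only span(Q) matters: Q and Q @ R represent the same
# point for any orthogonal p x p matrix R, as their projectors coincide.
R = np.linalg.qr(rng.standard_normal((p, p)))[0]
P1 = Q @ Q.T
P2 = (Q @ R) @ (Q @ R).T
print(np.allclose(P1, P2))        # True
```

This distinction is what drives the paper's choice of manifold: if the model depends only on the subspace, the Grassmannian removes the rotational redundancy that the Stiefel manifold retains.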