1,812 research outputs found
Semi-supervised Eigenvectors for Large-scale Locally-biased Learning
In many applications, one has side information, e.g., labels that are
provided in a semi-supervised manner, about a specific target region of a large
data set, and one wants to perform machine learning and data analysis tasks
"nearby" that prespecified target region. For example, one might be interested
in the clustering structure of a data graph near a prespecified "seed set" of
nodes, or one might be interested in finding partitions in an image that are
near a prespecified "ground truth" set of pixels. Locally-biased problems of
this sort are particularly challenging for popular eigenvector-based machine
learning and data analysis tools. At root, the reason is that eigenvectors are
inherently global quantities, thus limiting the applicability of
eigenvector-based methods in situations where one is interested in very local
properties of the data.
In this paper, we address this issue by providing a methodology to construct
semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these
locally-biased eigenvectors can be used to perform locally-biased machine
learning. These semi-supervised eigenvectors capture
successively-orthogonalized directions of maximum variance, conditioned on
being well-correlated with an input seed set of nodes that is assumed to be
provided in a semi-supervised manner. We show that these semi-supervised
eigenvectors can be computed quickly as the solution to a system of linear
equations; and we also describe several variants of our basic method that have
improved scaling properties. We provide several empirical examples
demonstrating how these semi-supervised eigenvectors can be used to perform
locally-biased learning; and we discuss the relationship between our results
and recent machine learning algorithms that use global eigenvectors of the
graph Laplacian
Out-of-sample generalizations for supervised manifold learning for classification
Supervised manifold learning methods for data classification map data samples
residing in a high-dimensional ambient space to a lower-dimensional domain in a
structure-preserving way, while enhancing the separation between different
classes in the learned embedding. Most nonlinear supervised manifold learning
methods compute the embedding of the manifolds only at the initially available
training points, while the generalization of the embedding to novel points,
known as the out-of-sample extension problem in manifold learning, becomes
especially important in classification applications. In this work, we propose a
semi-supervised method for building an interpolation function that provides an
out-of-sample extension for general supervised manifold learning algorithms
studied in the context of classification. The proposed algorithm computes a
radial basis function (RBF) interpolator that minimizes an objective function
consisting of the total embedding error of unlabeled test samples, defined as
their distance to the embeddings of the manifolds of their own class, as well
as a regularization term that controls the smoothness of the interpolation
function in a direction-dependent way. The class labels of test data and the
interpolation function parameters are estimated jointly with a progressive
procedure. Experimental results on face and object images demonstrate the
potential of the proposed out-of-sample extension algorithm for the
classification of manifold-modeled data sets
Local Component Analysis
Kernel density estimation, a.k.a. Parzen windows, is a popular density
estimation method, which can be used for outlier detection or clustering. With
multivariate data, its performance is heavily reliant on the metric used within
the kernel. Most earlier work has focused on learning only the bandwidth of the
kernel (i.e., a scalar multiplicative factor). In this paper, we propose to
learn a full Euclidean metric through an expectation-minimization (EM)
procedure, which can be seen as an unsupervised counterpart to neighbourhood
component analysis (NCA). In order to avoid overfitting with a fully
nonparametric density estimator in high dimensions, we also consider a
semi-parametric Gaussian-Parzen density model, where some of the variables are
modelled through a jointly Gaussian density, while others are modelled through
Parzen windows. For these two models, EM leads to simple closed-form updates
based on matrix inversions and eigenvalue decompositions. We show empirically
that our method leads to density estimators with higher test-likelihoods than
natural competing methods, and that the metrics may be used within most
unsupervised learning techniques that rely on such metrics, such as spectral
clustering or manifold learning methods. Finally, we present a stochastic
approximation scheme which allows for the use of this method in a large-scale
setting
Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis
Database theory and database practice are typically the domain of computer
scientists who adopt what may be termed an algorithmic perspective on their
data. This perspective is very different than the more statistical perspective
adopted by statisticians, scientific computers, machine learners, and other who
work on what may be broadly termed statistical data analysis. In this article,
I will address fundamental aspects of this algorithmic-statistical disconnect,
with an eye to bridging the gap between these two very different approaches. A
concept that lies at the heart of this disconnect is that of statistical
regularization, a notion that has to do with how robust is the output of an
algorithm to the noise properties of the input data. Although it is nearly
completely absent from computer science, which historically has taken the input
data as given and modeled algorithms discretely, regularization in one form or
another is central to nearly every application domain that applies algorithms
to noisy data. By using several case studies, I will illustrate, both
theoretically and empirically, the nonobvious fact that approximate
computation, in and of itself, can implicitly lead to statistical
regularization. This and other recent work suggests that, by exploiting in a
more principled way the statistical properties implicit in worst-case
algorithms, one can in many cases satisfy the bicriteria of having algorithms
that are scalable to very large-scale databases and that also have good
inferential or predictive properties.Comment: To appear in the Proceedings of the 2012 ACM Symposium on Principles
of Database Systems (PODS 2012
Simplified Energy Landscape for Modularity Using Total Variation
Networks capture pairwise interactions between entities and are frequently
used in applications such as social networks, food networks, and protein
interaction networks, to name a few. Communities, cohesive groups of nodes,
often form in these applications, and identifying them gives insight into the
overall organization of the network. One common quality function used to
identify community structure is modularity. In Hu et al. [SIAM J. App. Math.,
73(6), 2013], it was shown that modularity optimization is equivalent to
minimizing a particular nonconvex total variation (TV) based functional over a
discrete domain. They solve this problem, assuming the number of communities is
known, using a Merriman, Bence, Osher (MBO) scheme.
We show that modularity optimization is equivalent to minimizing a convex
TV-based functional over a discrete domain, again, assuming the number of
communities is known. Furthermore, we show that modularity has no convex
relaxation satisfying certain natural conditions. We therefore, find a
manageable non-convex approximation using a Ginzburg Landau functional, which
provably converges to the correct energy in the limit of a certain parameter.
We then derive an MBO algorithm with fewer hand-tuned parameters than in Hu et
al. and which is 7 times faster at solving the associated diffusion equation
due to the fact that the underlying discretization is unconditionally stable.
Our numerical tests include a hyperspectral video whose associated graph has
2.9x10^7 edges, which is roughly 37 times larger than was handled in the paper
of Hu et al.Comment: 25 pages, 3 figures, 3 tables, submitted to SIAM J. App. Mat
- …