21,832 research outputs found
Generalized Optimization Framework for Graph-based Semi-supervised Learning
We develop a generalized optimization framework for graph-based
semi-supervised learning. The framework gives as particular cases the Standard
Laplacian, Normalized Laplacian and PageRank based methods. We have also
provided new probabilistic interpretation based on random walks and
characterized the limiting behaviour of the methods. The random walk based
interpretation allows us to explain di erences between the performances of
methods with di erent smoothing kernels. It appears that the PageRank based
method is robust with respect to the choice of the regularization parameter and
the labelled data. We illustrate our theoretical results with two realistic
datasets, characterizing di erent challenges: Les Miserables characters social
network and Wikipedia hyper-link graph. The graph-based semi-supervised
learning classi- es the Wikipedia articles with very good precision and perfect
recall employing only the information about the hyper-text links
On multi-view learning with additive models
In many scientific settings data can be naturally partitioned into variable
groupings called views. Common examples include environmental (1st view) and
genetic information (2nd view) in ecological applications, chemical (1st view)
and biological (2nd view) data in drug discovery. Multi-view data also occur in
text analysis and proteomics applications where one view consists of a graph
with observations as the vertices and a weighted measure of pairwise similarity
between observations as the edges. Further, in several of these applications
the observations can be partitioned into two sets, one where the response is
observed (labeled) and the other where the response is not (unlabeled). The
problem for simultaneously addressing viewed data and incorporating unlabeled
observations in training is referred to as multi-view transductive learning. In
this work we introduce and study a comprehensive generalized fixed point
additive modeling framework for multi-view transductive learning, where any
view is represented by a linear smoother. The problem of view selection is
discussed using a generalized Akaike Information Criterion, which provides an
approach for testing the contribution of each view. An efficient implementation
is provided for fitting these models with both backfitting and local-scoring
type algorithms adjusted to semi-supervised graph-based learning. The proposed
technique is assessed on both synthetic and real data sets and is shown to be
competitive to state-of-the-art co-training and graph-based techniques.Comment: Published in at http://dx.doi.org/10.1214/08-AOAS202 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
A random matrix analysis and improvement of semi-supervised learning for large dimensional data
This article provides an original understanding of the behavior of a class of
graph-oriented semi-supervised learning algorithms in the limit of large and
numerous data. It is demonstrated that the intuition at the root of these
methods collapses in this limit and that, as a result, most of them become
inconsistent. Corrective measures and a new data-driven parametrization scheme
are proposed along with a theoretical analysis of the asymptotic performances
of the resulting approach. A surprisingly close behavior between theoretical
performances on Gaussian mixture models and on real datasets is also
illustrated throughout the article, thereby suggesting the importance of the
proposed analysis for dealing with practical data. As a result, significant
performance gains are observed on practical data classification using the
proposed parametrization
Semi-Supervised Sound Source Localization Based on Manifold Regularization
Conventional speaker localization algorithms, based merely on the received
microphone signals, are often sensitive to adverse conditions, such as: high
reverberation or low signal to noise ratio (SNR). In some scenarios, e.g. in
meeting rooms or cars, it can be assumed that the source position is confined
to a predefined area, and the acoustic parameters of the environment are
approximately fixed. Such scenarios give rise to the assumption that the
acoustic samples from the region of interest have a distinct geometrical
structure. In this paper, we show that the high dimensional acoustic samples
indeed lie on a low dimensional manifold and can be embedded into a low
dimensional space. Motivated by this result, we propose a semi-supervised
source localization algorithm which recovers the inverse mapping between the
acoustic samples and their corresponding locations. The idea is to use an
optimization framework based on manifold regularization, that involves
smoothness constraints of possible solutions with respect to the manifold. The
proposed algorithm, termed Manifold Regularization for Localization (MRL), is
implemented in an adaptive manner. The initialization is conducted with only
few labelled samples attached with their respective source locations, and then
the system is gradually adapted as new unlabelled samples (with unknown source
locations) are received. Experimental results show superior localization
performance when compared with a recently presented algorithm based on a
manifold learning approach and with the generalized cross-correlation (GCC)
algorithm as a baseline
- …