On Consistency of Graph-based Semi-supervised Learning
Graph-based semi-supervised learning is one of the most popular methods in
machine learning. Some of its theoretical properties such as bounds for the
generalization error and the convergence of the graph Laplacian regularizer
have been studied in the computer science and statistics literature. However, a
fundamental statistical property, the consistency of the estimator from this
method, has not been proved. In this article, we study the consistency problem
under a non-parametric framework. We prove the consistency of graph-based
learning in the case that the estimated scores are enforced to be equal to the
observed responses for the labeled data. The sample sizes of both labeled and
unlabeled data are allowed to grow in this result. When the estimated scores
are not required to be equal to the observed responses, a tuning parameter is
used to balance the loss function and the graph Laplacian regularizer. We give
a counterexample demonstrating that the estimator for this case can be
inconsistent. The theoretical findings are supported by numerical studies.
Comment: This paper is accepted by 2019 IEEE 39th International Conference on
Distributed Computing Systems (ICDCS)
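The two settings contrasted in the abstract above can be illustrated with a minimal sketch. The abstract does not give the weight function or the exact objective, so everything here is an assumption: a Gaussian similarity graph, the unnormalized Laplacian L = D - W, squared-error loss, and the hypothetical function names gaussian_weights, hard_constraint_scores and soft_constraint_scores. The first n points are taken to be the labeled ones.

```python
import numpy as np


def gaussian_weights(X, bandwidth):
    """Dense Gaussian similarity matrix (an assumed choice of graph weights)."""
    diffs = X[:, None, :] - X[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def hard_constraint_scores(W, y_labeled):
    """Minimize f^T L f subject to f_i = y_i on the labeled points.

    With the labeled block fixed, the unlabeled scores solve
    L_uu f_u = -L_ul y_l.
    """
    n = len(y_labeled)
    L = np.diag(W.sum(axis=1)) - W
    f_unlabeled = np.linalg.solve(L[n:, n:], -L[n:, :n] @ y_labeled)
    return np.concatenate([y_labeled, f_unlabeled])


def soft_constraint_scores(W, y_labeled, lam):
    """Minimize sum_{i<=n} (f_i - y_i)^2 + lam * f^T L f over all scores f.

    Setting the gradient to zero gives (J + lam * L) f = J y, where J is
    the diagonal indicator of the labeled points.
    """
    n = len(y_labeled)
    N = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    J = np.zeros((N, N))
    J[np.arange(n), np.arange(n)] = 1.0
    y_padded = np.zeros(N)
    y_padded[:n] = y_labeled
    return np.linalg.solve(J + lam * L, y_padded)
```

In the hard-constraint version the labeled scores are reproduced exactly, which is the case the abstract reports as consistent; in the soft version the tuning parameter lam (a name chosen here) balances the loss against the regularizer, which is the case the abstract's counterexample concerns. The sketch does not reproduce that counterexample.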
Semi-Supervised Single- and Multi-Domain Regression with Multi-Domain Training
We address the problems of multi-domain and single-domain regression based on
distinct and unpaired labeled training sets for each of the domains and a large
unlabeled training set from all domains. We formulate these problems as a
Bayesian estimation with partial knowledge of statistical relations. We propose
a worst-case design strategy and study the resulting estimators. Our analysis
explicitly accounts for the cardinality of the labeled sets and includes the
special cases in which one of the labeled sets is very large or, in the other
extreme, completely missing. We demonstrate our estimators in the context of
removing expressions from facial images and in the context of audio-visual word
recognition, and provide comparisons to several recently proposed multi-modal
learning algorithms.
Comment: 24 pages, 6 figures, 2 tables
Semi-supervised cross-entropy clustering with information bottleneck constraint
In this paper, we propose a semi-supervised clustering method, CEC-IB, that
models data with a set of Gaussian distributions and that retrieves clusters
based on a partial labeling provided by the user (partition-level side
information). By combining the ideas from cross-entropy clustering (CEC) with
those from the information bottleneck method (IB), our method trades between
three conflicting goals: the accuracy with which the data set is modeled, the
simplicity of the model, and the consistency of the clustering with side
information. Experiments demonstrate that CEC-IB has a performance comparable
to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but
is faster, more robust to noisy labels, automatically determines the optimal
number of clusters, and performs well when not all classes are present in the
side information. Moreover, in contrast to other semi-supervised models, it can
be successfully applied in discovering natural subgroups if the partition-level
side information is derived from the top levels of a hierarchical clustering.
Density-sensitive semisupervised inference
Semisupervised methods are techniques for using labeled data together with
unlabeled data to make predictions. These methods invoke some assumptions that
link the marginal distribution of X to the regression function f(x). For
example, it is common to assume that f is very smooth over high density regions
of the marginal distribution of X. Many of the methods are ad-hoc and have been
shown to work in specific
examples but are lacking a theoretical foundation. We provide a minimax
framework for analyzing semisupervised methods. In particular, we study methods
based on metrics that are sensitive to the distribution of X. Our model
includes a parameter that controls the strength of the semisupervised
assumption. We then use the data to adapt to this parameter.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1092 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)