On Consistency of Graph-based Semi-supervised Learning
Graph-based semi-supervised learning is one of the most popular methods in
machine learning. Some of its theoretical properties such as bounds for the
generalization error and the convergence of the graph Laplacian regularizer
have been studied in the computer science and statistics literature. However, a
fundamental statistical property, the consistency of the estimator from this
method, has not been proved. In this article, we study the consistency problem
under a non-parametric framework. We prove the consistency of graph-based
learning when the estimated scores are constrained to equal the
observed responses for the labeled data. The sample sizes of both labeled and
unlabeled data are allowed to grow in this result. When the estimated scores
are not required to be equal to the observed responses, a tuning parameter is
used to balance the loss function and the graph Laplacian regularizer. We give
a counterexample demonstrating that the estimator for this case can be
inconsistent. The theoretical findings are supported by numerical studies.
Comment: This paper is accepted by 2019 IEEE 39th International Conference on
Distributed Computing Systems (ICDCS
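The two estimators contrasted in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a Gaussian similarity graph, the unnormalized Laplacian L = D - W, a squared-error loss, and that the labeled points come first in the data. The function names and the `bandwidth` and `lam` parameters are hypothetical.

```python
import numpy as np

def gaussian_weights(X, bandwidth):
    # Assumed similarity: Gaussian kernel on Euclidean distance, no self-loops.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def laplacian(W):
    # Unnormalized graph Laplacian L = D - W.
    return np.diag(W.sum(axis=1)) - W

def hard_constraint_scores(W, y_labeled, n_labeled):
    # Scores are forced to equal the responses on the first n_labeled points;
    # the remaining scores minimize f^T L f, i.e. solve L_UU f_U = -L_UL y_L.
    L = laplacian(W)
    m = n_labeled
    f = np.empty(W.shape[0])
    f[:m] = y_labeled
    f[m:] = np.linalg.solve(L[m:, m:], -L[m:, :m] @ y_labeled)
    return f

def soft_penalty_scores(W, y_labeled, n_labeled, lam):
    # Minimizes sum over labeled i of (f_i - y_i)^2 + lam * f^T L f,
    # whose normal equations are (J + lam * L) f = J y,
    # with J the diagonal indicator of the labeled points.
    L = laplacian(W)
    n = W.shape[0]
    J = np.zeros(n)
    J[:n_labeled] = 1.0
    rhs = np.zeros(n)
    rhs[:n_labeled] = y_labeled
    return np.linalg.solve(np.diag(J) + lam * L, rhs)
```

Both solves assume a connected graph with at least one labeled point, so the linear systems are nonsingular. In this sketch, `lam` plays the role of the tuning parameter that balances the loss against the Laplacian regularizer.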
Graph Laplacian for Semi-Supervised Learning
Semi-supervised learning is highly useful in common scenarios where labeled
data is scarce but unlabeled data is abundant. The graph (or nonlocal)
Laplacian is a fundamental smoothing operator for solving various learning
tasks. For unsupervised clustering, a spectral embedding is often used, based
on graph-Laplacian eigenvectors. For semi-supervised problems, the common
approach is to solve a constrained optimization problem, regularized by a
Dirichlet energy, based on the graph-Laplacian. However, as supervision
decreases, Dirichlet optimization becomes suboptimal. We therefore would like
to obtain a smooth transition between unsupervised clustering and
low-supervised graph-based classification. In this paper, we propose a new type
of graph-Laplacian which is adapted for Semi-Supervised Learning (SSL)
problems. It is based on both density and contrastive measures and allows the
encoding of the labeled data directly in the operator. Thus, we can successfully
perform semi-supervised learning using spectral clustering. The benefits
of our approach are illustrated for several SSL problems.
Comment: 12 pages, 6 figures
Graph-based Semi-Supervised & Active Learning for Edge Flows
We present a graph-based semi-supervised learning (SSL) method for learning
edge flows defined on a graph. Specifically, given flow measurements on a
subset of edges, we want to predict the flows on the remaining edges. To this
end, we develop a computational framework that imposes certain constraints on
the overall flows, such as (approximate) flow conservation. These constraints
render our approach different from classical graph-based SSL for vertex labels,
which posits that tightly connected nodes share similar labels and leverages
the graph structure accordingly to extrapolate from a few vertex labels to the
unlabeled vertices. We derive bounds for our method's reconstruction error and
demonstrate its strong performance on synthetic and real-world flow networks
from transportation, physical infrastructure, and the Web. Furthermore, we
provide two active learning algorithms for selecting informative edges on which
to measure flow, which has applications for optimal sensor deployment. The
first strategy selects edges to minimize the reconstruction error bound and
works well on flows that are approximately divergence-free. The second approach
clusters the graph and selects bottleneck edges that cross cluster-boundaries,
which works well on flows with global trends.
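The idea of filling in unmeasured edge flows under approximate flow conservation can be sketched as a small least-squares problem. This is an illustration under stated assumptions, not necessarily the authors' exact formulation: it penalizes the squared divergence B f at every vertex, where B is the node-edge incidence matrix, and adds a small ridge term on the unknown flows. The function names and the `ridge` parameter are hypothetical.

```python
import numpy as np

def incidence_matrix(n_nodes, edges):
    # B[u, e] = -1 and B[v, e] = +1 for an edge e oriented from u to v,
    # so (B @ f)[v] is the net flow into vertex v.
    B = np.zeros((n_nodes, len(edges)))
    for e, (u, v) in enumerate(edges):
        B[u, e] = -1.0
        B[v, e] = 1.0
    return B

def infer_edge_flows(n_nodes, edges, observed, ridge=1e-3):
    # observed maps edge index -> measured flow; those values are kept fixed.
    # The unknown flows minimize ||B f||^2 + ridge^2 * ||f_unknown||^2.
    B = incidence_matrix(n_nodes, edges)
    known = sorted(observed)
    unknown = [e for e in range(len(edges)) if e not in observed]
    f = np.zeros(len(edges))
    f[known] = [observed[e] for e in known]
    # Stack the divergence equations on top of the ridge equations.
    A = np.vstack([B[:, unknown], ridge * np.eye(len(unknown))])
    b = np.concatenate([-B[:, known] @ f[known], np.zeros(len(unknown))])
    f[unknown] = np.linalg.lstsq(A, b, rcond=None)[0]
    return f

# Usage: a 5-cycle carrying a circulating flow, measured on two edges only.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
flows = infer_edge_flows(5, cycle_edges, {0: 2.0, 2: 2.0})
```

On a divergence-free flow such as this circulation, the conservation penalty alone pins down the missing edges. Flows with sources, sinks, or global trends would violate the assumption, which is where the choice of measured edges starts to matter.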