8,465 research outputs found
Graph-based Semi-Supervised & Active Learning for Edge Flows
We present a graph-based semi-supervised learning (SSL) method for learning
edge flows defined on a graph. Specifically, given flow measurements on a
subset of edges, we want to predict the flows on the remaining edges. To this
end, we develop a computational framework that imposes certain constraints on
the overall flows, such as (approximate) flow conservation. These constraints
render our approach different from classical graph-based SSL for vertex labels,
which posits that tightly connected nodes share similar labels and leverages
the graph structure accordingly to extrapolate from a few vertex labels to the
unlabeled vertices. We derive bounds for our method's reconstruction error and
demonstrate its strong performance on synthetic and real-world flow networks
from transportation, physical infrastructure, and the Web. Furthermore, we
provide two active learning algorithms for selecting informative edges on which
to measure flow, which has applications for optimal sensor deployment. The
first strategy selects edges to minimize the reconstruction error bound and
works well on flows that are approximately divergence-free. The second approach
clusters the graph and selects bottleneck edges that cross cluster-boundaries,
which works well on flows with global trends
Semi-Supervised Overlapping Community Finding based on Label Propagation with Pairwise Constraints
Algorithms for detecting communities in complex networks are generally
unsupervised, relying solely on the structure of the network. However, these
methods can often fail to uncover meaningful groupings that reflect the
underlying communities in the data, particularly when those structures are
highly overlapping. One way to improve the usefulness of these algorithms is by
incorporating additional background information, which can be used as a source
of constraints to direct the community detection process. In this work, we
explore the potential of semi-supervised strategies to improve algorithms for
finding overlapping communities in networks. Specifically, we propose a new
method, based on label propagation, for finding communities using a limited
number of pairwise constraints. Evaluations on synthetic and real-world
datasets demonstrate the potential of this approach for uncovering meaningful
community structures in cases where each node can potentially belong to more
than one community.Comment: Fix table
Hypergraph -Laplacian: A Differential Geometry View
The graph Laplacian plays key roles in information processing of relational
data, and has analogies with the Laplacian in differential geometry. In this
paper, we generalize the analogy between graph Laplacian and differential
geometry to the hypergraph setting, and propose a novel hypergraph
-Laplacian. Unlike the existing two-node graph Laplacians, this
generalization makes it possible to analyze hypergraphs, where the edges are
allowed to connect any number of nodes. Moreover, we propose a semi-supervised
learning method based on the proposed hypergraph -Laplacian, and formalize
them as the analogue to the Dirichlet problem, which often appears in physics.
We further explore theoretical connections to normalized hypergraph cut on a
hypergraph, and propose normalized cut corresponding to hypergraph
-Laplacian. The proposed -Laplacian is shown to outperform standard
hypergraph Laplacians in the experiment on a hypergraph semi-supervised
learning and normalized cut setting.Comment: Extended version of our AAAI-18 pape
A random matrix analysis and improvement of semi-supervised learning for large dimensional data
This article provides an original understanding of the behavior of a class of
graph-oriented semi-supervised learning algorithms in the limit of large and
numerous data. It is demonstrated that the intuition at the root of these
methods collapses in this limit and that, as a result, most of them become
inconsistent. Corrective measures and a new data-driven parametrization scheme
are proposed along with a theoretical analysis of the asymptotic performances
of the resulting approach. A surprisingly close behavior between theoretical
performances on Gaussian mixture models and on real datasets is also
illustrated throughout the article, thereby suggesting the importance of the
proposed analysis for dealing with practical data. As a result, significant
performance gains are observed on practical data classification using the
proposed parametrization
- …