SIAM Data Mining Brings It to Annual Meeting
The Data Mining Activity Group is one of SIAM's most vibrant and dynamic activity groups. To better share our enthusiasm for data mining with the broader SIAM community, our activity group organized six minisymposia at the 2016 Annual Meeting. These minisymposia included 48 talks organized by 11 SIAM members on:
- GraphBLAS (Aydın Buluç)
- Algorithms and statistical methods for noisy network analysis (Sanjukta Bhowmick & Ben Miller)
- Inferring networks from non-network data (Rajmonda Caceres, Ivan Brugere & Tanya Y. Berger-Wolf)
- Visual analytics (Jordan Crouser)
- Mining in graph data (Jennifer Webster, Mahantesh Halappanavar & Emilie Hogan)
- Scientific computing and big data (Vijay Gadepally)
These minisymposia were well received by the broader SIAM community, and below are some of the key highlights.
Graph-based Semi-Supervised & Active Learning for Edge Flows
We present a graph-based semi-supervised learning (SSL) method for learning
edge flows defined on a graph. Specifically, given flow measurements on a
subset of edges, we want to predict the flows on the remaining edges. To this
end, we develop a computational framework that imposes certain constraints on
the overall flows, such as (approximate) flow conservation. These constraints
render our approach different from classical graph-based SSL for vertex labels,
which posits that tightly connected nodes share similar labels and leverages
the graph structure accordingly to extrapolate from a few vertex labels to the
unlabeled vertices. We derive bounds for our method's reconstruction error and
demonstrate its strong performance on synthetic and real-world flow networks
from transportation, physical infrastructure, and the Web. Furthermore, we
provide two active learning algorithms for selecting informative edges on which
to measure flow, which has applications for optimal sensor deployment. The
first strategy selects edges to minimize the reconstruction error bound and
works well on flows that are approximately divergence-free. The second approach
clusters the graph and selects bottleneck edges that cross cluster boundaries,
which works well on flows with global trends.
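The idea of predicting unmeasured edge flows under approximate flow conservation can be illustrated with a small sketch. This is not the authors' exact formulation: it assumes a regularized least-squares objective that penalizes the divergence B f at every node, where B is the node-by-edge incidence matrix, and the names `incidence_matrix`, `predict_flows`, and the parameter `reg` are hypothetical.

```python
import numpy as np

def incidence_matrix(n_nodes, edges):
    """Node-by-edge incidence matrix B: -1 at the tail, +1 at the head of each edge."""
    B = np.zeros((n_nodes, len(edges)))
    for j, (u, v) in enumerate(edges):
        B[u, j] = -1.0
        B[v, j] = 1.0
    return B

def predict_flows(n_nodes, edges, labeled, reg=1e-3):
    """Fill in unmeasured flows so that the flow is approximately divergence-free.

    labeled maps an edge index to its measured flow (assumed input format).
    Solves min ||B f||^2 + reg^2 ||f_U||^2 with the measured entries held fixed.
    """
    m = len(edges)
    B = incidence_matrix(n_nodes, edges)
    known = sorted(labeled)
    unknown = [j for j in range(m) if j not in labeled]

    f = np.zeros(m)
    f[known] = [labeled[j] for j in known]

    # Stack the divergence term and the small ridge term into one least-squares problem.
    A = np.vstack([B[:, unknown], reg * np.eye(len(unknown))])
    b = np.concatenate([-B[:, known] @ f[known], np.zeros(len(unknown))])
    f[unknown] = np.linalg.lstsq(A, b, rcond=None)[0]
    return f

# Directed 3-cycle with one measured edge: conservation pushes all flows to the same value.
cycle_edges = [(0, 1), (1, 2), (2, 0)]
flows = predict_flows(3, cycle_edges, {0: 2.0})
```

On the cycle, the only divergence-free flows are constant around the loop, so the two unmeasured edges are recovered close to the measured value. The ridge term is an illustrative choice that keeps the problem well posed when many flows satisfy conservation equally well.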
Distributed and Asynchronous Methods for Semi-supervised Learning
We propose two asynchronously distributed approaches for graph-based semi-supervised learning. The first approach is based on stochastic approximation, whereas the second approach is based on the randomized Kaczmarz algorithm. In addition to the possibility of distributed implementation, both approaches can be naturally applied online to streaming data. We analyse both approaches theoretically and experimentally. Neither approach is a clear winner, and we indicate the cases in which each one is superior.
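For readers unfamiliar with the randomized Kaczmarz algorithm, a minimal sketch of the standard single-machine version follows. It assumes the learning problem has already been reduced to a consistent linear system A x = b; that reduction, and the distributed, asynchronous variant proposed in the paper, are not shown. The function name and parameters are hypothetical.

```python
import numpy as np

def randomized_kaczmarz(A, b, n_iters=5000, seed=0):
    """Approximate a solution of the consistent system A x = b.

    Each step samples one row with probability proportional to its squared
    norm and projects the iterate onto that row's hyperplane.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)

    row_norms = np.einsum("ij,ij->i", A, A)  # squared Euclidean norm of each row
    probs = row_norms / row_norms.sum()

    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        i = rng.choice(len(b), p=probs)
        # Orthogonal projection of x onto {z : A[i] @ z = b[i]}.
        x += (b[i] - A[i] @ x) / row_norms[i] * A[i]
    return x

A_demo = np.array([[2.0, 1.0], [1.0, 3.0]])
x_true = np.array([1.0, -1.0])
x_hat = randomized_kaczmarz(A_demo, A_demo @ x_true)
```

Because each update reads only one row of A and one entry of b, the iteration lends itself to processing rows one at a time, which is consistent with the online and distributed use described above.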
Local Hypergraph Clustering using Capacity Releasing Diffusion
Local graph clustering is an important machine learning task that aims to
find a well-connected cluster near a set of seed nodes. Recent results have
revealed that incorporating higher order information significantly enhances the
results of graph clustering techniques. The majority of existing research in
this area focuses on spectral graph theory-based techniques. However, an
alternative perspective on local graph clustering arises from objectives based
on max-flow and min-cut, which offer distinctly different guarantees. For
instance, a new method called capacity releasing diffusion (CRD) was recently
proposed and shown to preserve local structure around the seeds better than
spectral methods. The method was also the first local clustering technique that
is not subject to the quadratic Cheeger inequality by assuming a good cluster
near the seed nodes. In this paper, we propose a local hypergraph clustering
technique called hypergraph CRD (HG-CRD) by extending the CRD process to
cluster based on higher order patterns, encoded as hyperedges of a hypergraph.
Moreover, we theoretically show that HG-CRD gives results about a quantity
called motif conductance, rather than a biased version used in previous
experiments. Experimental results on synthetic datasets and real-world graphs
show that HG-CRD enhances the clustering quality.
Comment: 18 pages, 6 figures
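The abstract does not define motif conductance. The sketch below shows one common convention for the triangle motif, stated here as an assumption: the cut counts triangles with nodes on both sides of the partition, and the volume of a side counts triangle endpoints that fall in it. It only evaluates the quantity for a given node set; it does not implement HG-CRD, and the function names are hypothetical.

```python
def triangles(edges):
    """Return the set of triangles in an undirected simple graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    tris = set()
    for u, v in edges:
        # Every common neighbour of u and v closes a triangle on the edge (u, v).
        for w in adj[u] & adj[v]:
            tris.add(frozenset((u, v, w)))
    return tris

def motif_conductance(edges, S):
    """Triangle-motif conductance of node set S under the convention assumed above."""
    S = set(S)
    cut = vol_in = vol_out = 0
    for t in triangles(edges):
        inside = len(t & S)
        vol_in += inside
        vol_out += len(t) - inside
        if 0 < inside < len(t):
            cut += 1  # this triangle is split by the partition
    denom = min(vol_in, vol_out)
    return cut / denom if denom else float("inf")

# Triangles {0,1,2}, {1,2,3}, {3,4,5}; only {1,2,3} is split by S = {0,1,2}.
demo_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
phi = motif_conductance(demo_edges, {0, 1, 2})
```

In this example the cut is 1, the triangle endpoints inside S number 5, and those outside number 4, so the value is 1/4. A lower value indicates a cluster that breaks few triangles relative to the triangles it participates in.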