Graph-based Semi-Supervised & Active Learning for Edge Flows
We present a graph-based semi-supervised learning (SSL) method for learning
edge flows defined on a graph. Specifically, given flow measurements on a
subset of edges, we want to predict the flows on the remaining edges. To this
end, we develop a computational framework that imposes certain constraints on
the overall flows, such as (approximate) flow conservation. These constraints
render our approach different from classical graph-based SSL for vertex labels,
which posits that tightly connected nodes share similar labels and leverages
the graph structure accordingly to extrapolate from a few vertex labels to the
unlabeled vertices. We derive bounds for our method's reconstruction error and
demonstrate its strong performance on synthetic and real-world flow networks
from transportation, physical infrastructure, and the Web. Furthermore, we
provide two active learning algorithms for selecting informative edges on which
to measure flow, which has applications for optimal sensor deployment. The
first strategy selects edges to minimize the reconstruction error bound and
works well on flows that are approximately divergence-free. The second approach
clusters the graph and selects bottleneck edges that cross cluster boundaries,
which works well on flows with global trends.
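The flow-conservation idea can be illustrated with a minimal sketch. It assumes the objective is the squared divergence ||B f||^2, where B is the node-by-edge incidence matrix so that B f is the net flow at each node, plus a small ridge term lam * ||f_U||^2 on the unmeasured edges. The helper names `incidence_matrix` and `reconstruct_flows` and the value of `lam` are illustrative assumptions, not the paper's code.

```python
import numpy as np

def incidence_matrix(n_nodes, edges):
    """Node-by-edge incidence matrix: -1 at each edge's tail, +1 at its head."""
    B = np.zeros((n_nodes, len(edges)))
    for e, (tail, head) in enumerate(edges):
        B[tail, e] = -1.0
        B[head, e] = 1.0
    return B

def reconstruct_flows(B, labeled, f_labeled, lam=1e-3):
    """Estimate unlabeled edge flows by minimizing ||B f||^2 + lam * ||f_U||^2
    while keeping the measured flows on the labeled edges fixed (assumed objective)."""
    labeled_set = set(labeled)
    unlabeled = [e for e in range(B.shape[1]) if e not in labeled_set]
    B_L, B_U = B[:, labeled], B[:, unlabeled]
    f_L = np.asarray(f_labeled, dtype=float)
    # Normal equations of the quadratic objective:
    # (B_U^T B_U + lam I) f_U = -B_U^T B_L f_L
    A = B_U.T @ B_U + lam * np.eye(len(unlabeled))
    f_U = np.linalg.solve(A, -B_U.T @ (B_L @ f_L))
    f = np.zeros(B.shape[1])
    f[labeled], f[unlabeled] = f_L, f_U
    return f

# Directed 4-cycle carrying a unit circulation; only edge 0 is measured.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
B = incidence_matrix(4, edges)
f = reconstruct_flows(B, [0], [1.0])
```

Because the only divergence-free flows on a cycle are circulations, measuring a single edge recovers all four flows, up to the slight shrinkage toward zero introduced by the ridge term.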
ALPINE: Active Link Prediction using Network Embedding
Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, consumer-product recommendations, and the identification of hidden interactions between actors in a crime network. Several link prediction algorithms, notably those recently introduced using network embedding, are capable of doing this by just relying on the observed part of the network.
Often, the link status of a node pair can be queried, which can be used as additional information by the link prediction algorithm. Unfortunately, such queries can be expensive or time-consuming, mandating the careful consideration of which node pairs to query. In this paper we estimate the improvement in link prediction accuracy after querying any particular node pair, to use in an active learning setup.
Specifically, we propose ALPINE (Active Link Prediction usIng Network Embedding), the first method to achieve this for link prediction based on network embedding. To this end, we generalize the notion of V-optimality from experimental design to this setting, as well as more basic active learning heuristics originally developed in standard classification settings. Empirical results on real data show that ALPINE is scalable and boosts link prediction accuracy with far fewer queries.
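ALPINE's V-optimality criterion is not reproduced here. As an illustration of the more basic active learning heuristics from standard classification, the sketch below implements uncertainty sampling: it queries the unobserved node pair whose predicted link probability is closest to 0.5. The sigmoid-of-inner-product link model, the toy embeddings, and the helper names are assumptions for illustration only.

```python
import itertools
import numpy as np

def link_probability(Z, i, j):
    """Illustrative link model: sigmoid of the inner product of two node embeddings."""
    return 1.0 / (1.0 + np.exp(-(Z[i] @ Z[j])))

def uncertainty_query(Z, observed_pairs):
    """Pick the unobserved node pair whose predicted link probability is closest
    to 0.5, i.e. the pair the current model is least certain about."""
    observed = {frozenset(p) for p in observed_pairs}
    best_pair, best_gap = None, np.inf
    for i, j in itertools.combinations(range(len(Z)), 2):
        if frozenset((i, j)) in observed:
            continue  # link status already known; no need to query it
        gap = abs(link_probability(Z, i, j) - 0.5)
        if gap < best_gap:
            best_pair, best_gap = (i, j), gap
    return best_pair

Z = np.array([[2.0, 0.0], [1.5, 0.5], [0.0, 2.0]])  # toy 2-d embeddings
first = uncertainty_query(Z, observed_pairs=[])
second = uncertainty_query(Z, observed_pairs=[first])
```

Pair (0, 2) has inner product 0, hence probability exactly 0.5, so it is queried first. Once its status is known, pair (1, 2) (inner product 1) is the next most uncertain. In a full active learning loop the embedding would be retrained after each query.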
Sampling and Recovery of Signals on a Simplicial Complex using Neighbourhood Aggregation
In this work, we focus on sampling and recovery of signals over simplicial
complexes. In particular, we subsample a simplicial signal of a certain order
and focus on recovering multi-order bandlimited simplicial signals of one order
higher and one order lower. To do so, we assume that the simplicial signal
admits the Helmholtz decomposition that relates simplicial signals of these
different orders. Next, we propose an aggregation sampling scheme for
simplicial signals based on the Hodge Laplacian matrix and a simple least
squares estimator for recovery. We also provide theoretical conditions on the
number of aggregations and size of the sampling set required for faithful
reconstruction as a function of the bandwidth of simplicial signals to be
recovered. Numerical experiments are provided to show the effectiveness of the
proposed method.
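A minimal sketch of aggregation sampling for an edge (order-1) signal follows. It is not the paper's multi-order scheme: it recovers a signal of the same order it samples. It assumes the signal lies in the span of a known set V_K of eigenvectors of the Hodge Laplacian L1 = B1^T B1 + B2 B2^T (here, hypothetically, the three with the smallest eigenvalues). It observes x, L1 x, ..., L1^(k-1) x on a sampling set of edges and solves a least-squares problem for the coefficients. All function names are illustrative.

```python
import numpy as np

def hodge_laplacian(B1, B2):
    """First-order Hodge Laplacian L1 = B1^T B1 + B2 B2^T (lower + upper parts)."""
    return B1.T @ B1 + B2 @ B2.T

def aggregation_samples(L, x, sample_set, n_agg):
    """Observe the shifted signals x, Lx, ..., L^(n_agg-1) x on the sampled edges."""
    rows, z = [], x.copy()
    for _ in range(n_agg):
        rows.append(z[sample_set])
        z = L @ z  # one more round of neighbourhood aggregation
    return np.concatenate(rows)

def recover(L, y, sample_set, n_agg, V_K):
    """Least-squares recovery of a signal assumed to lie in span(V_K)."""
    blocks, Z = [], V_K.copy()
    for _ in range(n_agg):
        blocks.append(Z[sample_set, :])
        Z = L @ Z
    Psi = np.vstack(blocks)  # maps band coefficients to aggregated samples
    c_hat, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    return V_K @ c_hat

# Toy complex: 4 nodes, 5 edges, filled triangle (0, 1, 2); cycle (1, 2, 3) is a hole.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
B1 = np.zeros((4, len(edges)))
for e, (tail, head) in enumerate(edges):
    B1[tail, e], B1[head, e] = -1.0, 1.0
B2 = np.array([[1.0], [-1.0], [1.0], [0.0], [0.0]])  # boundary of triangle (0, 1, 2)
L1 = hodge_laplacian(B1, B2)
_, V = np.linalg.eigh(L1)                # eigenvalues in ascending order
V_K = V[:, :3]                           # assumed band: three smallest eigenvalues
x = V_K @ np.array([1.0, -2.0, 0.5])     # a bandlimited edge signal
y = aggregation_samples(L1, x, [0], 3)   # one sampled edge, three aggregations
x_hat = recover(L1, y, [0], 3, V_K)
```

In this toy complex the spectrum of L1 is {0, 2, 3, 4, 4}. The three in-band eigenvalues are distinct and their eigenvectors are nonzero on edge 0, so Psi is invertible and three aggregations at a single edge determine the signal. With repeated in-band eigenvalues or a poorly chosen edge, Psi loses rank and more sampled edges are needed, which is the kind of trade-off the abstract's conditions on the number of aggregations and the sampling-set size address.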
Online Edge Flow Imputation on Networks
Author's accepted manuscript © 2022 IEEE.
An online algorithm for missing data imputation for networks with signals defined on the edges is presented. Leveraging the prior knowledge intrinsic to real-world networks, we propose a bi-level optimization scheme that exploits the causal dependencies and the flow conservation, respectively via (i) a sparse line graph identification strategy based on a group-Lasso and (ii) a Kalman filtering-based signal reconstruction strategy developed using a simplicial complex (SC) formulation. The advantages of this first SC-based attempt at time-varying signal imputation are demonstrated through numerical experiments using EPANET models of both synthetic and real water distribution networks.
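Only the second level of the bi-level scheme is sketched below, and only under simplifying assumptions that are not taken from the paper. Edge flows follow a random-walk state model, monitored edges give noisy readings, and node-level flow conservation B1 x = 0 is appended as a low-noise pseudo-measurement in a standard Kalman update. The group-Lasso line-graph identification and the full simplicial-complex formulation are omitted. `kalman_step` and its noise parameters are hypothetical.

```python
import numpy as np

def kalman_step(x, P, y, obs_edges, B1, q=1e-2, r=1e-4, r_cons=1e-6):
    """One predict/update cycle for edge flows x with covariance P.
    Assumptions: random-walk dynamics x_t = x_{t-1} + w with w ~ N(0, q I);
    monitored edges measured with noise variance r; conservation B1 x = 0
    treated as a pseudo-measurement with small variance r_cons."""
    m = len(x)
    P = P + q * np.eye(m)  # predict step (identity transition)
    # Stack real readings and conservation pseudo-measurements.
    H = np.vstack([np.eye(m)[obs_edges], B1])
    z = np.concatenate([y, np.zeros(B1.shape[0])])
    R = np.diag(np.concatenate([np.full(len(obs_edges), r),
                                np.full(B1.shape[0], r_cons)]))
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T  # Kalman gain P H^T S^{-1} (P, S symmetric)
    x = x + K @ (z - H @ x)
    P = (np.eye(m) - K @ H) @ P
    return x, P

# Directed 4-cycle with a sensor on edge 0 only.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
B1 = np.zeros((4, len(edges)))
for e, (t, h) in enumerate(edges):
    B1[t, e], B1[h, e] = -1.0, 1.0
x, P = np.zeros(4), np.eye(4)
for _ in range(5):
    x, P = kalman_step(x, P, np.array([1.0]), [0], B1)
```

The conservation pseudo-measurements propagate the single reading to the three unmonitored edges, so after a few steps every edge is estimated close to the true unit flow. The node-level constraint is redundant (the rows of B1 sum to zero), but the diagonal noise term keeps S positive definite.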
Sensor Placement for Learning in Flow Networks
Large infrastructure networks (e.g. for transportation and power
distribution) require constant monitoring for failures, congestion, and other
adversarial events. However, assigning a sensor to every link in the network is
often infeasible due to placement and maintenance costs. Instead, sensors can
be placed only on a few key links, and machine learning algorithms can be
leveraged for the inference of missing measurements (e.g. traffic counts, power
flows) across the network. This paper investigates the sensor placement problem
for networks. We first formalize the problem under a flow conservation
assumption and show that it is NP-hard to place a fixed set of sensors
optimally. Next, we propose an efficient and adaptive greedy heuristic for
sensor placement that scales to large networks. Our experiments, using datasets
from real-world application domains, show that the proposed approach enables
more accurate inference than existing alternatives from the literature. We
demonstrate that considering even imperfect or incomplete ground-truth
estimates can vastly reduce the prediction error, especially when only a small
number of sensors is available.
Comment: 9 pages, 6 figures
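The abstract does not spell out the adaptive greedy heuristic, so the following is only a simple, non-adaptive greedy sketch under assumed choices. Flows are reconstructed with a divergence-penalized least-squares estimator. Each step adds the edge that most reduces the squared error against a rough flow estimate `f_est`, standing in for the imperfect ground-truth estimates. All names and parameters are hypothetical.

```python
import numpy as np

def reconstruct(B, labeled, f_labeled, lam=1e-3):
    """Minimize ||B f||^2 + lam * ||f_U||^2 with flows on labeled edges fixed."""
    unlabeled = [e for e in range(B.shape[1]) if e not in labeled]
    f = np.zeros(B.shape[1])
    f[labeled] = f_labeled
    if unlabeled:
        B_U = B[:, unlabeled]
        A = B_U.T @ B_U + lam * np.eye(len(unlabeled))
        f[unlabeled] = np.linalg.solve(A, -B_U.T @ (B[:, labeled] @ f[labeled]))
    return f

def greedy_placement(B, f_est, budget):
    """Add, one at a time, the edge whose sensor most reduces the squared error
    between the reconstruction and the (possibly imperfect) estimate f_est."""
    chosen = []
    for _ in range(budget):
        candidates = [e for e in range(B.shape[1]) if e not in chosen]

        def error(e):
            sel = chosen + [e]
            return np.sum((reconstruct(B, sel, f_est[sel]) - f_est) ** 2)

        chosen.append(min(candidates, key=error))
    return chosen

# Two disjoint directed triangles carrying circulations of size 1 and 5.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
B = np.zeros((6, 6))
for e, (t, h) in enumerate(edges):
    B[t, e], B[h, e] = -1.0, 1.0
f_est = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])  # hypothetical rough estimate
sensors = greedy_placement(B, f_est, budget=2)
```

The first sensor lands on the larger circulation, where leaving it unmeasured costs most, and the second covers the other cycle. Each greedy step re-solves a linear system per candidate edge, which is the cost an efficient heuristic for large networks would need to avoid.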
Residual Correlation in Graph Neural Network Regression
A graph neural network transforms features in each vertex's neighborhood into
a vector representation of the vertex. Afterward, each vertex's representation
is used independently for predicting its label. This standard pipeline
implicitly assumes that vertex labels are conditionally independent given their
neighborhood features. However, this is a strong assumption, and we show that
it is far from true on many real-world graph datasets. Focusing on regression
tasks, we find that this conditional independence assumption severely limits
predictive power. This is unsurprising, given that traditional
graph-based semi-supervised learning methods such as label propagation work in
the opposite fashion by explicitly modeling the correlation in predicted
outcomes.
Here, we address this problem with an interpretable and efficient framework
that can improve any graph neural network architecture simply by exploiting
correlation structure in the regression residuals. In particular, we model the
joint distribution of residuals on vertices with a parameterized multivariate
Gaussian, and estimate the parameters by maximizing the marginal likelihood of
the observed labels. Our framework achieves substantially higher accuracy than
competing baselines, and the learned parameters can be interpreted as the
strength of correlation among connected vertices. Furthermore, we develop
linear time algorithms for low-variance, unbiased model parameter estimates,
allowing us to scale to large networks. We also provide a basic version of our
method that makes stronger assumptions on the correlation structure but is
simple to implement and often performs well in practice with minimal
overhead.
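A minimal sketch of the residual-correction step follows. It assumes one plausible parameterization: residuals r ~ N(0, Gamma^-1) with precision Gamma = beta (I - alpha S), where S = D^(-1/2) A D^(-1/2) is the normalized adjacency matrix. Predictions on unlabeled vertices then add the conditional mean E[r_U | r_L] = -Gamma_UU^-1 Gamma_UL r_L to the base GNN predictions. Here alpha is fixed by hand rather than estimated by maximizing the marginal likelihood, and `residual_correction` is a hypothetical name.

```python
import numpy as np

def residual_correction(A, base_pred, y, labeled, alpha=0.9):
    """Add the conditional mean of Gaussian residuals to base predictions on
    unlabeled vertices. Assumed precision: Gamma = I - alpha * S with
    S = D^{-1/2} A D^{-1/2} (the scale beta cancels in the conditional mean).
    Requires every vertex to have at least one neighbour."""
    n = A.shape[0]
    d = A.sum(axis=1)
    S = A / np.sqrt(np.outer(d, d))
    Gamma = np.eye(n) - alpha * S                 # positive definite for |alpha| < 1
    unlabeled = np.setdiff1d(np.arange(n), labeled)
    r_L = y[labeled] - base_pred[labeled]         # observed residuals
    # E[r_U | r_L] = -Gamma_UU^{-1} Gamma_UL r_L
    r_U = -np.linalg.solve(Gamma[np.ix_(unlabeled, unlabeled)],
                           Gamma[np.ix_(unlabeled, labeled)] @ r_L)
    pred = np.array(base_pred, dtype=float)
    pred[unlabeled] += r_U
    return pred

# Path graph 0-1-2; labels known at both ends, base model predicts 0 everywhere.
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
y = np.array([1.0, 0.0, 1.0])  # y[1] is unknown and ignored
pred = residual_correction(A, np.zeros(3), y, labeled=[0, 2])
```

The residual of 1 observed at both ends is propagated to the middle vertex, which receives a correction of alpha * sqrt(2) (about 1.27 for alpha = 0.9). The degree normalization in S is why the correction can exceed the neighbouring residuals.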