A Latent Parameter Node-Centric Model for Spatial Networks
Spatial networks, in which nodes and edges are embedded in space, play a
vital role in the study of complex systems. For example, many social networks
attach geo-location information to each user, allowing the study of not only
topological interactions between users, but spatial interactions as well. The
defining property of spatial networks is that edge distances are associated
with a cost, which may subtly influence the topology of the network. However,
the cost function over distance is rarely known, so developing a model of
connections in spatial networks is a difficult task.
In this paper, we introduce a novel model for capturing the interaction
between spatial effects and network structure. Our approach represents a unique
combination of ideas from latent variable statistical models and spatial
network modeling. In contrast to previous work, we view the ability to form
long/short-distance connections to be dependent on the individual nodes
involved. For example, a node's specific surroundings (e.g. network structure
and node density) may make it more likely to form a long distance link than
other nodes with the same degree. To capture this information, we attach a
latent variable to each node which represents a node's spatial reach. These
variables are inferred from the network structure using a Markov Chain Monte
Carlo algorithm.
We experimentally evaluate our proposed model on four different types of
real-world spatial networks (transportation, biological, infrastructure,
and social). We apply our model to the task of link prediction and achieve up
to a 35% improvement over previous approaches in terms of the area under the
ROC curve. Additionally, we show that our model is particularly helpful for
predicting links between nodes with low degrees. In these cases, we see much
larger improvements over previous models.
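The link-prediction results in this abstract are reported as area under the ROC curve. As a rough illustration of that metric only (not the paper's model or evaluation code), the sketch below computes AUC as the fraction of (true link, non-link) pairs in which the true link receives the higher score, counting ties as half. The function name `roc_auc` and the toy scores are assumptions made for the example.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted link scores (higher means "more likely a link").
    labels: 1 for a true link, 0 for a non-link.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one link and one non-link")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # true link ranked above non-link
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy candidate pairs: two true links, two non-links.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise form is quadratic in the number of candidate pairs, so it suits small examples; a rank-based computation would be the usual choice for large networks.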
Inference of Sparse Networks with Unobserved Variables. Application to Gene Regulatory Networks
Networks are a unifying framework for modeling complex systems and network
inference problems are frequently encountered in many fields. Here, I develop
and apply a generative approach to network inference (RCweb) for the case when
the network is sparse and the latent (not observed) variables affect the
observed ones. From all possible factor analysis (FA) decompositions explaining
the variance in the data, RCweb selects the FA decomposition that is consistent
with a sparse underlying network. The sparsity constraint is imposed by a novel
method that significantly outperforms (in terms of accuracy, robustness to
noise, complexity scaling, and computational efficiency) Bayesian methods and
MLE methods using l1-norm relaxation such as K-SVD and l1-based sparse
principal component analysis (PCA). Results from simulated models demonstrate
that RCweb exactly recovers the model structures even when sparsity is as low
as 50% (i.e., nearly non-sparse) and the ratio of unobserved to observed
variables is as high as 2. RCweb is robust to noise, with gradual decrease in
the parameter ranges as the noise level increases. Comment: 8 pages, 5 figures
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
Causal inference for social network data
We describe semiparametric estimation and inference for causal effects using
observational data from a single social network. Our asymptotic result is the
first to allow for dependence of each observation on a growing number of other
units as sample size increases. While previous methods have generally
implicitly focused on one of two possible sources of dependence among social
network observations, we allow for both dependence due to transmission of
information across network ties, and for dependence due to latent similarities
among nodes sharing ties. We describe estimation and inference for new causal
effects that are specifically of interest in social network settings, such as
interventions on network ties and network structure. Using our methods to
reanalyze the Framingham Heart Study data used in one of the most influential
and controversial causal analyses of social network data, we find that after
accounting for network structure there is no evidence for the causal effects
claimed in the original paper.
Spectral partitioning of time-varying networks with unobserved edges
We discuss a variant of 'blind' community detection, in which we aim to
partition an unobserved network from the observation of a (dynamical) graph
signal defined on the network. We consider a scenario where our observed graph
signals are obtained by filtering white noise input, and the underlying network
is different for every observation. In this fashion, the filtered graph signals
can be interpreted as defined on a time-varying network. We model each of the
underlying network realizations as generated by an independent draw from a
latent stochastic blockmodel (SBM). To infer the partition of the latent SBM,
we propose a simple spectral algorithm for which we provide a theoretical
analysis and establish consistency guarantees for the recovery. We illustrate
our results using numerical experiments on synthetic and real data,
highlighting the efficacy of our approach. Comment: 5 pages, 2 figures
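To make the idea of spectral partitioning of a stochastic blockmodel concrete, here is a minimal sketch under simplifying assumptions. Unlike the setting of this abstract, it assumes the adjacency matrix is directly observed and that there are exactly two blocks; it is not the authors' algorithm, which works from graph signals on unobserved networks. The helper names (`sample_sbm`, `spectral_bipartition`) and the parameter values are hypothetical.

```python
import math
import random


def sample_sbm(sizes, p_in, p_out, seed=0):
    """Draw a symmetric 0/1 adjacency matrix from a planted-partition SBM."""
    rng = random.Random(seed)
    labels = [b for b, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj, labels


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _power_iteration(adj, start, orthogonal_to=None, iters=300):
    """Dominant eigenvector of adj by power iteration, optionally kept
    orthogonal to a given unit vector (to reach the second eigenvector)."""
    v = _normalize(start)
    for _ in range(iters):
        v = [sum(a * x for a, x in zip(row, v)) for row in adj]
        if orthogonal_to is not None:
            dot = sum(a * b for a, b in zip(v, orthogonal_to))
            v = [a - dot * b for a, b in zip(v, orthogonal_to)]
        v = _normalize(v)
    return v


def spectral_bipartition(adj, seed=0):
    """Split nodes into two groups by the sign of the second eigenvector."""
    n = len(adj)
    rng = random.Random(seed)
    v1 = _power_iteration(adj, [1.0] * n)
    start = [rng.gauss(0.0, 1.0) for _ in range(n)]
    v2 = _power_iteration(adj, start, orthogonal_to=v1)
    return [0 if x < 0 else 1 for x in v2]


adj, truth = sample_sbm([20, 20], p_in=0.8, p_out=0.1, seed=1)
pred = spectral_bipartition(adj)
agree = sum(p == t for p, t in zip(pred, truth))
accuracy = max(agree, len(truth) - agree) / len(truth)  # labels may be swapped
```

Pure-Python power iteration is used only to keep the sketch dependency-free; a numerical library's symmetric eigensolver would be the natural choice in practice.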
Joint estimation of multiple related biological networks
Graphical models are widely used to make inferences concerning interplay in
multivariate systems. In many applications, data are collected from multiple
related but nonidentical units whose underlying networks may differ but are
likely to share features. Here we present a hierarchical Bayesian formulation
for joint estimation of multiple networks in this nonidentically distributed
setting. The approach is general: given a suitable class of graphical models,
it uses an exchangeability assumption on networks to provide a corresponding
joint formulation. Motivated by emerging experimental designs in molecular
biology, we focus on time-course data with interventions, using dynamic
Bayesian networks as the graphical models. We introduce a computationally
efficient, deterministic algorithm for exact joint inference in this setting.
We provide an upper bound on the gains that joint estimation offers relative to
separate estimation for each network and empirical results that support and
extend the theory, including an extensive simulation study and an application
to proteomic data from human cancer cell lines. Finally, we describe
approximations that are still more computationally efficient than the exact
algorithm and that also demonstrate good empirical performance. Comment:
Published at http://dx.doi.org/10.1214/14-AOAS761 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Variational Inference for Stochastic Block Models from Sampled Data
This paper deals with non-observed dyads during the sampling of a network and
consecutive issues in the inference of the Stochastic Block Model (SBM). We
review sampling designs and recover Missing At Random (MAR) and Not Missing At
Random (NMAR) conditions for the SBM. We introduce variants of the variational
EM algorithm for inferring the SBM under various sampling designs (MAR and
NMAR), all available as an R package. Model selection criteria based on
Integrated Classification Likelihood are derived for selecting both the number
of blocks and the sampling design. We investigate the accuracy and the range of
applicability of these algorithms with simulations. We explore two real-world
networks from ethnology (seed circulation network) and biology (protein-protein
interaction network), where the interpretation depends considerably on the
sampling design considered.
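As a small illustration of why the missing-at-random case is the easier one, the sketch below estimates SBM block connectivity probabilities using only the observed dyads and skipping unobserved ones. It assumes block labels are already known and that missingness is ignorable (MAR); it is not the variational EM procedure of the paper or the interface of its R package. The function name `block_connectivity` and the use of `None` for an unobserved dyad are conventions invented for this example.

```python
def block_connectivity(adj, labels, n_blocks):
    """Estimate connection probabilities between blocks from observed dyads.

    adj[i][j] is 1 (edge), 0 (no edge), or None (dyad not observed).
    Returns a dict mapping (k, l) with k <= l to the observed edge
    frequency, or None when no dyad between those blocks was observed.
    """
    edges = {}
    seen = {}
    for k in range(n_blocks):
        for l in range(k, n_blocks):
            edges[(k, l)] = 0
            seen[(k, l)] = 0
    n = len(adj)
    for i in range(n):
        for j in range(i + 1, n):
            a = adj[i][j]
            if a is None:
                continue  # unobserved dyad: ignored under MAR
            key = tuple(sorted((labels[i], labels[j])))
            seen[key] += 1
            edges[key] += a
    return {
        key: (edges[key] / seen[key] if seen[key] else None)
        for key in seen
    }


# Four nodes in two blocks; dyads (0, 3) and (2, 3) were not observed.
N = None
adj = [
    [0, 1, 0, N],
    [1, 0, 1, 0],
    [0, 1, 0, N],
    [N, 0, N, 0],
]
estimates = block_connectivity(adj, labels=[0, 0, 1, 1], n_blocks=2)
```

Under a not-missing-at-random design, simply dropping unobserved dyads like this can bias the estimates, which is why the sampling design then has to be modeled jointly with the network.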