Stochastic Weighted Graphs: Flexible Model Specification and Simulation
In most domains of network analysis researchers consider networks that arise
in nature with weighted edges. Such networks are routinely dichotomized in the
interest of using available methods for statistical inference with networks.
The generalized exponential random graph model (GERGM) is a recently proposed
method used to simulate and model the edges of a weighted graph. The GERGM
specifies a joint distribution for an exponential family of graphs with
continuous-valued edge weights. However, current estimation algorithms for the
GERGM only allow inference on a restricted family of model specifications. To
address this issue, we develop a Metropolis--Hastings method that can be used
to estimate any GERGM specification, thereby significantly extending the family
of weighted graphs that can be modeled with the GERGM. We show that new
flexible model specifications are capable of avoiding likelihood degeneracy and
efficiently capturing network structure in applications where such models were
not previously available. We demonstrate the utility of this new class of
GERGMs through application to two real network data sets, and we further assess
the effectiveness of our proposed methodology by simulating non-degenerate
model specifications from the well-studied two-stars model. A working R version
of the GERGM code is available in the supplement and will be incorporated in
the gergm CRAN package.
Comment: 33 pages, 6 figures. To appear in Social Network
Estimation of subgraph density in noisy networks
While it is common practice in applied network analysis to report various
standard network summary statistics, these numbers are rarely accompanied by
uncertainty quantification. Yet any error inherent in the measurements
underlying the construction of the network, or in the network construction
procedure itself, necessarily must propagate to any summary statistics
reported. Here we study the problem of estimating the density of an arbitrary
subgraph, given a noisy version of some underlying network as data. Under a
simple model of network error, we show that consistent estimation of such
densities is impossible when the rates of error are unknown and only a single
network is observed. Accordingly, we develop method-of-moment estimators of
network subgraph densities and error rates for the case where a minimal number
of network replicates are available. These estimators are shown to be
asymptotically normal as the number of vertices increases to infinity. We also
provide confidence intervals for quantifying the uncertainty in these estimates
based on the asymptotic normality. To construct the confidence intervals, a new
and non-standard bootstrap method is proposed to compute asymptotic variances,
which is infeasible otherwise. We illustrate the proposed methods in the
context of gene coexpression networks.
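As a rough illustration of the single-network impossibility claim, consider only the edge density. The notation here is an assumption for illustration and may differ from the paper's: let \rho be the true edge density, let each true non-edge be falsely recorded as an edge with probability \alpha, and let each true edge be missed with probability \beta, independently across vertex pairs. The expected observed density \tilde{\rho} is then:

```latex
\mathbb{E}[\tilde{\rho}] = (1-\beta)\,\rho + \alpha\,(1-\rho)
```

Under these assumptions, different parameter triples give the same expected observation. For example, (\rho, \alpha, \beta) = (0.2, 0, 0.5) and (0.1, 0, 0) both yield 0.1, so the observed density of one network cannot separate the true density from the error rates.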
Bayesian Inference of Online Social Network Statistics via Lightweight Random Walk Crawls
Online social networks (OSNs) contain an extensive amount of information about
the underlying society that is yet to be explored. One of the most feasible
techniques for fetching information from OSNs, crawling through Application
Programming Interface (API) requests, poses serious concerns over the
guarantees of the estimates. In this work, we focus on making reliable
statistical inference with limited API crawls. Based on regenerative properties
of the random walks, we propose an unbiased estimator for the aggregated sum of
functions over edges and prove the connection between the variance of the
estimator and the spectral gap. In order to facilitate Bayesian inference on the
true value of the estimator, we derive the approximate posterior distribution
of the estimate. The proposed ideas are then validated with numerical
experiments on inference problems in real-world networks.
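The regenerative idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a simple random walk on an undirected graph held in memory (the hypothetical `adj` dictionary stands in for API calls), and it treats each return to a fixed start node as the end of one tour. For a symmetric edge function `f`, each directed edge is traversed 1/deg(start) times per tour in expectation, which gives the scaling used below. The function name `estimate_edge_sum` and the toy graph are made up for the example.

```python
import random


def estimate_edge_sum(adj, f, start, num_tours, rng):
    """Estimate the sum of f(u, v) over undirected edges from random-walk tours.

    adj       : dict mapping each node to a list of its neighbours (undirected)
    f         : symmetric function of an edge's two endpoints
    start     : node at which every tour begins and ends
    num_tours : number of independent tours to run
    rng       : random.Random instance, passed in for reproducibility
    """
    total = 0.0
    for _ in range(num_tours):
        u = start
        while True:
            v = rng.choice(adj[u])   # one step of the simple random walk
            total += f(u, v)         # accumulate f over the traversed edge
            u = v
            if u == start:           # the tour ends on the first return
                break
    # Per tour, each directed edge is crossed 1/deg(start) times in expectation,
    # and every undirected edge corresponds to two directed edges.
    return len(adj[start]) * total / (2.0 * num_tours)


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(0)
edge_count = estimate_edge_sum(adj, lambda u, v: 1.0, start=0, num_tours=20000, rng=rng)
label_sum = estimate_edge_sum(adj, lambda u, v: float(u + v), start=0, num_tours=20000, rng=rng)
```

With `f` constant at 1 the target is the number of edges (4 in the toy graph); with `f(u, v) = u + v` the target is 11. Because the tours are independent and identically distributed, their sample variance gives a direct handle on the uncertainty of the estimate.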
Node similarity within subgraphs of protein interaction networks
We propose a biologically motivated quantity, twinness, to evaluate local
similarity between nodes in a network. The twinness of a pair of nodes is the
number of connected, labeled subgraphs of size n in which the two nodes possess
identical neighbours. The graph animal algorithm is used to estimate twinness
for each pair of nodes (for subgraph sizes n=4 to n=12) in four different
protein interaction networks (PINs). These include an Escherichia coli PIN and
three Saccharomyces cerevisiae PINs -- each obtained using state-of-the-art
high throughput methods. In almost all cases, the average twinness of node
pairs is vastly higher than expected from a null model obtained by switching
links. For all n, we observe a difference in the ratio of type A twins (which
are unlinked pairs) to type B twins (which are linked pairs) distinguishing the
prokaryote E. coli from the eukaryote S. cerevisiae. Interaction similarity is
expected due to gene duplication, and whole genome duplication paralogues in S.
cerevisiae have been reported to co-cluster into the same complexes. Indeed, we
find that these paralogous proteins are over-represented as twins compared to
pairs chosen at random. These results indicate that twinness can detect
ancestral relationships from currently available PIN data.
Comment: 10 pages, 5 figures. Edited for typos, clarity, figures improved for readability
Subgraph covers -- An information theoretic approach to motif analysis in networks
Many real world networks contain a statistically surprising number of certain
subgraphs, called network motifs. In the prevalent approach to motif analysis,
network motifs are detected by comparing subgraph frequencies in the original
network with a statistical null model. In this paper we propose an alternative
approach to motif analysis where network motifs are defined to be connectivity
patterns that occur in a subgraph cover that represents the network using
minimal total information. A subgraph cover is defined to be a set of subgraphs
such that every edge of the graph is contained in at least one of the subgraphs
in the cover. Some recently introduced random graph models that can incorporate
significant densities of motifs have natural formulations in terms of subgraph
covers and the presented approach can be used to match networks with such
models. To prove the practical value of our approach we also present a
heuristic for the resulting NP-hard optimization problem and give results for
several real world networks.
Comment: 10 pages, 7 tables, 1 Figure
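The definition of a subgraph cover given in the abstract can be checked mechanically. The sketch below is illustrative only: it represents each subgraph by its edge list, treats edges as undirected, and additionally requires every subgraph edge to belong to the graph. The function name `is_subgraph_cover` and the example graph are hypothetical, and the sketch does not attempt the minimal-information optimization the paper describes.

```python
def is_subgraph_cover(graph_edges, subgraphs):
    """Return True if the subgraphs together contain every edge of the graph.

    graph_edges : iterable of (u, v) pairs, read as undirected edges
    subgraphs   : iterable of edge lists, one list per subgraph in the cover
    """
    edges = {frozenset(e) for e in graph_edges}
    covered = set()
    for sub in subgraphs:
        sub_edges = {frozenset(e) for e in sub}
        if not sub_edges <= edges:   # a subgraph may only use edges of the graph
            return False
        covered |= sub_edges
    return covered == edges          # every edge lies in at least one subgraph


# Example: a triangle 1-2-3 with an extra edge 3-4.
graph = [(1, 2), (2, 3), (1, 3), (3, 4)]
triangle = [(1, 2), (2, 3), (3, 1)]
single_edge = [(3, 4)]

full_cover = is_subgraph_cover(graph, [triangle, single_edge])
partial_cover = is_subgraph_cover(graph, [triangle])
```

Here the triangle plus the single edge form a cover, while the triangle alone leaves the edge 3-4 uncovered.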