On the Question of Effective Sample Size in Network Modeling: An Asymptotic Inquiry
The modeling and analysis of networks and network data has seen an explosion
of interest in recent years and represents an exciting direction for potential
growth in statistics. Despite the already substantial amount of work done in
this area to date by researchers from various disciplines, however, there
remain many questions of a decidedly foundational nature - natural analogues of
standard questions already posed and addressed in more classical areas of
statistics - that have yet to even be posed, much less addressed. Here we raise
and consider one such question in connection with network modeling.
Specifically, we ask, "Given an observed network, what is the sample size?"
Using simple, illustrative examples from the class of exponential random graph
models, we show that the answer to this question can very much depend on basic
properties of the networks expected under the model, as the number of vertices
in the network grows. In particular, adopting the (asymptotic) scaling of
the variance of the maximum likelihood parameter estimates as a notion of
effective sample size (n_eff), we show that when modeling the
overall propensity to have ties and the propensity to reciprocate ties, whether
the networks are sparse or not under the model (i.e., having a constant or an
increasing number of ties per vertex, respectively) is sufficient to yield an
order of magnitude difference in n_eff, from O(n) to O(n^2), where n is the
number of vertices. In addition, we report simulation study results that suggest
similar properties for models for triadic (friend-of-a-friend) effects. We then
explore some practical implications of this result, using both simulation and
data on food-sharing from Lamalera, Indonesia.
Comment: Published at http://dx.doi.org/10.1214/14-STS502 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
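The sparse-versus-dense contrast above can be illustrated with a back-of-the-envelope calculation. The sketch below is a hypothetical illustration, not code from the paper: it treats a simple Bernoulli random graph, takes the Fisher information for the log-odds of a tie over the m = n(n-1)/2 dyads as a stand-in for effective sample size, and compares its growth when the tie probability is held fixed (dense) versus scaled as 1/n to keep mean degree constant (sparse).

```python
# Illustrative (assumed) notion of effective sample size for the
# density parameter logit(p) in a Bernoulli random graph:
# Fisher information = m * p * (1 - p) over the m = n(n-1)/2 dyads.

def n_eff(n, p):
    m = n * (n - 1) / 2      # number of dyads in a graph on n vertices
    return m * p * (1 - p)   # Fisher information for logit(p)

# Dense regime: tie probability fixed as n grows.
dense_ratio = n_eff(200, 0.1) / n_eff(100, 0.1)

# Sparse regime: mean degree held near 4, so p ~ 4/n.
sparse_ratio = n_eff(200, 4 / 200) / n_eff(100, 4 / 100)

# Doubling n roughly quadruples n_eff in the dense regime (~ n^2
# scaling) but only roughly doubles it in the sparse regime (~ n).
print(dense_ratio, sparse_ratio)
```

Under this toy accounting the order-of-magnitude gap between the two regimes appears immediately, matching the qualitative claim of the abstract.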
Multiscale likelihood analysis and complexity penalized estimation
We describe here a framework for a certain class of multiscale likelihood
factorizations wherein, in analogy to a wavelet decomposition of an L^2
function, a given likelihood function has an alternative representation as a
product of conditional densities reflecting information in both the data and
the parameter vector localized in position and scale. The framework is
developed as a set of sufficient conditions for the existence of such
factorizations, formulated in analogy to those underlying a standard
multiresolution analysis for wavelets, and hence can be viewed as a
multiresolution analysis for likelihoods. We then consider the use of these
factorizations in the task of nonparametric, complexity penalized likelihood
estimation. We study the risk properties of certain thresholding and
partitioning estimators, and demonstrate their adaptivity and near-optimality,
in a minimax sense over a broad range of function spaces, based on squared
Hellinger distance as a loss function. In particular, our results provide an
illustration of how properties of classical wavelet-based estimators can be
obtained in a single, unified framework that includes models for continuous,
count, and categorical data types.
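A minimal concrete instance of such a factorization, for independent Poisson counts on a dyadic partition, is the classical total-plus-binomial-splits decomposition: the total count is Poisson, and the split of each parent count into its two children is conditionally binomial. The sketch below is illustrative (not the paper's code) and verifies numerically that the factorized log-likelihood matches the direct product of Poisson likelihoods.

```python
import math

def pois_logpmf(x, lam):
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def binom_logpmf(k, n, p):
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def multiscale_loglik(x, lam):
    """Log-likelihood of independent Poisson counts via the
    coarse-to-fine factorization: one Poisson term for the grand
    total, plus one binomial 'splitting' term per internal node
    of a dyadic tree over the positions."""
    def split(xs, ls):
        if len(xs) == 1:
            return 0.0
        h = len(xs) // 2
        rho = sum(ls[:h]) / sum(ls)   # left child's share of intensity
        return (binom_logpmf(sum(xs[:h]), sum(xs), rho)
                + split(xs[:h], ls[:h]) + split(xs[h:], ls[h:]))
    return pois_logpmf(sum(x), sum(lam)) + split(x, lam)

x = [3, 0, 5, 2]
lam = [2.0, 1.0, 4.0, 1.5]
direct = sum(pois_logpmf(xi, li) for xi, li in zip(x, lam))
print(abs(direct - multiscale_loglik(x, lam)))  # agreement up to rounding
```

The two representations agree term by term because a Poisson total thinned by a binomial split reproduces the product of the two child Poisson densities; applying this recursively localizes the likelihood in position and scale, in analogy to a wavelet decomposition.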
Estimation of subgraph density in noisy networks
While it is common practice in applied network analysis to report various
standard network summary statistics, these numbers are rarely accompanied by
uncertainty quantification. Yet any error inherent in the measurements
underlying the construction of the network, or in the network construction
procedure itself, necessarily must propagate to any summary statistics
reported. Here we study the problem of estimating the density of an arbitrary
subgraph, given a noisy version of some underlying network as data. Under a
simple model of network error, we show that consistent estimation of such
densities is impossible when the rates of error are unknown and only a single
network is observed. Accordingly, we develop method-of-moment estimators of
network subgraph densities and error rates for the case where a minimal number
of network replicates are available. These estimators are shown to be
asymptotically normal as the number of vertices increases to infinity. We also
provide confidence intervals for quantifying the uncertainty in these estimates
based on the asymptotic normality. To construct the confidence intervals, a new
and non-standard bootstrap method is proposed to compute asymptotic variances,
which is infeasible otherwise. We illustrate the proposed methods in the
context of gene coexpression networks.
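To make the replicate-based moment idea concrete, the sketch below works through a deliberately simplified variant of the setting (it is not the paper's estimator, which handles asymmetric, unknown error rates and general subgraph densities): each dyad's true status is flipped with a single symmetric error rate eps in each of two replicates. The marginal edge frequency m1 and the both-replicates frequency m2 then satisfy m1 - m2 = eps(1 - eps) regardless of the true density p, which yields closed-form method-of-moments estimates.

```python
import math, random

random.seed(7)

def noisy_replicates(n, p, eps, k=2):
    """Simulate k noisy observations of one Erdos-Renyi graph:
    each true dyad status is flipped independently with prob eps
    (assumed equal false-positive and false-negative rates)."""
    dyads = []
    for i in range(n):
        for j in range(i + 1, n):
            true_edge = random.random() < p
            obs = [(random.random() < eps) != true_edge for _ in range(k)]
            dyads.append(obs)
    return dyads

def moment_estimates(dyads):
    N = len(dyads)
    m1 = sum(d[0] for d in dyads) / N            # P(edge seen in replicate 1)
    m2 = sum(d[0] and d[1] for d in dyads) / N   # P(edge seen in both)
    # m1 - m2 = eps * (1 - eps); take the root with eps < 1/2.
    eps_hat = (1 - math.sqrt(max(0.0, 1 - 4 * (m1 - m2)))) / 2
    p_hat = (m1 - eps_hat) / (1 - 2 * eps_hat)
    return p_hat, eps_hat

p_hat, eps_hat = moment_estimates(noisy_replicates(300, p=0.1, eps=0.05))
print(p_hat, eps_hat)  # both close to the true values 0.1 and 0.05
```

The key point, mirroring the abstract, is that a single noisy network cannot separate p from eps, but two replicates provide the extra moment needed for consistent joint estimation.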