Multiscale likelihood analysis and complexity penalized estimation
We describe here a framework for a certain class of multiscale likelihood
factorizations wherein, in analogy to a wavelet decomposition of an L^2
function, a given likelihood function has an alternative representation as a
product of conditional densities reflecting information in both the data and
the parameter vector localized in position and scale. The framework is
developed as a set of sufficient conditions for the existence of such
factorizations, formulated in analogy to those underlying a standard
multiresolution analysis for wavelets, and hence can be viewed as a
multiresolution analysis for likelihoods. We then consider the use of these
factorizations in the task of nonparametric, complexity penalized likelihood
estimation. We study the risk properties of certain thresholding and
partitioning estimators, and demonstrate their adaptivity and near-optimality,
in a minimax sense over a broad range of function spaces, based on squared
Hellinger distance as a loss function. In particular, our results provide an
illustration of how properties of classical wavelet-based estimators can be
obtained in a single, unified framework that includes models for continuous,
count and categorical data types.
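As an illustration of the kind of factorization described above, the following minimal sketch checks one standard count-data case numerically. It assumes independent Poisson counts on a dyadic partition, which the abstract does not spell out: the joint likelihood is rewritten as a Poisson term for the total count times one binomial term per dyadic split, each localized in position and scale. The function names and the example data are hypothetical.

```python
from math import lgamma, log


def log_poisson(k, lam):
    """Log-probability of count k under a Poisson law with mean lam."""
    return k * log(lam) - lam - lgamma(k + 1)


def log_binomial(k, n, p):
    """Log-probability of k successes in n trials with success probability p."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1 - p)


def direct_loglik(counts, means):
    """Usual form: one independent Poisson term per observation."""
    return sum(log_poisson(k, lam) for k, lam in zip(counts, means))


def multiscale_loglik(counts, means):
    """Multiscale form: a coarse Poisson term for the total count, plus one
    binomial term per dyadic split (left-half count given the parent count)."""

    def split_terms(lo, hi):
        if hi - lo == 1:
            return 0.0
        mid = (lo + hi) // 2
        parent_count = sum(counts[lo:hi])
        left_count = sum(counts[lo:mid])
        # The split parameter depends only on the means inside this interval.
        p_left = sum(means[lo:mid]) / sum(means[lo:hi])
        return (log_binomial(left_count, parent_count, p_left)
                + split_terms(lo, mid) + split_terms(mid, hi))

    return log_poisson(sum(counts), sum(means)) + split_terms(0, len(counts))


# Hypothetical data: 8 counts with strictly positive Poisson means.
counts = [3, 0, 5, 2, 7, 1, 4, 6]
means = [2.5, 1.0, 4.0, 3.0, 6.5, 1.5, 3.5, 5.0]

direct = direct_loglik(counts, means)
multiscale = multiscale_loglik(counts, means)
print(direct, multiscale)
```

The two log-likelihoods agree up to floating-point error, so thresholding or pruning the binomial split parameters is one way to act on the likelihood scale by scale, in the same spirit as thresholding wavelet coefficients.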
On the Question of Effective Sample Size in Network Modeling: An Asymptotic Inquiry
The modeling and analysis of networks and network data has seen an explosion
of interest in recent years and represents an exciting direction for potential
growth in statistics. Despite the already substantial amount of work done in
this area to date by researchers from various disciplines, however, there
remain many questions of a decidedly foundational nature - natural analogues of
standard questions already posed and addressed in more classical areas of
statistics - that have yet to even be posed, much less addressed. Here we raise
and consider one such question in connection with network modeling.
Specifically, we ask, "Given an observed network, what is the sample size?"
Using simple, illustrative examples from the class of exponential random graph
models, we show that the answer to this question can very much depend on basic
properties of the networks expected under the model, as the number of vertices
in the network grows. In particular, adopting the (asymptotic) scaling of
the variance of the maximum likelihood parameter estimates as a notion of
effective sample size (N_eff), we show that when modeling the
overall propensity to have ties and the propensity to reciprocate ties, whether
the networks are sparse or not under the model (i.e., having a constant or an
increasing number of ties per vertex, respectively) is sufficient to yield an
order of magnitude difference in N_eff, from O(N_v) to O(N_v^2), where N_v is
the number of vertices. In addition, we report simulation study results that suggest
similar properties for models for triadic (friend-of-a-friend) effects. We then
explore some practical implications of this result, using both simulation and
data on food-sharing from Lamalera, Indonesia.
Comment: Published at http://dx.doi.org/10.1214/14-STS502 in the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Estimation of subgraph density in noisy networks
While it is common practice in applied network analysis to report various
standard network summary statistics, these numbers are rarely accompanied by
uncertainty quantification. Yet any error inherent in the measurements
underlying the construction of the network, or in the network construction
procedure itself, necessarily must propagate to any summary statistics
reported. Here we study the problem of estimating the density of an arbitrary
subgraph, given a noisy version of some underlying network as data. Under a
simple model of network error, we show that consistent estimation of such
densities is impossible when the rates of error are unknown and only a single
network is observed. Accordingly, we develop method-of-moment estimators of
network subgraph densities and error rates for the case where a minimal number
of network replicates are available. These estimators are shown to be
asymptotically normal as the number of vertices increases to infinity. We also
provide confidence intervals for quantifying the uncertainty in these estimates
based on the asymptotic normality. To construct the confidence intervals, we
propose a new, non-standard bootstrap method for computing the asymptotic
variances, which would otherwise be infeasible to compute. We illustrate the
proposed methods in the context of gene coexpression networks.
Network Kriging
Network service providers and customers are often concerned with aggregate
performance measures that span multiple network paths. Unfortunately, forming
such network-wide measures can be difficult, due to the issues of scale
involved. In particular, the number of paths grows too rapidly with the number
of endpoints to make exhaustive measurement practical. As a result, it is of
interest to explore the feasibility of methods that dramatically reduce the
number of paths measured in such situations while maintaining acceptable
accuracy.
We cast the problem as one of statistical prediction--in the spirit of the
so-called 'kriging' problem in spatial statistics--and show that end-to-end
network properties may be accurately predicted in many cases using a
surprisingly small set of carefully chosen paths. More precisely, we formulate
a general framework for the prediction problem, propose a class of linear
predictors for standard quantities of interest (e.g., averages, totals,
differences) and show that linear algebraic methods of subset selection may be
used to effectively choose which paths to measure. We characterize the
performance of the resulting methods, both analytically and numerically. The
success of our methods derives from the low effective rank of routing matrices
as encountered in practice, which appears to be a new observation in its own
right with potentially broad implications on network measurement generally.
Comment: 16 pages, 9 figures, single-space
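The path-selection idea in the abstract above can be sketched on a toy example. The sketch below is an illustration under stated assumptions, not the authors' method: it assumes a star topology with six endpoints, additive link delays, and noiseless measurements, and it uses a greedy pivoted Gram-Schmidt pass as a simple stand-in for the linear algebraic subset-selection methods mentioned. All names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def select_paths(G, tol=1e-9):
    """Greedily pick rows (paths) of the routing matrix G that span its row space.

    Each step takes the path with the largest remaining residual norm and
    removes its direction from every row (pivoted Gram-Schmidt).
    """
    residual = [list(row) for row in G]
    chosen, basis = [], []
    for _ in range(min(len(G), len(G[0]))):
        norms = [dot(r, r) for r in residual]
        i = max(range(len(G)), key=norms.__getitem__)
        if norms[i] < tol:
            break  # every remaining path is a combination of chosen ones
        q = [v / sqrt(norms[i]) for v in residual[i]]
        chosen.append(i)
        basis.append(q)
        for r in residual:
            d = dot(r, q)
            for j in range(len(r)):
                r[j] -= d * q[j]
    return chosen, basis


def predict_all(G, chosen, basis, measured):
    """Linear prediction of every path value from the measured subset."""
    proj = []  # proj[k] = (basis[k] . link values), recovered by forward substitution
    for k, i in enumerate(chosen):
        r = [dot(G[i], q) for q in basis[:k + 1]]
        proj.append((measured[i] - dot(r[:k], proj)) / r[k])
    return [sum(dot(g, q) * p for q, p in zip(basis, proj)) for g in G]


# Toy star network: leaf i attaches to a hub by link i, so the path between
# leaves a and b uses exactly links a and b.
n_leaves = 6
pairs = list(combinations(range(n_leaves), 2))
G = [[1 if link in pair else 0 for link in range(n_leaves)] for pair in pairs]

link_delay = [1.0, 2.5, 0.5, 3.0, 1.5, 2.0]      # hypothetical, unobserved
truth = [dot(row, link_delay) for row in G]      # end-to-end path delays

chosen, basis = select_paths(G)
measured = {i: truth[i] for i in chosen}         # only these paths are probed
predicted = predict_all(G, chosen, basis, measured)

print(len(chosen), "of", len(G), "paths measured")
print("max prediction error:", max(abs(p - t) for p, t in zip(predicted, truth)))
```

In this toy case the routing matrix has exact rank equal to the number of links, so the unmeasured paths are recovered exactly. With routing matrices that are only approximately low rank, or with noisy measurements, the same linear predictor would be approximate.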