408 research outputs found

    On the Question of Effective Sample Size in Network Modeling: An Asymptotic Inquiry

    Get PDF
    The modeling and analysis of networks and network data has seen an explosion of interest in recent years and represents an exciting direction for potential growth in statistics. Despite the already substantial amount of work done in this area to date by researchers from various disciplines, however, there remain many questions of a decidedly foundational nature - natural analogues of standard questions already posed and addressed in more classical areas of statistics - that have yet to even be posed, much less addressed. Here we raise and consider one such question in connection with network modeling. Specifically, we ask, "Given an observed network, what is the sample size?" Using simple, illustrative examples from the class of exponential random graph models, we show that the answer to this question can very much depend on basic properties of the networks expected under the model, as the number of vertices nVn_V in the network grows. In particular, adopting the (asymptotic) scaling of the variance of the maximum likelihood parameter estimates as a notion of effective sample size (neffn_{\mathrm{eff}}), we show that when modeling the overall propensity to have ties and the propensity to reciprocate ties, whether the networks are sparse or not under the model (i.e., having a constant or an increasing number of ties per vertex, respectively) is sufficient to yield an order of magnitude difference in neffn_{\mathrm{eff}}, from O(nV)O(n_V) to O(nV2)O(n^2_V). In addition, we report simulation study results that suggest similar properties for models for triadic (friend-of-a-friend) effects. We then explore some practical implications of this result, using both simulation and data on food-sharing from Lamalera, Indonesia.Comment: Published at http://dx.doi.org/10.1214/14-STS502 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Multiscale likelihood analysis and complexity penalized estimation

    Full text link
    We describe here a framework for a certain class of multiscale likelihood factorizations wherein, in analogy to a wavelet decomposition of an L^2 function, a given likelihood function has an alternative representation as a product of conditional densities reflecting information in both the data and the parameter vector localized in position and scale. The framework is developed as a set of sufficient conditions for the existence of such factorizations, formulated in analogy to those underlying a standard multiresolution analysis for wavelets, and hence can be viewed as a multiresolution analysis for likelihoods. We then consider the use of these factorizations in the task of nonparametric, complexity penalized likelihood estimation. We study the risk properties of certain thresholding and partitioning estimators, and demonstrate their adaptivity and near-optimality, in a minimax sense over a broad range of function spaces, based on squared Hellinger distance as a loss function. In particular, our results provide an illustration of how properties of classical wavelet-based estimators can be obtained in a single, unified framework that includes models for continuous, count and categorical data types

    Estimation of subgraph density in noisy networks

    Full text link
    While it is common practice in applied network analysis to report various standard network summary statistics, these numbers are rarely accompanied by uncertainty quantification. Yet any error inherent in the measurements underlying the construction of the network, or in the network construction procedure itself, necessarily must propagate to any summary statistics reported. Here we study the problem of estimating the density of an arbitrary subgraph, given a noisy version of some underlying network as data. Under a simple model of network error, we show that consistent estimation of such densities is impossible when the rates of error are unknown and only a single network is observed. Accordingly, we develop method-of-moment estimators of network subgraph densities and error rates for the case where a minimal number of network replicates are available. These estimators are shown to be asymptotically normal as the number of vertices increases to infinity. We also provide confidence intervals for quantifying the uncertainty in these estimates based on the asymptotic normality. To construct the confidence intervals, a new and non-standard bootstrap method is proposed to compute asymptotic variances, which is infeasible otherwise. We illustrate the proposed methods in the context of gene coexpression networks
    • …
    corecore