122 research outputs found

    Super-resolution community detection for layer-aggregated multilayer networks

    Get PDF
    Applied network science often involves preprocessing network data before applying a network-analysis method, and there is typically a theoretical disconnect between these steps. For example, it is common to aggregate time-varying network data into windows prior to analysis, and the tradeoffs of this preprocessing are not well understood. Focusing on the problem of detecting small communities in multilayer networks, we study the effects of layer aggregation by developing random-matrix theory for modularity matrices associated with layer-aggregated networks with NN nodes and LL layers, which are drawn from an ensemble of Erd\H{o}s-R\'enyi networks. We study phase transitions in which eigenvectors localize onto communities (allowing their detection) and which occur for a given community provided its size surpasses a detectability limit Kβˆ—K^*. When layers are aggregated via a summation, we obtain Kβˆ—βˆO(NL/T)K^*\varpropto \mathcal{O}(\sqrt{NL}/T), where TT is the number of layers across which the community persists. Interestingly, if TT is allowed to vary with LL then summation-based layer aggregation enhances small-community detection even if the community persists across a vanishing fraction of layers, provided that T/LT/L decays more slowly than O(Lβˆ’1/2) \mathcal{O}(L^{-1/2}). Moreover, we find that thresholding the summation can in some cases cause Kβˆ—K^* to decay exponentially, decreasing by orders of magnitude in a phenomenon we call super-resolution community detection. That is, layer aggregation with thresholding is a nonlinear data filter enabling detection of communities that are otherwise too small to detect. Importantly, different thresholds generally enhance the detectability of communities having different properties, illustrating that community detection can be obscured if one analyzes network data using a single threshold.Comment: 11 pages, 8 figure

    Computational barriers in minimax submatrix detection

    Get PDF
    This paper studies the minimax detection of a small submatrix of elevated mean in a large matrix contaminated by additive Gaussian noise. To investigate the tradeoff between statistical performance and computational cost from a complexity-theoretic perspective, we consider a sequence of discretized models which are asymptotically equivalent to the Gaussian model. Under the hypothesis that the planted clique detection problem cannot be solved in randomized polynomial time when the clique size is of smaller order than the square root of the graph size, the following phase transition phenomenon is established: when the size of the large matrix pβ†’βˆžp\to\infty, if the submatrix size k=Θ(pΞ±)k=\Theta(p^{\alpha}) for any α∈(0,2/3)\alpha\in(0,{2}/{3}), computational complexity constraints can incur a severe penalty on the statistical performance in the sense that any randomized polynomial-time test is minimax suboptimal by a polynomial factor in pp; if k=Θ(pΞ±)k=\Theta(p^{\alpha}) for any α∈(2/3,1)\alpha\in({2}/{3},1), minimax optimal detection can be attained within constant factors in linear time. Using Schatten norm loss as a representative example, we show that the hardness of attaining the minimax estimation rate can crucially depend on the loss function. Implications on the hardness of support recovery are also obtained.Comment: Published at http://dx.doi.org/10.1214/14-AOS1300 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A Tensor Approach to Learning Mixed Membership Community Models

    Get PDF
    Community detection is the task of detecting hidden communities from observed interactions. Guaranteed community detection has so far been mostly limited to models with non-overlapping communities such as the stochastic block model. In this paper, we remove this restriction, and provide guaranteed community detection for a family of probabilistic network models with overlapping communities, termed as the mixed membership Dirichlet model, first introduced by Airoldi et al. This model allows for nodes to have fractional memberships in multiple communities and assumes that the community memberships are drawn from a Dirichlet distribution. Moreover, it contains the stochastic block model as a special case. We propose a unified approach to learning these models via a tensor spectral decomposition method. Our estimator is based on low-order moment tensor of the observed network, consisting of 3-star counts. Our learning method is fast and is based on simple linear algebraic operations, e.g. singular value decomposition and tensor power iterations. We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method. As an important special case, our results match the best known scaling requirements for the (homogeneous) stochastic block model
    • …
    corecore