
    Improved Sum-of-Squares Lower Bounds for Hidden Clique and Hidden Submatrix Problems

    Given a large data matrix $A\in\mathbb{R}^{n\times n}$, we consider the problem of determining whether its entries are i.i.d. with some known marginal distribution $A_{ij}\sim P_0$, or instead $A$ contains a principal submatrix $A_{{\sf Q},{\sf Q}}$ whose entries have marginal distribution $A_{ij}\sim P_1\neq P_0$. As a special case, the hidden (or planted) clique problem requires finding a planted clique in an otherwise uniformly random graph. Assuming unbounded computational resources, this hypothesis testing problem is statistically solvable provided $|{\sf Q}|\ge C \log n$ for a suitable constant $C$. However, despite substantial effort, no polynomial-time algorithm is known that succeeds with high probability when $|{\sf Q}| = o(\sqrt{n})$. Recently, Meka and Wigderson \cite{meka2013association} proposed a method to establish lower bounds within the Sum-of-Squares (SOS) semidefinite hierarchy. Here we consider the degree-$4$ SOS relaxation and study the construction of \cite{meka2013association} to prove that SOS fails unless $k\ge C\, n^{1/3}/\log n$, where $k = |{\sf Q}|$. An argument presented by Barak implies that this lower bound cannot be substantially improved unless the witness construction in the proof is changed. Our proof uses the moment method to bound the spectrum of a certain random association scheme, i.e. a symmetric random matrix whose rows and columns are indexed by the edges of an Erdős-Rényi random graph.
    Comment: 40 pages, 1 table, conference
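
To make the testing problem concrete, here is a toy Python sketch of the hidden clique special case written as a ±1 matrix: under $P_0$ each off-diagonal entry is ±1 with probability 1/2, and under the alternative the entries inside a random set ${\sf Q}$ are forced to +1. The helper names `sample_matrix` and `sum_test` are hypothetical, and the sum-of-entries statistic is a deliberately simple test, not the SOS relaxation studied in the paper. The planted block shifts the sum by about $|{\sf Q}|^2/2$, while the null standard deviation is about $n/\sqrt{2}$, so this test only works once $|{\sf Q}|$ is of order $\sqrt{n}$ or larger.

```python
import math
import random


def sample_matrix(n, k, seed=None):
    """Symmetric +/-1 matrix: the adjacency of G(n, 1/2) in +/-1 form.

    If k > 0, a hidden clique is planted on a random set Q of k vertices:
    entries A[i][j] with i, j in Q (i != j) are forced to +1.
    The diagonal is left at 0 and ignored.
    """
    rng = random.Random(seed)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = A[j][i] = rng.choice((1, -1))
    Q = rng.sample(range(n), k)
    for i in Q:
        for j in Q:
            if i != j:
                A[i][j] = 1
    return A, set(Q)


def sum_test(A, z=4.0):
    """Reject H0 (no hidden clique) if the upper-triangular sum exceeds z null std devs."""
    n = len(A)
    m = n * (n - 1) // 2  # independent +/-1 entries under H0, each with variance 1
    s = sum(A[i][j] for i in range(n) for j in range(i + 1, n))
    return s > z * math.sqrt(m)
```

With $n = 300$ the threshold is roughly 850, while a planted clique of size 60 adds about 1770 to the expected sum; a clique of size $o(\sqrt{n})$ would be lost in the null fluctuations.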

    Information-theoretically Optimal Sparse PCA

    Sparse Principal Component Analysis (PCA) is a dimensionality reduction technique in which one seeks a low-rank representation of a data matrix with additional sparsity constraints on the obtained representation. We consider two probabilistic formulations of sparse PCA: the spiked Wigner and spiked Wishart (or spiked covariance) models. We analyze an Approximate Message Passing (AMP) algorithm to estimate the underlying signal and show, in the high-dimensional limit, that the AMP estimates are information-theoretically optimal. As an immediate corollary, our results demonstrate that the posterior expectation of the underlying signal, which is often intractable to compute, can be obtained using a polynomial-time scheme. Our results also effectively provide a single-letter characterization of the sparse PCA problem.
    Comment: 5 pages, 1 figure, conference
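
A minimal sketch of an AMP iteration for the spiked Wigner case follows, assuming the normalization $Y = (\lambda/n)\,x x^{\sf T} + W$ with symmetric Gaussian noise of variance $1/n$, a hypothetical sparse prior $x_i \in \{0, 1/\sqrt{\varepsilon}\}$ with $\mathbb{P}(x_i \neq 0) = \varepsilon$, and known $\lambda$ and $\varepsilon$. The denoiser is the scalar posterior mean for that prior, the Onsager correction uses its average derivative, and the state-evolution parameters are estimated from the iterates. This illustrates the general technique only; it is not the authors' implementation, and the spiked Wishart model is not covered.

```python
import math
import random


def spiked_wigner(n, lam, eps, rng):
    """Draw Y = (lam/n) x x^T + W with an assumed sparse nonnegative spike x.

    Assumed prior: x_i = 1/sqrt(eps) with probability eps, else 0 (so E[x_i^2] = 1).
    W is symmetric Gaussian noise: variance 1/n off the diagonal, 2/n on it.
    """
    a = 1.0 / math.sqrt(eps)
    x = [a if rng.random() < eps else 0.0 for _ in range(n)]
    Y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sd = math.sqrt((2.0 if i == j else 1.0) / n)
            Y[i][j] = Y[j][i] = lam * x[i] * x[j] / n + rng.gauss(0.0, sd)
    return x, Y


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def denoise(y, mu, sigma2, eps):
    """Posterior mean E[X | mu*X + sqrt(sigma2)*G = y] under the assumed prior, and its y-derivative."""
    a = 1.0 / math.sqrt(eps)
    logit = math.log(eps / (1.0 - eps)) + mu * a * (y - mu * a / 2.0) / sigma2
    p = sigmoid(logit)  # posterior probability that X = a
    mean = a * p
    deriv = (mu / sigma2) * a * a * p * (1.0 - p)  # (mu / sigma^2) * Var[X | y]
    return mean, deriv


def amp(Y, lam, eps, iters=15):
    """AMP: x^{t+1} = Y f_t(x^t) - b_t f_{t-1}(x^{t-1}), with b_t = mean of f_t'."""
    n = len(Y)
    f_prev = [0.0] * n                # f_{-1} = 0
    f_cur = [math.sqrt(eps)] * n      # f_0 = prior mean, a constant, so b_0 = 0
    b = 0.0
    for _ in range(iters):
        # Empirical state evolution for the Bayes-optimal denoiser: sigma^2 = q, mu = lam * q.
        q = sum(v * v for v in f_cur) / n
        mu, sigma2 = lam * q, q
        # Matrix-vector step with the Onsager correction.
        x_new = [sum(Y[i][j] * f_cur[j] for j in range(n)) - b * f_prev[i] for i in range(n)]
        pairs = [denoise(v, mu, sigma2, eps) for v in x_new]
        f_prev, f_cur = f_cur, [m for m, _ in pairs]
        b = sum(d for _, d in pairs) / n
    return f_cur


def overlap(u, v):
    """Normalized correlation |<u, v>| / (||u|| ||v||)."""
    dot = sum(p * r for p, r in zip(u, v))
    return abs(dot) / (math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(r * r for r in v)))
```

A nonnegative prior is used here because, with $\mathbb{E}[x_i] > 0$, the first iterate is already correlated with $x$. For a symmetric prior the prior mean is zero, so AMP started from it stays at the trivial fixed point and needs a different (for example, spectral) initialization.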

    Asymptotic Mutual Information for the Two-Groups Stochastic Block Model

    We develop an information-theoretic view of the stochastic block model, a popular statistical model for the large-scale structure of complex networks. A graph $G$ from such a model is generated by first assigning vertex labels at random from a finite alphabet, and then connecting vertices with edge probabilities that depend on the labels of the endpoints. In the case of the symmetric two-group model, we establish an explicit 'single-letter' characterization of the per-vertex mutual information between the vertex labels and the graph. The explicit expression of the mutual information is intimately related to estimation-theoretic quantities and, in particular, reveals a phase transition at the critical point for community detection. Below the critical point, the per-vertex mutual information is asymptotically the same as if the edges were independent. Correspondingly, no algorithm can estimate the partition better than random guessing. Conversely, above the threshold, the per-vertex mutual information is strictly smaller than the independent-edges upper bound. In this regime there exists a procedure that estimates the vertex labels better than random guessing.
    Comment: 41 pages, 3 pdf figures
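
The two-step generative process described above can be sketched as follows for the symmetric two-group case. The ±1 labels, the parameter names `p_in` and `p_out`, and the edge-set representation are assumptions for illustration; the sketch only samples graphs and does not implement the paper's mutual-information analysis.

```python
import random


def two_group_sbm(n, p_in, p_out, seed=None):
    """Sample a symmetric two-group stochastic block model.

    Step 1: each vertex gets a label drawn uniformly from {+1, -1}.
    Step 2: each pair {i, j} is joined independently, with probability p_in
    if the labels agree and p_out otherwise.
    Returns (labels, edges), where edges is a set of pairs (i, j) with i < j.
    """
    rng = random.Random(seed)
    labels = [rng.choice((1, -1)) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.add((i, j))
    return labels, edges


def edge_density(labels, edges, same):
    """Fraction of within-group (same=True) or across-group (same=False) pairs that are edges."""
    n = len(labels)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if (labels[i] == labels[j]) == same]
    hits = sum(1 for pr in pairs if pr in edges)
    return hits / len(pairs)
```

The labels are hidden from any estimator; recovering them from `edges` alone is the community detection task whose feasibility the phase transition governs.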

    Agreement and Statistical Efficiency in Bayesian Perception Models

    Bayesian models of group learning have been studied in economics since the 1970s and, more recently, in computational linguistics. The models from economics postulate that agents maximize utility in their communication and actions. These models do not explain the "probability matching" phenomena observed in many experimental studies. To address these observations, Bayesian models that do not formally fit into the economic utility-maximization framework were introduced. In these models, individuals sample from their posteriors when communicating. In this work we study the asymptotic behavior of such models on connected networks with repeated communication. Perhaps surprisingly, even though individual agents are not utility maximizers in the classical sense, we establish that the individuals ultimately agree, and we further show that the limiting posterior is Bayes optimal.
    Comment: 15 pages, 2 figures
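
The difference between the two kinds of agents can be illustrated through the announcement rule alone. In this hypothetical sketch, a utility-maximizing agent announces its most probable state (MAP), while a probability-matching agent announces a state drawn at random from its posterior; the network structure and the repeated Bayesian updating studied in the paper are not modeled.

```python
import random


def announce(posterior, rule, rng):
    """Return the state an agent announces, given its posterior (dict: state -> probability).

    rule == "map":    utility-maximizer style, announce the most probable state.
    rule == "sample": probability matching, announce a draw from the posterior.
    """
    if rule == "map":
        return max(posterior, key=posterior.get)
    if rule == "sample":
        states = list(posterior)
        return rng.choices(states, weights=[posterior[s] for s in states])[0]
    raise ValueError(f"unknown rule: {rule}")
```

With a posterior of 0.7 on one state, the MAP agent always announces that state, whereas the sampling agent announces it only about 70% of the time; this extra randomness is what makes agreement in the sampling model non-obvious.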