
    Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

    We investigate the relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. The proof exploits a combination of ideas from the geometry of exponential families, junction tree theory and convex analysis. These population-level results have various consequences for graph selection methods, both known and novel, including a novel method for structure estimation for missing or corrupted observations. We provide nonasymptotic guarantees for such methods and illustrate the sharpness of these predictions via simulations. Comment: Published at http://dx.doi.org/10.1214/13-AOS1162 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
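The claim that the inverse covariance matrix can reflect conditional independence outside the Gaussian setting can be checked numerically on the smallest interesting case. The following is a minimal sketch, not the authors' method: it assumes a binary Markov chain X1 - X2 - X3 with hypothetical transition probabilities, computes the exact covariance of the three 0/1 variables by enumeration, and inverts it. The function names and parameter values are illustrative assumptions.

```python
from itertools import product


def chain_joint(p1, p2_given_1, p3_given_2):
    """Exact joint pmf of a binary Markov chain X1 - X2 - X3.

    p1 is P(X1 = 1); the two dicts map the parent's value to the
    probability that the child equals 1. All values are hypothetical.
    """
    joint = {}
    for x1, x2, x3 in product((0, 1), repeat=3):
        a = p1 if x1 else 1 - p1
        b = p2_given_1[x1] if x2 else 1 - p2_given_1[x1]
        c = p3_given_2[x2] if x3 else 1 - p3_given_2[x2]
        joint[(x1, x2, x3)] = a * b * c
    return joint


def covariance(joint):
    """Population covariance matrix of (X1, X2, X3) under the pmf."""
    mean = [sum(p * x[i] for x, p in joint.items()) for i in range(3)]
    return [
        [
            sum(p * (x[i] - mean[i]) * (x[j] - mean[j]) for x, p in joint.items())
            for j in range(3)
        ]
        for i in range(3)
    ]


def inverse_3x3(m):
    """Inverse of a 3x3 matrix via cofactors (adjugate / determinant)."""

    def minor(i, j):
        rows = [r for r in range(3) if r != i]
        cols = [k for k in range(3) if k != j]
        (a, b), (c, d) = [[m[r][k] for k in cols] for r in rows]
        return a * d - b * c

    cof = [[(-1) ** (i + j) * minor(i, j) for j in range(3)] for i in range(3)]
    det = sum(m[0][j] * cof[0][j] for j in range(3))
    return [[cof[j][i] / det for j in range(3)] for i in range(3)]


joint = chain_joint(0.3, {0: 0.2, 1: 0.7}, {0: 0.4, 1: 0.9})
precision = inverse_3x3(covariance(joint))

# X1 and X3 are not adjacent in the chain, so this entry should vanish
# up to floating-point error, while the entries for the edges do not.
print("precision[0][2] =", precision[0][2])
print("precision[0][1] =", precision[0][1])
```

For a chain, every separator is a single vertex, which is the kind of structure where the plain covariance of the vertex indicators is expected to suffice. Graphs with larger separators would need the generalized covariance matrix the abstract refers to, which this sketch does not construct.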

    The Road From Classical to Quantum Codes: A Hashing Bound Approaching Design Procedure

    Powerful Quantum Error Correction Codes (QECCs) are required for stabilizing and protecting fragile qubits against the undesirable effects of quantum decoherence. Similar to classical codes, hashing bound approaching QECCs may be designed by exploiting a concatenated code structure, which invokes iterative decoding. Therefore, in this paper we provide an extensive step-by-step tutorial for designing EXtrinsic Information Transfer (EXIT) chart aided concatenated quantum codes based on the underlying quantum-to-classical isomorphism. These design lessons are then exemplified in the context of our proposed Quantum Irregular Convolutional Code (QIRCC), which constitutes the outer component of a concatenated quantum code. The proposed QIRCC can be dynamically adapted to match any given inner code using EXIT charts, hence achieving a performance close to the hashing bound. It is demonstrated that our QIRCC-based optimized design is capable of operating within 0.4 dB of the noise limit

    Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

    Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed. Comment: 13 figures, 35 references

    Haplotype Assembly: An Information Theoretic View

    This paper studies the haplotype assembly problem from an information theoretic perspective. A haplotype is a sequence of nucleotide bases on a chromosome, often conveniently represented by a binary string, that differs from the bases in the corresponding positions on the other chromosome in a homologous pair. Information about the order of bases in a genome is readily inferred using short reads provided by high-throughput DNA sequencing technologies. In this paper, the recovery of the target pair of haplotype sequences using short reads is rephrased as a joint source-channel coding problem. Two messages, representing haplotypes and chromosome memberships of reads, are encoded and transmitted over a channel with erasures and errors, where the channel model reflects salient features of high-throughput sequencing. The focus of this paper is on the required number of reads for reliable haplotype reconstruction, and both the necessary and sufficient conditions are presented with order-wise optimal bounds. Comment: 30 pages, 5 figures, 1 table, journal

    Rank Minimization over Finite Fields: Fundamental Limits and Coding-Theoretic Interpretations

    This paper establishes information-theoretic limits in estimating a finite field low-rank matrix given random linear measurements of it. These linear measurements are obtained by taking inner products of the low-rank matrix with random sensing matrices. Necessary and sufficient conditions on the number of measurements required are provided. It is shown that these conditions are sharp and the minimum-rank decoder is asymptotically optimal. The reliability function of this decoder is also derived by appealing to de Caen's lower bound on the probability of a union. The sufficient condition also holds when the sensing matrices are sparse, a scenario that may be amenable to efficient decoding. More precisely, it is shown that if the $n \times n$ sensing matrices contain, on average, $\Omega(n \log n)$ entries, the number of measurements required is the same as that when the sensing matrices are dense and contain entries drawn uniformly at random from the field. Analogies are drawn between the above results and rank-metric codes in the coding theory literature. In fact, we are also strongly motivated by understanding when minimum rank distance decoding of random rank-metric codes succeeds. To this end, we derive distance properties of equiprobable and sparse rank-metric codes. These distance properties provide a precise geometric interpretation of the fact that the sparse ensemble requires as few measurements as the dense one. Finally, we provide a non-exhaustive procedure to search for the unknown low-rank matrix. Comment: Accepted to the IEEE Transactions on Information Theory; Presented at IEEE International Symposium on Information Theory (ISIT) 201
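To make the measurement model concrete, here is a small sketch over GF(2), the simplest finite field. It assumes each measurement is the inner product of the unknown matrix with a sensing matrix, summed modulo 2, and it decodes by exhaustive search for a lowest-rank consistent matrix. The exhaustive search is only an illustration for tiny sizes and is not the non-exhaustive procedure the abstract mentions; all function names are hypothetical.

```python
import random
from itertools import product


def rank_gf2(matrix):
    """Rank of a 0/1 matrix over GF(2), by Gaussian elimination with XOR."""
    rows = [row[:] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def measure(sensing, x):
    """Inner product <A, X> = sum_ij A_ij * X_ij, reduced modulo 2."""
    return sum(a & b for ra, rx in zip(sensing, x) for a, b in zip(ra, rx)) % 2


def min_rank_decode(sensing_list, ys, n):
    """Brute-force search for a lowest-rank n x n matrix matching all measurements.

    Enumerates all 2^(n*n) candidates, so it is only usable for very small n.
    Returns None if no candidate is consistent with the measurements.
    """
    best = None
    for bits in product((0, 1), repeat=n * n):
        cand = [list(bits[i * n:(i + 1) * n]) for i in range(n)]
        if all(measure(a, cand) == y for a, y in zip(sensing_list, ys)):
            if best is None or rank_gf2(cand) < rank_gf2(best):
                best = cand
    return best


rng = random.Random(1)
n, k = 2, 3
x_true = [[1, 1], [1, 1]]  # rank 1 over GF(2)
sensing_list = [[[rng.randint(0, 1) for _ in range(n)] for _ in range(n)] for _ in range(k)]
ys = [measure(a, x_true) for a in sensing_list]
x_hat = min_rank_decode(sensing_list, ys, n)
print("decoded:", x_hat, "rank:", rank_gf2(x_hat))
```

With too few measurements the decoder may return a different matrix of equal or lower rank, which is exactly the regime the necessary condition on the number of measurements describes. Note also that rank over GF(2) can differ from rank over the reals: `[[1, 1, 0], [0, 1, 1], [1, 0, 1]]` has real rank 3 but its rows sum to zero modulo 2.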

    Robust Recovery of Subspace Structures by Low-Rank Representation

    In this work we address the subspace recovery problem. Given a set of data samples (vectors) approximately drawn from a union of multiple subspaces, our goal is to segment the samples into their respective subspaces and correct the possible errors as well. To this end, we propose a novel method termed Low-Rank Representation (LRR), which seeks the lowest-rank representation among all the candidates that can represent the data samples as linear combinations of the bases in a given dictionary. It is shown that LRR solves the subspace recovery problem well: when the data is clean, we prove that LRR exactly captures the true subspace structures; for the data contaminated by outliers, we prove that under certain conditions LRR can exactly recover the row space of the original data and detect the outlier as well; for the data corrupted by arbitrary errors, LRR can also approximately recover the row space with theoretical guarantees. Since the subspace membership is provably determined by the row space, these further imply that LRR can perform robust subspace segmentation and error correction in an efficient way. Comment: IEEE Trans. Pattern Analysis and Machine Intelligence

    Noise-Resilient Group Testing: Limitations and Constructions

    We study combinatorial group testing schemes for learning $d$-sparse Boolean vectors using highly unreliable disjunctive measurements. We consider an adversarial noise model that only limits the number of false observations, and show that any noise-resilient scheme in this model can only approximately reconstruct the sparse vector. On the positive side, we take this barrier to our advantage and show that approximate reconstruction (within a satisfactory degree of approximation) allows us to break the information theoretic lower bound of $\tilde{\Omega}(d^2 \log n)$ that is known for exact reconstruction of $d$-sparse vectors of length $n$ via non-adaptive measurements, by a multiplicative factor $\tilde{\Omega}(d)$. Specifically, we give simple randomized constructions of non-adaptive measurement schemes, with $m = O(d \log n)$ measurements, that allow efficient reconstruction of $d$-sparse vectors up to $O(d)$ false positives even in the presence of $\delta m$ false positives and $O(m/d)$ false negatives within the measurement outcomes, for any constant $\delta < 1$. We show that, information theoretically, none of these parameters can be substantially improved without dramatically affecting the others. Furthermore, we obtain several explicit constructions, in particular one matching the randomized trade-off but using $m = O(d^{1+o(1)} \log n)$ measurements. We also obtain explicit constructions that allow fast reconstruction in time $\mathrm{poly}(m)$, which would be sublinear in $n$ for sufficiently sparse vectors. The main tool used in our construction is the list-decoding view of randomness condensers and extractors. Comment: Full version. A preliminary summary of this work appears (under the same title) in proceedings of the 17th International Symposium on Fundamentals of Computation Theory (FCT 2009)
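The flavor of approximate reconstruction under noisy disjunctive measurements can be illustrated with a simple elimination decoder. This is a sketch under stated assumptions, not the paper's construction: the design is a random Bernoulli matrix with inclusion probability 1/d, and an item is discarded only when it appears in more negative tests than an assumed budget of false negatives. All names and parameter values are hypothetical.

```python
import random


def random_design(num_tests, n, d, rng):
    """Each of n items joins each test independently with probability 1/d."""
    return [[1 if rng.random() < 1.0 / d else 0 for _ in range(n)] for _ in range(num_tests)]


def run_tests(design, defectives):
    """Noiseless disjunctive outcomes: a test is positive iff it contains a defective."""
    return [int(any(row[i] for i in defectives)) for row in design]


def flip_positives(outcomes, max_flips, rng):
    """Simulate false negatives by turning up to max_flips positive outcomes into 0."""
    noisy = outcomes[:]
    positives = [t for t, y in enumerate(noisy) if y]
    for t in rng.sample(positives, min(max_flips, len(positives))):
        noisy[t] = 0
    return noisy


def threshold_decode(design, outcomes, max_false_negatives):
    """Keep item i unless it appears in more than max_false_negatives negative tests.

    A true defective can only sit in a negative test if that outcome was a
    false negative, so it is never discarded while the budget holds. The
    returned set may still contain non-defective items (false positives).
    """
    n = len(design[0])
    negatives_hit = [0] * n
    for row, y in zip(design, outcomes):
        if y == 0:
            for i in range(n):
                negatives_hit[i] += row[i]
    return {i for i in range(n) if negatives_hit[i] <= max_false_negatives}


rng = random.Random(0)
n, d, num_tests, budget = 50, 3, 60, 2
defectives = {4, 17, 33}
design = random_design(num_tests, n, d, rng)
noisy = flip_positives(run_tests(design, defectives), budget, rng)
decoded = threshold_decode(design, noisy, budget)
print("defectives recovered:", defectives <= decoded, "decoded size:", len(decoded))
```

The decoder returns a superset of the defectives whenever the number of false negatives stays within the budget, which mirrors the abstract's point that noise-resilient schemes aim for reconstruction up to a bounded number of false positives rather than exact recovery. With the budget set to zero it reduces to the plain elimination rule for noiseless group testing.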