Search CORE

15,943 research outputs found

Identifying statistical dependence in genomic sequences via mutual information estimates

Author: Aktulga HM
Grama AY
Kontoyiannis I
Lyznik LA
Szpankowski L
Szpankowski W
Publication venue
Publication date: 01/01/2007
Field of study

Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, for the identification of correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the 5' untranslated region in zmSRp32 and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from the FBI's Combined DNA Index System (CODIS), we demonstrate that our approach is particularly well suited for the problem of discovering short tandem repeats, an application of importance in genetic profiling.Comment: Preliminary version. Final version in EURASIP Journal on Bioinformatics and Systems Biology. See http://www.hindawi.com/journals/bsb

arXiv.org e-Print Archive

CiteSeerX

Springer - Publisher Connector

Directory of Open Access Journals

Learning Graphs from Linear Measurements: Fundamental Trade-offs and Applications

Author: Li Tongxin
Low Steven
Werner Lucien
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/02/2020
Field of study

We consider a specific graph learning task: reconstructing a symmetric matrix that represents an underlying graph using linear measurements. We present a sparsity characterization for distributions of random graphs (that are allowed to contain high-degree nodes), based on which we study fundamental trade-offs between the number of measurements, the complexity of the graph class, and the probability of error. We first derive a necessary condition on the number of measurements. Then, by considering a three-stage recovery scheme, we give a sufficient condition for recovery. Furthermore, assuming the measurements are Gaussian IID, we prove upper and lower bounds on the (worst-case) sample complexity for both noisy and noiseless recovery. In the special cases of the uniform distribution on trees with n nodes and the Erdős-Rényi (n,p) class, the fundamental trade-offs are tight up to multiplicative factors with noiseless measurements. In addition, for practical applications, we design and implement a polynomial-time (in n ) algorithm based on the three-stage recovery scheme. Experiments show that the heuristic algorithm outperforms basis pursuit on star graphs. We apply the heuristic algorithm to learn admittance matrices in electric grids. Simulations for several canonical graph classes and IEEE power system test cases demonstrate the effectiveness and robustness of the proposed algorithm for parameter reconstruction

arXiv.org e-Print Archive

Caltech Authors