
    Local conservation scores without a priori assumptions on neutral substitution rates

    Abstract

    Background: Comparative genomics aims to detect signals of evolutionary conservation as an indicator of functional constraint. Surprisingly, results of the ENCODE project revealed that about half of the experimentally verified functional elements found in non-coding DNA were classified as unconstrained by computational predictions. Following this observation, it has been hypothesized that this may be partly explained by biased estimates of the neutral evolutionary rates used by existing sequence conservation metrics. All methods we are aware of rely on a comparison with the neutral rate and estimate conservation by measuring how far a particular genomic region deviates from this rate. Consequently, it is reasonable to assume that inaccurate neutral rate estimates may lead to biased conservation and constraint estimates.

    Results: We propose a conservation signal that is produced by local Maximum Likelihood estimation of evolutionary parameters using an optimized sliding window, and present a Kullback-Leibler projection that allows multiple different estimated parameters to be transformed into a conservation measure. This conservation measure does not rely on assumptions about neutral evolutionary substitution rates, and few a priori assumptions about the properties of the conserved regions are imposed. We show the accuracy of our approach (KuLCons) on synthetic data and compare it to the scores generated by state-of-the-art methods (phastCons, GERP, SCONE) in an ENCODE region. We find that KuLCons is most often in agreement with the conservation/constraint signatures detected by GERP and SCONE, while patterns qualitatively very different from those of phastCons are observed. In contrast to standard methods, KuLCons can be extended to more complex evolutionary models, e.g. models that take insertion and deletion events into account, and the corresponding results show that scores obtained under such a model can diverge significantly from scores obtained with the simpler model.

    Conclusion: Our results suggest that discriminating among different degrees of conservation is possible without making assumptions about neutral rates. We find, however, that it cannot be expected to discover constraint regions considerably different from those found by GERP and SCONE. Consequently, we conclude that the reported discrepancies between experimentally verified functional elements and computationally identified constraint elements are unlikely to be explained by biased neutral rate estimates.
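    One way to turn a locally estimated parameter into a score without referring to a neutral rate can be illustrated with a deliberately simplified Python sketch. It is not the KuLCons projection, whose details are not given in the abstract. Assumptions, all hypothetical: a pairwise alignment, a single Bernoulli mismatch parameter per window (its maximum-likelihood estimate is the observed mismatch fraction), a fixed rather than optimized window width, and, as the reference distribution, a saturated "random" alignment in which two aligned bases differ with probability 3/4 under equal base frequencies. The function names are illustrative only.

```python
import math
from typing import List, Optional

# Hypothetical reference: a saturated alignment with equal base frequencies,
# where two aligned bases differ with probability 3/4. No neutral rate is used.
SATURATED_MISMATCH = 0.75


def window_mismatch_mle(a: str, b: str, start: int, width: int) -> Optional[float]:
    """ML estimate of the per-site mismatch probability in one window.

    Under a Bernoulli model the MLE is the observed mismatch fraction.
    Columns containing a gap ('-') in either sequence are skipped.
    """
    pairs = [
        (x, y)
        for x, y in zip(a[start:start + width], b[start:start + width])
        if x != "-" and y != "-"
    ]
    if not pairs:
        return None  # window consists of gap columns only
    return sum(x != y for x, y in pairs) / len(pairs)


def kl_bernoulli(p: float, q: float) -> float:
    """KL divergence D(Bern(p) || Bern(q)) in bits; requires 0 < q < 1."""
    total = 0.0
    for pi, qi in ((p, q), (1.0 - p, 1.0 - q)):
        if pi > 0.0:  # 0 * log(0 / q) is taken as 0
            total += pi * math.log2(pi / qi)
    return total


def sliding_window_scores(a: str, b: str, width: int) -> List[Optional[float]]:
    """Score every window of a fixed width (hypothetical simplification).

    Larger scores mean the window departs more from the saturated model.
    Caveat: windows with a mismatch fraction above 3/4 also score > 0,
    which a real conservation measure would have to handle.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    scores: List[Optional[float]] = []
    for start in range(len(a) - width + 1):
        p = window_mismatch_mle(a, b, start, width)
        scores.append(None if p is None else kl_bernoulli(p, SATURATED_MISMATCH))
    return scores
```

    For example, `sliding_window_scores("ACGT", "ACGT", 4)` returns `[2.0]`: a perfectly conserved window sits 2 bits away from the saturated model, while a window whose mismatch fraction is exactly 3/4 scores 0.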

    Source Coding Scheme for Multiple Sequence Alignments

    Rapid development of DNA sequencing technologies is exponentially increasing the amount of publicly available genomic data. Whole-genome multiple sequence alignments represent a particularly voluminous, frequently downloaded static dataset. In this work we propose an asymmetric source coding scheme for such alignments that uses evolutionary prediction in combination with lossless black-and-white image compression. Compared to the Lempel-Ziv algorithm used so far, the compression rates are almost halved.
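    The general idea of pairing a predictor with a black-and-white image can be illustrated with a minimal Python sketch. It is not the authors' scheme: as a hypothetical stand-in for evolutionary prediction, every symbol is simply predicted to equal the symbol in the same column of the first alignment row. Correct and incorrect predictions form a binary bitmap (one row of pixels per sequence), and the mispredicted symbols are stored separately. In the proposed scheme such a bitmap would be passed to a lossless bi-level image coder; here it is only packed into bytes. All names are illustrative.

```python
from typing import List, Tuple


def split_alignment(rows: List[str]) -> Tuple[str, List[List[int]], List[str]]:
    """Split an alignment into reference row, mismatch bitmap and residual symbols.

    Hypothetical predictor: each symbol is predicted to equal the symbol in the
    same column of the first row. Bit 1 marks a misprediction.
    """
    ref = rows[0]
    if any(len(r) != len(ref) for r in rows):
        raise ValueError("all alignment rows must have equal length")
    bitmap: List[List[int]] = []
    residuals: List[str] = []
    for row in rows[1:]:
        bits = []
        for predicted, actual in zip(ref, row):
            miss = int(predicted != actual)
            bits.append(miss)
            if miss:
                residuals.append(actual)  # only mispredicted symbols are stored
        bitmap.append(bits)
    return ref, bitmap, residuals


def merge_alignment(ref: str, bitmap: List[List[int]], residuals: List[str]) -> List[str]:
    """Inverse of split_alignment: rebuild the rows losslessly."""
    it = iter(residuals)
    rows = [ref]
    for bits in bitmap:
        rows.append("".join(next(it) if miss else predicted
                            for predicted, miss in zip(ref, bits)))
    return rows


def pack_bits(bitmap: List[List[int]]) -> bytes:
    """Pack the bitmap row by row, 8 pixels per byte (most significant bit first),
    padding each row with zeros to a whole number of bytes."""
    out = bytearray()
    for bits in bitmap:
        for i in range(0, len(bits), 8):
            chunk = bits[i:i + 8]
            chunk = chunk + [0] * (8 - len(chunk))
            byte = 0
            for bit in chunk:
                byte = (byte << 1) | bit
            out.append(byte)
    return bytes(out)
```

    The split is asymmetric in the sense that all modelling effort sits in the predictor: the better it is, the more of the bitmap is "white" (0), which is the kind of input a bi-level image coder handles well, while decoding only replays the bitmap and residuals.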

    On genomic coding theory

    Summary: This paper gives a brief overview of several applications from the emerging interdisciplinary field of genomic coding theory, which aims to apply concepts and techniques from coding theory to problems in molecular biology. This is motivated by the high precision and robustness found in genomic processes, as well as by the growing availability of genomic data for a wide range of species. The considered applications include source coding for DNA classification, channel coding for modelling gene expression with emphasis on the process of translation, the existence of error-correcting codes in DNA, and channel coding structure in the genetic code. Example results are presented that demonstrate the relevance of the proposed approaches, and open questions are formulated to suggest future research work.
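    The summary does not say how source coding is used for DNA classification. One widely used compression-based technique, swapped in here purely as an illustration and not necessarily the approach surveyed in the paper, is the normalized compression distance, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. The sketch below assumes Python's `zlib` as the compressor C and builds its own synthetic sequences; sequences that share structure should get a smaller distance.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib (DEFLATE) encoding at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences.

    Values near 0 indicate that one sequence helps compress the other;
    values near 1 indicate little shared structure.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy, cxy = compressed_size(bx), compressed_size(by), compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def mutate(seq: str, n: int, rng: random.Random) -> str:
    """Return a copy of seq with n random point substitutions (synthetic data)."""
    out = list(seq)
    for i in rng.sample(range(len(out)), n):
        out[i] = rng.choice("ACGT")
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    x = "".join(rng.choice("ACGT") for _ in range(2000))
    y = mutate(x, 20, rng)                                # close relative of x
    z = "".join(rng.choice("ACGT") for _ in range(2000))  # unrelated sequence
    print(f"NCD(x, y) = {ncd(x, y):.3f}")
    print(f"NCD(x, z) = {ncd(x, z):.3f}")
```

    A nearest-neighbour or clustering step on top of such pairwise distances would then assign sequences to classes; the choice of compressor strongly affects the result, and a DNA-specific coder would usually be preferred over a general-purpose one like DEFLATE.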

    On Genomic Coding Theory

    Abstract: This paper gives a brief overview of several applications from the emerging interdisciplinary field of genomic coding theory, which aims to apply concepts and techniques from coding theory to problems in molecular biology. This is motivated by the high precision and robustness found in genomic processes, as well as by the growing availability of genomic data for a wide range of species. The considered applications include source coding for DNA classification, channel coding for modeling gene expression with emphasis on the process of translation, the existence of error-correcting codes in DNA, and channel coding structure in the genetic code. Example results are presented that demonstrate the relevance of the proposed approaches, and open questions are formulated to suggest future research work.