13 research outputs found
Identifying statistical dependence in genomic sequences via mutual information estimates
Questions of understanding and quantifying the representation and amount of
information in organisms have become a central part of biological research, as
they potentially hold the key to fundamental advances. In this paper, we
demonstrate the use of information-theoretic tools for the task of identifying
segments of biomolecules (DNA or RNA) that are statistically correlated. We
develop a precise and reliable methodology, based on the notion of mutual
information, for finding and extracting statistical as well as structural
dependencies. A simple threshold function is defined, and its use in
quantifying the level of significance of dependencies between biological
segments is explored. These tools are used in two specific applications. First,
for the identification of correlations between different parts of the maize
zmSRp32 gene. There, we find significant dependencies between the 5'
untranslated region in zmSRp32 and its alternatively spliced exons. This
observation may indicate the presence of as-yet unknown alternative splicing
mechanisms or structural scaffolds. Second, using data from the FBI's Combined
DNA Index System (CODIS), we demonstrate that our approach is particularly well
suited for the problem of discovering short tandem repeats, an application of
importance in genetic profiling.Comment: Preliminary version. Final version in EURASIP Journal on
Bioinformatics and Systems Biology. See http://www.hindawi.com/journals/bsb
Using Performance Forecasting to Accelerate Elasticity
Cloud computing facilitates dynamic resource provisioning. The automation of resource management, known as elasticity, has been subject to much research. In this context, monitoring of a running service plays a crucial role, and adjustments are made when certain thresholds are crossed. On such occasions, it is common practice to simply add or remove resources. In this paper we investigate how we can predict the performance of a service to dynamically adjust allocated resources based on predictions. In other words, instead of ârepairingâ because a threshold has been crossed, we attempt to stay ahead and allocate an optimized amount of resources in advance. To do so, we need to have accurate predictive models that are based on workloads. We present our approach, based on the Universal Scalability Law, and discuss initial experiments
Statistical dependence in biological sequences
We demonstrate the use of information-theoretic tools for the task of identifying segments of hiomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, for the identification of correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the 5' untranslated region and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from CODIS, we demonstrate that our approach is well suited for the problem of discovering short tandem repeals (STRs). ©2007 IEEE
High performance tensor-vector multiplication on shared-memory systems
International audienc