3 research outputs found

Statistical analysis of the DNA sequence of human chromosome 22

Author: Grosse I.
Herzel H.
Holste D.
Publication venue: 'American Physical Society (APS)'
Publication date: 01/10/2001

We study statistical patterns in the DNA sequence of human chromosome 22, the first completely sequenced human chromosome. We find that (i) the 33.4 × 10^6 nucleotide long human chromosome exhibits long-range power-law correlations over more than four orders of magnitude, (ii) the entropies H_n of the frequency distribution of oligonucleotides of length n (n-mers) grow sublinearly with increasing n, indicating the presence of higher-order correlations for all of the studied lengths 1 ≤ n ≤ 10, and (iii) the generalized entropies H_n(q) of n-mers decrease monotonically with increasing q and the decay of H_n(q) with q becomes steeper with increasing n ≤ 10, indicating that the frequency distribution of oligonucleotides becomes increasingly nonuniform as the length n increases. We investigate to what degree known biological features may explain the observed statistical patterns. We find that (iv) the presence of interspersed repeats may cause the sublinear increase of H_n with n, and that (v) the presence of monomeric tandem repeats as well as the suppression of CG dinucleotides may cause the observed decay of H_n(q) with q.
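As a rough illustration of the quantities named in this abstract, the sketch below computes the Shannon entropy H_n of an n-mer frequency distribution and a generalized entropy H_n(q). It assumes the Rényi form for H_n(q), which the abstract does not spell out, and it counts overlapping n-mers on a made-up toy sequence; the function names are hypothetical and this is not the authors' code.

```python
import math
from collections import Counter


def nmer_frequencies(seq, n):
    """Relative frequencies of overlapping n-mers in seq."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon_entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def renyi_entropy(probs, q):
    """Renyi entropy of order q in bits (assumed form of H_n(q))."""
    if q == 1:
        # The q -> 1 limit of the Renyi entropy is the Shannon entropy.
        return shannon_entropy(probs)
    return math.log2(sum(p ** q for p in probs if p > 0)) / (1 - q)


if __name__ == "__main__":
    # Toy sequence with a tandem repeat; real input would be a chromosome.
    toy = "ACGT" * 50 + "A" * 40
    for n in (1, 2, 3):
        probs = nmer_frequencies(toy, n)
        print(n, shannon_entropy(probs), renyi_entropy(probs, 2))
```

For a uniform distribution all orders q give the same value, so a decay of H_n(q) with q signals a nonuniform n-mer distribution, which is the reading the abstract gives in (iii).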

Cold Spring Harbor Laboratory Institutional Repository

Entropy estimates of small data sets

Author: Abramowitz M
Blythe R A
Bonachela J A Hinrichsen H Muñoz M A
Cover T M
Grassberger P
Harris B
Haye Hinrichsen
Herzel H
Holste D
Holste D Herzel H Grosse I
Juan A Bonachela
Khazanie R G
Miguel A Muñoz
Miller G
Rényi A
Schürmann T
Shannon C E
Publication venue: 'IOP Publishing'
Publication date: 29/04/2008

Estimating entropies from limited data series is known to be a non-trivial task. Naive estimations are plagued with both systematic (bias) and statistical errors. Here, we present a new 'balanced estimator' for entropy functionals (Shannon, Rényi and Tsallis) specially devised to provide a compromise between low bias and small statistical errors, for short data series. This new estimator outperforms other currently available ones when the data sets are small and the probabilities of the possible outputs of the random variable are not close to zero. Otherwise, other well-known estimators remain a better choice. The potential range of applicability of this estimator is quite broad, especially for biological and digital data series.
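The bias of naive estimation mentioned in this abstract can be seen in a small simulation. The sketch below is not the paper's balanced estimator, whose formula is not given here; it compares the naive plug-in Shannon estimate with the classical Miller-Madow first-order bias correction, averaged over short samples from a uniform source. Function names, sample sizes and the seed are illustrative assumptions.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Naive plug-in Shannon entropy (nats) from observed frequencies."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def miller_madow_entropy(samples):
    """Plug-in estimate plus the Miller-Madow correction (m - 1) / (2n),
    where m is the number of distinct observed states."""
    n = len(samples)
    observed = len(set(samples))
    return plugin_entropy(samples) + (observed - 1) / (2 * n)


def mean_estimate(estimator, n_states, n_samples, trials, seed=0):
    """Average an estimator over many short samples from a uniform source."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        draws = [rng.randrange(n_states) for _ in range(n_samples)]
        total += estimator(draws)
    return total / trials


if __name__ == "__main__":
    true_entropy = math.log(4)  # uniform source with 4 states
    print("true       ", true_entropy)
    print("plug-in    ", mean_estimate(plugin_entropy, 4, 10, 2000))
    print("MillerMadow", mean_estimate(miller_madow_entropy, 4, 10, 2000))
```

The plug-in average falls below the true entropy because a short sample can never look more uniform than the source; the correction term shifts the estimate upward.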

arXiv.org e-Print Archive

Crossref

University of Strathclyde Institutional Repository

Repeats and correlations in human DNA sequences

Author: Beirer S.
Grosse I.
Herzel H.
Holste D.
Schieg P.
Publication venue: 'American Physical Society (APS)'
Publication date: 01/06/2003

We study the nucleotide-nucleotide mutual information function I(k) of the DNA sequences of the three completely sequenced human chromosomes 20, 21, and 22. We find in each human chromosome (i) the absence of the k=3 base pair (bp) sequence periodicity characteristic of protein coding regions, (ii) the absence of the k=10-11 bp sequence periodicity characteristic of both protein secondary structure and DNA bendability, and (iii) the presence of significant statistical dependencies at about k=135 bp and at about k=165 bp. We investigate to what degree the density and composition of interspersed repeats might explain these observed statistical patterns in all three human chromosomes. We use simple stochastic models to substitute known interspersed repeats and find by numerical studies that (iv) the presence of interspersed repeats dominates short-range correlations as measured by I(k) on the scale of several hundred base pairs in human chromosomes 20, 21, and 22. On the other hand, we find that (v) interspersed repeats contribute only weakly to long-range correlations due to the clustering of highly abundant Alu repeats.
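A minimal sketch of a mutual information function I(k) follows, using the standard definition I(k) = Σ p_ab(k) log2[p_ab(k) / (p_a p_b)] over nucleotide pairs separated by k positions. The abstract does not give the authors' exact estimator, so the choice of marginals (taken from the same pairs) and the function name are assumptions.

```python
import math
from collections import Counter


def mutual_information(seq, k):
    """Mutual information (bits) between symbols k positions apart."""
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)   # marginal of the first symbol
    right = Counter(b for _, b in pairs)  # marginal of the second symbol
    # (c/n) * log2( (c/n) / ((left/n) * (right/n)) ), simplified.
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


if __name__ == "__main__":
    # Toy sequence with period 4; real input would be a chromosome.
    toy = "ACGT" * 25
    for k in (1, 2, 3, 4):
        print(k, mutual_information(toy, k))
```

On real sequences, peaks of I(k) at particular k values indicate statistical dependencies at that distance, which is how the abstract interprets the signals near k=135 bp and k=165 bp.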

Cold Spring Harbor Laboratory Institutional Repository