Statistical analysis of simple repeats in the human genome
The human genome contains repetitive DNA at different levels of sequence
length, number and dispersion. Highly repetitive DNA is particularly rich in
homo- and di-nucleotide repeats, while middle repetitive DNA is rich in
families of interspersed, mobile elements hundreds of base pairs (bp) long,
among them the Alu families. A link between homo- and di-polymeric tracts and
mobile elements has been recently highlighted. In particular, the mobility of
Alu repeats, which form 10% of the human genome, has been correlated with the
length of poly(A) tracts located at one end of the Alu. These tracts have a
rigid and non-bendable structure and have an inhibitory effect on nucleosomes,
which normally compact the DNA. We performed a statistical analysis of the
genome-wide distribution of lengths and inter-tract separations of poly(X) and
poly(XY) tracts in the human genome. Our study shows that in humans the length
distributions of these sequences reflect the dynamics of their expansion and
DNA replication. By means of general tools from linguistics, we show that
these tracts play the role of highly significant content-bearing terms in the DNA
text. Furthermore, we find that such tracts are positioned in a non-random
fashion, with an apparent periodicity of 150 bases. This allows us to extend
the link between repetitive, highly mobile elements such as Alus and
low-complexity words in human DNA. More precisely, we show that Alus are
sources of poly(X) tracts, which in turn affect in a subtle way the combination
and diversification of gene expression and the fixation of multigene families.
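
The two quantities analysed above, tract lengths and inter-tract separations, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the minimum tract length of 3, and the toy sequence are assumptions made only for illustration, and it covers poly(X) tracts only.

```python
import re
from collections import Counter

def find_homopolymer_tracts(seq, min_len=3):
    """Return (start, end, base) for each run of one base with length >= min_len.

    min_len is a hypothetical threshold (must be >= 1), not a value from the study.
    """
    # A base followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]

def length_distribution(tracts):
    """Count how many tracts have each length."""
    return Counter(end - start for start, end, _ in tracts)

def inter_tract_separations(tracts):
    """Number of bases between the end of one tract and the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(tracts, tracts[1:])]

# Toy sequence, invented for this example.
toy = "GCAAAAATCGTTTTGCACCCCCCAG"
tracts = find_homopolymer_tracts(toy)
print(tracts)                           # [(2, 7, 'A'), (10, 14, 'T'), (17, 23, 'C')]
print(length_distribution(tracts))
print(inter_tract_separations(tracts))  # [3, 3]
```

On a real chromosome, a histogram of the separations is where a positional periodicity such as the roughly 150-base spacing reported above would be expected to show up. Extending the pattern to two-base units would cover poly(XY) tracts in the same way.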