Simplifying the mosaic description of DNA sequences
Using the Jensen-Shannon divergence, a standard recursive segmentation
procedure divides genomic DNA into compositionally distinct domains. Each
domain, although significantly different from its neighbours, may nevertheless
share compositional similarity with one or more distant (non-neighbouring)
domains. We thus obtain a coarse-grained description of
the given DNA string in terms of a smaller set of distinct domain labels. This
yields a minimal domain description of a given DNA sequence, significantly
reducing its organizational complexity. This procedure provides a new means of
evaluating genomic complexity across organisms ranging from bacteria to
humans. The mosaic organization of DNA sequences could have originated from
the insertion of fragments of one genome (the parasite) inside another (the
host), and we present numerical experiments that are suggestive of this
scenario.
Comment: 16 pages, 1 figure. Accepted for publication in Phys. Rev.
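The segmentation and labelling steps described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions. The divergence at a cut point is the standard length-weighted Jensen-Shannon divergence of nucleotide composition, D = H(S) - (l1/L)H(S1) - (l2/L)H(S2). A fixed threshold (a hypothetical value of 0.1) stands in for the statistical significance criterion normally used to stop the recursion. The grouping of distant domains under shared labels uses a hypothetical greedy rule (`label_domains`), not necessarily the authors' clustering method. The "host" and "parasite" strings are toy, hand-built sequences, not data or experiments from the paper.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def entropy(counts, n):
    """Shannon entropy (bits) of the nucleotide composition of a segment of length n."""
    h = 0.0
    for s in ALPHABET:
        c = counts.get(s, 0)
        if c:
            p = c / n
            h -= p * math.log2(p)
    return h

def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two composition tables."""
    nl, nr = sum(left.values()), sum(right.values())
    n = nl + nr
    total = left + right  # pooled composition of the two segments
    return entropy(total, n) - (nl / n) * entropy(left, nl) - (nr / n) * entropy(right, nr)

def best_split(seq, min_len):
    """Scan all cut points and return (position, D) with the largest divergence."""
    left, right = Counter(), Counter(seq)
    best = (None, 0.0)
    for i, s in enumerate(seq[:-1], start=1):  # cut between seq[i-1] and seq[i]
        left[s] += 1
        right[s] -= 1
        if i < min_len or len(seq) - i < min_len:
            continue  # skip cuts that would create very short domains
        d = js_divergence(left, right)
        if d > best[1]:
            best = (i, d)
    return best

def segment(seq, threshold=0.1, min_len=20, offset=0):
    """Recursively split seq; returns half-open (start, end) domains.
    ASSUMPTION: a fixed D threshold replaces a significance test."""
    pos, d = best_split(seq, min_len)
    if pos is None or d < threshold:
        return [(offset, offset + len(seq))]
    return (segment(seq[:pos], threshold, min_len, offset)
            + segment(seq[pos:], threshold, min_len, offset + pos))

def label_domains(seq, domains, threshold=0.1):
    """HYPOTHETICAL greedy grouping: a domain joins the first existing label whose
    pooled composition differs from it by less than threshold, else opens a new label."""
    labels, pools = [], []
    for start, end in domains:
        comp = Counter(seq[start:end])
        for k, pool in enumerate(pools):
            if js_divergence(pool, comp) < threshold:
                pools[k] = pool + comp
                labels.append(k)
                break
        else:
            pools.append(comp)
            labels.append(len(pools) - 1)
    return labels

# Toy "host" (AT-rich) with two inserted "parasite" (GC-rich) fragments.
host = "AATTAT" * 20      # 120 bases
parasite = "GGCCGC" * 10  # 60 bases
seq = host + parasite + host + parasite + host

domains = segment(seq)
labels = label_domains(seq, domains)
print(domains)  # [(0, 120), (120, 180), (180, 300), (300, 360), (360, 480)]
print(labels)   # [0, 1, 0, 1, 0]
```

In this toy case, recursive segmentation yields five domains, but only two distinct labels remain once the non-neighbouring host domains and parasite domains are grouped. This is the kind of reduction the abstract calls a minimal domain description. On real genomic sequences the threshold, the minimum domain length and the grouping rule would all need to be chosen with care.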