We propose a computational method to measure and visualize interrelationships
among any number of DNA sequences allowing, for example, the examination of
hundreds or thousands of complete mitochondrial genomes. An "image distance" is
computed for each pair of graphical representations of DNA sequences, and the
distances are visualized as a Molecular Distance Map: Each point on the map
represents a DNA sequence, and the spatial proximity between any two points
reflects the degree of structural similarity between the corresponding
sequences. The graphical representation of DNA sequences utilized, Chaos Game
Representation (CGR), is genome- and species-specific and can thus act as a
genomic signature. Consequently, Molecular Distance Maps could inform species
identification, taxonomic classifications and, to a certain extent,
evolutionary history. The image distance employed, Structural Dissimilarity
Index (DSSIM), implicitly compares the occurrences of oligomers of length up to
k (herein k=9) in DNA sequences. We computed DSSIM distances for more than
5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional
Scaling (MDS) to obtain Molecular Distance Maps that visually display the
sequence relatedness in various subsets, at different taxonomic levels. This
general-purpose method does not require DNA sequence homology and can thus be
used to compare similar or vastly different DNA sequences, genomic or
computer-generated, of the same or different lengths. We illustrate potential
uses of this approach by applying it to several taxonomic subsets: phylum
Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class
Amphibia, and order Primates. This analysis of an extensive dataset confirms
that the oligomer composition of full mtDNA sequences can be a source of
taxonomic information.Comment: 14 pages, 7 figures. arXiv admin note: substantial text overlap with
arXiv:1307.375