Comparing whole genome sequences at large scale has not been achieved
with conventional methods based on pairwise base-to-base comparison;
moreover, no attention has been paid to handling, in one sitting, a number of
genomes that cross genetic categories (chromosome, plasmid, and phage) with greater
divergence (little or no homology) over large size ranges (from Kbp to
Mbp). We created a new method, GenomeFingerprinter, to unambiguously produce
three-dimensional coordinates from a sequence, followed by one
three-dimensional plot and six two-dimensional trajectory projections to
illustrate whole genome fingerprints. We further developed a set of concepts
and tools and thereby established a new method, universal genome fingerprint
analysis. We demonstrated their applications through case studies on over a
hundred genome sequences. In particular, we defined the total genetic
component configuration (TGCC) (i.e., chromosome, plasmid, and phage) for
describing a strain as a system, and the universal genome fingerprint map
(UGFM) of TGCC for differentiating a strain as a universal system, as well as
systematic comparative genomics (SCG) for comparing, in one sitting, a number
of genomes across genetic categories in diverse strains. By using UGFM,
UGFM-TGCC, and UGFM-TGCC-SCG, we compared a number of genome sequences with
greater divergence (chromosome, plasmid, and phage; bacterium, archaeon,
and virus) over large size ranges (6 Kbp to 5 Mbp), giving new insights
into critical problems in microbial genomics in the post-genomic era.
This paper provides a new method for rapidly computing, geometrically
visualizing, and intuitively comparing genome sequences at the fingerprint level,
and hence establishes universal genome fingerprint analysis as an approach to
systematic comparative genomics.

Comment: 63 pages, 15 figures, 5 tables