Background: The Z-curve is a three-dimensional representation of DNA sequences proposed over a decade ago and
has been extensively applied to sequence segmentation, horizontal gene transfer detection, and sequence analysis.
Based on the Z-curve, a “genome order index” was proposed, which is defined as $S = a^2 + c^2 + t^2 + g^2$, where a, c, t,
and g are the nucleotide frequencies of A, C, T, and G, respectively. This index was found to be smaller than 1/3 for
almost all tested genomes, which was taken as support for the existence of a constraint on genome composition.
A geometric explanation for this constraint has been suggested. Each genome was represented by a point P whose
distances from the four faces of a regular tetrahedron were given by the frequencies a, c, t, and g. It was claimed that
an inscribed sphere of radius $r = 1/\sqrt{3}$ contains almost all points corresponding to various genomes, implying that
$S < r^2$. The distribution of the points P obtained by S was studied using the Z-curve.
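As a concrete illustration of the index defined in the Background, the following minimal sketch computes S from a raw sequence. The function names and the choice to ignore symbols other than A, C, G, and T are assumptions made here for illustration; they are not taken from the original work.

```python
from collections import Counter


def nucleotide_frequencies(seq):
    """Return relative frequencies of A, C, G, T (other symbols are ignored)."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {base: counts[base] / total for base in "ACGT"}


def genome_order_index(seq):
    """S = a^2 + c^2 + t^2 + g^2, the sum of squared nucleotide frequencies."""
    return sum(f * f for f in nucleotide_frequencies(seq).values())


if __name__ == "__main__":
    # A perfectly uniform composition gives the minimum possible value of S.
    print(genome_order_index("ACGT" * 10))
    # A strongly biased composition pushes S toward its maximum of 1.
    print(genome_order_index("AAAAAAAT"))
```

Because the four frequencies sum to 1, S is bounded below by 1/4 (uniform composition) and above by 1 (a single nucleotide).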
Results: In this work, we studied the basic properties of the Z-curve using the “genome order index” as a case
study. We show that (1) the calculation of the radius of the inscribed sphere of a regular tetrahedron is incorrect,
(2) the S index is narrowly distributed, (3) based on the second parity rule, the S index can be derived directly from
the Shannon entropy and is, therefore, redundant, and (4) the Z-curve suffers from over-dimensionality, and the
dimension that stands for GC content alone suffices to represent any given genome.
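Points (3) and (4) can be made tangible with a small sketch. It assumes the second parity rule holds exactly (a = t and c = g), so that all four frequencies follow from GC content alone; under that assumption both S and the Shannon entropy become functions of a single variable. The helper names are hypothetical, and the GC values are arbitrary examples rather than data from the study.

```python
import math


def composition_from_gc(gc):
    """Frequencies under an exact second parity rule: a = t and c = g."""
    at_half = (1.0 - gc) / 2.0
    gc_half = gc / 2.0
    return {"A": at_half, "T": at_half, "C": gc_half, "G": gc_half}


def s_index(freqs):
    """Sum of squared frequencies."""
    return sum(f * f for f in freqs.values())


def shannon_entropy(freqs):
    """Shannon entropy in bits; zero-frequency terms contribute nothing."""
    return -sum(f * math.log2(f) for f in freqs.values() if f > 0)


if __name__ == "__main__":
    for gc in (0.2, 0.35, 0.5, 0.65):  # arbitrary illustrative GC contents
        freqs = composition_from_gc(gc)
        print(f"GC={gc:.2f}  S={s_index(freqs):.4f}  H={shannon_entropy(freqs):.4f}")
```

Under this assumption S reduces to ((1 - GC)^2 + GC^2) / 2. Both S and H are symmetric about GC = 0.5 and monotonic on either side of it, so each value of H corresponds to exactly one value of S.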
Conclusion: The “genome order index” S does not represent a constraint on nucleotide composition. Moreover,
S can be easily computed from the Gini-Simpson index and derived directly from the Shannon entropy, and is therefore redundant.
Overall, the Z-curve and S are over-complicated measures of GC content and the Shannon H index, respectively.
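The relation to the Gini-Simpson index follows from its standard definition, D = 1 - (sum of squared frequencies), which gives S = 1 - D. The sketch below checks this numerically; the frequencies are made-up values chosen only for illustration, and the function names are hypothetical.

```python
def gini_simpson(freqs):
    """Gini-Simpson index: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(f * f for f in freqs)


def s_from_gini_simpson(d):
    """Recover S from the Gini-Simpson index D via S = 1 - D."""
    return 1.0 - d


if __name__ == "__main__":
    freqs = [0.3, 0.2, 0.3, 0.2]  # made-up a, c, t, g frequencies
    d = gini_simpson(freqs)
    direct_s = sum(f * f for f in freqs)
    print(d, s_from_gini_simpson(d), direct_s)
```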
Reviewers: This article was reviewed by Claus Wilke, Joel Bader, Marek Kimmel and Uladzislau Hryshkevich
(nominated by Itai Yanai).