In this work we study reverse complementary genomic word pairs in human
DNA by comparing both the distance distribution and the frequency of a word to
those of its reverse complement. Several measures of dissimilarity between
distance distributions are considered, and it is found that the peak
dissimilarity works best in this setting. We report the existence of reverse
complementary word pairs with very dissimilar distance distributions, as well
as word pairs with very similar distance distributions even when both
distributions are irregular and contain strong peaks. The association between
distribution dissimilarity and frequency discrepancy is also explored, and it
is speculated that symmetric pairs combining low and high values of each
measure may uncover features of interest. Taken together, our results suggest
that some asymmetries in the human genome go far beyond Chargaff's rules. This
study uses both the complete human genome and its repeat-masked version.

Comment: Post-print of a paper accepted for publication in "Interdisciplinary
Sciences: Computational Life Sciences" (ISSN: 1913-2751, ESSN: 1867-1462).
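
As a rough illustration of the quantities compared in the abstract, the following Python sketch computes, for a word and its reverse complement, the frequency and the distribution of distances between consecutive occurrences. All function names are hypothetical, the toy sequence is made up, and the total variation distance is used only as a generic stand-in: it is not the peak dissimilarity considered in the paper, whose definition is not given here.

```python
from collections import Counter

# Translation table for the four DNA bases (assumes an upper-case A/C/G/T alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Complement every base, then reverse the word."""
    return word.translate(_COMPLEMENT)[::-1]


def occurrences(seq, word):
    """Start positions of all occurrences of word in seq, overlaps included."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def distance_distribution(seq, word):
    """Relative frequency of each distance between consecutive occurrences."""
    pos = occurrences(seq, word)
    gaps = Counter(b - a for a, b in zip(pos, pos[1:]))
    total = sum(gaps.values())
    return {d: n / total for d, n in gaps.items()} if total else {}


def total_variation(p, q):
    """Generic dissimilarity between two discrete distributions (a stand-in,
    not the paper's peak dissimilarity)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    seq = "ACGACGTACGCGTTTCGTACGT"  # toy sequence, not real genomic data
    word = "ACG"
    rc = reverse_complement(word)
    print(word, len(occurrences(seq, word)), distance_distribution(seq, word))
    print(rc, len(occurrences(seq, rc)), distance_distribution(seq, rc))
    print(total_variation(distance_distribution(seq, word),
                          distance_distribution(seq, rc)))
```

On a real genome the same comparison would be run for every word of a given length, and a scan over a single strand as above is only one of several possible conventions.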