We propose that the distribution of DNA words in genomic sequences can be
primarily characterized by a double Pareto-lognormal distribution, which
explains lognormal and power-law features found across all known genomes. Such
a distribution may be the result of completely random sequence evolution by
duplication processes. The parametrization of genomic word frequencies allows
for an assessment of the significance of frequent or rare sequence motifs.
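
As a rough illustration of the quantity being modeled, the sketch below counts overlapping DNA words of a fixed length in a sequence and tabulates how many distinct words occur a given number of times. It assumes that "words" means overlapping k-mers over the alphabet A, C, G, T; the function names `count_words` and `frequency_spectrum` and the choice to skip windows containing other symbols are illustrative assumptions, not part of the proposed method.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")  # assumption: ambiguous symbols such as N are skipped


def count_words(sequence, k):
    """Count overlapping words of length k in a DNA sequence."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Ignore windows that contain anything other than A, C, G, T.
        if set(word) <= DNA_ALPHABET:
            counts[word] += 1
    return counts


def frequency_spectrum(word_counts):
    """Map each occurrence count to the number of distinct words having it."""
    return Counter(word_counts.values())


if __name__ == "__main__":
    toy_sequence = "ACGTACGTNNACGA"  # hypothetical toy input
    words = count_words(toy_sequence, 2)
    spectrum = frequency_spectrum(words)
    for occurrences in sorted(spectrum):
        print(occurrences, spectrum[occurrences])
```

The resulting spectrum is the empirical distribution to which a parametric form such as the double Pareto-lognormal could then be fitted; the fitting step itself is not shown here.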