Statistical analysis of ranked lists is important in analyzing molecular biology
measurement data, such as ChIP-seq, which yields ranked lists of genomic
sequences. State-of-the-art methods study fixed motifs in ranked lists. More
flexible models such as position weight matrix (PWM) motifs are not addressed
in this context. To assess the enrichment of a PWM motif in a ranked list, we
use a PWM-induced second ranking on the same set of elements. Possible orders
of one ranked list relative to the other are modeled by permutations. Due to
sample space complexity, it is difficult to characterize tail distributions in
the group of permutations. In this paper we develop tight upper bounds on tail
distributions of the size of the intersection of the top parts of two uniformly
and independently drawn permutations, and we demonstrate the advantages of this
approach using our software implementation, mmHG-Finder, to study PWMs in several
datasets.

Comment: Peer-reviewed and presented as part of the 13th Workshop on
Algorithms in Bioinformatics (WABI2013)
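
To make the central quantity concrete, the following is a minimal Python sketch, not the mmHG-Finder implementation. It relies on one standard fact: for two uniformly and independently drawn rankings of N elements and fixed prefix lengths n1 and n2, the size of the intersection of the two tops follows a hypergeometric distribution. The function names, and the reading of "mmHG" as a minimum hypergeometric tail taken over both prefix lengths, are assumptions made for illustration. The bounds developed in the paper are not reproduced here.

```python
from math import comb


def top_intersection(rank_a, rank_b, n1, n2):
    """Count elements in both the top n1 of rank_a and the top n2 of rank_b."""
    return len(set(rank_a[:n1]) & set(rank_b[:n2]))


def hypergeom_tail(N, n1, n2, b):
    """P(X >= b) for X ~ Hypergeometric(N, n1, n2).

    X is the overlap between a fixed set of n1 elements and a uniformly
    drawn subset of n2 elements, both taken from N elements.
    """
    upper = min(n1, n2)
    favorable = sum(comb(n1, k) * comb(N - n1, n2 - k) for k in range(b, upper + 1))
    return favorable / comb(N, n2)


def min_tail_over_prefixes(rank_a, rank_b):
    """Smallest hypergeometric tail over all prefix pairs (n1, n2).

    Assumption: this mirrors what the name 'mmHG' suggests. The returned
    minimum is NOT a p-value, because it is optimized over many pairs.
    """
    N = len(rank_a)
    best = (1.0, 0, 0)
    for n1 in range(1, N):
        for n2 in range(1, N):
            b = top_intersection(rank_a, rank_b, n1, n2)
            p = hypergeom_tail(N, n1, n2, b)
            if p < best[0]:
                best = (p, n1, n2)
    return best


if __name__ == "__main__":
    # Toy data: two rankings of the same six elements.
    by_measurement = ["s1", "s2", "s3", "s4", "s5", "s6"]
    by_pwm_score = ["s2", "s1", "s3", "s6", "s4", "s5"]
    print(min_tail_over_prefixes(by_measurement, by_pwm_score))
```

Because the minimum is taken over many (n1, n2) pairs, it understates the true significance level. Correcting for that selection is the step for which an upper bound on the tail distribution is needed.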