A Systematic Evolution of Ligands by EXponential enrichment (SELEX)
experiment begins in round one with a random pool of oligonucleotides in
solution at equilibrium with a target. Over a few rounds, oligonucleotides having
a high affinity for the target are selected. Data from a high-throughput SELEX
experiment consists of lists of thousands of oligonucleotides sampled after
each round. Thus far, SELEX experiments have been very good at suggesting the
highest-affinity oligonucleotide, but modeling lower-affinity recognition site
variants has been difficult. Furthermore, an alignment step has always been
used prior to analyzing SELEX data. We present a novel model, based on a
biochemical parametrization of SELEX, which allows us to use data from all
rounds to estimate the affinities of the oligonucleotides. Most notably, our
model also aligns the oligonucleotides. We use our model to analyze a SELEX
experiment containing double-stranded DNA oligonucleotides and the
transcription factor Bicoid as the target. Our SELEX model outperformed other
published methods for predicting putative binding sites for Bicoid as indicated
by the results of an in vivo ChIP-chip experiment.

Comment: Published at http://dx.doi.org/10.1214/12-AOAS537 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)