We present a sparse knowledge gradient (SpKG) algorithm for adaptively
selecting the targeted regions within a large RNA molecule to identify which
regions are most amenable to interactions with other molecules. Experimentally,
such regions can be inferred from fluorescence measurements obtained by binding
a complementary probe with fluorescence markers to the targeted regions. We use
a biophysical model which shows that the fluorescence ratio under the log scale
has a sparse linear relationship with the coefficients describing the
accessibility of each nucleotide, since not all sites are accessible (due to
the folding of the molecule). The SpKG algorithm uniquely combines the Bayesian
ranking and selection problem with the frequentist ℓ1​ regularized
regression approach Lasso. We use this algorithm to identify the sparsity
pattern of the linear model as well as sequentially decide the best regions to
test before experimental budget is exhausted. Besides, we also develop two
other new algorithms: batch SpKG algorithm, which generates more suggestions
sequentially to run parallel experiments; and batch SpKG with a procedure which
we call length mutagenesis. It dynamically adds in new alternatives, in the
form of types of probes, are created by inserting, deleting or mutating
nucleotides within existing probes. In simulation, we demonstrate these
algorithms on the Group I intron (a mid-size RNA molecule), showing that they
efficiently learn the correct sparsity pattern, identify the most accessible
region, and outperform several other policies