Motivation: Capillary electrophoresis (CE) of nucleic acids is a workhorse
technology underlying high-throughput genome analysis and large-scale chemical
mapping for nucleic acid structural inference. Despite the wide availability of
CE-based instruments, there remain challenges in leveraging their full power
for quantitative analysis of RNA and DNA structure, thermodynamics, and
kinetics. In particular, the slow rate and poor automation of available
analysis tools have bottlenecked a new generation of studies involving hundreds
of CE profiles per experiment.
Results: We propose a computational method called high-throughput robust
analysis for capillary electrophoresis (HiTRACE) to automate the key tasks in
large-scale nucleic acid CE analysis, including the profile alignment that has
heretofore been a rate-limiting step in the highest-throughput experiments. We
illustrate the application of HiTRACE on thirteen data sets representing four
different RNAs, three chemical modification strategies, and up to 480
single-mutant variants; the largest data sets each include 87,360 bands. By applying a
series of robust dynamic programming algorithms, HiTRACE outperforms prior
tools in terms of alignment and fitting quality, as assessed by measures
including the correlation of quantified band intensities between replicate
data sets. Furthermore, while the smallest of these data sets required 7 to 10
hours of manual intervention using prior approaches, HiTRACE quantitation of
even the largest data sets herein was achieved in 3 to 12 minutes. The HiTRACE
method therefore resolves a critical barrier to the efficient and accurate
analysis of nucleic acid structure in experiments involving tens of thousands
of electrophoretic bands.
Comment: Revised to include Supplement.
Availability: HiTRACE is freely available for download at
http://hitrace.stanford.ed
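
One of the quality measures named in the Results, the correlation of quantified band intensities between replicate data sets, can be illustrated with a minimal sketch. This is not HiTRACE code: the function name `pearson` and the intensity values are hypothetical, and the Pearson coefficient is assumed here as the correlation measure, which the abstract does not specify.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical intensities for the same five bands in two replicates
# (illustrative values only, not data from the paper).
replicate_a = [0.12, 0.85, 0.40, 1.30, 0.05]
replicate_b = [0.10, 0.90, 0.35, 1.25, 0.08]

r = pearson(replicate_a, replicate_b)
print(f"replicate correlation: {r:.3f}")
```

A value of r close to 1 indicates that the two replicates assign consistent relative intensities to the same bands, which is how such a measure can serve as a check on alignment and fitting quality.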