Predicting G-quadruplex Formation

Abstract

Guanine-rich regions of genomic DNA can spontaneously fold into secondary structures called G-quadruplexes (GQs). Akin to tiny switches, GQs regulate genetic processes through their folding and unfolding. Their interest to basic science, as well as their potential as therapeutic targets for human diseases, has motivated the creation of computational tools for their prediction. Currently, GQ folding predictors are based on results from studies of GQs formed in single-stranded DNA. As a result, existing tools perform poorly when applied to the prediction of GQ formation in double-stranded (ds) DNA, the native context within which genomic GQs are found. Here, we present a probabilistic model of GQ formation, which is learned from large-scale human genomic pull-down experiments and applied to the analysis of gene ontological data. Advances in the characterization of GQs in dsDNA have enabled us to integrate results from small-molecule binding assays and single-molecule FRET microscopy into our model. To obtain training sets of sequences, we identified nearly 700,000 unique, potential GQs and categorized them according to pull-down experiment outcomes. Model parameters learned from these training sets agree with experimental evidence, and when asked to predict the folding of dsDNA GQ sequences, the resulting model outperformed existing models of GQ folding. This tool can be applied to genomic sequences to locate the most strongly forming GQs, revealing valuable information for the design of GQ-targeting therapies, and represents the next step toward the practical, widespread use of GQs in medicine and technology.
