Thermodynamic characterization of the complete set of sequence symmetric tandem mismatches in RNA and an improved model to predict the free energy contribution of sequence asymmetric tandem mismatches

Abstract

Because of the abundance of available RNA sequence information, the ability to predict RNA secondary structure from sequence rapidly and accurately is becoming increasingly important. A common method for predicting RNA secondary structure from sequence is free energy minimization, so accurate free energy contributions for every RNA secondary structure motif are necessary for accurate secondary structure predictions. Tandem mismatches are prevalent in naturally occurring sequences and are biologically important. A common method for predicting the stability of a sequence asymmetric tandem mismatch relies on the stabilities of the two corresponding sequence symmetric tandem mismatches [Mathews, D. H., Sabina, J., Zuker, M., and Turner, D. H. (1999) J. Mol. Biol. 288, 911-940]. To improve the prediction of sequence asymmetric tandem mismatches, experimental thermodynamic parameters are reported here for the 22 previously unmeasured sequence symmetric tandem mismatches. These new data, however, do not improve the prediction of the free energy contributions of sequence asymmetric tandem mismatches. Therefore, a new model that is independent of sequence symmetric tandem mismatch free energies is proposed. This model consists of two penalties to account for destabilizing tandem mismatches, two bonuses to account for stabilizing tandem mismatches, and two penalties to account for adjacent A-U and G-U base pairs. The model improves the prediction of asymmetric tandem mismatch free energy contributions and is likely to improve the prediction of RNA secondary structure from sequence.

The three most common base pairs in RNA are the Watson-Crick pairs, G-C and A-U, and the wobble G-U pair. These canonical base pairs make up the helical regions of RNA and have a regular structure and hydrogen-bonding pattern. However, canonical base pairs account for only approximately half of the nucleotides found in RNA (1).
The other half are involved in other secondary structure motifs, such as hairpins, bulges, and internal loops. One common motif is the tandem mismatch, or 2 × 2 internal loop, in which two adjacent noncanonical pairs sit within a helix of canonical base pairs. The presence of tandem mismatches has been confirmed in a variety of RNA secondary structures (2-10), ranging from bacterial RNAs to the trinucleotide repeats associated with human neurological diseases. Owing to the pioneering efforts of projects such as the Human Genome Project (11, 12), entire genomes can now be sequenced accurately and efficiently, and in recent years thousands of RNA nucleotide sequences have been made publicly available (13). Once an RNA sequence has been determined, a logical next step toward understanding its structure and function is to predict its secondary structure from the primary sequence, which requires an accurate prediction method. The ability to predict RNA secondary structure from sequence is important for several reasons. First, the secondary structure can aid in determining the tertiary structure. Second, because structure and function are directly related, predicting secondary structure gives insight into the functions and roles of an RNA. Third, secondary structure predictions can aid pharmaceutical design by identifying accurate target sites for drug recognition. The importance of predicting RNA secondary structure from sequence has led to the development of several computer algorithm
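As a rough illustration of the two estimation strategies contrasted in the abstract, the Python sketch below compares (a) an estimate built from the two corresponding sequence symmetric tandem mismatches, assumed here to be their simple average, and (b) an additive model with the six-term structure the abstract describes (two penalties, two bonuses, and two closing-pair penalties on top of a baseline term). All function names, term labels, and numeric values are hypothetical placeholders; they are not the measured free energies or fitted parameters reported in this work.

```python
# Toy comparison of two ways to estimate the free energy (kcal/mol, 37 °C)
# of a sequence asymmetric tandem mismatch. All names and numbers below are
# hypothetical placeholders, not measured or fitted values.


def averaged_estimate(dg_symmetric_a: float, dg_symmetric_b: float) -> float:
    """Older approach, assumed here to be a plain average: the mean of the
    free energies of the two corresponding sequence symmetric tandem mismatches."""
    return (dg_symmetric_a + dg_symmetric_b) / 2.0


def additive_estimate(initiation: float,
                      term_values: dict[str, float],
                      term_counts: dict[str, int]) -> float:
    """Newer approach, sketched as an additive model: a baseline term plus each
    penalty (> 0) or bonus (< 0) multiplied by how many times it applies.
    It does not use any symmetric tandem mismatch free energies."""
    unknown = set(term_counts) - set(term_values)
    if unknown:
        raise KeyError(f"no parameter for terms: {sorted(unknown)}")
    return initiation + sum(term_values[name] * n for name, n in term_counts.items())


# Six hypothetical terms mirroring the structure described in the abstract.
HYPOTHETICAL_TERMS = {
    "destabilizing_penalty_1": 0.6,
    "destabilizing_penalty_2": 0.9,
    "stabilizing_bonus_1": -0.8,
    "stabilizing_bonus_2": -1.1,
    "adjacent_AU_penalty": 0.7,
    "adjacent_GU_penalty": 0.5,
}

if __name__ == "__main__":
    # Estimate (a): needs free energies of both symmetric counterparts.
    print(averaged_estimate(-1.0, 0.5))
    # Estimate (b): needs only which terms apply to the asymmetric loop.
    counts = {"stabilizing_bonus_1": 1, "adjacent_AU_penalty": 1}
    print(additive_estimate(0.5, HYPOTHETICAL_TERMS, counts))
```

Because the additive form does not depend on symmetric tandem mismatch free energies, it can be applied to any asymmetric tandem mismatch once the applicable terms are identified. The sketch leaves the classification of mismatches as stabilizing or destabilizing to the caller, since those rules are not given in this section.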
