
    Study of RNA Secondary Structure Prediction Algorithms

    Dynamic programming algorithms such as the Nussinov algorithm and the Zuker algorithm define criteria for searching for the most stable RNA secondary structures. Stochastic context-free grammars (SCFGs) predict the most probable RNA secondary structure using a context-free grammar and a defined set of probabilities for each grammar rule. These algorithms form the basis of computer programs that predict RNA secondary structures without pseudoknots. In this report, we review these RNA secondary structure prediction algorithms and present our own software implementations of them. The Nussinov algorithm is easy to understand, but our results show that it is overly simplified and cannot produce the most accurate results. The SCFG algorithm may be powerful, but its results are also inaccurate because accurate probabilities for the corresponding grammar rules are not available. Zuker's minimum free energy method incorporates far more biological knowledge in its energy definitions; thus, its predictions are much better than those of the other two algorithms. Our implementations use both recursive and non-recursive function calls. Recursion is easy to understand, but it introduces significant overhead. We are able to rearrange the function calls to eliminate the recursion, and the non-recursive form allows us to parallelize the most computationally intensive part of the calculation. By abstracting a secondary structure into a tree representation and a string representation, we compared our predictions with experimentally measured structures, with results from non-conventional general-purpose computational methods, and with results from popular packages such as MFOLD. Our results also illustrate the limitations of these algorithms, which clearly demonstrate that more biological and chemical knowledge of RNA needs to be incorporated into RNA secondary structure prediction algorithms.
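
The Nussinov recurrence and the non-recursive evaluation order described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `nussinov`, the `PAIRS` set (Watson-Crick plus G-U wobble pairs), and the minimum hairpin loop of 3 are assumptions. Filling the table in order of increasing subsequence length replaces recursion, and an iterative traceback produces the dot-bracket string representation of the predicted structure.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = frozenset({("A", "U"), ("U", "A"), ("G", "C"),
                   ("C", "G"), ("G", "U"), ("U", "G")})


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize the number of base pairs in seq without recursion.

    N[i][j] holds the maximum number of pairs in seq[i..j]. Cells are filled
    in order of increasing span j - i, so every subproblem a cell reads is
    already available. min_loop is the minimum number of unpaired bases
    enclosed by a pair (hairpin loop size).
    """
    n = len(seq)
    if n == 0:
        return 0, ""
    N = [[0] * n for _ in range(n)]

    def paired_score(i: int, k: int, j: int) -> int:
        # Score when j pairs with k: best for seq[i..k-1] + 1 + best inside.
        left = N[i][k - 1] if k > i else 0
        return left + 1 + N[k + 1][j - 1]

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = N[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    best = max(best, paired_score(i, k, j))
            N[i][j] = best

    # Iterative traceback into dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS and N[i][j] == paired_score(i, k, j):
                structure[k], structure[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return N[0][n - 1], "".join(structure)
```

For example, `nussinov("GGGAAAUCC")` returns `(3, "(((...)))")`. Because each cell depends only on cells of smaller span, all cells of one span are independent of each other, which is what makes the non-recursive form a natural target for parallelization.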

    Mutations in the 3'-untranslated region of GATA4 as molecular hotspots for congenital heart disease (CHD)

    Background: The 3'-untranslated region (3'-UTR) of mRNA contains regulatory elements that are essential for the appropriate expression of many genes. These regulatory elements are involved in the control of nuclear transport, polyadenylation status and subcellular targeting, as well as rates of translation and degradation of mRNA. Indeed, 3'-UTR mutations have been associated with disease, but this region is frequently not analyzed. To gain insights into congenital heart disease (CHD), we have been analyzing cardiac-specific transcription factor genes, including GATA4, which encodes a zinc finger transcription factor. Germline mutations in the coding region of GATA4 have been associated with septation defects of the human heart, but such mutations are rather rare. Previously, we identified 19 somatically derived zinc finger mutations in diseased tissues of malformed hearts. We have now continued our search in the 609 bp 3'-UTR region of GATA4 to explore further molecular avenues leading to CHD.
    Methods: By direct sequencing, we analyzed the 3'-UTR of GATA4 in DNA isolated from 68 formalin-fixed explanted hearts with complex cardiac malformations encompassing ventricular, atrial, and atrioventricular septal defects. We also analyzed blood samples from 12 patients with CHD and 100 unrelated healthy individuals.
    Results: We identified germline and somatic mutations in the 3'-UTR of GATA4. In the malformed hearts, we found nine frequently occurring sequence alterations and six dbSNPs in the 3'-UTR region of GATA4. Seven of these mutations are predicted to affect RNA folding. We also found five further nonsynonymous mutations in exons 6 and 7 of GATA4. Except for the dbSNPs, analysis of tissue distal to the septation defect failed to detect sequence variations in the same donor, suggesting a somatic origin and mosaicism of the mutations. In one family, we observed c.+119A>T in the 3'-UTR associated with ASD type II.
    Conclusion: Our results suggest that somatic GATA4 mutations in the 3'-UTR may provide an additional molecular rationale for CHD.

    Prediction of RNA secondary structure with pseudoknots using integer programming

    Background: RNA secondary structure prediction is a major task in bioinformatics, and various computational methods have been proposed. Pseudoknots are typical substructures that appear in several RNAs and play an important role in some biological processes. Predicting RNA secondary structure with pseudoknots remains challenging because the problem is NP-hard when arbitrary pseudoknots are taken into consideration.
    Results: We introduce a new method for predicting RNA secondary structure with pseudoknots based on integer programming. In our formulation, we minimize an objective function that reflects the free energy of a folding structure of the input RNA sequence. We focus on a practical class of pseudoknots by setting the constraints appropriately. Experimental results on a set of real RNA sequences show that our method outperforms several existing methods in sensitivity. Furthermore, on a set of short sequences, our approach achieved good performance in both sensitivity and specificity.
    Conclusion: Our integer programming-based approach to RNA structure prediction is flexible and extensible.
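
To make the idea of an integer-programming formulation concrete, here is a generic 0-1 program for base pairing with stacking energies. It is a simplified sketch under stated assumptions, not the paper's objective or constraint set: P (the candidate pairs (i, j) with i < j, complementary bases, and a minimum loop length between them), the stacking energies e_ij < 0, and the binary variables x_ij (pair (i, j) is formed) and y_ij (pair (i, j) is stacked on pair (i+1, j-1)) are hypothetical notation introduced here. Where (i+1, j-1) is not in P, y_ij is fixed to 0.

```latex
\begin{aligned}
\min_{x,\,y}\quad & \sum_{(i,j)\in P} e_{ij}\, y_{ij} \\
\text{s.t.}\quad & \sum_{j:\,(i,j)\in P} x_{ij} + \sum_{h:\,(h,i)\in P} x_{hi} \le 1 && \forall\, i \\
& y_{ij} \le x_{ij}, \qquad y_{ij} \le x_{i+1,\,j-1} && \forall\, (i,j)\in P \\
& x_{ij} + x_{kl} \le 1 && \forall\, (i,j),(k,l)\in P \text{ with } i<k<j<l \\
& x_{ij},\ y_{ij} \in \{0,1\} && \forall\, (i,j)\in P
\end{aligned}
```

The first constraint lets each base pair at most once. The crossing constraint forbids every pseudoknot; admitting a restricted class of pseudoknots, as the abstract describes, means relaxing it selectively, for example by assigning pairs to two levels that are each pseudoknot-free. The abstract does not say which relaxation the authors use.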

    Fine-grained parallel RNAalifold algorithm for RNA secondary structure prediction on FPGA

    Background: In the field of RNA secondary structure prediction, the RNAalifold algorithm is one of the most popular methods based on free energy minimization. However, general-purpose computers, including parallel and multi-core computers, exhibit a parallel efficiency of no more than 50%. Field-Programmable Gate Array (FPGA) chips provide a new approach to accelerating RNAalifold through fine-grained custom design.
    Results: RNAalifold has complicated data dependences: the dependence distance is variable, and the dependences run across two dimensions. We propose a systolic array structure consisting of one master Processing Element (PE) and multiple slave PEs for a fine-grained hardware implementation on an FPGA. We exploit data reuse schemes to reduce the need to load energy matrices from external memory. We also propose several methods that reduce the size of the energy parameter tables by 80%.
    Conclusion: To our knowledge, our implementation with 16 PEs is the only FPGA accelerator implementing the complete RNAalifold algorithm. Experimental results show a 12.2-fold speedup over the RNAalifold software (ViennaPackage 1.6.5) running on a personal computer (PC) with a 2.6 GHz Pentium 4 CPU, for a group of aligned RNA sequences 2,981 residues long.
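
RNAalifold's own recurrences and the systolic PE mapping are not reproduced here. As a hedged illustration of the dependence pattern the abstract mentions, the sketch below uses a simplified bifurcation-style recurrence (an assumption, typical of Nussinov- and Zuker-type algorithms) in which cell (i, j) reads cells in row i and column j at variable distances, plus the inner cell (i+1, j-1). The helper names are hypothetical. Grouping cells by anti-diagonal shows why every cell on one diagonal can be computed concurrently, which is the usual basis for parallel and hardware designs of these recurrences.

```python
def bifurcation_deps(i: int, j: int) -> list[tuple[int, int]]:
    """Cells read when computing cell (i, j) of a Zuker/Nussinov-style table.

    The split terms (i, k) and (k + 1, j) lie in row i and column j at a
    distance that varies with k, and (i + 1, j - 1) is the inner pair cell,
    so dependences run along two dimensions.
    """
    deps = []
    for k in range(i, j):
        deps.append((i, k))
        deps.append((k + 1, j))
    if i + 1 <= j - 1:
        deps.append((i + 1, j - 1))
    return deps


def wavefronts(n: int) -> list[list[tuple[int, int]]]:
    """Group the upper-triangular cells (i, j) by anti-diagonal d = j - i."""
    return [[(i, i + d) for i in range(n - d)] for d in range(n)]


def schedule_is_valid(n: int) -> bool:
    """Check that each cell depends only on cells from earlier wavefronts,
    so all cells within one wavefront could be computed concurrently."""
    for d, front in enumerate(wavefronts(n)):
        for i, j in front:
            if any(dj - di >= d for di, dj in bifurcation_deps(i, j)):
                return False
    return True
```

For example, `wavefronts(3)` yields `[[(0, 0), (1, 1), (2, 2)], [(0, 1), (1, 2)], [(0, 2)]]`; the number of independent cells shrinks as the diagonal grows, which is one reason load balancing such recurrences on general-purpose parallel machines is difficult.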

    Constrained Secondary Structure Prediction Using Stem Detection

    RNA sequence analysis and structure prediction are classical topics in computational biology and powerful tools for examining complex genomic data. Over the decades, various tools have been developed to predict RNA secondary structures and sequence alignments, most of which use one of two characteristic approaches: (a) thermodynamic minimum free energy prediction or (b) probabilistic maximum likelihood prediction. However, despite numerous takes on modeling these approaches, the computational complexity of the resulting algorithms has not improved significantly. Most algorithms still run in O(N^3) time, a cost that becomes significant when processing long RNA sequences with hundreds of bases. In this thesis, we present a constrained structure prediction algorithm that aims to reduce the computational cost of traditional RNA structure prediction methods to O(N^2). The proposed algorithm uses pattern recognition methods to devise rules for constructing a confined space of possible secondary structures. This confined structure space is then searched for a secondary structure that satisfies the optimality criterion. In this document, we present the design details of the proposed algorithm, implemented using the minimum free energy (MFE) model. We then compare its performance with that of Zuker's algorithm, the conventional dynamic programming approach to the MFE model. The proposed algorithm provides a significant reduction in CPU time when processing longer sequences, which can be attributed to its lower computational complexity.
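
The abstract does not describe the thesis's pattern-recognition rules. As one hedged illustration of what stem detection can mean, the sketch below enumerates maximal stems: runs of stacked complementary pairs (i, j), (i+1, j-1), ... that are at least `min_len` pairs long. The name `find_stems`, the pairing rules (Watson-Crick plus G-U), and the default parameters are assumptions. A constrained predictor could then restrict its search to structures assembled from such stems.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = frozenset({("A", "U"), ("U", "A"), ("G", "C"),
                   ("C", "G"), ("G", "U"), ("U", "G")})


def find_stems(seq: str, min_len: int = 3,
               min_loop: int = 3) -> list[tuple[int, int, int]]:
    """Return maximal stems as (i, j, length).

    A stem consists of the pairs (i, j), (i+1, j-1), ..., (i+length-1,
    j-length+1); its innermost pair encloses at least min_loop unpaired bases.
    """
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            # Skip (i, j) if the stem extends outward: it is then part of a
            # longer stem that is reported from its outermost pair.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 1
            # Extend inward while the next pair is complementary and still
            # leaves a large enough hairpin loop.
            while ((j - length) - (i + length) > min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems
```

For example, `find_stems("GGGAAAUCC")` returns `[(0, 8, 3)]`. The scan visits O(N^2) candidate pairs, and each is extended by at most its stem length.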

    Lock-free Parallel Dynamic Programming

    We show a method for parallelizing top-down dynamic programs in a straightforward way through a careful choice of lock-free shared hash table implementation and randomization of the order in which the dynamic program computes its subproblems. This generic approach is applied to dynamic programs for knapsack, shortest paths, and RNA structure alignment, as well as to a state-of-the-art solution for minimizing the maximum number of open stacks. Experimental results on three different modern multicore architectures show that this parallelization is effective and reasonably scalable. In particular, we obtain a speedup of over 10 times with 32 threads on the open stacks problem.
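
The following is a minimal sketch of the idea applied to 0/1 knapsack, one of the problems the abstract lists. Several threads run the same top-down dynamic program, share one memo table, and each explores subproblems in its own random order. It is not the authors' implementation. Python's built-in `dict` stands in for a lock-free hash table (it relies on CPython's thread-safe individual dict operations, which is not a lock-free design), and the interpreter lock means the sketch shows the structure of the method rather than a speedup. `parallel_knapsack` and its parameters are hypothetical.

```python
import random
import threading


def parallel_knapsack(weights, values, capacity, n_threads=4, seed=0):
    """Top-down 0/1 knapsack solved by several threads sharing one memo.

    Each thread explores the subproblems of every state in its own random
    order, so threads tend to work on different parts of the table. Entries
    are written without locks: a duplicated computation is harmless because
    every thread stores the same value for the same key.
    """
    memo = {}  # shared: (item index, remaining capacity) -> best value

    def solve(i, cap, rng):
        if i == len(weights):
            return 0
        key = (i, cap)
        if key in memo:
            return memo[key]
        options = [lambda: solve(i + 1, cap, rng)]  # skip item i
        if weights[i] <= cap:  # take item i
            options.append(lambda: values[i] + solve(i + 1, cap - weights[i], rng))
        rng.shuffle(options)  # randomized subproblem order per thread
        best = max(option() for option in options)
        memo[key] = best
        return best

    results = [None] * n_threads

    def worker(t):
        results[t] = solve(0, capacity, random.Random(seed + t))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Every thread computes the same optimum; return the first.
    return results[0]
```

For example, `parallel_knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7)` returns 9 (the items of weight 3 and 4). In a compiled language with a genuinely lock-free table, the randomized order is what spreads threads across different subproblems.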

    Monte Carlo simulation studies of DNA hybridization and DNA-directed nanoparticle assembly

    A coarse-grained lattice model of DNA oligonucleotides is proposed to investigate how fundamental thermodynamic processes are encoded by the nucleobase sequence at the microscopic level, and to elucidate the general mechanisms by which single-stranded oligonucleotides hybridize to their complements either in solution or when tethered to nanoparticles. Molecular simulations based on a high-coordination cubic lattice are performed using the Monte Carlo method. The dependence of the model's thermal stability on sequence complementarity is shown to be qualitatively consistent with experiments and with statistical mechanical models. From the analysis of the statistical distribution of base-paired states and of the associated free-energy landscapes, two general hybridization scenarios are found. For sequences that do not follow a two-state process, hybridization is weakly cooperative and proceeds in multiple sequential steps involving stable intermediates with an increasing number of paired bases. In contrast, sequences that conform to two-state thermodynamics exhibit moderately rough landscapes, in which multiple metastable intermediates appear over broad free-energy barriers. These intermediates correspond to duplex species that bridge the configurational and energetic gaps between the duplex and denatured states with minimal loss of conformational entropy, and they lead to strongly cooperative hybridization. Remarkably, two-state thermodynamic signatures are generally observed in both scenarios. The role of cooperativity in the assembly of nanoparticles tethered with model DNA oligonucleotides is similarly addressed with the Monte Carlo method, with nanoparticles represented as finely discretized hard-core spheres on a cubic lattice. The energetic and structural mechanisms of self-assembly are investigated by simulating the aggregation of small "satellite" particles from the bulk onto a large "core" particle.
A remarkable enhancement of the system's thermal stability is attained by increasing the number of strands per satellite particle available to hybridize with those on the core particle. This cooperative process is driven by the formation of multiple bridging duplexes under favorable conditions of reduced translational entropy and the resulting energetic compensation; this behavior rapidly weakens above a certain threshold of linker strands per satellite particle. Cooperativity also enhances the structural organization of the assemblies by systematically narrowing the radial distribution of the satellite particles bound to the core.
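
The paper's coarse-grained lattice model is not reproduced here. As a hedged illustration of the Metropolis Monte Carlo method it relies on, the sketch below samples a toy one-dimensional "zipper" model of a duplex, in which the state is the number k of consecutively paired bases. The energy function, the initiation penalty, the function names, and the parameter values are hypothetical, with energies and temperature in arbitrary units (k_B = 1). The initiation penalty makes the first pair costly, a crude stand-in for the nucleation step that makes hybridization cooperative.

```python
import math
import random


def zipper_energy(k, eps=1.0, init=2.0):
    """Toy duplex energy for k consecutive paired bases: each pair
    contributes -eps, and forming the first pair costs a penalty init."""
    return 0.0 if k == 0 else init - eps * k


def metropolis_paired_fraction(n_bases=10, temperature=0.5, steps=100_000,
                               burn_in=10_000, seed=0):
    """Metropolis sampling of k in [0, n_bases]; returns the mean paired
    fraction k / n_bases over the steps after burn-in."""
    rng = random.Random(seed)
    k, total, samples = 0, 0.0, 0
    for step in range(steps):
        proposal = k + rng.choice((-1, 1))  # symmetric +/-1 move
        if 0 <= proposal <= n_bases:  # out-of-range proposals are rejected
            delta = zipper_energy(proposal) - zipper_energy(k)
            # Metropolis criterion: accept downhill moves, accept uphill
            # moves with probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                k = proposal
        if step >= burn_in:
            total += k / n_bases
            samples += 1
    return total / samples
```

At `temperature=0.5` the average paired fraction is close to 1, while at `temperature=50.0` the chain samples nearly all values of k and the fraction drops to roughly one half.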

    Natural selection on mRNA secondary structure and its correlation with protein functional groups

    Natural selection may occur at multiple levels of the biological hierarchy, including the molecular level. It may act on any phenotypic trait that shows variation and is heritable. This research uses computational methods to investigate whether the stability of mRNA secondary structures has been subject to natural selection. The DNA sequence that codes for a particular target protein is only partially determined by that protein, since the redundancy of the genetic code permits multiple synonymous codons for most amino acids. The RNA transcript of a protein-coding gene folds back on itself through complementary base pairing, resulting in an mRNA secondary structure. This secondary structure tends to adopt a configuration that minimizes free energy. Two synonymous mRNAs, coding for the identical protein with different sets of synonymous codons, will in general fold into different secondary structures with different minimum free energies (MFEs). The secondary structure of an mRNA is therefore a phenotypic trait that could be a target of natural selection. Several related questions were investigated: 1) Is there natural selection on the stability of RNA secondary structure across various types of organisms? 2) Does the MFE of microbial mRNAs correlate with the function of the target protein? 3) Is there evidence of natural selection on the nucleotide composition and/or secondary structure of the prefixes and suffixes of bacterial mRNAs? 4) Is there natural selection on the secondary structures and substructures of subviral RNAs? These questions were investigated using large-scale simulations based on generating sets of randomized synthetic mRNAs for particular genes. The secondary structure of each mRNA (naturally occurring and synthetic) was then computationally predicted. The experiments were performed on the complete sets of genes of a number of prokaryotes and eukaryotes.
Two types of randomization experiments were performed on each genetic data set, providing independent confirmation of the results. In the first method, synonymous mRNAs were generated for each gene, creating sequences that code for the identical protein with a codon usage frequency characteristic of the organism. In the second method, the nucleotides of the mRNA were permuted in a manner that does not preserve the encoded protein but exactly preserves the mRNA sequence's nucleotide and dinucleotide frequencies. The MFE of each naturally occurring mRNA sequence was then compared with the MFEs of the corresponding randomized sequences. A pattern of deviation, across an entire organism, of the MFE of the naturally occurring sequences from that of the corresponding randomized sequences is evidence of natural selection on the stability of the mRNA transcript. This research establishes the following. In all prokaryotes studied, natural selection has favored highly stable (lower MFE) mRNAs. In some prokaryotes, natural selection has also favored highly unstable mRNAs. No statistically significant evidence of such selection was found in eukaryotes. The MFE distributions of mRNAs in 25 broad functional classes of proteins (COGs, Clusters of Orthologous Groups) in five microbes and yeast correlate with functional class. mRNA prefixes have a distinctive MFE signature: the naturally occurring prefixes display more structure, on average, than randomized sequences with identical nucleotide and dinucleotide content, suggesting that natural selection favors secondary structure in the prefix of mRNA. Viroids (which have RNA genomes) have highly stable secondary structures, and these structures are similar among viroids belonging to the same family. The results indicate that natural selection on the MFE of mRNA is widespread in the evolution of the genome.
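
The first randomization method and the MFE comparison can be sketched as follows. This is a hedged illustration, not the author's pipeline. `synonymous_randomize` replaces each codon with a synonymous codon drawn in proportion to a codon-usage table (uniform if none is given), using the standard genetic code. `mfe_z_score` summarizes how far a native MFE lies from the MFEs of its randomized counterparts. The function names, the DNA alphabet (U is converted to T), and the z-score summary are assumptions. The MFEs themselves would come from a folding program, and the dinucleotide-preserving shuffle of the second method (for example the Altschul-Erickson algorithm) is not shown.

```python
import random
import statistics

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
SYNONYMS = {}  # amino acid (or stop "*") -> list of its codons
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds):
    """Translate a coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper().replace("U", "T")
    return "".join(CODON_TO_AA[cds[p:p + 3]] for p in range(0, len(cds) - 2, 3))


def synonymous_randomize(cds, codon_usage=None, rng=None):
    """Replace every codon with a synonymous codon drawn in proportion to
    codon_usage (codon -> frequency; uniform if None). The protein is
    unchanged."""
    cds = cds.upper().replace("U", "T")
    if rng is None:
        rng = random.Random()
    out = []
    for p in range(0, len(cds) - 2, 3):
        choices = SYNONYMS[CODON_TO_AA[cds[p:p + 3]]]
        weights = [codon_usage.get(c, 0.0) for c in choices] if codon_usage else None
        out.append(rng.choices(choices, weights=weights)[0])
    return "".join(out)


def mfe_z_score(native_mfe, randomized_mfes):
    """Negative values mean the native sequence folds more stably (lower MFE)
    than its randomized counterparts."""
    mean = statistics.mean(randomized_mfes)
    return (native_mfe - mean) / statistics.stdev(randomized_mfes)
```

For example, `mfe_z_score(-10.0, [-5.0, -6.0, -4.0])` returns `-5.0`. Aggregating such scores over all genes of an organism is one way to look for the organism-wide pattern of deviation the abstract describes.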