16 research outputs found

    Prediction of secondary structures for large RNA molecules

    Predicting correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. A major obstacle is that accurate calculations scale as O(n⁴), so the computational requirements become prohibitive as sequence length increases. We present a new parallel, multicore, and scalable program called GTfold, which is one to two orders of magnitude faster than the de facto standard programs mfold and RNAfold for folding large RNA viral sequences and achieves comparable prediction accuracy. We analyze the algorithm's concurrency and describe the parallelism for a shared memory environment such as a symmetric multiprocessor or multicore chip. We are seeing a paradigm shift to multicore chips, and parallelism must be explicitly addressed to continue gaining performance with each new generation of systems. We provide a rigorous proof of correctness of an optimized algorithm for internal loop calculations called the internal loop speedup algorithm (ILSA), which reduces the time complexity of internal loop computations from O(n⁴) to O(n³), and show that exact algorithms such as ILSA run in an affordable amount of time with our method. The proof gives insight into solving these kinds of combinatorial problems. We have documented detailed pseudocode of the algorithm for predicting minimum free energy secondary structures, which provides a base for implementing future algorithmic improvements and an improved thermodynamic model in GTfold. GTfold is written in C/C++ and freely available as open source from our website. M.S. Committee Chair: Bader, David; Committee Co-Chair: Heitsch, Christine; Committee Member: Harvey, Stephen; Committee Member: Vuduc, Richar
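
To make the dynamic-programming idea behind tools of this kind concrete, here is a minimal sketch in Python. It is not GTfold's algorithm and not ILSA: it swaps in the much simpler Nussinov base-pair maximization, which runs in O(n³) time and ignores the thermodynamic model entirely. The function names and the `min_loop` parameter are assumptions made for illustration.

```python
def can_pair(a: str, b: str) -> bool:
    """Watson-Crick pairs plus the G-U wobble pair."""
    return (a, b) in {
        ("A", "U"), ("U", "A"),
        ("G", "C"), ("C", "G"),
        ("G", "U"), ("U", "G"),
    }


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Nussinov recursion: maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (an illustrative assumption; 3 is a common choice for hairpins).
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the best pair count for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j stays unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k, splitting the interval.
            for k in range(i, j - min_loop):
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]
```

For example, `max_base_pairs("GGGAAACCC")` returns 3: the three G-C pairs stack around an AAA hairpin loop. A minimum free energy predictor replaces the pair count with loop-dependent energy terms, which is where the more expensive internal loop computations come from.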

    RNA SECONDARY STRUCTURE PREDICTION TOOL

    Ribonucleic acid (RNA) is one of the major macromolecules essential to all forms of life. Apart from its important role in protein synthesis, it performs several other functions, such as gene regulation, catalysis of biochemical reactions, and modification of other RNAs. In some viruses, RNA rather than DNA serves as the carrier of genetic information. RNA is an active subject of research and has led to important biological discoveries. One of the major problems researchers are trying to solve is the RNA structure prediction problem. The structure of RNA has been found to be evolutionarily conserved, and it can help determine the functions the molecule serves. In this project, I will develop a tool to predict the secondary structure of RNA using simulated annealing. The aim of this project is to understand the simulated annealing algorithm in detail and implement it to find solutions to the RNA secondary structure problem. The results will be compared with the widely used tool Mfold, developed by Michael Zuker, which uses the minimum free energy approach.
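
As a rough illustration of the simulated annealing loop this abstract refers to, the sketch below shows a generic version in Python with a Metropolis acceptance rule and geometric cooling. It is not the project's implementation: the `energy` and `neighbor` callables, the cooling parameters, and all function names are hypothetical placeholders. For RNA folding, a state would be a set of base pairs and the energy a free-energy model.

```python
import math
import random


def accept_move(delta_energy: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept improvements, sometimes accept worse moves."""
    if delta_energy <= 0:
        return True
    return rng.random() < math.exp(-delta_energy / temperature)


def anneal(initial, energy, neighbor, t_start=10.0, t_end=0.01,
           alpha=0.95, steps_per_temp=50, seed=0):
    """Generic annealing loop with geometric cooling (t <- alpha * t).

    energy(state) returns a float; neighbor(state, rng) returns a new state.
    Both are supplied by the caller and are assumptions of this sketch.
    """
    rng = random.Random(seed)
    current, current_e = initial, energy(initial)
    best, best_e = current, current_e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            candidate = neighbor(current, rng)
            candidate_e = energy(candidate)
            if accept_move(candidate_e - current_e, t, rng):
                current, current_e = candidate, candidate_e
                # Track the lowest-energy state seen so far.
                if current_e < best_e:
                    best, best_e = current, current_e
        t *= alpha
    return best, best_e
```

A toy call such as `anneal(20, lambda x: (x - 3) ** 2, lambda x, r: x + r.choice((-1, 1)))` exercises the loop on integers. High temperatures let the search escape local minima, and the returned best state is never worse than the starting one.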

    Comparison Of HSRNAFold and RNAFold Algorithms for RNA Secondary Structure Prediction.

    Ribonucleic Acid (RNA) has important structural and functional roles in the cell and plays roles in many stages of protein synthesis. The structure of RNA largely determines its function

    Using SetPSO to determine RNA secondary structure

    RNA secondary structure prediction is an important field in bioinformatics. A number of different approaches have been developed to simplify the determination of RNA molecule structures. RNA is a nucleic acid found in living organisms which fulfils a number of important roles in living cells. Knowledge of its structure is crucial to understanding its function. Determining RNA secondary structure computationally, rather than by physical means, has the advantage of being quicker and cheaper. This dissertation introduces a new set-based particle swarm optimisation algorithm, known as SetPSO for short, to optimise the structure of an RNA molecule using an advanced thermodynamic model. Structure prediction is modelled as an energy minimisation problem. Particle swarm optimisation is a simple but effective stochastic optimisation technique developed by Kennedy and Eberhart. This technique was adapted to work with variable-length particles which consist of a set of elements rather than a vector of real numbers. The effectiveness of this structure prediction approach was compared to that of a dynamic programming algorithm called mfold. It was found that SetPSO can be used as a combinatorial optimisation technique which can be applied to the problem of RNA secondary structure prediction. This research also included an investigation into the behaviour of the new SetPSO optimisation algorithm. Further study needs to be conducted to evaluate the performance of SetPSO on different combinatorial and set-based optimisation problems. Dissertation (MS)--University of Pretoria, 2009. Computer Science.

    Adaptive And Cooperative Harmony Search Models For Rna Secondary Structure Prediction

    Determining the function of RNA molecules relies heavily on their secondary structure. The existing physical methods for secondary structure determination are expensive and time-consuming

    Adaptive and cooperative harmony search models for RNA secondary structure prediction

    Determining the function of RNA molecules relies heavily on their secondary structure. The current physical methods for secondary structure determination are expensive and time-consuming. Several algorithms have been proposed for RNA secondary structure prediction, including dynamic programming and metaheuristic algorithms

    Covariance Searches for ncRNA Gene Finding

    The use of covariance models for non-coding RNA gene finding is extremely powerful and also extremely computationally demanding. A major reason for the high computational burden of this algorithm is that the search proceeds through every possible start position in the database and every possible sequence length between zero and a user-defined maximum length at every one of these start positions. Furthermore, for every start position and sequence length, all possible combinations of insertions and deletions leading to the given sequence length are searched. It has been previously shown that a large portion of this search space is nowhere near any database match observed in practice and that the search space can be limited significantly with little change in expected search results. In this work a different approach is taken in which the space of starting positions, sequence lengths, and insertion/deletion patterns is searched using a genetic algorithm
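
To give a feel for the size of the exhaustive search described here, the following Python sketch counts the (start position, sequence length) windows a scan would visit. It is only an illustration of the counting argument, not part of any covariance model software, and it leaves out the insertion/deletion combinations that multiply the cost further. The function name and parameters are assumptions.

```python
def count_windows(db_length: int, max_len: int) -> int:
    """Count (start, length) pairs with 0 <= length <= max_len that fit in the database."""
    total = 0
    for start in range(db_length):
        # Lengths 0..max_len are allowed, truncated at the end of the database.
        total += min(max_len, db_length - start) + 1
    return total
```

The count grows roughly as the product of database length and maximum window length, which is why restricting or sampling this space, for instance with a genetic algorithm as proposed in this work, can reduce the search cost substantially.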

    Improved Covariance Model Parameter Estimation Using RNA Thermodynamic Properties

    Covariance models are a powerful description of non-coding RNA (ncRNA) families that can be used to search nucleotide databases for new members of these ncRNA families. Currently, estimation of the parameters of a covariance model (state transition and emission scores) is based only on the observed frequencies of mutations, insertions, and deletions in known ncRNA sequences. For families with very few known members, this can result in rather uninformative models where the consensus sequence has a good score and most deviations from consensus have a fairly uniform poor score. It is proposed here to combine the traditional observed-frequency information with known information about free energy changes in RNA helix formation and loop length changes. More thermodynamically probable deviations from the consensus sequence will then be favored in database search. The thermodynamic information may be incorporated into the models as informative priors that depend on neighboring consensus nucleotides and on loop lengths
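
One standard way to combine observed frequencies with an informative prior is the posterior mean under a Dirichlet prior, sketched below in Python. This is an assumption about how such a prior could be applied, not the estimation procedure of this work. The pseudocount values in the usage note are hypothetical and are not taken from any thermodynamic table.

```python
from typing import Dict


def posterior_mean(counts: Dict[str, float], prior: Dict[str, float]) -> Dict[str, float]:
    """Emission probabilities as (count + pseudocount) / (total count + total pseudocount).

    prior maps each symbol (for example a base pair such as "GC") to a
    Dirichlet pseudocount; symbols missing from counts are treated as unseen.
    """
    total = sum(counts.get(symbol, 0.0) + alpha for symbol, alpha in prior.items())
    return {
        symbol: (counts.get(symbol, 0.0) + alpha) / total
        for symbol, alpha in prior.items()
    }
```

With three observed "GC" pairs and hypothetical pseudocounts `{"GC": 2.0, "AU": 1.0, "GU": 1.0}`, the unseen pairs still receive non-zero probability, and a larger pseudocount for a thermodynamically favored pair raises its score relative to the others. This matters most for families with very few known members, where the counts alone are uninformative.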