7 research outputs found

    VDA, a Method of Choosing a Better Algorithm with Fewer Validations

    The multitude of bioinformatics algorithms designed for a particular computational task presents end-users with the problem of selecting the most appropriate tool for analyzing their biological data. The choice of the best available method is often based on expensive experimental validation of the results. We propose an approach to designing validation sets for method comparison and performance assessment that are effective in terms of both cost and discrimination power.

    RNA SECONDARY STRUCTURE PREDICTION TOOL

    Ribonucleic acid (RNA) is one of the major macromolecules essential to all forms of life. Apart from its important role in protein synthesis, it performs several other functions, such as gene regulation, catalysis of biochemical reactions and modification of other RNAs. In some viruses, RNA rather than DNA serves as the carrier of genetic information. RNA is an interesting subject of research in the scientific community and has led to important biological discoveries. One of the major problems researchers are trying to solve is the RNA structure prediction problem. It has been found that the structure of RNA is evolutionarily conserved and can help to determine its functions. In this project, I will develop a tool to predict the secondary structure of RNA using simulated annealing. The aim of this project is to understand the simulated annealing algorithm in detail and to implement it to find solutions to the RNA secondary structure problem. The results will be compared with the well-known tool Mfold, developed by Michael Zuker, which uses the minimum free energy approach.

    Prediction of secondary structures for large RNA molecules

    The prediction of correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. Among the major obstacles is the fact that accurate calculations scale as O(n⁴), so the computational requirements become prohibitive as the sequence length increases. We present a new parallel, multicore and scalable program called GTfold, which is one to two orders of magnitude faster than the de facto standard programs mfold and RNAfold for folding large RNA viral sequences and achieves comparable prediction accuracy. We analyze the algorithm's concurrency and describe the parallelism for a shared memory environment such as a symmetric multiprocessor or multicore chip. We are seeing a paradigm shift to multicore chips, and parallelism must be explicitly addressed to continue gaining performance with each new generation of systems. We provide a rigorous proof of correctness of an optimized algorithm for internal loop calculations, called the internal loop speedup algorithm (ILSA), which reduces the time complexity of internal loop computations from O(n⁴) to O(n³), and show that exact algorithms such as ILSA can be executed with our method in an affordable amount of time. The proof gives insight into solving these kinds of combinatorial problems. We have documented detailed pseudocode of the algorithm for predicting minimum free energy secondary structures, which provides a base for implementing future algorithmic improvements and an improved thermodynamic model in GTfold. GTfold is written in C/C++ and is freely available as open source from our website. M.S. Committee Chair: Bader, David; Committee Co-Chair: Heitsch, Christine; Committee Member: Harvey, Stephen; Committee Member: Vuduc, Richar

    Using SetPSO to determine RNA secondary structure

    RNA secondary structure prediction is an important field in bioinformatics. A number of different approaches have been developed to simplify the determination of RNA molecule structures. RNA is a nucleic acid found in living organisms which fulfils a number of important roles in living cells. Knowledge of its structure is crucial to understanding its function. Determining RNA secondary structure computationally, rather than by physical means, has the advantage of being quicker and cheaper. This dissertation introduces a new Set-based Particle Swarm Optimisation algorithm, known as SetPSO for short, to optimise the structure of an RNA molecule using an advanced thermodynamic model. Structure prediction is modelled as an energy minimisation problem. Particle swarm optimisation is a simple but effective stochastic optimisation technique developed by Kennedy and Eberhart. This technique was adapted to work with variable-length particles, each consisting of a set of elements rather than a vector of real numbers. The effectiveness of this structure prediction approach was compared to that of a dynamic programming algorithm called mfold. It was found that SetPSO can be used as a combinatorial optimisation technique which can be applied to the problem of RNA secondary structure prediction. This research also included an investigation into the behaviour of the new SetPSO optimisation algorithm. Further study needs to be conducted to evaluate the performance of SetPSO on different combinatorial and set-based optimisation problems. Dissertation (MS)--University of Pretoria, 2009. Computer Science. unrestricte

    Conception et analyse des biopuces à ADN en environnements parallèles et distribués

    Microorganisms represent the largest diversity of living beings. They play a crucial role in all biological processes, owing to their huge metabolic potential and their capacity to adapt to different ecological niches. The development of new genomic approaches allows a better knowledge of the microbial communities involved in the functioning of complex environments. In this context, DNA microarrays are high-throughput tools able to study the presence or the expression levels of several thousands of genes, combining qualitative and quantitative aspects in a single experiment. However, the design and analysis of DNA microarrays, with their current high-density formats and the huge amount of data to process, are complex but crucial steps. To improve the quality and performance of these two steps, we have proposed new bioinformatics approaches for the design and analysis of DNA microarrays in parallel and distributed environments. These multipurpose approaches use high performance computing (HPC) and new software engineering approaches, especially model driven engineering (MDE), to overcome the current limitations. We first developed PhylGrid 2.0, a new distributed approach for the large-scale selection of explorative probes for phylogenetic DNA microarrays using computing grids. This software was used to build PhylOPDb, a comprehensive 16S rRNA oligonucleotide probe database for prokaryotic identification. MetaExploArrays, parallel software for oligonucleotide probe selection on different computing architectures (a PC, a multiprocessor, a cluster or a computing grid) that uses meta-programming and a model driven engineering approach, was developed to give users flexibility according to their computing resources. We then developed PhylInterpret, new software for analyzing the hybridization results of DNA microarrays, which uses the concepts of propositional logic to determine the prokaryotic composition of metagenomic samples. Finally, a new parallelization method based on model driven engineering has been proposed to compute a complete backtranslation of short peptides in order to select probes for functional microarrays.