5 research outputs found

    RNA folding on the 3D triangular lattice


    Design and implementation of a cyberinfrastructure for RNA motif search, prediction and analysis

    RNA secondary and tertiary structure motifs play important roles in cells. However, very few web servers are available for RNA motif search and prediction. In this dissertation, a cyberinfrastructure named RNAcyber, capable of performing RNA motif search and prediction, is proposed, designed and implemented. The first component of RNAcyber is a web-based search engine named RmotifDB. This tool integrates an RNA secondary structure comparison algorithm with the secondary structure motifs stored in the Rfam database. Through a user-friendly interface, RmotifDB supports searching for ncRNA structure motifs both by structure and by sequence. The second component of RNAcyber is an enhanced version of RmotifDB, which combines data from multiple sources, incorporates a variety of well-established structure-based search methods, and is integrated with the Gene Ontology. To display RmotifDB’s search results graphically, a software tool called RSview was developed. Finally, RNAcyber contains a web-based tool called Junction-Explorer, which employs a data mining method to predict tertiary motifs in RNA junctions. Specifically, the tool is trained on solved RNA tertiary structures obtained from the Protein Data Bank and predicts, at the secondary structure level, the configuration of coaxial helical stacks and the families (topologies) of RNA junctions. Junction-Explorer employs several algorithms for motif prediction, including a random forest classifier, a pseudoknot removal algorithm, and a feature ranking algorithm based on the Gini impurity measure. A series of experiments, including 10-fold cross-validation, was conducted to evaluate the performance of Junction-Explorer. The experimental results demonstrate the effectiveness of the proposed algorithms and the superiority of the tool over existing methods.
The RNAcyber infrastructure is fully operational, with all of its components accessible on the Internet.
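The abstract names a feature ranking algorithm based on the Gini impurity measure but gives no details of it. As background only: the Gini impurity of a set of class labels is 1 - Σ p_k², where p_k is the fraction of samples in class k, and random forests commonly score a feature by how much the splits on it reduce this impurity, summed over nodes and averaged over trees. The minimal sketch below computes just the two underlying quantities for a single split; the function names, the feature values and the labels are hypothetical and are not taken from Junction-Explorer.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k**2 of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(values: Sequence[float], labels: Sequence[Hashable],
                      threshold: float) -> float:
    """Drop in Gini impurity when samples are split into values <= threshold and > threshold.

    Each child's impurity is weighted by the fraction of samples it receives.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Hypothetical toy data: one numeric junction feature and coaxial-stacking labels.
feature_values = [1, 2, 8, 9]
stacking = ["stack", "stack", "none", "none"]
print(gini_impurity(stacking))                          # 0.5
print(impurity_decrease(feature_values, stacking, 5))   # 0.5
```

Here the threshold separates the two classes perfectly, so the split removes all of the parent's impurity; a feature whose best splits achieve larger decreases would rank higher under an impurity-based scheme.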

    Visual Analysis of Form and Function in Computational Biology

    In recent years, the amount of data available in computational biology has steadily increased. To analyze these data, bioinformaticians have developed various algorithms that process them efficiently. Computational models have also been developed to predict, for instance, the biological relationships between species, and properties such as the structure of certain biological molecules are predicted by complex algorithms. Despite these advances in handling such complicated tasks with automated workflows and a huge variety of freely available tools, the expert still needs to supervise the data analysis pipeline, inspecting the quality of both the input data and the results. Choosing appropriate model parameters is also quite involved. Visual support puts the expert into the data analysis loop by providing visual encodings of the data and the analysis results together with interaction facilities. To meet the experts' requirements, visualizations usually have to be adapted to the application, or completely new representations have to be developed. These visualizations must also be combined with the experts' data-preparation algorithms. Such in-situ visualizations are needed because of the amount of data handled within the analysis pipelines of this domain. This thesis presents algorithms and visualizations developed in two different research areas of computational biology. On the one hand, it presents the multi-replicate peak caller Sierra Platinum, which predicts significant regions of histone modifications in genomes from experimentally generated input data. This algorithm can use several input data sets simultaneously to calculate statistically meaningful results. Multiple quality measurements and visualizations were integrated into the data analysis pipeline to support the analyst.
Based on these in-situ visualizations, the analyst can modify the parameters of the algorithm to obtain the best results for a given input data set. Furthermore, Sierra Platinum and related algorithms were benchmarked on an artificial data set to evaluate their performance under specific input conditions, e.g., low read quality or undersequenced data. Sierra Platinum achieved the best results in every test scenario. Sierra Platinum was also evaluated on experimental data, where its results confirmed existing knowledge; notably, the results of the other algorithms seemed to contradict this knowledge. On the other hand, this thesis describes two new visualizations for RNA secondary structures. First, it describes iDotter, an interactive dot plot viewer, provided as a web service, that visualizes RNA secondary structure predictions. Several interaction techniques were implemented to support the analyst in exploring RNA secondary structure dot plots. iDotter provides an API to share or archive annotated dot plots, and the API also allows iDotter to be embedded in existing data analysis pipelines. Second, it presents RNApuzzler, an algorithm that generates (outer-)planar graph drawings for any RNA secondary structure prediction. Previously published algorithms did not always produce crossing-free drawings. To develop RNApuzzler, several drawing constraints were first derived from the literature. Based on these, the algorithm RNAturtle was developed; however, it did not always produce planar drawings. Therefore, some constraints were relaxed and additional ones were established. Building on these modified constraints, RNApuzzler was developed. It takes the drawing generated by RNAturtle as input and resolves any intersections in it. Because of the way intersections are resolved, modified loops can become very large during this step, so an optimization was developed:
in a post-processing step, the radii of heavily modified loops are reduced to a minimum. From the constraints and the intersection-resolving mechanism, it can be shown that RNApuzzler produces planar drawings for any RNA secondary structure. Finally, the drawings produced by RNApuzzler are compared with those of other algorithms.
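The abstract does not describe how RNAturtle or RNApuzzler read their input. As background, a pseudoknot-free secondary structure consists of nested base pairs, which is why a crossing-free (outerplanar) drawing always exists, and drawings of this kind show the structure as loops connected by helices. The minimal sketch below, assuming dot-bracket input with a single bracket type, decomposes a structure into the loops closed by each base pair; the function names are hypothetical and this is not RNApuzzler's code.

```python
def pair_table(dot_bracket: str) -> dict[int, int]:
    """Map every paired position (0-based) to its partner in a pseudoknot-free dot-bracket string."""
    stack: list[int] = []
    partner: dict[int, int] = {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partner


def classify_loops(dot_bracket: str) -> dict[tuple[int, int], str]:
    """Label the loop closed by each base pair (i, j) by the number of helices branching off it."""
    partner = pair_table(dot_bracket)
    loops: dict[tuple[int, int], str] = {}
    for i, j in partner.items():
        if i > j:  # visit each pair once, from its opening position
            continue
        branches = 0
        k = i + 1
        while k < j:
            if k in partner:  # an enclosed pair opens a branch; jump past its closing base
                branches += 1
                k = partner[k] + 1
            else:
                k += 1
        if branches == 0:
            loops[(i, j)] = "hairpin"
        elif branches == 1:
            loops[(i, j)] = "interior"  # includes stacked pairs and bulges
        else:
            loops[(i, j)] = "multiloop"
    return loops


structure = "((..((...))..((...))))"
for pair, kind in sorted(classify_loops(structure).items()):
    print(pair, kind)
```

For this example structure, the pair (1, 20) closes a multiloop with two branching helices, while (5, 9) and (14, 18) close hairpins. Loops with many branches are the ones whose circles must grow to fit their helices, which is where intersections in a naive layout tend to arise.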

    Exploration of RNA secondary structures

    In the digital age, extracting value from data by giving it meaning is a key challenge for supporting strategic decision-making in many fields, notably digital marketing and healthcare, or, in our context, for a better understanding of the structural biology of nucleic acids. One of the major challenges of structural biology is the study of ribonucleic acid (RNA) structures and of how these structures, and alterations to them, affect RNA function. Contributing to this important challenge is the objective of this thesis, which focuses mainly on developing methods and tools for the efficient exploration of RNA secondary structures. Exploring RNA secondary structures helps reveal their function and clarify their specific involvement in cellular processes. In this context, we developed the super-n-motifs model, which better represents the structural complexity of RNAs and offers an efficient way to assess the similarity of RNA structures while taking this complexity into account. The super-n-motifs model facilitates the study of RNAs whose role is unknown: it makes it possible to formulate hypotheses about the function or functions of RNAs when they share unambiguous structural similarity. We also developed the structurexplor platform to facilitate the exploration of secondary structures, that is, to make it possible to characterize populations of RNA structures in a few clicks, for example by highlighting groups of RNAs that share similar structures. Applying the super-n-motifs model and the structurexplor platform contributed to a better understanding of the structural phylogeny of viroids, RNA pathogens that infect plants, whose phylogeny had until then been based solely on their sequences.
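The abstract does not specify how the super-n-motifs model computes similarity, so the sketch below does not implement it. For contrast only, it shows a much simpler and widely used measure, the base-pair distance: the number of base pairs present in one structure but not the other. It is defined only for structures of the same length and ignores the structural context that the super-n-motifs model is described as accounting for. The function names and the toy population are hypothetical.

```python
def base_pairs(dot_bracket: str) -> set[tuple[int, int]]:
    """Return the base pairs (i, j), i < j, of a pseudoknot-free dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def base_pair_distance(a: str, b: str) -> int:
    """Number of base pairs present in exactly one of two same-length structures."""
    if len(a) != len(b):
        raise ValueError("base-pair distance needs structures of equal length")
    return len(base_pairs(a) ^ base_pairs(b))


# Hypothetical population of three structures on the same 11-nt sequence.
population = ["((((...))))", "(((.....)))", "..((...)).."]
for x in population:
    print([base_pair_distance(x, y) for y in population])
# [0, 1, 2]
# [1, 0, 3]
# [2, 3, 0]
```

A pairwise distance matrix like this one is the kind of input a clustering step could use to group similar structures; the equal-length restriction is one reason richer representations are needed to compare RNAs of different sizes.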