
    SupeRNAlign: a new tool for flexible superposition of homologous RNA structures and inference of accurate structure-based sequence alignments

    RNA has been found to play roles in an ever-increasing variety of biological processes. The function of most non-coding RNA molecules depends on their structure. Comparing and classifying macromolecular 3D structures is crucial for structure-based function inference, and it is used to characterize functional motifs and to predict structure by comparative modeling. However, compared with the numerous methods for protein structure superposition, there are few tools dedicated to superimposing RNA 3D structures. Here, we present SupeRNAlign (v1.3.1), a new method for flexible superposition of RNA 3D structures, and SupeRNAlign-Coffee, a workflow that combines SupeRNAlign with T-Coffee to infer structure-based sequence alignments. The methods were benchmarked against eight other methods for RNA structural superposition and alignment. The benchmark included 151 structures from 32 RNA families (1734 pairwise superpositions in total). The accuracy of superpositions was assessed by comparing structure-based sequence alignments with the reference alignments from the Rfam database. SupeRNAlign and SupeRNAlign-Coffee achieved significantly higher scores than most of the benchmarked methods: SupeRNAlign generated the most accurate sequence alignments among the structure superposition methods, and SupeRNAlign-Coffee performed best among the sequence alignment methods.
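
    The abstract does not specify how the structure-based alignments were scored against the Rfam references. One widely used measure for pairwise alignments is a sum-of-pairs-style score: the fraction of residue pairs aligned in the reference that the test alignment also aligns. The sketch below assumes that measure, "-" as the only gap character, and that both alignments contain the same two sequences; the function names are hypothetical, and this is not SupeRNAlign's benchmarking code.

```python
def aligned_pairs(row_a: str, row_b: str) -> set[tuple[int, int]]:
    """Return (i, j) residue-index pairs placed in the same column of a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # positions in the ungapped versions of the two sequences
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def sum_of_pairs_score(test: tuple[str, str], reference: tuple[str, str]) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces (assumed metric)."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    test_pairs = aligned_pairs(*test)
    return len(ref_pairs & test_pairs) / len(ref_pairs)


if __name__ == "__main__":
    reference = ("GGAC-UCC", "GG-CAUCC")  # toy reference alignment
    test = ("GGACUCC", "GGCAUCC")         # toy ungapped test alignment
    print(f"{sum_of_pairs_score(test, reference):.3f}")
```

    In this toy example the ungapped test alignment recovers 5 of the 6 reference pairs, so the script prints 0.833. A precision-style counterpart would divide by the number of test pairs instead.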

    Detecting and comparing non-coding RNAs in the high-throughput era.

    In recent years, interest in the field of non-coding RNA has grown. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requirement when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions, and, more generally, any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is smaller than the amino-acid alphabet and therefore less informative. Moreover, for many non-coding RNAs, evolution is likely to be constrained mostly at the structural level rather than at the sequence level. The resulting poor sequence conservation impedes the comparison of these molecules. These difficulties create an urgent need for new methods that exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies, dealing especially with two extremely important and timely research topics: the development of new methods to align RNAs and the analysis of high-throughput data.
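
    The claim that the nucleotide alphabet is less informative can be quantified with back-of-the-envelope arithmetic. Assuming residues are drawn independently from a uniform distribution (an idealization, since real compositions are skewed), a nucleotide carries at most log2(4) = 2 bits and an amino acid log2(20) ≈ 4.32 bits, and two unrelated sequences are expected to match at about 25% of positions for nucleotides versus 5% for amino acids. The helper names below are illustrative only.

```python
import math


def bits_per_symbol(alphabet_size: int) -> float:
    """Maximum information per residue, assuming a uniform distribution over the alphabet."""
    return math.log2(alphabet_size)


def random_identity(frequencies: list[float]) -> float:
    """Expected fraction of identical positions between two unrelated, ungapped sequences
    whose residues are drawn independently with the given frequencies (sum of p_i squared)."""
    return sum(p * p for p in frequencies)


if __name__ == "__main__":
    for name, size in (("nucleotides", 4), ("amino acids", 20)):
        uniform = [1.0 / size] * size
        print(f"{name}: {bits_per_symbol(size):.2f} bits/residue, "
              f"random identity {random_identity(uniform):.0%}")
```

    Skewed composition raises the chance-level identity further: with nucleotide frequencies of 0.1, 0.4, 0.4, and 0.1, the expected identity between unrelated sequences is 0.34.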

    Alignathon: A competitive assessment of whole-genome alignment methods

    © 2014 Earl et al. Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets are available for protein and, to a lesser extent, global nucleotide MSAs, but less effort has been made to establish benchmarks for the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and assessments were then performed collectively after all submissions had been received. Three data sets were used: two were simulated and based on primate and mammalian phylogenies, and one consisted of 20 real fly genomes. In total, 35 submissions from 10 teams, using 12 different alignment pipelines, were assessed. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in alignment quality between differently annotated regions and found that few tools aligned the duplications analyzed. Many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.

    Study of conformational diversity in RNAs

    This work focuses on the study of the conformational diversity of RNAs. It also introduces the reader to aspects of RNA structural biology, including its history and definitions of the main concepts. It reviews the state of the art in RNA structural databases and presents the preliminary development, with subsequent analysis, of a database of RNA conformational diversity. Facultad de Ciencias Exacta

    Novel algorithms to analyze RNA secondary structure evolution and folding kinetics

    Thesis advisor: Peter Clote.
    RNA molecules play important roles in living organisms, in processes such as protein translation, gene regulation, and RNA processing. RNA secondary structure is known to act as a scaffold for tertiary structure, which has led to extensive interest in RNA secondary structure. This thesis focuses primarily on the development of novel algorithms for analyzing RNA secondary structure evolution and folding kinetics. We describe RNAsampleCDS, software that generates mRNA sequences coding for user-specified peptides that overlap in up to six open reading frames. The sampled mRNAs are then analyzed with other tools to estimate their secondary structure properties. We also investigate RNA homology with respect to both sequence and secondary structure information, and present RNAmountAlign, an efficient software package for multiple global, local, and semiglobal alignment of RNAs that uses a weighted combination of sequence and structural similarity with statistical support. Furthermore, we approach RNA folding kinetics from a novel network perspective, presenting algorithms for the shortest path and the expected degree of nodes in the network of all secondary structures of an RNA. These algorithms use the move set MS2, which allows the addition, removal, and shift of base pairs and is used by several widely used RNA secondary structure folding kinetics programs that implement Gillespie's algorithm. We describe MS2distance, software that computes the shortest MS2 folding trajectory between any two given RNA secondary structures. Moreover, RNAdegree implements the first algorithm to efficiently compute the expected degree of an RNA MS2 network of secondary structures. The source code for all the software, as well as web servers for RNAmountAlign, MS2distance, and RNAdegree, is publicly available at http://bioinformatics.bc.edu/clotelab/.
    Thesis (PhD), Boston College, 2018. Submitted to: Boston College, Graduate School of Arts and Sciences. Discipline: Biology.
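
    The MS2 move set can be made concrete with a small brute-force sketch. It assumes canonical Watson-Crick and GU wobble pairs, a minimum hairpin loop of three unpaired bases, no pseudoknots, and that a shift moves either end of a pair to any free position while the other end stays fixed; all function names are hypothetical. The breadth-first search returns the length of a shortest MS2 path, but it enumerates the structure network explicitly and is only practical for very short sequences; it is not the algorithm implemented in MS2distance or RNAdegree.

```python
from collections import deque

Pair = tuple[int, int]
Structure = frozenset[Pair]

CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases enclosed by a pair


def parse(db: str) -> Structure:
    """Convert a dot-bracket string into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for idx, ch in enumerate(db):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            pairs.add((stack.pop(), idx))
    return frozenset(pairs)


def can_pair(seq: str, i: int, j: int) -> bool:
    """Canonical or wobble pair with i < j enclosing at least MIN_HAIRPIN bases."""
    return i < j and j - i - 1 >= MIN_HAIRPIN and seq[i] + seq[j] in CANONICAL


def crosses(p: Pair, q: Pair) -> bool:
    """True if two base pairs interleave, i.e. would form a pseudoknot."""
    (i, j), (k, l) = p, q
    return i < k < j < l or k < i < l < j


def fits(seq: str, others: Structure, i: int, j: int) -> bool:
    """Can (i, j) be added to `others` without sharing a base or creating a pseudoknot?"""
    used = {x for pair in others for x in pair}
    return (can_pair(seq, i, j) and i not in used and j not in used
            and not any(crosses((i, j), q) for q in others))


def ms2_neighbors(seq: str, s: Structure) -> set[Structure]:
    """Structures one MS2 move away: add a pair, remove a pair, or shift one end of a pair."""
    n = len(seq)
    out: set[Structure] = set()
    for i in range(n):  # additions
        for j in range(i + 1, n):
            if fits(seq, s, i, j):
                out.add(s | {(i, j)})
    for p in s:
        rest = s - {p}
        out.add(rest)  # removal
        for fixed in p:  # shift: keep one end, move the other to a free position k
            for k in range(n):
                if k in p:
                    continue
                a, b = min(fixed, k), max(fixed, k)
                if fits(seq, rest, a, b):
                    out.add(rest | {(a, b)})
    return out


def ms2_distance(seq: str, start: Structure, target: Structure) -> int:
    """Length of a shortest MS2 path, found by brute-force breadth-first search."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        s, d = queue.popleft()
        if s == target:
            return d
        for t in ms2_neighbors(seq, s):
            if t not in seen:
                seen.add(t)
                queue.append((t, d + 1))
    raise ValueError("target is not reachable; is it a valid structure for seq?")


if __name__ == "__main__":
    seq = "GAAAACC"
    print(ms2_distance(seq, parse("(....)."), parse("(.....)")))
```

    The degree of a structure in this network is simply `len(ms2_neighbors(seq, s))`. For the toy sequence GAAAACC, (....). and (.....) differ by a single shift, so their MS2 distance is 1, whereas additions and removals alone would need two moves.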

    Bioinformatics approaches to study antibiotics resistance emergence across levels of biological organization.

    The Review on Antimicrobial Resistance predicts that in thirty years, infections with antibiotic-resistant microorganisms will become one of the leading causes of death. New antibiotics have so far been discovered too slowly to ensure their continued use in the face of growing resistance. Efforts to curb the emergence of resistance are therefore gaining importance. These efforts comprise two complementary strategies. The first focuses on the mechanisms of resistance emergence, in the hope of enabling the development of pharmacological agents that constrain it. The second aims to improve antibiotic use practices, based on studies of the impact of antibiotics on resistance emergence within patient populations. Antibiotic resistance emerges in bacterial cells, negatively influences the human gut microbiome, and transfers between people; it therefore has impacts across several levels of biological organization. This thesis describes four projects concerning various aspects of antibiotic resistance. The first two address basic mechanisms of resistance emergence at the level of bacterial strains and bacterial consortia, whereas the other two seek better practices for antibiotic use at the population level. In the first project, I analyzed genomic changes in MRSA strains isolated from several patients over the course of antibiotic therapy and the development of MRSA infections. I observed changes in the number and types of virulence factors responsible for interacting with the human body, which are attributed to mobile genetic elements. In the second project, I showed that antibiotic therapy prompts resistance within the human gut microbiome to transfer from bacterial genomes onto plasmids, prophages, and free phages. Hence, resistance emergence depends not only on the antibiotic therapy but also on the state of the gut microbiome, which in turn reflects the patient's overall health and previous antibiotic therapies. The third project, SATURN, applied machine learning methods to a large data set covering patients' demographics, comorbidities, antibiotic therapies, surgeries, and colonization with multidrug-resistant bacteria. The final classifiers were made available on the AskSaturn website, where doctors can compare antibiotic therapies based on the probability of colonization with multidrug-resistant bacteria. The fourth project, Tübiom, focused on the antibiotic-influenced gut microbiomes of the healthy population. The first two projects rely on genome and metagenome sequencing data, for which I designed specialized bioinformatics analysis pipelines. The latter two projects use mixed data, which were analyzed with machine learning algorithms; they also involved web development and data visualization. Although each project requires different data and methods, each provides a crucial part of a pipeline aimed at using gut microbiome information in medical practice to constrain resistance emergence.

    Molecular insights to crustacean phylogeny

    This thesis aims to resolve the internal relationships of the major crustacean groups by inferring phylogenies from molecular data. New molecular and neuroanatomical data support the scenario that the Hexapoda might have evolved from Crustacea. Most molecular studies of crustaceans have relied on single-gene or multigene analyses that, in most cases, used partially sequenced rRNA genes. However, most studies do not conduct intensive assessments of data quality and alignment before phylogenetic reconstruction. One methodological aim of this thesis was to implement new tools to assess data quality, to improve alignment quality, and to test the impact of complex modeling of the data. Two of the three phylogenetic analyses in this thesis are also based on rRNA genes. In analysis (A), 16S rRNA, 18S rRNA, and COI sequences were analyzed. To improve data quality, the COI fragment was RY-coded, an alignment procedure that considers RNA secondary structure was applied, and alignment positions of ambiguous positional homology were excluded. Nevertheless, extensive network reconstructions showed that the signal quality of these chosen and commonly used markers is not suitable for inferring crustacean phylogeny, despite the extensive data processing and optimization. This result casts new light on previous studies relying on these markers. In analysis (B), completely sequenced 18S and 28S rRNA genes were used to reconstruct the phylogeny. Based on the findings of analysis (A), base compositional heterogeneity was taken into account, in addition to secondary-structure alignment optimization and alignment assessment. The complex modeling needed to compare time-heterogeneous and time-homogeneous processes, combined with mixed models that implement secondary structure, was only possible with the Bayesian software package PHASE.
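
    RY coding, mentioned above for the COI fragment, replaces purines (A, G) with R and pyrimidines (C, T/U) with Y, so transitions (A/G and C/T substitutions), which saturate quickly, no longer register as changes. The abstract does not say which COI codon positions were recoded; the sketch below assumes an optional restriction to third codon positions, a common practice, and a hypothetical `frame_offset` parameter for reading-frame placement. Gaps and ambiguity codes are left unchanged, and the function name is illustrative.

```python
PURINES = set("AG")
PYRIMIDINES = set("CTU")


def ry_code(seq: str, third_positions_only: bool = False, frame_offset: int = 0) -> str:
    """RY-code a nucleotide sequence (A/G -> R, C/T/U -> Y); other symbols are kept.

    If third_positions_only is True, only third codon positions are recoded;
    frame_offset is the index of the first codon's first base (an assumption).
    """
    out = []
    for idx, base in enumerate(seq.upper()):
        if third_positions_only and (idx - frame_offset) % 3 != 2:
            out.append(base)  # first or second codon position: keep as is
        elif base in PURINES:
            out.append("R")
        elif base in PYRIMIDINES:
            out.append("Y")
        else:
            out.append(base)  # gap or ambiguity code
    return "".join(out)


if __name__ == "__main__":
    print(ry_code("ACGT-N"))
    print(ry_code("ATGGCC", third_positions_only=True))
```

    Recoded sites use a two-state alphabet, so in practice they are analyzed under a binary substitution model rather than a four-state nucleotide model.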
    The results clearly demonstrated that complex modeling matters and that ignoring time-heterogeneous processes can mislead phylogenetic reconstructions. Some results shed light on crustacean phylogeny: for the first time, the Cephalocarida (Hutchinsoniella macracantha) were placed in a clade with the Branchiopoda, a grouping that is morphologically plausible. Compared with the time-homogeneous tree, the time-heterogeneous tree gives lower support values for some nodes. This suggests that incorporating base compositional heterogeneity into phylogenetic analysis improves the reliability of the topology. The Pancrustacea receive maximal support in both approaches, but their internal relationships are not reliably reconstructed. One result of this analysis is that the phylogenetic signal in rRNA data may be eroded for crustaceans. Recent publications have presented analyses based on phylogenomic data, mainly to reconstruct metazoan phylogeny, and the supermatrix method appears to outperform the supertree approach. The supermatrix approach was applied in this analysis. For analysis (C), crustaceans were collected for EST sequencing projects, and the resulting sequences were combined with public sequence data in a phylogenomic analysis. New reduction heuristics were used to condense the dataset. The results showed that the reduced data matrix yields a more reliable topology in which most nodes are highly supported. In analysis (C), the Branchiopoda were placed as the sister group to the Hexapoda; this differs from the results of analyses (A) and (B) but is in line with other phylogenomic studies.
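
    In the supermatrix approach, per-gene alignments are concatenated into one matrix, with taxa missing from a gene padded by missing-data symbols; a supertree approach would instead infer one tree per gene and combine the trees. The sketch below covers only the concatenation step. The input format (gene name mapped to a dict of taxon name to aligned sequence), the "?" padding character, and the function name are assumptions, and the thesis's reduction heuristics are not reproduced.

```python
def build_supermatrix(
    gene_alignments: dict[str, dict[str, str]], missing: str = "?"
) -> tuple[dict[str, str], dict[str, tuple[int, int]]]:
    """Concatenate per-gene alignments into a supermatrix.

    Returns the matrix (taxon -> concatenated row) and partition boundaries
    (gene -> (start, end), end exclusive), e.g. for partitioned models.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    rows: dict[str, list[str]] = {t: [] for t in taxa}
    partitions: dict[str, tuple[int, int]] = {}
    offset = 0
    for gene, aln in gene_alignments.items():
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"alignment for {gene} is empty or has rows of unequal length")
        width = lengths.pop()
        for t in taxa:
            # Taxa without data for this gene get a row of missing-data symbols.
            rows[t].append(aln.get(t, missing * width))
        partitions[gene] = (offset, offset + width)
        offset += width
    return {t: "".join(parts) for t, parts in rows.items()}, partitions


if __name__ == "__main__":
    genes = {"18S": {"A": "ACGT", "B": "AC-T"}, "28S": {"A": "GG", "C": "GA"}}
    matrix, parts = build_supermatrix(genes)
    for taxon, row in matrix.items():
        print(taxon, row)
    print(parts)
```

    Tracking partition boundaries alongside the matrix keeps gene-specific models possible after concatenation; the amount of padding per taxon is also what matrix-reduction heuristics typically try to limit.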