    SupeRNAlign: a new tool for flexible superposition of homologous RNA structures and inference of accurate structure-based sequence alignments

    RNA has been found to play an ever-increasing role in a variety of biological processes. The function of most non-coding RNA molecules depends on their structure. Comparing and classifying macromolecular 3D structures is of crucial importance for structure-based function inference, and it is used in the characterization of functional motifs and in structure prediction by comparative modeling. However, compared to the numerous methods for protein structure superposition, there are few tools dedicated to superimposing RNA 3D structures. Here, we present SupeRNAlign (v1.3.1), a new method for flexible superposition of RNA 3D structures, and SupeRNAlign-Coffee, a workflow that combines SupeRNAlign with T-Coffee for inferring structure-based sequence alignments. The methods have been benchmarked against eight other methods for RNA structural superposition and alignment. The benchmark included 151 structures from 32 RNA families (with a total of 1734 pairwise superpositions). The accuracy of superpositions was assessed by comparing structure-based sequence alignments to the reference alignments from the Rfam database. SupeRNAlign and SupeRNAlign-Coffee achieved significantly higher scores than most of the benchmarked methods: SupeRNAlign generated the most accurate sequence alignments among the structure superposition methods, and SupeRNAlign-Coffee performed best among the sequence alignment methods.

    Detecting and comparing non-coding RNAs in the high-throughput era.

    In recent years there has been a growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requisite when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and, more generally, any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino acids. Moreover, for many non-coding RNAs, evolution is likely to be constrained mostly at the structural level and not at the sequence level. This results in very poor sequence conservation, which impedes comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies, dealing in particular with two important and timely research aspects: the development of new methods to align RNAs and the analysis of high-throughput data.

    Alignathon: A competitive assessment of whole-genome alignment methods

    © 2014 Earl et al. Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: two were simulated and based on primate and mammalian phylogenies, and one comprised 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.

    Estudio de la diversidad conformacional en ARNs (Study of conformational diversity in RNAs)

    This work focuses on the study of the conformational diversity of RNAs. It also introduces the reader to aspects of the structural biology of RNAs, its history, and definitions of the main concepts. It reviews the state of the art in RNA structural databases and presents the preliminary development, with subsequent analysis, of a database of RNA conformational diversity. Facultad de Ciencias Exacta

    Bioinformatic strategies to explore iodine transport in plants and its potential application in biofortification

    Master's dissertation in Bioinformatics. The purpose of this work is to contribute to the knowledge of iodine transport in plants through the use of computational resources and bioinformatics tools. Understanding iodine usage and regulation in plant species matters beyond its own sake, since it could play a vital role in developing an alternative to iodized salt for human consumption: iodine-biofortified crops. Iodine is essential for human health, and its deficiency has a great impact on child development and the wellbeing of populations. At the cellular level, iodine transport is a fundamental step in the metabolism of this compound. However, to date, no iodine transporters are known in plants. 
To this end, the first step was to identify human iodine transporter proteins and select candidates to use as queries for phylogenetic reconstructions. Four proteins were selected: the Sodium/iodide co-transporter (NIS), Anoctamin-1 (ANO1), Pendrin (PDS), and the Cystic fibrosis transmembrane conductance regulator (CFTR). Phylogenetic reconstructions were obtained considering not only plants but also representative animal, fungal, and algal species. Putative ortholog sequences were detected in plants for all these reconstructions: urea transporters DUR3 (NIS), sulphate transporters SULTR (Pendrin), anoctamin-like proteins (ANO1), and ABC-C subfamily proteins (CFTR). The plant model species Arabidopsis thaliana was selected for the subsequent analyses, and all the detected sequences were collected for a transcriptomic analysis. Transcriptomic data on gene expression under iodide treatments in Arabidopsis thaliana were obtained and analysed. 
Tests for statistical differences between factors (i.e. tissues and treatments) revealed significant differences between expression levels in roots and leaves. Hence, two separate analyses were conducted, considering the relative expression of the selected genes in leaves and in roots of A. thaliana. Different expression patterns of transporter-encoding genes were detected between tissues, most notably for AtDUR3 in the leaves and for AtMRP7, AtMRP8, AtDUR3, and AtSULTR1;3 in the roots. AtDUR3, AtTMEM16, AtSULTR1;3, AtSULTR4;2, and AtMRP7 were selected for docking analysis, using their predicted 3D structures to test the likelihood of their interacting with iodide and iodate. Whereas iodide is predicted to bind only AtTMEM16, iodate is likely to interact with all the other proteins tested. Overall, these analyses yielded a set of candidate proteins to be considered in further studies of iodine transport in plants, whether for functional biology research or for iodine biofortification strategies.

    Annotation and comparative analysis of fungal genomes: a hitchhiker's guide to genomics

    This thesis describes several genome-sequencing projects, covering the fungi Laccaria bicolor S238N-H82, Glomus intraradices DAOM 197198, Melampsora larici-populina 98AG31, Puccinia graminis, Pichia pastoris GS115 and Candida bombicola, as well as the haptophyte Emiliania huxleyi CCMP1516. These species are important in many respects: L. bicolor and G. intraradices are symbiotic fungi that grow in association with trees and occupy an important ecological niche in promoting tree growth; M. larici-populina and P. graminis are two devastating fungi that threaten plants; the tiny yeast P. pastoris is the major protein production platform in the pharmaceutical industry; the biosurfactant-producing yeast C. bombicola is likely to provide a low-ecotoxicity detergent; and E. huxleyi occupies a unique phylogenetic position among the chromalveolates and contributes to the global carbon cycle. The completion of the genome sequences and the subsequent functional studies broaden our understanding of these complex biological systems and promote the species as possible model organisms. However, it is commonly observed that genome sequencing projects are launched with a lot of enthusiasm but are often frustratingly difficult to finish. Part of the reason is the ever-increasing expectations regarding quality delivery (with respect to both data and analyses). The Introductory Chapter aims to provide an overview of how best to conduct a genome sequencing project. It explains the importance of understanding the basic biology and genetics of the target organism. It also discusses the latest developments in new (next) generation high-throughput sequencing (HTS) technologies, how to handle the data, and their applications. The emergence of the new HTS technologies brings biological research as a whole to a new frontier. 
For instance, with the help of the new sequencing technologies, we were able to sequence the genome of our organism of interest, namely Pichia pastoris. This tiny yeast, the analysis of which forms the bulk of this thesis, is an important heterologous production platform because its methanol assimilation properties make it ideally suited for large-scale industrial production. The unique protein assembly pathway of P. pastoris also attracts considerable basic research interest. We used the new HTS method to sequence and assemble the GS115 genome into four chromosomes and made it publicly available to the research community (Chapter 2 and Chapter 3). The public release of the GS115 genome sparked broader interest in comparing GS115 with its parental strains. By sequencing the parental strain of GS115 with different new sequencing platforms, we identified several point mutations in the coding genes that likely contribute to the higher protein translocation efficiency in GS115. The sequence divergence and copy number variation of rDNA between strains also explain the difference in protein production efficiency (Chapter 4). Before 2008, the Sanger sequencing method was the only technology for obtaining high-quality complete genomes of eukaryotes. Because of the high cost of the Sanger method, for the other genome projects discussed in this thesis it was necessary to team up with many other partners and to rely on the U.S. Department of Energy Joint Genome Institute (DOE-JGI) and the Broad Institute to generate the genome sequence. M. larici-populina strain 98AG31 and Puccinia graminis f. sp. tritici strain CRL 75-36-700-3 are two devastating basidiomycete ‘rusts’ that infect poplar and wheat, respectively. Lineage-specific gene family expansions in these two rusts highlight their possible role in the obligate biotrophic lifestyle. Two large sets of effector-like small secreted proteins with different primary sequence structures were identified in each organism. 
The in planta-induced transcriptomic data showed upregulation of these lineage-specific genes, which are likely involved in establishing the rust-host interaction. An additional immunolocalization study on M. larici-populina, described in Chapter 5, confirmed the accumulation of some candidate effectors in the haustoria and infection hyphae.

    Bioinformatics approaches to study antibiotics resistance emergence across levels of biological organization.

    The Review on Antimicrobial Resistance predicts that in thirty years infections with antibiotic-resistant microorganisms will become one of the leading causes of death. The discovery of new antibiotics has so far been too slow to ensure continued use of antibiotics in the face of growing resistance. Efforts to curb resistance emergence are therefore gaining in importance. These efforts comprise two complementary strategies. The first focuses on the mechanisms of resistance emergence, in the hope that this will enable the development of pharmacological agents that constrain resistance emergence. The second aims at improving antibiotic use practices, based on studies of the impact of antibiotics on resistance emergence within patient populations. Antibiotic resistance emerges in bacterial cells, negatively influences the human gut microbiome, and transfers between people. Hence, antibiotic resistance has impacts across several levels of biological organization. This thesis describes four projects concerning various aspects of antibiotic resistance. The first two projects deal with basic resistance emergence mechanisms, at the level of bacterial strains and bacterial consortia, whereas the other two deal with finding better practices for antibiotic use at the population level. During the first project, I analyzed changes in the genomes of MRSA strains isolated from several patients throughout antibiotic therapies and developing MRSA infections. I observed changes in the number and types of virulence factors responsible for interacting with the human body, which are attributed to mobile genetic elements. In the second project, I showed that, prompted by antibiotic therapy, resistance within the human gut microbiome transfers from bacterial genomes onto plasmids, prophages, and free phages. 
Hence, resistance emergence depends not only on the antibiotic therapy but also on the state of the gut microbiome, which in turn results from the patient's overall health and previous antibiotic therapies. The third project, SATURN, applied machine learning methods to a large data set covering patients' demographics, comorbidities, antibiotic therapies, surgeries, and colonization with multi-drug resistant bacteria. The final classifiers were made available on the AskSaturn website, where doctors can compare antibiotic therapies based on the probability of colonization with multi-drug resistant bacteria. The fourth project, Tübiom, focused on the antibiotic-influenced gut microbiomes of the healthy population. The first two projects rely on genome and metagenome sequencing data, for which I designed specialized bioinformatics analysis pipelines. The latter two projects use mixed data, which were analyzed with machine learning algorithms. These projects also involved web development and data visualization. Although each of the projects requires different data and methods, each contributes a crucial part of a pipeline that aims to use gut microbiome information in medical practice to constrain resistance emergence.

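    Desarrollo de técnicas bioinformáticas para el análisis de datos de secuenciación masiva en sistemática y genómica evolutiva: Aplicación en el análisis del sistema quimiosensorial en artrópodos (Development of bioinformatics techniques for the analysis of massive sequencing data in systematics and evolutionary genomics: Application to the analysis of the chemosensory system in arthropods)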

    Next Generation Sequencing (NGS) technologies provide powerful data for investigating fundamental biological and evolutionary questions, including topics in phylogenetics and the evolutionary genomics of adaptation. Currently, it is possible to carry out complex genomic projects analyzing complete genomes and/or transcriptomes, even in non-model organisms. In this thesis, we have performed two complementary studies using NGS data. Firstly, we have analyzed the transcriptome (RNAseq) of the main chemosensory organs of the chelicerate Macrothele calpeiana, Walckenaer, 1805, the only spider protected in Europe, to investigate the origin and evolution of the Chemosensory System (CS) in arthropods. The CS is an essential physiological process for the survival of organisms, and it is involved in vital biological processes, such as the detection of food, partners or predators and oviposition sites. This system is relatively well characterized in hexapods, but few studies exist in other arthropod lineages. 
Our transcriptome analysis allowed us to detect some genes expressed in the putative chemosensory organs of chelicerates, such as five NPC2s and two IRs. Furthermore, we detected 29 additional transcripts after including new CS members from recently available arthropod genomes in the HMM profiles, such as some genes of the SNMP, ENaC, TRP and GR families and one OBP-like gene. Unfortunately, many of them were partial fragments. Secondly, we have also developed some bioinformatics tools to analyze RNAseq data and to develop molecular markers. Researchers interested in the biological application of NGS data may lack the bioinformatics expertise required to handle the large amount of data generated. 
In this context, user-friendly tools that carry out the basic processing of NGS data and integrate utilities for downstream analysis are much needed. In this thesis, we have developed two bioinformatics tools with an easy-to-use graphical interface that perform all the common steps of NGS data processing and some of the main downstream analyses: i) TRUFA (TRanscriptome User-Friendly Analysis), which allows analyzing RNAseq data from non-model organisms, including functional annotation and differential gene expression analysis; and ii) DOMINO (Development Of Molecular markers In Non-model Organisms), which allows identifying and selecting molecular markers appropriate for evolutionary biology analysis. These tools have been validated using computer simulations and experimental data, mainly from spiders.