336 research outputs found

    Protein Secondary Structure Prediction using an Optimised Bayesian Classification Neural Network

    The prediction of protein secondary structure is a topic that has been tackled by many researchers in the field of bioinformatics. In previous work, this problem has been addressed by various methods, including traditional classification neural networks trained with the standard error back-propagation algorithm. Since traditional neural networks may generalise poorly, the Bayesian technique has been used to improve the generalisation and robustness of these networks. This paper describes the use of optimised classification Bayesian neural networks for the prediction of protein secondary structure. The well-known RS126 dataset was used for network training and testing. The experimental results show that the optimised classification Bayesian neural network can reach an accuracy greater than 75%.

    Genetics of Halophilic Microorganisms

    Halophilic microorganisms are found in all domains of life and thrive in hypersaline (high salt content) environments. These unusual microbes have been a subject of study for many years due to their interesting properties and physiology. Studies of the genetics of halophilic microorganisms (from gene expression and regulation to genomics) have provided insight into the mechanisms by which life can exist at high salinity levels. Here, we highlight recent studies that advance the knowledge of biological function through examination of the genetics of halophilic microorganisms and their viruses.

    Analysis of pan-genome content and its application in microbial identification

    Comparative genomics of Dothideomycete fungi

    Fungi are a diverse group of eukaryotic micro-organisms particularly suited for comparative genomics analyses. Fungi are important to industry and fundamental science, and many of them are notorious pathogens of crops, thereby endangering global food supply. Dozens of fungi have been sequenced in the last decade and, with the advances of next-generation sequencing, thousands of new genome sequences will become available in coming years. In this thesis I have used bioinformatics tools to study different biological and evolutionary processes in various genomes, with a focus on the genomes of the Dothideomycete fungi Cladosporium fulvum, Dothistroma septosporum and Zymoseptoria tritici. Chapter 1 introduces the scientific disciplines of mycology and bioinformatics from a historical perspective. It exemplifies a typical whole-genome sequence analysis of a fungal genome, and focusses in particular on structural gene annotation and detection of transposable elements. In addition, it briefly reviews the microRNA pathway as known in animals and plants in the context of the putative existence of similar yet subtly different small RNA pathways in other branches of the eukaryotic tree of life. Chapter 2 addresses the newly sequenced genomes of the closely related Dothideomycete plant pathogenic fungi Cladosporium fulvum and Dothistroma septosporum. Remarkably, it revealed a surprisingly high similarity at the protein level combined with striking differences at the DNA level, in gene repertoire and in gene expression. Most noticeably, the genome of C. fulvum appears to be at least twice as large, which is solely attributable to a much larger content of repetitive sequences. Chapter 3 describes a novel alignment-based fungal gene prediction method (ABFGP) that is particularly suitable for plastic genomes like those of fungi. It shows excellent performance benchmarked on a dataset of 7,000 unigene-supported gene models from ten different fungi.
Applicability of the method was shown by revisiting the annotations of C. fulvum and D. septosporum and of various other fungal genomes from the first-generation sequencing era. Thousands of gene models were revised in each of the gene catalogues, revealing a correlation with the quality of the genome assembly and with the sequencing strategies used in the sequencing centres, and highlighting different types of errors in different annotation pipelines. Chapter 4 focusses on the unexpectedly high number of gene models identified by ABFGP that align well to informant genes, but only when frame shifts and in-frame stop codons are tolerated. These discordances could represent sequence errors (SEs) and/or disruptive mutations (DMs) that caused these truncated and erroneous gene models. We revisited the same fungal gene catalogues as in chapter 3, confirmed SEs by resequencing and subsequently removed them, yielding a large, high-confidence dataset of nearly 1,000 pseudogenes caused by DMs. This dataset of fungal pseudogenes, containing genes listed as bona fide genes in current gene catalogues, does not correspond to various observations previously made on fungal pseudogenes. Moreover, the degree of pseudogenization, which shows up to ten-fold variation between the least and the most affected species, is generally higher in species that reproduce asexually than in those that also reproduce sexually. Chapter 5 describes explorative genomics and comparative genomics analyses revealing the presence of introner-like elements (ILEs) in various Dothideomycete fungi, including Zymoseptoria tritici, in which they had not yet been identified although its genome sequence has been publicly available for several years. ILEs combine hallmark intron properties with the apparent capability of multiplying themselves as repetitive sequences. ILEs strongly associate with events of intron gain, thereby delivering in silico proof of their mobility.
Phylogenetic analyses at the intra- and inter-species level showed that most ILEs are related and likely share common ancestry. Chapter 6 provides additional evidence that ILE multiplication strongly dominates over other types of intron duplication in fungi. The observed high rate of ILE multiplication followed by rapid sequence degeneration led us to hypothesize that multiplication of ILEs has been the major cause and mechanism of intron gain in fungi, and we speculate that this could be generalized to all eukaryotes. Chapter 7 describes a new strategy for miRNA hairpin prediction using statistical distributions of observed biological variation of properties (descriptors) of known miRNA hairpins. We show that the method outperforms miRNA prediction by previous, conventional methods that usually apply threshold filtering. Using this method, several novel candidate miRNAs were assigned in the genomes of Caenorhabditis elegans and two human viruses. Although this chapter is not applied to fungi, the study does provide a flexible method to find evidence for the existence of a putative miRNA-like pathway in fungi. Chapter 8 provides a general discussion on the advent of bioinformatics in mycological research and its implications. It highlights the necessity of a priori planning and integration of functional analysis and bioinformatics in order to achieve scientific excellence, and describes possible scenarios for the near future of fungal (comparative) genomics research. Moreover, it discusses the intrinsic error rate in large-scale, automatically inferred datasets and the implications of using and comparing those.

    Structure and Evolution of Lizard Immunity Genes

    One of the most important gene families to play a role in adaptive immunity is the major histocompatibility complex (MHC). MHC class II loci are considered to be the most variable loci in the vertebrate genome, and studies have shown that this variability can be maintained through complex co-evolutionary dynamics between host and parasite. Despite the rich body of research into the MHC, there is comparatively little understanding of its genomic architecture in reptiles. Similarly, loci associated with innate immunity have received little attention in reptiles compared to other vertebrates. In the first chapter, we investigated the structure and organization of the MHC in the Anolis carolinensis genome by sequencing and annotating five bacterial artificial chromosomes (BAC) from the green anole genome library. We were able to identify three mhc2a, four mhc2b, and up to 15 mhc1 loci in A. carolinensis. Furthermore, we were able to link 17 scaffolds and provide sequence data to fill two significant gaps in the genome assembly. In the second chapter, we investigated the relative importance of drift and selection in shaping mhc2 variability in the reptile Podarcis erhardii. We sequenced the mhc2 gene from lizard populations from 14 islands in the Aegean that have experienced bottlenecks of differing duration and intensity. Despite signals of balancing selection, patterns of mhc2 variation were similar to microsatellites, providing evidence that the dominant evolutionary force in this system is drift. In the third chapter, we investigated how parasite infection rates impact innate immune variability in A. sabanus, a lizard indigenous to Saba Island where natural fluctuations in Plasmodium infection rates have been documented. We developed primers and sequenced part of the peptide binding region of three Toll-like receptors (TLRs) - tlr4, tlr6, and tlr13 and several beta-defensin (BD) loci. 
Although we were unable to characterize BD variability, we found three different haplotypes in tlr4, and five in tlr6. However, nucleotide variability was low (π < 0.005) and was not associated with infection status. We nevertheless present primers for multiple TLR genes and two BDs that could be of use in future studies of reptile innate immunity.
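
For readers unfamiliar with the statistic quoted in the abstract above, nucleotide diversity (π) is conventionally defined as the average number of pairwise differences per site among aligned sequences. The following is a minimal sketch of that standard definition, not the authors' pipeline; the function name `nucleotide_diversity` and the toy alignment are hypothetical and are not data from the study.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi).

    Assumes the sequences are already aligned, have equal length,
    and contain no gaps or ambiguity codes (a simplifying assumption).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same non-zero length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment, for illustration only.
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
]
pi = nucleotide_diversity(toy_alignment)
print(f"pi = {pi:.3f}")
```

Real datasets usually need handling for gaps, missing data and unequal sampling, which this sketch deliberately leaves out.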

    Plant Viruses: From Ecology to Control

    Plant viruses cause many of the most important diseases threatening crops worldwide. Over the last quarter of a century, an increasing number of plant viruses have emerged in various parts of the world, especially in the tropics and subtropics. As is generally observed for plant viruses, most of the emerging viruses are transmitted horizontally by biological vectors, mainly insects. Reverse genetics using infectious clones (available for many plant viruses) has been used to identify viral determinants involved in virus–host and virus–vector interactions. Although many studies have identified a number of factors involved in disease development and transmission, the precise mechanisms are unknown for most virus–plant–vector combinations. In most cases, the diverse outcomes resulting from virus–virus interactions are poorly understood. Although significant advances have been made towards understanding the mechanisms involved in plant resistance to viruses, we are far from being able to apply this knowledge to protect cultivated plants from all viral threats. The aim of this Special Issue was to provide a platform for researchers interested in plant virology to share their recent results. To achieve this, we invited the plant virology community to submit research articles, short communications and reviews related to the various aspects of plant virology: ecology, virus–plant host interactions, virus–vector interactions, virus–virus interactions, and control strategies. This issue contains some of the best current research in plant virology.

    Network and multi-scale signal analysis for the integration of large omic datasets: applications in Populus trichocarpa

    Poplar species are promising sources of cellulosic biomass for biofuels because of their fast growth rate, high cellulose content and moderate lignin content. There is an increasing movement towards integrating multiple layers of ’omics data in a systems biology approach to understand gene-phenotype relationships and assist in plant breeding programs. This dissertation involves the use of network and signal processing techniques for the combined analysis of these various data types, with the goals of (1) increasing fundamental knowledge of P. trichocarpa and (2) facilitating the generation of hypotheses about target genes and phenotypes of interest. A data integration “Lines of Evidence” method is presented for the identification and prioritization of target genes involved in functions of interest. A new post-GWAS method, Pleiotropy Decomposition, is presented, which extracts pleiotropic relationships between genes and phenotypes from GWAS results, allowing for identification of genes with signatures favorable to genome editing. Continuous wavelet transform signal processing analysis is applied in the characterization of genome distributions of various features (including variant density, gene density, and methylation profiles) in order to identify chromosome structures such as the centromere. This yielded approximate centromere locations on all P. trichocarpa chromosomes, which had previously not been adequately reported in the scientific literature. Discrete wavelet transform signal processing was applied to genomic features from various data types, including transposable element density, methylation density, SNP density, gene density, centromere position and putative ancestral centromere position. Subsequent correlation analysis of the resulting wavelet coefficients identified scale-specific relationships between these genomic features and provided insights into the evolution of the genome structure of P. trichocarpa.
These methods have provided strategies both to increase fundamental knowledge about the P. trichocarpa system and to identify new target genes related to biofuels targets. We intend that these approaches will ultimately be used in designing better plants for more efficient and sustainable production of bioenergy.
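
The idea of scale-specific correlation between wavelet coefficients, as described in the abstract above, can be illustrated with a small sketch. It assumes a Haar wavelet and Pearson correlation, neither of which is stated in the abstract; the function names (`haar_details`, `pearson`, `scale_correlations`) and the toy density tracks are hypothetical and do not come from the dissertation.

```python
import math


def haar_details(signal, levels):
    """Return Haar detail coefficients per level (index 0 = finest scale)."""
    approx = [float(v) for v in signal]
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2 != 0:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        # Detail = scaled difference of neighbours; approximation = scaled sum.
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def scale_correlations(track_a, track_b, levels):
    """Correlate the detail coefficients of two tracks at each scale."""
    details_a = haar_details(track_a, levels)
    details_b = haar_details(track_b, levels)
    return [pearson(a, b) for a, b in zip(details_a, details_b)]


# Hypothetical per-window densities along one chromosome (illustration only).
gene_density = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
te_density = [10 - v for v in gene_density]  # constructed to mirror gene density
corrs = scale_correlations(gene_density, te_density, levels=2)
for level, r in enumerate(corrs, start=1):
    print(f"level {level}: r = {r:.3f}")
```

Because the second toy track is built as a mirror image of the first, the correlation is strongly negative at every scale; real genomic tracks would typically show correlations that differ from one scale to the next, which is the point of the analysis.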

    An exploration of the function of specific components of the predicted secretome of Fusarium graminearum during wheat infection

    Fusarium graminearum is a major fungal pathogen of wheat and other small grain cereal crops globally, causing Fusarium ear blight (FEB) disease. Like many other plant pathogens, F. graminearum is predicted to produce in planta secreted effector proteins that modulate plant metabolism to suppress or re-programme plant defences. Understanding the molecular functions of Fg effectors will help to elucidate the processes underlying wheat spike colonisation and fungal pathogenicity. With the aim of identifying Fg effector proteins that can suppress host plant defences, I used next-generation sequencing and bioinformatic analysis to select a set of small secreted proteins (SSPs) to express in planta using the Barley stripe mosaic virus over-expression system (BSMV-VOX). I then tested whether expression of any of these SSPs enhanced Fg fungal infection of susceptible wheat spikes. Amongst the set of Fg SSPs tested, FgSSP8, which encodes a ribonuclease protein, induced strong symptoms of necrosis in N. benthamiana leaves when infiltrated via BSMV:FgSSP8. Three other genes tested (FgSSP7, FgSSP6 and FgSSP5) enhanced FEB disease formation in the majority of the experiments when overexpressed in wheat ears prior to infection with F. graminearum. FgSSP6 and FgSSP7 belong to the cerato-platanin protein (CPP) family. In several other plant pathogenic fungi, CPPs have been implicated in a number of virulence and plant protection mechanisms, including induction of host plant cell death, binding specific polymers and/or expansin-like activity. FgSSP5 encodes a protein that possesses the pfam domain RALF (Rapid alkalinization factor; PF05498.6). RALF domain-containing proteins are predominantly found in plants and play a role in plant development, regulating tissue expansion and/or negatively regulating pollen tube elongation. BLAST analyses identified RALF domain-containing proteins in a restricted range of different pathogen species.
Based on the VOX results and biochemical tests, our hypothesis is that pre-elevated levels of cerato-platanins (FgSSP6 and FgSSP7) in the apoplast surrounding the hyphae could initially shield the hyphae from detection by the plant, but later induce an intense defence response culminating in cell death, benefiting the necrotrophic phase of Fg by increasing nutrient availability. FgSSP5 may be a specific virulence factor that manipulates a key plant process by alkalinising the plant environment during infection, using the same plant receptor repertoire used to recognise plant proteins. Once the mechanisms are further understood, these genes/proteins could potentially be novel intervention targets, either for conventional chemistries and/or for methods such as host-induced gene silencing, to achieve FEB disease and/or mycotoxin control. The characterisation of single and double gene deletion F. graminearum mutants is in progress. CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) - Brazil

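    Desarrollo de técnicas bioinformáticas para el análisis de datos de secuenciación masiva en sistemática y genómica evolutiva: Aplicación en el análisis del sistema quimiosensorial en artrópodos [Development of bioinformatic techniques for the analysis of high-throughput sequencing data in systematics and evolutionary genomics: application to the analysis of the chemosensory system in arthropods]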

    Next-generation sequencing (NGS) technologies provide powerful data for investigating fundamental biological and evolutionary questions, including phylogenetic and adaptive genomic topics. Currently, it is possible to carry out complex genomic projects analyzing complete genomes and/or transcriptomes, even in non-model organisms. In this thesis, we have performed two complementary studies using NGS data. Firstly, we have analyzed the transcriptome (RNAseq) of the main chemosensory organs of the chelicerate Macrothele calpeiana, Walckenaer, 1805, the only spider protected in Europe, to investigate the origin and evolution of the chemosensory system (CS) in arthropods. The CS is an essential physiological process for the survival of organisms, and it is involved in vital biological processes, such as the detection of food, partners or predators and oviposition sites. This system is relatively well characterized in hexapods, but few studies exist in other arthropod lineages. Our transcriptome analysis allowed us to detect some genes expressed in the putative chemosensory organs of chelicerates, such as five NPC2s and two IRs. Furthermore, we detected 29 additional transcripts after including new CS members from recently available arthropod genomes in the HMM profiles, such as some genes of the SNMP, ENaC, TRP and GR families and one OBP-like. Unfortunately, many of them were partial fragments. Secondly, we have also developed some bioinformatics tools to analyze RNAseq data and to develop molecular markers. Researchers interested in the biological application of NGS data may lack the bioinformatic expertise required to handle the large amount of data generated. In this context, user-friendly tools that cover the basic processing of NGS data and integrate utilities for downstream analysis are much needed. In this thesis, we have developed two bioinformatics tools with an easy-to-use graphical interface that perform all the common steps of NGS data processing and some of the main downstream analyses: i) TRUFA (TRanscriptome User-Friendly Analysis), which allows the analysis of RNAseq data from non-model organisms, including functional annotation and differential gene expression analysis; and ii) DOMINO (Development Of Molecular markers In Non-model Organisms), which allows the identification and selection of molecular markers appropriate for evolutionary biology analysis. These tools have been validated using computer simulations and experimental data, mainly from spiders.