281 research outputs found

    Improving Taxonomic Delimitation of Fungal Species in the Age of Genomics and Phenomics

    Species concepts have long provided a source of debate among biologists. These lively debates have been important for reaching consensus on how to communicate across scientific disciplines and for advancing innovative strategies to study evolution, population biology, ecology, natural history, and disease epidemiology. Species concepts are also important for evaluating variability and diversity among communities, understanding biogeographical distributions, and identifying causal agents of disease across animal and plant hosts. While there have been many attempts to address the concept of species in the fungi, there are several concepts that have made taxonomic delimitation especially challenging. In this review we discuss these major challenges and describe methodological approaches that show promise for resolving ambiguity in fungal taxonomy by improving discrimination of genetic and functional traits. We highlight the relevance of eco-evolutionary theory used in conjunction with integrative taxonomy approaches to improve the understanding of interactions between environment, ecology, and evolution that give rise to distinct species boundaries. Beyond recent advances in genomic and phenomic methods, bioinformatics tools and modeling approaches enable researchers to test hypotheses and expand our knowledge of fungal biodiversity. Looking to the future, the pairing of integrative taxonomy approaches with multi-locus genomic sequencing and phenomic techniques, such as transcriptomics and proteomics, holds great potential to resolve many unknowns in fungal taxonomic classification.

    Robust Algorithms for Detecting Hidden Structure in Biological Data

    Biological data, such as molecular abundance measurements and protein sequences, harbor complex hidden structure that reflects its underlying biological mechanisms. For example, high-throughput abundance measurements provide a snapshot of the global state of a living cell, while homologous protein sequences encode the residue-level logic of the proteins' function and provide a snapshot of the evolutionary trajectory of the protein family. In this work I describe algorithmic approaches and analysis software I developed for uncovering hidden structure in both kinds of data. Clustering is an unsupervised machine learning technique commonly used to map the structure of data collected in high-throughput experiments, such as quantification of gene expression by DNA microarrays or short-read sequencing. Clustering algorithms always yield a partitioning of the data, but relying on a single partitioning solution can lead to spurious conclusions. In particular, noise in the data can cause objects to fall into the same cluster by chance rather than due to meaningful association. In the first part of this thesis I demonstrate approaches to clustering data robustly in the presence of noise and apply robust clustering to analyze the transcriptional response to injury in a neuron cell. In the second part of this thesis I describe identifying hidden specificity determining residues (SDPs) from alignments of protein sequences descended through gene duplication from a common ancestor (paralogs) and apply the approach to identify numerous putative SDPs in bacterial transcription factors in the LacI family. Finally, I describe and demonstrate a new algorithm for reconstructing the history of duplications by which paralogs descended from their common ancestor. This algorithm addresses the complexity of such reconstruction due to indeterminate or erroneous homology assignments made by sequence alignment algorithms and to the vast prevalence of divergence through speciation over divergence through gene duplication in protein evolution.
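The point that a single partitioning can mislead under noise can be illustrated with a small co-assignment sketch. This is not the thesis's algorithm: the scalar k-means, the Gaussian perturbation, and all names and parameter values below are assumptions chosen only to show how repeated clustering exposes unstable assignments.

```python
import random

def kmeans_1d(values, k, rng, iters=20):
    """Plain Lloyd's k-means on scalars; returns one cluster label per value."""
    centers = rng.sample(values, k)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def coassignment(values, k=2, runs=50, noise=0.5, seed=0):
    """Fraction of noisy re-clusterings in which each pair shares a cluster."""
    rng = random.Random(seed)
    n = len(values)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        # Hypothetical noise model: independent Gaussian jitter on every value.
        noisy = [v + rng.gauss(0, noise) for v in values]
        labels = kmeans_1d(noisy, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]

# Two well-separated groups plus one value (index 5) lying between them.
data = [0.0, 0.2, 0.4, 5.0, 5.2, 2.6]
freq = coassignment(data)
```

Pairs with a co-assignment frequency near 1 cluster together regardless of the perturbation, while a value lying between the two groups can switch sides from run to run, which is the kind of chance association a single partitioning would hide.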

    On the detection of functionally coherent groups of protein domains with an extension to protein annotation

    Background: Protein domains coordinate to perform multifaceted cellular functions, and domain combinations serve as the functional building blocks of the cell. The available methods to identify functional domain combinations are limited in their scope, e.g. to the identification of combinations falling within individual proteins or within specific regions in a translated genome. Further effort is needed to identify groups of domains that span across two or more proteins and are linked by a cooperative function. Such functional domain combinations can be useful for protein annotation. Results: Using a new computational method, we have identified 114 groups of domains, referred to as domain assembly units (DASSEM units), in the proteome of budding yeast Saccharomyces cerevisiae. The units participate in many important cellular processes such as transcription regulation, translation initiation, and mRNA splicing. Within the units the domains were found to function in a cooperative manner, and each domain contributed to a different aspect of the unit's overall function. The member domains of DASSEM units were found to be significantly enriched among proteins contained in transcription modules, defined as genes sharing similar expression profiles and presumably similar functions. The observation further confirmed the functional coherence of DASSEM units. The functional linkages of units were found in both functionally characterized and uncharacterized proteins, which enabled the assessment of protein function based on domain composition. Conclusion: A new computational method was developed to identify groups of domains that are linked by a common function in the proteome of Saccharomyces cerevisiae. These groups can either lie within individual proteins or span across different proteins. We propose that the functional linkages among the domains within the DASSEM units can be used as a non-homology based tool to annotate uncharacterized proteins.

    Computational methods for analysis and modeling of time-course gene expression data

    Genes encode proteins, some of which in turn regulate other genes. Such interactions make up gene regulatory relationships or (dynamic) gene regulatory networks. With advances in the measurement technology for gene expression and in genome sequencing, it has become possible to measure the expression level of thousands of genes simultaneously in a cell at a series of time points over a specific biological process. Such time-course gene expression data may provide a snapshot of most (if not all) of the interesting genes and may lead to a better understanding of gene regulatory relationships and networks. However, inferring either gene regulatory relationships or networks puts a high demand on powerful computational methods that are capable of sufficiently mining the large quantities of time-course gene expression data, while reducing the complexity of the data to make them comprehensible. This dissertation presents several computational methods for inferring gene regulatory relationships and gene regulatory networks from time-course gene expression data. These methods are the result of the author’s doctoral study. Cluster analysis plays an important role in inferring gene regulatory relationships, for example, uncovering new regulons (sets of co-regulated genes) and their putative cis-regulatory elements. Two dynamic model-based clustering methods, namely the Markov chain model (MCM)-based clustering and the autoregressive model (ARM)-based clustering, are developed for time-course gene expression data. However, gene regulatory relationships based on cluster analysis are static and thus do not describe the dynamic evolution of gene expression over an observation period. The gene regulatory network is believed to be a time-varying system. Consequently, a state-space model for dynamic gene regulatory networks from time-course gene expression data is developed. To account for the complex time-delayed relationships in gene regulatory networks, the state-space model is extended to include time delays. Finally, a method based on genetic algorithms is developed to infer the time-delayed relationships in gene regulatory networks. Validations of all these developed methods are based on the experimental data available from well-cited public databases.

    Insights into Genome Functional Organisation through the Analysis of Interaction Networks

    Using computational techniques to identify orthology and operon structure, it is possible to find functional interactions between genes, which, together, define the genetic interactome. These large networks contain information about the relationships between phenotypes in organisms, as genes responsible for related abilities are often co-regulated and reassorting of these genes can be detected in the operon structure. However, these networks are too large to analyse by hand. In order to analyse the networks practically, a computational tool, gisql, was developed and, using this tool, the connectivity patterns in the genetic interactome can be analysed to understand high-level organisation of the genome and to narrow the list of candidate genes for wet lab analysis. The many strains of Escherichia coli are interesting subjects as there are many sequenced strains and they show highly variable pathogenic abilities. Analysis shows that the pathogenic genes have a strong tendency to connect to genes ubiquitous in the E. coli pan-genome. The Rhizobiales, including Sinorhizobium meliloti and Ochrobactrum anthropi, are multi-chromosomal eukaryote-associated bacteria with a significant history of horizontal transfer. Regions of the pSymB megaplasmid of S. meliloti which cannot be deleted via transposon-targeted homologous recombination were shown to be significantly more connected to the main chromosome. Targets for functional complementation of deletions in pSymB in S. meliloti using genes from O. anthropi were identified and unusual connectivity patterns of orthologs were identified. Finally, a putative cytokinin receptor in the Rhizobiaceæ, likely involved in symbiosis with plant hosts, was identified. Thanks to the flexibility of gisql, these analyses were straightforward and fast to develop.

    Knowledge-based identification of functional domains in proteins

    The characterization of proteins and enzymes is traditionally organised according to the sequence-structure-function paradigm. The investigation of the inter-relationships between these three properties has motivated the development of several experimental and computational techniques that have made available an unprecedented amount of sequence and structural data. The interest in developing comparative methods for rationalizing such copious information has, of course, grown in parallel. Regarding the structure-function relationship, for instance, the availability of experimentally resolved protein structures and of computer simulations has improved our understanding of the role of proteins' internal dynamics in assisting their functional rearrangements and activity. Several approaches are currently available for elucidating and comparing proteins' internal dynamics. These can capture the relevant collective degrees of freedom that recapitulate the main conformational changes. These collective coordinates have the potential to unveil remote evolutionary relationships between proteins that are otherwise not easily accessible from purely sequence- or structure-based investigations. Starting from this premise, in the first chapter of this thesis I will present a novel and general computational method that can detect large-scale dynamical correlations in proteins by comparing different representative conformers. This is accomplished by applying dimensionality-reduction techniques to inter-amino acid distance fluctuation matrices. As a result, an optimal quasi-rigid domain decomposition of the protein or macromolecular assembly of interest is identified, and this facilitates the functionally-oriented interpretation of their internal dynamics. Building on this approach, in the second chapter I will discuss its systematic application to a class of membrane proteins of paramount biochemical interest, namely the class A G protein-coupled receptors. The comparative analysis of their internal dynamics, as encoded by the quasi-rigid domains, allowed us to identify recurrent patterns in the large-scale dynamics of these receptors. This, in turn, allowed us to single out a number of key functional sites. These were, for the most part, previously known, a fact that at the same time validates the method and gives confidence in the viability of the other, novel sites. Finally, for the last part of the thesis, I focussed on the sequence-structure relationship. In particular, I considered the problem of inferring structural properties of proteins from the analysis of large multiple sequence alignments of homologous sequences. For this purpose, I recast the strategies developed for the dynamical feature extraction in order to identify compact groups of coevolving residues, based only on the knowledge of amino acid variability in aligned primary sequences. Throughout the thesis, many methodological techniques have been taken into consideration, mainly based on concepts from graph theory and statistical data analysis (clustering). All these topics are explained in the methodological sections of each chapter.
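The idea of a distance fluctuation matrix can be made concrete with a toy sketch. It computes, for each residue pair, the standard deviation of their distance across conformers, then groups residues by thresholded connected components. That grouping step is a deliberately simple stand-in and not the dimensionality-reduction method of the thesis; the function names, the threshold, and the four-point "protein" are all hypothetical.

```python
import math
from itertools import combinations

def distance_fluctuations(conformers):
    """Std. dev. of every inter-residue distance across the given conformers."""
    n = len(conformers[0])
    fluct = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = [math.dist(conf[i], conf[j]) for conf in conformers]
        mean = sum(d) / len(d)
        sd = math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))
        fluct[i][j] = fluct[j][i] = sd
    return fluct

def quasi_rigid_groups(fluct, threshold):
    """Connected components of the graph linking pairs with fluctuation < threshold."""
    n = len(fluct)
    label = [None] * n
    groups = []
    for start in range(n):
        if label[start] is not None:
            continue
        label[start] = len(groups)
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if label[j] is None and fluct[i][j] < threshold:
                    label[j] = len(groups)
                    stack.append(j)
        groups.append(sorted(members))
    return groups

# Toy conformers: residues 0 and 1 stay fixed, residues 2 and 3 shift together.
conformers = [
    [(0, 0, 0), (1, 0, 0), (5, 0, 0), (6, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (5, 3, 0), (6, 3, 0)],
    [(0, 0, 0), (1, 0, 0), (5, -3, 0), (6, -3, 0)],
]
groups = quasi_rigid_groups(distance_fluctuations(conformers), threshold=0.1)
print(groups)  # [[0, 1], [2, 3]]
```

Residues whose mutual distances barely change end up in the same group, which is the intuition behind calling such a group quasi-rigid; a real decomposition would have to cope with noisy fluctuations that a hard threshold handles poorly.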