334 research outputs found

    Community-wide analysis of microbial genome sequence signatures

    Genome signatures are used to identify and cluster sequences de novo from an acid biofilm microbial community metagenomic dataset, revealing information about the low-abundance community members.

    Adapting the EMPIRIC Approach to Investigate Evolutionary Constraints in Influenza A Virus Surface Proteins

    Controlling influenza A virus (IAV) infections remains a challenge largely due to the high replication and mutation rates of the virus. IAV is a negative-sense RNA virus with two main surface proteins: hemagglutinin (HA) and neuraminidase (NA). HA recognizes and binds sialic acid on host cell receptors to initiate virus entry. NA also recognizes sialic acid on host cell receptors but functions by cleaving sialic acid interactions to release progeny virus. Because both HA and NA interact with sialic acid on the host cell surface with opposing effects, their balance is essential for optimal viral infectivity. However, the evolutionary constraints that maintain HA and NA function, while conserving a functional balance, are not fully understood. I adapted the comprehensive and systematic mutational scanning technology, termed EMPIRIC (Exceedingly Meticulous and Parallel Investigation of Randomized Individual Codons), to investigate the local fitness landscape of regions of HA under standard conditions and under drug pressure. We observed that synonymous substitutions had a higher mean absolute fitness effect in the signal sequence than in a neighboring HA region used as a control. Folding ∆G calculations revealed a hairpin loop that appeared to be differentially enriched between human and swine IAV variants in sequences of circulating strains. However, the molecular mechanism resulting in the observed host species-specific constraints remains undefined. Studying the fitness landscape of the receptor binding site of HA revealed the high sensitivity of this region to mutation. However, modulating the levels of NA activity by mutation and by using the NA inhibitor oseltamivir enabled the identification of HA mutations with adaptive potential under selection pressure by oseltamivir. These results highlight the importance of the HA-NA functional balance in virus replication and in the development of resistance to oseltamivir.
These studies provide improved understanding of IAV biology, and can inform the development of improved antiviral agents with reduced likelihood for resistance.

    Discovering discriminative and class-specific sequence and structural motifs in proteins

    Finding recurring motifs is an important problem in bioinformatics. Such motifs can be used for any number of problems including sequence classification, label prediction, knowledge discovery and biological engineering of proteins fit for a specific purpose. Our motivation is to create a better foundation for the research and development of novel motif mining and machine learning methods that can extract class-specific and discriminative motifs using both sequence and structural features. We propose the building blocks of a general machine learning framework to act on a biological input. This thesis presents a combination of elements that are aimed to be applicable to a variety of biological problems. Ideally, the learner should only require a number of biological data instances as input that are classified into a number of different classes as defined by the researchers. The output should be the factors and motifs that discriminate between those classes (for reasonable, non-random class definitions). This ideal workflow requires two main steps. The first step is the representation of the biological input with features that contain the significant information the researcher is looking for. Due to the complexity of the macromolecules, abstract representations are required to convert the real world representation into quantifiable descriptors that are suitable for motif mining and machine learning. The second step of the proposed workflow is the motif mining and knowledge discovery step. Using these informative representations, an algorithm should be able to find discriminative, class-specific motifs that are over-represented in one class and under-represented in the other. This thesis presents novel procedures for representation of the proteins to be used in a variety of machine learning algorithms, and two separate motif mining algorithms, one based on temporal motif mining, and the other on deep learning, that can work with the given biological data.
The descriptors and the learners are applied to a wide range of computational problems encountered in life sciences.

    P. patens genomic and transcriptomic analyses

    The model organism Physcomitrium patens, formerly Physcomitrella patens, is a moss in the Funariaceae family. Because transgenic P. patens plants can be generated easily via homologous recombination, the species has attracted the interest of scientists worldwide. P. patens was the first non-seed plant to have its genome completely sequenced (V1). Constant improvements of the genome assembly and the associated gene annotations resulted in the current P. patens pseudo-chromosomal genome version (V3). This genome version is the basis of all analyses performed in this thesis. Since P. patens became a U.S. Department of Energy Joint Genome Institute (DOE JGI) plant flagship genome [1] and a member of the JGI Gene Atlas project [2], hundreds of P. patens RNA-seq samples have been generated. During my time as a PhD student, I analysed the JGI Gene Atlas RNA-seq samples and several dozen other RNA-seq samples from different projects. These RNA-seq samples contained data from five different P. patens ecotypes/accessions (Gransden, Kaskaskia, Reute, Villersexel, and Wisconsin). To efficiently analyse this data, I developed a powerful RNA-seq pipeline to perform differentially expressed gene (DEG) calling. The performance of the RNA-seq pipeline was tested by comparing its results to commercial software solutions and multiple RNA-seq samples from different species. My newly generated gene expression results, together with previously published expression data from a variety of other projects, were stored in our novel online tool PEATmoss. Furthermore, my gene version lookup tables were implemented in a database structure. This allows PEATmoss users to find gene models of different gene annotation versions and to use them in PEATmoss. With an updated version of my RNA-seq pipeline, I identified and analysed sequence variations in P. patens accessions. A clear clustering by individual accessions could be shown.
I could demonstrate that, due to decades of vegetative propagation in laboratories, somatic mutations have accumulated in Gransden laboratory plants. In addition, we used restriction fragment length polymorphism (RFLP) to offer a simple method for quick identification of unknown P. patens plants. [1] https://jgi.doe.gov/our-science/science-programs/plant-genomics/plant-flagship-genomes/ [2] https://jgi.doe.gov/doe-jgi-plant-flagship-gene-atlas

    Computational approaches for analysing and engineering micropollutant degradation in microbial communities

    PhD Thesis. The presence of micropollutants in wastewater is problematic, as many micropollutants exert negative ecological and toxicological effects in their environment. A well-known effect of micropollutants is the feminisation of aquatic wildlife by environmental estrogens, a proportion of which enter water courses from municipal sources via wastewater treatment plants (WWTPs). While WWTPs remove some micropollutants, they are not designed to do so. Given that WWTPs already have high operating costs (both financially and energetically), there is a need for novel approaches to micropollutant removal that are both cost-effective and environmentally sustainable. One proposed approach is to use enzymes to degrade micropollutants, which requires an understanding of metabolic pathways for the desired micropollutant, and a strategy for deploying the enzymes in the environment. Although tools exist to assist with metabolic pathway prediction and enzyme discovery, there are currently no computational approaches that are able to identify enzymes from a user’s collection of proteins (given a query compound and expected change to that query compound). To address this research gap, we developed EnSeP, a data-driven, transformation-specific approach to enzyme discovery. Using EnSeP, we then identified candidate enzymes involved in estradiol degradation. Recent advances in synthetic biology mean that deploying a single synthetic construct in multiple microorganisms is feasible. In the context of micropollutant metabolism, this means that a biodegradative pathway could be introduced into multiple organisms in a community simultaneously, providing more opportunities for the construct (and its functionality) to persist in the population long-term. However, current design tools have not yet been adapted for multiple organism applications. To address this research gap, we developed an evolutionary algorithm (EA) that optimises a single coding sequence (CDS) for multiple hosts.
Finally, based on insights from developing the EA, we developed an improved version of the single-organism CDS optimisation algorithm that the EA is based on.

    Evolution of the insecticide target Rdl in African Anopheles is driven by interspecific and interkaryotypic introgression.

    The evolution of insecticide resistance mechanisms in natural populations of Anopheles malaria vectors is a major public health concern across Africa. Using genome sequence data, we study the evolution of resistance mutations in the resistance to dieldrin locus (Rdl), a GABA receptor targeted by several insecticides, but most notably by the long-discontinued cyclodiene, dieldrin. The two Rdl resistance mutations (296G and 296S) spread across West and Central African Anopheles via two independent hard selective sweeps that included likely compensatory nearby mutations, and were followed by a rare combination of introgression across species (from A. gambiae and A. arabiensis to A. coluzzii) and across non-concordant karyotypes of the 2La chromosomal inversion. Rdl resistance evolved in the 1950s as the first known adaptation to a large-scale insecticide-based intervention, but the evolutionary lessons from this system highlight contemporary and future dangers for management strategies designed to combat development of resistance in malaria vectors

    Genomic Studies of Gene Expression Errors and Their Evolutionary Ramifications

    Gene expression produces biologically functional RNAs and proteins and is essential for life. Nevertheless, gene expression is subject to several types of errors that are generally harmful. Despite the prevalence and significant consequences of expression errors, their genome-wide patterns are not well characterized. Furthermore, the evolutionary ramifications of such errors are poorly understood. In my dissertation, I address the above questions using novel computational approaches. I focus on two types of gene expression errors: (i) stochastic gene expression, which leads to variation in expression level among isogenic cells in the same environment (gene expression noise), and (ii) mistranslation, which induces protein misfolding and can be toxic to the cells. My thesis has three main chapters in addition to the introduction and conclusion chapters. First, in Chapter 2, I studied the gene expression noise of individual genes. I decomposed the noise of 3975 mouse genes into intrinsic and extrinsic components and studied their biological mechanisms and evolutionary consequences. Next, in Chapter 3, I move on to consider the expression noise of pairs of genes simultaneously. I discovered chromosome-wide co-fluctuation in expression for linked genes, which is partly due to chromatin co-accessibility of linked loci attributable to three-dimensional proximity. I further found that genes encoding components of the same protein complex are more likely to become linked during evolution due to natural selection for intracellular among-component dosage balance. Thus, selection for mitigating the harm of expression noise drives the nonrandom genomic distributions of genes. Finally, in Chapter 4, I studied yet another kind of expression error: mistranslation. I focused on the relationship between mistranslation and codon usage.
Specifically, I provide the first direct and global evidence for a prominent but unresolved hypothesis: preferred codons are translated more accurately. Furthermore, I showed that this proposition is generally true across the three domains of life. Interestingly, the relative translational accuracies of synonymous codons vary drastically among species, which is mainly explained by the variation of tRNA compositions. Together with other information, these findings suggest that codon usage coevolves with the cellular tRNA pool to maximize translational accuracy and efficiency. In conclusion, my dissertation documents the genome-wide patterns of gene expression errors and demonstrates their profound impacts on both molecular and phenotypic evolution. The knowledge gained has implications beyond expression errors because of the universality of molecular errors in cellular life. PhD, Ecology and Evolutionary Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169993/1/mengysun_1.pd

    Experimental Illumination of Comprehensive Fitness Landscapes: A Dissertation

    Evolution is the single cohesive logical framework in which all biological processes may exist simultaneously. Incremental changes in phenotype over imperceptibly large timescales have given rise to the enormous diversity of life we witness on earth both presently and through the natural record. The basic unit of evolution is mutation, and by perturbing biological processes, mutations may alter the fitness of an individual. However, the fitness effect of a mutation is difficult to infer from the historical record, and complex to obtain experimentally in an efficient and accurate manner. We have recently developed a high-throughput method, termed Exceedingly Methodical and Parallel Investigation of Randomized Individual Codons (EMPIRIC), to iteratively mutagenize regions of essential genes in yeast and subsequently analyze individual mutant fitness. Utilizing this technique as exemplified in Chapters II and III, it is possible to determine the fitness effects of all possible point mutations in parallel through growth competition followed by a high-throughput sequencing readout. We have employed this technique to determine the distribution of fitness effects in a nine amino acid region of the Hsp90 gene of S. cerevisiae under elevated temperature, and found the bimodal distribution of fitness effects to be remarkably consistent with near-neutral theory. Comparing the measured fitness effects of mutants to the natural record, phylogenetic alignments appear to be a poor predictor of experimental fitness. In Chapter IV, to further interrogate the properties of this region, library competition under conditions of elevated temperature and salinity was performed to study the potential of protein adaptation. Strikingly, whereas both optimal and elevated temperatures produced no statistically significant beneficial mutations, under conditions of elevated salinity, adaptive mutations appear with fitness advantages up to 8% greater than wild type.
Of particular interest, mutations conferring fitness benefits under conditions of elevated salinity almost always experience a fitness defect in other experimental conditions, indicating these mutations are environmentally specialized. Applying the experimental fitness measurements to long-standing theoretical predictions of adaptation, our results are remarkably consistent with Fisher’s Geometric Model of protein evolution. Epistasis between mutations can have profound effects on evolutionary trajectories. Although the importance of epistasis has been recognized since the early 1900s, the interdependence of mutations is difficult to study in vivo due to the stochastic and constant nature of background mutations. In Chapter V, utilizing the EMPIRIC methodology allows us to study the distribution of fitness effects in the context of mutant genetic backgrounds with minimal influence from unintended background mutations. By analyzing intragenic epistatic interactions, we uncovered a complex interplay between solvent-shielded structural residues and the solvent-exposed hydrophobic surface in the amino acid 582-590 region of Hsp90. Additionally, negative epistasis appears to be negatively correlated with mutational promiscuity while additive interactions are positively correlated, indicating potential avenues for proteins to navigate fitness ‘valleys’. In summary, the work presented in this dissertation is focused on applying experimental context to the theory-rich field of evolutionary biology. The development and implementation of a novel methodology for the rapid and accurate assessment of organismal fitness has allowed us to address some of the most basic processes of evolution including adaptation and protein expression level.
Through the work presented here and by investigators across the world, the application of experimental data to evolutionary theory has the potential to improve drug design and human health in general, as well as allow for predictive medicine in the coming era of personalized medicine.