47 research outputs found

    The Coevolution of Phycobilisomes: Molecular Structure Adapting to Functional Evolution

    The phycobilisome (PBS) is the major light-harvesting complex in cyanobacteria and red algae. It consists of phycobiliproteins (PBPs) and their associated linker peptides, which play key roles in the absorption and unidirectional transfer of light energy and in the stability of the whole complex, respectively. Previous research on the evolution of PBPs and linker peptides has focused mainly on phylogenetic analysis and selective evolution. Coevolution is the process in which a mutation disrupts the conformation of one residue and a compensatory change is selected for in its interacting partner. Here, coevolutionary analysis of allophycocyanin, phycocyanin, and phycoerythrin and covariation analysis of linker peptides were performed. The coevolution analyses reveal that the coevolving sites are significantly correlated, providing strong evidence of the functional and structural importance of interactions among these residues. According to the interprotein coevolution analysis, less interaction was found between PBPs and linker peptides. Our results also revealed that coevolution and adaptive selection in PBS were not directly related, but were probably connected through sites coupled by physical-chemical interactions.

    A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

    Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding

    Seeing the forest for the trees: retrieving plant secondary biochemical pathways from metabolome networks

    Over the last decade, a giant leap forward has been made in resolving the main bottleneck in metabolomics, i.e., the structural characterization of the many unknowns. This has led to the next challenge in this research field: retrieving biochemical pathway information from the various types of networks that can be constructed from metabolome data. Searching putative biochemical pathways, referred to as biotransformation paths, is complicated because several flaws occur during the construction of metabolome networks. Multiple network analysis tools have been developed to deal with these flaws, while in silico retrosynthesis is appearing as an alternative approach. In this review, the different types of metabolome networks, their flaws, and the various tools to trace these biotransformation paths are discussed

    Cancer proteogenomics: connecting genotype to molecular phenotype

    The central dogma of molecular biology describes the one-way road from DNA to RNA and finally to protein. Yet how this flow of information, encoded in DNA as genes (genotype), is regulated to produce the observable traits of an individual (phenotype) remains unanswered. Recent advances in high-throughput data, i.e., ‘omics’, have allowed the quantification of DNA, RNA and protein levels, leading to integrative analyses that essentially probe the central dogma along all of its constituent molecules. Evidence from these analyses suggests that mRNA abundances are at best a moderate proxy for proteins, which are the main functional units of cells and thus closer to the phenotype. Cancer proteogenomic studies consider the ensemble of proteins, the so-called proteome, as the readout of the functional molecular phenotype and investigate how it is influenced by upstream events, for example DNA copy number alterations. In typical proteogenomic studies, however, the identified proteome is a simplification of its actual composition, because these studies methodologically disregard events such as splicing, proteolytic cleavage and post-translational modifications that generate unique protein species, known as proteoforms. The scope of this thesis is to study proteome diversity in terms of: a) the complex genetic background of three tumor types, i.e., breast cancer, childhood acute lymphoblastic leukemia and lung cancer, and b) the proteoform composition, describing a computational method for detecting protein species based on their distinct quantitative profiles. In Paper I, we present a proteogenomic landscape of 45 breast cancer samples representative of the five PAM50 intrinsic subtypes. We studied the effect of copy number alterations (CNA) on mRNA and protein levels, overlaying a public dataset of drug-perturbed protein degradation.
    In Paper II, we describe a proteogenomic analysis of 27 B-cell precursor acute lymphoblastic leukemia clinical samples that compares high hyperdiploid versus ETV6/RUNX1-positive cases. We examined the impact of the amplified chromosomes on mRNA and protein abundance, specifically the linear trend between the amplification level and the dosage effect. Moreover, we investigated mRNA-protein quantitative discrepancies with regard to post-transcriptional and post-translational effects such as mRNA/protein stability and miRNA targeting. In Paper III, we describe a proteogenomic cohort of 141 non-small cell lung cancer clinical samples. We used clustering methods to identify six distinct proteome-based subtypes. We integrated the protein abundances in pathways using protein-protein correlation networks, bioinformatically deconvoluted the immune composition and characterized the neoantigen burden. In Paper IV, we developed a pipeline for proteoform detection from bottom-up mass-spectrometry-based proteomics. Using an in-depth proteomics dataset of 18 cancer cell lines, we identified proteoforms related to splice variant peptides supported by RNA-seq data. This thesis adds to the previous literature of proteogenomic studies by analyzing the tumor proteome and its regulation along the flow of the central dogma of molecular biology. It is anticipated that some of these findings will lead to novel insights into tumor biology and set the stage for clinical applications that improve current cancer patient care.

    The European Lake Microbiome: A Study in Complexity

    While it is known that microbes play many indispensable roles in ecosystems, the relationship between microbiomes and their environment is far from well understood. In part, this is because the methods necessary for studying environmental microbiomes, such as next-generation sequencing and high-dimensional machine learning, have been developed relatively recently. However, the complex nature of ecosystems and environmental microbiomes acts as a further barrier to progress in this field of research. This thesis develops methods and concepts used to gain insight into the ecology of microbiomes in lakes. It is based around two metabarcoding datasets sampled from lakes in Austria and the whole of Europe, respectively, and attempts to elucidate the microbiome’s relationship to environmental parameters. To this end, a tool for GPS-based dataset enhancement and a machine learning framework for measuring microbiome covariation are developed. Building on this, the latent structure of the microbiome is estimated. In the discussion, a novel theory of information transmission in complex environments is described. Taken together, the work included herein presents a thorough analysis of the European lake microbiome that takes the complexity of the study object into account. The results point towards parameters that act as drivers of lake microbiome structure as well as microorganisms that might act as keystone species for ecosystem functioning. Furthermore, this work might provide the basis for considerable future progress in the study of environmental microbiomes.

    Tracing Evolution of Gene Transfer Agents Using Comparative Genomics

    Accumulating evidence suggests that viruses and their components can be domesticated by their hosts, equipping them with convenient molecular toolkits for various functions. One such domesticated system is the Gene Transfer Agent (GTA), produced by some bacteria and archaea. GTAs morphologically resemble small phage-like particles and contain random fragments of their host genome. They are produced by only a small fraction of the microbial population and are released through lysis of the host cell. Bioinformatic analyses suggest that GTAs are especially abundant in the taxonomic class Alphaproteobacteria, where they are vertically inherited and evolve as part of their host genomes. In this work, we extensively analyze evolutionary patterns of alphaproteobacterial GTAs using comparative genomics, phylogenomics and machine learning methods. We first develop an algorithm that validates the wide presence of GTA elements in alphaproteobacterial genomes, where they are generally mistaken for prophages due to their homology. Furthermore, we demonstrate that GTAs evolve under selection that reduces the energetic cost of their production, indicating their importance under conditions of nutrient depletion. Genome-wide screens for translational selection and coevolution signatures highlight the significance of GTAs as a stress-response adaptation for horizontal gene transfer, revealing a set of previously unknown genes that could play a role in the GTA cycle. As production of GTAs leads to host death, their maintenance is likely to be under kin- or group-level selection. By combining our findings with the accumulated body of knowledge, this work proposes a conceptual model illustrating the role of GTAs in bacterial populations and their persistence over hundreds of millions of years of evolution.

    Surface Water Photochemistry


    Genomic Studies of Gene Expression Errors and Their Evolutionary Ramifications

    Gene expression produces biologically functional RNAs and proteins and is essential for life. Nevertheless, gene expression is subject to several types of errors that are generally harmful. Despite the prevalence and significant consequences of expression errors, their genome-wide patterns are not well characterized. Furthermore, the evolutionary ramifications of such errors are poorly understood. In my dissertation, I address these questions using novel computational approaches. I focus on two types of gene expression errors: (i) stochastic gene expression, which leads to variation in expression level among isogenic cells in the same environment (gene expression noise), and (ii) mistranslation, which induces protein misfolding and can be toxic to cells. My thesis has three main chapters in addition to the introduction and conclusion chapters. First, in Chapter 2, I study the gene expression noise of individual genes. I decompose the noise of 3975 mouse genes into intrinsic and extrinsic components and study their biological mechanisms and evolutionary consequences. Next, in Chapter 3, I consider the gene expression noise of pairs of genes simultaneously. I discover chromosome-wide co-fluctuation in expression for linked genes, which is partly due to chromatin co-accessibility of linked loci attributable to three-dimensional proximity. I further find that genes encoding components of the same protein complex are more likely to become linked during evolution due to natural selection for intracellular among-component dosage balance. Thus, selection for mitigating the harm of expression noise drives the nonrandom genomic distribution of genes. Finally, in Chapter 4, I study yet another kind of expression error: mistranslation. I focus on the relationship between mistranslation and codon usage.
    Specifically, I provide the first direct and global evidence for a prominent but unresolved hypothesis: preferred codons are translated more accurately. Furthermore, I show that this proposition is generally true across the three domains of life. Interestingly, the relative translational accuracies of synonymous codons vary drastically among species, which is mainly explained by variation in tRNA composition. Together with other information, these findings suggest that codon usage coevolves with the cellular tRNA pool to maximize translational accuracy and efficiency. In conclusion, my dissertation documents the genome-wide patterns of gene expression errors and demonstrates their profound impacts on both molecular and phenotypic evolution. The knowledge gained has implications beyond expression errors because of the universality of molecular errors in cellular life.
    PhD, Ecology and Evolutionary Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169993/1/mengysun_1.pd

    Psychophysicality: rethinking the physicalist foundations of the mind/body problem

    In this thesis, I shall examine the question of physicalism through two papers criticising the formulation of the doctrine. In the first chapter, I discuss Tim Crane's and D.H. Mellor's influential (1990) There Is No Question of Physicalism, in which they argue that there are no real criteria by which the science of psychology can be separated from the paradigmatically physical sciences, and so no principled reason to suppose that the predicates of psychology do not describe real elements of the world's ontology whereas those of physics do. I shall explain why I find their arguments unconvincing, and show how some of the reasons they consider not to support the noncontinuity of psychology with physics actually can support the distinction. Crane and Mellor take physicalism to be an epistemological doctrine, according to which the empirical world "contains just what a true and complete physical science would say it contains". Physicalism can, however, be taken as a metaphysical doctrine, and indeed I think that many modern physicalists do take it this way. In his (1998) What Are Physical Properties?, Chris Daly argues that no principled distinction can be drawn between physical and nonphysical properties, and that therefore any metaphysical programme which assumes such a distinction is misguided. I shall agree with much of his reasoning, but not with his 'downbeat' conclusion: while I agree that there are serious difficulties involved in setting constraints on the bounds of the physical, I think that enough can positively be said to make physicalism a meaningful position. Between the two papers, a fairly broad survey of some recent accounts of physicalism is made and two distinct avenues are explored: physicalism construed as a doctrine about science, and physicalism as a doctrine attempting to limit the contents of the world a priori through a definition of what it is to be a physical property.
    All in all, I think that there is much to learn from these two papers, but not all of it is as negative, conclusive, or 'downbeat' as their authors might have intended. Rather, I think that some new directions are indicated by the failure of some of the avenues they explore.