19 research outputs found

    A reference map of potential determinants for the human serum metabolome

    The serum metabolome contains a plethora of biomarkers and causative agents of various diseases, some of which are endogenously produced and some of which have been taken up from the environment(1). The origins of specific compounds are known, including metabolites that are highly heritable(2,3), or those that are influenced by the gut microbiome(4), by lifestyle choices such as smoking(5), or by diet(6). However, the key determinants of most metabolites are still poorly understood. Here we measured the levels of 1,251 metabolites in serum samples from a unique and deeply phenotyped healthy human cohort of 491 individuals. We applied machine-learning algorithms to predict metabolite levels in held-out individuals on the basis of host genetics, gut microbiome, clinical parameters, diet, lifestyle and anthropometric measurements, and obtained statistically significant predictions for more than 76% of the profiled metabolites. Diet and microbiome had the strongest predictive power, and each explained hundreds of metabolites, in some cases accounting for more than 50% of the observed variance. We further validated microbiome-related predictions by showing a high replication rate in two geographically independent cohorts(7,8) that were not available to us when we trained the algorithms. We used feature attribution analysis(9) to reveal specific dietary and bacterial interactions. We further demonstrate that some of these interactions might be causal, as some metabolites that we predicted to be positively associated with bread were found to increase after a randomized clinical trial of bread intervention. Overall, our results reveal potential determinants of more than 800 metabolites, paving the way towards a mechanistic understanding of alterations in metabolites under different conditions and to designing interventions for manipulating the levels of circulating metabolites. The levels of 1,251 metabolites are measured in 475 phenotyped individuals, and machine-learning algorithms reveal that diet and the microbiome are the determinants with the strongest predictive power for the levels of these metabolites.

    Large-scale mapping of gene regulatory logic reveals context-dependent repression by transcriptional activators

    Transcription factors (TFs) are key mediators that propagate extracellular and intracellular signals through to changes in gene expression profiles. However, the rules by which promoters decode the amount of active TF into target gene expression are not well understood. To determine the mapping between promoter DNA sequence, TF concentration, and gene expression output, we have conducted in budding yeast a large-scale measurement of the activity of thousands of designed promoters at six different levels of TF. We observe that maximum promoter activity is determined by TF concentration and not by the number of binding sites. Surprisingly, the addition of an activator site often reduces expression. A thermodynamic model that incorporates competition between neighboring binding sites for a local pool of TF molecules explains this behavior and accurately predicts both absolute expression and the amount by which addition of a site increases or reduces expression. Taken together, our findings support a model in which neighboring binding sites interact competitively when TF is limiting but otherwise act additively. This work was supported by the Spanish Ministerio de Economía y Competitividad and FEDER through project BFU2015-68351-P to L.B.C. and by grant 2014SGR0974 from the Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) to L.B.C. This work was supported by grants from the European Research Council (ERC) and the US National Institutes of Health (NIH) to E.S. D.vD. was supported by Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO) Rubicon fellowship 825.14.016.

    Systematic Dissection of the Sequence Determinants of Gene 3’ End Mediated Expression Control

    The 3’ end genomic region encodes a wide range of regulatory processes including mRNA stability, 3’ end processing and translation. Here, we systematically investigate the sequence determinants of 3’ end mediated expression control by measuring the effect of 13,000 designed 3’ end sequence variants on constitutive expression levels in yeast. By including a high resolution scanning mutagenesis of more than 200 native 3’ end sequences in this designed set, we found that most mutations had only a mild effect on expression, and that the vast majority (~90%) of strongly affecting mutations localized to a single positive TA-rich element, similar to a previously described 3’ end processing efficiency element, and resulted in an up to ten-fold decrease in expression. Measurements of 3’ UTR lengths revealed that these mutations result in mRNAs with aberrantly long 3’ UTRs, confirming the role for this element in 3’ end processing. Interestingly, we found that other sequence elements that were previously described in the literature to be part of the polyadenylation signal had a minor effect on expression. We further characterize the sequence specificities of the TA-rich element using additional synthetic 3’ end sequences and show that its activity is sensitive to single base pair mutations and strongly depends on the A/T content of the surrounding sequences. Finally, using a computational model, we show that the strength of this element in native 3’ end sequences can explain some of their measured expression variability (R = 0.41). Together, our results emphasize the importance of efficient 3’ end processing for endogenous protein levels and contribute to an improved understanding of the sequence elements involved in this process.

    Sequence determinants of 3’ end functional elements.

    (A) Heat map showing the mean effect of a mutation as a function of location in the 3’ end sequence. Each row represents one sequence and the color represents the mean expression fold change across two replicates between the mutated and wild type sequences. Rows are sorted by the location of the mutation with the maximal effect. (B) Heat map of predicted logistic values on a held-out test set (see main text and methods). Locations of subsequences correspond to those in Fig 3A. (C) Frequency of the AT dinucleotide, the highest weighted feature in the inferred model, in sliding windows of 20 bp. Locations of subsequences correspond to those in Fig 3A. (D) Table of the features that contribute most to the classification. Color represents the mean coefficient across the 10 cross-validation partitions. For each possible mono/di-nucleotide three types of features were considered: ‘[0|1]’ – a binary feature that is one if the specified mono/di-nucleotide occurs at least once in the sequence and zero otherwise; ‘#’ – a count of the number of times the specified mono/di-nucleotide occurs in the sequence; ‘%’ – the percent of nucleotides of the sequence that are part of an occurrence of the specified mono/di-nucleotide. (E) DNA sequence motif found to be enriched in the positive subsequence instances. (F) Distribution of distances between the location (center) of the mutation that resulted in the maximal reduction in expression and the location of the main polyadenylation site for the wild type sequence. (G) Results of YFP specific 3’ RACE, where each lane represents 4 expression bins. Lowest lane displays long aberrant 3’ UTRs not apparent in the higher expression bins.
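The three feature types described in panel (D) can be illustrated with a minimal Python sketch. The function name `kmer_features` is hypothetical, and counting overlapping occurrences is an assumption, since the caption does not say how overlaps were handled.

```python
from itertools import product


def kmer_features(seq, k):
    """Return {kmer: (present, count, percent)} for every k-mer over ACGT.

    present: 1 if the k-mer occurs at least once, else 0  ('[0|1]')
    count:   number of occurrences, overlaps included      ('#', assumption)
    percent: percent of positions covered by an occurrence ('%')
    """
    seq = seq.upper()
    n = len(seq)
    features = {}
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        # Start positions of every (possibly overlapping) occurrence.
        starts = [i for i in range(n - k + 1) if seq[i:i + k] == kmer]
        # Positions of the sequence that belong to at least one occurrence.
        covered = set()
        for i in starts:
            covered.update(range(i, i + k))
        count = len(starts)
        present = 1 if count > 0 else 0
        percent = 100.0 * len(covered) / n if n else 0.0
        features[kmer] = (present, count, percent)
    return features


if __name__ == "__main__":
    dinucleotides = kmer_features("TATATA", 2)
    print(dinucleotides["TA"])
    print(dinucleotides["AT"])
```

Calling the function with `k=1` gives the mono-nucleotide features and `k=2` the di-nucleotide features; the resulting values could then be fed to any classifier.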

    Structural variation in the gut microbiome associates with host health

    Differences in the presence of even a few genes between otherwise identical bacterial strains may result in critical phenotypic differences. Here we systematically identify microbial genomic structural variants (SVs) and find them to be prevalent in the human gut microbiome across phyla and to replicate in different cohorts. SVs are enriched for CRISPR-associated and antibiotic-producing functions and depleted from housekeeping genes, suggesting that they have a role in microbial adaptation. We find multiple associations between SVs and host disease risk factors, many of which replicate in an independent cohort. Exploring genes that are clustered in the same SV, we uncover several possible mechanistic links between the microbiome and its host, including a region in Anaerostipes hadrus that encodes a composite inositol catabolism-butyrate biosynthesis pathway, the presence of which is associated with lower host metabolic disease risk. Overall, our results uncover a nascent layer of variability in the microbiome that is associated with microbial adaptation and host health.

    Compensation for differences in gene copy number among yeast ribosomal proteins is encoded within their promoters

    Coordinate regulation of ribosomal protein (RP) genes is key for controlling cell growth. In yeast, it is unclear how this regulation achieves the required equimolar amounts of the different RP components, given that some RP genes exist in duplicate copies, while others have only one copy. Here, we tested whether the solution to this challenge is partly encoded within the DNA sequence of the RP promoters, by fusing 110 different RP promoters to a fluorescent gene reporter, allowing us to robustly detect differences in their promoter activities that are as small as ∼10%. We found that single-copy RP promoters have significantly higher activities, suggesting that proper RP stoichiometry is indeed partly encoded within the RP promoters. Notably, we also partially uncovered how this regulation is encoded by finding that RP promoters with higher activity have more nucleosome-disfavoring sequences and characteristic spatial organizations of these sequences and of binding sites for key RP regulators. Mutations in these elements result in a significant decrease of RP promoter activity. Thus, our results suggest that intrinsic (DNA-dependent) nucleosome organization may be a key mechanism by which genomes encode biologically meaningful promoter activities. Our approach can readily be applied to uncover how transcriptional programs of other promoters are encoded.

    Prediction of polyadenylation signals in native sequences.

    (A) Native sequences are aligned by the main polyadenylation site and ordered by the expression values (right panel). The color indicates the predicted logistic values using the classifier learned on the scanning mutagenesis set. The lower panel shows the mean predicted logistic in a 20 bp sliding window (centered) relative to the polyadenylation site. (B) Mean predicted logistic in a 20 bp window, centered around the peak from Fig 4A, on the y-axis versus expression levels on the x-axis. The red line shows a smoothing line with a 50-instance window.
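A centered sliding-window mean of the kind mentioned in this caption can be sketched as follows. This is only an illustration: the function name `centered_sliding_mean` is hypothetical, and truncating the window at the sequence ends is an assumption, since the caption does not describe edge handling.

```python
def centered_sliding_mean(values, window=20):
    """Mean of `values` in a window centered on each position.

    The window for position i covers indices [i - window//2, i + window//2).
    Near the ends the window is truncated to the available positions
    (an assumption; padding would be another reasonable choice).
    """
    half = window // 2
    means = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half)
        chunk = values[lo:hi]
        means.append(sum(chunk) / len(chunk))
    return means


if __name__ == "__main__":
    # Toy per-position scores standing in for predicted logistic values.
    scores = [0.0, 0.0, 1.0, 1.0]
    print(centered_sliding_mean(scores, window=2))
```

With per-position classifier scores as input, the position of the maximum of the returned list gives a peak location comparable to the one referred to in panel (B).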

    Systematic mutagenesis of a designed synthetic terminator.

    (A) Illustration of the construct design: a minimal terminator sequence was embedded within a mutated non-terminating 3’ end sequence from the CYC1-512 3’ end region. (B) All possible single bp mutations in the three elements (EE, PE and cleavage) in the left, middle and right panels, respectively. Boxes on the left of each panel show the mutated sequences, with a highlighted white letter representing the location and exact mutation relative to the wild type sequence shown at the top. Bars show the expression value of each sequence. (C) Expression as a function of context A/T content. Each point represents a mutated sequence with A/T content of the relevant sequence region on the x-axis and expression on the y-axis. Black points show the expression of the non-mutated sequence with different barcodes. Mutated regions are: (1) upstream of EE (2) between EE and PE (3) between PE and cleavage and (4) downstream of cleavage, corresponding to the panels from left to right.

    Illustration of our method and overall expression distribution.

    (A) 13,000 designed synthetic sequences were ligated into a low copy plasmid (top part). The plasmid pool was then transformed into yeast to create a heterogeneous pool of yeast cells each expressing YFP to a different level corresponding to one of the unique 13,000 cloned 3’ end sequences. The cells were then sorted using fluorescence-activated cell sorting (FACS) into 16 expression bins by the YFP/mCherry ratio (middle). Next, the reporter 3’ end sequences of cells in each bin were amplified, using barcoded primers for each bin, and sequence barcodes were recovered using next-generation sequencing (NGS). Finally, each sequencing read was mapped to a specific 3’ end sequence and a specific bin (bottom) to obtain the distribution of cells with each synthetic 3’ end sequence across the expression bins. The distribution of each construct was fit to a gamma distribution and the mean expression value was inferred based on this fit. (B) The distribution of library expression values in induced and un-induced promoter states. The induced state displays a tri-modal distribution with 3 peaks corresponding to (1) non-induced promoter state (2) induced promoter state and low expressing 3’ end sequences and (3) induced promoter state with a wide range of 3’ end mediated expression.
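The last step of panel (A), fitting a gamma distribution to the per-bin read counts of one construct, might look roughly like the sketch below. It uses simple moment matching, which is a swapped-in technique: the caption does not state the fitting procedure the authors used. The function name `gamma_moments_fit` and the use of one representative expression value per bin are likewise assumptions.

```python
def gamma_moments_fit(bin_values, read_counts):
    """Moment-matched gamma fit to binned read counts for one construct.

    bin_values:  one representative expression value per bin (assumed known).
    read_counts: number of reads of this construct mapped to each bin.
    Returns (shape, scale, mean), where mean = shape * scale.
    """
    total = sum(read_counts)
    if total == 0:
        raise ValueError("construct has no reads")
    # Count-weighted mean and variance of the binned expression values.
    mean = sum(v * c for v, c in zip(bin_values, read_counts)) / total
    var = sum(c * (v - mean) ** 2 for v, c in zip(bin_values, read_counts)) / total
    if mean <= 0 or var == 0:
        raise ValueError("gamma fit needs a positive mean and nonzero variance")
    shape = mean ** 2 / var  # gamma shape parameter k
    scale = var / mean       # gamma scale parameter theta
    return shape, scale, shape * scale


if __name__ == "__main__":
    # Toy example with three bins instead of sixteen.
    print(gamma_moments_fit([1.0, 2.0, 3.0], [1, 2, 1]))
```

Under moment matching the fitted mean equals the count-weighted mean of the bin values, so the shape and scale mainly summarize how widely a construct spreads across bins; a likelihood-based fit over bin boundaries would be a natural alternative.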