38 research outputs found

    Systematic mutagenesis of a designed synthetic terminator.

    <p><b>(A)</b> Illustration of the construct design: a minimal terminator sequence was embedded within a mutated, non-terminating 3’ end sequence derived from the CYC1-512 3’ end region. <b>(B)</b> All possible single-bp mutations in the three elements (EE, PE and cleavage site), shown in the left, middle and right panels, respectively. Boxes on the left of each panel show the mutated sequences, with a highlighted white letter marking the location and identity of each mutation relative to the wild-type sequence shown at the top. Bars show the expression value of each sequence. <b>(C)</b> Expression as a function of context A/T content. Each point represents a mutated sequence, with the A/T content of the relevant sequence region on the x-axis and expression on the y-axis. Black points show the expression of the non-mutated sequence with different barcodes. The mutated regions are (1) upstream of EE, (2) between EE and PE, (3) between PE and the cleavage site and (4) downstream of the cleavage site, corresponding to the panels from left to right.</p>
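The single-bp mutation set in panel (B) and the A/T content used in panel (C) can be sketched in a few lines. This is only an illustration: the function names and the example sequence `TATATA` are hypothetical and are not taken from the study.

```python
from typing import Iterator, Tuple

BASES = "ACGT"


def single_bp_mutants(seq: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, new_base, mutated_sequence) for every single-base substitution."""
    seq = seq.upper()
    for pos, wild_type_base in enumerate(seq):
        for base in BASES:
            if base != wild_type_base:
                yield pos, base, seq[:pos] + base + seq[pos + 1:]


def at_content(seq: str) -> float:
    """Fraction of positions in the sequence that are A or T."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "AT" for base in seq) / len(seq)


# Hypothetical element sequence, used only for illustration.
element = "TATATA"
mutants = list(single_bp_mutants(element))
```

A sequence of length n over A/C/G/T yields 3n single-bp mutants, so the hypothetical six-base element above gives 18 variants.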

    Systematic Dissection of the Sequence Determinants of Gene 3’ End Mediated Expression Control

    <div><p>The 3’ end genomic region encodes a wide range of regulatory processes, including mRNA stability, 3’ end processing and translation. Here, we systematically investigate the sequence determinants of 3’ end mediated expression control by measuring the effect of 13,000 designed 3’ end sequence variants on constitutive expression levels in yeast. By including a high-resolution scanning mutagenesis of more than 200 native 3’ end sequences in this designed set, we found that most mutations had only a mild effect on expression, and that the vast majority (~90%) of strongly affecting mutations localized to a single positive TA-rich element, similar to a previously described 3’ end processing efficiency element, and resulted in up to a ten-fold decrease in expression. Measurements of 3’ UTR lengths revealed that these mutations result in mRNAs with aberrantly long 3’ UTRs, confirming the role of this element in 3’ end processing. Interestingly, we found that other sequence elements previously described in the literature as part of the polyadenylation signal had only a minor effect on expression. We further characterize the sequence specificities of the TA-rich element using additional synthetic 3’ end sequences and show that its activity is sensitive to single base pair mutations and strongly depends on the A/T content of the surrounding sequences. Finally, using a computational model, we show that the strength of this element in native 3’ end sequences can explain some of their measured expression variability (R = 0.41). Together, our results emphasize the importance of efficient 3’ end processing for endogenous protein levels and contribute to an improved understanding of the sequence elements involved in this process.</p></div>

    Scanning mutagenesis of native 3’ end sequences reveals critical elements required to maintain expression.

    <p><b>(A)</b> Illustration of the two scanning mutagenesis strategies used. In the upper panel, two 10bp mutation windows were designed with non-overlapping 10bp steps. In the lower panel, 9bp mutation windows were designed with overlapping 3bp steps. <b>(B)</b> Profile of the effect of mutations as a function of location for two genes: CDC24 and YTA5. The y-axis shows the expression log<sub>2</sub> fold change compared to the wild-type sequence, with each point representing a single 10bp mutation window centered on the corresponding x-axis value relative to the stop codon. The gray line connects the average of each pair of mutations. <b>(C)</b> Distribution of the log<sub>2</sub> fold ratio between mutated and wild-type 3’ end sequences, showing a distribution highly skewed towards negative values. <b>(D)</b> Distribution of absolute expression values (a.u.) for non-mutated native 3’ end sequences (dark red) and mutated 3’ end sequences (gray). For the mutated sequences, the mutation that resulted in the largest reduction in expression was chosen for each native sequence.</p>
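The two window layouts in panel (A) and the log<sub>2</sub> fold change in panels (B) and (C) can be sketched as follows. The 100bp sequence length, the half-open coordinate convention and the function names are assumptions made for illustration; the caption does not specify them.

```python
import math
from typing import List, Tuple


def scanning_windows(seq_len: int, width: int, step: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) windows of `width` bp, placed every `step` bp."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    return [(start, start + width) for start in range(0, seq_len - width + 1, step)]


def log2_fold_change(mutant_expression: float, wild_type_expression: float) -> float:
    """log2 of the mutant-to-wild-type expression ratio; negative means reduced expression."""
    return math.log2(mutant_expression / wild_type_expression)


# Assumed 100bp region, for illustration only.
non_overlapping = scanning_windows(100, width=10, step=10)  # 10bp windows, 10bp steps
overlapping = scanning_windows(100, width=9, step=3)        # 9bp windows, 3bp steps
```

With a 3bp step and a 9bp window, each interior position is covered by three consecutive windows, which is what gives the second strategy its finer resolution.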

    Prediction of polyadenylation signals in native sequences.

    <p><b>(A)</b> Native sequences are aligned by the main polyadenylation site and ordered by their expression values (right panel). The color indicates the predicted logistic values from the classifier learned on the scanning mutagenesis set. The lower panel shows the mean predicted logistic value in a 20bp centered sliding window relative to the polyadenylation site. <b>(B)</b> Mean predicted logistic value in a 20bp window centered on the peak from Fig 4A (y-axis) versus expression levels (x-axis). The red line is a smoothed line computed with a 50-instance window.</p>
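A centered sliding-window mean of per-position scores, as in the lower part of panel (A), might look like the sketch below. How the window is centered for an even width and how the sequence edges are handled are not stated in the caption, so the choices here (window spanning `i - width // 2` to `i + width // 2 - 1`, truncated at the edges) are assumptions.

```python
from typing import List, Sequence


def centered_window_mean(values: Sequence[float], width: int = 20) -> List[float]:
    """Mean of `values` in a window of `width` positions centered on each index.

    Assumption: windows that run past either end are truncated, not padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    half = width // 2
    means = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i - half + width)
        window = values[lo:hi]
        means.append(sum(window) / len(window))
    return means
```

The same helper could be applied to a 0/1 indicator track (for example, positions that start an AT dinucleotide) to obtain a windowed frequency.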

    Sequence determinants of 3’ end functional elements.

    <p><b>(A)</b> Heat map showing the mean effect of a mutation as a function of location in the 3’ end sequence. Each row represents one sequence, and the color represents the mean expression fold change, across two replicates, between the mutated and wild-type sequences. Rows are sorted by the location of the mutation with the maximal effect. <b>(B)</b> Heat map of predicted logistic values on a held-out test set (see main text and methods). Locations of subsequences correspond to those in Fig 3A. <b>(C)</b> Frequency of the AT dinucleotide, the highest-weighted feature in the inferred model, in sliding windows of 20bp. Locations of subsequences correspond to those in Fig 3A. <b>(D)</b> Table of the features that contribute most to the classification. Color represents the mean coefficient across the 10 cross-validation partitions. For each possible mono/di-nucleotide, three types of features were considered: ‘[0|1]’, a binary feature that is one if the specified mono/di-nucleotide occurs at least once in the sequence and zero otherwise; ‘#’, a count of the number of times the specified mono/di-nucleotide occurs in the sequence; and ‘%’, the percentage of nucleotides in the sequence that are part of an occurrence of the specified mono/di-nucleotide. <b>(E)</b> DNA sequence motif found to be enriched in the positive subsequence instances. <b>(F)</b> Distribution of distances between the location (center) of the mutation that resulted in the maximal reduction in expression and the location of the main polyadenylation site of the wild-type sequence. <b>(G)</b> Results of YFP-specific 3’ RACE, where each lane represents 4 expression bins. The lowest lane displays long aberrant 3’ UTRs that are not apparent in the higher expression bins.</p>
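The three feature types described in panel (D) can be illustrated for a single mono- or di-nucleotide. This is a sketch rather than the study's implementation: the function name is hypothetical, and counting overlapping occurrences (for example, two `AA` in `AAA`) is an assumption.

```python
from typing import Dict


def kmer_features(seq: str, kmer: str) -> Dict[str, float]:
    """Compute the '[0|1]', '#' and '%' features of `kmer` in `seq`.

    Assumption: overlapping occurrences are each counted.
    """
    seq, kmer = seq.upper(), kmer.upper()
    if not kmer:
        raise ValueError("kmer must not be empty")
    k = len(kmer)
    covered = [False] * len(seq)  # positions that belong to at least one occurrence
    count = 0
    for start in range(len(seq) - k + 1):
        if seq[start:start + k] == kmer:
            count += 1
            for pos in range(start, start + k):
                covered[pos] = True
    percent = 100.0 * sum(covered) / len(seq) if seq else 0.0
    return {"[0|1]": 1.0 if count else 0.0, "#": float(count), "%": percent}
```

For the made-up sequence `ATATGC` and the dinucleotide `AT`, there are two occurrences covering four of the six positions.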

    Illustration of our method and overall expression distribution.

    <p><b>(A)</b> 13,000 designed synthetic sequences were ligated into a low-copy plasmid (top). The plasmid pool was then transformed into yeast to create a heterogeneous pool of yeast cells, each expressing YFP at a level corresponding to one of the 13,000 unique cloned 3’ end sequences. The cells were then sorted by fluorescence-activated cell sorting (FACS) into 16 expression bins according to their YFP/mCherry ratio (middle). Next, the reporter 3’ end sequences of the cells in each bin were amplified using primers barcoded for each bin, and the sequence barcodes were recovered by next-generation sequencing (NGS). Finally, each sequencing read was mapped to a specific 3’ end sequence and a specific bin (bottom) to obtain the distribution of cells carrying each synthetic 3’ end sequence across the expression bins. The distribution of each construct was fit to a gamma distribution, and the mean expression value was inferred from this fit. <b>(B)</b> The distribution of library expression values in the induced and un-induced promoter states. The induced state displays a tri-modal distribution with 3 peaks corresponding to (1) the non-induced promoter state, (2) the induced promoter state with low-expressing 3’ end sequences and (3) the induced promoter state with a wide range of 3’ end mediated expression.</p>
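The final step of panel (A), turning a construct's read counts across bins into a gamma fit and a mean expression value, can be sketched with a method-of-moments estimate. The caption does not say how the fit was performed, so the method of moments is a stand-in, and the representative expression value assigned to each bin is a hypothetical input.

```python
from typing import Sequence, Tuple


def gamma_moments_from_bins(
    bin_values: Sequence[float], read_counts: Sequence[float]
) -> Tuple[float, float, float]:
    """Estimate gamma (shape, scale, mean) from binned read counts.

    Assumptions: `bin_values` holds one representative expression value per bin,
    and the parameters are matched by moments, not by maximum likelihood.
    """
    if len(bin_values) != len(read_counts):
        raise ValueError("bin_values and read_counts must have the same length")
    total = sum(read_counts)
    if total <= 0:
        raise ValueError("no reads for this construct")
    mean = sum(v * c for v, c in zip(bin_values, read_counts)) / total
    var = sum(c * (v - mean) ** 2 for v, c in zip(bin_values, read_counts)) / total
    if mean <= 0 or var <= 0:
        raise ValueError("need positive mean and non-zero spread across bins")
    shape = mean ** 2 / var  # gamma mean = shape * scale
    scale = var / mean       # gamma variance = shape * scale ** 2
    return shape, scale, shape * scale
```

Under this estimator the inferred mean equals the read-weighted mean of the bin values; a likelihood-based fit that accounts for bin boundaries could give a different value.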

    YFP expression is correlated with noise strength.

    <p>(<b>A</b>) YFP expression of each 3′ end library strain (x-axis) versus its noise (y-axis, expression variance divided by mean expression squared), shown for several galactose concentrations (represented by different colors). Each point represents the noise computed from single-cell flow cytometry measurements of the corresponding 3′ end strain. (<b>B</b>) Same as panel (A) but with noise strength (expression variance divided by mean expression) on the y-axis.</p>
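The two quantities on the y-axes follow directly from the definitions in the caption. In the sketch below, using the population variance (dividing by N, not N-1) is an assumption, as is the function name.

```python
from statistics import fmean, pvariance
from typing import Dict, Sequence


def noise_metrics(expression: Sequence[float]) -> Dict[str, float]:
    """Noise (variance / mean^2) and noise strength (variance / mean) of single-cell values."""
    mean = fmean(expression)
    if mean <= 0:
        raise ValueError("mean expression must be positive")
    variance = pvariance(expression, mu=mean)  # assumption: population variance
    return {
        "mean": mean,
        "noise": variance / mean ** 2,
        "noise_strength": variance / mean,
    }
```

Noise is dimensionless (the squared coefficient of variation), whereas noise strength carries the units of expression and is also known as the Fano factor.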

    Illustration of the master strain and library construction procedure.

    <p>A master strain was constructed to contain two main constructs in the HIS deletion locus: a constant control construct with mCherry driven by the TEF2 promoter and terminated by a constant ADH1 terminator; and a test construct with a YFP gene driven by the Gal1/10 promoter. Following master strain construction, a library containing the downstream intergenic regions of 85 tested genes was amplified from the genome by PCR and extended to also contain the URA3 promoter and start codon. This library of DNA sequences was then integrated into the master strain such that only integrations at the exact genomic location would result in an intact selection marker.</p>

    The effect of 3′ end sequences on expression is large and is correlated with endogenous mRNA levels.

    <p>(<b>A</b>) Dynamic range of YFP levels of library strains at different galactose induction levels. YFP production per cell per second was measured and calculated for all library strains at several galactose concentrations, each resulting in a different promoter activation level. Shown are YFP measurements of the 3′ end library strains. Note that at the highest induction level (0.1% galactose), the ratio between the highest- and lowest-expressing strains is more than 10-fold. (<b>B</b>) Comparison of the span of expression values between promoter and 3′ end strains for the same group of genes. A box plot is added to show the difference in IQR between the groups. (<b>C</b>) Comparison of YFP levels in the 3′ end library (y-axis) with endogenous mRNA levels measured by RNA-seq (x-axis). The Pearson correlation is given (inset). (<b>D</b>) Same as (C) but for a different strain library in which promoters of the same respective genes are fused to a YFP reporter.</p>
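The summary statistics named in this caption (fold difference between the highest and lowest strain, IQR, and Pearson correlation) can be computed as in the sketch below. The function names are hypothetical, and the quartile convention (the default "exclusive" method of Python's `statistics.quantiles`) is an assumption, since the caption does not state one.

```python
import math
from statistics import fmean, quantiles
from typing import Sequence


def fold_range(values: Sequence[float]) -> float:
    """Ratio between the highest and lowest expression value."""
    lowest = min(values)
    if lowest <= 0:
        raise ValueError("expression values must be positive")
    return max(values) / lowest


def iqr(values: Sequence[float]) -> float:
    """Interquartile range (assumption: default 'exclusive' quartile method)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)
```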

    Effect of the 3′ end sequences on YFP accumulation in batch measurements.

    <p>(<b>A</b>) YFP measurements of clones with three different 3′ end sequences. Shown are YFP measurements of three different strains, each with a unique 3′ end sequence. Lines of the same color represent measurements of different clones carrying the same 3′ end sequence, demonstrating that the effect of the different constructs on expression exceeds the variability of our experimental system. The lowest-expressing strain (red) contains the COX17 3′ end and serves as a positive control for our experimental system. (<b>B,C,D</b>) Plate fluorometer measurements over time. Following inoculation of the cells into fresh medium containing 2% galactose, optical density (OD), mCherry and YFP are measured over time (B, C and D, respectively). Note that, as expected, OD and mCherry measurements remain highly similar between different library strains, while YFP expression varies considerably.</p>