3 research outputs found

    Modeling intron features uncovers design principles and allows the prediction of gene expression in a synthetic system.

    A) Assembly process of the sequence-based predictor of gene expression: in every iteration, the feature contributing the highest correlation to the reporter expression measurements was added. The first eight features and their descriptions are presented. B) Bar diagram of the predictor's cumulative correlation with expression levels of YiFP variants as a function of the number of added features. C) A predictor function based on 3, 13, or 38 features explained 49%, 77%, and 90% of gene expression variation, respectively (for 13 features: p < 2.2e-16; empirical p < 5e-03). D) Cross-validation of the predictor assembly method using training and test sets, with 80% and 20% of introns respectively, demonstrated a predictive power of 50% (for >15 features: 0.37
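The iterative procedure in panel A resembles greedy forward feature selection. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a linear least-squares predictor, scores each candidate feature by the Pearson correlation between the fitted values and the measurements, and uses hypothetical names (`fit_correlation`, `greedy_forward_selection`, `X`, `y`) that do not appear in the source.

```python
import numpy as np

def fit_correlation(X_sub, y):
    """Correlation between a least-squares fit on X_sub (plus intercept) and y."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    if np.allclose(pred, pred[0]):
        return 0.0  # a constant prediction carries no correlation
    return float(np.nan_to_num(np.corrcoef(pred, y)[0, 1]))

def greedy_forward_selection(X, y, max_features):
    """Add, one at a time, the feature that yields the highest
    cumulative correlation with y (assumed selection criterion)."""
    n_features = X.shape[1]
    selected, history = [], []
    for _ in range(min(max_features, n_features)):
        best_j, best_r = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            r = fit_correlation(X[:, selected + [j]], y)
            if r > best_r:
                best_j, best_r = j, r
        selected.append(best_j)
        history.append(best_r)  # cumulative correlation after this step
    return selected, history
```

Here `history` plays the role of the cumulative correlation plotted in panel B. Because the score is computed on the same data used for fitting, it can only grow as features are added, which is why a held-out evaluation such as the 80%/20% split in panel D is needed to estimate predictive power.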

    Heuristic for Maximizing DNA Reuse in Synthetic DNA Library Assembly

    De novo DNA synthesis is in need of new ideas for increasing production rate and reducing cost. DNA reuse in combinatorial library construction is one such idea. Here, we describe an algorithm for planning multistage assembly of DNA libraries with shared intermediates that greedily attempts to maximize DNA reuse, and show both theoretically and empirically that it runs in linear time. We compare solution quality and algorithmic performance to the best results reported for computing DNA assembly graphs, finding that our algorithm achieves solutions of equivalent quality but with dramatically shorter running times and substantially improved scalability. We also show that the related computational problem bounded-depth min-cost string production (BDMSP), which captures DNA library assembly operations with a simplified cost model, is NP-hard and APX-hard by reduction from vertex cover. The algorithm presented here provides solutions of near-minimal stages and thanks to almost instantaneous planning of DNA libraries it can be used as a metric of "manufacturability" to guide DNA library design. Rapid planning remains applicable even for DNA library sizes vastly exceeding today's biochemical assembly methods, future-proofing our method.
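To illustrate the general idea of greedy reuse of shared intermediates, here is a simplified sketch. It is not the algorithm from the paper and makes no claim to linear running time. It assumes each library member is an ordered tuple of part names, that one stage joins non-overlapping adjacent pairs, and that a pair shared by several targets is built only once. The function name `plan_assembly` and the tie-breaking rule are hypothetical.

```python
from collections import Counter

def plan_assembly(targets):
    """Return a list of stages; each stage is the set of distinct
    (left, right) joins performed once and shared across targets."""
    seqs = [tuple(t) for t in targets]
    stages = []
    while any(len(s) > 1 for s in seqs):
        # Count in how many targets each adjacent pair occurs.
        counts = Counter()
        for s in seqs:
            counts.update(set(zip(s, s[1:])))
        # Most widely shared pairs first; ties broken deterministically.
        ranked = sorted(counts, key=lambda p: (-counts[p], str(p)))
        rank = {p: i for i, p in enumerate(ranked)}
        stage, new_seqs = set(), []
        for s in seqs:
            # Pick non-overlapping join positions, best-ranked pair first.
            order = sorted(range(len(s) - 1),
                           key=lambda i: rank[(s[i], s[i + 1])])
            used = [False] * len(s)
            joins = set()
            for i in order:
                if not used[i] and not used[i + 1]:
                    used[i] = used[i + 1] = True
                    joins.add(i)
            # Rewrite the target with joined pairs as new intermediates.
            merged, i = [], 0
            while i < len(s):
                if i in joins:
                    pair = (s[i], s[i + 1])
                    stage.add(pair)
                    merged.append(pair)
                    i += 2
                else:
                    merged.append(s[i])
                    i += 1
            new_seqs.append(tuple(merged))
        stages.append(stage)
        seqs = new_seqs
    return stages
```

For two targets that share a prefix, such as `("A", "B", "C", "D")` and `("A", "B", "C", "E")`, the shared join `("A", "B")` is counted once, so the plan needs fewer reactions than assembling each target independently.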

    Rationally designed, heterologous S. cerevisiae transcripts expose novel expression determinants

    Deducing generic causal relations between RNA transcript features and protein expression profiles from endogenous gene expression data remains a major unsolved problem in biology. The analysis of gene expression from heterologous genes contributes significantly to solving this problem, but has been heavily biased toward the study of the effect of 5′ transcript regions and toward prokaryotes. Here, we employ a synthetic biology driven approach that systematically differentiates the effect of different regions of the transcript on gene expression up to 240 nucleotides into the ORF. This enabled us to discover new causal effects between features in previously unexplored regions of transcripts and gene expression in natural regimes. We rationally designed, constructed, and analyzed 383 gene variants of the viral HRSVgp04 gene ORF, with multiple synonymous mutations at key positions along the transcript, in the eukaryote S. cerevisiae. Our results show that a few silent mutations at the 5′UTR can have a dramatic effect of up to a 15-fold change on protein levels, and that even synonymous mutations in positions more than 120 nucleotides downstream from the ORF 5′ end can modulate protein levels up to 160%–300%. We demonstrate that the correlation between protein levels and folding energy increases with the significance of the level of selection of the latter in endogenous genes, reinforcing the notion that selection for folding strength in different parts of the ORF is related to translation regulation. Our measured protein abundance correlates notably (correlation up to r = 0.62, p = 0.0013) with mean relative codon decoding times, based on ribosomal densities (Ribo-Seq) in endogenous genes, supporting the conjecture that translation elongation and adaptation to the tRNA pool can modify protein levels in a causal/direct manner. This report provides an improved understanding of transcript evolution and of the design principles of gene expression regulation, and suggests simple rules for engineering synthetic gene expression in eukaryotes.