45 research outputs found

    Exploring constraints of sequence space in search of optimal enzymes


    Machine learning to engineer antibody frameworks for developability

    Monoclonal antibodies (mAbs) have revolutionized medicine in the last 20 years and today represent ~$70B/yr in total pharmaceutical sales, most notably in the areas of oncology and autoimmune disorders. The function and developability of mAbs depend on the expression, folding and integrity of their structure. Protein pharmaceuticals must tolerate factors such as heat, interfacial stress, aggregation, pH and more in order to reach the market. Here we apply systematic variance, inductive machine learning, synthetic genes and our high-throughput transient mammalian protein expression platform to engineer a humanized IgG1 scaffold for high developability independent of the hypervariable region present in the mAb. All amino acid substitutions present in the framework of human IgG1 antibodies were derived from human sequences in public domain databases and assembled in a set of 96 partial factorial IgG1 variants (aka ‘infologs’) using Design of Experiment (DoE) variable distribution. Total explored sequence diversity was ~2x10^19. Hypervariable regions were derived from two commercial antibodies for a total of 2x96 genetic constructs. The 2x96 antibodies were produced by transient transfection in HEK293 cells and purified in high throughput. Several independent machine learning algorithms were compared for cross validation and model accuracy and used to build iterative sequence-function correlation models to identify and quantify independent and/or synergistic variables affecting one or more of the developability functionalities [1]. The study resulted in markedly improved mAb frameworks as well as a deeper understanding of how different machine learning algorithms depend on different types of data sets.

    Engineering of camel chymosin for improved cheese properties

    More than 20 million tons of cheese are produced worldwide per year. By improving cheese yield and quality through process optimization, the amount of milk needed for manufacturing can be reduced significantly. Chymosin, an aspartic acid protease, initiates milk coagulation in cheese manufacturing by cleaving off the glycomacropeptide (GMP) from the surface of casein micelles. Non-specific proteolysis of casein molecules by chymosin during this milk clotting process releases soluble peptides into the whey, resulting in protein losses from the cheese. The ratio between specific clotting activity (C) and non-specific proteolysis (P) of a coagulant can therefore be used as a predictor of cheese yield. During ripening of the cheese, the remaining coagulant continues the proteolytic breakdown of the caseins, with significant impact on cheese properties. While the main proteolytic activity, the release of N-terminal peptides from alphaS1 casein (alphaS1-N), is associated with cheese softening and loss of firmness, cleavage of the C-terminal end of beta casein (beta-C) contributes to unwanted bitterness of the cheese [1]. The chymosin from Bos taurus (bovine chymosin) is traditionally used as a milk coagulant in cheese manufacture. However, the homologous enzyme from Camelus dromedarius (camel chymosin) has been shown to be a superior alternative for various cheese types, since it shows higher specific activity (C) and specificity (C/P) for the milk clotting reaction [2], as well as lower alphaS1 and beta casein proteolysis during ripening (Fig. 1).

    Gene Designer: a synthetic biology tool for constructing artificial DNA segments

    BACKGROUND: Direct synthesis of genes is rapidly becoming the most efficient way to make functional genetic constructs and enables applications such as codon optimization, RNAi resistant genes and protein engineering. Here we introduce a software tool that greatly simplifies the design of synthetic genes. RESULTS: Gene Designer is stand-alone software for fast and easy design of synthetic DNA segments. Users can easily add, edit and combine genetic elements such as promoters, open reading frames and tags through an intuitive drag-and-drop graphic interface and a hierarchical DNA/Protein object map. Using advanced optimization algorithms, open reading frames within the DNA construct can readily be codon optimized for protein expression in any host organism. Gene Designer also includes features such as a real-time sliding calculator of oligonucleotide annealing temperatures, sequencing primer generator, tools for avoidance or inclusion of restriction sites, and options to maximize or minimize sequence identity to a reference. CONCLUSION: Gene Designer is an expandable Synthetic Biology workbench suitable for molecular biologists interested in the de novo creation of genetic constructs.
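
    As a rough illustration of the kind of codon choice and restriction-site avoidance this abstract describes, the sketch below back-translates a protein greedily. It is not Gene Designer's algorithm: the function name `design_orf`, the `CODON_WEIGHTS` table and its numbers are hypothetical, only four amino acids are covered, and only the given strand is scanned for sites.

```python
# Hypothetical codon preference weights for a made-up host (illustrative
# numbers only, not measured codon usage). Only four amino acids are covered.
CODON_WEIGHTS = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "E": {"GAA": 0.6, "GAG": 0.4},
    "F": {"TTC": 0.6, "TTT": 0.4},
}


def design_orf(protein, weights, forbidden=()):
    """Greedily pick, for each residue, the highest-weight codon that does
    not create any forbidden site in the growing sequence."""
    dna = ""
    for aa in protein:
        ranked = sorted(weights[aa], key=weights[aa].get, reverse=True)
        for codon in ranked:
            candidate = dna + codon
            # A newly created site must overlap the codon just added, so only
            # the last len(site) + 2 bases need to be checked.
            if not any(site in candidate[-(len(site) + 2):] for site in forbidden):
                dna = candidate
                break
        else:
            raise ValueError(f"no codon for {aa!r} avoids the forbidden sites")
    return dna


unconstrained = design_orf("MKEF", CODON_WEIGHTS)
no_ecori = design_orf("MKEF", CODON_WEIGHTS, forbidden=("GAATTC",))
print(unconstrained)  # ATGAAAGAATTC (contains the EcoRI site GAATTC)
print(no_ecori)       # ATGAAAGAATTT
```

    A greedy pass like this can dead-end when every codon for a residue recreates a site; a fuller design tool would presumably backtrack or search over neighbouring codons as well.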

    Multidimensional engineering of Chymosin for efficient cheese production by machine learning guided directed evolution

    The global cheese market today exceeds $100B/year. Chymosin (a.k.a. rennin) is an aspartic endopeptidase produced by the stomach lining of new-born mammals. During cheese production, chymosin is added to the milk, where it cleaves the glycomacropeptide (GMP) from the surface of casein micelles to initiate milk coagulation. Current commercial recombinant chymosin enzymes derived from Bos taurus (cow) or Camelus dromedarius (camel) are limited in their proteolytic specificity, leading to incomplete milk-to-cheese conversion. Increasing the chymosin specificity for GMP cleavage would significantly decrease the amount of milk needed for cheese production, thereby reducing cost and decreasing the environmental footprint of the dairy industry. Separate from milk coagulation, chymosin-dependent release of N-terminal peptides from alphaS1 casein during cheese ripening leads to unwanted softening, accompanied by cheese loss during industrial processing such as slicing and shredding. Furthermore, chymosin-dependent cleavage of the C-terminal end of beta casein contributes to unwanted bitterness of the cheese. Improvement of chymosin proteolytic specificity in both milk coagulation and cheese ripening is consequently of high commercial relevance.

    SCHEMA Recombination of a Fungal Cellulase Uncovers a Single Mutation That Contributes Markedly to Stability

    A quantitative linear model accurately (R^2 = 0.88) describes the thermostabilities of 54 characterized members of a family of fungal cellobiohydrolase class II (CBH II) cellulase chimeras made by SCHEMA recombination of three fungal enzymes, demonstrating that the contributions of SCHEMA sequence blocks to stability are predominantly additive. Thirty-one of 31 predicted thermostable CBH II chimeras have thermal inactivation temperatures higher than the most thermostable parent CBH II, from Humicola insolens, and the model predicts that hundreds more CBH II chimeras share this superior thermostability. Eight of eight thermostable chimeras assayed hydrolyze the solid cellulosic substrate Avicel at temperatures at least 5 °C above the most stable parent, and seven of these showed superior activity in 16-h Avicel hydrolysis assays. The sequence-stability model identified a single block of sequence that adds 8.5 °C to chimera thermostability. Mutating individual residues in this block identified the C313S substitution as responsible for the entire thermostabilizing effect. Introducing this mutation into the two recombination parent CBH IIs not featuring it (Hypocrea jecorina and H. insolens) decreased inactivation, increased maximum Avicel hydrolysis temperature, and improved long-time hydrolysis performance. This mutation also stabilized and improved Avicel hydrolysis by Phanerochaete chrysosporium CBH II, which is only 55–56% identical to recombination parent CBH IIs. Furthermore, the C313S mutation increased total H. jecorina CBH II activity secreted by the Saccharomyces cerevisiae expression host more than 10-fold. Our results show that SCHEMA structure-guided recombination enables quantitative prediction of cellulase chimera thermostability and efficient identification of stabilizing mutations.
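
    The additive block model in this abstract can be sketched as an ordinary least-squares fit over indicator variables, one per (block, parent) pair. Everything below is an assumption made for illustration: the toy library size (3 parents, 4 blocks), the function names, and the synthetic noise-free temperatures, in which one block is given an 8.5 °C effect only to echo the abstract. It is not the authors' code or data.

```python
from itertools import product

N_PARENTS, N_BLOCKS = 3, 4  # toy sizes, assumed for illustration


def encode(chimera):
    """Intercept plus one indicator per (block, non-reference parent).
    Parent 1 is the reference, so its blocks contribute zero."""
    row = [1.0]
    for parent in chimera:
        row.extend(1.0 if parent == p else 0.0 for p in range(2, N_PARENTS + 1))
    return row


def solve(a, b):
    """Solve a square linear system by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_additive(chimeras, temps):
    """Ordinary least squares via the normal equations X^T X w = X^T y."""
    x = [encode(c) for c in chimeras]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, temps)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, chimera):
    """Predicted temperature: reference value plus summed block effects."""
    return sum(w * v for w, v in zip(weights, encode(chimera)))


# Synthetic, noise-free data: made-up block effects in degrees C, keyed by
# (block index, parent number). These are not measured values.
TRUE_EFFECTS = {(0, 2): -1.0, (0, 3): 1.5, (1, 2): 0.5, (1, 3): 2.0,
                (2, 2): 1.0, (2, 3): 8.5, (3, 2): 0.5, (3, 3): -0.5}
chimeras = list(product(range(1, N_PARENTS + 1), repeat=N_BLOCKS))
temps = [60.0 + sum(TRUE_EFFECTS.get((b, p), 0.0) for b, p in enumerate(c))
         for c in chimeras]

weights = fit_additive(chimeras, temps)
best = max(chimeras, key=lambda c: predict(weights, c))
print(best, round(predict(weights, best), 2))
```

    With real measurements the fit would be made on a small characterized subset and then used to rank the untested chimeras; the full toy library is fitted here only to keep the system well-determined.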

    Engineering proteinase K using machine learning and synthetic genes

    BACKGROUND: Altering a protein's function by changing its sequence allows natural proteins to be converted into useful molecular tools. Current protein engineering methods are limited by a lack of high throughput physical or computational tests that can accurately predict protein activity under conditions relevant to its final application. Here we describe a new synthetic biology approach to protein engineering that avoids these limitations by combining high throughput gene synthesis with machine learning-based design algorithms. RESULTS: We selected 24 amino acid substitutions to make in proteinase K from alignments of homologous sequences. We then designed and synthesized 59 specific proteinase K variants containing different combinations of the selected substitutions. The 59 variants were tested for their ability to hydrolyze a tetrapeptide substrate after the enzyme was first heated to 68°C for 5 minutes. Sequence and activity data were analyzed using machine learning algorithms. This analysis was used to design a new set of variants predicted to have increased activity over the training set, which were then synthesized and tested. By performing two cycles of machine learning analysis and variant design we obtained 20-fold improved proteinase K variants while only testing a total of 95 variant enzymes. CONCLUSION: The number of protein variants that must be tested to obtain significant functional improvements determines the type of tests that can be performed. Protein engineers wishing to modify the property of a protein to shrink tumours or catalyze chemical reactions under industrial conditions have until now been forced to accept high throughput surrogate screens to measure protein properties that they hope will correlate with the functionalities that they intend to modify. By reducing the number of variants that must be tested to fewer than 100, machine learning algorithms make it possible to use more complex and expensive tests so that only protein properties that are directly relevant to the desired application need to be measured. Protein design algorithms that only require the testing of a small number of variants represent a significant step towards a generic, resource-optimized protein engineering process.
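
    To make the design-test-learn cycle in this abstract concrete, the sketch below ranks substitutions by a simple mean-difference estimate and proposes a next variant. This is a much simpler stand-in for the machine learning algorithms the abstract mentions, not a reconstruction of them; the substitution labels, activities and function names are hypothetical. The combination count assumes each of the 24 substitutions is independently present or absent.

```python
from statistics import mean


def substitution_effects(variants, activities):
    """Mean activity of variants carrying a substitution minus the mean
    activity of variants lacking it (a crude main-effect estimate)."""
    all_subs = set().union(*variants)
    effects = {}
    for sub in sorted(all_subs):
        with_sub = [a for v, a in zip(variants, activities) if sub in v]
        without = [a for v, a in zip(variants, activities) if sub not in v]
        if with_sub and without:
            effects[sub] = mean(with_sub) - mean(without)
    return effects


def propose_variant(effects, max_subs):
    """Combine the highest-ranked substitutions with a positive estimate."""
    ranked = sorted(effects, key=effects.get, reverse=True)
    return frozenset(s for s in ranked[:max_subs] if effects[s] > 0)


# Hypothetical training round: each variant is a set of substitution labels,
# paired with a made-up relative activity after heat treatment.
variants = [frozenset(), frozenset({"A"}), frozenset({"B"}),
            frozenset({"A", "B"}), frozenset({"C"}), frozenset({"A", "C"})]
activities = [1.0, 2.0, 0.5, 1.5, 1.0, 2.0]

effects = substitution_effects(variants, activities)
next_variant = propose_variant(effects, max_subs=2)
print(sorted(next_variant))

# Size of the design space if 24 substitutions are each present or absent
# (an assumption), compared with the 95 variants tested in the abstract.
n_combinations = 2 ** 24
print(n_combinations, 95 / n_combinations)
```

    The proposed variant would then be synthesized and tested, and its measured activity added to the training data before the next round.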

    Engineering the Salmonella type III secretion system to export spider silk monomers

    The type III secretion system (T3SS) exports proteins from the cytoplasm, through both the inner and outer membranes, to the external environment. Here, a system is constructed to harness the T3SS encoded within Salmonella Pathogenicity Island 1 to export proteins of biotechnological interest. The system is composed of an operon containing the target protein fused to an N-terminal secretion tag and its cognate chaperone. Transcription is controlled by a genetic circuit that only turns on when the cell is actively secreting protein. The system is refined using a small human protein (DH domain) and demonstrated by exporting three silk monomers (ADF-1, -2, and -3), representative of different types of spider silk. Synthetic genes encoding silk monomers were designed to enhance genetic stability and codon usage, constructed by automated DNA synthesis, and cloned into the secretion control system. Secretion rates up to 1.8 mg l⁻¹ h⁻¹ are demonstrated with up to 14% of expressed protein secreted. This work introduces new parts to control protein secretion in Gram-negative bacteria, which will be broadly applicable to problems in biotechnology.

    Design Parameters to Control Synthetic Gene Expression in Escherichia coli

    BACKGROUND: Production of proteins as therapeutic agents, research reagents and molecular tools frequently depends on expression in heterologous hosts. Synthetic genes are increasingly used for protein production because sequence information is easier to obtain than the corresponding physical DNA. Protein-coding sequences are commonly re-designed to enhance expression, but there are no experimentally supported design principles. PRINCIPAL FINDINGS: To identify sequence features that affect protein expression, we synthesized and expressed in E. coli two sets of 40 genes encoding two commercially valuable proteins, a DNA polymerase and a single chain antibody. Genes differing only in synonymous codon usage expressed protein at levels ranging from undetectable to 30% of cellular protein. Using partial least squares regression, we tested the correlation of protein production levels with parameters that have been reported to affect expression. We found that the amount of protein produced in E. coli was strongly dependent on the codons used to encode a subset of amino acids. Favorable codons were predominantly those read by tRNAs that are most highly charged during amino acid starvation, not codons that are most abundant in highly expressed E. coli proteins. Finally, we confirmed the validity of our models by designing, synthesizing and testing new genes using codon biases predicted to perform well. CONCLUSION: The systematic analysis of gene design parameters shown in this study has allowed us to identify codon usage within a gene as a critical determinant of achievable protein expression levels in E. coli. We propose a biochemical basis for this, as well as design algorithms to ensure high protein production from synthetic genes. Replication of this methodology should allow similar design algorithms to be empirically derived for any expression system.
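
    A small sketch of how genes that differ only in synonymous codon usage might be turned into per-codon features is given below. It covers only the feature extraction step; the partial least squares regression itself is not implemented. The two short genes and the function names are hypothetical, and the table is the standard genetic code.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(dna):
    """Split an in-frame coding sequence into codons."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate a coding sequence with the standard genetic code."""
    return "".join(CODON_TABLE[c] for c in codons(dna))


def codon_fractions(dna):
    """For each codon used, its share among all codons in the gene that
    encode the same amino acid."""
    counts = Counter(codons(dna))
    per_aa = defaultdict(int)
    for codon, n in counts.items():
        per_aa[CODON_TABLE[codon]] += n
    return {codon: n / per_aa[CODON_TABLE[codon]] for codon, n in counts.items()}


# Two made-up genes encoding the same peptide with different codons.
gene_a = "ATGAAAGAAAAAGAA"
gene_b = "ATGAAGGAGAAAGAA"
print(translate(gene_a), translate(gene_b))
print(codon_fractions(gene_a))
print(codon_fractions(gene_b))
```

    Feature vectors like these, one per gene, are the kind of input that could be regressed against measured expression levels.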