187 research outputs found

    Probing the Mutational Interplay between Primary and Promiscuous Protein Functions: A Computational-Experimental Approach

    Protein promiscuity is of considerable interest due to its role in adaptive metabolic plasticity, its fundamental connection with molecular evolution and also because of its biotechnological applications. Current views on the relation between primary and promiscuous protein activities stem largely from laboratory evolution experiments aimed at increasing promiscuous activity levels. Here, on the other hand, we attempt to assess the main features of the simultaneous modulation of the primary and promiscuous functions during the course of natural evolution. The computational/experimental approach we propose for this task involves the following steps: a function-targeted, statistical coupling analysis of evolutionary data is used to determine a set of positions likely linked to the recruitment of a promiscuous activity for a new function; a combinatorial library of mutations on this set of positions is prepared and screened for both the primary and the promiscuous activities; a partial-least-squares reconstruction of the full combinatorial space is carried out; finally, an approximation to the Pareto set of variants with optimal primary/promiscuous activities is derived. Application of the approach to the emergence of folding catalysis in thioredoxin scaffolds reveals an unanticipated scenario: diverse patterns of primary/promiscuous activity modulation are possible, including a moderate (but likely significant in a biological context) simultaneous enhancement of both activities. We show that this scenario can be most simply explained on the basis of the conformational diversity hypothesis, although alternative interpretations cannot be ruled out. Overall, the results reported may help clarify the mechanisms of the evolution of new functions.
From a different viewpoint, the partial-least-squares-reconstruction/Pareto-set-prediction approach we have introduced provides the computational basis for an efficient directed-evolution protocol aimed at the simultaneous enhancement of several protein features and should therefore open new possibilities in the engineering of multi-functional enzymes.
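
The Pareto set mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `pareto_set`, the variant labels and the activity values are all hypothetical, and the partial-least-squares reconstruction step is not modeled. The sketch only shows what it means for a variant to be non-dominated when both the primary and the promiscuous activity are to be maximized.

```python
from typing import Dict, List, Tuple


def pareto_set(activities: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the variants that no other variant dominates.

    `activities` maps a variant label to (primary, promiscuous) activity.
    A variant is dominated if another variant is at least as good on both
    activities and strictly better on at least one.
    """
    front = []
    for name, (primary, promiscuous) in activities.items():
        dominated = any(
            p >= primary and q >= promiscuous and (p > primary or q > promiscuous)
            for other, (p, q) in activities.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical activities relative to wild type (illustrative numbers only).
example = {
    "wt": (1.0, 1.0),
    "A": (1.2, 1.1),   # moderately better on both activities
    "B": (0.8, 2.0),   # trades primary activity for promiscuous activity
    "C": (0.7, 1.5),   # worse than B on both activities
}
print(pareto_set(example))  # ['A', 'B']
```

In this toy data set, variant A improves both activities at once, which is the kind of simultaneous enhancement the abstract reports as possible.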

    Algorithms for optimizing cross-overs in DNA shuffling

    DNA shuffling generates combinatorial libraries of chimeric genes by stochastically recombining parent genes. The resulting libraries are subjected to large-scale genetic selection or screening to identify those chimeras with favorable properties (e.g., enhanced stability or enzymatic activity). While DNA shuffling has been applied quite successfully, it is limited by its homology-dependent, stochastic nature. Consequently, it is used only with parents of sufficient overall sequence identity, and provides no control over the resulting chimeric library. Results: This paper presents efficient methods to extend the scope of DNA shuffling to handle significantly more diverse parents and to generate more predictable, optimized libraries. Our CODNS (cross-over optimization for DNA shuffling) approach employs polynomial-time dynamic programming algorithms to select codons for the parental amino acids, allowing for zero or a fixed number of conservative substitutions. We first present efficient algorithms to optimize the local sequence identity or the nearest-neighbor approximation of the change in free energy upon annealing, objectives that were previously optimized by computationally expensive integer programming methods. We then present efficient algorithms for more powerful objectives that seek to localize and enhance the frequency of recombination by producing “runs” of common nucleotides either overall or according to the sequence diversity of the resulting chimeras. We demonstrate the effectiveness of CODNS in choosing codons and allocating substitutions to promote recombination between parents targeted in earlier studies: two GAR transformylases (41% amino acid sequence identity), two very distantly related DNA polymerases, Pol X and β (15%), and beta-lactamases of varying identity (26-47%).
Conclusions: Our methods provide the protein engineer with a new approach to DNA shuffling that supports substantially more diverse parents, is more deterministic, and generates more predictable and more diverse chimeric libraries.

    Optimization Algorithms for Site-directed Protein Recombination Experiment Planning

    Site-directed protein recombination produces improved and novel protein variants by recombining sequence fragments from parent proteins. The resulting hybrids accumulate multiple mutations that have been evolutionarily accepted together. Subsequent screening or selection identifies hybrids with desirable characteristics. In order to increase the hit rate of good variants, this thesis develops experiment planning algorithms to optimize protein recombination experiments. First, to improve the frequency of generating novel hybrids, a metric is developed to assess the diversity among hybrids and parent proteins. Dynamic programming algorithms are then created to optimize the selection of breakpoint locations according to this metric. Second, the trade-off between diversity and stability in recombination experiment planning is studied, recognizing that diversity requires changes from parent proteins, which may also disrupt important residue interactions necessary for protein stability. Accordingly, methods based on dynamic programming are developed to provide combined optimization of diversity and stability, finding optimal breakpoints such that no other experiment plan has better performance in both aspects simultaneously. Third, in order to support protein recombination with heterogeneous structures and focus on functionally important regions, a general framework for protein fragment swapping is developed. Differentiating source and target parents, and swappable regions within them, fragment swapping enables asymmetric, selective site-directed recombination. Two applications of protein fragment swapping are studied. In order to generate hybrids inheriting functionalities from both source and target proteins by fragment swapping, a method based on integer programming selects optimal swapping fragments to maximize the predicted stability and activity of hybrids in the resulting library. 
In another application, human source protein fragments are swapped into a therapeutic exogenous target protein to minimize the occurrence of peptides that trigger an immune response. A dynamic programming method is developed to optimize fragment selection for both humanity and functionality, resulting in therapeutically active variants with decreased immunogenicity.

    Fold Family-Regularized Bayesian Optimization for Directed Protein Evolution

    Directed Evolution (DE) is a technique for protein engineering that involves iterative rounds of mutagenesis and screening to search for sequences that optimize a given property (e.g., binding affinity to a specified target). Unfortunately, the underlying optimization problem is under-determined, and so mutations introduced to improve the specified property may come at the expense of unmeasured, but nevertheless important properties (e.g., subcellular localization). We seek to address this issue by incorporating a fold-specific regularization factor into the optimization problem. The regularization factor biases the search towards designs that resemble sequences from the fold family to which the protein belongs. We applied our method to a large library of protein GB1 mutants with binding affinity measurements to IgG-Fc. Our results demonstrate that the regularized optimization problem produces more native-like GB1 sequences with only a minor decrease in binding affinity. Specifically, the log-odds of our designs under a generative model of the GB1 fold family are 41-45% higher than those obtained without regularization, with only a 7% drop in binding affinity. Thus, our method is capable of making a trade-off between competing traits. Moreover, we demonstrate that our active-learning driven approach reduces the wet-lab burden to identify optimal GB1 designs by 67%, relative to recent results from the Arnold lab on the same data.
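
One simple way to picture the trade-off described in this abstract is an additive penalty, sketched below. This is an assumption made for illustration only: the abstract does not state the form of the regularization factor, and the function `pick_design`, the weight `lam` and the candidate scores are all hypothetical. Each candidate carries a predicted fitness and a log-likelihood under some model of the fold family, and the weight controls how strongly native-like sequences are favored.

```python
from typing import Dict, Tuple


def pick_design(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Pick the candidate that maximizes fitness + lam * family log-likelihood.

    `candidates` maps a design label to (predicted_fitness, log_likelihood).
    With lam = 0 the choice depends on predicted fitness alone.
    """
    def score(label: str) -> float:
        fitness, log_likelihood = candidates[label]
        return fitness + lam * log_likelihood

    return max(candidates, key=score)


# Hypothetical candidates (illustrative numbers only).
candidates = {
    "design_1": (1.00, -12.0),  # best predicted fitness, less native-like
    "design_2": (0.93, -7.0),   # slightly lower fitness, more native-like
}
print(pick_design(candidates, lam=0.0))   # design_1
print(pick_design(candidates, lam=0.05))  # design_2
```

A larger weight shifts the choice toward the more native-like design at a small cost in predicted fitness, which mirrors the trade-off the abstract reports.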

    Modeling Tumor Clonal Evolution for Drug Combinations Design

    Cancer is a clonal evolutionary process. This presents challenges for effective therapeutic intervention, given the constant selective pressure toward drug resistance. Mathematical modeling from population genetics, evolutionary dynamics, and engineering perspectives is being increasingly employed to study tumor progression, intratumoral heterogeneity, drug resistance, and rational drug scheduling and combinations design. In this review we discuss the promising opportunities that these interdisciplinary approaches hold for advances in cancer biology and treatment. We propose that quantitative modeling perspectives can complement emerging experimental technologies to facilitate enhanced understanding of disease progression and improved capabilities for therapeutic drug regimen designs.
    Funding: David H. Koch Cancer Research Fund (Grant P30-CA14051); National Cancer Institute (U.S.), Integrative Cancer Biology Program (Grant U54-CA112967); National Institute of General Medical Sciences (U.S.), Interdepartmental Biotechnology Training Program (5T32GM008334)

    SwiftLib: rapid degenerate-codon-library optimization through dynamic programming

    Degenerate codon (DC) libraries efficiently address the experimental library-size limitations of directed evolution by focusing diversity toward the positions and toward the amino acids (AAs) that are most likely to generate hits; however, manually constructing DC libraries is challenging, error-prone and time-consuming. This paper provides a dynamic programming solution to the task of finding the best DCs while keeping the size of the library beneath some given limit, improving on the existing integer-linear programming formulation. It then extends the algorithm to consider multiple DCs at each position, a heretofore unsolved problem, while adhering to a constraint on the number of primers needed to synthesize the library. In the two library-design problems examined here, the use of multiple DCs produces libraries that very nearly cover the set of desired AAs while still staying within the experimental size limits. Surprisingly, the algorithm is able to find near-perfect libraries where the ratio of amino-acid sequences to nucleic-acid sequences approaches 1; it effectively side-steps the degeneracy of the genetic code. Our algorithm is freely available through our web server and solves most design problems in about a second.
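
The quantities this abstract trades off (which amino acids a degenerate codon covers, and how large the resulting library is) can be computed directly from the standard genetic code and the IUPAC nucleotide codes. The sketch below is not the SwiftLib algorithm and performs no optimization. The helper names `expand`, `amino_acids` and `library_size` are hypothetical, and library size is taken here to be the number of DNA sequences, which is an assumption about the intended measure.

```python
from itertools import product
from math import prod
from typing import List, Set

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# IUPAC degenerate nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(dc: str) -> List[str]:
    """List every concrete codon encoded by a three-letter degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in dc))]


def amino_acids(dc: str) -> Set[str]:
    """Set of amino acids (and '*' for stop) encoded by a degenerate codon."""
    return {CODON_TABLE[c] for c in expand(dc)}


def library_size(dcs: List[str]) -> int:
    """Number of distinct DNA sequences for one degenerate codon per position."""
    return prod(len(expand(dc)) for dc in dcs)


print(len(expand("NNK")))             # 32 codons
print(sorted(amino_acids("RVT")))     # ['A', 'D', 'G', 'N', 'S', 'T']
print(library_size(["NNK", "RVT"]))   # 192
```

The codon `RVT` encodes six amino acids with six codons, a ratio of 1, whereas `NNK` spends 32 codons on 20 amino acids and one stop codon. That gap is the degeneracy the abstract says well-chosen DCs can side-step.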

    Computational Modeling of Designed Ankyrin Repeat Protein Complexes with their Targets

    Recombinant therapeutic proteins are playing an ever-increasing role in the clinic. High-affinity binding candidates can be produced in a high-throughput manner through the process of selection and evolution from large libraries, but the structures of the complexes with target protein can only be determined for a small number of them in a costly, low-throughput manner, typically by x-ray crystallography. Reliable modeling of complexes would greatly help to understand their mode of action and improve them by further engineering, for example, by designing bi-paratopic binders. Designed ankyrin repeat proteins (DARPins) are one such class of antibody mimetics that have proven useful in the clinic, in diagnostics and research. Here we have developed a standardized procedure to model DARPin–target complexes that can be used to predict the structures of unknown complexes. It requires only the sequence of a DARPin and a structure of the unbound target. The procedure includes homology modeling of the DARPin, modeling of the flexible parts of a target, rigid body docking to ensembles of the target and docking with a partially flexible backbone. For a set of diverse DARPin–target complexes tested it generated a single model of the complex that well approximates the native state of the complex. We provide a protocol that can be used in a semi-automated way and with tools that are freely available. The presented concepts should help to accelerate the development of novel bio-therapeutics for scaffolds with similar properties

    Improved differential search algorithms for metabolic network optimization

    The capabilities of Escherichia coli and Zymomonas mobilis to efficiently convert substrate into valuable metabolites have caught the attention of many industries. However, the production rates of these metabolites are still below the maximum threshold. Over the years, organism strain design has been improved through the development of metabolic networks that ease the process of exploiting and manipulating an organism to maximize its growth rate and its metabolite production. Due to the complexity of metabolic networks and multiple objectives, it is difficult to identify near-optimal knockout reactions that can maximize both objectives. This research has developed two improved modelling-optimization methods. The first method introduces a Differential Search Algorithm and Flux Balance Analysis (DSAFBA) to identify knockout reactions that maximize the production rate of desired metabolites. The second method develops a non-dominated searching DSAFBA (ndsDSAFBA) to investigate the trade-off relationship between production rate and growth rate by identifying knockout reactions that maximize both objectives. These methods were assessed against three metabolic networks (E. coli core model, iAF1260 and iEM439) for production of succinic acid, acetic acid and ethanol. The results revealed that the improved methods are superior to the other state-of-the-art methods in terms of production rate, growth rate and computation time. The study has demonstrated that the two improved modelling-optimization methods could be used to identify near-optimal knockout reactions that maximize production of desired metabolites as well as the organism’s growth rate within a shorter computation time.

    DEEP LEARNING METHODS FOR PREDICTION OF AND ESCAPE FROM PROTEIN RECOGNITION

    Protein interactions drive diverse processes essential to living organisms, and thus numerous biomedical applications center on understanding, predicting, and designing how proteins recognize their partners. While unfortunately the number of interactions of interest still vastly exceeds the capabilities of experimental determination methods, computational methods promise to fill the gap. My thesis pursues the development and application of computational methods for several protein interaction prediction and design tasks. First, to improve protein-glycan interaction specificity prediction, I developed GlyBERT, which learns biologically relevant glycan representations encapsulating the components most important for glycan recognition within their structures. GlyBERT encodes glycans with a branched biochemical language and employs an attention-based deep language model to embed the correlation between local and global structural contexts. This approach enables the development of predictive models from limited data, supporting applications such as lectin binding prediction. Second, to improve protein-protein interaction prediction, I developed a unified geometric deep neural network, ‘PInet’ (Protein Interface Network), which leverages the best properties of both data- and physics-driven methods, learning and utilizing models capturing both geometrical and physicochemical molecular surface complementarity. In addition to obtaining state-of-the-art performance in predicting protein-protein interactions, PInet can serve as the backbone for other protein-protein interaction modeling tasks such as binding affinity prediction. Finally, I turned from prediction to design, addressing two important tasks in the context of antibody-antigen recognition. The first problem is to redesign a given antigen to evade antibody recognition, e.g., to help biotherapeutics avoid pre-existing immunity or to focus vaccine responses on key portions of an antigen.
The second problem is to design a panel of variants of a given antigen to use as “bait” in experimental identification of antibodies that recognize different parts of the antigen, e.g., to support classification of immune responses or to help select among different antibody candidates. I developed a geometry-based algorithm to generate variants to address these design problems, seeking to maximize utility subject to experimental constraints. During the design process, the algorithm accounts for and balances the effects of candidate mutations on antibody recognition and on antigen stability. In retrospective case studies, the algorithm demonstrated promising precision, recall, and robustness of finding good designs. This work represents the first algorithm to systematically design antigen variants for characterization and evasion of polyclonal antibody responses