123 research outputs found

    Machine Learning-Assisted Directed Evolution Navigates a Combinatorial Epistatic Fitness Landscape with Minimal Screening Burden

    Due to screening limitations, in directed evolution (DE) of proteins it is rarely feasible to fully evaluate combinatorial mutant libraries made by mutagenesis at multiple sites. Instead, DE often involves a single-step greedy optimization in which the mutation in the highest-fitness variant identified in each round of single-site mutagenesis is fixed. However, because the effects of a mutation can depend on the presence or absence of other mutations, the efficiency and effectiveness of a single-step greedy walk are influenced by both the starting variant and the order in which beneficial mutations are identified; that is, the process is path-dependent. We recently demonstrated a path-independent machine learning-assisted approach to directed evolution (MLDE) that allows in silico screening of full combinatorial libraries made by simultaneous saturation mutagenesis, thus explicitly capturing the effects of cooperative mutations and bypassing the path-dependence that can limit greedy optimization. Here, we thoroughly investigate and optimize an MLDE workflow by testing a number of design considerations of the MLDE pipeline. Specifically, we (1) test the effects of different encoding strategies on MLDE efficiency, (2) integrate new models and a training procedure more amenable to protein engineering tasks, and (3) incorporate training set design strategies to avoid information-poor low-fitness protein variants (“holes”) in the training data. When applied to an epistatic, hole-filled, four-site combinatorial fitness landscape of protein G domain B1 (GB1), the resulting focused training MLDE (ftMLDE) protocol achieved the global fitness maximum up to 92% of the time at a total screening burden of 470 variants. In contrast, minimal-screening-burden single-step greedy optimization over the GB1 fitness landscape reached the global maximum just 1.2% of the time; ftMLDE matching this minimal screening burden (80 total variants) achieved the global optimum up to 9.6% of the time with a 49% higher expected maximum fitness achieved. To facilitate further development of MLDE, we present the MLDE software package (https://github.com/fhalab/MLDE), which is designed for use by protein engineers without computational or machine learning expertise.
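    The path-dependence described in this abstract can be illustrated with a minimal sketch. It is not the MLDE package or the authors' protocol: the two-site, two-letter landscape `TOY`, its fitness values, and the helper names `greedy_walk` and `exhaustive_best` are all hypothetical and chosen only to show how epistasis makes the outcome of a single-step greedy walk depend on the order in which sites are mutated. Ranking the whole library stands in for path-independent screening; MLDE would rank by model predictions rather than by true fitness.

```python
from itertools import product

# Hypothetical toy landscape (assumption, not GB1 data): two sites, alphabet "AB".
# The double mutant "BB" is the global maximum, but mutating site 0 alone is harmful.
TOY = {"AA": 1.0, "AB": 1.2, "BA": 0.9, "BB": 3.0}


def greedy_walk(start, site_order, alphabet, fitness):
    """Single-step greedy walk: saturate one site per round and fix the best residue."""
    current = start
    for site in site_order:
        # All variants that differ from `current` at this site only (including itself).
        candidates = [current[:site] + aa + current[site + 1:] for aa in alphabet]
        current = max(candidates, key=fitness)
    return current


def exhaustive_best(length, alphabet, fitness):
    """Path-independent reference: rank every variant in the combinatorial library."""
    library = ("".join(p) for p in product(alphabet, repeat=length))
    return max(library, key=fitness)


walk_01 = greedy_walk("AA", (0, 1), "AB", TOY.get)  # site 0 first, then site 1
walk_10 = greedy_walk("AA", (1, 0), "AB", TOY.get)  # site 1 first, then site 0
best = exhaustive_best(2, "AB", TOY.get)
print(walk_01, walk_10, best)
```

    On this toy landscape the two site orders end at different variants, while the full-library ranking is independent of any order.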

    Machine learning-assisted directed protein evolution with combinatorial libraries

    To reduce experimental effort associated with directed protein evolution and to explore the sequence space encoded by mutating multiple positions simultaneously, we incorporate machine learning in the directed evolution workflow. Combinatorial sequence space can be quite expensive to sample experimentally, but machine learning models trained on tested variants provide a fast method for testing sequence space computationally. We validate this approach on a large published empirical fitness landscape for human GB1 binding protein, demonstrating that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches. We then provide an example application in evolving an enzyme to produce each of the two possible product enantiomers (stereodivergence) of a new-to-nature carbene Si-H insertion reaction. The approach predicted libraries enriched in functional enzymes and fixed seven mutations in two rounds of evolution to identify variants for selective catalysis with 93% and 79% ee. By greatly increasing throughput with in silico modeling, machine learning enhances the quality and diversity of sequence solutions for a protein engineering problem.
    Comment: Corrected best S-selective variant sequence in Figure 4. Corrected less R-selective variant sequences from Round II Input library in Table 2 and Supp Table 4. Corrections may also be found on PNAS version https://www.pnas.org/content/early/2019/12/26/192177011

    Diversity-Oriented Enzymatic Synthesis of Cyclopropane Building Blocks

    While biocatalysis is increasingly incorporated into drug development pipelines, it is less commonly used in the early stages of drug discovery. By engineering a protein to produce a chiral motif with a derivatizable functional handle, biocatalysts can be used to help generate diverse building blocks for drug discovery. Here we show the engineering of two variants of Rhodothermus marinus nitric oxide dioxygenase (RmaNOD) to catalyze the formation of cis- and trans-diastereomers of a pinacolboronate-substituted cyclopropane, which can be readily derivatized to generate diverse stereopure cyclopropane building blocks.

    Machine learning-assisted directed protein evolution with combinatorial libraries

    To reduce experimental effort associated with directed protein evolution and to explore the sequence space encoded by mutating multiple positions simultaneously, we incorporate machine learning into the directed evolution workflow. Combinatorial sequence space can be quite expensive to sample experimentally, but machine-learning models trained on tested variants provide a fast method for testing sequence space computationally. We validated this approach on a large published empirical fitness landscape for human GB1 binding protein, demonstrating that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches. We then provide an example application in evolving an enzyme to produce each of the two possible product enantiomers (i.e., stereodivergence) of a new-to-nature carbene Si–H insertion reaction. The approach predicted libraries enriched in functional enzymes and fixed seven mutations in two rounds of evolution to identify variants for selective catalysis with 93% and 79% ee (enantiomeric excess). By greatly increasing throughput with in silico modeling, machine learning enhances the quality and diversity of sequence solutions for a protein engineering problem.

    Regional Brain Responses in Nulliparous Women to Emotional Infant Stimuli

    Infant cries and facial expressions influence social interactions and elicit caretaking behaviors from adults. Recent neuroimaging studies suggest that neural responses to infant stimuli involve brain regions that process rewards. However, these studies have yet to investigate individual differences in tendencies to engage or withdraw from motivationally relevant stimuli. To investigate this, we used event-related fMRI to scan 17 nulliparous women. Participants were presented with novel infant cries of two distress levels (low and high) and unknown infant faces of varying affect (happy, sad, and neutral) in a randomized, counter-balanced order. Brain activation was subsequently correlated with scores on the Behavioral Inhibition System/Behavioral Activation System scale. Infant cries activated bilateral superior and middle temporal gyri (STG and MTG) and precentral and postcentral gyri. Activation was greater in bilateral temporal cortices for low- relative to high-distress cries. Happy relative to neutral faces activated the ventral striatum, caudate, ventromedial prefrontal, and orbitofrontal cortices. Sad versus neutral faces activated the precuneus, cuneus, and posterior cingulate cortex, and behavioral activation drive correlated with occipital cortical activations in this contrast. Behavioral inhibition correlated with activation in the right STG for high- and low-distress cries relative to pink noise. Behavioral drive correlated inversely with putamen, caudate, and thalamic activations for the comparison of high-distress cries to pink noise. Reward-responsiveness correlated with activation in the left precentral gyrus during the perception of low-distress cries relative to pink noise. Our findings indicate that infant cry stimuli elicit activations in areas implicated in auditory processing and social cognition. Happy infant faces may be encoded as rewarding, whereas sad faces activate regions associated with empathic processing. Differences in motivational tendencies may modulate neural responses to infant cues.

    Valuing Health Gain from Composite Response Endpoints for Multisystem Diseases

    Objectives: This study aimed to demonstrate how to estimate the value of health gain after patients with a multisystem disease achieve a condition-specific composite response endpoint. Methods: Data from patients treated in routine practice with an exemplar multisystem disease (systemic lupus erythematosus) were extracted from a national register (British Isles Lupus Assessment Group Biologics Register). Two bespoke composite response endpoints (Major Clinical Response and Improvement) were developed in advance of this study. Difference-in-differences regression compared health utility values (3-level version of EQ-5D; UK tariff) over 6 months for responders and nonresponders. Bootstrapped regression estimated the incremental quality-adjusted life-years (QALYs), probability of QALY gain after achieving the response criteria, and population monetary benefit of response. Results: Within the sample (n = 171), 18.2% achieved Major Clinical Response and 49.1% achieved Improvement at 6 months. Incremental health utility values were 0.0923 for Major Clinical Response and 0.0454 for Improvement. Expected incremental QALY gain at 6 months was 0.020 for Major Clinical Response and 0.012 for Improvement. Probability of QALY gain after achieving the response criteria was 77.6% for Major Clinical Response and 72.7% for Improvement. Population monetary benefit of response was £1 106 458 for Major Clinical Response and £649 134 for Improvement. Conclusions: Bespoke composite response endpoints are becoming more common to measure treatment response for multisystem diseases in trials and observational studies. Health technology assessment agencies face a growing challenge to establish whether these endpoints correspond with improved health gain. Health utility values can generate this evidence to enhance the usefulness of composite response endpoints for health technology assessment, decision making, and economic evaluation.
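    The difference-in-differences comparison named in the Methods can be sketched in its simplest two-group, two-period form. This is only an illustration under stated assumptions: the utility values below are invented toy numbers (not register data), the helper names `did_estimate` and `qaly_gain_linear` are hypothetical, and the study itself used regression with bootstrapping rather than this raw comparison of means. The QALY helper additionally assumes the utility gain accrues linearly from zero over the period, which is an assumption of this sketch and not a stated feature of the study.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def did_estimate(resp_pre, resp_post, nonresp_pre, nonresp_post):
    """Change in responders minus change in nonresponders (difference-in-differences)."""
    responder_change = mean(resp_post) - mean(resp_pre)
    nonresponder_change = mean(nonresp_post) - mean(nonresp_pre)
    return responder_change - nonresponder_change


def qaly_gain_linear(utility_gain, years):
    """QALY gain if the utility difference grows linearly from 0 (triangle area)."""
    return 0.5 * utility_gain * years


# Invented toy utilities on a 0-1 scale (assumption, for illustration only).
effect = did_estimate(
    resp_pre=[0.50, 0.60], resp_post=[0.65, 0.75],
    nonresp_pre=[0.50, 0.60], nonresp_post=[0.55, 0.65],
)
qaly = qaly_gain_linear(effect, years=0.5)  # 6-month horizon
print(round(effect, 4), round(qaly, 4))
```

    Subtracting the nonresponder change removes any shift in utility that both groups share over the 6 months, so the remaining difference is the part attributed to achieving the response endpoint.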