3 research outputs found

    Computational modelling of enzyme activity to speed up biocatalyst redesign

    No full text
    Previously held under moratorium in Chemistry department (GSK) from 28/09/2018 until 18/06/2021.
    Biocatalysis is increasingly used for the synthesis of pharmaceutical intermediates. To expand the applicability of these methods, however, current timelines for biocatalyst optimization need to be shortened. Quantum Mechanics/Molecular Mechanics (QM/MM) methods allow, in principle, the accurate evaluation of enzymatic activities and thus offer an interesting option for the in silico pre-screening of variants. Standard QM/MM methods are computationally expensive, however, so practical large-scale applications require approximations. In this work, a QM/MM-based protocol for hotspot identification has been developed and tested. An internal protocol for accurate QM/MM calculations was established and validated through a mechanistic study of aldose reductase. This study highlights the importance of parameters, such as the size of the QM region or the choice of the QM method, in differentiating between competing mechanisms and consequently in accurately determining the role of the environment in the energy profile of the reaction. With a validated QM/MM methodology in hand, an enzymatic amide bond formation was used as a case study to elaborate and test a protocol for hotspot identification. In the resulting protocol, three major approximations were introduced to speed up calculations: starting from a single snapshot, neglecting possible reorganization following the mutation, and focusing mainly on electrostatic effects. Two types of charge modification protocol were investigated: charge deletion and charge introduction. From this study, one specific hotspot was identified. In a further study, a homology modelling strategy was used to cope with the absence of an experimentally determined structure, a frequent issue in enzyme design. The previously established protocol was re-tested starting from the homology model, and hotspots were identified. Finally, solvent-free calculations were evaluated as an option to accelerate the calculations further. The solvent-free studies gave encouraging results, as similar hotspots were obtained relative to the water- or toluene-solvated models. Nonetheless, significant variations do exist between different solvents, and further studies are necessary to validate the use of this approximation in a wider context.
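The charge-deletion idea in the abstract above can be illustrated with a much simpler classical stand-in. The sketch below is not the QM/MM protocol of the thesis: it assumes fixed point charges for the reacting region at the reactant and transition state of a single snapshot, and estimates how the barrier would shift if one residue's charges were set to zero, using only Coulomb's law. The function names (`coulomb_energy`, `charge_deletion_scan`), the toy coordinates and the charges are all hypothetical.

```python
import numpy as np

# Coulomb constant in kcal*Angstrom/(mol*e^2); converts q_i*q_j/r to kcal/mol.
COULOMB_KCAL = 332.0637


def coulomb_energy(qm_xyz, qm_q, mm_xyz, mm_q):
    """Point-charge interaction energy between two groups of atoms (kcal/mol)."""
    qm_xyz, mm_xyz = np.asarray(qm_xyz, float), np.asarray(mm_xyz, float)
    dist = np.linalg.norm(qm_xyz[:, None, :] - mm_xyz[None, :, :], axis=-1)
    return float(COULOMB_KCAL * np.sum(np.outer(qm_q, mm_q) / dist))


def charge_deletion_scan(xyz, q_reactant, q_ts, residues):
    """Estimate the barrier shift caused by deleting each residue's charges.

    Assumptions (illustrative only): one frozen geometry for reactant and
    transition state, no structural relaxation, electrostatics only.
    A negative value means deleting the charges lowers the barrier.
    """
    effects = {}
    for name, (res_xyz, res_q) in residues.items():
        e_ts = coulomb_energy(xyz, q_ts, res_xyz, res_q)
        e_r = coulomb_energy(xyz, q_reactant, res_xyz, res_q)
        # The residue contributes (e_ts - e_r) to the barrier; deletion removes it.
        effects[name] = -(e_ts - e_r)
    return effects


# Toy system: two reacting atoms whose charge separation grows at the TS.
xyz = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
q_reactant = [-0.2, 0.2]
q_ts = [-0.8, 0.8]
residues = {
    "ASP": ([[-4.0, 0.0, 0.0]], [-1.0]),  # anion next to the developing negative charge
    "LYS": ([[0.0, 4.0, 0.0]], [1.0]),    # cation next to the developing negative charge
}
effects = charge_deletion_scan(xyz, q_reactant, q_ts, residues)
for name, shift in sorted(effects.items(), key=lambda item: item[1]):
    print(f"{name}: {shift:+.2f} kcal/mol")
```

In this toy case the residue with the most negative shift would be flagged as a candidate hotspot, since neutralising it is predicted to lower the barrier. A real screen would recompute the QM/MM energies with the modified charges rather than rely on fixed point charges.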

    Construction of balanced, chemically dissimilar training, validation and test sets for machine learning on molecular datasets

    No full text
    When preparing training, validation and test sets for machine learning on molecular datasets, it is desirable to combine two requirements: 1) robustness, i.e. making a test set that is chemically dissimilar from the training set; 2) data balance, i.e. ensuring that the proportion of data points and the distribution of data labels (categorical) or data values (continuous) are as homogeneous as possible among the sets, for each individual property to be modelled, while partitioning the overall set of compounds as required. Recent literature shows that meeting both requirements simultaneously can be very difficult. This is especially true for multi-task learning, but it also holds for single-task learning if one additionally aims to balance the distribution of data labels or values. In this work we present a method that resolves this issue by first carrying out a chemistry-guided clustering of the initial dataset to ensure the separation of chemical matter, and subsequently applying linear programming to select the lists of clusters that, once assembled into the final sets, result in the best possible data balance.
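The second step described above, selecting whole clusters so that the resulting sets are balanced, can be sketched as a small mixed-integer linear programme. This is a minimal illustration under stated assumptions, not the authors' formulation: it handles a single binary task, assumes the chemistry-guided clustering has already been done (only per-cluster sizes and positive counts are passed in), and weights size and label deviations equally. The function name `assign_clusters` and the toy numbers are hypothetical; the solver call is SciPy's `scipy.optimize.milp`.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def assign_clusters(sizes, positives, fractions):
    """Assign whole clusters to sets so that set sizes and positive counts
    are as close as possible to the requested fractions."""
    sizes = np.asarray(sizes, dtype=float)
    positives = np.asarray(positives, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    n_clusters, n_sets = len(sizes), len(fractions)
    n_x = n_clusters * n_sets   # binary x[c, s], stored at index c * n_sets + s
    n_var = n_x + 2 * n_sets    # plus one deviation variable per (quantity, set)

    # Objective: minimise the sum of absolute deviations from the targets.
    cost = np.zeros(n_var)
    cost[n_x:] = 1.0

    rows, lower, upper = [], [], []

    # Each cluster goes into exactly one set, so clusters are never split.
    for c in range(n_clusters):
        row = np.zeros(n_var)
        row[c * n_sets:(c + 1) * n_sets] = 1.0
        rows.append(row)
        lower.append(1.0)
        upper.append(1.0)

    # Linearise |total_s - target_s| <= dev_s for sizes and for positives.
    for k, weights in enumerate((sizes, positives)):
        targets = fractions * weights.sum()
        for s in range(n_sets):
            dev = n_x + k * n_sets + s
            total = np.zeros(n_var)
            total[s:n_x:n_sets] = weights
            over = total.copy()
            over[dev] = -1.0    # total - dev <= target
            rows.append(over)
            lower.append(-np.inf)
            upper.append(targets[s])
            under = total.copy()
            under[dev] = 1.0    # total + dev >= target
            rows.append(under)
            lower.append(targets[s])
            upper.append(np.inf)

    integrality = np.zeros(n_var)
    integrality[:n_x] = 1       # assignment variables are integer (0 or 1)
    upper_bounds = np.full(n_var, np.inf)
    upper_bounds[:n_x] = 1.0

    result = milp(
        c=cost,
        constraints=LinearConstraint(np.array(rows), lower, upper),
        integrality=integrality,
        bounds=Bounds(np.zeros(n_var), upper_bounds),
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:n_x].reshape(n_clusters, n_sets).argmax(axis=1)


# Toy input: eight clusters, 100 compounds, 50 positives, 80/10/10 split.
sizes = [10, 10, 10, 10, 20, 20, 10, 10]
positives = [2, 8, 5, 5, 10, 10, 5, 5]
assignment = assign_clusters(sizes, positives, fractions=[0.8, 0.1, 0.1])
set_sizes = np.bincount(assignment, weights=sizes, minlength=3)
set_positives = np.bincount(assignment, weights=positives, minlength=3)
print(assignment, set_sizes, set_positives)
```

Because clusters are assigned whole, chemically similar compounds stay in the same set, while the deviation variables push each set toward its target share of compounds and of positive labels. A multi-task version would add one block of deviation variables per task.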