
    Enhancing reaction-based de novo design using a multi-label reaction class recommender

    Reaction-based de novo design refers to the in silico generation of novel chemical structures by combining reagents using structural transformations derived from known reactions. The driver for using reaction-based transformations is to increase the likelihood that the designed molecules are synthetically accessible. We have previously described a reaction-based de novo design method based on reaction vectors, which are transformation rules that are encoded automatically from reaction databases. A limitation of reaction vectors is that they account only for structural changes that occur at the core of a reaction, and they do not consider the presence of competing functionalities that can compromise the reaction outcome. Here, we present the development of a Reaction Class Recommender to enhance the reaction vector framework. The recommender is intended to be used as a filter on the reaction vectors that are applied during de novo design, reducing the combinatorial explosion of in silico molecules produced while limiting the generated structures to those most likely to be synthesisable. The recommender has been validated using an external data set extracted from the recent medicinal chemistry literature and in two simulated de novo design experiments. Results suggest that the use of the recommender drastically reduces the number of solutions explored by the algorithm while preserving the chance of finding relevant solutions and increasing the global synthetic accessibility of the designed molecules.
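The filtering role described in this abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: the `ReactionVector` type, the class names, the scores, and the 0.5 threshold are all hypothetical, and the multi-label recommender is assumed to have already produced one score per reaction class for the starting reagent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReactionVector:
    """Hypothetical stand-in for an encoded transformation rule."""
    name: str
    reaction_class: str


def filter_vectors(
    vectors: List[ReactionVector],
    class_scores: Dict[str, float],
    threshold: float = 0.5,
) -> List[ReactionVector]:
    """Keep only the vectors whose reaction class is recommended.

    class_scores is assumed to come from a multi-label model, so several
    classes may pass the threshold for the same reagent.
    """
    recommended = {c for c, score in class_scores.items() if score >= threshold}
    return [v for v in vectors if v.reaction_class in recommended]


# Illustrative data only: names and scores are invented.
vectors = [
    ReactionVector("rv_001", "amide_coupling"),
    ReactionVector("rv_002", "suzuki_coupling"),
    ReactionVector("rv_003", "reductive_amination"),
]
scores = {"amide_coupling": 0.91, "suzuki_coupling": 0.12, "reductive_amination": 0.64}
kept = filter_vectors(vectors, scores)
```

Applying the filter before structure generation is what shrinks the search space in this sketch: every vector that is dropped removes a whole branch of candidate products.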

    Development and application of a data-driven reaction classification model: comparison of an electronic lab notebook and the medicinal chemistry literature

    Reaction classification has often been considered an important task for many different applications, and has traditionally been accomplished using hand-coded rule-based approaches. However, the availability of large collections of reactions enables data-driven approaches to be developed. We present the development and validation of a 336-class machine learning-based classification model integrated within a Conformal Prediction (CP) framework in order to associate reaction class predictions with confidence estimations. We also propose a data-driven approach for 'dynamic' reaction fingerprinting to maximise the effectiveness of reaction encoding, and we develop a novel reaction classification system that organises labels in four hierarchical levels (SHREC: Sheffield Hierarchical REaction Classification). We show that the performance of the CP-augmented model can be improved by defining confidence thresholds to detect predictions that are less likely to be false. For example, the external validation of the model reports 95% of predictions as correct by filtering out less than 15% of the uncertain classifications. The application of the model is demonstrated by classifying two reaction datasets: one extracted from an industrial ELN and the other from the medicinal chemistry literature. We show how confidence estimations and class compositions across different levels of information can be used to gain immediate insights into the nature of reaction collections and hidden relationships between reaction classes.
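The idea of attaching a confidence estimate to each class prediction and then thresholding on it can be sketched as follows. This is a generic inductive conformal predictor, not necessarily the variant used in the paper: the nonconformity score (one minus the predicted class probability), the calibration values, the class names, and the thresholds are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def p_values(calibration_scores: List[float],
             class_probs: Dict[str, float]) -> Dict[str, float]:
    """Conformal p-value for every candidate class.

    Nonconformity is assumed to be 1 - predicted probability; the p-value is
    the fraction of calibration scores at least as large as the test score.
    """
    n = len(calibration_scores)
    result = {}
    for label, prob in class_probs.items():
        alpha = 1.0 - prob
        at_least_as_strange = sum(1 for s in calibration_scores if s >= alpha)
        result[label] = (at_least_as_strange + 1) / (n + 1)
    return result


def best_label_and_confidence(pvals: Dict[str, float]) -> Tuple[str, float]:
    """Confidence is one minus the second-largest p-value."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1], reverse=True)
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    return ranked[0][0], 1.0 - second


def accept(pvals: Dict[str, float],
           min_confidence: float) -> Optional[Tuple[str, float]]:
    """Return the prediction only if it clears the confidence threshold."""
    label, confidence = best_label_and_confidence(pvals)
    return (label, confidence) if confidence >= min_confidence else None


# Illustrative numbers only.
calibration = [0.1, 0.2, 0.3, 0.4]
probs = {"amide_coupling": 0.85, "suzuki_coupling": 0.10, "other": 0.05}
pv = p_values(calibration, probs)
```

Raising `min_confidence` discards more of the uncertain classifications, which is the trade-off the abstract reports between the fraction of predictions filtered out and the fraction that are correct.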

    RENATE: a pseudo-retrosynthetic tool for synthetically accessible de novo design

    Reaction-based de novo design refers to the generation of synthetically accessible molecules using transformation rules extracted from known reactions in the literature. In this context, we have previously described the extraction of reaction vectors from a reaction database and their coupling with a structure generation algorithm for the generation of novel molecules from a starting material. An issue when designing molecules from a starting material is the combinatorial explosion of possible product molecules that can be generated, especially for multistep syntheses. Here, we present the development of RENATE, a reaction-based de novo design tool, which is based on a pseudo-retrosynthetic fragmentation of a reference ligand and an inside-out approach to de novo design. The reference ligand is fragmented; each fragment is used to search for similar fragments as building blocks; the building blocks are combined into products using reaction vectors; and a synthetic route is suggested for each product molecule. The RENATE methodology is presented, followed by a retrospective validation to recreate a set of approved drugs. Results show that RENATE can generate structures that are very similar or even identical to the corresponding input drugs, hence validating the fragmentation, search, and design heuristics implemented in the tool.
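The search and combination steps listed in this abstract can be illustrated with a small Python sketch. It is only a toy under stated assumptions: fragments and building blocks are represented as sets of abstract features, Tanimoto similarity is swapped in as the similarity measure (the abstract does not name one), and the function names, library contents, and 0.5 cut-off are hypothetical. Fragmentation itself and the application of reaction vectors are not modelled.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_building_blocks(
    fragment: FrozenSet[str],
    library: Dict[str, FrozenSet[str]],
    min_similarity: float,
) -> List[str]:
    """Names of library entries similar enough to the fragment, best first."""
    hits = [(name, tanimoto(fragment, feats)) for name, feats in library.items()]
    hits = [h for h in hits if h[1] >= min_similarity]
    return [name for name, _ in sorted(hits, key=lambda h: h[1], reverse=True)]


def candidate_combinations(
    fragments: List[FrozenSet[str]],
    library: Dict[str, FrozenSet[str]],
    min_similarity: float = 0.5,
) -> List[Tuple[str, ...]]:
    """One building block per fragment; each tuple is a candidate to combine."""
    per_fragment = [
        similar_building_blocks(f, library, min_similarity) for f in fragments
    ]
    return list(product(*per_fragment))


# Illustrative data only.
library = {
    "bb1": frozenset({"a", "b", "c"}),
    "bb2": frozenset({"a", "b"}),
    "bb3": frozenset({"x", "y"}),
}
fragments = [frozenset({"a", "b", "c"}), frozenset({"x", "y", "z"})]
combos = candidate_combinations(fragments, library)
```

Restricting each position to building blocks that resemble a fragment of the reference ligand keeps the number of combinations bounded by the product of the per-fragment hit lists, rather than by the size of the whole library.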