3 research outputs found

    Prediction of mefenamic acid crystal shape by random forest classification

    Purpose: This study describes the development and application of machine-learning models to the prediction of the crystal shape of mefenamic acid recrystallized from organic solvents. Method: Mefenamic acid crystals were grown in 30 different solvents and categorized according to crystal shape as either polyhedral or needle. A total of 87 random forest classification models were trained on this data. Initially, 3 models were built to assess the efficacy of this method. These models were trained on datasets containing Molecular Operating Environment (MOE) descriptors for the solvents and crystal shape labels obtained by visual inspection of microscope images. The subsequent 84 models tested prediction accuracy for individual solvents that were sequentially excluded from the model training sets. Three different sets of MOE descriptors (one that contained all available 2D descriptors, a second that focused on molecular structure, and a third that focused on physical properties) were investigated to determine which set resulted in the highest overall prediction accuracy across the different solvents. Results: For the initial three models, the highest prediction accuracy of crystal shape observed was 93.5% as assessed by 4-fold cross-validation. When solvents were sequentially excluded from training data, 32 out of 84 models predicted the shape of mefenamic acid crystals for the excluded solvent with 100% accuracy and a further 21 models had prediction accuracies from 50-100%. Reducing the feature set to only solvent physical property descriptors and supersaturations resulted in higher overall prediction accuracies than the models using atom count, bond count, and pharmacophore descriptors and the models using all solvent molecular descriptors. 
For the 8 solvents on which the models performed poorly (<50% accuracy), further characterisation of crystals grown in these solvents resulted in the discovery of a new mefenamic acid solvate. All other crystals were the previously known form I. Conclusion: Random forest classification models using solvent physical property descriptors can reliably predict crystal morphologies for mefenamic acid crystals grown in 20 out of the 28 solvents included in this work. Poor prediction accuracies for the remaining 8 solvents may indicate that factors not adequately covered by the training data make these solvents outliers.
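
The sequential-exclusion scheme described in the abstract above amounts to leave-one-solvent-out validation: each solvent is held out in turn, a model is trained on the rest, and accuracy is measured only on the excluded solvent. The sketch below illustrates just the splitting and scoring logic using the Python standard library. Everything in it is an assumption for illustration: the record fields, the placeholder solvent names, the toy labels, and the majority-class stand-in for the classifier (the study used random forest models, which could be substituted via a library such as scikit-learn). The count of 84 models would be consistent with 28 solvents times three descriptor sets, though the abstract does not state that breakdown.

```python
from collections import Counter, defaultdict


def leave_one_solvent_out(records):
    """Yield (held_out_solvent, train, test), holding out one solvent at a time."""
    by_solvent = defaultdict(list)
    for rec in records:
        by_solvent[rec["solvent"]].append(rec)
    for solvent in sorted(by_solvent):
        test = by_solvent[solvent]
        train = [r for r in records if r["solvent"] != solvent]
        yield solvent, train, test


def fit_majority(train):
    """Placeholder model: always predicts the most common training label.

    A stand-in only; a real workflow would fit a random forest on
    solvent descriptors here.
    """
    label = Counter(r["shape"] for r in train).most_common(1)[0][0]
    return lambda rec: label


def held_out_accuracy(predict, test):
    """Fraction of held-out records whose shape is predicted correctly."""
    correct = sum(1 for r in test if predict(r) == r["shape"])
    return correct / len(test)


# Hypothetical toy records, NOT data from the study.
records = [
    {"solvent": "solvent_A", "supersaturation": 1.1, "shape": "needle"},
    {"solvent": "solvent_A", "supersaturation": 1.5, "shape": "needle"},
    {"solvent": "solvent_B", "supersaturation": 1.1, "shape": "polyhedral"},
    {"solvent": "solvent_B", "supersaturation": 1.5, "shape": "polyhedral"},
    {"solvent": "solvent_C", "supersaturation": 1.1, "shape": "needle"},
    {"solvent": "solvent_C", "supersaturation": 1.5, "shape": "polyhedral"},
]

# One accuracy value per excluded solvent, mirroring the per-solvent reporting.
results = {}
for solvent, train, test in leave_one_solvent_out(records):
    model = fit_majority(train)
    results[solvent] = held_out_accuracy(model, test)

print(results)
```

Scoring each excluded solvent separately is what makes it possible to flag individual solvents as outliers, as the abstract does for the poorly predicted ones.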

    Crystallisation thermodynamics and random forest classification for the prediction of crystallisation outcomes

    Crystallisation is one of the key unit operations in the pharmaceutical industry. A wide range of crystal attributes affects the bulk particle properties of a crystalline material as well as its downstream manufacturability. Therefore, understanding and controlling the crystallisation process to achieve the desired quality attributes are of significant interest. This thesis investigated the potential of machine learning techniques for predicting crystallisation outcomes, focusing on the shapes of mefenamic acid (MFA) crystals grown from various organic solvents and on the solvated structures of small organic molecules as represented by powder X-ray diffraction (PXRD) patterns. The solubility and nucleation of MFA were also explored in an attempt to understand the thermodynamic and kinetic interactions during the crystallisation of MFA. It was observed that the nucleation of MFA in methanol, ethanol, 2-propanol, 2-butanol, acetone, and tetrahydrofuran (THF) follows a two-step mechanism, in which the crystals nucleate within metastable clusters. The comparison between the surface free energy determined from nucleation rates and that calculated by Turnbull’s rule also suggests that the crystals nucleated faster via two-step nucleation than classical nucleation theory (CNT) would predict, owing to the smaller nucleation barrier. For the machine-learning prediction of crystallisation outcomes, the results showed that random forest classification models using solvent physical property descriptors can reliably predict crystal morphologies for MFA crystals grown in 20 out of the 28 solvents included in this work. Further characterisation of the crystals grown in the remaining 8 solvents with poor model performance also resulted in the discovery of a new THF solvated form of MFA crystals. 
Machine learning was also investigated for predicting the solvated form of small organic molecules from PXRD patterns derived from the Cambridge Structural Database (CSD). The best model in this study achieved a prediction accuracy of 68.74%. These findings demonstrate the potential of machine learning and data mining to assist decision-making in crystallisation while reducing the material and time spent during process development.

    Prediction of mefenamic acid crystal shape by random forest classification

    Research problem: Crystal shape is one of the key attributes affecting the bulk particle properties of a crystalline material as well as its downstream manufacturability [1]. However, the prediction of experimental crystal shapes remains very challenging. This research aims to explore the potential application of machine learning algorithms to solve this problem.