35 research outputs found
Recommended from our members
Imputation versus prediction: applications in machine learning for drug discovery
Imputation is a powerful statistical method that is distinct from the predictive modelling techniques more commonly used in drug discovery. Imputation uses sparse experimental data in an incomplete dataset to predict missing values by leveraging correlations between experimental assays. This contrasts with quantitative structure–activity relationship methods that use only descriptor–assay correlations. We summarize three recent imputation strategies – heterogeneous deep imputation, assay profile methods and matrix factorization – and compare these with quantitative structure–activity relationship methods, including deep learning, in drug discovery settings. We comment on the value added by imputation methods when used in an ongoing project and find that imputation produces stronger models, earlier in the project, over activity and absorption, distribution, metabolism and elimination end points.
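As a rough illustration of what "leveraging correlations between experimental assays" can mean, the sketch below fills a missing value in one assay from a line fitted against a second assay. It is a deliberately simple stand-in, not any of the three strategies named in the abstract, and the assay names and numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_from_assay(table, source, target):
    """Fill missing `target` values from `source` using rows where both were measured."""
    paired = [(row[source], row[target]) for row in table
              if row[source] is not None and row[target] is not None]
    slope, intercept = fit_line([p[0] for p in paired], [p[1] for p in paired])
    filled = []
    for row in table:
        new_row = dict(row)  # copy so the input table is left untouched
        if row[target] is None and row[source] is not None:
            new_row[target] = slope * row[source] + intercept
        filled.append(new_row)
    return filled


# Hypothetical sparse data: the last compound has no assay_b measurement.
table = [
    {"assay_a": 1.0, "assay_b": 2.0},
    {"assay_a": 2.0, "assay_b": 4.0},
    {"assay_a": 3.0, "assay_b": 6.0},
    {"assay_a": 4.0, "assay_b": None},
]
completed = impute_from_assay(table, "assay_a", "assay_b")
```

A descriptor-based QSAR model would instead predict `assay_b` from computed structural descriptors alone, without using the measured `assay_a` value.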
Deep imputation on large‐scale drug discovery data
More accurate predictions of the biological properties of chemical compounds would guide the selection and design of new compounds in drug discovery and help to address the enormous cost and low success rate of pharmaceutical R&D. However, this domain presents a significant challenge for AI methods due to the sparsity of compound data and the noise inherent in results from biological experiments. In this paper, we demonstrate how data imputation using deep learning provides substantial improvements over quantitative structure-activity relationship (QSAR) machine learning models that are widely applied in drug discovery. We present the largest-to-date successful application of deep-learning imputation to datasets which are comparable in size to the corporate data repository of a pharmaceutical company (678,994 compounds by 1166 endpoints). We demonstrate this improvement for three areas of practical application linked to distinct use cases: (i) target activity data compiled from a range of drug discovery projects, (ii) a high-value and heterogeneous dataset covering complex absorption, distribution, metabolism and elimination properties, and (iii) high-throughput screening data, testing the algorithm’s limits on early-stage noisy and very sparse data. Achieving median coefficients of determination, R², of 0.69, 0.36 and 0.43 respectively across these applications, the deep learning imputation method offers an unambiguous improvement over random forest QSAR methods, which achieve median R² values of 0.28, 0.19 and 0.23 respectively. We also demonstrate that robust estimates of the uncertainties in the predicted values correlate strongly with the accuracies in prediction, enabling greater confidence in decision-making based on the imputed values. Optibrium Ltd, Intellegens Ltd, Takeda, Royal Societ
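For readers unfamiliar with the metric, the coefficient of determination is R² = 1 − SS_res / SS_tot, and a median can then be taken over endpoints. The sketch below shows that calculation on made-up numbers; the endpoint names and values are hypothetical and are not data from the paper.

```python
from statistics import median


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical (observed, predicted) pairs for three endpoints.
endpoints = {
    "endpoint_a": ([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]),
    "endpoint_b": ([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5]),  # always predicts the mean
    "endpoint_c": ([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]),  # perfect prediction
}
per_endpoint = {name: r_squared(obs, pred) for name, (obs, pred) in endpoints.items()}
median_r2 = median(per_endpoint.values())
```

A model that always predicts the mean of the observations scores 0 and a perfect model scores 1, which is a useful scale for reading the medians quoted in the abstract.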
First principles methods using CASTEP.
Abstract. The CASTEP code for first principles electronic structure calculations will be described. A brief, nontechnical overview will be given and some of the features and capabilities highlighted. Some features which are unique to CASTEP will be described and near-future development plans outlined
Mathematical analysis of the Escherichia coli chemotaxis signalling pathway
We undertake a detailed mathematical analysis of a recent nonlinear ordinary differential equation (ODE) model describing the chemotactic signalling cascade within an Escherichia coli cell. The model includes a detailed description of the cell signalling cascade and an average approximation of the receptor activity. A steady-state stability analysis reveals that the system exhibits one positive real steady state, which is shown to be asymptotically stable. Given the occurrence of a negative feedback between phosphorylated CheB (CheB-P) and the receptor state, we ask under what conditions the system may exhibit oscillatory-type behaviour. A detailed analysis of parameter space reveals that whilst variation in kinetic rate parameters within known biological limits is unlikely to lead to such behaviour, changes in the total concentration of the signalling proteins do. We postulate that experimentally observed overshoot behaviour can actually be described by damped oscillatory dynamics and consider the relationship between overshoot amplitude, total cell protein concentration and the magnitude of the external ligand stimulus. Model reductions of the full ODE model allow us to understand the link between phosphorylation events and the negative feedback between CheB-P and receptor methylation, as well as elucidate why some mathematical models exhibit overshoot and others do not. Our manuscript closes by discussing intercell variability of total protein concentration as a means of ensuring the overall survival of a population as cells are subjected to different environments
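The link between asymptotic stability and damped oscillation can be illustrated on a linearised two-variable system: a steady state is asymptotically stable when every eigenvalue of the Jacobian has a negative real part, and the approach is oscillatory when the eigenvalues are complex. The sketch below applies this to hypothetical 2×2 Jacobians; the chemotaxis model in the abstract has more variables, and these matrices are not taken from it.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from its trace and determinant."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


def classify_steady_state(a, b, c, d, tol=1e-12):
    """Label the local behaviour near a steady state of a linearised system."""
    lam1, lam2 = eigenvalues_2x2(a, b, c, d)
    stable = lam1.real < 0 and lam2.real < 0
    oscillatory = abs(lam1.imag) > tol
    if stable:
        return "damped oscillation" if oscillatory else "monotonic decay"
    return "unstable"


# Hypothetical Jacobians: opposite-signed off-diagonal terms mimic negative feedback.
with_feedback = classify_steady_state(-1.0, -2.0, 2.0, -1.0)   # eigenvalues -1 +/- 2i
without_feedback = classify_steady_state(-1.0, 0.0, 0.0, -2.0)  # eigenvalues -1 and -2
```

In this toy setting a perturbed trajectory overshoots the steady state only in the first case, which mirrors the abstract's suggestion that overshoot can be read as damped oscillatory dynamics.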
Digital reconstruction of the inner ear of Leptictidium auderiense (Leptictida, Mammalia) and North American leptictids reveals new insight into leptictidan locomotor agility
Leptictida are basal Paleocene to Oligocene eutherians from Europe and North America comprising species with highly specialized postcranial features including elongated hind limbs. Among them, the European Leptictidium was probably a bipedal runner or jumper. Because the semicircular canals of the inner ear are involved in detecting angular acceleration of the head, their morphometry can be used as a proxy to elucidate the agility in fossil mammals. Here we provide the first insight into inner ear anatomy and morphometry of Leptictida based on high-resolution computed tomography of a new specimen of Leptictidium auderiense from the middle Eocene Messel Pit (Germany) and specimens of the North American Leptictis and Palaeictops. The general morphology of the bony labyrinth reveals several plesiomorphic mammalian features, such as a secondary crus commune. Leptictidium is derived from the leptictidan groundplan in lacking the secondary bony lamina and having proportionally larger semicircular canals than the leptictids under study. Our estimations reveal that Leptictidium was a very agile animal with agility score values (4.6 and 5.5, respectively) comparable to Macroscelidea and extant bipedal saltatory placentals. Leptictis and Palaeictops have lower agility scores (3.4 to 4.1), which correspond to the more generalized types of locomotion (e.g., terrestrial, cursorial) of most extant mammals. In contrast, the angular velocity magnitude predicted from semicircular canal angles supports a conflicting pattern of agility among leptictidans, but the significance of these differences might be challenged when more is known about intraspecific variation and the pattern of semicircular canal angles in non-primate mammals
Transferable Machine Learning Interatomic Potential for Bond Dissociation Energy Prediction of Drug-like Molecules
We present a transferable MACE interatomic potential that is applicable to open- and closed-shell drug-like molecules containing hydrogen, carbon, and oxygen atoms. Including an accurate description of radical species extends the scope of possible applications to bond dissociation energy prediction, for example, in the context of cytochrome P450 (CYP) metabolism. The transferability of the MACE potential was validated on the COMP6 dataset, containing only closed-shell molecules, where it reaches better accuracy than the readily available general ANI-2x potential. MACE achieves similar accuracy on two CYP metabolism-specific datasets, which include open- and closed-shell structures. This model enables us to calculate the aliphatic C-H bond dissociation energy (BDE), which allows us to compare reaction energies of hydrogen abstraction, which is the rate-limiting step of the aliphatic hydroxylation reaction catalysed by CYPs. On the “CYP 3A4” dataset, MACE achieves a BDE RMSE of 1.37 kcal/mol and better prediction of BDE ranks than alternatives: the semi-empirical AM1 and GFN2-xTB methods and the ALFABET model by St. John et al. that predicts bond dissociation enthalpies. Finally, we highlight the smoothness of the MACE potential over paths of sp³ C-H bond elongation and show that a minimal extension is enough for the MACE model to start finding reasonable minimum energy paths of methoxy radical-mediated hydrogen abstraction. Altogether, this work lays the ground for further extensions of scope in terms of chemical elements, (CYP-mediated) reaction classes and modelling the full reaction paths, not only bond dissociation energies
Transferable Machine Learning Interatomic Potential for Bond Dissociation Energy Prediction of Drug-like Molecules.
We present a transferable MACE interatomic potential that is applicable to open- and closed-shell drug-like molecules containing hydrogen, carbon, and oxygen atoms. Including an accurate description of radical species extends the scope of possible applications to bond dissociation energy (BDE) prediction, for example, in the context of cytochrome P450 (CYP) metabolism. The transferability of the MACE potential was validated on the COMP6 data set, containing only closed-shell molecules, where it reaches better accuracy than the readily available general ANI-2x potential. MACE achieves similar accuracy on two CYP metabolism-specific data sets, which include open- and closed-shell structures. This model enables us to calculate the aliphatic C-H BDE, which allows us to compare reaction energies of hydrogen abstraction, which is the rate-limiting step of the aliphatic hydroxylation reaction catalyzed by CYPs. On the "CYP 3A4" data set, MACE achieves a BDE RMSE of 1.37 kcal/mol and better prediction of BDE ranks than alternatives: the semiempirical AM1 and GFN2-xTB methods and the ALFABET model that directly predicts bond dissociation enthalpies. Finally, we highlight the smoothness of the MACE potential over paths of sp³ C-H bond elongation and show that a minimal extension is enough for the MACE model to start finding reasonable minimum energy paths of methoxy radical-mediated hydrogen abstraction. Altogether, this work lays the ground for further extensions of scope in terms of chemical elements, (CYP-mediated) reaction classes and modeling the full reaction paths, not only BDEs
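The quantities named in this abstract follow from simple arithmetic once energies are available: the C-H BDE is the energy of the carbon radical plus that of a hydrogen atom minus that of the parent molecule, and RMSE compares predicted BDEs against reference values. The sketch below assumes the energies have already been computed by some method; all numbers are hypothetical placeholders, not results from the paper.

```python
import math

HARTREE_TO_KCAL_PER_MOL = 627.509  # standard conversion factor


def bond_dissociation_energy(e_parent, e_radical, e_hydrogen):
    """BDE for R-H -> R. + H., in the same units as the input energies."""
    return e_radical + e_hydrogen - e_parent


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical total energies in hartree for one C-H bond.
bde_hartree = bond_dissociation_energy(e_parent=-79.80, e_radical=-79.14, e_hydrogen=-0.50)
bde_kcal = bde_hartree * HARTREE_TO_KCAL_PER_MOL

# Hypothetical predicted vs reference BDEs in kcal/mol for a small set of bonds.
error = rmse([100.0, 98.0, 95.0], [99.0, 99.0, 95.0])
```

Because hydrogen abstraction sites are compared by rank as well as by absolute error, a low RMSE and a correct ordering of BDEs are related but distinct checks.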
Avoiding Missed Opportunities by Analyzing the Sensitivity of Our Decisions
Drug discovery is a multiparameter optimization process in which the goal of a project is to identify compounds that meet multiple property criteria required to achieve a therapeutic objective. However, once a profile of property criteria has been chosen, the impact of these criteria on the decisions made regarding progression of compounds or chemical series should be carefully considered. In some cases the decision is very sensitive to a specific property criterion, and such a criterion may artificially distort the direction of the project; any uncertainty in the “correct” value or the importance of this criterion may lead to valuable opportunities being missed. In this paper, we describe a method for analyzing the sensitivity of the prioritization of compounds to a multiparameter profile of property criteria. We show how the results can be easily interpreted and illustrate how this analysis can highlight new avenues for exploration.
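One simple way to picture this kind of sensitivity is to vary a single threshold in a profile and watch whether the set of top-ranked compounds changes. The sketch below does that with a plain count of criteria met; it is an illustrative assumption, not the method described in the paper, and the compounds, properties and thresholds are hypothetical.

```python
def criteria_met(compound, criteria):
    """Count how many minimum-value criteria a compound satisfies."""
    return sum(1 for prop, minimum in criteria.items() if compound[prop] >= minimum)


def top_compounds(compounds, criteria):
    """Names of all compounds sharing the highest score, sorted for stable output."""
    scores = {name: criteria_met(props, criteria) for name, props in compounds.items()}
    best = max(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


def threshold_sensitivity(compounds, criteria, prop, candidate_values):
    """Map each candidate threshold for `prop` to the resulting top compounds."""
    result = {}
    for value in candidate_values:
        varied = dict(criteria)  # copy so the original profile is unchanged
        varied[prop] = value
        result[value] = top_compounds(compounds, varied)
    return result


# Hypothetical compounds and profile (higher is better for every property).
compounds = {
    "A": {"potency": 7.2, "solubility": 60.0, "stability": 45.0},
    "B": {"potency": 6.8, "solubility": 120.0, "stability": 50.0},
}
criteria = {"potency": 7.0, "solubility": 50.0, "stability": 30.0}
sweep = threshold_sensitivity(compounds, criteria, "potency", [6.5, 7.0, 7.5])
```

In this toy data, compound B drops out of the top set only when the potency threshold sits between 6.8 and 7.2, so a small uncertainty in that one criterion decides whether B is pursued.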