56 research outputs found

    A novel hybrid ultrafast shape descriptor method for use in virtual screening.

    BACKGROUND: We have introduced a new Hybrid descriptor composed of the MACCS key descriptor encoding topological information and Ballester and Richards' Ultrafast Shape Recognition (USR) descriptor. The latter is calculated from the moments of the distribution of the interatomic distances, and in this work we also included higher moments than in the original implementation. RESULTS: The performance of this Hybrid descriptor is assessed using Random Forest and a dataset of 116,476 molecules. Our dataset includes 5,245 molecules in ten classes from the 2005 World Anti-Doping Agency (WADA) dataset and 111,231 molecules from the National Cancer Institute (NCI) database. In a 10-fold Monte Carlo cross-validation this dataset was partitioned into three distinct parts for training, optimisation of an internal threshold that we introduced, and validation of the resulting model. The standard errors obtained were used to assess statistical significance of observed improvements in performance of our new descriptor. CONCLUSION: The Hybrid descriptor was compared to the MACCS key descriptor, USR with the first three (USR), four (UF4) and five (UF5) moments, and a combination of MACCS with USR (three moments). The MACCS key descriptor was not combined with UF5, due to similar performance of UF5 and UF4. Superior performance in terms of all figures of merit was found for the MACCS/UF4 Hybrid descriptor with respect to all other descriptors examined. These figures of merit include recall in the top 1% and top 5% of the ranked validation sets, precision, F-measure, area under the Receiver Operating Characteristic curve and Matthews Correlation Coefficient.
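
As a rough illustration of the moment-based shape descriptor described in the abstract above, the following Python sketch computes USR-style features from 3D atom coordinates. It assumes the commonly described four reference points (centroid, atom closest to the centroid, atom farthest from the centroid, and atom farthest from that atom) and uses the mean plus plain central moments of order 2 and above. The function names and the exact moment normalisation are assumptions made for illustration, not the authors' implementation.

```python
import math

def _distances(points, ref):
    # Euclidean distance from every atom to one reference point.
    return [math.dist(p, ref) for p in points]

def _moments(values, n_moments):
    # First moment is the mean; higher ones are central moments of order k.
    # (Assumption: published variants may normalise these differently.)
    n = len(values)
    mean = sum(values) / n
    out = [mean]
    for k in range(2, n_moments + 1):
        out.append(sum((v - mean) ** k for v in values) / n)
    return out

def usr_descriptor(coords, n_moments=3):
    """Return 4 * n_moments shape features for a list of (x, y, z) atoms."""
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))  # centroid
    d_ctd = _distances(coords, ctd)
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from centroid
    d_fct = _distances(coords, fct)
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments(_distances(coords, ref), n_moments))
    return descriptor
```

With `n_moments=3` this yields 12 values, and `n_moments=4` yields 16, which corresponds in spirit to the three-moment and four-moment variants named in the abstract. Because only distances enter the calculation, the result does not change when the molecule is translated or rotated.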

    Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction.

    BACKGROUND: We present a novel feature selection algorithm, Winnowing Artificial Ant Colony (WAAC), that performs simultaneous feature selection and model parameter optimisation for the development of predictive quantitative structure-property relationship (QSPR) models. The WAAC algorithm is an extension of the modified ant colony algorithm of Shen et al. (J Chem Inf Model 2005, 45: 1024-1029). We test the ability of the algorithm to develop a predictive partial least squares model for the Karthikeyan dataset (J Chem Inf Model 2005, 45: 581-590) of melting point values. We also test its ability to perform feature selection on a support vector machine model for the same dataset. RESULTS: Starting from an initial set of 203 descriptors, the WAAC algorithm selected a PLS model with 68 descriptors which has an RMSE on an external test set of 46.6 degrees C and R2 of 0.51. The number of components chosen for the model was 49, which was close to optimal for this feature selection. The selected SVM model has 28 descriptors (cost of 5, epsilon of 0.21) and an RMSE of 45.1 degrees C and R2 of 0.54. This model outperforms a kNN model (RMSE of 48.3 degrees C, R2 of 0.47) for the same data and has similar performance to a Random Forest model (RMSE of 44.5 degrees C, R2 of 0.55). However, it is much less prone to bias at the extremes of the range of melting points, as shown by the slope of the line through the residuals: -0.43 for WAAC/SVM, -0.53 for Random Forest. CONCLUSION: With a careful choice of objective function, the WAAC algorithm can be used to optimise machine learning and regression models that suffer from overfitting. Where model parameters also need to be tuned, as is the case with support vector machine and partial least squares models, it can optimise these simultaneously. The moving probabilities used by the algorithm are easily interpreted in terms of the best and current models of the ants, and the winnowing procedure promotes the removal of irrelevant descriptors.
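
The figures of merit quoted in the abstract above (RMSE, R2 and the slope of the line through the residuals) can be sketched in a few lines of Python. The abstract does not state its exact definitions, so this sketch assumes R2 is the coefficient of determination and that residuals are taken as predicted minus observed and regressed against the observed values. All function names are illustrative.

```python
import math

def rmse(observed, predicted):
    # Root mean squared error between observed and predicted values.
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    # Coefficient of determination: 1 - SS_res / SS_tot (an assumed definition).
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def residual_slope(observed, predicted):
    # Least-squares slope of (predicted - observed) against observed.
    # A negative slope means low values are over-predicted and high values
    # under-predicted, i.e. predictions are pulled towards the middle.
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_x = sum(observed) / len(observed)
    mean_r = sum(residuals) / len(residuals)
    sxy = sum((x - mean_x) * (r - mean_r) for x, r in zip(observed, residuals))
    sxx = sum((x - mean_x) ** 2 for x in observed)
    return sxy / sxx
```

Under this convention, a slope closer to zero indicates less bias at the extremes of the range, which is how the comparison of -0.43 against -0.53 in the abstract can be read.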

    Predicting the mechanism of phospholipidosis.

    The mechanism of phospholipidosis is still not well understood. Numerous different mechanisms have been proposed, varying from direct inhibition of the breakdown of phospholipids to the binding of a drug compound to the phospholipid, preventing breakdown. We have used a probabilistic method, the Parzen-Rosenblatt Window approach, to build a model from the ChEMBL dataset which can predict from a compound's structure both its primary pharmaceutical target and other targets with which it forms off-target, usually weaker, interactions. Using a small dataset of 182 phospholipidosis-inducing and non-inducing compounds, we predict their off-target activity against targets which could relate to phospholipidosis as a side-effect of a drug. We link these targets to specific mechanisms of inducing this lysosomal build-up of phospholipids in cells. Thus, we show that the induction of phospholipidosis is likely to occur by separate mechanisms when triggered by different cationic amphiphilic drugs. We find that both inhibition of phospholipase activity and enhanced cholesterol biosynthesis are likely to be important mechanisms. Furthermore, we provide evidence suggesting four specific protein targets. Sphingomyelin phosphodiesterase, phospholipase A2 and lysosomal phospholipase A1 are shown to be likely targets for the induction of phospholipidosis by inhibition of phospholipase activity, while lanosterol synthase is predicted to be associated with phospholipidosis being induced by enhanced cholesterol biosynthesis. This analysis provides the impetus for further experimental tests of these hypotheses.
    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
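
To make the Parzen-Rosenblatt window idea named in the phospholipidosis abstract above more concrete, here is a minimal Python sketch of a kernel-density classifier. It assumes compounds are represented as numeric feature vectors and uses a Gaussian kernel with a single bandwidth `h`. The abstract specifies neither the kernel, the descriptors nor the bandwidth, so these choices and the names `gaussian_kernel` and `parzen_scores` are hypothetical.

```python
import math

def gaussian_kernel(x, y, h):
    # Unnormalised Gaussian window; the constant factor is identical for
    # every class and cancels when the scores are normalised below.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * h * h))

def parzen_scores(query, training, h=1.0):
    """Score each class (e.g. protein target) for one query vector.

    training maps a class label to a list of feature vectors known to
    belong to that class. Returns label -> normalised score.
    """
    total = sum(len(vectors) for vectors in training.values())
    raw = {}
    for label, vectors in training.items():
        # Class-conditional density estimate: average kernel response.
        density = sum(gaussian_kernel(query, v, h) for v in vectors) / len(vectors)
        prior = len(vectors) / total
        raw[label] = prior * density
    norm = sum(raw.values())
    if norm == 0.0:
        # Query is far from everything: fall back to uniform scores.
        return {label: 1.0 / len(raw) for label in raw}
    return {label: score / norm for label, score in raw.items()}
```

Ranking the returned scores gives a most likely class followed by weaker candidates, which is one way to read the abstract's distinction between a primary target and off-target interactions.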

    Analysis of Iterative Screening with Stepwise Compound Selection Based on Novartis In-house HTS Data.

    With increased automation and larger compound collections, the development of high-throughput screening (HTS) started replacing previous approaches in drug discovery from around the 1980s onward. However, even today it is not always appropriate, or even feasible, to screen large collections of compounds in a particular assay. Here, we present an efficient method for iterative screening of small subsets of compound libraries. With this method, the retrieval of active compounds is optimized using their structural information and biological activity fingerprints. We validated this approach retrospectively on 34 Novartis in-house HTS assays covering a wide range of assay biology, including cell proliferation, antibacterial activity, gene expression, and phosphorylation. This method was employed to retrieve subsets of compounds for screening, where selected hits from any given round of screening were used as starting points to select chemically and biologically similar compounds for the next iteration. By only screening ∼1% of the full screening collection (∼15 000 compounds), the method consistently retrieves diverse compounds belonging to the top 0.5% of the most active compounds for the HTS campaign. For most of the assays, over half of the compounds selected by the method were found to be among the 5% most active compounds of the corresponding full-deck HTS. In addition, the stringency of the iterative method can be modified depending on the number of compounds one can afford to screen, making it a flexible tool to discover active compounds efficiently.
    S. Paricharak thanks the Netherlands Organisation for Scientific Research (NWO, grant number NWO-017.009-065), Novartis Institutes for BioMedical Research (NIBR) and the Prins Bernhard Cultuurfonds for funding and C. Parker, M. Frederiksen, G. Landrum and N. Fechner for insightful discussions.
    This is the author accepted manuscript. The final version is available from ACS Publications via http://dx.doi.org/10.1021/acschembio.6b0002
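
The iterative selection loop described in the screening abstract above can be sketched as follows. This is a simplified illustration only: it assumes fingerprints are sets of "on" bits, uses Tanimoto similarity to the current best hits as the sole selection criterion, and treats the assay as a callable returning an activity value. The function names and parameters are hypothetical, and the published method also uses biological activity fingerprints, which this sketch does not model.

```python
def tanimoto(a, b):
    # Tanimoto (Jaccard) similarity between two sets of fingerprint bits.
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def iterative_screen(fingerprints, assay, seeds,
                     rounds=3, hits_per_round=2, picks_per_round=3):
    """Screen a small seed set, then expand around the best hits each round.

    fingerprints: dict mapping compound id -> set of fingerprint bits
    assay: callable mapping compound id -> activity (higher is more active)
    seeds: compound ids screened in the first round
    Returns a dict of every screened compound id -> measured activity.
    """
    screened = {cid: assay(cid) for cid in seeds}
    for _ in range(rounds):
        # Current most active compounds act as starting points.
        hits = sorted(screened, key=screened.get, reverse=True)[:hits_per_round]
        candidates = [c for c in fingerprints if c not in screened]
        if not candidates:
            break
        # Rank unscreened compounds by similarity to their nearest hit.
        candidates.sort(
            key=lambda c: max(tanimoto(fingerprints[c], fingerprints[h]) for h in hits),
            reverse=True,
        )
        for cid in candidates[:picks_per_round]:
            screened[cid] = assay(cid)
    return screened
```

Raising or lowering `picks_per_round` and `rounds` changes how many compounds are screened in total, which loosely mirrors the adjustable stringency mentioned in the abstract.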

    Determination of minimal transcriptional signatures of compounds for target prediction

    The identification of molecular target and mechanism of action of compounds is a key hurdle in drug discovery. Multiplexed techniques for bead-based expression profiling allow the measurement of transcriptional signatures of compound-treated cells in high-throughput mode. Such profiles can be used to gain insight into compounds' mode of action and the protein targets they are modulating. Through the proxy of target prediction from such gene signatures, we explored important aspects of the use of transcriptional profiles to capture biological variability of perturbed cellular assays. We found that signatures derived from expression data and signatures derived from biological interaction networks performed equally well, and we showed that gene signatures can be optimised using a genetic algorithm. Gene signatures of approximately 128 genes seemed to be most generic, capturing a maximum of the perturbation inflicted on cells through compound treatment. Moreover, we found evidence for oxidative phosphorylation to be one of the most general ways to capture compound perturbation.

    Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information

    The Online Chemical Modeling Environment is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. A user-contributed database contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. Compared with other similar systems, OCHEM is not intended to re-implement the existing tools or models but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and become members of the growing research community. Our intention is to make OCHEM a widely used platform to perform QSPR/QSAR studies online and share them with other users on the Web. The ultimate goal of OCHEM is collecting all possible chemoinformatics tools within one simple, reliable and user-friendly resource. OCHEM is free for web users and is available online at http://www.ochem.eu

    The RSPO–LGR4/5–ZNRF3/RNF43 module controls liver zonation and size

    LGR4/5 receptors and their cognate RSPO ligands potentiate Wnt/β-catenin signalling and promote proliferation and tissue homeostasis in epithelial stem cell compartments. In the liver, metabolic zonation requires a Wnt/β-catenin signalling gradient, but the instructive mechanism controlling its spatiotemporal regulation is not known. We have now identified the RSPO-LGR4/5-ZNRF3/RNF43 module as a master regulator of Wnt/β-catenin-mediated metabolic liver zonation. Liver-specific LGR4/5 loss of function (LOF) or RSPO blockade disrupted hepatic Wnt/β-catenin signalling and zonation. Conversely, pathway activation in ZNRF3/RNF43 LOF mice or with recombinant RSPO1 protein expanded the hepatic Wnt/β-catenin signalling gradient in a reversible and LGR4/5-dependent manner. Recombinant RSPO1 protein increased liver size and improved liver regeneration, whereas LGR4/5 LOF caused the opposite effects, resulting in hypoplastic livers. Furthermore, we show that LGR4(+) hepatocytes throughout the lobule contribute to liver homeostasis without zonal dominance. Taken together, our results indicate that the RSPO-LGR4/5-ZNRF3/RNF43 module controls metabolic liver zonation and is a hepatic growth/size rheostat during development, homeostasis and regeneration

    Postdoc position advertisement

    Postdoctoral position in Computational Sciences, Basel. We are currently seeking candidates for a postdoctoral position for a computational project. We are characterizing fibrotic processes in the liver in the context of non-alcoholic fatty liver disease/steatohepatitis (NAFLD/NASH) with the full range of discovery technologies available: both single cell and bulk transcriptomic sequencing of patient-derived material, genome-wide CRISPR/Cas9 genetic screening, low molecular weight compound screening, preclinical animal models and organoid systems. Concurrently, we characterize the biology of non-healing topical wounds (e.g. diabetic foot ulcer). Focusing on the biological processes involved, it is emerging that the biology at the core of fibrosis, wound healing, and some connective tissue disorders is conserved. The aim of the research project is to better understand the pathophysiology of extracellular matrix biology in the context of fibrotic disease. We will use all appropriate data of sufficient quality to generate a comprehensive understanding of conserved homeostatic and disease processes of relevance to fibrosis across different organs. These data will allow us to derive tissue-specific disease signatures that we then use to identify potential novel approaches for pharmacological modulation (e.g., computational repurposing) of the malfunctioning processes and pathways, and we will pursue experimental validation of these. This research project is highly interdisciplinary, and combines data from current state-of-the-art technologies. Through the application of advanced computational analysis of extracellular matrix biology we aim to define new starting points for drug discovery.

    View-Based Object Recognition Using ND Tensor Supervised Neighborhood Embedding
