778 research outputs found

    Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

    Get PDF
    This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, these involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work

    Computational approaches to virtual screening in human central nervous system therapeutic targets

    Get PDF
    In the past several years of drug design, advanced high-throughput synthetic and analytical chemical technologies are continuously producing a large number of compounds. These large collections of chemical structures have resulted in many public and commercial molecular databases. Thus, the availability of larger data sets provided the opportunity for developing new knowledge mining or virtual screening (VS) methods. Therefore, this research work is motivated by the fact that one of the main interests in the modern drug discovery process is the development of new methods to predict compounds with large therapeutic profiles (multi-targeting activity), which is essential for the discovery of novel drug candidates against complex multifactorial diseases like central nervous system (CNS) disorders. This work aims to advance VS approaches by providing a deeper understanding of the relationship between chemical structure and pharmacological properties and design new fast and robust tools for drug designing against different targets/pathways. To accomplish the defined goals, the first challenge is dealing with big data set of diverse molecular structures to derive a correlation between structures and activity. However, an extendable and a customizable fully automated in-silico Quantitative-Structure Activity Relationship (QSAR) modeling framework was developed in the first phase of this work. QSAR models are computationally fast and powerful tool to screen huge databases of compounds to determine the biological properties of chemical molecules based on their chemical structure. The generated framework reliably implemented a full QSAR modeling pipeline from data preparation to model building and validation. The main distinctive features of the designed framework include a)efficient data curation b) prior estimation of data modelability and, c)an-optimized variable selection methodology that was able to identify the most biologically relevant features responsible for compound activity. Since the underlying principle in QSAR modeling is the assumption that the structures of molecules are mainly responsible for their pharmacological activity, the accuracy of different structural representation approaches to decode molecular structural information largely influence model predictability. However, to find the best approach in QSAR modeling, a comparative analysis of two main categories of molecular representations that included descriptor-based (vector space) and distance-based (metric space) methods was carried out. Results obtained from five QSAR data sets showed that distance-based method was superior to capture the more relevant structural elements for the accurate characterization of molecular properties in highly diverse data sets (remote chemical space regions). This finding further assisted to the development of a novel tool for molecular space visualization to increase the understanding of structure-activity relationships (SAR) in drug discovery projects by exploring the diversity of large heterogeneous chemical data. In the proposed visual approach, four nonlinear DR methods were tested to represent molecules lower dimensionality (2D projected space) on which a non-parametric 2D kernel density estimation (KDE) was applied to map the most likely activity regions (activity surfaces). The analysis of the produced probabilistic surface of molecular activities (PSMAs) from the four datasets showed that these maps have both descriptive and predictive power, thus can be used as a spatial classification model, a tool to perform VS using only structural similarity of molecules. The above QSAR modeling approach was complemented with molecular docking, an approach that predicts the best mode of drug-target interaction. Both approaches were integrated to develop a rational and re-usable polypharmacology-based VS pipeline with improved hits identification rate. For the validation of the developed pipeline, a dual-targeting drug designing model against Parkinson’s disease (PD) was derived to identify novel inhibitors for improving the motor functions of PD patients by enhancing the bioavailability of dopamine and avoiding neurotoxicity. The proposed approach can easily be extended to more complex multi-targeting disease models containing several targets and anti/offtargets to achieve increased efficacy and reduced toxicity in multifactorial diseases like CNS disorders and cancer. This thesis addresses several issues of cheminformatics methods (e.g., molecular structures representation, machine learning, and molecular similarity analysis) to improve and design new computational approaches used in chemical data mining. Moreover, an integrative drug-designing pipeline is designed to improve polypharmacology-based VS approach. This presented methodology can identify the most promising multi-targeting candidates for experimental validation of drug-targets network at the systems biology level in the drug discovery process

    Review of QSAR Models and Software Tools for predicting Biokinetic Properties

    Get PDF
    In the assessment of industrial chemicals, cosmetic ingredients, and active substances in pesticides and biocides, metabolites and degradates are rarely tested for their toxicologcal effects in mammals. In the interests of animal welfare and cost-effectiveness, alternatives to animal testing are needed in the evaluation of these types of chemicals. In this report we review the current status of various types of in silico estimation methods for Absorption, Distribution, Metabolism and Excretion (ADME) properties, which are often important in discriminating between the toxicological profiles of parent compounds and their metabolites/degradation products. The review was performed in a broad sense, with emphasis on QSARs and rule-based approaches and their applicability to estimation of oral bioavailability, human intestinal absorption, blood-brain barrier penetration, plasma protein binding, metabolism and. This revealed a vast and rapidly growing literature and a range of software tools. While it is difficult to give firm conclusions on the applicability of such tools, it is clear that many have been developed with pharmaceutical applications in mind, and as such may not be applicable to other types of chemicals (this would require further research investigation). On the other hand, a range of predictive methodologies have been explored and found promising, so there is merit in pursuing their applicability in the assessment of other types of chemicals and products. Many of the software tools are not transparent in terms of their predictive algorithms or underlying datasets. However, the literature identifies a set of commonly used descriptors that have been found useful in ADME prediction, so further research and model development activities could be based on such studies.JRC.DG.I.6-Systems toxicolog

    Machine Learning Approaches for Improving Prediction Performance of Structure-Activity Relationship Models

    Get PDF
    In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) are used to correlate the structure of a molecule to its biological property in drug design and toxicological studies. In this body of work, I started with two in-depth reviews into the application of machine learning based approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies. First, to improve the prediction accuracy of learning from imbalanced data, Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms combined with bagging as an ensemble strategy was evaluated. The Friedman’s aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that this method significantly outperformed other conventional methods. SMOTEENN with bagging became less effective when IR exceeded a certain threshold (e.g., \u3e40). The ability to separate the few active compounds from the vast amounts of inactive ones is of great importance in computational toxicology. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p \u3c 0.001, ANOVA) by 22-27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Lastly, current features used for QSAR based machine learning are often very sparse and limited by the logic and mathematical processes used to compute them. Transformer embedding features (TEF) were developed as new continuous vector descriptors/features using the latent space embedding from a multi-head self-attention. The significance of TEF as new descriptors was evaluated by applying them to tasks such as predictive modeling, clustering, and similarity search. An accuracy of 84% on the Ames mutagenicity test indicates that these new features has a correlation to biological activity. Overall, the findings in this study can be applied to improve the performance of machine learning based Quantitative Structure-Activity/Property Relationship (QSAR) efforts for enhanced drug discovery and toxicology assessments

    Applicability domains of neural networks for toxicity prediction

    Get PDF
    In this paper, the term "applicability domain" refers to the range of chemical compounds for which the statistical quantitative structure-activity relationship (QSAR) model can accurately predict their toxicity. This is a crucial concept in the development and practical use of these models. First, a multidisciplinary review is provided regarding the theory and practice of applicability domains in the context of toxicity problems using the classical QSAR model. Then, the advantages and improved performance of neural networks (NNs), which are the most promising machine learning algorithms, are reviewed. Within the domain of medicinal chemistry, nine different methods using NNs for toxicity prediction were compared utilizing 29 alternative artificial intelligence (AI) techniques. Similarly, seven NN-based toxicity prediction methodologies were compared to six other AI techniques within the realm of food safety, 11 NN-based methodologies were compared to 16 different AI approaches in the environmental sciences category and four specific NN-based toxicity prediction methodologies were compared to nine alternative AI techniques in the field of industrial hygiene. Within the reviewed approaches, given known toxic compound descriptors and behaviors, we observed a difficulty in being able to extrapolate and predict the effects with untested chemical compounds. Different methods can be used for unsupervised clustering, such as distance-based approaches and consensus-based decision methods. Additionally, the importance of model validation has been highlighted within a regulatory context according to the Organization for Economic Co-operation and Development (OECD) principles, to predict the toxicity of potential new drugs in medicinal chemistry, to determine the limits of detection for harmful substances in food to predict the toxicity limits of chemicals in the environment, and to predict the exposure limits to harmful substances in the workplace. Despite its importance, a thorough application of toxicity models is still restricted in the field of medicinal chemistry and is virtually overlooked in other scientific domains. Consequently, only a small proportion of the toxicity studies conducted in medicinal chemistry consider the applicability domain in their mathematical models, thereby limiting their predictive power to untested drugs. Conversely, the applicability of these models is crucial; however, this has not been sufficiently assessed in toxicity prediction or in other related areas such as food science, environmental science, and industrial hygiene. Thus, this review sheds light on the prevalent use of Neural Networks in toxicity prediction, thereby serving as a valuable resource for researchers and practitioners across these multifaceted domains that could be extended to other fields in future research

    Toxicity prediction using multi-disciplinary data integration and novel computational approaches

    Get PDF
    Current predictive tools used for human health assessment of potential chemical hazards rely primarily on either chemical structural information (i.e., cheminformatics) or bioassay data (i.e., bioinformatics). Emerging data sources such as chemical libraries, high throughput assays and health databases offer new possibilities for evaluating chemical toxicity as an integrated system and overcome the limited predictivity of current fragmented efforts; yet, few studies have combined the new data streams. This dissertation tested the hypothesis that integrative computational toxicology approaches drawing upon diverse data sources would improve the prediction and interpretation of chemically induced diseases. First, chemical structures and toxicogenomics data were used to predict hepatotoxicity. Compared with conventional cheminformatics or toxicogenomics models, interpretation was enriched by the chemical and biological insights even though prediction accuracy did not improve. This motivated the second project that developed a novel integrative method, chemical-biological read-across (CBRA), that led to predictive and interpretable models amenable to visualization. CBRA was consistently among the most accurate models on four chemical-biological data sets. It highlighted chemical and biological features for interpretation and the visualizations aided transparency. Third, we developed an integrative workflow that interfaced cheminformatics prediction with pharmacoepidemiology validation using a case study of Stevens Johnson Syndrome (SJS), an adverse drug reaction (ADR) of major public health concern. Cheminformatics models first predicted potential SJS inducers and non-inducers, prioritizing them for subsequent pharmacoepidemiology evaluation, which then confirmed that predicted non-inducers were statistically associated with fewer SJS occurrences. By combining cheminformatics' ability to predict SJS as soon as drug structures are known, and pharmacoepidemiology's statistical rigor, we have provided a universal scheme for more effective study of SJS and other ADRs. Overall, this work demonstrated that integrative approaches could deliver more predictive and interpretable models. These models can then reliably prioritize high risk chemicals for further testing, allowing optimization of testing resources. A broader implication of this research is the growing role we envision for integrative methods that will take advantage of the various emerging data sources.Doctor of Philosoph
    • …
    corecore