6 research outputs found

    SMARTS Approach to Chemical Data Mining and Physicochemical Property Prediction.

    The calculation of physicochemical and biological properties is essential in order to facilitate modern drug discovery. Chemical spaces dimensionalized by these descriptors have been used to scaffold-hop in order to discover new lead and drug-like molecules. Broadening the boundaries of structure based drug design, these molecules are expected to share the same physiological target and have similar efficacy, as do known drug molecules sharing the same region in chemical property space. In the past few decades physicochemical and ADMET (absorption, distribution, metabolism, elimination, and toxicity) property predictors have been the subject of increased focus in academia and the pharmaceutical industry. Due to the ever increasing attention given to data mining and property predictions, we first discuss the sources of experimental pKa values and current methodologies used for pKa prediction in proteins and small molecules. Of particular concern is an analysis of the scope, statistical validity, overall accuracy, and predictive power of these methods. The expressed concerns are not limited to predicting pKa, but apply to all empirical predictive methodologies. In a bottom-up approach, we explored the influence of freely generated SMARTS string representations of molecular fragments on chelation and cytotoxicity. Later investigations, involving the derivation of predictive models, use stepwise regression to determine the optimal pool of SMARTS strings having the greatest influence over the property of interest. By applying a unique scoring system to sets of highly generalized SMARTS strings, we have constructed well balanced regression trees with predictive accuracy exceeding that of many published and commercially available models for cytotoxicity, pKa, and aqueous solubility. The methodology is robust, extremely adaptable, and can handle any molecular dataset with experimental data. 
This story details our struggles of data gathering, curation, and the development of a machine learning methodology able to derive and validate highly accurate regression trees capable of extremely fast property predictions. Regression trees created by our method are well suited to calculate descriptors for large in silico molecular libraries, facilitating data mining of chemical spaces in search of new lead molecules in drug discovery.
Ph.D., Medicinal Chemistry, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/64627/1/adamclee_1.pd

    Examining the PM6 semiempirical method for pKa prediction across a wide range of oxyacids

    The pKa estimation ability of the semiempirical PM6 method was evaluated across a broad range of oxyacids and compared to results obtained using the SPARC software program. Compound classes under consideration included acetic acids, alicyclic and aromatic heterocyclic acids, benzoic acids, boronic acids, hydroxamic acids, oximes, peroxides, peroxyacids, phenols, α-saturated acids, α-saturated alcohols, sulfinic acids, α-unsaturated acids, and α-unsaturated alcohols. PM6 accurately predicts the acidity of acetic and benzoic acids and their derivatives, but is less reliable for alicyclic and aromatic heterocyclic acids and phenols. α-Saturated acids are reliably modeled by PM6 except for polyacid derivatives with α-alcohol moieties. α-Saturated alcohols only appear to yield reliable PM6 results where an α-hydroxy or α-alkoxy moiety is absent. Carboxylic acids with simple α-alkene unsaturation are well approximated by PM6 except where alkyne α-unsaturation or α-carboxylation is also present. The PM6 and SPARC methods exhibit approximately equal pKa prediction performance for the acetic, alicyclic, and benzoic acids. SPARC outperforms PM6 on the peroxides, peroxyacids, phenols, α-saturated acids, and α-saturated alcohols. pKa values for boron, nitrogen, and sulfur oxyacids do not appear to be reliably estimated by either the PM6 or SPARC methods. The findings will help guide the potential appropriateness of results from the PM6 pKa estimation method for waste treatment and environmental fate investigations.

    Dimorphite-DL and biotite-tools, two open source programs for the acceleration of structure-based drug design

    Computer-aided drug design has seen a proliferation of tools that allow the manipulation of small molecule and macromolecular structures in increasingly high-throughput settings. Molecular dynamics simulations, small molecule docking software, and visualization tools allow researchers to rapidly identify drug candidates and narrow the list of compounds that experimentalists must consider for further testing. Any gap in automating computer-aided drug design thus delays potentially life-saving discoveries. Here we present two open-source programs we developed to address challenges facing both protein and ligand preparation. Dimorphite-DL is a lightweight Python program that predicts protonation states of small molecules using an empirical approach to ensure accurate docking and modelling calculations. The presence or absence of a hydrogen atom often determines whether a given ligand will bind a protein of interest. Biotite-tools is a Python package that provides several popular statistical functions for analyzing molecular dynamics simulations in an easy-to-use way. Conformational fluctuation is complex, and it can be challenging to extract insight from what is essentially a "protein movie." As such, simulation analysis has largely been restricted to those with backgrounds in computation, limiting the scope of such a powerful tool. Biotite-tools aims to accelerate the efforts of those already working with molecular dynamics and make analysis more accessible to experimentalists.

    Characterisation of data resources for in silico modelling: benchmark datasets for ADME properties.

    Introduction: The cost of in vivo and in vitro screening of ADME properties of compounds has motivated efforts to develop a range of in silico models. At the heart of the development of any computational model are the data; high quality data are essential for developing robust and accurate models. The characteristics of a dataset, such as its availability, size, format and type of chemical identifiers used, influence the modelability of the data. Areas covered: This review explores the usefulness of publicly available ADME datasets for researchers to use in the development of predictive models. More than 140 ADME datasets were collated from publicly available resources and the modelability of 31 selected datasets was assessed using specific criteria derived in this study. Expert opinion: Publicly available datasets differ significantly in information content and presentation. From a modelling perspective, datasets should be of adequate size, available in a user-friendly format with all chemical structures associated with one or more chemical identifiers suitable for automated processing (e.g. CAS number, SMILES string or InChIKey). Recommendations for assessing dataset suitability for modelling and publishing data in an appropriate format are discussed.
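
    As a rough illustration of what "suitable for automated processing" can mean for the identifiers named in this abstract, the sketch below checks a CAS Registry Number against its check digit and tests whether a string has the shape of a standard InChIKey. It is not taken from the review; the function names `is_valid_cas` and `looks_like_inchikey` are hypothetical, and the InChIKey test only inspects the format, not whether the key matches any structure.

```python
import re

# CAS Registry Numbers have the form NNNNNNN-NN-N (2 to 7 digits, 2 digits, 1 check digit).
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")

# A standard InChIKey is 14 uppercase letters, a hyphen, 10 uppercase letters,
# a hyphen, and one final uppercase letter (27 characters in total).
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def is_valid_cas(cas: str) -> bool:
    """Return True if the string is well formed and its check digit is correct."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)   # all digits except the check digit
    check_digit = int(match.group(3))
    # Weight each digit by its position counted from the right (1, 2, 3, ...).
    total = sum(pos * int(d) for pos, d in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


def looks_like_inchikey(key: str) -> bool:
    """Return True if the string has the layout of a standard InChIKey."""
    return INCHIKEY_PATTERN.match(key.strip()) is not None


if __name__ == "__main__":
    print(is_valid_cas("7732-18-5"))                           # water
    print(is_valid_cas("7732-18-4"))                           # wrong check digit
    print(looks_like_inchikey("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))  # water
```

    A curation script could run checks like these over an identifier column before attempting any structure lookup, so that malformed entries are flagged early.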

    AMBIT RESTful web services: an implementation of the OpenTox application programming interface

    The AMBIT web services package is one of several existing independent implementations of the OpenTox Application Programming Interface and is built according to the principles of the Representational State Transfer (REST) architecture. The Open Source Predictive Toxicology Framework, developed by the partners in the EC FP7 OpenTox project, aims at providing unified access to toxicity data and predictive models, as well as validation procedures. This is achieved by i) an information model, based on a common OWL-DL ontology; ii) links to related ontologies; iii) data and algorithms, available through a standardized REST web services interface, where every compound, data set or predictive method has a unique web address, used to retrieve its Resource Description Framework (RDF) representation, or initiate the associated calculations.

    Prediction of transformation products during ozonation of micropollutant-containing waters: kinetics and mechanisms

    Ozonation, which is widely used for drinking water disinfection, has recently been applied to mitigate potentially harmful effects of micropollutants (e.g., pharmaceuticals, personal care products, pesticides, etc.) present in municipal wastewater effluents. Generally, ozonation is efficient for the abatement of biological effects caused by micropollutants. However, limited empirical information is available about the transformation products formed during ozonation of micropollutants due to analytical limitations and a large number of micropollutants present in wastewater effluents. In this thesis, a computer-based prediction platform for kinetics and mechanisms for the reactions of ozone with micropollutants was developed to provide information about (i) the reactivity of micropollutants with ozone expressed as second-order rate constants (kO3, M⁻¹ s⁻¹) and (ii) potential transformation products formed from the reactions of ozone with micropollutants. Regarding (i), kO3 values for micropollutants were predictable using linear relationships between experimental kO3 in log units for compounds of certain chemical groups (e.g., phenols, olefins, amines, etc.) and the corresponding molecular orbital energies (e.g., highest occupied molecular orbital (HOMO) or natural bond orbital (NBO)) obtained from quantum chemical computations (mostly R² = 0.75 - 0.95 for 14 compound groups consisting of 284 model compounds in total). Overall, the developed kO3 prediction models could predict kO3 on average within a factor of ~5 of an experimental kO3 for model compounds used for the development of the kO3 prediction models as well as tetrachlorobutadienes, which were externally validated. In contrast, poor kO3 predictions (>10-fold) were observed for some model compounds excluded from the correlations as outliers as well as cetirizine, two pentachlorobutadiene congeners, and hexachlorobutadiene, which were used for external validation. 
(ii) A prediction tool for potential transformation products was developed based on numerous reaction pathways proposed in literature, which were encoded into 340 individual reaction rules using appropriate chemoinformatics tools. The predicted pathways and the transformation products for some micropollutants (i.e., carbamazepine and tramadol) were shown to be consistent with experimental observations. However, in the future, both kO3 and the pathway prediction modules need to be further validated with more compounds with experimental data and to be improved/updated accordingly. The developed prediction platform is expected to be useful for various groups of end-users in research and practice such as environmental engineers, chemists, or toxicologists. In addition, the treatability of 9 polychlorobutadienes, which are groundwater contaminants, with ozone, UV photolysis at 254 nm, and their advanced oxidation processes (i.e., O3/H2O2 and UV/H2O2) was investigated. The abatement efficiencies for polychlorobutadienes during ozonation or O3/H2O2 in a natural groundwater could be well explained based on the experimental kO3 and kOH-values. UV treatment was shown to be effective for the abatement of polychlorobutadienes. However, the potential formation of photoisomers from UV irradiation of chlorobutadienes with either E or Z configurations needs to be taken into account because this isomerization will not necessarily lead to a loss of the biological effects of these compounds.
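
The abstract above describes linear relationships between log kO3 and an orbital energy, and it judges predictions by whether they fall "within a factor of ~5" of experiment. The sketch below shows one plausible way to express both ideas in code. It is an assumption-laden illustration, not the thesis's implementation: the function names are hypothetical, the slope, intercept, and energy values are made-up placeholders rather than fitted parameters, and base-10 logarithms are assumed.

```python
def predict_log_k(orbital_energy: float, slope: float, intercept: float) -> float:
    """Linear model: log10(kO3) = slope * orbital_energy + intercept."""
    return slope * orbital_energy + intercept


def fold_error(k_pred: float, k_exp: float) -> float:
    """Multiplicative factor (>= 1) separating a predicted and an experimental rate constant."""
    if k_pred <= 0 or k_exp <= 0:
        raise ValueError("rate constants must be positive")
    return max(k_pred / k_exp, k_exp / k_pred)


def within_factor(k_pred: float, k_exp: float, factor: float = 5.0) -> bool:
    """True if the prediction lies within the given factor of the experimental value."""
    return fold_error(k_pred, k_exp) <= factor


if __name__ == "__main__":
    # Placeholder numbers for illustration only (not from the thesis).
    log_k = predict_log_k(orbital_energy=-8.0, slope=2.0, intercept=20.0)
    k_pred = 10 ** log_k          # predicted kO3 in M^-1 s^-1
    k_exp = 3.0e4                 # hypothetical experimental kO3 in M^-1 s^-1
    print(fold_error(k_pred, k_exp), within_factor(k_pred, k_exp))
```

Taking the larger of the two ratios treats over- and under-prediction symmetrically, which matches the usual reading of "within a factor of N" for quantities that span orders of magnitude.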