45 research outputs found

    Autonomous discovery in the chemical sciences part II: Outlook

    This two-part review examines how automation has contributed to different aspects of discovery in the chemical sciences. In this second part, we reflect on a selection of exemplary studies. It is increasingly important to articulate what the role of automation and computation has been in the scientific process and how that has or has not accelerated discovery. One can argue that even the best automated systems have yet to "discover" despite being incredibly useful as laboratory assistants. We must carefully consider how they have been and can be applied to future problems of chemical discovery in order to effectively design and interact with future autonomous platforms. The majority of this article defines a large set of open research directions, including improving our ability to work with complex data, build empirical models, automate both physical and computational experiments for validation, select experiments, and evaluate whether we are making progress toward the ultimate goal of autonomous discovery. Addressing these practical and methodological challenges will greatly advance the extent to which autonomous systems can make meaningful discoveries. Comment: Revised version available at 10.1002/anie.20190998

    PREDICTIVE CHEMINFORMATICS ANALYSIS OF DIVERSE CHEMOGENOMICS DATA SOURCES: APPLICATIONS TO DRUG DISCOVERY, ASSAY INTERFERENCE, AND TEXT MINING

    In this dissertation, we describe the cheminformatics analysis of diverse chemogenomics data sources as well as the application of these data to several drug discovery efforts. In Chapter 1, we describe the discovery and characterization of novel Ebola virus inhibitors through QSAR-based virtual screening. In Chapter 2, we report the discovery and analysis of a series of potent and selective doublecortin-like kinase 1 (DCLK1) inhibitors using QSAR modeling, virtual screening, Matched Molecular Pair Analysis (MMPA), and molecular docking. In Chapter 3, we performed a large-scale analysis of publicly available data in PubChem to probe the reliability and applicability of Pan-Assay INterference compoundS (PAINS) alerts, a popular computational drug screening tool. In Chapter 4, we explore the PubMed database as a novel source of biomedical data and describe the development of Chemotext, a publicly available web server capable of text-mining the published literature. Doctor of Philosoph

    Computational studies of biomolecules

    In modern drug discovery, lead discovery describes the overall process from hit discovery to lead optimisation, with the goal of identifying drug candidates. This process can be greatly facilitated by computer-aided (in silico) techniques, which can reduce experimentation costs along the drug discovery pipeline. The relevant techniques include molecular modelling to obtain structural information; molecular dynamics (covered in Chapter 2); activity or property prediction by means of quantitative structure-activity/property relationship (QSAR/QSPR) models, where machine learning techniques are introduced (covered in Chapter 1); and quantum chemistry, used to explain chemical structure, properties and reactivity. This thesis is divided into five parts. Chapter 1 starts with an outline of the early stages of drug discovery, introducing the use of virtual screening for hit and lead identification. Such approaches may roughly be divided into structure-based (docking being by far the most often referred to) and ligand-based, each leading to a set of promising compounds for further evaluation. The chapter then introduces machine learning techniques, which will be frequently encountered, followed by a brief review of the "no free lunch" theorem, which states that no learning algorithm can perform optimally on all problems. This implies that the predictive accuracy of multiple models must be validated for optimal model selection. As the dimensionality of the feature space increases, the "curse of dimensionality" becomes a challenge. The closing sections focus on supervised classification with Random Forests. Computer-based analyses are an integral part of drug discovery.
Chapter 2 begins with a discussion of molecular docking, including strategies that incorporate protein flexibility at global and local levels, and then focuses on an automated docking program, AutoDock, which uses a Lamarckian genetic algorithm and an empirical binding free energy function. The second part of the chapter gives a brief introduction to molecular dynamics. Chapter 3 describes how we constructed a dataset of known binding sites with co-crystallised ligands, which was used to extract features characterising the structural and chemical properties of the binding pocket. A machine learning algorithm was adopted to create a three-way predictive model capable of assigning each case to one of the classes (regular, orthosteric and allosteric) for in silico selection of allosteric sites, and a feature selection algorithm (Gini) was used to rationalise the selection of the descriptors most influential in classifying the binding pockets. In Chapter 4, we used structure-based virtual screening, focusing on docking a fluorescent sensor to a non-canonical DNA quadruplex structure. The preferred binding poses, binding site and interactions are scored, and an ONIOM model is then applied to re-score the binding poses of some DNA-ligand complexes, focusing on only the best pose (with the lowest binding energy) from AutoDock. Schemes that use a conformational ensemble pre-generated by MD to account for receptor flexibility, followed by docking, are termed "relaxed complex" schemes. Chapter 5 concerns the BLUF domain photocycle. We focus on the conformational preference of some critical residues in the flavin binding site after a charge redistribution has been introduced. This work provides another activation model to address controversial features of the BLUF domain
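
The abstract above mentions a Gini-based selection of descriptors for a three-class pocket model but does not describe the procedure. As a minimal sketch, assuming the conventional Gini impurity used by Random Forests, the following shows how the impurity decrease of a single candidate split can be computed. The function names, the toy labels and the split are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(parent, left, right):
    """Weighted drop in Gini impurity when parent is split into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Hypothetical toy data: 12 binding pockets, 4 per class.
pockets = ["regular"] * 4 + ["orthosteric"] * 4 + ["allosteric"] * 4

# Hypothetical split on one descriptor that isolates the allosteric pockets.
left = ["allosteric"] * 4
right = ["regular"] * 4 + ["orthosteric"] * 4

parent_gini = gini_impurity(pockets)
decrease = impurity_decrease(pockets, left, right)
print(f"parent impurity: {parent_gini:.3f}, decrease from split: {decrease:.3f}")
```

In a Random Forest, decreases of this kind are typically accumulated per descriptor over all splits and trees, and descriptors with larger totals are ranked as more influential.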

    Development and critical evaluation of group contribution methods for the estimation of critical properties, liquid vapour pressure and liquid viscosity of organic compounds.

    Thesis (Ph.D.)-University of KwaZulu-Natal, Durban, 2006. Critical properties, liquid vapour pressures and liquid viscosities are important thermophysical properties required for the design, simulation and optimisation of chemical plants. Unfortunately, experimental data for these properties are in most cases not available. Synthesis of sufficiently pure material and measurement of these data are expensive and time consuming. In many cases, the chemicals degrade or are hazardous to handle, which makes experimental measurements difficult or impossible. Consequently, estimation methods are of great value to engineers. In this work, new group contribution methods have been developed for the estimation of critical properties, liquid vapour pressures and liquid viscosities of non-electrolyte organic compounds. The methods are based on the previous work of Nannoolal (2004) & Nannoolal et al. (2004) with minor modifications of structural group definitions. Critical properties, viz. critical temperature, critical pressure and critical volume, are of great practical importance as they must be known in order to use correlations based on the law of corresponding states. However, there is a lack of critical property data in the literature as these data are difficult or in many cases impossible to measure. Critical property data are usually only available for smaller molecules of sufficient thermal stability. The proposed group contribution method for the estimation of critical properties yielded an average absolute deviation of 4.3 K (0.74%), 100 kPa (2.96%) and 6.4 cm³·mol⁻¹ (1.79%) for a set of 588 critical temperatures, 486 critical pressures and 348 critical volumes stored in the Dortmund Data Bank (DDB (2006)), respectively. These results were the lowest deviations obtained when compared to ten well-known estimation methods from the literature.
In addition, the method showed a wider range of applicability and the lowest probability of prediction failure, and leads to physically realistic extrapolation when applied to a test set of components not included in the training set. Estimating the critical temperature with the new method requires knowledge of the normal boiling point. If there is no information on the latter property, then the previous group contribution estimation method can be employed to estimate it. Because of their great importance in chemical engineering, liquid vapour pressures have received much attention in the literature. There is currently an abundance of experimental data for vapour pressures, especially for smaller molecules, but data are scarce or of low quality for larger and more complex molecules of low volatility. The estimation of liquid vapour pressures from molecular structure has met with very limited success. This is partly due to the high-quality predictions of vapour pressure required for use in the design of, for example, distillation columns. This work presents a new technique for the estimation of liquid vapour pressures by developing a two-parameter equation in which separate parameters model the absolute value and the slope, while the equation is also able to approximate the nonlinearity of the curve. The fixed point or absolute value chosen was the normal boiling point, for which a large amount of experimental data is available. A group contribution estimation of the slope was then developed which showed nearly no probability of prediction failure (high deviation). Employing experimental normal boiling points in the method, an absolute relative deviation of 6.2% in pressure for 1663 components or 68835 data points (68670 from DDB and 165 from Beilstein) was obtained. This result is comparable in accuracy to, or slightly higher in deviation than, correlative models such as the Antoine and DIPPR equations (direct correlations).
A test of the predictive capability by employing data that were not used in the training set also showed similar results. Estimations are possible up to the inflection point or a reduced normal boiling temperature of ±1.2. If there is no information about the experimental normal boiling point, two options are recommended to obtain this value. The first and more reliable is back-calculation using the known boiling point at other pressures and the estimated slope of the vapour pressure equation. Results in this case are similar to cases where experimental normal boiling points were used. The second possibility is to estimate the normal boiling point using the method developed previously. In this case, an absolute relative deviation of 27.0% in pressure is obtained. The saturated liquid viscosity is an important transport property that is required for many engineering applications. For this property, experimental data are limited mostly to simple and more common components and, even for these components, the data often cover only a small temperature range. There have been many different approaches to estimating liquid viscosities of organic compounds. However, correlative and empirical methods are often the only or preferred means to obtain liquid viscosities. The technique used for the estimation of the liquid viscosity is similar to that used for liquid vapour pressures, i.e. a two-parameter equation models the absolute value, slope and non-linearity of the curve. As there was no convenient reference point at a standard viscosity available to model the absolute value (viscosity reference temperature), an algorithm was developed to calculate this temperature, which was chosen at a viscosity of 1.3 cP. This work then presents a group contribution estimation of the slope; using calculated or adjusted reference temperatures, an absolute relative deviation of 3.4% in viscosity for 829 components or 12861 data points stored in the DDB was obtained.
This result is comparable in accuracy to, or slightly higher in deviation than, correlative models such as the Andrade and Vogel equations (direct correlations). The estimation method has an upper temperature limit similar to the limit in the case of liquid vapour pressures. If no data are available for a viscosity close to 1.3 cP then, as with the vapour pressure estimation method, the temperature can be back-calculated from data at other viscosity values. Alternatively, the viscosity reference temperature can be estimated by a group contribution method developed in this work. This method yielded an average absolute deviation of 7.1 K (2.5%) for 813 components. When both the slope and absolute value were estimated for the liquid viscosity curve, an average absolute deviation of 15.3% in viscosity for 813 components or 12139 data points stored in the DDB was obtained. The new method was shown to be far more accurate than other group contribution methods while having a wider range of applicability and a lower probability of prediction failure. For the group contribution predictions, only the molecular structure of the compound is used. Structural groups were defined in a standardized form, and fragmentation of the molecular structures was performed by an automatic procedure to eliminate any arbitrary assumptions. To enable comparison, chemical family definitions have been developed that allow one to automatically classify new components and thus inform the user about the expected reliability of the different methods for a component of interest. Chemical family definitions are based on the kind and frequency of the different structural groups in the molecule
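
The abstract above reports its results as average absolute deviations (for example in K) and as relative deviations in percent, without giving the formulas. The sketch below assumes the conventional definitions: the mean of |estimated - experimental|, and the mean of |estimated - experimental| / |experimental| expressed in percent. The function names and the three critical temperatures are hypothetical illustrations, not values from the Dortmund Data Bank.

```python
def average_absolute_deviation(estimated, experimental):
    """Mean of |estimated - experimental|, in the unit of the property."""
    if len(estimated) != len(experimental) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - x) for e, x in zip(estimated, experimental))
    return total / len(estimated)


def mean_relative_deviation_percent(estimated, experimental):
    """Mean of |estimated - experimental| / |experimental|, in percent."""
    if len(estimated) != len(experimental) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - x) / abs(x) for e, x in zip(estimated, experimental))
    return 100.0 * total / len(estimated)


# Hypothetical critical temperatures in K (illustrative values only).
tc_estimated = [500.0, 610.0, 405.0]
tc_experimental = [504.0, 606.0, 400.0]

aad = average_absolute_deviation(tc_estimated, tc_experimental)
mrd = mean_relative_deviation_percent(tc_estimated, tc_experimental)
print(f"AAD = {aad:.2f} K, relative deviation = {mrd:.2f} %")
```

Reporting both figures, as the abstract does, separates the error in the property's own unit from its size relative to the measured value, which matters when the property spans a wide range, as vapour pressure does.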

    Development of a laboratory river model to determine the environmental impacts of key xenobiotic compounds.

    Thesis (M.Sc.)-University of Natal, Pietermaritzburg, 1996. Microorganisms are increasingly used in toxicological studies to determine potential environmental impacts of xenobiotic compounds. A multi-stage laboratory model was developed to facilitate the examination of environmental impacts of selected pollutants on fundamental cycling processes inherent to aquatic ecosystems, namely, the degradation of organic substances and nitrogen transformations under aerobic conditions. A microbial association representative of riverine ecosystems was enriched, isolated and cultured within the model. Characterisation of the microbial association was undertaken. Scanning electron microscopy and bright field microscopy revealed that a diverse heterogeneous community of microorganisms had established within the model. Successional metabolic events, namely organic carbon catabolism, ammonification of organic nitrogen and the process of nitrification, were differentiated in time and space, with the integrity of the microbial association still being retained. The establishment of a microbial association within the model was primarily dependent on dilution rates, specific growth rates and interactions between microorganisms and the prevailing environmental conditions. Growth-rate independent populations of microorganisms established within the model and were thought to contribute significantly to the metabolic processes within the model. Nitrifying activity was identified as a rate-limiting process within the model. Following separation of metabolic events, the ecotoxicological impacts of phenol and 2,4-dichlorophenol on the association were assessed. The biological oxidation of ammonia through to nitrate (nitrification) was found to be a sensitive indicator of perturbation.
The model was found to be suitable for testing both acute and chronic intoxication by pollutant compounds as well as for biodegradation testing and the possible evaluation of ecotoxicological impacts of wastewater treatment plants. The main disadvantages of the model arose from its operational complexity, its empirical nature and its impracticality for screening large numbers of compounds. A bioassay based on the inhibition of ammonium oxidation was developed in order to fulfil the requirements for a simple and rapid test protocol for the initial screening of perturbant compounds