Inductive queries for a drug designing robot scientist
It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments.
Enumerating molecules.
This report is a comprehensive review of the field of molecular enumeration, from early isomer counting theories to evolutionary algorithms that design molecules in silico. The core of the review is a detailed account of how molecules are counted, enumerated, and sampled. The practical applications of molecular enumeration are also reviewed for chemical information, structure elucidation, molecular design, and combinatorial library design purposes. This review is to appear as a chapter in Reviews in Computational Chemistry, volume 21, edited by Kenny B. Lipkowitz.
Women in Science 2016
Women in Science 2016 summarizes research done by Smith College’s Summer Research Fellowship (SURF) Program participants. Ever since its 1967 start, SURF has been a cornerstone of Smith’s science education. In 2016, 150 students participated in SURF (144 hosted on campus and nearby field sites), supervised by 56 faculty mentor-advisors drawn from the Clark Science Center and connected to its eighteen science, mathematics, and engineering departments and programs and associated centers and units. At summer’s end, SURF participants were asked to summarize their research experiences for this publication.
The Polytope Formalism: isomerism and associated unimolecular isomerisation
This thesis concerns the ontology of isomerism, this encompassing the conceptual frameworks and relationships that comprise the subject matter; the necessary formal definitions, nomenclature, and representations that have impacts reaching into unexpected areas such as drug registration and patent specifications; the requisite controlled and precise vocabulary that facilitates nuanced communication; and the digital/computational formalisms that underpin the chemistry software and database tools that empower chemists to perform much of their work.
Using conceptual tools taken from Combinatorics and Graph Theory, a unified description of isomerism and associated unimolecular isomerisation, spanning both constitutional isomerism and stereoisomerism, is presented: the Polytope Formalism. This includes unification of the varying approaches historically taken to describe and understand stereoisomerism in organic and inorganic compounds.
Work for this Thesis began with the synthesis, isolation, and characterisation of compounds not adequately describable using existing IUPAC recommendations. Generalisation of the polytopal-rearrangements model of stereoisomerisation used for inorganic chemistry led to the prescriptions that could deal with the synthesised compounds, revealing an unrecognised fundamental form of isomerism called akamptisomerism.
Following on, this Thesis describes how attempting to place akamptisomerism within the context of existing stereoisomerism reveals significant systematic deficiencies in the IUPAC recommendations. These shortcomings have limited the conceptualisation of broad classes of compounds and hindered development of molecules for medicinal and technological applications.
It is shown how the Polytope Formalism can be applied to the description of constitutional isomerism in a practical manner. Finally, a radically different medicinal chemistry design strategy with broad application, based upon the principles, is described.
Cheminformatics for genome-scale metabolic reconstructions
Genome-scale metabolic reconstructions are an important resource in the study of metabolism. They provide both a system and component level view of the biochemical transformations of metabolites. As more reconstructions have been created it remains a challenge to integrate and reason about their contents. This thesis focuses on the development of computational methods to allow on-demand comparison and alignment of metabolic reconstructions.
A novel method is introduced that utilises chemical structure representations to identify equivalent metabolites between reconstructions. Using a graph theoretic representation allows the identification and reasoning of metabolites that have a non-exact match. A key advantage is that the method uses the contents of reconstructions directly and does not rely on the creation or use of a common reference.
To annotate reconstructions with chemical structure representations an interactive desktop application is introduced. The application assists in the creation and curation of metabolic information using manual, semi-automated, and automated methods. Chemical structure representations can be retrieved, drawn, or generated to allow precise metabolite annotation.
In processing chemical information, efficient and optimised algorithms are required. Several areas are addressed and implementations have been contributed to the Chemistry Development Kit. Rings are a fundamental property of chemical structures; therefore, multiple ring definitions and fast algorithms are explored. Conversion and standardisation between structure representations present a challenge. Efficient algorithms to determine aromaticity, assign a Kekulé form, and generate tautomers are detailed.
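As a small illustration of why ring definitions start from the molecular graph, the sketch below counts independent rings with the cyclomatic number (bonds minus atoms plus connected components). It is only a hedged example and not the thesis's or the Chemistry Development Kit's algorithm; the function name `ring_count` and the atom-index bond lists are assumptions made for illustration.

```python
def ring_count(num_atoms, bonds):
    """Return the number of independent rings in a molecular graph.

    Uses the cyclomatic number: bonds - atoms + connected components.
    `bonds` is a list of (i, j) atom-index pairs with no duplicates.
    """
    parent = list(range(num_atoms))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = num_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - num_atoms + components


# Heavy-atom skeletons only (hydrogens omitted).
benzene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
naphthalene = benzene + [(5, 6), (6, 7), (7, 8), (8, 9), (9, 4)]

print(ring_count(6, benzene))
print(ring_count(10, naphthalene))
```

This number is the size of any minimal cycle basis; which specific rings are reported (for example a smallest set of smallest rings versus all relevant cycles) is where the different ring definitions diverge.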
Many enzymes are selective and specific to stereochemistry. Methods for the identification, depiction, comparison, and description of stereochemistry are described. The project was funded by Unilever, the Biotechnology and Biological Sciences Research Council [BB/I532153/1], and the European Molecular Biology Laboratory.
Nenad Trinajstić – Pioneer of Chemical Graph Theory
We present a brief overview of the many contributions of Nenad Trinajstić to Chemical Graph Theory, an important and fast developing branch of Theoretical Chemistry. In addition, we outline briefly the various activities of Trinajstić within the chemical community of Croatia. As can be seen, his scientific work has been very productive and has not abated despite the hostility towards Chemical Graph Theory in certain chemical circles over the past 30 years. On the contrary, Trinajstić continued and widened the areas of his research interest, which started with investigating the close relationship between Graph Theory and HMO, and demonstrated the importance of Chemical Graph Theory for chemistry. In more than one way he has proven the opponents of Chemical Graph Theory wrong, though some continue to fail to recognize the importance of Graph Theory in Chemistry.
Estimation method for the thermochemical properties of polycyclic aromatic molecules
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Chemical Engineering, 2005. Includes bibliographical references.
Polycyclic aromatic molecules, including polycyclic aromatic hydrocarbons (PAHs), have attracted considerable attention in the past few decades. They are formed during the incomplete combustion of hydrocarbon fuels and are precursors of soot. Some PAHs are known carcinogens, and control of their emissions is an important issue. These molecules are found in many materials, including coal, fuel oils, lubricants, and carbon black. They are also implicated in the formation of fullerenes, one of the most chemically versatile classes of molecules known. Clearly, models that provide predictive capability for their formation and growth are highly desirable. Thermochemical properties of the species in the model are often the most important parameters, particularly for high temperature processes such as the formation of PAH and other aromatic molecules. Thermodynamic consistency requires that reverse rate constants be calculated from the forward rate constants and from the equilibrium constants. The latter are obtained from the thermochemical properties of reactants and products. The predictive ability of current kinetic models is significantly limited by the scarcity of available thermochemical data. In this work we present the development of a Bond-Centered Group Additivity method for the estimation of the thermochemical properties of polycyclic aromatic molecules, including PAHs, molecules with the furan substructure, molecules with triple bonds, substituted PAHs, and radicals. This method is based on thermochemical values of about two hundred polycyclic aromatic molecules and radicals obtained from quantum chemical calculations at the B3LYP/6-31G(d) level. A consistent set of homodesmic reactions has been developed to accurately calculate the heat of formation from the absolute energy.
The entropies calculated from the B3LYP/6-31G(d) vibrational frequencies are shown to be at least as reliable as the few available experimental values. This new Bond-Centered Group Additivity method predicts the thermochemistry of C₆₀ and C₇₀ fullerenes, as well as smaller aromatic molecules, with accuracy comparable to both experiments and the best quantum calculations. This Bond-Centered Group Additivity method is shown to extrapolate reasonably to infinite graphene sheets. The Bond-Centered Group Additivity method has been implemented into a computer code within the automatic Reaction Mechanism Generation software (RMG) developed in our group. The database has been organized as a tree structure, making its maintenance and possible extension very straightforward. This computer code allows the fast and easy use of this estimation method by non-expert users. Moreover, since it is incorporated into RMG, it will allow users to generate reaction mechanisms that include aromatic molecules whose thermochemical properties are calculated using the Bond-Centered Group Additivity method. Exploratory equilibrium studies were performed. Equilibrium concentrations of individual species depend strongly on the thermochemistry of the individual species, emphasizing the importance of consistent thermochemistry for all the species involved in the calculations. Equilibrium calculations can provide many interesting insights into the relationship between PAH and fullerenes in combustion.
by Joanna Yu. Ph.D.
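The thermodynamic-consistency requirement mentioned in this abstract can be sketched numerically. The example below is a minimal illustration, not code from the thesis or from RMG: it assumes a unimolecular isomerisation A ⇌ B (so the equilibrium constant is dimensionless and no standard-state conversion is needed) and reaction enthalpy and entropy already evaluated at the temperature of interest. The function names are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def equilibrium_constant(delta_h, delta_s, temperature):
    """K = exp(-dG/RT) with dG = dH - T*dS (J/mol, J/(mol K), K)."""
    delta_g = delta_h - temperature * delta_s
    return math.exp(-delta_g / (R * temperature))


def reverse_rate_constant(k_forward, delta_h, delta_s, temperature):
    """Reverse rate constant for A <=> B implied by k_forward and K."""
    return k_forward / equilibrium_constant(delta_h, delta_s, temperature)


# Illustrative inputs only (not data from the thesis): a reaction whose
# enthalpy is chosen so that K is 10 at 1000 K, with zero entropy change.
T = 1000.0
dH = -R * T * math.log(10.0)
print(reverse_rate_constant(1.0e6, dH, 0.0, T))
```

Because the reverse rate constant depends exponentially on the reaction free energy, small errors in estimated thermochemistry can change it substantially, which is why the abstract stresses consistent thermochemical data.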
Development and Improvement of Tools and Algorithms for the Problem of Atom Type Perception and for the Assessment of Protein-Ligand-Complex Geometries
In the context of the present work, a scoring function for protein-ligand complexes has been developed, aimed not at affinity prediction but at a good recognition rate for near-native geometries. The developed program DSX makes use of the same formalism as the knowledge-based scoring function DrugScore, hence using the knowledge from crystallographic databases and atom-type specific distance-dependent distribution functions. It is based on newly defined atom-types. Additionally, the program is augmented by two novel potentials which evaluate the torsion angles and (de-)solvation effects. Validation of DSX is based on a literature-known, comprehensive data-set that allows for comparison with other popular scoring functions.
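To make the idea of distance-dependent distribution functions concrete, the sketch below uses the generic inverse-Boltzmann form common to knowledge-based potentials: each distance bin is scored by the negative log ratio of an atom-pair distribution to a reference distribution. This is an assumption-laden illustration, not the DSX or DrugScore implementation; the function names, the pseudocount, the bin width, and the histogram counts are all hypothetical.

```python
import math


def pair_potential(pair_counts, reference_counts, pseudocount=1e-6):
    """Per-bin potential -ln(g_pair / g_ref) from two distance histograms."""
    pair_total = sum(pair_counts)
    ref_total = sum(reference_counts)
    potential = []
    for n_pair, n_ref in zip(pair_counts, reference_counts):
        # Normalise counts to frequencies; the pseudocount avoids log(0).
        g_pair = n_pair / pair_total + pseudocount
        g_ref = n_ref / ref_total + pseudocount
        potential.append(-math.log(g_pair / g_ref))
    return potential


def score_contacts(distances, potential, bin_width):
    """Sum the potential over observed contact distances (same units as bin_width)."""
    total = 0.0
    for d in distances:
        index = int(d / bin_width)
        if index < len(potential):  # distances beyond the last bin contribute 0
            total += potential[index]
    return total


# Made-up histograms for one atom-type pair, three bins of width 1.0.
potential = pair_potential([10, 30, 60], [30, 30, 40])
print(score_contacts([0.5, 2.5, 2.9], potential, bin_width=1.0))
```

Bins where the pair is seen more often than the reference receive negative (favourable) values, so a lower total indicates a geometry closer to what is observed in crystal structures.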
DSX is intended for the recognition of near-native binding modes. In this important task, DSX outperforms the competitors, but is also among the best scoring functions regarding the ranking of different compounds.
Another essential step in the development of DSX was the automatic assignment of the new atom types. A powerful programming framework was implemented to fulfill this task. Validation was done on a literature-known data-set and showed superior efficiency and quality compared to similar programs where this data was available. The front-end fconv was developed to share this functionality with the scientific community. Multiple features useful in computational drug-design workflows are also included, and fconv was made freely available as an open source project.
Based on the developed potentials for DSX, a number of further applications were created and implemented:
The program HotspotsX calculates favorable interaction fields in protein binding pockets that can be used as a starting point for pharmacophoric models and that indicate possible directions for the optimization of lead structures.
The program DSFP calculates scores based on fingerprints for given binding geometries. These fingerprints are compared with reference fingerprints that are derived from DSX interactions in known crystal structures of the particular target.
Finally, the program DSX_wat was developed to predict stable water networks within a binding pocket. DSX interaction fields are used to calculate the putative water positions.