
    Enumerating tree-like polyphenyl isomers

    NSFC [10831001]
    Enumeration of molecules is one of the fundamental problems in bioinformatics and plays an important role in drug discovery, experimental structure elucidation (e.g., by NMR or mass spectrometry), molecular design, and virtual library construction. We consider the enumeration of tree-like polyphenyls (C₆ₙH₄ₙ₊₂). For this purpose, we define two generating functions T(x) and R(x) whose coefficients are the numbers t(n) and r(n) of tree-like polyphenyls (TL-polyphenyls) and monosubstituted tree-like polyphenyls (MTL-polyphenyls), respectively. By characterizing the symmetry groups of TL-polyphenyls and MTL-polyphenyls, we establish two functional equations for these generating functions. This yields, for the first time, an efficient recursion formula for calculating t(n) and r(n). The two functional equations are also the basis for analyzing the asymptotic behavior of these numbers, from which we derive precise asymptotic values for both r(n) and t(n). These asymptotic values agree well with the numerical results obtained from our recursion formula. Finally, we give an explicit enumeration formula for TL-polyphenyls of a particular type: the linear polyphenyls.
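    The abstract does not reproduce the two functional equations. As a hedged illustration of how such equations become a recursion, the sketch below uses our own Pólya-style formulation, which may differ from the paper's. Each ring is a hexagon whose free positions carry either H or a branch. R(x) is assumed to satisfy R(x) = x·½[(1+R(x))⁵ + (1+R(x))(1+R(x²))²], which accounts for the five free positions of the substituted ring under the mirror symmetry through the substituent bond. T(x) then follows from the dissymmetry theorem for trees, using the cycle index of D₆ acting on the six ring positions. Steric constraints are ignored, and all function names are hypothetical.

```python
from fractions import Fraction

def mul(a, b, n):
    """Product of two coefficient lists, truncated to degrees < n."""
    out = [Fraction(0)] * n
    for i in range(min(len(a), n)):
        for j in range(min(len(b), n - i)):
            out[i + j] += a[i] * b[j]
    return out

def pw(a, k, n):
    """a**k, truncated to degrees < n."""
    out = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for _ in range(k):
        out = mul(out, a, n)
    return out

def at_xk(a, k, n):
    """Substitute x -> x**k in a, truncated to degrees < n."""
    out = [Fraction(0)] * n
    for i, c in enumerate(a):
        if i * k < n:
            out[i * k] += c
    return out

def one_plus(a):
    """1 + a: a ring position holds either H (weight 1) or a branch."""
    return [a[0] + 1] + list(a[1:])

def shift(a):
    """Multiply by x (one more ring), keeping the same length."""
    return [Fraction(0)] + list(a[:-1])

def lin(*terms):
    """Linear combination of equal-length series: lin((c1, s1), (c2, s2), ...)."""
    n = len(terms[0][1])
    return [sum(c * s[i] for c, s in terms) for i in range(n)]

def polyphenyl_counts(N):
    """Hypothetical helper: return ([r(1..N)], [t(1..N)]) under the stated assumptions."""
    n = N + 1
    R = [Fraction(0)] * n
    # Fixed-point iteration: pass k makes the x^k coefficient of R exact.
    for _ in range(N):
        A = one_plus(R)
        A2 = one_plus(at_xk(R, 2, n))
        F = lin((Fraction(1, 2), pw(A, 5, n)),              # identity: s1^5
                (Fraction(1, 2), mul(A, pw(A2, 2, n), n)))  # mirror: s1 * s2^2
        R = shift(F)
    # s_k -> 1 + R(x^k); cycle index of D6 acting on the six ring positions.
    s = {k: one_plus(at_xk(R, k, n)) for k in (1, 2, 3, 6)}
    Z = lin((Fraction(1, 12), pw(s[1], 6, n)),
            (Fraction(2, 12), s[6]),
            (Fraction(2, 12), pw(s[3], 2, n)),
            (Fraction(4, 12), pw(s[2], 3, n)),
            (Fraction(3, 12), mul(pw(s[1], 2, n), pw(s[2], 2, n), n)))
    # Dissymmetry theorem for trees: T = x*Z(D6; 1+R) - (R(x)^2 - R(x^2)) / 2.
    T = lin((Fraction(1), shift(Z)),
            (Fraction(-1, 2), mul(R, R, n)),
            (Fraction(1, 2), at_xk(R, 2, n)))
    assert all(c.denominator == 1 for c in R + T)
    return [int(c) for c in R[1:]], [int(c) for c in T[1:]]

r, t = polyphenyl_counts(6)
print("r(n):", r)
print("t(n):", t)
```

    Under these assumptions, polyphenyl_counts(3) gives r = [1, 3, 15] and t = [1, 1, 3]. The t values match biphenyl and the three terphenyls (ortho, meta, para). Exact fractions are used so that the divisions by 2 and 12 never introduce rounding.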

    Enumerating molecules.


    A New Integer Linear Programming Formulation to the Inverse QSAR/QSPR for Acyclic Chemical Compounds Using Skeleton Trees

    33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2020, Kitakyushu, Japan, September 22-25, 2020.
    Computer-aided drug design is one of the important application areas of intelligent systems. Recently, a novel method has been proposed for inverse QSAR/QSPR, a major approach to drug design, using both artificial neural networks (ANN) and mixed integer linear programming (MILP). This method consists of two phases. In the first phase, a feature function f is defined so that each chemical compound G is converted into a vector f(G) of several descriptors of G, and a prediction function ψ is constructed with an ANN so that ψ(f(G)) takes a value nearly equal to a given chemical property π for many chemical compounds G in a data set. In the second phase, given a target value y∗ of the chemical property π, a chemical structure G∗ is inferred as follows. An MILP M is formulated so that M admits a feasible solution (x∗, y∗) if and only if there exist vectors x∗, y∗ and a chemical compound G∗ such that ψ(x∗) = y∗ and f(G∗) = x∗. The method has been implemented for inferring acyclic chemical compounds. In this paper, we propose a new MILP for inferring acyclic chemical compounds by introducing a novel concept, the skeleton tree, and we conduct computational experiments. The results suggest that the proposed method outperforms the existing method when the diameter of the graphs is up to around 6 to 8. For an instance of inferring acyclic chemical compounds with 38 non-hydrogen atoms from C, O and S and diameter 6, our method was 5×10⁴ times faster.
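    For the MILP M to contain the trained ANN, each neuron must be written as linear constraints. The abstract does not say how this is done. A common technique, assumed here, is to encode a ReLU unit y = max(0, a) with one binary variable z and a known bound M ≥ |a|. The hypothetical helper below takes a fixed pre-activation a and lists, for each z, the interval of y values that the constraints allow. This shows that the constraints pin y to exactly max(0, a).

```python
def relu_bigM_feasible(a, M):
    """Feasible (z, y_lo, y_hi) for the big-M ReLU constraints
        y >= 0, y >= a, y <= a + M*(1 - z), y <= M*z, z in {0, 1},
    assuming the bound |a| <= M holds."""
    if abs(a) > M:
        raise ValueError("big-M must bound |a|")
    feasible = []
    for z in (0, 1):
        lo = max(0.0, a)                   # from y >= 0 and y >= a
        hi = min(a + M * (1 - z), M * z)   # from the two big-M upper bounds
        if lo <= hi:
            feasible.append((z, lo, hi))
    return feasible

for a in (-3.0, 0.0, 2.5):
    print(a, relu_bigM_feasible(a, M=10.0))
```

    In a real model, a would be a linear expression in the MILP's input variables, and the same four constraints would be written once per neuron.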

    A novel method for inference of chemical compounds of cycle index two with desired properties based on artificial neural networks and integer programming

    Inference of chemical compounds with desired properties is important for drug design, chemo-informatics, and bioinformatics, and various algorithmic and machine learning techniques have been applied to it. Recently, a novel method has been proposed for this inference problem using both artificial neural networks (ANN) and mixed integer linear programming (MILP). This method consists of a training phase and an inverse prediction phase. In the training phase, an ANN is trained so that its output takes a value nearly equal to a given chemical property for each sample. In the inverse prediction phase, a chemical structure is inferred using MILP and enumeration so that the structure has a desired output value for the trained ANN. However, the framework has so far been applied only to acyclic and monocyclic chemical compounds. In this paper, we significantly extend the framework and present a new method for the inference problem for rank-2 chemical compounds (chemical graphs with cycle index 2). The results of computational experiments using chemical properties such as the octanol/water partition coefficient, melting point, and boiling point suggest that the proposed method is much more useful than the previous method.
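    The cycle index (rank) of a graph is its cyclomatic number, |E| − |V| plus the number of connected components. For a connected chemical graph this is |E| − |V| + 1. The minimal sketch below counts components with union-find. It assumes a simple graph in which bond orders are not represented as parallel edges, and the helper name is hypothetical.

```python
def cycle_index(num_vertices, edges):
    """Cyclomatic number |E| - |V| + (number of connected components)."""
    parent = list(range(num_vertices))

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = num_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(edges) - num_vertices + components

# Naphthalene carbon skeleton: two hexagons sharing the bond 4-5.
naphthalene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
               (5, 6), (6, 7), (7, 8), (8, 9), (9, 4)]
print(cycle_index(10, naphthalene))
```

    With 10 vertices and 11 edges in one component, the naphthalene skeleton has rank 2. An acyclic skeleton always has rank 0.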

    A novel method for inference of acyclic chemical compounds with bounded branch-height based on artificial neural networks and integer programming

    Analysis of chemical graphs is becoming a major research topic in computational molecular biology because of its potential applications to drug design. One of the major approaches in such studies is inverse quantitative structure activity/property relationship (inverse QSAR/QSPR) analysis, which infers chemical structures from given chemical activities or properties. Recently, a novel two-phase framework has been proposed for inverse QSAR/QSPR. In the first phase, an artificial neural network (ANN) is used to construct a prediction function. In the second phase, a mixed integer linear program (MILP) formulated on the trained ANN and a graph search algorithm are used to infer the desired chemical structures. So far, the framework has been applied to chemical compounds with cycle index up to 2. Computational results on instances with n non-hydrogen atoms show that a feature vector can be inferred by solving an MILP for up to n = 40, whereas graphs can be enumerated only for up to n = 15. For chemical acyclic graphs, the maximum computable diameter of a chemical structure was 8. In this paper, we introduce a new characterization of graph structure, called "branch-height", on which we base a new MILP formulation and a new graph search algorithm for chemical acyclic graphs. The results of computational experiments using chemical properties such as the octanol/water partition coefficient, boiling point, and heat of combustion suggest that the proposed method can infer chemical acyclic graphs with around n = 50 and diameter 30.
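    Diameter is the size measure these abstracts use to describe how far each method scales. For an acyclic graph it can be computed in linear time with two breadth-first searches. The first search finds a vertex farthest from an arbitrary start. The second finds the farthest distance from that vertex. The sketch below is this standard tree technique. It is not the paper's branch-height machinery, and the function names are hypothetical.

```python
from collections import deque

def bfs_farthest(adj, start):
    """Return (vertex, distance) of a vertex farthest from start."""
    dist = {start: 0}
    queue = deque([start])
    far = start
    while queue:
        v = queue.popleft()
        if dist[v] > dist[far]:
            far = v
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return far, dist[far]

def tree_diameter(adj):
    """Diameter of a tree (acyclic, connected) via double BFS."""
    a, _ = bfs_farthest(adj, next(iter(adj)))
    _, d = bfs_farthest(adj, a)
    return d

# Isobutane heavy-atom skeleton: central carbon 0 bonded to carbons 1, 2, 3.
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(tree_diameter(isobutane))
```

    The double-BFS shortcut is only valid for trees. A graph containing cycles would need all-pairs shortest paths or BFS from every vertex.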

    Analysis of Generative Chemistries

    For the modelling of chemistry we use undirected, labelled graphs as explicit models of molecules and graph transformation rules for modelling generalised chemical reactions. This is used to define artificial chemistries on the level of individual bonds and atoms, where formal graph grammars implicitly represent large spaces of chemical compounds. We use a graph rewriting formalism, rooted in category theory, called the Double Pushout approach, which directly expresses the transition state of chemical reactions. Using concurrency theory for transformation rules, we define algorithms for the composition of rewrite rules in a chemically intuitive manner that enable automatic abstraction of the level of detail in chemical pathways. Based on this rule composition we define an algorithmic framework for generation of vast reaction networks for specific spaces of a given chemistry, while still maintaining the level of detail of the model down to the atomic level. The framework also allows for computation with graphs and graph grammars, which is utilised to model non-trivial chemical systems. The graph generation relies on graph isomorphism testing, and we review the general individualisation-refinement paradigm used in the state-of-the-art algorithms for graph canonicalisation, isomorphism testing, and automorphism discovery. We present a model for chemical pathways based on a generalisation of network flows from ordinary directed graphs to directed hypergraphs. The model allows for reasoning about the flow of individual molecules in general pathways, and the introduction of chemically motivated routing constraints. It further provides the foundation for defining specialised pathway motifs, which is illustrated by defining necessary topological constraints for both catalytic and autocatalytic pathways. We also prove that central types of pathway questions are NP-complete, even for restricted classes of reaction networks. 
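    The refinement half of the individualisation-refinement paradigm is usually colour refinement. Vertices start with colours from their labels (here, element symbols) and are repeatedly recoloured by their own colour plus the multiset of their neighbours' colours. This continues until the partition stops splitting. The minimal sketch below shows only this refinement step. It omits individualisation and the search tree, is not the thesis's implementation, and uses a hypothetical function name.

```python
def color_refinement(adj, labels):
    """adj: vertex -> list of neighbours; labels: vertex -> initial label.
    Returns a stable colouring as vertex -> int colour."""
    palette = {lab: i for i, lab in enumerate(sorted(set(labels.values()), key=repr))}
    colour = {v: palette[labels[v]] for v in adj}
    while True:
        # Signature: own colour plus the sorted multiset of neighbour colours.
        sig = {v: (colour[v], tuple(sorted(colour[u] for u in adj[v]))) for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: palette[sig[v]] for v in adj}
        # Refinement only splits classes, so an unchanged class count means stable.
        if len(set(new.values())) == len(set(colour.values())):
            return new
        colour = new

# Heavy-atom graph C-C-O: the two carbons end up in different classes.
adj = {0: [1], 1: [0, 2], 2: [1]}
print(color_refinement(adj, {0: "C", 1: "C", 2: "O"}))
```

    For highly symmetric graphs such as a six-membered carbon ring, refinement leaves a single class. This is where individualisation must step in to break ties.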
The complete pathway model, including constraints for catalytic and autocatalytic pathways, is implemented using integer linear programming. This implementation is used in a tree search method to enumerate both optimal and near-optimal pathway solutions. The formal methods are applied to multiple chemical systems: the enzyme-catalysed beta-lactamase reaction, variations of the glycolysis pathway, and the formose process. In each of these systems we use rule composition to abstract pathways and calculate traces for isotope-labelled carbon atoms. The pathway model is used to automatically enumerate alternative non-oxidative glycolysis pathways, and to enumerate thousands of candidates for autocatalytic pathways in the formose process.
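    In the hypergraph generalisation of network flow, each reaction is a hyperedge with a multiset of educts (tail) and products (head). An integer flow on the hyperedges, together with input and output flows, must balance at every molecule. The minimal sketch below checks only that conservation condition, not the ILP itself. The function name, reaction names, and molecules are hypothetical. In the example, A is consumed by one reaction and regenerated by another, which is the shape of a catalytic pathway.

```python
from collections import Counter

def imbalance(edges, flow, inputs, outputs):
    """edges: name -> (tail list, head list). Returns vertex -> net surplus;
    an empty dict means the flow is conserved at every vertex."""
    net = Counter()
    for v, amount in inputs.items():
        net[v] += amount
    for v, amount in outputs.items():
        net[v] -= amount
    for name, (tail, head) in edges.items():
        f = flow.get(name, 0)
        for v, m in Counter(tail).items():   # educts are consumed
            net[v] -= f * m
        for v, m in Counter(head).items():   # products are produced
            net[v] += f * m
    return {v: x for v, x in net.items() if x != 0}

edges = {"e1": (["A", "B"], ["C"]), "e2": (["C"], ["A", "D"])}
print(imbalance(edges, {"e1": 1, "e2": 1}, inputs={"B": 1}, outputs={"D": 1}))
```

    An ILP formulation would impose the same balance equations as constraints, with the flows as integer variables.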

    Estimation method for the thermochemical properties of polycyclic aromatic molecules

    Thesis (Ph. D.), Massachusetts Institute of Technology, Dept. of Chemical Engineering, 2005. Includes bibliographical references.
    Polycyclic aromatic molecules, including polycyclic aromatic hydrocarbons (PAHs), have attracted considerable attention in the past few decades. They are formed during the incomplete combustion of hydrocarbon fuels and are precursors of soot. Some PAHs are known carcinogens, and controlling their emissions is an important issue. These molecules are found in many materials, including coal, fuel oils, lubricants, and carbon black. They are also implicated in the formation of fullerenes, one of the most chemically versatile classes of molecules known. Models that can predict their formation and growth are therefore highly desirable. The thermochemical properties of the species in such a model are often its most important parameters, particularly for high-temperature processes such as the formation of PAHs and other aromatic molecules. Thermodynamic consistency requires that reverse rate constants be calculated from the forward rate constants and the equilibrium constants; the latter are obtained from the thermochemical properties of the reactants and products. The predictive ability of current kinetic models is significantly limited by the scarcity of available thermochemical data. In this work we present the development of a Bond-Centered Group Additivity method for estimating the thermochemical properties of polycyclic aromatic molecules, including PAHs, molecules with the furan substructure, molecules with triple bonds, substituted PAHs, and radicals. The method is based on thermochemical values for about two hundred polycyclic aromatic molecules and radicals obtained from quantum chemical calculations at the B3LYP/6-31G(d) level. A consistent set of homodesmic reactions has been developed to calculate the heat of formation accurately from the absolute energy.
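    The consistency requirement, that reverse rate constants come from forward rate constants and equilibrium constants, can be written as k_r = k_f / K with K = exp(−ΔG°/RT) and ΔG° = ΔH° − TΔS°. The minimal sketch below assumes ideal-gas standard states and a reaction with no change in the number of moles, so that K_c = K_p and no unit conversion is needed. The function names and example numbers are hypothetical.

```python
import math

R_GAS = 8.314462618  # gas constant, J/(mol*K)

def equilibrium_constant(dH, dS, T):
    """K = exp(-(dH - T*dS) / (R*T)); dH in J/mol, dS in J/(mol*K), T in K."""
    return math.exp(-(dH - T * dS) / (R_GAS * T))

def reverse_rate_constant(k_forward, dH, dS, T):
    """Thermodynamically consistent k_r = k_f / K.
    Assumes delta n = 0 (no change in moles of gas), so Kc equals Kp."""
    return k_forward / equilibrium_constant(dH, dS, T)

# Hypothetical exothermic reaction at 1500 K.
print(reverse_rate_constant(1.0e6, dH=-50e3, dS=-20.0, T=1500.0))
```

    Because K depends exponentially on ΔG°, small errors in the estimated enthalpies and entropies propagate strongly into the reverse rate constants. This is why consistent thermochemistry matters.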
The entropies calculated from the B3LYP/6-31G(d) vibrational frequencies are shown to be at least as reliable as the few available experimental values. This new Bond-Centered Group Additivity method predicts the thermochemistry of C₆₀ and C₇₀ fullerenes, as well as smaller aromatic molecules, with accuracy comparable to both experiments and the best quantum calculations, and it is shown to extrapolate reasonably to infinite graphene sheets. The method has been implemented in a computer code within the automatic Reaction Mechanism Generation software (RMG) developed in our group. The database has been organized as a tree structure, which makes it straightforward to maintain and extend. This code allows non-expert users to apply the estimation method quickly and easily. Moreover, because it is incorporated into RMG, users can generate reaction mechanisms that include aromatic molecules whose thermochemical properties are calculated with the Bond-Centered Group Additivity method. Exploratory equilibrium studies were performed. Equilibrium concentrations of individual species depend strongly on their thermochemistry, emphasizing the importance of consistent thermochemistry for all species involved in the calculations. Equilibrium calculations can provide many interesting insights into the relationship between PAHs and fullerenes in combustion.
    by Joanna Yu. Ph.D.
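    A group additivity method estimates a property as the sum of tabulated group contributions, each weighted by how often that group occurs in the molecule. The sketch below shows only this summation step. How a molecule is decomposed into bond-centered groups, the group labels, and the numeric values are hypothetical placeholders, not the thesis's database.

```python
from collections import Counter

# Hypothetical bond-centered group contributions; placeholder numbers only.
GROUP_DHF = {"Cb-Cb": 1.0, "Cb-H": 2.0}

def additive_estimate(groups, table):
    """Sum of contributions over a molecule's list of group occurrences."""
    counts = Counter(groups)
    missing = set(counts) - set(table)
    if missing:
        # An additivity scheme cannot estimate groups it has no value for.
        raise KeyError(f"no contribution for groups: {sorted(missing)}")
    return sum(n * table[g] for g, n in counts.items())

print(additive_estimate(["Cb-Cb"] * 6 + ["Cb-H"] * 6, GROUP_DHF))
```

    Organizing the table as a tree, as the abstract describes, would let a lookup fall back to a more general group when a specific one is missing, rather than raising an error.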

    Publications

    This Annual Report covers the period from 1 January to 31 December 201

    Building robust chemical reaction mechanisms : next generation of automatic model construction software

    Thesis (Ph. D.), Massachusetts Institute of Technology, Dept. of Chemical Engineering, 2004. Includes bibliographical references (p. 308-319).
    Building accurate reaction mechanisms is crucial for modeling the dynamic behavior of many industrial processes involving complex chemical reactions. Because reaction mechanisms are complex, computer-aided mechanism generation software has appeared in recent years to help modelers build large mechanisms efficiently. However, limitations of these programs, such as the inability to model different types of reaction systems or to provide sufficiently precise thermodynamic and kinetic parameters, impede their broad use in modeling real reaction systems. To address the drawbacks of current first-generation reaction modeling software, this thesis presents a second-generation reaction mechanism construction software, Reaction Mechanism Generator (RMG). In RMG, a new reaction template method is proposed to define different types of reaction families quickly and flexibly, so that users can characterize any reaction system of interest without modifying the software. This work also presents new functional group tree approaches for constructing hierarchically structured thermodynamics and kinetics databases that manage a large number of parameters. Users can then quickly and precisely identify better kinetics for different reactions in the same reaction family, and easily extend and update the databases with the latest research results. This new data model dramatically improves the interface between chemistry and computer science, removes many of the ambiguities that have plagued the field of chemical kinetics for many years, and greatly facilitates the maintenance and documentation of both the software and the databases that provide the key inputs to any chemical kinetic model.
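    A functional group tree organises database entries from general (root) to specific (leaves). The hedged sketch below shows the kind of lookup such a tree enables. It assumes each node carries a match predicate and an optional value, that the search descends through matching children as far as possible, and that a node without a value falls back to its nearest valued ancestor. The Node class, predicates, and numbers are hypothetical and do not represent RMG's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class Node:
    name: str
    matches: Callable[[dict], bool]
    value: Optional[float] = None
    children: List["Node"] = field(default_factory=list)

def most_specific(root: Node, species: dict) -> Optional[Tuple[str, float]]:
    """Follow matching children as deep as possible and return (name, value)
    of the deepest matched node that carries a value."""
    if not root.matches(species):
        return None
    best = (root.name, root.value) if root.value is not None else None
    node = root
    while True:
        child = next((c for c in node.children if c.matches(species)), None)
        if child is None:
            return best
        node = child
        if node.value is not None:
            best = (node.name, node.value)

# Hypothetical three-level tree with placeholder values.
tree = Node("R", lambda s: True, 10.0, [
    Node("R-OH", lambda s: "OH" in s["groups"], None, [
        Node("R-CH2-OH", lambda s: "CH2OH" in s["groups"], 12.0),
    ]),
])
print(most_specific(tree, {"groups": {"OH", "CH2OH"}}))
```

    The fallback lets the database grow incrementally: a new, more specific node overrides its ancestors only for species it matches.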
The author applied object-oriented technology and the Unified Modeling Language (UML) to the system analysis, architecture design, and implementation of RMG. As a result, RMG was designed and developed as robust software with a good architecture and detailed documentation, so that it can be easily maintained, reused, and extended. RMG was successfully applied to generate a reaction mechanism for low-temperature n-butane oxidation, which includes a complex autoignition process. The model generated by RMG captured the fundamental phenomena of autoignition, and the predicted ignition delay time and the yields of many major products are in very good agreement with experimental data. This is the first time that model generation software has automatically generated such a complicated reaction mechanism without human intervention and provided precise predictions of ignition delay and major product yields consistent with experimental data.
    by Jing Song. Ph.D.