A New Integer Linear Programming Formulation to the Inverse QSAR/QSPR for Acyclic Chemical Compounds Using Skeleton Trees
33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2020, Kitakyushu, Japan, September 22-25, 2020. Computer-aided drug design is one of the important application areas of intelligent systems. Recently, a novel method has been proposed for inverse QSAR/QSPR using both artificial neural networks (ANN) and mixed integer linear programming (MILP), where inverse QSAR/QSPR is a major approach for drug design. This method consists of two phases. In the first phase, a feature function f is defined so that each chemical compound G is converted into a vector f(G) of several descriptors of G, and a prediction function ψ is constructed with an ANN so that ψ(f(G)) takes a value nearly equal to a given chemical property π for many chemical compounds G in a data set. In the second phase, given a target value y∗ of the chemical property π, a chemical structure G∗ is inferred in the following way. An MILP M is formulated so that M admits a feasible solution (x∗, y∗) if and only if there exist vectors x∗, y∗ and a chemical compound G∗ such that ψ(x∗)=y∗ and f(G∗)=x∗. The method has been implemented for inferring acyclic chemical compounds. In this paper, we propose a new MILP for inferring acyclic chemical compounds by introducing a novel concept, the skeleton tree, and conduct computational experiments. The results suggest that the proposed method outperforms the existing method when the diameter of graphs is up to around 6 to 8. For an instance for inferring acyclic chemical compounds with 38 non-hydrogen atoms from C, O and S and diameter 6, our method was 5×10^4 times faster
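To make the role of the feature function f concrete, the following is a minimal sketch of how a hydrogen-suppressed acyclic compound might be mapped to a descriptor vector. The abstract does not list the actual descriptors, so the choice here (counts of C, O and S atoms plus the graph diameter) and the names `feature_vector` and `farthest` are assumptions for illustration only.

```python
from collections import Counter, deque


def farthest(adj, start):
    """Breadth-first search; returns (a farthest node from start, its distance)."""
    dist = {start: 0}
    queue = deque([start])
    last = start
    while queue:
        u = queue.popleft()
        last = u  # BFS pops nodes in nondecreasing distance order
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return last, dist[last]


def feature_vector(atoms, bonds):
    """Hypothetical f(G) for a tree-shaped compound without hydrogens.

    atoms: list of element symbols, indexed by atom number
    bonds: list of (u, v) index pairs forming a tree
    Returns [#C, #O, #S, diameter]; this descriptor set is an assumption.
    """
    adj = {i: [] for i in range(len(atoms))}
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    counts = Counter(atoms)
    # In a tree, two BFS passes give the diameter (longest path in edges).
    end, _ = farthest(adj, 0)
    _, diameter = farthest(adj, end)
    return [counts["C"], counts["O"], counts["S"], diameter]


# Ethanol skeleton C-C-O
print(feature_vector(["C", "C", "O"], [(0, 1), (1, 2)]))
```

In the framework described above, a trained predictor ψ would take such a vector as input, and the MILP would search for a vector x∗ that is both realizable as f(G∗) and mapped by ψ to the target value.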
Generation, Ranking and Unranking of Ordered Trees with Degree Bounds
We study the problem of generating, ranking and unranking of unlabeled
ordered trees whose nodes have maximum degree of Δ. This class of trees
represents a generalization of chemical trees. A chemical tree is an unlabeled
tree in which no node has degree greater than 4. By allowing up to Δ
children for each node of a chemical tree instead of 4, we obtain a
generalization of chemical trees. Here, we introduce a new encoding over an
alphabet of size 4 for representing unlabeled ordered trees with maximum degree
of Δ. We use this encoding for generating these trees in A-order with
constant average time and O(n) worst case time. Due to the given encoding, with
a precomputation of size and time O(n^2) (assuming Δ is constant), both
ranking and unranking algorithms are also designed, taking O(n) and O(n log n)
time respectively. Comment: In Proceedings DCM 2015, arXiv:1603.0053
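Ranking and unranking schemes of this kind typically rely on precomputed counts of the trees in the class. As an illustration, here is a minimal dynamic-programming sketch that counts ordered rooted trees in which every node has at most a given number of children. It is not the encoding or the precomputation from the paper; the function name and the table layout are assumptions, and the bound is applied to the number of children of every node, following the wording of the abstract.

```python
def count_bounded_ordered_trees(n_max, max_children):
    """Count ordered rooted trees with 1..n_max nodes, each node having
    at most max_children children.

    Returns a list `trees` where trees[n] is the count for n nodes
    (trees[0] is 0). Runs in O(max_children * n_max^2) time.
    """
    trees = [0] * (n_max + 1)
    # forests[k][m]: number of ordered sequences of k trees with m nodes in total
    forests = [[0] * (n_max + 1) for _ in range(max_children + 1)]
    forests[0][0] = 1  # the empty forest
    for n in range(1, n_max + 1):
        # A tree with n nodes is a root plus an ordered forest of k <= max_children
        # subtrees that together use the remaining n - 1 nodes.
        trees[n] = sum(forests[k][n - 1] for k in range(max_children + 1))
        # A forest of k trees with n nodes: first tree has j nodes,
        # the other k - 1 trees share the remaining n - j nodes.
        for k in range(1, max_children + 1):
            forests[k][n] = sum(
                trees[j] * forests[k - 1][n - j] for j in range(1, n + 1)
            )
    return trees


print(count_bounded_ordered_trees(6, 2)[1:])
```

With at most two children per node the counts are the Motzkin numbers, and once the bound is at least n - 1 they coincide with the Catalan numbers, which gives a quick sanity check on the table.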
A new approach to the design of acyclic chemical compounds using skeleton trees and integer linear programming
Intelligent systems are applied in a wide range of areas, and computer-aided drug design is a highly important one. One major approach to drug design is the inverse QSAR/QSPR (quantitative structure-activity and structure-property relationship), for which a method that uses both artificial neural networks (ANN) and mixed integer linear programming (MILP) has been proposed recently. This method consists of two phases: a forward prediction phase and an inverse inference phase. In the prediction phase, a feature function f over chemical compounds is defined, whereby a chemical compound G is represented as a vector f(G) of descriptors. Next, for a given chemical property π, a regression function ψ is computed by an ANN from a dataset of chemical compounds with known values of π. It is desired that ψ(f(G)) takes a value that is close to the true value of property π for many of the compounds G in the dataset. In the inference phase, one starts with a target value y∗ of the chemical property π, and then a chemical structure G∗ such that ψ(f(G∗)) is within a certain tolerance level of y∗ is constructed from the solution to a specially formulated MILP. This method has been used for the case of inferring acyclic chemical compounds. In this paper, we propose a new concept on acyclic chemical graphs, called a skeleton tree, and based on it develop a new MILP formulation for inferring acyclic chemical compounds. Our computational experiments indicate that our newly proposed method significantly outperforms the existing method when the diameter of graphs is up to 8. In a particular example where we inferred acyclic chemical compounds with 38 non-hydrogen atoms from the set {C, O, S} times faster
A novel method for inference of chemical compounds of cycle index two with desired properties based on artificial neural networks and integer programming
Inference of chemical compounds with desired properties is important for drug design, chemo-informatics, and bioinformatics, to which various algorithmic and machine learning techniques have been applied. Recently, a novel method has been proposed for this inference problem using both artificial neural networks (ANN) and mixed integer linear programming (MILP). This method consists of the training phase and the inverse prediction phase. In the training phase, an ANN is trained so that the output of the ANN takes a value nearly equal to a given chemical property for each sample. In the inverse prediction phase, a chemical structure is inferred using MILP and enumeration so that the structure can have a desired output value for the trained ANN. However, the framework has been applied only to the case of acyclic and monocyclic chemical compounds so far. In this paper, we significantly extend the framework and present a new method for the inference problem for rank-2 chemical compounds (chemical graphs with cycle index 2). The results of computational experiments using such chemical properties as octanol/water partition coefficient, melting point, and boiling point suggest that the proposed method is much more useful than the previous method
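The graph classes named above (acyclic, monocyclic, rank-2) can be told apart by the cycle index of the molecular graph. The sketch below assumes that cycle index means the cyclomatic number, that is, edges minus vertices plus connected components, and that each bond is listed once regardless of bond multiplicity; the function name is illustrative.

```python
def cycle_index(num_atoms, bonds):
    """Cyclomatic number of a graph: |E| - |V| + (number of components).

    num_atoms: number of vertices, labelled 0..num_atoms-1
    bonds: list of (u, v) pairs, each bond listed once (an assumption)
    """
    parent = list(range(num_atoms))

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_atoms
    for u, v in bonds:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(bonds) - num_atoms + components


# Two fused six-membered rings (naphthalene skeleton): 10 atoms, 11 bonds
fused_rings = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
               (4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]
print(cycle_index(10, fused_rings))
```

Under this definition a tree-shaped compound has cycle index 0, a single ring has cycle index 1, and two fused rings have cycle index 2.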
Enumerating tree-like polyphenyl isomers
NSFC [10831001]. Enumeration of molecules is one of the fundamental problems in bioinformatics and plays an important role in drug discovery, experimental structure elucidation (e.g., by using NMR or mass spectrometry), molecular design and virtual library construction. We consider the enumeration of tree-like polyphenyls (C_{6n}H_{4n+2}). For this purpose, we define two generating functions T(x) and R(x) involving the numbers t(n) and r(n) of tree-like polyphenyls (TL-polyphenyls) and monosubstituted tree-like polyphenyls (MTL-polyphenyls), respectively. By characterizing the symmetry groups with respect to TL-polyphenyls and MTL-polyphenyls, we establish two functional equations for these two generating functions. This yields for the first time an efficient recursion formula for calculating the numbers t(n) and r(n). The two functional equations are also the basis for analyzing their asymptotic behavior, from which we derive the precise asymptotic values for both r(n) and t(n). The resulting asymptotic values are shown to agree well with the numerical results obtained by using our recursion formula. Finally, we give an explicit enumerating expression for TL-polyphenyls of a particular type: the linear polyphenyls
Modelling the evolution of biological complexity with a two-dimensional lattice self-assembly process
Self-assembling systems are prevalent across numerous scales of nature, lying at the heart of diverse physical and biological phenomena.
Individual protein subunits self-assembling into complexes is often a vital first step of biological processes.
Errors during protein assembly, due to mutations or misfolds, can have devastating effects and are responsible for an assortment of protein diseases, known as proteopathies.
With proteins exhibiting endless layers of complexity, building any all-encompassing model is unrealistic.
Coarse-grained models, despite not faithfully capturing every detail of the original system, have massive potential to assist in understanding complex phenomena.
The principal actors in self-assembly are the binding interactions between subunits, and so geometric constraints, polarity, kinetic forces, etc. can often be marginalised.
This work explores how self-assembly and its outcomes are inextricably tied to the involved interactions through the use of a two-dimensional lattice polyomino model.
First, this thesis addresses how the interaction characteristics of self-assembly building blocks determine what structures they form.
Specifically, it asks whether the same structures are consistently produced and whether they remain finite in size.
Assembly graphs store subunit interaction information and are used in classifying these two properties, the determinism and boundedness respectively.
Arbitrary sets of building blocks are classified without the costly overhead of repeated stochastic assembling, improving both the analysis speed and accuracy.
Furthermore, assembly graphs naturally integrate combinatorial and graph techniques, enabling a wider range of future polyomino studies.
The second part focuses on the implications of nondeterministic assembly for the evolution of interaction strengths.
Generalising subunit binding sites with mutable binary strings introduces such interaction strengths into the polyomino model.
Deterministic assemblies obey analytic expectations.
Conversely, interactions in nondeterministic assemblies rapidly diverge from equilibrium to minimise assembly inconsistency.
Optimal interaction strengths during assembly are also reflected in evolution.
Transitions between certain polyominoes are strongly forbidden when interaction strengths are misaligned.
The third aspect focuses on genetic duplication, an evolutionary event observed in organisms across all taxa.
Through polyomino evolutions, a duplication-heteromerisation pathway emerges as an efficient process.
This pathway exploits the advantages of both self-interactions and pairwise-interactions, and accelerates evolution by avoiding complexity bottlenecks.
Several simulation predictions are successfully validated against a large data set of protein complexes.
These results focus on coarse-grained models rather than quantified biological insight.
Despite this, they reinforce existing observations of protein complexes, as well as posing several new mechanisms for the evolution of biological complexity
A novel method for inference of acyclic chemical compounds with bounded branch-height based on artificial neural networks and integer programming
Analysis of chemical graphs is becoming a major research topic in computational molecular biology due to its potential applications to drug design. One of the major approaches in such a study is inverse quantitative structure activity/property relationship (inverse QSAR/QSPR) analysis, which is to infer chemical structures from given chemical activities/properties. Recently, a novel two-phase framework has been proposed for inverse QSAR/QSPR, where in the first phase an artificial neural network (ANN) is used to construct a prediction function. In the second phase, a mixed integer linear program (MILP) formulated on the trained ANN and a graph search algorithm are used to infer desired chemical structures. The framework has been applied to the case of chemical compounds with cycle index up to 2 so far. Computational experiments on instances with n non-hydrogen atoms show that a feature vector can be inferred by solving an MILP for up to n=40, whereas graphs can be enumerated for up to n=15. When applied to the case of chemical acyclic graphs, the maximum computable diameter of a chemical structure was up to 8. In this paper, we introduce a new characterization of graph structure, called “branch-height”, based on which a new MILP formulation and a new graph search algorithm are designed for chemical acyclic graphs. The results of computational experiments using such chemical properties as octanol/water partition coefficient, boiling point and heat of combustion suggest that the proposed method can infer chemical acyclic graphs with around n=50 and diameter 30