484 research outputs found

    Generation, Ranking and Unranking of Ordered Trees with Degree Bounds

    We study the problem of generating, ranking and unranking unlabeled ordered trees whose nodes have maximum degree $\Delta$. This class of trees generalizes chemical trees. A chemical tree is an unlabeled tree in which no node has degree greater than 4. By allowing up to $\Delta$ children for each node of a chemical tree instead of 4, we obtain a generalization of chemical trees. Here, we introduce a new encoding over an alphabet of size 4 for representing unlabeled ordered trees with maximum degree $\Delta$. We use this encoding to generate these trees in A-order with constant average time and O(n) worst-case time. Using the given encoding, with a precomputation of size and time O(n^2) (assuming $\Delta$ is constant), ranking and unranking algorithms are also designed, taking O(n) and O(n log n) time, respectively. Comment: In Proceedings DCM 2015, arXiv:1603.0053
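
Ranking and unranking schemes of this kind typically rely on a precomputed table of tree counts. The following is a minimal sketch of such a count, not the paper's encoding or its A-order algorithm. It assumes the bound applies to the number of children per node, and the function name `count_bounded_trees` is hypothetical.

```python
from functools import lru_cache


def count_bounded_trees(n: int, max_children: int) -> int:
    """Count ordered rooted trees with n >= 1 nodes in which every
    node has at most max_children children (assumed reading of the bound)."""

    @lru_cache(maxsize=None)
    def forests(nodes: int, trees: int) -> int:
        # Number of ordered sequences of `trees` trees using `nodes` nodes in total.
        if trees == 0:
            return 1 if nodes == 0 else 0
        if nodes < trees:
            return 0
        # Pick the size of the first tree; the rest form a smaller forest.
        return sum(
            tree(first) * forests(nodes - first, trees - 1)
            for first in range(1, nodes - trees + 2)
        )

    @lru_cache(maxsize=None)
    def tree(size: int) -> int:
        # The root uses one node; its children form an ordered forest
        # of at most max_children trees on the remaining nodes.
        return sum(forests(size - 1, c) for c in range(max_children + 1))

    return tree(n)
```

With `max_children=2` the counts follow the Motzkin numbers (for example, 9 trees on 5 nodes), and once the bound is at least n-1 they follow the Catalan numbers (14 trees on 5 nodes). This simple recursion is not tuned to meet the O(n^2) precomputation bound quoted in the abstract.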

    A novel method for inference of acyclic chemical compounds with bounded branch-height based on artificial neural networks and integer programming

    Analysis of chemical graphs is becoming a major research topic in computational molecular biology due to its potential applications to drug design. One of the major approaches in such a study is inverse quantitative structure activity/property relationship (inverse QSAR/QSPR) analysis, which is to infer chemical structures from given chemical activities/properties. Recently, a novel two-phase framework has been proposed for inverse QSAR/QSPR, where in the first phase an artificial neural network (ANN) is used to construct a prediction function. In the second phase, a mixed integer linear program (MILP) formulated on the trained ANN and a graph search algorithm are used to infer desired chemical structures. The framework has been applied to the case of chemical compounds with cycle index up to 2 so far. Computational experiments on instances with n non-hydrogen atoms show that a feature vector can be inferred by solving an MILP for up to n=40, whereas graphs can be enumerated for up to n=15. When applied to the case of chemical acyclic graphs, the maximum computable diameter of a chemical structure was up to 8. In this paper, we introduce a new characterization of graph structure, called “branch-height”, based on which a new MILP formulation and a new graph search algorithm are designed for chemical acyclic graphs. The results of computational experiments using such chemical properties as octanol/water partition coefficient, boiling point and heat of combustion suggest that the proposed method can infer chemical acyclic graphs with around n=50 and diameter 30.

    A novel method for inference of chemical compounds of cycle index two with desired properties based on artificial neural networks and integer programming

    Inference of chemical compounds with desired properties is important for drug design, chemo-informatics, and bioinformatics, to which various algorithmic and machine learning techniques have been applied. Recently, a novel method has been proposed for this inference problem using both artificial neural networks (ANN) and mixed integer linear programming (MILP). This method consists of a training phase and an inverse prediction phase. In the training phase, an ANN is trained so that the output of the ANN takes a value nearly equal to a given chemical property for each sample. In the inverse prediction phase, a chemical structure is inferred using MILP and enumeration so that the structure can have a desired output value for the trained ANN. However, the framework has been applied only to the case of acyclic and monocyclic chemical compounds so far. In this paper, we significantly extend the framework and present a new method for the inference problem for rank-2 chemical compounds (chemical graphs with cycle index 2). The results of computational experiments using such chemical properties as octanol/water partition coefficient, melting point, and boiling point suggest that the proposed method is much more useful than the previous method.

    A Novel Method for Inference of Acyclic Chemical Compounds with Bounded Branch-height Based on Artificial Neural Networks and Integer Programming

    Analysis of chemical graphs is a major research topic in computational molecular biology due to its potential applications to drug design. One approach is inverse quantitative structure activity/property relationship (inverse QSAR/QSPR) analysis, which is to infer chemical structures from given chemical activities/properties. Recently, a framework has been proposed for inverse QSAR/QSPR using artificial neural networks (ANN) and mixed integer linear programming (MILP). This method consists of a prediction phase and an inverse prediction phase. In the first phase, a feature vector $f(G)$ of a chemical graph $G$ is introduced and a prediction function $\psi$ on a chemical property $\pi$ is constructed with an ANN. In the second phase, given a target value $y^*$ of property $\pi$, a feature vector $x^*$ is inferred by solving an MILP formulated from the trained ANN so that $\psi(x^*)$ is close to $y^*$, and then a set of chemical structures $G^*$ such that $f(G^*) = x^*$ is enumerated by a graph search algorithm. The framework has been applied to the case of chemical compounds with cycle index up to 2. Computational results on instances with $n$ non-hydrogen atoms show that a feature vector $x^*$ can be inferred for up to around $n=40$, whereas graphs $G^*$ can be enumerated for up to $n=15$. When applied to the case of chemical acyclic graphs, the maximum computable diameter of $G^*$ was up to around 8. We introduce a new characterization of graph structure, "branch-height," based on which an MILP formulation and a graph search algorithm are designed for chemical acyclic graphs. The results of computational experiments using properties such as octanol/water partition coefficient, boiling point and heat of combustion suggest that the proposed method can infer chemical acyclic graphs $G^*$ with $n=50$ and diameter 30.

    Efficient computation of rank probabilities in posets

    As the title of this work indicates, its central theme is the computation of rank probabilities of posets. Since the probability space consists of the set of all linear extensions of a given poset equipped with the uniform probability measure, in the first instance we develop algorithms to explore this probability space efficiently. We consider in particular the problem of counting the number of linear extensions and the ability to generate extensions uniformly at random. Algorithms based on the lattice of ideals representation of a poset are developed. Since a weak order extension of a poset can be regarded as an order on the equivalence classes of a partition of the given poset not contradicting the underlying order, and thus as a generalization of the concept of a linear extension, algorithms are developed to count and generate weak order extensions uniformly at random as well. However, in order to reduce the inherent complexity of the problem, the cardinalities of the equivalence classes are fixed a priori. Due to the exponential nature of these algorithms this approach is still not always feasible, forcing one to resort to approximate algorithms in such cases. It is well known that Markov chain Monte Carlo methods can be used to generate linear extensions uniformly at random, but no such approaches have been used to generate weak order extensions. Therefore, an algorithm that can be used to sample weak order extensions uniformly at random is introduced. A monotone assignment of labels to objects from a poset corresponds to the choice of a weak order extension of the poset. Since the random monotone assignment of such labels is a step in the generation process of random monotone data sets, the ability to generate random weak order extensions clearly is of great importance. The contributions from this part therefore prove useful in e.g. the field of supervised classification, where a need for synthetic random monotone data sets is present.
    The second part focuses on the ranking of the elements of a partially ordered set. Algorithms for the computation of the (mutual) rank probabilities that avoid having to enumerate all linear extensions are suggested and applied to a real-world data set containing pollution data of several regions in Baden-Württemberg (Germany). With the emergence of several initiatives aimed at protecting the environment like the REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals) project of the European Union, the need for objective methods to rank chemicals, regions, etc. on the basis of several criteria continues to grow. Additionally, an interesting relation between the mutual rank probabilities and the average rank probabilities is proven. The third and last part studies the transitivity properties of the mutual rank probabilities and the closely related linear extension majority cycles, or LEM cycles for short. The type of transitivity is translated into the cycle-transitivity framework, which has been tailor-made for characterizing transitivity of reciprocal relations, and is proven to be situated between strong stochastic transitivity and a new type of transitivity called delta*-transitivity. It is shown that the latter type is situated between strong stochastic transitivity and a kind of product transitivity. Furthermore, theoretical upper bounds for the minimum cutting level to avoid LEM cycles are found. Cutting levels for posets on up to 13 elements are obtained experimentally and a theoretical lower bound for the cutting level to avoid LEM cycles of length 4 is computed. The research presented in this work has been published in international peer-reviewed journals and has been presented at international conferences. A Java implementation of several of the algorithms presented in this work, as well as binary files containing all posets on up to 13 elements with LEM cycles, can be downloaded from the website http://www.kermit.ugent.be
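
The idea of counting linear extensions and rank probabilities through the lattice of ideals, rather than by enumerating every extension, can be sketched as follows. This is an illustrative sketch only and not the thesis's Java implementation. The function name `rank_probabilities`, the bitmask representation of ideals, and the input format (elements numbered 0..n-1, pairs `(a, b)` meaning a < b, assumed acyclic) are all assumptions made for this example.

```python
from fractions import Fraction
from functools import lru_cache


def rank_probabilities(n, relations):
    """Return (number of linear extensions, probs), where probs[x][r] is the
    probability that element x occupies position r (0-based) in a uniformly
    random linear extension. Assumes `relations` describes a valid poset."""
    preds = [0] * n  # preds[b] is a bitmask of elements known to lie below b
    for a, b in relations:
        preds[b] |= 1 << a
    full = (1 << n) - 1

    def addable(ideal):
        # Elements outside the ideal whose predecessors are all inside it.
        return [x for x in range(n)
                if not (ideal >> x) & 1 and (preds[x] & ~ideal) == 0]

    @lru_cache(maxsize=None)
    def completions(ideal):
        # Number of ways to extend this ideal to the whole poset,
        # i.e. paths from the ideal to the top of the lattice of ideals.
        if ideal == full:
            return 1
        return sum(completions(ideal | (1 << x)) for x in addable(ideal))

    total = completions(0)
    weighted = [[0] * n for _ in range(n)]
    level = {0: 1}  # ideal -> number of ways to build it from the empty set
    for position in range(n):
        next_level = {}
        for ideal, prefixes in level.items():
            for x in addable(ideal):
                bigger = ideal | (1 << x)
                # Extensions that place x at this position via this ideal.
                weighted[x][position] += prefixes * completions(bigger)
                next_level[bigger] = next_level.get(bigger, 0) + prefixes
        level = next_level
    probs = [[Fraction(c, total) for c in row] for row in weighted]
    return total, probs
```

For the three-element poset with 0 < 1 and 0 < 2, `rank_probabilities(3, [(0, 1), (0, 2)])` finds 2 linear extensions, with element 0 always first and elements 1 and 2 each equally likely to be second or third. The work is proportional to the number of ideals, which can still be exponential in n, consistent with the abstract's remark about the exponential nature of these algorithms.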

    A Distance-Based Test of Association Between Paired Heterogeneous Genomic Data

    Due to rapid technological advances, a wide range of different measurements can be obtained from a given biological sample including single nucleotide polymorphisms, copy number variation, gene expression levels, DNA methylation and proteomic profiles. Each of these distinct measurements provides the means to characterize a certain aspect of biological diversity, and a fundamental problem of broad interest concerns the discovery of shared patterns of variation across different data types. Such data types are heterogeneous in the sense that they represent measurements taken at very different scales or described by very different data structures. We propose a distance-based statistical test, the generalized RV (GRV) test, to assess whether there is a common and non-random pattern of variability between paired biological measurements obtained from the same random sample. The measurements enter the test through distance measures which can be chosen to capture particular aspects of the data. An approximate null distribution is proposed to compute p-values in closed form, without the need to perform costly Monte Carlo permutation procedures. Compared to the classical Mantel test for association between distance matrices, the GRV test has been found to be more powerful in a number of simulation settings. We also report on an application of the GRV test to detect biological pathways in which genetic variability is associated with variation in gene expression levels in ovarian cancer samples, and present results obtained from two independent cohorts.
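
To make the notion of an RV-type statistic computed from distance matrices concrete, here is a minimal sketch. It assumes one common construction: each distance matrix is turned into a Gower-centered inner-product matrix, and the statistic is the normalized inner product of the two centered matrices. Whether this matches the paper's exact GRV definition is an assumption, and the closed-form null distribution from the abstract is not reproduced. The function names are hypothetical.

```python
import math


def gower_center(dist):
    """Double-center the squared distances: G = -1/2 * J * D^2 * J,
    with J = I - (1/n) * ones. `dist` is a square list of lists."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand)
             for j in range(n)] for i in range(n)]


def inner(a, b):
    # Frobenius inner product, i.e. trace(a^T b) for square matrices.
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rv_from_distances(dist_x, dist_y):
    """RV-type association between two paired distance matrices.
    Assumes neither matrix is degenerate (all distances equal to zero)."""
    gx, gy = gower_center(dist_x), gower_center(dist_y)
    return inner(gx, gy) / math.sqrt(inner(gx, gx) * inner(gy, gy))
```

Two distance matrices that differ only by a constant scale factor give a value of 1 (up to floating-point error), which reflects the scale invariance that lets measurements on very different scales be compared.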