    Substructure Discovery Using Minimum Description Length and Background Knowledge

    The ability to identify interesting and repetitive substructures is an essential component of discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the minimum description length principle. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. SUBDUE uses a computationally bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by SUBDUE to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate SUBDUE's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain. Description of Online Appendix: This is a compressed tar file containing the SUBDUE discovery system, written in C. The program accepts as input databases represented in graph form and outputs discovered substructures with their corresponding values. Comment: See http://www.jair.org/ for an online appendix and other files accompanying this article.
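SUBDUE scores a candidate substructure S by how well it compresses the input graph G. The following is a minimal sketch of that scoring idea. It substitutes a simple size measure (vertices plus edges) for the bit-level description length SUBDUE actually uses, and it assumes the k instances of S are exact and non-overlapping. The function name `compression_value` and the convention of collapsing each instance into one vertex are illustrative assumptions, not part of SUBDUE's C program.

```python
def compression_value(g_vertices, g_edges, s_vertices, s_edges, k):
    """Size-based stand-in for an MDL score: size(G) / (size(S) + size(G|S)).

    Assumes k exact, non-overlapping instances of S, each collapsed into a
    single new vertex; edges leaving an instance stay attached to that vertex.
    """
    size_g = g_vertices + g_edges
    size_s = s_vertices + s_edges
    # Collapsing one instance removes (s_vertices - 1) vertices and s_edges edges.
    size_g_given_s = size_g - k * ((s_vertices - 1) + s_edges)
    return size_g / (size_s + size_g_given_s)

# Three triangles joined in a chain by two extra edges: |V| = 9, |E| = 11.
print(compression_value(9, 11, 3, 3, 3))  # 20 / (6 + 5), about 1.818
```

A value above 1 means that describing S once and then describing G in terms of S costs less than describing G directly.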

    On the Relation of the Total Graph of a Ring and a Product of Graphs

    The total graph of a ring R, denoted T(Γ(R)), is the graph with vertex set V(T(Γ(R))) = R in which two distinct vertices u, v ∈ V(T(Γ(R))) are adjacent if and only if u + v ∈ Z(R), where Z(R) is the set of zero divisors of R. The Cartesian product of two graphs G and H is the graph with vertex set V(G×H) = V(G)×V(H) in which two distinct vertices (u_1, v_1) and (u_2, v_2) are adjacent if and only if either 1) u_1 = u_2 and v_1v_2 ∈ E(H), or 2) v_1 = v_2 and u_1u_2 ∈ E(G). An isomorphism between graphs G and H is a bijection ϕ: V(G) → V(H) such that u, v ∈ V(G) are adjacent if and only if ϕ(u), ϕ(v) ∈ V(H) are adjacent. This paper proves that T(Γ(Z_{2p})) and P_2×K_p are isomorphic for every odd prime p.
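For small primes, the statement can be checked mechanically. The following sketch builds T(Γ(Z_{2p})) from the definition, builds P_2×K_p, and tests one candidate bijection: x ↦ (x mod 2, ±x mod p), with the sign chosen by the parity of x. This map is our own construction for illustration and is not claimed to be the one used in the paper. The reasoning behind it is as follows. Two vertices of equal parity always have an even sum, which is a zero divisor, so each parity class forms a copy of K_p. An even and an odd vertex are adjacent exactly when their sum is p (mod 2p), which gives a perfect matching between the two copies. The helper names `total_graph_edges`, `prism_edges`, `phi`, and `check` are hypothetical.

```python
from itertools import combinations
from math import gcd

def total_graph_edges(n):
    # In Z_n, x is a zero divisor (0 included) exactly when gcd(x, n) > 1.
    zero_divisors = {x for x in range(n) if gcd(x, n) > 1}
    return {frozenset((u, v)) for u, v in combinations(range(n), 2)
            if (u + v) % n in zero_divisors}

def prism_edges(p):
    # P_2 x K_p: vertices (i, j) with i in {0, 1} and j in Z_p.
    vertices = [(i, j) for i in range(2) for j in range(p)]
    # Distinct vertices are adjacent iff they agree in one coordinate:
    # same i gives an edge of K_p, same j gives the single edge of P_2.
    return {frozenset((a, b)) for a, b in combinations(vertices, 2)
            if a[0] == b[0] or a[1] == b[1]}

def phi(x, p):
    # Hypothetical bijection Z_2p -> {0, 1} x Z_p chosen for this check.
    return (x % 2, x % p if x % 2 == 0 else (-x) % p)

def check(p):
    n = 2 * p
    image = {x: phi(x, p) for x in range(n)}
    if len(set(image.values())) != n:  # phi must be a bijection
        return False
    mapped = {frozenset(image[v] for v in e) for e in total_graph_edges(n)}
    return mapped == prism_edges(p)

for p in (3, 5, 7, 11, 13):
    print(p, check(p))  # True for each
```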

    Analysis of Generative Chemistries

    To model chemistry, we use undirected, labelled graphs as explicit models of molecules and graph transformation rules as models of generalised chemical reactions. This is used to define artificial chemistries at the level of individual bonds and atoms, where formal graph grammars implicitly represent large spaces of chemical compounds. We use a graph rewriting formalism rooted in category theory, the Double Pushout approach, which directly expresses the transition state of chemical reactions. Using concurrency theory for transformation rules, we define algorithms that compose rewrite rules in a chemically intuitive manner, enabling automatic abstraction of the level of detail in chemical pathways. Based on this rule composition, we define an algorithmic framework for generating vast reaction networks for specific spaces of a given chemistry, while maintaining the model's level of detail down to the atomic level. The framework also supports computation with graphs and graph grammars, which we use to model non-trivial chemical systems. Graph generation relies on graph isomorphism testing, and we review the general individualisation-refinement paradigm used in state-of-the-art algorithms for graph canonicalisation, isomorphism testing, and automorphism discovery. We present a model for chemical pathways based on a generalisation of network flows from ordinary directed graphs to directed hypergraphs. The model allows reasoning about the flow of individual molecules in general pathways and the introduction of chemically motivated routing constraints. It further provides the foundation for defining specialised pathway motifs, illustrated by necessary topological constraints for both catalytic and autocatalytic pathways. We also prove that central types of pathway questions are NP-complete, even for restricted classes of reaction networks.
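The refinement half of the individualisation-refinement paradigm is commonly colour refinement (the 1-dimensional Weisfeiler-Leman procedure). In each round, every vertex is recoloured by its own colour plus the multiset of its neighbours' colours, and this repeats until no colour class splits any further. The following is a minimal sketch under two assumptions: the graph is simple and undirected and given as an adjacency dict, and optional initial colours (such as atom labels) are provided as a dict. It omits individualisation, the search tree, and canonical labelling, which state-of-the-art tools add on top. The function name `colour_refinement` is hypothetical.

```python
from collections import Counter

def colour_refinement(adj, colours=None):
    """Return the stable colouring (vertex -> int) of an undirected graph.

    adj: dict mapping each vertex to an iterable of its neighbours.
    colours: optional initial colouring, e.g. atom labels; the values must be
    mutually comparable so that signatures can be sorted.
    """
    colours = dict(colours) if colours else {v: 0 for v in adj}
    while True:
        # Signature = own colour + sorted multiset of neighbour colours.
        sigs = {
            v: (colours[v], tuple(sorted(Counter(colours[u] for u in adj[v]).items())))
            for v in adj
        }
        # Renumber the distinct signatures 0, 1, 2, ... in sorted order.
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        refined = {v: palette[sigs[v]] for v in adj}
        # Signatures include the old colour, so classes only ever split;
        # an unchanged class count therefore means the partition is stable.
        if len(set(refined.values())) == len(set(colours.values())):
            return refined
        colours = refined

# Path 0-1-2-3: the end vertices and the inner vertices get different colours.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(colour_refinement(path))  # {0: 0, 1: 1, 2: 1, 3: 0}
```

Each call numbers its colours independently. To compare two graphs, refine them together, for example on their disjoint union.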
The complete pathway model, including constraints for catalytic and autocatalytic pathways, is implemented using integer linear programming. This implementation is used in a tree search method to enumerate both optimal and near-optimal pathway solutions. The formal methods are applied to multiple chemical systems: the enzyme-catalysed beta-lactamase reaction, variations of the glycolysis pathway, and the formose process. In each of these systems, we use rule composition to abstract pathways and to calculate traces for isotope-labelled carbon atoms. The pathway model is used to automatically enumerate alternative non-oxidative glycolysis pathways and thousands of candidate autocatalytic pathways in the formose process.
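In the pathway model, reactions are directed hyperedges from a multiset of educt molecules to a multiset of product molecules, and a flow assigns each reaction an integer number of uses. The following is a minimal sketch of the flow-conservation condition, assuming explicit input and output flows for each molecule. The function `is_balanced` and the toy reaction A + B → 2B are hypothetical illustrations. In an integer linear programming formulation, constraints of this form would be linear equalities over integer variables, with routing or autocatalysis constraints added on top.

```python
from collections import Counter

def is_balanced(reactions, flow, inflow, outflow):
    """Check flow conservation on a directed hypergraph.

    reactions: name -> (tail, head), each a Counter of molecules
    flow:      name -> non-negative integer number of times the reaction is used
    inflow / outflow: molecule -> amount entering / leaving the network
    """
    molecules = set(inflow) | set(outflow)
    for tail, head in reactions.values():
        molecules |= set(tail) | set(head)
    for m in molecules:
        # Counter returns 0 for molecules a reaction does not mention.
        produced = inflow.get(m, 0) + sum(flow[r] * h[m] for r, (_, h) in reactions.items())
        consumed = outflow.get(m, 0) + sum(flow[r] * t[m] for r, (t, _) in reactions.items())
        if produced != consumed:
            return False
    return True

# Toy autocatalytic step A + B -> 2 B: one B is consumed and two are produced.
reactions = {"r": (Counter({"A": 1, "B": 1}), Counter({"B": 2}))}
print(is_balanced(reactions, {"r": 1}, {"A": 1}, {"B": 1}))  # True
```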

    Leveraging Relational Structure through Message Passing for Modelling Non-Euclidean Data

    Modelling non-Euclidean data is difficult because the objects being compared can consist of different numbers of constituent parts with different numbers of relations between them, so traditional (Euclidean) methods are non-trivial to apply. Message passing enables such modelling by leveraging the structure of the relations within a given object (or between objects) to represent and compare structure in a vectorized form of fixed dimension. In this work, we contribute novel message passing techniques that improve on the state of the art for non-Euclidean modelling in a set of specifically chosen domains. In particular, (1) we introduce an attention-based, structure-aware global pooling operator for graph classification and demonstrate its effectiveness on a range of chemical property prediction benchmarks; we also show that our method outperforms state-of-the-art graph classifiers on a graph isomorphism test, and we demonstrate its interpretability through the learned attention coefficients. (2) We propose a style similarity measure for Boundary Representations (B-Reps) that leverages the style signals in the second-order statistics of the activations in a pre-trained (unsupervised) 3D encoder and learns their relative importance to an end user through few-shot learning. Our approach differs from existing data-driven 3D style methods in that it can be used in completely unsupervised settings. We show quantitatively that our method with B-Reps captures stronger style signals than alternative methods on meshes and point clouds, while being significantly more computationally efficient. We also show that it can generate meaningful style gradients with respect to the input shape.
(3) We introduce a novel message passing-based model of computation and demonstrate its effectiveness in expressing the complex dependencies of biological systems that are needed to model life-like systems and to trace cell lineage during cancerous tumour growth, and we demonstrate improvements over existing methods in post-analysis.
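Message passing turns a variable-size graph into a fixed-size vector in two stages. First, each node aggregates its neighbours' features. Second, a global pooling step combines all node vectors into one. The following is a minimal sketch in plain Python lists. It uses a generic sum-aggregation layer and a generic softmax attention pooling with a single hypothetical scoring vector `a`. It is not the structure-aware pooling operator proposed in this work, and its weights are set by hand rather than learned. The helper names `matvec`, `message_passing_layer`, and `attention_pool` are hypothetical.

```python
import math

def matvec(W, x):
    # Dense matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def message_passing_layer(adj, h, W_self, W_nbr):
    """One round: h_v' = ReLU(W_self h_v + W_nbr (sum of neighbour features))."""
    out = {}
    for v in adj:
        agg = [0.0] * len(h[v])
        for u in adj[v]:
            agg = [x + y for x, y in zip(agg, h[u])]
        pre = [s + m for s, m in zip(matvec(W_self, h[v]), matvec(W_nbr, agg))]
        out[v] = [max(0.0, z) for z in pre]
    return out

def attention_pool(h, a):
    """Softmax attention over nodes, then a weighted sum of the node vectors."""
    nodes = list(h)
    scores = [sum(ai * xi for ai, xi in zip(a, h[v])) for v in nodes]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    alpha = {v: w / total for v, w in zip(nodes, weights)}
    dim = len(h[nodes[0]])
    pooled = [sum(alpha[v] * h[v][k] for v in nodes) for k in range(dim)]
    return pooled, alpha

# Toy 3-node path with 2-dimensional node features and hand-set weights.
adj = {0: [1], 1: [0, 2], 2: [1]}
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.0]}
W_self = [[1.0, 0.0], [0.0, 1.0]]
W_nbr = [[0.5, 0.0], [0.0, 0.5]]
h1 = message_passing_layer(adj, h, W_self, W_nbr)
g, alpha = attention_pool(h1, [1.0, -1.0])
print(g, alpha)
```

Both the neighbourhood sum and the attention-weighted sum ignore node order, so relabelling the nodes leaves the pooled vector unchanged up to floating-point rounding. The `alpha` weights are the per-node attention coefficients one would inspect for interpretability.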

    Malleable coding: compressed palimpsests

    A malleable coding scheme considers not only compression efficiency but also ease of alteration, thus encouraging some form of recycling of an old compressed version in the formation of a new one. Malleability cost is the difficulty of synchronizing compressed versions, and malleable codes are of particular interest when representing information and modifying the representation are both expensive. We examine the trade-off between compression efficiency and malleability cost under a malleability metric defined with respect to a string edit distance. This problem introduces a metric topology to the compressed domain. We characterize the achievable rates and malleability as the solution of a subgraph isomorphism problem. This can be used to argue that allowing the conditional entropy of the edited message given the original message to grow linearly with block length creates an exponential increase in code length. First author draft.
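The malleability metric is defined with respect to a string edit distance. The following minimal sketch assumes plain Levenshtein distance (unit-cost insertions, deletions, and substitutions); the paper's exact metric may differ. It compares how far apart two versions of a message are before and after compression with `zlib`, a general-purpose compressor used here only as a stand-in for a code that ignores malleability. The function name `edit_distance` is hypothetical.

```python
import zlib

def edit_distance(a, b):
    # Levenshtein distance via dynamic programming, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

old = b"the quick brown fox jumps over the lazy dog " * 4
new = old.replace(b"lazy", b"hazy", 1)
print(edit_distance(old, new))  # 1: a single substituted byte
print(edit_distance(zlib.compress(old), zlib.compress(new)))
```

With `zlib`, even a one-byte edit to the source changes the compressed stream, if only in its trailing Adler-32 checksum. The compressed versions therefore have to be resynchronized.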