673 research outputs found

    Compressing Graphs by Grammars

    Get PDF

    Substructure Discovery Using Minimum Description Length and Background Knowledge

    Full text link
    The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the minimum description length principle. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. SUBDUE uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by SUBDUE to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate SUBDUE's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain. Description of Online Appendix: This is a compressed tar file containing the SUBDUE discovery system, written in C. The program accepts as input databases represented in graph form, and will output discovered substructures with their corresponding value.Comment: See http://www.jair.org/ for an online appendix and other files accompanying this articl

    KMS states on Quantum Grammars

    Get PDF
    We consider quantum (unitary) continuous time evolution of spins on a lattice together with quantum evolution of the lattice itself. In physics such evolution was discussed in connection with quantum gravity. It is also related to what is called quantum circuits, one of the incarnations of a quantum computer. We consider simpler models for which one can obtain exact mathematical results. We prove existence of the dynamics in both Schroedinger and Heisenberg pictures, construct KMS states on appropriate C*-algebras. We show (for high temperatures) that for each system where the lattice undergoes quantum evolution, there is a natural scaling leading to a quantum spin system on a fixed lattice, defined by a renormalized Hamiltonian.Comment: 22 page

    Compressing Permutation Groups into Grammars and Polytopes. A Graph Embedding Approach

    Get PDF
    It can be shown that each permutation group G ? ?_n can be embedded, in a well defined sense, in a connected graph with O(n+|G|) vertices. Some groups, however, require much fewer vertices. For instance, ?_n itself can be embedded in the n-clique K_n, a connected graph with n vertices. In this work, we show that the minimum size of a context-free grammar generating a finite permutation group G? ?_n can be upper bounded by three structural parameters of connected graphs embedding G: the number of vertices, the treewidth, and the maximum degree. More precisely, we show that any permutation group G ? ?_n that can be embedded into a connected graph with m vertices, treewidth k, and maximum degree ?, can also be generated by a context-free grammar of size 2^{O(k?log?)}? m^{O(k)}. By combining our upper bound with a connection established by Pesant, Quimper, Rousseau and Sellmann [Gilles Pesant et al., 2009] between the extension complexity of a permutation group and the grammar complexity of a formal language, we also get that these permutation groups can be represented by polytopes of extension complexity 2^{O(k?log?)}? m^{O(k)}. The above upper bounds can be used to provide trade-offs between the index of permutation groups, and the number of vertices, treewidth and maximum degree of connected graphs embedding these groups. In particular, by combining our main result with a celebrated 2^{?(n)} lower bound on the grammar complexity of the symmetric group ?_n due to Glaister and Shallit [Glaister and Shallit, 1996] we have that connected graphs of treewidth o(n/log n) and maximum degree o(n/log n) embedding subgroups of ?_n of index 2^{cn} for some small constant c must have n^{?(1)} vertices. This lower bound can be improved to exponential on graphs of treewidth n^{?} for ? < 1 and maximum degree o(n/log n)

    DAFSA: a Python library for Deterministic Acyclic Finite State Automata [Software]

    No full text
    This work describes dafsa, a Python library for computing graphs from lists of strings for identifying, visualizing, and inspecting patterns of substrings. The library is designed for usage by linguists in studies on morphology and formal grammars, and is intended for faster, easier, and simpler generation of visualizations. It collects frequency weights by default, it can condense structures, and it provides several export options. Figure 1 depicts a basic DAFSA, based upon five English words and generated with default settings

    Data complexity measured by principal graphs

    Full text link
    How to measure the complexity of a finite set of vectors embedded in a multidimensional space? This is a non-trivial question which can be approached in many different ways. Here we suggest a set of data complexity measures using universal approximators, principal cubic complexes. Principal cubic complexes generalise the notion of principal manifolds for datasets with non-trivial topologies. The type of the principal cubic complex is determined by its dimension and a grammar of elementary graph transformations. The simplest grammar produces principal trees. We introduce three natural types of data complexity: 1) geometric (deviation of the data's approximator from some "idealized" configuration, such as deviation from harmonicity); 2) structural (how many elements of a principal graph are needed to approximate the data), and 3) construction complexity (how many applications of elementary graph transformations are needed to construct the principal object starting from the simplest one). We compute these measures for several simulated and real-life data distributions and show them in the "accuracy-complexity" plots, helping to optimize the accuracy/complexity ratio. We discuss various issues connected with measuring data complexity. Software for computing data complexity measures from principal cubic complexes is provided as well.Comment: Computers and Mathematics with Applications, in pres
    • …
    corecore