Substructure Discovery Using Minimum Description Length and Background Knowledge
The ability to identify interesting and repetitive substructures is an
essential component of discovering knowledge in structural data. We describe a
new version of our SUBDUE substructure discovery system based on the minimum
description length principle. The SUBDUE system discovers substructures that
compress the original data and represent structural concepts in the data. By
replacing previously-discovered substructures in the data, multiple passes of
SUBDUE produce a hierarchical description of the structural regularities in the
data. SUBDUE uses a computationally-bounded inexact graph match that
identifies similar, but not identical, instances of a substructure and
computes an approximate measure of closeness between two substructures under
computational constraints. In addition to the minimum description length principle, other
background knowledge can be used by SUBDUE to guide the search towards more
appropriate substructures. Experiments in a variety of domains demonstrate
SUBDUE's ability to find substructures capable of compressing the original data
and to discover structural concepts important to the domain. Description of
Online Appendix: This is a compressed tar file containing the SUBDUE discovery
system, written in C. The program accepts as input databases represented in
graph form, and will output discovered substructures with their corresponding
value. Comment: See http://www.jair.org/ for an online appendix and other
files accompanying this article.
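The MDL criterion in the abstract can be made concrete with a toy scoring function. This is a minimal sketch in the spirit of SUBDUE, not its actual C implementation: the unit-cost description length, the non-overlapping-instance assumption, and the function names are all illustrative assumptions.

```python
# A toy MDL-style substructure score in the spirit of SUBDUE (not the
# actual implementation). Description length is crudely approximated
# by vertex + edge counts; compressing the graph replaces each
# non-overlapping instance of the substructure with a single vertex.

def description_length(num_vertices, num_edges):
    """Crude stand-in for an encoding cost: one unit per vertex/edge."""
    return num_vertices + num_edges

def compression_value(graph_v, graph_e, sub_v, sub_e, num_instances):
    """DL(G) / (DL(S) + DL(G|S)): higher means better compression."""
    dl_g = description_length(graph_v, graph_e)
    # After replacement, each instance loses (sub_v - 1) vertices and
    # its sub_e internal edges.
    compressed_v = graph_v - num_instances * (sub_v - 1)
    compressed_e = graph_e - num_instances * sub_e
    dl_s = description_length(sub_v, sub_e)
    return dl_g / (dl_s + description_length(compressed_v, compressed_e))

# Example: a 12-vertex, 16-edge graph containing 3 instances of a
# triangle (3 vertices, 3 edges).
print(round(compression_value(12, 16, 3, 3, 3), 3))  # 1.474
```

A score above 1 means describing the substructure once plus the compressed graph is cheaper than describing the original graph, which is the signal SUBDUE's search maximizes.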
KMS states on Quantum Grammars
We consider quantum (unitary) continuous time evolution of spins on a lattice
together with quantum evolution of the lattice itself. In physics such
evolution was discussed in connection with quantum gravity. It is also related
to what is called quantum circuits, one of the incarnations of a quantum
computer. We consider simpler models for which one can obtain exact
mathematical results. We prove existence of the dynamics in both the
Schroedinger and Heisenberg pictures and construct KMS states on appropriate
C*-algebras. We show (for high temperatures) that for each system where the
lattice undergoes quantum evolution, there is a natural scaling leading to a
quantum spin system on a fixed lattice, defined by a renormalized
Hamiltonian. Comment: 22 pages.
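For reference, the KMS condition invoked in the abstract, stated in its standard textbook form (not specialized to this paper's lattice models), characterizes equilibrium states at inverse temperature \(\beta\):

```latex
% KMS condition at inverse temperature \beta for a state \omega on a
% C*-algebra with time evolution \alpha_t (standard form):
\omega\bigl(A\,\alpha_{i\beta}(B)\bigr) = \omega(BA).
% Equivalently: for each pair A, B there is a function F_{A,B},
% analytic on the strip 0 < \operatorname{Im} z < \beta, with
% boundary values
F_{A,B}(t) = \omega\bigl(A\,\alpha_t(B)\bigr), \qquad
F_{A,B}(t + i\beta) = \omega\bigl(\alpha_t(B)\,A\bigr).
```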
Compressing Permutation Groups into Grammars and Polytopes. A Graph Embedding Approach
It can be shown that each permutation group G ≤ S_n can be embedded, in a well defined sense, in a connected graph with O(n+|G|) vertices. Some groups, however, require far fewer vertices. For instance, S_n itself can be embedded in the n-clique K_n, a connected graph with n vertices.
In this work, we show that the minimum size of a context-free grammar generating a finite permutation group G ≤ S_n can be upper bounded by three structural parameters of connected graphs embedding G: the number of vertices, the treewidth, and the maximum degree. More precisely, we show that any permutation group G ≤ S_n that can be embedded into a connected graph with m vertices, treewidth k, and maximum degree Δ can also be generated by a context-free grammar of size 2^{O(kΔ log Δ)} · m^{O(k)}. By combining our upper bound with a connection established by Pesant, Quimper, Rousseau and Sellmann [Gilles Pesant et al., 2009] between the extension complexity of a permutation group and the grammar complexity of a formal language, we also get that these permutation groups can be represented by polytopes of extension complexity 2^{O(kΔ log Δ)} · m^{O(k)}.
The above upper bounds can be used to provide trade-offs between the index of permutation groups, and the number of vertices, treewidth and maximum degree of connected graphs embedding these groups. In particular, by combining our main result with a celebrated 2^{Ω(n)} lower bound on the grammar complexity of the symmetric group S_n due to Glaister and Shallit [Glaister and Shallit, 1996], we have that connected graphs of treewidth o(n/log n) and maximum degree o(n/log n) embedding subgroups of S_n of index 2^{cn} for some small constant c must have n^{ω(1)} vertices. This lower bound can be improved to exponential on graphs of treewidth n^{ε} for ε < 1 and maximum degree o(n/log n).
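The clique example from the abstract can be illustrated loosely: every vertex permutation of K_n preserves its edge set, so the full symmetric group S_n acts on the n-clique. The paper's embedding notion is more specific than graph automorphism, and the `automorphisms` helper below is a hypothetical brute-force check for tiny graphs only.

```python
# Loose illustration of the K_n example: every permutation of the
# vertices of a clique is a graph automorphism. This brute force is
# only feasible for tiny n and is NOT the paper's embedding notion.
from itertools import permutations

def automorphisms(n, edges):
    """All vertex permutations of {0, ..., n-1} preserving the edge set."""
    edge_set = {frozenset(e) for e in edges}
    return [p for p in permutations(range(n))
            if {frozenset((p[u], p[v])) for u, v in edge_set} == edge_set]

# K_3: all 3! = 6 permutations are automorphisms, i.e. S_3 acts on K_3.
k3 = [(0, 1), (0, 2), (1, 2)]
print(len(automorphisms(3, k3)))    # 6

# A path 0-1-2 admits only the identity and the end-to-end flip.
path = [(0, 1), (1, 2)]
print(len(automorphisms(3, path)))  # 2
```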
DAFSA: a Python library for Deterministic Acyclic Finite State Automata [Software]
This work describes dafsa, a Python library that builds graphs from lists of strings for identifying, visualizing, and inspecting patterns of substrings. The library is designed for use by linguists in studies on morphology and formal grammars, and is intended to make generating such visualizations faster, easier, and simpler. It collects frequency weights by default, can condense structures, and provides several export options. Figure 1 depicts a basic DAFSA, based upon five English words and generated with default settings.
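The data structure the library computes can be sketched from scratch: insert the words into a trie, then merge states whose right languages coincide. This is a minimal illustration of the DAFSA idea, not the dafsa library's own API; `dafsa_states` is a hypothetical helper.

```python
# Minimal DAFSA sketch (not the dafsa library's API): build a trie,
# then merge states with identical right languages by hashing each
# node's recursive signature into a registry.

def dafsa_states(words):
    """Number of states in the minimal DAFSA accepting `words`."""
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
        node[""] = True           # end-of-word marker

    registry = {}                 # signature -> canonical state id

    def minimize(node):
        sig = tuple(sorted(
            (ch, minimize(child) if isinstance(child, dict) else -1)
            for ch, child in node.items()))
        return registry.setdefault(sig, len(registry))

    minimize(trie)
    return len(registry)

# "tap", "taps", "top", "tops" share the suffix automaton for
# "p(s)", so the minimal DAFSA needs only 5 states.
print(dafsa_states(["tap", "taps", "top", "tops"]))  # 5
```

Merging the shared suffixes is what lets a DAFSA stay far smaller than the trie while still being deterministic and acyclic.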
Data complexity measured by principal graphs
How can one measure the complexity of a finite set of vectors embedded in a
multidimensional space? This is a non-trivial question that can be approached
in many different ways. Here we suggest a set of data complexity measures using
universal approximators, principal cubic complexes. Principal cubic complexes
generalise the notion of principal manifolds for datasets with non-trivial
topologies. The type of the principal cubic complex is determined by its
dimension and a grammar of elementary graph transformations. The simplest
grammar produces principal trees.
We introduce three natural types of data complexity: 1) geometric (deviation
of the data's approximator from some "idealized" configuration, such as
deviation from harmonicity); 2) structural (how many elements of a principal
graph are needed to approximate the data), and 3) construction complexity (how
many applications of elementary graph transformations are needed to construct
the principal object starting from the simplest one).
We compute these measures for several simulated and real-life data
distributions and show them in the "accuracy-complexity" plots, helping to
optimize the accuracy/complexity ratio. We discuss various issues connected
with measuring data complexity. Software for computing data complexity measures
from principal cubic complexes is provided as well. Comment: Computers and Mathematics with Applications, in press.
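Principal cubic complexes are beyond a short sketch, but the accuracy-complexity trade-off described above can be illustrated with a toy one-dimensional analogue: approximate a point set by the means of k contiguous chunks (k standing in for structural complexity) and record (k, error) pairs for an accuracy-complexity plot. The `approximation_error` helper and the sample data are illustrative assumptions, not the paper's method.

```python
# Toy analogue of an accuracy-complexity plot: the approximator is k
# chunk means of the sorted data (k ~ "structural complexity"), and
# accuracy is the mean squared distance to the assigned mean. This is
# an illustration, not the principal-cubic-complex algorithm.

def approximation_error(data, k):
    """Mean squared distance from each point to its chunk's mean,
    after splitting the sorted data into k contiguous chunks."""
    pts = sorted(data)
    n = len(pts)
    total = 0.0
    for i in range(k):
        chunk = pts[i * n // k:(i + 1) * n // k]
        if not chunk:
            continue
        mean = sum(chunk) / len(chunk)
        total += sum((x - mean) ** 2 for x in chunk)
    return total / n

data = [0.0, 0.1, 0.2, 2.0, 2.1, 2.2, 4.0, 4.1, 4.2]
curve = [(k, approximation_error(data, k)) for k in (1, 3, 9)]
for k, err in curve:
    print(k, round(err, 4))
```

On this three-cluster sample the error drops sharply from k = 1 to k = 3 and reaches zero at k = 9, which is the kind of elbow an accuracy-complexity plot is meant to expose.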