    Fast and Tiny Structural Self-Indexes for XML

    XML document markup is highly repetitive and therefore compresses well using dictionary-based methods such as DAGs or grammars. In the context of selectivity estimation, grammar-compressed trees have previously been used as synopses for structural XPath queries. Here a fully-fledged index over such grammars is presented. The index allows arbitrary tree algorithms to be executed with a slow-down that is comparable to the space improvement. More interestingly, certain algorithms execute much faster over the index, because no decompression occurs. For example, structural XPath count queries evaluate faster over the index than in previous XPath implementations, often by two orders of magnitude. The index also serializes XML results (including texts) faster than previous systems, by a factor of roughly 2 to 3. This is due to efficient copy handling of grammar repetitions, and because materialization is avoided entirely. To compare with twig join implementations, we implemented a materializer that writes out the pre-order numbers of result nodes, and show its competitiveness. Comment: 13 pages
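    The abstract does not describe how the index is built. As a rough illustration of why repetitive markup compresses well and why some queries need no decompression, here is a minimal sketch that assumes the simplest of the methods it names: a minimal DAG obtained by hash-consing identical subtrees. This is not the paper's grammar-based index, which can also share repeated patterns that are not complete subtrees. The function names `minimal_dag` and `count_tag` are hypothetical, and texts and attributes are ignored.

    ```python
    import xml.etree.ElementTree as ET

    def minimal_dag(root):
        """Hash-cons the element structure: identical subtrees (same tag, same
        sequence of child subtrees) become one shared node. Texts and
        attributes are ignored, so only the markup structure is compressed."""
        ids = {}    # (tag, tuple of child ids) -> node id
        nodes = []  # node id -> (tag, tuple of child ids)

        def visit(elem):
            key = (elem.tag, tuple(visit(child) for child in elem))
            if key not in ids:
                ids[key] = len(nodes)
                nodes.append(key)
            return ids[key]

        return visit(root), nodes

    def count_tag(root_id, nodes, tag):
        """Count occurrences of `tag` in the uncompressed tree while evaluating
        each shared DAG node only once (memoised), i.e. without decompressing."""
        memo = {}

        def count(n):
            if n not in memo:
                t, children = nodes[n]
                memo[n] = (t == tag) + sum(count(c) for c in children)
            return memo[n]

        return count(root_id)

    # Usage: three identical <book> subtrees collapse into one shared DAG node.
    xml = "<lib>" + "<book><title/><author/></book>" * 3 + "</lib>"
    doc = ET.fromstring(xml)
    root_id, nodes = minimal_dag(doc)
    print(sum(1 for _ in doc.iter()), len(nodes))  # 10 tree nodes, 4 DAG nodes
    print(count_tag(root_id, nodes, "title"))      # 3
    ```

    The count query touches each of the 4 DAG nodes once instead of all 10 tree nodes. On highly repetitive documents that gap grows with the amount of sharing, which is the intuition behind evaluating count queries directly over the compressed representation.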

    A Corpus-based Toy Model for DisCoCat

    The categorical compositional distributional (DisCoCat) model of meaning rigorously connects distributional semantics and pregroup grammars, and has found a variety of applications in computational linguistics. From a more abstract standpoint, the DisCoCat paradigm predicates the construction of a mapping from syntax to categorical semantics. In this work we present a concrete construction of one such mapping, from a toy model of syntax for corpora annotated with constituent structure trees, to categorical semantics taking place in a category of free R-semimodules over an involutive commutative semiring R. Comment: In Proceedings SLPCS 2016, arXiv:1608.0101

    On Some Closure Properties of nc-eNCE Graph Grammars

    In the study of automata and grammars, closure properties of the associated languages have been studied extensively. In particular, closure properties of various types of graph grammars have been examined in (Rozenberg and Welzl, Inf. and Control, 1986) and (Rozenberg and Welzl, Acta Informatica, 1986). In this paper we examine some critical closure properties of the nc-eNCE graph grammars discussed in (Jayakrishna and Mathew, Symmetry 2023) and (Jayakrishna and Mathew, ICMICDS 2022). Comment: 14 pages, 9 figures, to be submitted to Theory of Computing

    On the Degree of Extension of Some Models Defining Non-Regular Languages

    This work is a survey of the main results reported for the degree of extension of two models defining non-regular languages, namely the context-free grammar and the extended automaton over groups. More precisely, we recall the main results regarding the degree of non-regularity of a context-free grammar as well as the degree of extension of finite automata over groups. Finally, we consider a similar measure for finite automata with translucent letters and present some preliminary results. This measure could be considered for many mechanisms that extend a less expressive one. Comment: In Proceedings AFL 2023, arXiv:2309.0112