Fast and Tiny Structural Self-Indexes for XML
XML document markup is highly repetitive and therefore well compressible
using dictionary-based methods such as DAGs or grammars. In the context of
selectivity estimation, grammar-compressed trees were used before as synopsis
for structural XPath queries. Here a fully-fledged index over such grammars is
presented. The index allows arbitrary tree algorithms to be executed with a
slow-down that is comparable to the space improvement. More interestingly,
certain algorithms execute much faster over the index (because no decompression
occurs). E.g., for structural XPath count queries, evaluating over the index is
faster than previous XPath implementations, often by two orders of magnitude.
The index also allows XML results (including texts) to be serialized faster than
previous systems, by a factor of ca. 2-3. This is due to efficient copy
handling of grammar repetitions, and because materialization is totally
avoided. In order to compare with twig join implementations, we implemented a
materializer which writes out pre-order numbers of result nodes, and show its
competitiveness. Comment: 13 pages
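The idea that repetitive markup compresses well, and that some queries can run without decompression, can be illustrated with a minimal sketch. It uses plain DAG sharing of identical subtrees (hash-consing), not the grammar-based index of the paper; the nested-tuple tree encoding and the names `build_dag` and `count_label` are assumptions made for illustration only.

```python
# A tree node is (label, (child, child, ...)). This encoding is hypothetical.

def build_dag(tree, table):
    """Return the id of the shared DAG node for `tree`.

    Identical subtrees map to the same key and are therefore stored once.
    """
    label, children = tree
    key = (label, tuple(build_dag(child, table) for child in children))
    if key not in table:
        table[key] = len(table)
    return table[key]


def count_label(table, root_id, label):
    """Count nodes with `label` in the represented tree.

    Each DAG node is evaluated once; the tree is never expanded.
    """
    nodes = {node_id: key for key, node_id in table.items()}
    memo = {}

    def count(node_id):
        if node_id not in memo:
            node_label, kids = nodes[node_id]
            memo[node_id] = (node_label == label) + sum(count(k) for k in kids)
        return memo[node_id]

    return count(root_id)


# A small document with three identical <item> subtrees.
item = ("item", (("name", ()), ("price", ())))
doc = ("catalog", (item, item, item))

table = {}
root = build_dag(doc, table)
dag_size = len(table)                       # distinct subtrees stored
name_count = count_label(table, root, "name")
```

Here the ten-node tree is stored as four shared nodes, and the count query touches each shared node once, which is the intuition behind evaluating count queries directly over a compressed representation.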
A Corpus-based Toy Model for DisCoCat
The categorical compositional distributional (DisCoCat) model of meaning
rigorously connects distributional semantics and pregroup grammars, and has
found a variety of applications in computational linguistics. From a more
abstract standpoint, the DisCoCat paradigm predicates the construction of a
mapping from syntax to categorical semantics. In this work we present a
concrete construction of one such mapping, from a toy model of syntax for
corpora annotated with constituent structure trees, to categorical semantics
taking place in a category of free R-semimodules over an involutive commutative
semiring R. Comment: In Proceedings SLPCS 2016, arXiv:1608.0101
On Some Closure Properties of nc-eNCE Graph Grammars
In the study of automata and grammars, closure properties of the associated
languages have been studied extensively. In particular, closure properties of
various types of graph grammars have been examined in (Rozenberg and Welzl,
Inf. and Control, 1986) and (Rozenberg and Welzl, Acta Informatica, 1986). In
this paper we examine some critical closure properties of the nc-eNCE graph
grammars discussed in (Jayakrishna and Mathew, Symmetry 2023) and (Jayakrishna
and Mathew, ICMICDS 2022). Comment: 14 pages, 9 figures, to be submitted to Theory of Computin
Automatic parsing of sports videos with grammars
Motivated by the analogies between languages and sports videos, we introduce a novel
approach for video parsing with grammars. It utilizes compiler techniques for integrating both semantic
annotation and syntactic analysis to generate a semantic index of events and a table of contents for a given
sports video. The video sequence is first segmented and annotated by event detection with domain
knowledge. A grammar-based parser is then used to identify the structure of the video content.
Meanwhile, facilities for error handling are introduced which are particularly useful when the results of
automatic parsing need to be adjusted. As a case study, we have developed a system for video parsing in
the particular domain of TV diving programs. Experimental results indicate the proposed approach is
effective.
On the Degree of Extension of Some Models Defining Non-Regular Languages
This work is a survey of the main results reported for the degree of
extension of two models defining non-regular languages, namely the context-free
grammar and the extended automaton over groups. More precisely, we recall the
main results regarding the degree of non-regularity of a context-free grammar
as well as the degree of extension of finite automata over groups. Finally, we
consider a similar measure for the finite automata with translucent letters and
present some preliminary results. This measure could be considered for many
mechanisms that extend a less expressive one. Comment: In Proceedings AFL 2023, arXiv:2309.0112