1,985 research outputs found
Transducers from Rewrite Rules with Backreferences
Context sensitive rewrite rules have been widely used in several areas of
natural language processing, including syntax, morphology, phonology and speech
processing. Kaplan and Kay, Karttunen, and Mohri & Sproat have given various
algorithms to compile such rewrite rules into finite-state transducers. The
present paper extends this work by allowing a limited form of backreferencing
in such rules. The explicit use of backreferencing leads to more elegant and
general solutions.Comment: 8 pages, EACL 1999 Berge
Pushdown Compression
The pressing need for eficient compression schemes for XML documents has
recently been focused on stack computation [6, 9], and in particular calls for
a formulation of information-lossless stack or pushdown compressors that allows
a formal analysis of their performance and a more ambitious use of the stack in
XML compression, where so far it is mainly connected to parsing mechanisms. In
this paper we introduce the model of pushdown compressor, based on pushdown
transducers that compute a single injective function while keeping the widest
generality regarding stack computation. The celebrated Lempel-Ziv algorithm
LZ78 [10] was introduced as a general purpose compression algorithm that
outperforms finite-state compressors on all sequences. We compare the
performance of the Lempel-Ziv algorithm with that of the pushdown compressors,
or compression algorithms that can be implemented with a pushdown transducer.
This comparison is made without any a priori assumption on the data's source
and considering the asymptotic compression ratio for infinite sequences. We
prove that Lempel-Ziv is incomparable with pushdown compressors
AT-GIS: highly parallel spatial query processing with associative transducers
Users in many domains, including urban planning, transportation, and environmental science want to execute analytical queries over continuously updated spatial datasets. Current solutions for largescale spatial query processing either rely on extensions to RDBMS, which entails expensive loading and indexing phases when the data changes, or distributed map/reduce frameworks, running on resource-hungry compute clusters. Both solutions struggle with the sequential bottleneck of parsing complex, hierarchical spatial data formats, which frequently dominates query execution time. Our goal is to fully exploit the parallelism offered by modern multicore CPUs for parsing and query execution, thus providing the performance of a cluster with the resources of a single machine. We describe AT-GIS, a highly-parallel spatial query processing system that scales linearly to a large number of CPU cores. ATGIS integrates the parsing and querying of spatial data using a new computational abstraction called associative transducers(ATs). ATs can form a single data-parallel pipeline for computation without requiring the spatial input data to be split into logically independent blocks. Using ATs, AT-GIS can execute, in parallel, spatial query operators on the raw input data in multiple formats, without any pre-processing. On a single 64-core machine, AT-GIS provides 3× the performance of an 8-node Hadoop cluster with 192 cores for containment queries, and 10× for aggregation queries
Probabilistic Parsing Strategies
We present new results on the relation between purely symbolic context-free
parsing strategies and their probabilistic counter-parts. Such parsing
strategies are seen as constructions of push-down devices from grammars. We
show that preservation of probability distribution is possible under two
conditions, viz. the correct-prefix property and the property of strong
predictiveness. These results generalize existing results in the literature
that were obtained by considering parsing strategies in isolation. From our
general results we also derive negative results on so-called generalized LR
parsing.Comment: 36 pages, 1 figur
Coding-theorem Like Behaviour and Emergence of the Universal Distribution from Resource-bounded Algorithmic Probability
Previously referred to as `miraculous' in the scientific literature because
of its powerful properties and its wide application as optimal solution to the
problem of induction/inference, (approximations to) Algorithmic Probability
(AP) and the associated Universal Distribution are (or should be) of the
greatest importance in science. Here we investigate the emergence, the rates of
emergence and convergence, and the Coding-theorem like behaviour of AP in
Turing-subuniversal models of computation. We investigate empirical
distributions of computing models in the Chomsky hierarchy. We introduce
measures of algorithmic probability and algorithmic complexity based upon
resource-bounded computation, in contrast to previously thoroughly investigated
distributions produced from the output distribution of Turing machines. This
approach allows for numerical approximations to algorithmic
(Kolmogorov-Chaitin) complexity-based estimations at each of the levels of a
computational hierarchy. We demonstrate that all these estimations are
correlated in rank and that they converge both in rank and values as a function
of computational power, despite fundamental differences between computational
models. In the context of natural processes that operate below the Turing
universal level because of finite resources and physical degradation, the
investigation of natural biases stemming from algorithmic rules may shed light
on the distribution of outcomes. We show that up to 60\% of the
simplicity/complexity bias in distributions produced even by the weakest of the
computational models can be accounted for by Algorithmic Probability in its
approximation to the Universal Distribution.Comment: 27 pages main text, 39 pages including supplement. Online complexity
calculator: http://complexitycalculator.com
- …