96 research outputs found
Algorithms for determining the smallest number of nonterminals (states) sufficient for generating (accepting) a regular language R with R1⊆R⊆R2 for given regular languages R1,R2
Given two regular languages R1 and R2 with R1⊆R2, one can effectively determine the number of nonterminals in a nonterminal-minimal (generalized) right linear grammar generating a regular language R with R1⊆R⊆R2, and the number of states in a state-minimal (generalized) nondeterministic finite automaton accepting a regular language R with R1⊆R⊆R2.
Methods for Structural Pattern Recognition: Complexity and Applications
Department of Cybernetics
Stream Processing using Grammars and Regular Expressions
In this dissertation we study regular expression based parsing and the use of
grammatical specifications for the synthesis of fast, streaming
string-processing programs.
In the first part we develop two linear-time algorithms for regular
expression based parsing with Perl-style greedy disambiguation. The first
algorithm operates in two passes in a semi-streaming fashion, using a constant
amount of working memory and an auxiliary tape storage which is written in the
first pass and consumed by the second. The second algorithm is a single-pass
and optimally streaming algorithm which outputs as much of the parse tree as is
semantically possible based on the input prefix read so far, and resorts to
buffering as many symbols as is required to resolve the next choice. Optimality
is obtained by performing a PSPACE-complete pre-analysis on the regular
expression.
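To make "Perl-style greedy disambiguation" concrete, here is a minimal sketch using Python's standard `re` module, which follows the same backtracking conventions: the leftmost alternative that leads to a match wins, and `*` consumes as much as it can. This only illustrates which parse is selected; it is not the linear-time algorithm of the dissertation, and the helper name `greedy_groups` is made up for this example.

```python
import re

def greedy_groups(pattern, text):
    """Return the captured groups chosen by Perl-style (greedy, leftmost) matching."""
    match = re.match(pattern, text)
    return match.groups() if match else None

# Alternation: the first alternative that succeeds is taken, even if a
# later alternative would match a longer prefix.
first_wins = greedy_groups(r"(a|ab)", "ab")
reordered = greedy_groups(r"(ab|a)", "ab")

# Iteration: the first star takes everything, leaving nothing for the second.
star_split = greedy_groups(r"(a*)(a*)", "aaa")

print(first_wins, reordered, star_split)
```

Under a leftmost-longest (POSIX-style) policy the first pattern would instead select `ab`, which is why the disambiguation policy has to be fixed before a parse tree is well defined.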
In the second part we present Kleenex, a language for expressing
high-performance streaming string processing programs as regular grammars with
embedded semantic actions, and its compilation to streaming string transducers
with worst-case linear-time performance. Its underlying theory is based on
transducer decomposition into oracle and action machines, and a finite-state
specialization of the streaming parsing algorithm presented in the first part.
In the second part we also develop a new linear-time streaming parsing
algorithm for parsing expression grammars (PEG) which generalizes the regular
grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm
reformulated using least fixed points and evaluated using an instance of the
chaotic iteration scheme by Cousot and Cousot.
Parallel parsing made practical
The property of local parsability makes it possible to parse inputs by inspecting only a bounded-length string around the current token. This in turn enables the construction of a scalable, data-parallel parsing algorithm, which is presented in this work. Such an algorithm lends itself to automatic generation by a parser generator tool, which we implemented and also present here. Furthermore, to complete the framework for parallel input analysis, a parallel scanner can also be combined with the parser. To demonstrate the practicality of parallel lexing and parsing, we report the results of adapting JSON and Lua to a form fit for parallel parsing (i.e. an operator-precedence grammar) through simple grammar changes and scanning transformations. The approach is validated with performance figures from both high-performance and embedded multicore platforms, obtained by analyzing real-world inputs as a test bench. The results show that our approach matches or exceeds the performance of production-grade LR parsers in sequential execution, and achieves significant speedups and good scaling on multi-core machines. The work concludes with a broad and critical survey of past work on parallel parsing and of future directions for integration with semantic analysis and incremental parsing.
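The idea that a parsing decision can depend only on neighbouring tokens can be illustrated with a small sequential sketch. The evaluator below decides whether to reduce by comparing the precedence of the operator on top of the stack with that of the incoming operator. It is a toy for a three-operator arithmetic language, not the parallel algorithm or the generated parsers described in the abstract; the precedence table `PREC` and all function names are assumptions made for this example.

```python
import re

# Assumed binding strengths for this toy language; higher binds tighter.
PREC = {"+": 1, "-": 1, "*": 2}

def tokenize(text):
    """Split the input into integer literals and single-character operators."""
    return re.findall(r"\d+|[+\-*]", text)

def reduce_top(values, ops):
    """Pop one operator and its two operands, then push the result."""
    op = ops.pop()
    right = values.pop()
    left = values.pop()
    if op == "+":
        values.append(left + right)
    elif op == "-":
        values.append(left - right)
    else:
        values.append(left * right)

def evaluate(text):
    """Evaluate an expression using only local precedence comparisons."""
    values, ops = [], []
    for tok in tokenize(text):
        if tok.isdigit():
            values.append(int(tok))
        else:
            # Reduce while the stacked operator binds at least as tightly
            # as the incoming one (this also yields left associativity).
            while ops and PREC[ops[-1]] >= PREC[tok]:
                reduce_top(values, ops)
            ops.append(tok)
    while ops:
        reduce_top(values, ops)
    return values[0]

print(evaluate("1+2*3"), evaluate("8-3-2"))
```

Because each shift or reduce decision looks only at two adjacent operators, the same comparison can in principle be made independently inside separate chunks of the input, which is the property the parallel approach relies on.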
Relationships Between Bounded Languages, Counter Machines, Finite-Index Grammars, Ambiguity, and Commutative Equivalence
It is shown that for every language family that is a trio containing only semilinear languages, all bounded languages in it can be accepted by one-way deterministic reversal-bounded multicounter machines (DCM). This implies that for every semilinear trio (where these properties are effective), it is possible to decide containment, equivalence, and disjointness concerning its bounded languages. A condition is also provided for when the bounded languages in a semilinear trio coincide exactly with those accepted by DCM machines, and it is used to show that many grammar systems of finite index, such as finite-index matrix grammars (Mfin) and finite-index ET0L (ET0Lfin), have the same bounded languages as DCM. Then connections between ambiguity, counting regularity, and commutative regularity are made, as many machines and grammars that are unambiguous can only generate/accept counting regular or commutatively regular languages. Thus, such a system that can generate/accept a non-counting regular or non-commutatively regular language implies the existence of inherently ambiguous languages over that system. In addition, it is shown that every language generated by an unambiguous Mfin has a rational characteristic series in commutative variables, and is counting regular. This result plus the connections are used to demonstrate that the grammar systems Mfin and ET0Lfin can generate inherently ambiguous languages (over their grammars), as do several machine models. It is also shown that all bounded languages generated by these two grammar systems (those in any semilinear trio) can be generated unambiguously within the systems. Finally, conditions on Mfin and ET0Lfin languages implying commutative regularity are obtained. In particular, it is shown that every finite-index ED0L language is commutatively regular.
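As a small illustration of commutative regularity, assume the usual definition: a language is commutatively regular if there is a bijection onto some regular language that preserves the number of occurrences of each letter (the Parikh vector). The sketch below checks, up to a bound, that the bounded non-regular language {aⁿbⁿ} and the regular language (ab)* are matched in this way by the map aⁿbⁿ ↦ (ab)ⁿ. The example languages and all function names are chosen for this illustration and do not come from the paper.

```python
from collections import Counter

def parikh(word, alphabet="ab"):
    """Return the Parikh vector of word: letter counts in alphabet order."""
    counts = Counter(word)
    return tuple(counts[symbol] for symbol in alphabet)

def bounded_word(n):
    """The n-th word of the bounded, non-regular language {a^n b^n}."""
    return "a" * n + "b" * n

def regular_word(n):
    """The n-th word of the regular language (ab)*."""
    return "ab" * n

def preserves_parikh(limit):
    """Check that a^n b^n -> (ab)^n keeps letter counts for all n < limit."""
    return all(
        parikh(bounded_word(n)) == parikh(regular_word(n))
        for n in range(limit)
    )

print(preserves_parikh(50))
```

The map is a bijection because both languages contain exactly one word for each n, so a finite check like this only confirms the letter-count condition on the sampled words.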
Regulated rewriting in formal language theory
Thesis (MSc (Mathematical Sciences))--University of Stellenbosch, 2008.
Context-free grammars are well-studied and well-behaved in terms of decidability, but many
real-world problems cannot be described with context-free grammars. Grammars with regulated
rewriting are grammars with mechanisms to regulate the applications of rules, so that
certain derivations are avoided. Thus, with context-free rules and regulated rewriting mechanisms,
one can often generate languages that are not context-free.
In this thesis we study grammars with regulated rewriting mechanisms. We consider problems
in which context-free grammars are insufficient and in which more descriptive grammars
are required. We compare bag context grammars with other well-known classes of grammars
with regulated rewriting mechanisms. We also discuss the relation between bag context grammars
and recognizing devices such as counter automata and Petri net automata. We show
that regular bag context grammars can generate any recursively enumerable language. We
reformulate the pumping lemma for random permitting context languages with context-free
rules, as introduced by Ewert and Van der Walt, by using the concept of a string homomorphism.
We conclude the thesis with decidability and complexity properties of grammars with
regulated rewriting.
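To show how regulating the application of context-free rules can reach beyond the context-free languages, here is a minimal sketch of a matrix grammar, one of the well-known regulated rewriting mechanisms. It is not the bag context formalism studied in the thesis. Rules are grouped into sequences ("matrices") that must be applied together, which keeps three counters in step and yields the non-context-free language {aⁿbⁿcⁿ : n ≥ 1}. The grammar and the function names are assumptions made for this example.

```python
# Each matrix is a list of context-free rules (lhs, rhs) applied in order.
GROW = [("A", "aA"), ("B", "bB"), ("C", "cC")]
FINISH = [("A", "a"), ("B", "b"), ("C", "c")]

def apply_matrix(form, matrix):
    """Apply every rule of the matrix once, in order.

    Returns None if some rule is not applicable, in which case the whole
    matrix is rejected.
    """
    for lhs, rhs in matrix:
        if lhs not in form:
            return None
        form = form.replace(lhs, rhs, 1)  # rewrite the leftmost occurrence
    return form

def derive(n):
    """Derive a^n b^n c^n (n >= 1) from the sentential form ABC."""
    form = "ABC"  # result of the start matrix [S -> ABC]
    for _ in range(n - 1):
        form = apply_matrix(form, GROW)
    return apply_matrix(form, FINISH)

print(derive(1), derive(3))
```

Without the grouping, the same six rules would generate every string in a⁺b⁺c⁺; the regulation is what forces the three blocks to have equal length.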