96 research outputs found

    Algorithms for determining the smallest number of nonterminals (states) sufficient for generating (accepting) a regular language R with R1⊆R⊆R2 for given regular languages R1,R2

    Abstract: Given two regular languages R1 and R2 with R1⊆R2, one can effectively determine the number of nonterminals in a nonterminal-minimal (generalized) right linear grammar generating a regular language R with R1⊆R⊆R2, and the number of states in a state-minimal (generalized) nondeterministic finite automaton accepting a regular language R with R1⊆R⊆R2.

    Stream Processing using Grammars and Regular Expressions

    In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot
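    The "Perl-style greedy disambiguation" mentioned in the abstract above can be illustrated with a small sketch. It uses Python's standard `re` module, whose backtracking matcher also prefers the leftmost alternative; it is not the dissertation's linear-time algorithm, and the pattern and input are assumptions chosen only to make the ambiguity visible.

```python
import re

# The input "abcd" can be split among the three groups in more than one way:
#   greedy (Perl-style): "a" + "bcd" + ""   (first alternative that leads to a match)
#   leftmost-longest:    "ab" + "c"  + "d"  (POSIX-style, shown only for contrast)
PATTERN = re.compile(r"(a|ab)(c|bcd)(d*)")

def greedy_parse(text):
    """Return the group contents chosen by Python's backtracking matcher, or None."""
    match = PATTERN.fullmatch(text)
    return match.groups() if match else None

print(greedy_parse("abcd"))  # ('a', 'bcd', '')
```

    The result records which alternative was taken at each choice point, which is the information a parse tree for the regular expression carries.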

    Parallel parsing made practical

    The property of local parsability allows inputs to be parsed by inspecting only a bounded-length string around the current token. This in turn enables the construction of a scalable, data-parallel parsing algorithm, which is presented in this work. Such an algorithm is readily generated automatically by a parser generator tool, which was realized and is also presented here. Furthermore, to complete the framework of a parallel input analysis, a parallel scanner can also be combined with the parser. To prove the practicality of a parallel lexing and parsing approach, we report the results of adapting JSON and Lua to a form fit for parallel parsing (i.e. an operator-precedence grammar) through simple grammar changes and scanning transformations. The approach is validated with performance figures from both high-performance and embedded multicore platforms, obtained by analyzing real-world inputs as a test bench. The results show that our approach matches or exceeds the performance of production-grade LR parsers in sequential execution, and achieves significant speedups and good scaling on multi-core machines. The work concludes with a broad and critical survey of past work on parallel parsing and of future directions for integration with semantic analysis and incremental parsing.

    Relationships Between Bounded Languages, Counter Machines, Finite-Index Grammars, Ambiguity, and Commutative Equivalence

    It is shown that for every language family that is a trio containing only semilinear languages, all bounded languages in it can be accepted by one-way deterministic reversal-bounded multicounter machines (DCM). This implies that for every semilinear trio (where these properties are effective), it is possible to decide containment, equivalence, and disjointness concerning its bounded languages. A condition is also provided for when the bounded languages in a semilinear trio coincide exactly with those accepted by DCM machines, and it is used to show that many grammar systems of finite index, such as finite-index matrix grammars (Mfin) and finite-index ET0L (ET0Lfin), have identical bounded languages as DCM. Then connections between ambiguity, counting regularity, and commutative regularity are made, as many machines and grammars that are unambiguous can only generate/accept counting regular or commutatively regular languages. Thus, such a system that can generate/accept a non-counting regular or non-commutatively regular language implies the existence of inherently ambiguous languages over that system. In addition, it is shown that every language generated by an unambiguous Mfin has a rational characteristic series in commutative variables, and is counting regular. This result plus the connections are used to demonstrate that the grammar systems Mfin and ET0Lfin can generate inherently ambiguous languages (over their grammars), as do several machine models. It is also shown that all bounded languages generated by these two grammar systems (those in any semilinear trio) can be generated unambiguously within the systems. Finally, conditions on Mfin and ET0Lfin languages implying commutative regularity are obtained. In particular, it is shown that every finite-index ED0L language is commutatively regular.

    Acta Cybernetica : Volume 13. Number 1.


    Regulated rewriting in formal language theory

    Thesis (MSc (Mathematical Sciences))--University of Stellenbosch, 2008. Context-free grammars are well-studied and well-behaved in terms of decidability, but many real-world problems cannot be described with context-free grammars. Grammars with regulated rewriting are grammars with mechanisms to regulate the applications of rules, so that certain derivations are avoided. Thus, with context-free rules and regulated rewriting mechanisms, one can often generate languages that are not context-free. In this thesis we study grammars with regulated rewriting mechanisms. We consider problems in which context-free grammars are insufficient and in which more descriptive grammars are required. We compare bag context grammars with other well-known classes of grammars with regulated rewriting mechanisms. We also discuss the relation between bag context grammars and recognizing devices such as counter automata and Petri net automata. We show that regular bag context grammars can generate any recursively enumerable language. We reformulate the pumping lemma for random permitting context languages with context-free rules, as introduced by Ewert and Van der Walt, by using the concept of a string homomorphism. We conclude the thesis with decidability and complexity properties of grammars with regulated rewriting.
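    As a rough illustration of how a regulated rewriting mechanism lets context-free rules generate a non-context-free language, the sketch below simulates a matrix grammar, one well-known regulated formalism. It is not the thesis's bag context grammars. The grammar, the names `MATRICES`, `apply_matrix` and `derive`, and the leftmost-occurrence rewriting are assumptions made for this example. Rules inside a matrix must be applied together and in order, which keeps the counts of a, b and c synchronized and yields the language a^n b^n c^n.

```python
# Hypothetical matrix grammar for { a^n b^n c^n : n >= 1 }.
# Each matrix is a list of context-free rules (lhs, rhs) applied as one unit.
MATRICES = {
    "start": [("S", "ABC")],
    "grow": [("A", "aA"), ("B", "bB"), ("C", "cC")],
    "stop": [("A", "a"), ("B", "b"), ("C", "c")],
}

def apply_matrix(form, matrix):
    """Apply every rule of the matrix in order to the sentential form.

    Each rule rewrites the leftmost occurrence of its left-hand side.
    Returns None if some rule is not applicable, so the whole matrix fails.
    """
    for lhs, rhs in matrix:
        if lhs not in form:
            return None
        form = form.replace(lhs, rhs, 1)
    return form

def derive(n):
    """Derive a^n b^n c^n (n >= 1): one start step, n - 1 grow steps, one stop step."""
    form = apply_matrix("S", MATRICES["start"])
    for _ in range(n - 1):
        form = apply_matrix(form, MATRICES["grow"])
    return apply_matrix(form, MATRICES["stop"])

print(derive(3))  # aaabbbccc
```

    Without the grouping into matrices, the same rules could be applied independently and would also derive strings with unequal numbers of a, b and c.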

    Acta Cybernetica : Volume 15. Number 1.


    Acta Cybernetica : Volume 12. Number 4.
