Search CORE

15 research outputs found

Transducers from Rewrite Rules with Backreferences

Author: Gerdemann Dale
van Noord Gertjan
Publication venue
Publication date: 01/01/1999
Field of study

Context sensitive rewrite rules have been widely used in several areas of natural language processing, including syntax, morphology, phonology and speech processing. Kaplan and Kay, Karttunen, and Mohri & Sproat have given various algorithms to compile such rewrite rules into finite-state transducers. The present paper extends this work by allowing a limited form of backreferencing in such rules. The explicit use of backreferencing leads to more elegant and general solutions.Comment: 8 pages, EACL 1999 Berge

arXiv.org e-Print Archive

CiteSeerX

Derivative Based Extended Regular Expression Matching Supporting Intersection, Complement and Lookarounds

Author: Ernits Juhan-Peep
Varatalu Ian Erik
Veanes Margus
Publication venue
Publication date: 25/09/2023
Field of study

Regular expressions are widely used in software. Various regular expression engines support different combinations of extensions to classical regular constructs such as Kleene star, concatenation, nondeterministic choice (union in terms of match semantics). The extensions include e.g. anchors, lookarounds, counters, backreferences. The properties of combinations of such extensions have been subject of active recent research. In the current paper we present a symbolic derivatives based approach to finding matches to regular expressions that, in addition to the classical regular constructs, also support complement, intersection and lookarounds (both negative and positive lookaheads and lookbacks). The theory of computing symbolic derivatives and determining nullability given an input string is presented that shows that such a combination of extensions yields a match semantics that corresponds to an effective Boolean algebra, which in turn opens up possibilities of applying various Boolean logic rewrite rules to optimize the search for matches. In addition to the theoretical framework we present an implementation of the combination of extensions to demonstrate the efficacy of the approach accompanied with practical examples

arXiv.org e-Print Archive

Implementation of replace rules using preference operator

Author: Drobac Senka
Silfverberg Miikka
Yli-Jyrä Anssi Mikael
Publication venue: The Association for Computational Linguistics
Publication date: 23/07/2012
Field of study

We explain the implementation of replace rules with the .r-glc. operator and preference relations. Our modular approach combines various preference constraints to form different replace rules. In addition to describing the method, we present illustrative examples.Peer reviewe

Helsingin yliopiston digitaalinen arkisto

Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching

Author: Berglund Martin
Drewes Frank
van der Merwe Brink
Publication venue: 'Open Publishing Association'
Publication date: 01/05/2014
Field of study

We develop a formal perspective on how regular expression matching works in Java, a popular representative of the category of regex-directed matching engines. In particular, we define an automata model which captures all the aspects needed to study such matching engines in a formal way. Based on this, we propose two types of static analysis, which take a regular expression and tell whether there exists a family of strings which makes Java-style matching run in exponential time.Comment: In Proceedings AFL 2014, arXiv:1405.527

arXiv.org e-Print Archive

Directory of Open Access Journals

The practical efficiency of regular expression membership algorithms

Author: Gray Justin P.
Publication venue: Halifax, N.S. : Saint Mary's University
Publication date: 28/04/2022
Field of study

1 online resource (71 pages) : graphs, chartsIncludes abstract.Includes bibliographical references (69-71).Regular expressions encode text patterns and define languages of symbolic words. The membership problem decides if a given word is an element of the language described by a given regular expression. This problem has various well-studied algorithms, but current research only shows asymptotic complexity and performance with respect to samples of randomly generated regular expressions. Our research aims to answer how the algorithms perform when using practical regular expressions used in the real-world on a representative test set of words. A set of compatible regular expressions have been collected from public GitHub repositories. Each compatible expression (i.e., no backreferences or improper formatting) is then converted into an equivalent unambiguous mathematical representation. For each distinct expression, we have tested Thompson, Glushkov, position, follow, and partial derivative NFA constructions, as well as partial derivatives and exponential backtracking directly on the regular expression tree. These algorithms have been implemented into a modified version of the Python’s FAdo library and include UNIX-inspired extensions such as character classes, the wild dot, and UTF-8 support. We find that efficiently constructing a small NFA is the best approach to this problem; using follow and PDDAG algorithms are experimentally shown as the best

Saint Mary's University, Halifax: Institutional Repository

Merkityn kaksoisnegaation sovellukset

Author: Yli-Jyrä Anssi
Publication venue: Potsdam University Press,
Publication date: 01/01/2008
Field of study

Nested complementation plays an important role in expressing counter- i.e. star-free and first-order definable languages and their hierarchies. In addition, methods that compile phonological rules into finite-state networks use double-nested complementation or "double negation". This paper reviews how the double-nested complementation extends to a relatively new operation, generalized restriction (GR), coined by the author. ... The paper demonstrates that the GR operation has an interesting potential in expressing regular languages, various kinds of grammars, bimorphisms and relations. This motivates a further study of optimized implementation of the operation.Non peer reviewe

Helsingin yliopiston digitaalinen arkisto

Checking Cryptographic API Specifications in JavaScript

Author: Mitchell Duncan
Publication venue
Publication date: 01/01/2020
Field of study

Royal Holloway - Pure

Strings at MOSCA

Author: Hague Matthew
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/11/2019
Field of study

Royal Holloway - Pure

Stream Processing using Grammars and Regular Expressions

Author: Rasmussen Ulrik Terp
Publication venue
Publication date: 01/01/2016
Field of study

In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot

arXiv.org e-Print Archive

Copenhagen University Research Information System

A Novel Algorithm Combining Finite State Method and Genetic Algorithm for Solving Crude Oil Scheduling Problem

Author: Chang-Chun Pan
Gen-Ke Yang
Qian-Qian Duan
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2014
Field of study

A hybrid optimization algorithm combining finite state method (FSM) and genetic algorithm (GA) is proposed to solve the crude oil scheduling problem. The FSM and GA are combined to take the advantage of each method and compensate deficiencies of individual methods. In the proposed algorithm, the finite state method makes up for the weakness of GA which is poor at local searching ability. The heuristic returned by the FSM can guide the GA algorithm towards good solutions. The idea behind this is that we can generate promising substructure or partial solution by using FSM. Furthermore, the FSM can guarantee that the entire solution space is uniformly covered. Therefore, the combination of the two algorithms has better global performance than the existing GA or FSM which is operated individually. Finally, a real-life crude oil scheduling problem from the literature is used for conducting simulation. The experimental results validate that the proposed method outperforms the state-of-art GA method

Crossref

Directory of Open Access Journals

PubMed Central