21 research outputs found

Transducers from Rewrite Rules with Backreferences

Authors: Gerdemann Dale; van Noord Gertjan
Publication date: 01/01/1999

Context-sensitive rewrite rules have been widely used in several areas of natural language processing, including syntax, morphology, phonology and speech processing. Kaplan and Kay, Karttunen, and Mohri & Sproat have given various algorithms to compile such rewrite rules into finite-state transducers. The present paper extends this work by allowing a limited form of backreferencing in such rules. The explicit use of backreferencing leads to more elegant and general solutions. Comment: 8 pages, EACL 1999 Berge

arXiv.org e-Print Archive

CiteSeerX

Derivative Based Extended Regular Expression Matching Supporting Intersection, Complement and Lookarounds

Authors: Ernits Juhan-Peep; Varatalu Ian Erik; Veanes Margus
Publication date: 25/09/2023

Regular expressions are widely used in software. Various regular expression engines support different combinations of extensions to classical regular constructs such as Kleene star, concatenation, and nondeterministic choice (union in terms of match semantics). The extensions include, e.g., anchors, lookarounds, counters, and backreferences. The properties of combinations of such extensions have been the subject of active recent research. In the current paper we present an approach, based on symbolic derivatives, to finding matches to regular expressions that, in addition to the classical regular constructs, also support complement, intersection and lookarounds (both negative and positive lookaheads and lookbacks). We present the theory of computing symbolic derivatives and determining nullability given an input string, and show that such a combination of extensions yields a match semantics that corresponds to an effective Boolean algebra, which in turn opens up possibilities of applying various Boolean logic rewrite rules to optimize the search for matches. In addition to the theoretical framework, we present an implementation of the combination of extensions to demonstrate the efficacy of the approach, accompanied by practical examples.

arXiv.org e-Print Archive
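
The derivative-based matching described in the entry above can be illustrated with a minimal sketch. It uses classical Brzozowski derivatives over single characters, not the symbolic derivatives of the paper, and it leaves out lookarounds entirely; the class and function names (`Re`, `deriv`, `nullable`, `matches`) are hypothetical and are not taken from the authors' implementation. The point is only to show why intersection and complement fit naturally: the derivative distributes over both, and nullability is computed with plain Boolean operations.

```python
from dataclasses import dataclass


class Re:
    """Base class for regular expression terms (hypothetical names)."""


@dataclass(frozen=True)
class Empty(Re):  # matches no string at all
    pass


@dataclass(frozen=True)
class Eps(Re):  # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Re):  # matches exactly one given character
    c: str


@dataclass(frozen=True)
class Cat(Re):  # concatenation
    a: Re
    b: Re


@dataclass(frozen=True)
class Alt(Re):  # union
    a: Re
    b: Re


@dataclass(frozen=True)
class And(Re):  # intersection
    a: Re
    b: Re


@dataclass(frozen=True)
class Not(Re):  # complement
    a: Re


@dataclass(frozen=True)
class Star(Re):  # Kleene star
    a: Re


def nullable(r: Re) -> bool:
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.a) and nullable(r.b)
    if isinstance(r, Alt):
        return nullable(r.a) or nullable(r.b)
    if isinstance(r, Not):
        return not nullable(r.a)
    raise TypeError(f"unknown term: {r!r}")


def deriv(r: Re, ch: str) -> Re:
    """Derivative of r with respect to ch: what remains to match after ch."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == ch else Empty()
    if isinstance(r, Cat):
        left = Cat(deriv(r.a, ch), r.b)
        return Alt(left, deriv(r.b, ch)) if nullable(r.a) else left
    if isinstance(r, Alt):
        return Alt(deriv(r.a, ch), deriv(r.b, ch))
    if isinstance(r, And):
        return And(deriv(r.a, ch), deriv(r.b, ch))
    if isinstance(r, Not):
        return Not(deriv(r.a, ch))
    if isinstance(r, Star):
        return Cat(deriv(r.a, ch), r)
    raise TypeError(f"unknown term: {r!r}")


def matches(r: Re, s: str) -> bool:
    """Whole-string match: take one derivative per character, then test nullability."""
    for ch in s:
        r = deriv(r, ch)
    return nullable(r)


# Example: strings over {a, b} that contain "ab" AND do not contain "bb".
ANY = Star(Alt(Chr("a"), Chr("b")))
CONTAINS_AB = Cat(ANY, Cat(Chr("a"), Cat(Chr("b"), ANY)))
CONTAINS_BB = Cat(ANY, Cat(Chr("b"), Cat(Chr("b"), ANY)))
R = And(CONTAINS_AB, Not(CONTAINS_BB))
```

This sketch performs no simplification of intermediate terms, so derivatives grow quickly on longer inputs; a practical engine would normalize terms (for example by applying Boolean rewrite rules of the kind the abstract mentions).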

Merkityn kaksoisnegaation sovellukset [Applications of Marked Double Negation]

Author: Yli-Jyrä Anssi
Publication venue: Potsdam University Press
Publication date: 01/01/2008

Nested complementation plays an important role in expressing counter-free, i.e. star-free, and first-order definable languages and their hierarchies. In addition, methods that compile phonological rules into finite-state networks use double-nested complementation or "double negation". This paper reviews how the double-nested complementation extends to a relatively new operation, generalized restriction (GR), coined by the author. ... The paper demonstrates that the GR operation has an interesting potential in expressing regular languages, various kinds of grammars, bimorphisms and relations. This motivates a further study of optimized implementation of the operation. Non peer reviewe

Helsingin yliopiston digitaalinen arkisto

P-model Alternative to the T-model

Author: Roberts Mark D.
Publication date: 01/01/2004

Standard linguistic analysis of syntax uses the T-model. This model requires the ordering: D-structure > S-structure > LF, where D-structure is the deep structure, S-structure is the surface structure, and LF is logical form. Between each of these representations there is movement which alters the order of the constituent words; movement is achieved using the principles and parameters of syntactic theory. Psychological analysis of sentence production is usually either serial or connectionist. Psychological serial models do not accommodate the T-model immediately so that here a new model called the P-model is introduced. The P-model is different from previous linguistic and psychological models. Here it is argued that the LF representation should be replaced by a variant of Frege's three qualities (sense, reference, and force), called the Frege representation or F-representation. In the F-representation the order of elements is not necessarily the same as that in LF and it is suggested that the correct ordering is: F-representation > D-structure > S-structure. This ordering appears to lead to a more natural view of sentence production and processing. Within this framework movement originates as the outcome of emphasis applied to the sentence. The requirement that the F-representation precedes the D-structure needs a picture of the particular principles and parameters which pertain to movement of words between representations. In general this would imply that there is a preferred or optimal ordering of the symbolic string in the F-representation. The standard ordering is retained because the general way of producing such an optimal ordering is unclear. In this case it is possible to produce an analysis of movement between LF and D-structure similar to the usual analysis of movement between S-structure and LF. It is suggested that a maximal amount of information about a language's grammar and lexicon is stored, because of the necessity of analyzing corrupted data.

PhilPapers

CogPrints Cognitive Sciences Eprint Archive

Checking Cryptographic API Specifications in JavaScript

Author: Mitchell Duncan
Publication date: 01/01/2020

Royal Holloway - Pure

Strings at MOSCA

Author: Hague Matthew
Publication venue: Association for Computing Machinery (ACM)
Publication date: 06/11/2019

Royal Holloway - Pure

A Novel Algorithm Combining Finite State Method and Genetic Algorithm for Solving Crude Oil Scheduling Problem

Authors: Chang-Chun Pan; Gen-Ke Yang; Qian-Qian Duan
Publication venue: Hindawi Limited
Publication date: 01/01/2014

A hybrid optimization algorithm combining the finite state method (FSM) and a genetic algorithm (GA) is proposed to solve the crude oil scheduling problem. The FSM and GA are combined to take advantage of each method and to compensate for the deficiencies of the individual methods. In the proposed algorithm, the finite state method makes up for the GA's weak local search ability. The heuristic returned by the FSM can guide the GA towards good solutions. The idea behind this is that promising substructures or partial solutions can be generated by using the FSM. Furthermore, the FSM can guarantee that the entire solution space is uniformly covered. Therefore, the combination of the two algorithms has better global performance than the existing GA or FSM operated individually. Finally, a real-life crude oil scheduling problem from the literature is used for conducting simulation. The experimental results validate that the proposed method outperforms the state-of-the-art GA method.

Crossref

Directory of Open Access Journals

PubMed Central

Quadratic Alignment Constraints and Finite State Optimality Theory

Author: Biró Tamás
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2003

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Repository of the Academy's Library

Dissertations of the University of Groningen

Stream Processing using Grammars and Regular Expressions

Author: Rasmussen Ulrik Terp
Publication date: 01/01/2016

In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass, optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as are required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot.

arXiv.org e-Print Archive

Copenhagen University Research Information System