Transducers from Rewrite Rules with Backreferences
Context-sensitive rewrite rules have been widely used in several areas of
natural language processing, including syntax, morphology, phonology and speech
processing. Kaplan and Kay, Karttunen, and Mohri & Sproat have given various
algorithms to compile such rewrite rules into finite-state transducers. The
present paper extends this work by allowing a limited form of backreferencing
in such rules. The explicit use of backreferencing leads to more elegant and
general solutions. Comment: 8 pages, EACL 1999, Bergen
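To make the notation concrete, the following minimal Python sketch applies one hypothetical context-sensitive rule whose output uses a backreference: a voiceless stop between vowels is doubled, C → CC / V _ V. It uses Python's `re` module only to show which string mapping such a rule denotes. It is not the transducer compilation described in the paper, and the rule, the pattern, and the function name `geminate` are illustrative assumptions.

```python
import re

# Hypothetical rule: C -> C C / V _ V for C in {p, t, k}.
# The output copies the matched consonant through the backreference \1.
GEMINATION = re.compile(r"(?<=[aeiou])([ptk])(?=[aeiou])")


def geminate(word: str) -> str:
    # Lookarounds keep the vowel contexts non-consuming, so one vowel can be
    # the right context of one match and the left context of the next.
    return GEMINATION.sub(r"\1\1", word)


print(geminate("kapo"), geminate("pataka"))  # kappo pattakka
```

Because `\1` can only stand for one of three strings here, the same mapping could also be written as three ordinary rules (p → pp, t → tt, k → kk in the same context), which is one way to see that this particular rule stays finite-state.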
Derivative Based Extended Regular Expression Matching Supporting Intersection, Complement and Lookarounds
Regular expressions are widely used in software. Various regular expression
engines support different combinations of extensions to classical regular
constructs such as Kleene star, concatenation, nondeterministic choice (union
in terms of match semantics). The extensions include, e.g., anchors, lookarounds,
counters, and backreferences. The properties of combinations of such extensions
have been the subject of active recent research.
In this paper we present an approach based on symbolic derivatives to
finding matches to regular expressions that, in addition to the classical
regular constructs, also support complement, intersection and lookarounds (both
negative and positive lookaheads and lookbehinds). We present the theory of
computing symbolic derivatives and determining nullability for a given input
string, and show that this combination of extensions yields a match
semantics corresponding to an effective Boolean algebra, which in turn opens
up possibilities of applying various Boolean logic rewrite rules to optimize
the search for matches.
In addition to the theoretical framework, we present an implementation of the
combination of extensions, accompanied by practical examples, to demonstrate
the efficacy of the approach.
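The core idea of derivative-based matching can be sketched with classical (non-symbolic) Brzozowski derivatives, which already handle intersection and complement: the derivative of a complement is the complement of the derivative, and intersection is taken componentwise. The minimal Python sketch below works one concrete character at a time rather than with the symbolic character predicates of the paper, omits lookarounds entirely, and uses a hypothetical tuple encoding and hypothetical names (`deriv`, `nullable`, `matches`).

```python
# Regexes as nested tuples: ("empty",), ("eps",), ("chr", c), ("cat", r, s),
# ("or", r, s), ("and", r, s), ("not", r), ("star", r).
EMPTY, EPS = ("empty",), ("eps",)

def cat(r, s):
    # Smart constructor: drop trivial cases to keep derivative terms small.
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def alt(r, s):
    # Smart constructor for union: EMPTY is neutral and r | r == r.
    if r == EMPTY:
        return s
    if s == EMPTY or r == s:
        return r
    return ("or", r, s)

def nullable(r):
    """True iff r accepts the empty string."""
    tag = r[0]
    if tag in ("empty", "chr"):
        return False
    if tag in ("eps", "star"):
        return True
    if tag in ("cat", "and"):
        return nullable(r[1]) and nullable(r[2])
    if tag == "or":
        return nullable(r[1]) or nullable(r[2])
    if tag == "not":
        return not nullable(r[1])
    raise ValueError(f"unknown node {tag}")

def deriv(r, a):
    """Brzozowski derivative: accepts w iff r accepts a + w."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1] == a else EMPTY
    if tag == "cat":
        d = cat(deriv(r[1], a), r[2])
        return alt(d, deriv(r[2], a)) if nullable(r[1]) else d
    if tag == "or":
        return alt(deriv(r[1], a), deriv(r[2], a))
    if tag == "and":
        return ("and", deriv(r[1], a), deriv(r[2], a))
    if tag == "not":
        return ("not", deriv(r[1], a))  # complement commutes with derivatives
    if tag == "star":
        return cat(deriv(r[1], a), r)
    raise ValueError(f"unknown node {tag}")

def matches(r, s):
    """Whole-string match: one derivative per character, then test nullability."""
    for a in s:
        r = deriv(r, a)
    return nullable(r)

A, B = ("chr", "a"), ("chr", "b")
ANY = ("star", alt(A, B))  # (a|b)*

def contains(x, y):
    """(a|b)* x y (a|b)*"""
    return cat(ANY, cat(x, cat(y, ANY)))

# Strings that contain "ab" but do not contain "ba".
r = ("and", contains(A, B), ("not", contains(B, A)))
print(matches(r, "aabb"), matches(r, "abab"))  # True False
```

The smart constructors `cat` and `alt` only remove trivial `EMPTY`/`EPS` cases and exact duplicates; a practical engine would normalize terms far more aggressively so that the number of distinct derivatives stays small.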
Applications of Marked Double Negation
Nested complementation plays an important role in expressing counter-free (i.e., star-free) and first-order definable languages and their hierarchies. In addition, methods that compile phonological rules into finite-state networks use double-nested complementation, or "double negation". This paper reviews how double-nested complementation extends to a relatively new operation, generalized restriction (GR), coined by the author. ... The paper demonstrates that the GR operation has interesting potential for expressing regular languages, various kinds of grammars, bimorphisms and relations. This motivates further study of an optimized implementation of the operation. Non peer reviewed
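As an example of double-nested complementation, the sketch below encodes the classical single-context restriction `A => L _ R` (every occurrence of A must be preceded by L and followed by R) as ~( ~(Σ* L) A Σ* ∪ Σ* A ~(R Σ*) ): the outer complement excludes strings containing a badly contexted A, and the inner complements describe the bad contexts. This is the ordinary restriction operator, not the author's generalized restriction. Languages are modelled here as Python membership predicates rather than finite-state networks, so the code only checks the definition exhaustively on short strings; all names and the example rule are hypothetical.

```python
from itertools import product

# Languages are modelled as membership predicates (str -> bool), so complement
# is plain negation and concatenation tries every split point of the input.
def lit(s):
    return lambda w: w == s

def sigma_star(w):
    return True  # Σ*: every string over the alphabet

def concat(*langs):
    def two(p, q):
        return lambda w: any(p(w[:i]) and q(w[i:]) for i in range(len(w) + 1))
    result = langs[0]
    for q in langs[1:]:
        result = two(result, q)
    return result

def union(p, q):
    return lambda w: p(w) or q(w)

def complement(p):
    return lambda w: not p(w)

def restrict(center, left, right):
    """center => left _ right, via double-nested complementation:
    ~( ~(Σ* left) center Σ*  ∪  Σ* center ~(right Σ*) )"""
    bad_left = concat(complement(concat(sigma_star, left)), center, sigma_star)
    bad_right = concat(sigma_star, center, complement(concat(right, sigma_star)))
    return complement(union(bad_left, bad_right))

# Hypothetical rule: every "a" must be immediately preceded by "c" and followed by "b".
rule = restrict(lit("a"), lit("c"), lit("b"))

def direct(w):
    # The same condition, checked position by position.
    return all(0 < i < len(w) - 1 and w[i - 1] == "c" and w[i + 1] == "b"
               for i, ch in enumerate(w) if ch == "a")

# Exhaustive check over all strings on {a, b, c} of length up to 5.
for n in range(6):
    for w in map("".join, product("abc", repeat=n)):
        assert rule(w) == direct(w), w

print(rule("cab"), rule("ab"), rule("cacab"))  # True False False
```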
P-model Alternative to the T-model
Standard linguistic analysis of syntax uses the T-model. This model
requires the ordering D-structure → S-structure → LF,
where D-structure is the deep structure,
S-structure is the surface structure, and LF is logical form.
Between each of these representations there is movement which alters
the order of the constituent words; movement is achieved using the principles
and parameters of syntactic theory. Psychological analysis of sentence
production is usually either serial or connectionist. Serial psychological
models do not readily accommodate the T-model, so a new model,
called the P-model, is introduced here. The P-model differs from previous
linguistic and psychological models. Here it is argued that the LF
representation should be replaced by a variant
of Frege's three qualities (sense, reference, and force),
called the Frege representation or F-representation.
In the F-representation the order of elements is not necessarily the same as
that in LF, and it is suggested that the correct ordering is
F-representation → D-structure → S-structure.
This ordering appears to lead to a more natural
view of sentence production and processing. Within this framework movement
originates as the outcome of emphasis applied to the sentence. The
requirement that the F-representation precede the D-structure calls for an account
of the particular principles and parameters that govern the movement of words
between representations. In general, this would imply that there is a
preferred or optimal ordering of the symbolic string in the F-representation.
The standard ordering is retained because it is unclear how such an optimal
ordering could be produced in general. In this case it is possible to produce
an analysis of movement between LF and D-structure similar to the usual
analysis of movement between S-structure and LF.
It is suggested that a maximal amount of information about
a language's grammar and lexicon is stored,
because of the need to analyze corrupted data.
A Novel Algorithm Combining Finite State Method and Genetic Algorithm for Solving Crude Oil Scheduling Problem
A hybrid optimization algorithm combining the finite state method (FSM) and a genetic algorithm (GA) is proposed to solve the crude oil scheduling problem. The FSM and GA are combined to exploit the advantages of each method and compensate for their individual deficiencies. In the proposed algorithm, the finite state method makes up for the GA's weak local search ability. The heuristic returned by the FSM can guide the GA towards good solutions. The idea behind this is that promising substructures, or partial solutions, can be generated using the FSM. Furthermore, the FSM can guarantee that the entire solution space is uniformly covered. Therefore, the combination of the two algorithms has better global performance than either the existing GA or the FSM operating individually. Finally, a real-life crude oil scheduling problem from the literature is used for simulation. The experimental results validate that the proposed method outperforms the state-of-the-art GA method.
Stream Processing using Grammars and Regular Expressions
In this dissertation we study regular expression based parsing and the use of
grammatical specifications for the synthesis of fast, streaming
string-processing programs.
In the first part we develop two linear-time algorithms for regular
expression based parsing with Perl-style greedy disambiguation. The first
algorithm operates in two passes in a semi-streaming fashion, using a constant
amount of working memory and an auxiliary tape storage which is written in the
first pass and consumed by the second. The second algorithm is a single-pass
and optimally streaming algorithm which outputs as much of the parse tree as is
semantically possible based on the input prefix read so far, and resorts to
buffering as many symbols as are required to resolve the next choice. Optimality
is obtained by performing a PSPACE-complete pre-analysis on the regular
expression.
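Perl-style greedy disambiguation means that, among several parses of the same input, the one found first by trying alternatives left to right (and repetitions as long as possible) wins, even when another parse would make an earlier group longer. Python's backtracking `re` module follows the same left-biased policy, so it can show which parse the algorithms above must produce, although it offers no linear-time or streaming guarantee. The pattern and input are an illustrative choice.

```python
import re

# Python's re uses Perl-style left-biased disambiguation: "a" is tried before
# "ab" in group 1, and since the rest of the pattern can still match, it is kept.
m = re.fullmatch(r"(a|ab)(c|bcd)(d*)", "abcd")
print(m.groups())  # ('a', 'bcd', '')
```

A POSIX-style leftmost-longest matcher would instead choose `('ab', 'c', 'd')` for the same input, since it maximizes the first group before the later ones.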
In the second part we present Kleenex, a language for expressing
high-performance streaming string processing programs as regular grammars with
embedded semantic actions, and its compilation to streaming string transducers
with worst-case linear-time performance. Its underlying theory is based on
transducer decomposition into oracle and action machines, and a finite-state
specialization of the streaming parsing algorithm presented in the first part.
In the second part we also develop a new linear-time streaming parsing
algorithm for parsing expression grammars (PEG) which generalizes the regular
grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm
reformulated using least fixed points and evaluated using an instance of the
chaotic iteration scheme by Cousot and Cousot.
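Least fixed points of monotone equation systems can be computed by chaotic iteration: start every unknown at the bottom value and re-evaluate equations in any fair order until nothing changes. The minimal Python sketch below applies a worklist variant of this scheme to a standard example, computing which nonterminals of a small context-free grammar derive the empty string. It only illustrates the iteration scheme; the grammar and the function name are hypothetical, and the dissertation's PEG tabulation algorithm solves different equations.

```python
# Hypothetical toy grammar: nonterminal -> list of alternatives, each a list of
# symbols. Symbols that are not keys of the grammar are terminals.
GRAMMAR = {
    "S": [["A", "B"], ["x"]],
    "A": [["a", "A"], []],   # A can derive the empty string
    "B": [["A", "A"], ["b"]],
    "C": [["C", "c"]],       # C never terminates, so it is not nullable
}


def nullable_nonterminals(grammar):
    """Least fixed point of the nullability equations by chaotic iteration."""
    value = {n: False for n in grammar}  # start at the bottom of the lattice
    # dependents[m] = nonterminals whose equation mentions m
    dependents = {n: set() for n in grammar}
    for n, alts in grammar.items():
        for alt in alts:
            for sym in alt:
                if sym in grammar:
                    dependents[sym].add(n)
    worklist = list(grammar)
    while worklist:
        n = worklist.pop()  # any fair order reaches the same least fixed point
        new = any(all(sym in grammar and value[sym] for sym in alt)
                  for alt in grammar[n])
        if new != value[n]:
            value[n] = new
            worklist.extend(dependents[n])  # re-evaluate equations that read n
    return {n for n, v in value.items() if v}


print(sorted(nullable_nonterminals(GRAMMAR)))  # ['A', 'B', 'S']
```

Because every equation is monotone, values only ever change from `False` to `True`, so the loop terminates after at most one change per nonterminal.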