9 research outputs found

From Regular Expression Matching to Parsing

Authors: Bille, Philip; Gørtz, Inge Li
Publication date: 29/01/2019

Given a regular expression R and a string Q, the regular expression parsing problem is to determine if Q matches R and, if so, to determine how it matches, e.g., by a mapping of the characters of Q to the characters in R. Regular expression parsing makes finding matches of a regular expression even more useful by allowing us to directly extract subpatterns of the match, e.g., for extracting IP addresses from internet traffic analysis or extracting subparts of genomes from genetic databases. We present a new general technique for efficiently converting a large class of algorithms that determine if a string Q matches regular expression R into algorithms that can construct a corresponding mapping. As a consequence, we obtain the first efficient linear space solutions for regular expression parsing.
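The difference between matching and parsing described in this abstract can be illustrated with a minimal sketch. It uses Python's standard `re` module, not the linear-space technique of the paper; the pattern `IP_PATTERN`, the group names `a` to `d`, and the helpers `matches` and `parse` are assumptions introduced only for illustration.

```python
import re

# Hypothetical pattern: four dot-separated runs of 1-3 digits.
# It does not check that each number is in the range 0-255.
IP_PATTERN = re.compile(
    r"(?P<a>\d{1,3})\.(?P<b>\d{1,3})\.(?P<c>\d{1,3})\.(?P<d>\d{1,3})"
)


def matches(q: str) -> bool:
    """Matching: answer only whether the whole string Q matches R."""
    return IP_PATTERN.fullmatch(q) is not None


def parse(q: str):
    """Parsing: also report which part of Q each subpattern of R matched.

    Returns a dict from group name to a (start, end) span in q,
    or None when q does not match.
    """
    m = IP_PATTERN.fullmatch(q)
    if m is None:
        return None
    return {name: m.span(name) for name in ("a", "b", "c", "d")}


print(matches("192.168.0.1"))  # True
print(parse("192.168.0.1"))    # {'a': (0, 3), 'b': (4, 7), 'c': (8, 9), 'd': (10, 11)}
print(parse("not an address")) # None
```

The spans are a coarse version of the character-to-character mapping mentioned in the abstract: they say where each subpattern matched, which is what makes extracting subparts of a match possible.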

arXiv.org e-Print Archive

Online Research Database In Technology

NFAs with Tagged Transitions, their Conversion to Deterministic Automata and Application to Regular Expressions


A conservative extension to traditional nondeterministic finite automata is proposed to keep track of the positions in the input string for the last uses of selected transitions, by adding "tags" to transitions. The resulting automata are reminiscent of nondeterministic Mealy machines. Formal semantics of automata with tagged transitions is given. An algorithm is given to convert these augmented automata to corresponding deterministic automata, which can be used to process strings efficiently. Application to regular expressions is discussed, explaining how the algorithms can be used to implement for example substring addressing and a lookahead operator, and some informal comparison to other widely used algorithms is done.
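The idea of remembering the input position of the last use of a selected transition can be observed, by analogy, in the capture groups of ordinary regex engines. The sketch below uses Python's standard `re` module, which is a backtracking matcher and not the tagged automaton construction of the paper; the pattern and input string are assumptions chosen for illustration.

```python
import re

# A capture group inside a repetition: every iteration passes through
# the group, and the engine keeps only the position of its last use.
pattern = re.compile(r"(?:(\d+);)*")

m = pattern.fullmatch("10;20;30;")

# The group was entered three times (for 10, 20 and 30); only the last
# iteration's text and span are retained, analogous to a tag storing
# the most recent position.
last_text = m.group(1)
last_span = m.span(1)

print(last_text)  # 30
print(last_span)  # (6, 8)
```

Substring addressing, as mentioned in the abstract, amounts to reading such recorded positions back after the match has finished.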

CiteSeerX

Leaving no stone unturned: flexible retrieval of idiomatic expressions from a large text corpus

Authors: Filimonov, Maxim; Hughes, Callum; Spasic, Irena; Wray, Alison
Publication venue: MDPI AG
Publication date: 03/03/2021

Idioms are multi-word expressions whose meaning cannot always be deduced from the literal meaning of constituent words. A key feature of idioms that is central to this paper is their peculiar mixture of fixedness and variability, which poses challenges for their retrieval from large corpora using traditional search approaches. These challenges hinder insights into idiom usage, affecting users who are conducting linguistic research as well as those involved in language education. To facilitate access to idiom examples taken from real-world contexts, we introduce an information retrieval system designed specifically for idioms. Given a search query that represents an idiom, typically in its canonical form, the system expands it automatically to account for the most common types of idiom variation including inflection, open slots, adjectival or adverbial modification, and passivisation. As a by-product of query expansion, other types of idiom variation captured include derivation, compounding, negation, distribution across multiple clauses as well as other unforeseen types of variation. The system was implemented on top of Elasticsearch, an open-source, distributed, scalable, real-time search engine. Flexible retrieval of idioms is supported by a combination of linguistic pre-processing of the search queries, their translation into a set of query clauses written in a query language called Query DSL, and analysis, an indexing process that involves tokenisation and normalisation. Our system outperformed the phrase search in terms of recall and outperformed the keyword search in terms of precision. Out of the three, our approach was found to provide the best balance between precision and recall. By providing a fast and easy way of finding idioms in large corpora, our approach can facilitate further developments in fields such as linguistics, language education and natural language processing.
Keywords: information retrieval; natural language processing; corpus linguistics; multi-word expressions; idiom
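To give a rough sense of what translating an idiom into a Query DSL clause might look like, the sketch below builds a single `match_phrase` clause with a `slop` value, which lets other words appear between the idiom's tokens. This is not the authors' query expansion; the helper `idiom_query`, the field name `text`, and the gap of 2 are assumptions made for illustration, and no Elasticsearch server is contacted.

```python
import json


def idiom_query(canonical_form: str, field: str = "text", max_gap: int = 2) -> dict:
    """Build an Elasticsearch Query DSL body for a loosely matched phrase.

    `field` and `max_gap` are hypothetical defaults. `slop` is the
    match_phrase parameter that tolerates intervening or shifted tokens,
    e.g. "spill the beans" or "spilled all the beans" for "spill beans",
    provided the index analysis normalises inflected forms.
    """
    return {
        "query": {
            "match_phrase": {
                field: {"query": canonical_form, "slop": max_gap}
            }
        }
    }


body = idiom_query("spill beans")
print(json.dumps(body, indent=2))
```

A plain phrase search corresponds to a gap of 0 and a keyword search drops the ordering constraint entirely, which is one way to read the precision and recall trade-off reported in the abstract.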

Multidisciplinary Digital Publishing Institute

Online Research @ Cardiff

A text pattern-matching tool based on Parsing Expression Grammars

Author: Aho; Clarke; Ford; Griswold; Hagen; Hutton; Ierusalimschy; Knuth; Laurikari; Thompson; Wall
Publication venue: Wiley

Crossref

Type inference for unique pattern matching

Author: Abiteboul S.; Book R.; Elgaard J.; Frisch A.; Laurikari V.; Levin M. Y.; Murata M.; Neumann A.; Neven F.; Stijn Vansummeren; Tabuchi N.; Vianu V.
Publication venue: Association for Computing Machinery (ACM)

Crossref