Search CORE

3,966 research outputs found

Stream Processing using Grammars and Regular Expressions

Author: Rasmussen Ulrik Terp
Publication venue
Publication date: 01/01/2016
Field of study

In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot

arXiv.org e-Print Archive

Copenhagen University Research Information System

An Intelligent Text Extraction and Navigation System

Author: Günter Neumann
Jakub Piskorski
Publication venue
Publication date: 01/01/1999
Field of study

We present sppc, a high-performance system for intelligent text extraction and navigation from German free text documents. The main purpose of sppc is to extract as much linguistic structure as possible for performing domain-specific processing. sppc consists of a set of domain-independent shallow core components which are realized by means of cascaded weighted finite state machines and generic dynamic tries. All extracted information is represented uniformly in one data structure (called the text chart) in a highly compact and linked form in order to support indexing and navigation through the set of solutions. Germa

CiteSeerX

Proceedings

Author: Bick Eckhard
Hagen Kristin
Müürisep Kaili
Trosterud Trond
Publication venue
Publication date: 17/11/2011
Field of study

Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications. Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud. NEALT Proceedings Series, Vol. 14 (2011), vi+69 pp. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/19231

DSpace at Tartu University Library

Tehokas sisäänpäindeterministisiin automaatteihin perustuva Constraint Grammar -jäsennin

Author: Yli-Jyrä Anssi Mikael
Publication venue: Northern European Association for Language Technology
Publication date: 17/11/2011
Field of study

Proceeding volume: 14 (2011)Pappret conceptualizes parsning med Constraint Grammar på ett nytt sätt som en process med två viktiga representationer. En representation innehåller lokala tvetydighet och den andra sammanfattar egenskaperna hos den lokala tvetydighet klasser. Båda representationer manipuleras med ren finite-state metoder, men deras samtrafik är en ad hoc -tillämpning av rationella potensserier. Den nya tolkningen av parsning systemet har flera praktiska fördelar, bland annat det inåt deterministiska sättet att beräkna, representera och räkna om alla potentiella tillämpningar av reglerna i meningen.Paperi uudelleenkonseptualisoi Constraint Grammarin sellaisena viitekehyksenä, jossa säännöt tarkentavat paikallisen ambiguiteetin tiivistä esitysmuotoa samalla kun sääntöjen ehdot sovitetaan piirrevektoreita vasten, jotka esittävät tiivistetyjen esitysmuotojen summia. Molemmat näkökulmat monitulkintaisuuteen käsitellään käyttäen puhtaita (pure) äärellistilaisia operaatioita. Tiivis esitysmuoto kuvataan piirrevektoreihin rationaalisten potenssisarjojen avulla. Tämä yhteys ei ole yhtään vähemmän puhdas kuin aikaisemmin vallalla ollut tulkinta, jonka edellyttää että leksikaalisen transduktorin tuottama sanan luentajoukko maagisesti linearisoidaan merkatuksi luentojen peräkkäinasetteluksi, joka syötetään puhtaille (äärellistilaisille) transduktoreille. Esitetyllä lähestymistavalla on useita käytännöllisiä etuja: mm. sisäänpäin deterministinen tapa laskea, esittää ja ylläpitää kaikki mahdolliset kohdat, joissa säännöt voivat soveltua virkkeeseen.The paper reconceptualizes Constraint Grammar as a framework where the rules refine the compact representations of local ambiguity while the rule conditions are matched against a string of feature vectors that summarize the compact representations. Both views to the ambiguity are processed with pure finite-state operations. The compact representations are mapped to feature vectors with the aid of a rational power series. This magical interconnection is not less pure than a prevalent interpretation that requires that the reading set provided by a lexical transducer is magically linearized to a marked concatenation of readings given to pure transducers. The current approach has several practical benefits, including the inward deterministic way to compute, represent and maintain all the applications of the rules in the sentence.Peer reviewe

Helsingin yliopiston digitaalinen arkisto

DSpace at Tartu University Library

Collapse of transient gels in colloid-polymer mixtures

Author: Starrs Laura
Publication venue: The University of Edinburgh
Publication date: 01/01/1999
Field of study

Edinburgh Research Archive

A Semi-automatic and Low Cost Approach to Build Scalable Lemma-based Lexical Resources for Arabic Verbs

Author: Abdelali Ahmed,
Doumi Noureddine
Lehireche Ahmed
Maurel Denis
Publication venue: IJIT
Publication date: 01/02/2016
Field of study

International audienceThis work presents a method that enables Arabic NLP community to build scalable lexical resources. The proposed method is low cost and efficient in time in addition to its scalability and extendibility. The latter is reflected in the ability for the method to be incremental in both aspects, processing resources and generating lexicons. Using a corpus; firstly, tokens are drawn from the corpus and lemmatized. Secondly, finite state transducers (FSTs) are generated semi-automatically. Finally, FSTsare used to produce all possible inflected verb forms with their full morphological features. Among the algorithm’s strength is its ability to generate transducers having 184 transitions, which is very cumbersome, if manually designed. The second strength is a new inflection scheme of Arabic verbs; this increases the efficiency of FST generation algorithm. The experimentation uses a representative corpus of Modern Standard Arabic. The number of semi-automatically generated transducers is 171. The resulting open lexical resources coverage is high. Our resources cover more than 70% Arabic verbs. The built resources contain 16,855 verb lemmas and 11,080,355 fully, partially and not vocalized verbal inflected forms. All these resources are being made public and currently used as an open package in the Unitex framework available under the LGPL license

Directory of Open Access Journals

HAL Descartes

HAL Université de Tours

Hal-Diderot

Generalising tree traversals and tree transformations to DAGs:Exploiting sharing without the pain

Author: Abbott
Aho
Anantharaman
Axelsson
Axelsson
Bahr
Bahr
Bahr
Bahr
Bahr
Bahr
Barr
Bille
Bird
Bird
Buneman
Busatto
Charatonik
Charikar
Chlipala
Downey
Eisenberg
Ekman
Emil Axelsson
Engelfriet
Engelfriet
Fila
Fokker
Fors
Fujiyoshi
Fülöp
Gibbons
Gill
Hasuo
Hedin
Hidaka
Johann
Kamimura
Khedker
Khedker
Kiselyov
Knuth
Knuth
Kobayashi
Kuiper
Lerner
Lewis
Lohrey
Magnusson
Maneth
Meijer
Middelkoop
Moor
Morrison
Oliveira
Oliveira
Patrick Bahr
Peyton Jones
Pierce
Quernheim
Raoult
Rompf
Rosendahl
Sakr
Saraiva
Sloane
Uustalu
Viera
Vogt
Wilson
Yakushev
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016
Field of study

We present a recursion scheme based on attribute grammars that can be transparently applied to trees and acyclic graphs. Our recursion scheme allows the programmer to implement a tree traversal or a tree transformation and then apply it to compact graph representations of trees instead. The resulting graph traversal or graph transformation avoids recomputation of intermediate results for shared nodes – even if intermediate results are used in different contexts. Consequently, this approach leads to asymptotic speedup proportional to the compression provided by the graph representation. In general, however, this sharing of intermediate results is not sound. Therefore, we complement our implementation of the recursion scheme with a number of correspondence theorems that ensure soundness for various classes of traversals. We illustrate the practical applicability of the implementation as well as the complementing theory with a number of examples

CiteSeerX

Crossref

The IT University of Copenhagen's Repository

Chalmers Research

Chalmers Publication Library