
    Transducers from Rewrite Rules with Backreferences

    Context-sensitive rewrite rules have been widely used in several areas of natural language processing, including syntax, morphology, phonology, and speech processing. Kaplan and Kay, Karttunen, and Mohri and Sproat have given various algorithms for compiling such rewrite rules into finite-state transducers. The present paper extends this work by allowing a limited form of backreferencing in such rules. The explicit use of backreferencing leads to more elegant and general solutions. Comment: 8 pages, EACL 1999, Bergen.
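    As a rough illustration of what a backreference in a rewrite rule expresses, the sketch below applies a hypothetical rule, a+ -> [\1] / c _ d (bracket any run of "a" between "c" and "d", copying the matched run into the output), using Python's `re` module. It only mimics the input/output behaviour: Python's backtracking regex engine is not a finite-state transducer, its left-to-right, non-overlapping matching is just one possible application strategy, and the rule and the function name `apply_rule` are assumptions rather than anything taken from the paper.

```python
import re

# Hypothetical rule:  a+ -> [\1] / c _ d
# The left context "c" and right context "d" are zero-width lookarounds, so
# they are checked but not consumed; the run of "a"s is captured as group 1.
RULE = re.compile(r"(?<=c)(a+)(?=d)")


def apply_rule(text: str) -> str:
    """Bracket each run of 'a' that sits between 'c' and 'd'.

    The replacement template refers back to group 1 (the backreference), so
    one rule covers "a", "aa", "aaa", ... without listing each case.
    """
    return RULE.sub(r"[\1]", text)


if __name__ == "__main__":
    print(apply_rule("caad caa"))  # prints "c[aa]d caa"
```

    Without the backreference, the output side of the rule would have to enumerate every possible target string; copying the matched material is what keeps the rule to a single line.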

    Building a Syllabic Analyzer for Persian Using Finite State Transducers

    Persian has a concatenative morphology, in which new words are formed by chaining morphemes together into compound words. Whenever two morphemes combine, the syllables at the morpheme boundary may undergo structural change. This study suggests that these syllabic alterations can be captured with a finite-state approach, and further argues that syllabification can be incorporated into the process of lexicon building: when a lexicon is built with finite-state methods, the syllabification rules can be encoded in the lexical knowledge itself. The rules captured here can also assist in processing syllabic alterations at word boundaries. The approach is particularly useful for processing meter in Persian poetry.
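    A minimal sketch of the boundary effect described above, under loudly simplifying assumptions: a consonant directly followed by a vowel always starts a new syllable, words are written in a hypothetical one-letter-per-segment romanization, and the function name `syllabify` is illustrative. The left-to-right scan with one symbol of lookahead is the kind of decision a finite-state transducer can make; it is not the rule set from the study.

```python
# Hypothetical, simplified romanization: one character per segment.
VOWELS = frozenset("aeiou")


def syllabify(word: str) -> str:
    """Insert '.' before every non-initial consonant that precedes a vowel.

    Simplifying assumption: a consonant followed by a vowel always begins a
    new syllable. The scan reads left to right with one symbol of lookahead,
    so it needs only a bounded amount of memory.
    """
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if i > 0 and ch not in VOWELS and nxt in VOWELS:
            out.append(".")
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(syllabify("ketab"))         # prints "ke.tab"
    print(syllabify("ketab" + "am"))  # prints "ke.ta.bam"
```

    Attaching the vowel-initial suffix -am to the romanized stem ketab ("book") moves the stem-final b into the onset of a new syllable, so ke.tab becomes ke.ta.bam. This is the kind of change at a morpheme boundary that the abstract proposes to capture in the lexicon.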

    Annotated Bibliography for the DEWPOINT project

    This bibliography covers aspects of the Detection and Early Warning of Proliferation from Online INdicators of Threat (DEWPOINT) project, including 1) data management and querying, 2) baseline and advanced methods for classifying free text, and 3) algorithms to achieve the ultimate goal of inferring intent from free-text sources. Metrics for assessing the quality and correctness of classification are addressed in the second group. Data management and querying include methods for efficiently storing, indexing, searching, and organizing the data we expect to operate on within the DEWPOINT project.

    An Extendible Regular Expression Compiler for Finite-state Approaches in Natural Language Processing

    Finite-state techniques are widely used in various areas of Natural Language Processing (NLP). As Kaplan and Kay [12] have argued, regular expressions are the appropriate level of abstraction for thinking about finite-state languages and finite-state relations. More complex finite-state operations (such as contexted replacement) are defined on the basis of basic operations (such as Kleene closure, complementation, and composition). To make it possible to experiment with such complex finite-state operations, the FSA Utilities (version 5) provides an extendible regular expression compiler. The paper discusses the regular expression operations provided by the compiler and the possibilities for creating new regular expression operators. The benefits of such an extendible regular expression compiler are illustrated with a number of examples taken from recent publications in the area of finite-state approaches to NLP.
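    To illustrate the general idea of defining complex operators on top of basic ones, the sketch below builds a few derived operators (optionality, one-or-more, and "contains") purely from concatenation, union, and Kleene star. It is a hypothetical Python stand-in that emits patterns for Python's `re` module; it is not the FSA Utilities interface, which the abstract does not show, and because it builds no automata it cannot express complementation or composition, both of which need a genuine finite-state compiler.

```python
import re


# --- Basic operators (hypothetical): each returns a pattern string for re.
def sym(s: str) -> str:
    """A literal symbol string, escaped so it matches only itself."""
    return re.escape(s)


def concat(*ps: str) -> str:
    return "".join(f"(?:{p})" for p in ps)


def union(*ps: str) -> str:
    return "|".join(f"(?:{p})" for p in ps)


def star(p: str) -> str:
    """Kleene closure."""
    return f"(?:{p})*"


ANY = "."     # any single symbol (re.DOTALL below lets it match newlines)
EPSILON = ""  # the empty string


# --- Derived operators: defined only in terms of the basic ones above.
def optional(p: str) -> str:
    return union(p, EPSILON)


def plus(p: str) -> str:
    return concat(p, star(p))


def contains(p: str) -> str:
    """Strings that have at least one substring matching p."""
    return concat(star(ANY), p, star(ANY))


def accepts(pattern: str, s: str) -> bool:
    """True if the whole string s is in the language of the pattern."""
    return re.fullmatch(pattern, s, flags=re.DOTALL) is not None


if __name__ == "__main__":
    print(accepts(contains(sym("ab")), "xxabyy"))  # prints True
    print(accepts(contains(sym("ab")), "ba"))      # prints False
```

    Adding another operator in this sketch means writing one more function over the existing ones, without touching the basic operators; an extendible compiler offers the same kind of layering, but over real automata.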