19 research outputs found

    An Efficient Constraint Grammar Parser Based on Inward-Deterministic Automata (originally: Tehokas sisäänpäindeterministisiin automaatteihin perustuva Constraint Grammar -jäsennin)

    Get PDF
    Proceeding volume: 14 (2011)

    The paper reconceptualizes Constraint Grammar as a framework where the rules refine the compact representations of local ambiguity while the rule conditions are matched against a string of feature vectors that summarize the compact representations. Both views of the ambiguity are processed with pure finite-state operations. The compact representations are mapped to feature vectors with the aid of a rational power series. This interconnection is no less pure than a prevalent interpretation that requires the reading set provided by a lexical transducer to be magically linearized into a marked concatenation of readings fed to pure transducers. The approach has several practical benefits, including an inward-deterministic way to compute, represent and maintain all the potential applications of the rules in the sentence. Peer reviewed.
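
    The interplay between the two representations can be pictured with a small toy example: a cohort of candidate readings for a token, a 0/1 feature vector that summarizes which tags occur in it, and a REMOVE-style rule that refines the cohort. The sketch below only illustrates that idea in Python; the tag set, the rule and the helper names are invented here, and the paper's actual machinery uses pure finite-state operations and a rational power series rather than explicit sets.

        # A cohort is the set of candidate readings (tag tuples) for one token.
        # A feature vector summarizes which tags occur anywhere in the cohort,
        # so a rule condition can be tested without expanding every reading.

        TAGS = ["N", "V", "Det", "Sg", "Pl"]

        def feature_vector(cohort):
            """0/1 vector over TAGS: is the tag present in some reading of the cohort?"""
            return tuple(int(any(tag in reading for reading in cohort)) for tag in TAGS)

        def remove(cohort, target_tag, condition_holds):
            """Drop readings carrying target_tag when the context condition holds,
            but never remove the last remaining reading."""
            if condition_holds:
                kept = {r for r in cohort if target_tag not in r}
                if kept:
                    return kept
            return cohort

        # "move" is ambiguous between a noun and a verb reading.
        move = {("N", "Sg"), ("V",)}
        prev_is_det = True                       # context: previous token read as Det

        print(feature_vector(move))              # (1, 1, 0, 1, 0): both N and V present
        move = remove(move, "V", prev_is_det)
        print(feature_vector(move))              # (1, 0, 0, 1, 0): the V reading is gone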

    On Equivalence and Uniformisation Problems for Finite Transducers

    Get PDF
    Transductions are binary relations on finite words. For rational transductions, i.e., transductions defined by finite transducers, the inclusion, equivalence and sequential uniformisation problems are known to be undecidable. In this paper, we investigate stronger variants of inclusion, equivalence and sequential uniformisation, based on a general notion of transducer resynchronisation, and show their decidability. We also investigate the classes of finite-valued rational transductions and deterministic rational transductions, which are known to have a decidable equivalence problem. We show that sequential uniformisation is also decidable for them.
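
    As a concrete picture of the objects involved, the sketch below encodes a small nondeterministic finite transducer in Python and enumerates the output words it relates to a given input word; it is an illustration only, with made-up states and transitions, not an algorithm from the paper. Equivalence asks whether two such machines define the same relation; sequential uniformisation asks for an input-deterministic transducer that picks, for every input in the domain, one output allowed by the relation.

        # A nondeterministic finite transducer, given by its transition relation:
        # (state, input symbol) -> set of (next state, output string).
        # The transduction it defines is a binary relation on finite words.
        transitions = {
            ("q0", "a"): {("q0", "a"), ("q0", "aa")},   # copy an 'a' or double it
            ("q0", "b"): {("q0", "b")},                 # copy a 'b' unchanged
        }
        initial, finals = "q0", {"q0"}

        def outputs(word):
            """All output words the transducer relates to `word`."""
            configs = {(initial, "")}                   # reachable (state, output so far)
            for sym in word:
                configs = {(q2, out + o)
                           for (q, out) in configs
                           for (q2, o) in transitions.get((q, sym), set())}
            return {out for (q, out) in configs if q in finals}

        print(outputs("ab"))    # {'ab', 'aab'}: the relation is not a function
        print(outputs("ba"))    # {'ba', 'baa'}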

    Applications of Marked Double Negation (originally: Merkityn kaksoisnegaation sovellukset)

    Get PDF
    Nested complementation plays an important role in expressing counter-free, i.e. star-free, and first-order definable languages and their hierarchies. In addition, methods that compile phonological rules into finite-state networks use double-nested complementation or "double negation". This paper reviews how the double-nested complementation extends to a relatively new operation, generalized restriction (GR), coined by the author. ... The paper demonstrates that the GR operation has interesting potential for expressing regular languages, various kinds of grammars, bimorphisms and relations. This motivates further study of optimized implementations of the operation. Non peer reviewed.
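
    As a finite approximation of the idea, the sketch below expresses a context restriction ("every 'a' is immediately followed by 'b'") through two nested complements over all words up to a bounded length. The alphabet, the constraint and the bounded universe are illustrative choices; a real rule compiler works on automata over the full Sigma* rather than on explicit word sets.

        from itertools import product

        SIGMA = "ab"

        def words(max_len):
            """All words over SIGMA up to max_len: a finite stand-in for Sigma*."""
            ws = {""}
            for n in range(1, max_len + 1):
                ws |= {"".join(p) for p in product(SIGMA, repeat=n)}
            return ws

        def restriction(max_len):
            """Words in which every 'a' is immediately followed by 'b', built as an
            outer complement of a set that itself uses an inner complement."""
            universe = words(max_len)
            not_b_first = {w for w in universe if not w.startswith("b")}   # inner complement
            bad = {u + "a" + v                                             # an 'a' in a forbidden context
                   for u in universe for v in not_b_first
                   if len(u) + 1 + len(v) <= max_len}
            return universe - bad                                          # outer complement

        good = restriction(4)
        assert "ab" in good and "abab" in good and "b" in good
        assert "a" not in good and "ba" not in good and "aab" not in good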

    A bibliography on formal languages and related topics

    Get PDF

    Two-wayness: Automata and Transducers

    Get PDF
    This PhD thesis is about two natural extensions of finite automata (FA): two-way finite automata (2FA) and two-way transducers (2T). It is well known that 2FAs are computationally equivalent to FAs, even in their nondeterministic variant (2NFA). However, in the field of descriptional complexity, some questions remain. Raised by Sakoda and Sipser in 1978, the question of the cost of simulating 2NFAs by 2DFAs (the deterministic variant of 2FA) is still open. In this manuscript, we give an answer in a restricted case in which the nondeterministic choices of the simulated 2NFA may occur only at the boundaries of the input tape (2ONFA). We show that every 2ONFA can be simulated by a 2DFA of subexponential (but superpolynomial) size. Under the assumption L=NL, this cost is reduced to the polynomial level. Moreover, we prove that complementation and simulation by a halting 2ONFA are polynomial. We also consider the analogous simulations for alternating devices.

    Providing a one-way write-only output tape to FAs leads to the notion of transducer. Contrary to the case of finite automata, which are acceptors, 2-way transducers strictly extend the computational power of 1-way ones, even when both the input and output alphabets are unary. Though 1-way transducers enjoy nice properties and characterizations (algebraic, logical, etc.), the 2-way variants are less well understood, especially in the nondeterministic case. In this area, the manuscript makes a new contribution: an algebraic characterization of the relations accepted by two-way transducers when both the input and output alphabets are unary. It can be reformulated as follows: each unary two-way transducer is equivalent to a sweeping (and even rotating) transducer. We also show that the assumptions made on the size of the alphabets are required, that is, sweeping transducers are weaker than 2-way transducers whenever at least one of the alphabets is non-unary.

    Along the way, we discuss the computational power of some algebraic operations on word relations, introduced with the aim of describing the behaviour of 2-way transducers or, more generally, of 2-way weighted automata. In particular, the mirror operation, which consists in reversing the input word in order to describe a right-to-left scan, draws our attention. Finally, we study another kind of operation, better suited to binary word relations: composition. We consider the transitive closure of relations. When the relation belongs to a very restricted sub-family of rational relations, we are able to compute its transitive closure and determine its complexity; this quickly becomes uncomputable when larger classes are considered.
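
    To make the model concrete, the following sketch simulates a tiny deterministic two-way automaton in Python: the head may move left or right over an endmarked tape, and the example machine scans right to find an 'a', rewinds to the left end, and scans right again to find a 'b'. The machine, the state names and the acceptance convention are invented for illustration and are not taken from the thesis.

        # A toy deterministic two-way automaton: the head moves left (-1) or right (+1)
        # over the input framed by endmarkers. This machine accepts words over {a, b}
        # containing at least one 'a' and at least one 'b', using two left-to-right passes.
        L, R = -1, +1
        LEFT_END, RIGHT_END = "<", ">"

        # (state, symbol) -> (next state, head move); a missing entry means "reject".
        delta = {
            ("find_a", LEFT_END): ("find_a", R),
            ("find_a", "b"): ("find_a", R),
            ("find_a", "a"): ("rewind", L),       # found an 'a': go back to the left end
            ("rewind", "a"): ("rewind", L),
            ("rewind", "b"): ("rewind", L),
            ("rewind", LEFT_END): ("find_b", R),  # second pass: look for a 'b'
            ("find_b", "a"): ("find_b", R),
            ("find_b", "b"): ("accept", R),
        }

        def run_2dfa(word, start="find_a", accepting="accept", max_steps=10_000):
            tape = LEFT_END + word + RIGHT_END
            state, pos = start, 0
            for _ in range(max_steps):
                if state == accepting:
                    return True
                key = (state, tape[pos])
                if key not in delta:
                    return False
                state, move = delta[key]
                pos += move
            return False          # step bound, never reached by this toy machine

        assert run_2dfa("ba") and run_2dfa("aab")
        assert not run_2dfa("") and not run_2dfa("aaa") and not run_2dfa("bbb")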

    Proceedings

    Get PDF
    Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications. Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud. NEALT Proceedings Series, Vol. 14 (2011), vi+69 pp. © 2011 The editors and contributors. Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/19231

    Probabilistic Modelling of Morphologically Rich Languages

    Full text link
    This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language. Such models play an important role in natural language processing tasks such as translation or speech recognition, but often rely on the simplistic assumption that words are opaque symbols. This assumption does not fit morphologically complex languages well, where words can have rich internal structure and sub-word elements are shared across distinct word forms. Our approach is to encode basic notions of morphology into the assumptions of three different types of language models, with the intention that leveraging shared sub-word structure can improve model performance and help overcome the data sparsity that arises from morphological processes. In the context of n-gram language modelling, we formulate a new Bayesian model that relies on the decomposition of compound words to attain better smoothing, and we develop a new distributed language model that learns vector representations of morphemes and leverages them to link together morphologically related words. In both cases, we show that accounting for word sub-structure improves the models' intrinsic performance and provides benefits when applied to other tasks, including machine translation. We then shift the focus beyond the modelling of word sequences and consider models that automatically learn what the sub-word elements of a given language are, given an unannotated list of words. We formulate a novel model that can learn discontiguous morphemes in addition to the more conventional contiguous morphemes that most previous models are limited to. This approach is demonstrated on Semitic languages, and we find that modelling discontiguous sub-word structures leads to improvements in the task of segmenting words into their contiguous morphemes.

    Comment: DPhil thesis, University of Oxford, submitted and accepted 2014. http://ora.ox.ac.uk/objects/uuid:8df7324f-d3b8-47a1-8b0b-3a6feb5f45c
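
    A minimal sketch of one of the underlying ideas, composing a word's vector from the vectors of its morphemes so that morphologically related forms share components, is given below. The morphemes, segmentations and random vectors are purely illustrative; the thesis learns such representations within distributed and Bayesian language models rather than drawing them at random.

        import numpy as np

        rng = np.random.default_rng(0)
        DIM = 8

        # Invented morpheme inventory with random vectors; in practice these
        # representations would be learned from data.
        morph_vec = {m: rng.normal(size=DIM) for m in ["talo", "kirja", "ssa", "sta"]}

        def word_vector(segmentation):
            """Compose a word's vector as the sum of its morpheme vectors."""
            return sum(morph_vec[m] for m in segmentation)

        def cosine(u, v):
            return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

        talossa  = word_vector(["talo", "ssa"])    # 'in the house'
        talosta  = word_vector(["talo", "sta"])    # 'out of the house'
        kirjassa = word_vector(["kirja", "ssa"])   # 'in the book'

        # Forms sharing a morpheme share an additive component of their vectors.
        print(cosine(talossa, talosta), cosine(talossa, kirjassa))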