    An Abstract Machine for Unification Grammars

    This work describes the design and implementation of an abstract machine, Amalia, for the linguistic formalism ALE, which is based on typed feature structures. This formalism is one of the most widely accepted in computational linguistics and has been used for designing grammars in various linguistic theories, most notably HPSG. Amalia is composed of data structures and a set of instructions, augmented by a compiler from the grammatical formalism to the abstract instructions and a (portable) interpreter of the abstract instructions. The effect of each instruction is defined using a low-level language that can be executed on ordinary hardware. The advantages of the abstract machine approach are twofold. From a theoretical point of view, the abstract machine gives a well-defined operational semantics to the grammatical formalism. This ensures that grammars specified using our system are endowed with a well-defined meaning. It makes it possible, for example, to formally verify the correctness of a compiler for HPSG, given an independent definition. From a practical point of view, Amalia is the first system that employs a direct compilation scheme for unification grammars based on typed feature structures. Using Amalia results in much improved performance over existing systems. To test the machine on a realistic application, we have developed a small-scale, HPSG-based grammar for a fragment of the Hebrew language, using Amalia as the development platform. This is the first application of HPSG to a Semitic language.
    Comment: Doctoral Thesis, 96 pages, many postscript figures, uses pstricks, pst-node, psfig, fullname and a macros file
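
    To make the core operation concrete, here is a minimal Python sketch of destructive unification over typed feature structures, the kind of operation an abstract machine such as Amalia compiles. It is not Amalia's instruction set or data layout. The toy type hierarchy (`bot`, `agr`, `sg`, `third`, `3sg`, ...), the `FS` node class and the helpers `unify_types`, `unify` and `get` are hypothetical. The sketch ignores feature appropriateness and does not undo failed unifications.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical toy type hierarchy: each type maps to its immediate supertypes.
# "bot" is the most general type. We assume any two compatible types have a
# unique most general common subtype; feature appropriateness is ignored.
HIERARCHY: Dict[str, Set[str]] = {
    "bot": set(),
    "sign": {"bot"},
    "agr": {"bot"},
    "sg": {"agr"},
    "pl": {"agr"},
    "third": {"agr"},
    "3sg": {"sg", "third"},
}


def supertypes(t: str) -> Set[str]:
    """All types that subsume t, including t itself."""
    result, stack = {t}, [t]
    while stack:
        for parent in HIERARCHY[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def unify_types(a: str, b: str) -> Optional[str]:
    """Most general common subtype of a and b, or None on a type clash."""
    common = [t for t in HIERARCHY if {a, b} <= supertypes(t)]
    most_general = [t for t in common
                    if not any(u != t and u in supertypes(t) for u in common)]
    return most_general[0] if len(most_general) == 1 else None


@dataclass(eq=False)
class FS:
    """A feature-structure node; `forward` is set once the node is merged away."""
    type: str
    feats: Dict[str, FS] = field(default_factory=dict)
    forward: Optional[FS] = None


def deref(fs: FS) -> FS:
    """Follow forwarding pointers to the node's current representative."""
    while fs.forward is not None:
        fs = fs.forward
    return fs


def unify(a: FS, b: FS) -> bool:
    """Destructively unify a and b; returns False on failure.

    Partial results of a failed unification are left in place (no trail).
    """
    a, b = deref(a), deref(b)
    if a is b:
        return True
    t = unify_types(a.type, b.type)
    if t is None:
        return False
    a.type = t
    b.forward = a  # merge b into a, so every path that reached b now reaches a
    for feat, value in b.feats.items():
        if feat in a.feats:
            if not unify(a.feats[feat], value):
                return False
        else:
            a.feats[feat] = value
    return True


def get(fs: FS, *path: str) -> FS:
    """Return the (dereferenced) node at the end of a feature path."""
    fs = deref(fs)
    for feat in path:
        fs = deref(fs.feats[feat])
    return fs


# Subject agreement "sg" unified with verb agreement "third" yields "3sg".
subj = FS("sign", {"AGR": FS("sg")})
verb = FS("sign", {"AGR": FS("third")})
print(unify(subj, verb), get(subj, "AGR").type)  # True 3sg
```

    Merged nodes forward to a single representative. As a result, a value shared by two feature paths (reentrancy) stays shared after unification. Abstract machines in the WAM tradition typically also record bindings on a trail so that they can backtrack on failure. This sketch omits that step.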

    Machine Translation Using Automatically Inferred Construction-Based Correspondence and Language Models

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    The Nature of Verbs in Sign Languages: A Role and Reference Grammar Account of Irish Sign Language Verbs

    This paper is concerned with the special nature of sign language verbs, specifically Irish Sign Language verbs. We use Role and Reference Grammar to define a structure for lexical entries that is sufficiently rich and robust to represent Irish Sign Language verbs. Role and Reference Grammar takes language to be a system of communicative social action; accordingly, from this perspective, analysing the communicative functions of grammatical structures plays a vital role in grammatical description and theory. This work is part of research on the development of a linguistically motivated computational framework for Irish Sign Language. In defining a linguistically motivated computational model for Irish Sign Language, we must be able to refer to the various articulators (hands, fingers, eyes, eyebrows, etc.), as these are used to articulate the phonemes, morphemes and lexemes of an utterance. Irish Sign Language is a visual-gestural language. Because it has no written or oral form, representing an Irish Sign Language utterance in computational terms requires a humanoid avatar capable of movement within three-dimensional space. Here, we provide an account of the grammatical information that can be found within Irish Sign Language verbs. We use the Signs of Ireland corpus to access the relevant linguistic data. In addition, we use the ELAN software, which allows us to view the corpus and collate the relevant linguistic phenomena. We utilise the Event Visibility Hypothesis in the development of our proposed lexicon architecture.

    The Persuasive Tutor: a BDI Teaching Agent with Role and Reference Grammar Language Interface – Sustainable design of a conversational agent for language learning

    This paper investigates how an intelligent teaching agent with Role and Reference Grammar [RRG] (cf. Van Valin 2005) as its linguistic engine can support language learning. Based on a user-centred empirical design study, the architecture of a highly persuasive language-learning tool is developed as an extension of PLOTLearner (http://europlot.blogspot.dk/2012/07/try-plotlearner-2.html). Using grounded theory, it is shown that feedback and support are of the greatest importance, even in self-directed computer-assisted language learning. It is also shown how this overall approach to language learning can be situated within traditional conversation-based learning theories (cf. Laurillard 2009). Finally, it is shown that a computationally adequate model of the RRG linking algorithm, extended into a computational processing model, can account for communication between a learner and the software by employing conceptual graphs to represent mental states in the software agent; the important role of speech acts is emphasized in this context.

    The case of restricted locatives

    This paper examines the cross-linguistic phenomenon of locative case restricted to a closed class of items (L-nouns). Starting with Latin, I suggest that the restriction is semantic in nature: L-nouns denote in the spatial domain and hence can be used as locatives without further material. I show how the independently motivated hypothesis that directional PPs consist of two layers, Path and Place, explains the directional uses of L-nouns and the cases assigned in those uses. I then locate the source of the locative case itself in p⁰ and provide a clear semantic contribution for it: a type-shift from the domain of loci to the object domain. Finally, I examine cross-linguistic restrictions on the use of locative case and show that the observed patterns can be accounted for on the same assumptions.

    A Transition-Based Directed Acyclic Graph Parser for UCCA

    We present the first parser for UCCA, a cross-linguistically applicable framework for semantic representation, which builds on extensive typological work and supports rapid annotation. UCCA poses a challenge for existing parsing techniques, as it exhibits reentrancy (resulting in DAG structures), discontinuous structures and non-terminal nodes corresponding to complex semantic units. To our knowledge, no existing parser supports the conjunction of these formal properties. Our transition-based parser, which uses a novel transition set and features based on bidirectional LSTMs, has value beyond UCCA parsing: its ability to handle more general graph structures can inform the development of parsers for other semantic DAG structures and for languages that frequently use discontinuous structures.
    Comment: 16 pages; Accepted as long paper at ACL201
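
    The general idea of a transition-based DAG parser can be sketched as follows. This is a simplified, hypothetical transition system in Python, not the paper's actual transition set or feature model. There is no BiLSTM classifier, and the action sequence is written by hand. The action names (`SHIFT`, `NODE`, `RIGHT-EDGE`, `LEFT-REMOTE`, `SWAP`, ...) and their exact semantics are assumptions. The edge labels `A` and `P` are UCCA-style category names, used here only for illustration. The sketch shows how a stack-and-buffer parser can build a DAG rather than a tree through three kinds of moves: creating non-terminal nodes, adding "remote" edges that give a node a second parent, and reordering with a swap.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (parent, child, label, is_remote)
Edge = Tuple[int, int, str, bool]


@dataclass
class Config:
    """Parser state: node ids on a stack and a buffer, plus the edges built so far."""
    stack: List[int]
    buffer: List[int]
    labels: List[str]  # node id -> token text, or "N" for a non-terminal
    edges: List[Edge] = field(default_factory=list)


def initial(tokens: List[str]) -> Config:
    return Config(stack=[], buffer=list(range(len(tokens))), labels=list(tokens))


def apply(c: Config, action: str, label: str = "") -> None:
    """Apply one (hypothetical) transition to the configuration in place."""
    if action == "SHIFT":  # move the front of the buffer onto the stack
        c.stack.append(c.buffer.pop(0))
    elif action == "REDUCE":  # pop the stack top once it needs no more edges
        c.stack.pop()
    elif action == "NODE":  # new non-terminal parent of the stack top, put on the buffer
        new = len(c.labels)
        c.labels.append("N")
        c.edges.append((new, c.stack[-1], label, False))
        c.buffer.insert(0, new)
    elif action in ("RIGHT-EDGE", "RIGHT-REMOTE"):  # edge stack[-2] -> stack[-1]
        c.edges.append((c.stack[-2], c.stack[-1], label, action == "RIGHT-REMOTE"))
    elif action in ("LEFT-EDGE", "LEFT-REMOTE"):  # edge stack[-1] -> stack[-2]
        c.edges.append((c.stack[-1], c.stack[-2], label, action == "LEFT-REMOTE"))
    elif action == "SWAP":  # send stack[-2] back to the buffer so non-adjacent items can meet
        c.buffer.insert(0, c.stack.pop(-2))
    else:
        raise ValueError(f"unknown action: {action}")


def parents(c: Config, node: int) -> List[int]:
    return sorted(p for p, child, _, _ in c.edges if child == node)


# A hand-written action sequence (no classifier) for a toy input.
config = initial(["John", "wanted", "leave"])
for act, lab in [("SHIFT", ""), ("NODE", "A"), ("SHIFT", ""), ("SHIFT", ""),
                 ("RIGHT-EDGE", "P"), ("REDUCE", ""), ("SHIFT", ""), ("NODE", "P"),
                 ("REDUCE", ""), ("SHIFT", ""), ("RIGHT-EDGE", "A"), ("SWAP", ""),
                 ("LEFT-REMOTE", "A")]:
    apply(config, act, lab)
print(parents(config, 0))  # [3, 4]
```

    Here node 0 ("John") ends up with two parents, the non-terminals 3 and 4. This is the reentrancy that a tree-only parser cannot produce. In a real parser, a trained classifier would choose each action from features of the current configuration.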

    Probabilistic Modelling of Morphologically Rich Languages

    This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language. Such models play an important role in natural language processing tasks such as translation or speech recognition, but often rely on the simplistic assumption that words are opaque symbols. This assumption fits poorly with morphologically complex languages, in which words can have rich internal structure and sub-word elements are shared across distinct word forms. Our approach is to encode basic notions of morphology into the assumptions of three different types of language models, with the intention that leveraging shared sub-word structure can improve model performance and help overcome data sparsity that arises from morphological processes. In the context of n-gram language modelling, we formulate a new Bayesian model that relies on the decomposition of compound words to attain better smoothing, and we develop a new distributed language model that learns vector representations of morphemes and leverages them to link together morphologically related words. In both cases, we show that accounting for word sub-structure improves the models' intrinsic performance and provides benefits when applied to other tasks, including machine translation. We then shift the focus beyond the modelling of word sequences and consider models that automatically learn what the sub-word elements of a given language are, given an unannotated list of words. We formulate a novel model that can learn discontiguous morphemes in addition to the more conventional contiguous morphemes to which most previous models are limited. This approach is demonstrated on Semitic languages, and we find that modelling discontiguous sub-word structures leads to improvements in the task of segmenting words into their contiguous morphemes.
    Comment: DPhil thesis, University of Oxford, submitted and accepted 2014. http://ora.ox.ac.uk/objects/uuid:8df7324f-d3b8-47a1-8b0b-3a6feb5f45c
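
    A discontiguous morpheme can be illustrated with Semitic root-and-pattern morphology, in which a consonantal root is interleaved with a vocalic template. The Python sketch below hand-codes the templates and uses simplified transliterations built on the root k-t-b. The function names `interdigitate` and `extract_root` are hypothetical. This is not the thesis's model, which learns such units from an unannotated word list rather than from given templates.

```python
import re
from typing import Optional


def interdigitate(root: str, template: str) -> str:
    """Fill each 'C' slot of a template with the next consonant of the root."""
    if template.count("C") != len(root):
        raise ValueError("template slots and root length differ")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in template)


def extract_root(word: str, template: str) -> Optional[str]:
    """Recover the (discontiguous) root if the word fits the template, else None."""
    # Each 'C' slot becomes a capture group; every other template character must match literally.
    pattern = "".join("(.)" if ch == "C" else re.escape(ch) for ch in template)
    match = re.fullmatch(pattern, word)
    return "".join(match.groups()) if match else None


print(interdigitate("ktb", "CaCaC"))     # katab
print(interdigitate("ktb", "maCCuC"))    # maktub
print(extract_root("maktub", "maCCuC"))  # ktb
```

    The root `ktb` never appears as a contiguous substring of `maktub`. For that reason, a segmentation model restricted to contiguous morphemes cannot represent it as a single unit.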

    Code switching as a communicative strategy of the Lubavitcher emissaries working with Jewish American students

    Faculty of English. The main aim of this study is to determine which structural and social factors characterize the code switching of bilingual emissaries who speak Normative English (NE) and “Jewish English” (JE). The research analyzes the linguistic behavior of a specific group of ethnic bilinguals, focusing on the phenomenon of code switching (CS). The ethnic bilinguals studied were male Jewish emissaries of the Chabad Lubavitch movement working on university campuses in the United States. These emissaries often switch between Normative English and “Jewish English” (the latter primarily spoken in Brooklyn, N.Y.). Code switching is a widely observed socio-pragmatic linguistic phenomenon, especially in multilingual and multicultural communities.
The research explores the emissaries’ motivation to switch codes, their competence in code switching, the circumstances in which they are prone to switch their code, and other relevant linguistic behavior of this specific group of ethnic bilinguals. The research reveals that CS is a minor psycho-linguistic need at various gatherings, serving as both a conscious and an unconscious act of teaching the emissaries’ language to their interlocutors. The research investigates the strategies used by the emissaries when lecturing and communicating with the students. It also explores CS domains, directionality, motivation, and syntactic constraints in light of various CS theories. Thus, the research examines the compatibility of the Lubavitch emissaries’ CS with existing linguistic theories and searches for counter-examples to these theories. The discussion and findings of this study expand familiarity with, and understanding of, the wider linguistic phenomenon of code switching.