228 research outputs found

    Ordering prenominal modifiers with a ranking approach

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 57-58). In this thesis we present a solution to the natural language processing task of ordering prenominal modifiers, a problem that has applications from machine translation to natural language generation. In machine translation, constraints on modifier orderings vary from language to language, so some reordering of modifiers may be necessary. In natural language generation, a representation of an object and its properties often needs to be formulated into a concrete noun phrase. We detail a novel approach that frames this task as a ranking problem amongst the permutations of a set of modifiers, admitting arbitrary features on each candidate permutation and exploiting hundreds of thousands of features in total. We compare our methods to a state-of-the-art class-based ordering approach and a strong baseline that makes use of the Google n-gram corpus. We attain a maximum error reduction of 69.8% and average error reduction across all test sets of 59.1% compared to the state-of-the-art, and we attain a maximum error reduction of 68.4% and average error reduction across all test sets of 41.8% compared to our Google n-gram baseline. Finally, we present an analysis of our approach as compared to our baselines and describe several potential improvements to our system. By Jingyi Liu. M.Eng.
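
The ranking framing in the abstract above can be illustrated with a minimal sketch: score every permutation of a modifier set with a linear model over features of the whole permutation, then pick the highest-scoring one. The feature template (`precedes` indicators), the function names, and the toy weights below are all assumptions made for illustration; the abstract does not specify the thesis's actual features or learning procedure.

```python
from itertools import permutations


def features(order):
    """Hypothetical feature map: one indicator per ordered pair of modifiers."""
    feats = {}
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            key = ("precedes", order[i], order[j])
            feats[key] = feats.get(key, 0.0) + 1.0
    return feats


def score(order, weights):
    """Linear score: dot product of the weights with the permutation's features."""
    return sum(weights.get(k, 0.0) * v for k, v in features(order).items())


def best_order(modifiers, weights):
    """Rank all permutations of the modifier set and return the top-scoring one."""
    return max(permutations(modifiers), key=lambda p: score(p, weights))


# Toy weights, invented for this example (not learned from any corpus).
toy_weights = {
    ("precedes", "big", "red"): 1.0,
    ("precedes", "red", "wooden"): 1.0,
    ("precedes", "big", "wooden"): 1.0,
}

print(best_order(["wooden", "red", "big"], toy_weights))
```

Enumerating permutations is only practical because modifier sets before a noun are small; a set of n modifiers yields n! candidates.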

    Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)

    Statistical language models for alternative sequence selection

    ANNOTATED DISJUNCT FOR MACHINE TRANSLATION

    Most information on the Internet is available in English. However, most people in the world are not English speakers. Hence, a reliable Machine Translation tool would be of great advantage to those people. There are many approaches to developing Machine Translation (MT) systems, among them direct, rule-based/transfer, interlingua, and statistical approaches. This thesis focuses on developing an MT system for less-resourced languages, i.e. languages that do not have an available grammar formalism, parser, and corpus, such as some languages in South East Asia. The nonexistence of bilingual corpora motivates us to use direct or transfer approaches. Moreover, the unavailability of a grammar formalism and parser in the target languages motivates us to develop a hybrid between direct and transfer approaches. This hybrid approach is referred to as a hybrid transfer approach. It uses the Annotated Disjunct (ADJ) method. This method, based on the Link Grammar (LG) formalism, can theoretically handle one-to-one, many-to-one, and many-to-many word translations. The method consists of a transfer rules module which maps source words in a source sentence (SS) into target words in the correct position in a target sentence (TS). The developed transfer rules are demonstrated on English → Indonesian translation tasks. An experimental evaluation is conducted to measure the performance of the developed system against available English-Indonesian MT systems. The developed ADJ-based MT system translated simple, compound, and complex English sentences in present, present continuous, present perfect, past, past perfect, and future tenses with better precision than other systems, with an accuracy of 71.17% in the Subjective Sentence Error Rate metric.

    Subordinating and Coordinating Linkers

    This thesis is concerned with syntactic mechanisms for the marking of grammatical relationships. It is argued that there is a class of semantically vacuous functional heads serving only as a syntactic means of marking such relationships – either subordination or coordination. These heads are known as linkers. Through studying restrictions on the structural and linear distribution of linkers cross-linguistically, the thesis sheds light on varied areas of syntax: the nature of projection in morphology and syntax; word order principles; and the place of coordinate structures within phrase-structure principles. The morphosyntax provides two possible mechanisms for marking a grammatical relationship. Firstly, an affix marking the relationship can attach directly to any member of the relationship. This member of the relationship then enters the syntactic derivation, but the affix has no syntactic status in its own right. Alternatively, the relationship can be marked by a syntactic object in its own right – a semantically vacuous projecting functional head (a linker). In this latter case, the relationship is marked by the linker structurally intervening between the members of the relationship: its projection must dominate one member, and cannot dominate the others. When combined with principles of extended projection, this leads to the restriction that, in marking a subordination, or Head-Dependent, relationship, such linkers can only appear as the highest head in the extended projection of the Dependent. This prediction is tested empirically by determining the possible distribution and constituency of linkers predominantly in the complex noun phrase. We next consider how the structural distribution of linkers is mapped onto linear order.
    It is proposed that there are two types of word order constraints available in natural language: those relating to harmony, which are universal and obey a fixed ranking; and those referring to specific features of a head – either lexical category or features referring to semantics. Given their status as semantically vacuous functional heads, only the first type of word order constraint, relating to harmony, applies to linkers. It is shown using Optimality Theory that this theory successfully accounts for the absence of certain disharmonic word orders cross-linguistically. Finally, we consider the implications of the restrictions on the structural and linear distribution of linkers for linkers marking the coordination relationship (that is, syntactically independent coordinators). It is argued that coordination is a symmetric structure, headed by a potentially infinite number of coordinands. It is shown that any difference in the distribution of coordinating and subordinating linkers should be attributed to the unique syntax of the coordinate structure.

    Syntax of Hungarian

    These books aim to present a synthesis of the currently available syntactic knowledge of the Hungarian language, rooted in theory but providing highly detailed descriptions, and intended to be of use to researchers, as well as advanced students of language and linguistics. As research in language leads to extensive changes in our understanding and representations of grammar, the Comprehensive Grammar Resources series intends to present the most current understanding of grammar and syntax as completely as possible in a way that will both speak to modern linguists and serve as a resource for the non-specialist.

    A Study in the Syntax of the Luwian Language

    The Ancient Anatolian corpora represent the earliest documented examples of the Indo-European languages. In this book, an analysis of the syntactic structure of the Luwian phrases, clauses and sentences is attempted, based on a phrase-structural approach that entails a mild application of the theoretical framework of generative grammar. While obvious limits exist as regards the application of theory-driven models to the study and description of ancient corpus-languages, this book aims at demonstrating and illustrating the main configurational features of the Luwian syntax.

    Cycling Through Grammar: On Compounds, Noun Phrases and Domains

    In this dissertation, I address the question of domains within grammar: i.e. how domains are defined, whether different components of grammar make references to the same boundaries (or at least boundary definers), and whether these boundaries are uniform with respect to different processes. I address these questions in two case studies. First, I explore compound nouns in Icelandic and restrictions on their composition, where inflected non-head elements are structurally peripheral to uninflected ones. I argue that these effects are due to a matching condition which requires elements within compounds to match their attachment site in terms of size/type. Following that, I explore how morphophonology is regulated by the structure of the compound. I argue for a contextual definition of the domain of morphophonology, where the highest functional morpheme in the extended projection of the root marks the boundary. Under this approach, a morphophonological domain can contain smaller domains analogous to phases in syntax. This allows for the morphosyntactic structure to be mapped directly to phonology while giving the impression of two contradicting structures. I also explore the Icelandic noun phrase from this perspective. I take the structure of the noun to mirror the structure of the noun phrase and explore the placement of modifiers within the noun phrase and how different orders can be derived. I furthermore explore domains within the noun phrase through ellipsis and extraction. I argue that domains within the noun phrase are determined in the same way as domains within the noun, i.e. contextually, and appear to line up with the noun-internal domain definers.

    The structure of DP in Central Kurdish

    PhD Thesis. This thesis investigates the syntactic structure of DP (determiner phrase) in Central Kurdish within the Minimalist Program (Chomsky 1995, and subsequent work). It explores the syntax of functional categories including inflectional elements such as Izafe, number, definite and indefinite markers. The study examines the structural relation between these functional categories and the noun, shedding new light on the syntax of DP in a language which has not so far been well investigated. Using the Minimalist derivational theory, this thesis explores the derivation of extended nominal projections in Central Kurdish. The study provides a detailed account of the Izafe construction. I argue that Izafe triggers movement of NP to a position above the modifier(s), and also marks agreement in definiteness. Two types of Izafe are recognized: AP Izafe and NP Izafe. While AP Izafe agrees in definiteness with D realized by the definite article, NP Izafe shows Case agreement with a modifying DP complement. I argue that such agreement relations are established by Chomsky’s (2000, 2001) probe/goal agreement operation, except that the agreement relations occur upwards, the probe being c-commanded by the goal (contra Chomsky 2000, 2001, but in line with Baker 2008; Wurmbrand 2012; Zeijlstra 2012). Given that Central Kurdish has two definite articles, -eke and -e, which occur on different sides of number at spell-out, I argue that there are two DP layers projected, with the functional projection of number (NumP) intermediate between them. The higher D is realized by -e and the lower D by -eke. The featural make-up of the lower D bears uniqueness and specificity (the two features subsumed under definiteness), while the higher D carries only specificity.
    The thesis also argues that the inflection -êk is a marker of indefiniteness realized by the higher D category, and is not merely a grammaticalized diachronic remnant of the numeral yêk ‘one’ to mark singularity, as claimed by Lyons (1999: 95). The analysis also accounts for the syntax of number morphology and quantification in Central Kurdish. As a functional category, number is argued to project NumP realized by the inflection -an. Based on the morpheme order, NumP seems to take scope over DP, a phenomenon challenging the well-established cross-linguistic generalization that D scopes over NumP (Rijkhoff 2002; Ritter 1991). Assuming that scopal relations among functional categories are structurally represented, the peculiar hierarchical relation between DP and NumP in Central Kurdish poses a problem for Baker’s (1985, 1988) Mirror Principle, as well. However, I provide evidence that the projection of number (NumP) falls under the scope of another DP projection headed by a D which is morphologically realized in some situations by the definite marker -e. Two types of quantifiers are distinguished: definite and indefinite. This division offers a principled account of quantifiers, providing empirical evidence that they are realized by two structurally distinct functional categories: one above and the other below the DP projection. The Higher Committee for Education Development in Iraq (HCED