11,972 research outputs found

    Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

    Get PDF
    Linguistic typology aims to capture structural and semantic variation across the world's languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-employment of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such approach could be facilitated by recent developments in data-driven induction of typological knowledge

    Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

    Get PDF
    Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-utilization of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.</jats:p

    Learning Language Representations for Typology Prediction

    Full text link
    One central mystery of neural NLP is what neural models "know" about their subject matter. When a neural machine translation system learns to translate from one language to another, does it learn the syntax or semantics of the languages? Can this knowledge be extracted from the system to fill holes in human scientific knowledge? Existing typological databases contain relatively full feature specifications for only a few hundred languages. Exploiting the existence of parallel texts in more than a thousand languages, we build a massive many-to-one neural machine translation (NMT) system from 1017 languages into English, and use this to predict information missing from typological databases. Experiments show that the proposed method is able to infer not only syntactic, but also phonological and phonetic inventory features, and improves over a baseline that has access to information about the languages' geographic and phylogenetic neighbors.Comment: EMNLP 201

    Introduction

    Get PDF

    Problems of methodology and explanation in word order universals research

    Get PDF
    Ever since the publication of Greenberg 1963, word order typologists have attempted to formulate and refine implicational universals of word order so as to characterize the restricted distribution of certain word order patterns, and in some cases have also attempted to develop general principles to explain the existence of those universals

    Lexical typology through similarity semantics: Toward a semantic map of motion verbs

    Get PDF
    This paper discusses a multidimensional probabilistic semantic map of lexical motion verb stems based on data collected from parallel texts (viz. translations of the Gospel according to Mark) for 100 languages from all continents. The crosslinguistic diversity of lexical semantics in motion verbs is illustrated in detail for the domain of `go', `come', and `arrive' type contexts. It is argued that the theoretical bases underlying probabilistic semantic maps from exemplar data are the isomorphism hypothesis (given any two meanings and their corresponding forms in any particular language, more similar meanings are more likely to be expressed by the same form in any language), similarity semantics (similarity is more basic than identity), and exemplar semantics (exemplar meaning is more fundamental than abstract concepts)
    • …
    corecore