824 research outputs found

    Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

    Linguistic typology aims to capture structural and semantic variation across the world's languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human-labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-utilization of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.

    Max-Planck-Institute for Psycholinguistics: Annual Report 2003

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC), which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE)

    FOUND IN SPACE: A CROSS-LINGUISTIC ANALYSIS OF SECOND LANGUAGE LEARNERS IN ENGLISH MAP TASK PERFORMANCE

    Understanding the relationship between first and second language use in the area of spatial language has broader implications for our understanding of language learning and consequences for the construction of bilingual assessment instruments for second language learners. This study shows that observing and interpreting the task of map drawing and the related behavior of explaining maps can be a way to explore the linguistic emergence of the conceptualization of spatial language (at a moment of simultaneous and synchronized incarnation). Altogether, 50 dyads (pairs) participated in the New Mexico Map Task Project; the project included native speakers of English, Russian, Japanese, Navajo, and Spanish. In an examination of how the grammatical constructions used for spatial descriptions in a speaker's first language carry over into the usage of this speaker's second language, new observations include the intra-subject comparison of dyadic map task performances. Each non-native English-speaking dyad participates in two map task performances: one in their native language and one in their second language, English. Evidence was generated through morphosyntactic, phonological, and pragmatic analyses performed on the sound files of the transcripts. This evidence confirms the connection between the participants' productions of tokens of selected landmark names both in their native language and their second language. Combining the results of linguistic analyses with educational assessment frameworks predicts the development of an instrument for use with immigrant and refugee students from areas of conflict.