
    Predicting Native Language from Gaze

    A fundamental question in language learning concerns the role of a speaker's first language in second language acquisition. We present a novel methodology for studying this question: analysis of eye-movement patterns in second language reading of free-form text. Using this methodology, we demonstrate for the first time that the native language of English learners can be predicted from their gaze fixations when reading English. We provide analysis of classifier uncertainty and learned features, which indicates that differences in English reading are likely to be rooted in linguistic divergences across native languages. The presented framework complements production studies and offers new ground for advancing research on multilingualism. Comment: ACL 201

    Simple data-driven context-sensitive lemmatization

    Lemmatization for languages with rich inflectional morphology is one of the basic, indispensable steps in a language processing pipeline. In this paper we present a simple data-driven context-sensitive approach to lemmatizing word forms in running text. We treat lemmatization as a machine learning classification task and automatically induce the class labels. We achieve this by computing a Shortest Edit Script (SES) between the reversed input and output strings. An SES describes the transformations that have to be applied to the input string (word form) in order to convert it to the output string (lemma). Our approach shows competitive performance on a range of typologically different languages.
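
    The idea of using an edit script over reversed strings as a class label can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `edit_script` and `apply_script` are hypothetical, and Python's `difflib.SequenceMatcher` stands in for a true shortest-edit-script algorithm (it is not guaranteed to produce a minimal script). Reversing the strings means suffix changes are addressed from position 0, so word forms of different lengths can share one label.

```python
from difflib import SequenceMatcher


def edit_script(form: str, lemma: str) -> tuple:
    """Return an edit script turning reversed `form` into reversed `lemma`.

    Each operation is ("ins", position, text) or ("del", position, text),
    where position is an index into the reversed word form.
    NOTE: difflib is an assumed stand-in for a real SES algorithm.
    """
    src, dst = form[::-1], lemma[::-1]
    ops = []
    matcher = SequenceMatcher(a=src, b=dst, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # For a replacement, emit the insertion first, then the deletion.
        if tag in ("insert", "replace"):
            ops.append(("ins", i1, dst[j1:j2]))
        if tag in ("delete", "replace"):
            ops.append(("del", i1, src[i1:i2]))
    return tuple(ops)


def apply_script(form: str, script: tuple) -> str:
    """Apply an edit script (used as a class label) to a word form."""
    src = form[::-1]
    out, pos = [], 0
    for kind, at, text in script:
        out.append(src[pos:at])  # copy unchanged characters up to the edit
        pos = at
        if kind == "ins":
            out.append(text)
        else:
            pos = at + len(text)  # skip as many characters as were deleted
    out.append(src[pos:])
    return "".join(out)[::-1]


# "walking" -> "walk" and "talking" -> "talk" induce the same label.
label = edit_script("walking", "walk")
print(label)
print(apply_script("jumping", label))
```

    Because the label records only positions and lengths relative to the end of the word, a classifier that predicts it for an unseen form can produce a lemma it never saw in training.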

    GeoCLEF 2006: the CLEF 2006 cross-language geographic information retrieval track overview

    After being a pilot track in 2005, GeoCLEF advanced to be a regular track within CLEF 2006. The purpose of GeoCLEF is to test and evaluate cross-language geographic information retrieval (GIR): retrieval for topics with a geographic specification. For GeoCLEF 2006, twenty-five search topics were defined by the organizing groups for searching English, German, Portuguese and Spanish document collections. Topics were translated into English, German, Portuguese, Spanish and Japanese. Several topics in 2006 were significantly more geographically challenging than in 2005. Seventeen groups submitted 149 runs (up from eleven groups and 117 runs in GeoCLEF 2005). The groups used a variety of approaches, including geographic bounding boxes, named entity extraction and external knowledge bases (geographic thesauri, ontologies and gazetteers).

    Cross-lingual RST Discourse Parsing

    Discourse parsing is an integral part of understanding information flow and argumentative structure in documents. Most previous research has focused on inducing and evaluating models from the English RST Discourse Treebank. However, discourse treebanks for other languages exist, including Spanish, German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same underlying linguistic theory, but differ slightly in the way documents are annotated. In this paper, we present (a) a new discourse parser which is simpler than, yet competitive with, the state of the art for English (significantly better on 2/3 metrics), (b) a harmonization of discourse treebanks across languages, enabling us to present (c) what to the best of our knowledge are the first experiments on cross-lingual discourse parsing. Comment: To be published in EACL 2017, 13 pages

    Request formation in Ecuadorian Quichua

    Published or submitted for publication; peer reviewed.

    Two types of well followed users in the followership networks of Twitter

    In the Twitter blogosphere, the number of followers is probably the most basic and succinct quantity for measuring popularity of users. However, the number of followers can be manipulated in various ways; we can even buy follows. Therefore, alternative popularity measures for Twitter users on the basis of, for example, users' tweets and retweets, have been developed. In the present work, we take a purely network approach to this fundamental question. First, we find that two relatively distinct types of users possessing a large number of followers exist, in particular for Japanese, Russian, and Korean users among the seven language groups that we examined. The first type of user follows a small number of other users. The second type of user follows approximately the same number of other users as the number of follows that the user receives. Then, we compare local (i.e., egocentric) followership networks around the two types of users with many followers. We show that the second type, which presumably consists of uninfluential users despite their large numbers of followers, is characterized by high link reciprocity, a large number of friends (i.e., those whom a user follows) for the followers, followers' high link reciprocity, a large clustering coefficient, a large fraction of the second type of users among the followers, and a small PageRank. Our network-based results support the view that the number of followers used alone is a misleading measure of a user's popularity. We propose that the number of friends, which is simple to measure, also helps us to assess the popularity of Twitter users. Comment: 4 Figures and 8 Tables
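
    One of the quantities named above, link reciprocity, can be illustrated with a short sketch. The abstract does not give the exact definition used in the paper, so this assumes a common one: the fraction of directed follow links whose reverse link also exists. The function name `link_reciprocity` and the toy edge list are hypothetical.

```python
def link_reciprocity(edges):
    """Fraction of directed links (u follows v) that are reciprocated.

    ASSUMPTION: this is a common textbook definition, not necessarily
    the one used in the paper. Self-loops and duplicates are ignored.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


# Toy network: a and b follow each other; a also follows c.
follows = [("a", "b"), ("b", "a"), ("a", "c")]
print(link_reciprocity(follows))
```

    Under this definition, a user who follows back nearly everyone who follows them sits in a neighbourhood with reciprocity close to 1, which matches the description of the second type of user.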