Predicting Native Language from Gaze
A fundamental question in language learning concerns the role of a speaker's
first language in second language acquisition. We present a novel methodology
for studying this question: analysis of eye-movement patterns in second
language reading of free-form text. Using this methodology, we demonstrate for
the first time that the native language of English learners can be predicted
from their gaze fixations when reading English. We provide analysis of
classifier uncertainty and learned features, which indicates that differences
in English reading are likely to be rooted in linguistic divergences across
native languages. The presented framework complements production studies and
offers new ground for advancing research on multilingualism.Comment: ACL 201
Simple data-driven context-sensitive lemmatization
Lemmatization for languages with rich inflectional morphology is one of the basic, indispensable steps in a language processing pipeline. In this paper we present a simple data-driven context-sensitive approach to lemmatizing word forms in running text. We treat lemmatization as a classification task for Machine Learning, and automatically induce class labels. We achieve this by computing a Shortest Edit Script (SES) between reversed input and output strings. An SES describes the transformations that have to be applied to the input string (word form) in order to convert it to the output string (lemma). Our approach shows competitive performance on a range of typologically different languages.
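The idea of using an edit script between reversed strings as a class label can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a script made only of insertions and deletions, derived from a longest-common-subsequence table, and the function names (`shortest_edit_script`, `apply_script`, `lemma_class`, `lemmatize`) are hypothetical. Reversing the strings makes positions count from the end of the word, so that forms sharing a suffix pattern receive the same label.

```python
def shortest_edit_script(src, dst):
    """Return a shortest insert/delete script turning src into dst.

    Assumption: only insertions and deletions are allowed, so the script
    is derived from a longest-common-subsequence (LCS) table.
    """
    n, m = len(src), len(dst)
    # lcs[i][j] = length of the LCS of src[i:] and dst[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if src[i] == dst[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    script = []
    i = j = 0
    while i < n or j < m:
        if i < n and j < m and src[i] == dst[j]:
            i += 1
            j += 1  # characters match: nothing to record
        elif i < n and (j == m or lcs[i + 1][j] >= lcs[i][j + 1]):
            script.append(("D", i, src[i]))  # delete src[i]
            i += 1
        else:
            script.append(("I", i, dst[j]))  # insert dst[j] before position i
            j += 1
    return tuple(script)


def apply_script(src, script):
    """Apply an insert/delete script (positions refer to src) to src."""
    out = []
    k = 0
    for i in range(len(src) + 1):
        delete = False
        while k < len(script) and script[k][1] == i:
            op, _, ch = script[k]
            if op == "I":
                out.append(ch)
            else:
                delete = True
            k += 1
        if i < len(src) and not delete:
            out.append(src[i])
    return "".join(out)


def lemma_class(form, lemma):
    """Class label: edit script between the reversed form and reversed lemma."""
    return shortest_edit_script(form[::-1], lemma[::-1])


def lemmatize(form, label):
    """Apply a class label to a word form to obtain a lemma."""
    return apply_script(form[::-1], label)[::-1]


label = lemma_class("walked", "walk")
print(label)
print(lemmatize("talked", label))
```

In this sketch, "walked" and "talked" map to the same label (delete the last two characters), which is what lets a classifier trained on one form generalize to the other.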
GeoCLEF 2006: the CLEF 2006 cross-language geographic information retrieval track overview
After being a pilot track in 2005, GeoCLEF advanced to be a regular track within CLEF 2006. The
purpose of GeoCLEF is to test and evaluate cross-language geographic information retrieval (GIR): retrieval for
topics with a geographic specification. For GeoCLEF 2006, twenty-five search topics were defined by the
organizing groups for searching English, German, Portuguese and Spanish document collections. Topics were
translated into English, German, Portuguese, Spanish and Japanese. Several topics in 2006 were significantly
more geographically challenging than in 2005. Seventeen groups submitted 149 runs (up from eleven groups and
117 runs in GeoCLEF 2005). The groups used a variety of approaches, including geographic bounding boxes,
named entity extraction and external knowledge bases (geographic thesauri, ontologies and gazetteers).
Cross-lingual RST Discourse Parsing
Discourse parsing is an integral part of understanding information flow and
argumentative structure in documents. Most previous research has focused on
inducing and evaluating models from the English RST Discourse Treebank.
However, discourse treebanks for other languages exist, including Spanish,
German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same
underlying linguistic theory, but differ slightly in the way documents are
annotated. In this paper, we present (a) a new discourse parser which is
simpler than, yet competitive with, the state of the art for English
(significantly better on 2/3 metrics), (b) a harmonization of discourse treebanks across languages,
enabling us to present (c) what to the best of our knowledge are the first
experiments on cross-lingual discourse parsing.
Comment: To be published in EACL 2017, 13 pages
Request formation in Ecuadorian Quichua
Published or submitted for publication; peer reviewed
Two types of well followed users in the followership networks of Twitter
In the Twitter blogosphere, the number of followers is probably the most
basic and succinct quantity for measuring popularity of users. However, the
number of followers can be manipulated in various ways; we can even buy
follows. Therefore, alternative popularity measures for Twitter users on the
basis of, for example, users' tweets and retweets, have been developed. In the
present work, we take a purely network approach to this fundamental question.
First, we find that two relatively distinct types of users possessing a large
number of followers exist, in particular for Japanese, Russian, and Korean
users among the seven language groups that we examined. A first type of user
follows a small number of other users. A second type of user follows
approximately the same number of other users as the number of follows that the
user receives. Then, we compare local (i.e., egocentric) followership networks
around the two types of users with many followers. We show that the second
type, whose users are presumably uninfluential despite their large numbers of
followers, is characterized by high link reciprocity, a large number of friends
(i.e., those whom a user follows) for the followers, followers' high link
reciprocity, large clustering coefficient, large fraction of the second type of
users among the followers, and a small PageRank. Our network-based results
support the view that the number of followers, used alone, is a misleading
measure of a user's popularity. We propose that the number of friends, which is
simple to measure, also helps us to assess the popularity of Twitter users.
Comment: 4 Figures and 8 Tables
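The quantities named in this abstract (followers, friends, link reciprocity) can be computed from a directed follow graph with a short sketch. The definition of reciprocity used here, the fraction of a user's followers whom the user follows back, is an assumption for illustration; the abstract does not state the exact formula, and the function name `follow_stats` and the toy edge list are hypothetical.

```python
from collections import defaultdict


def follow_stats(edges):
    """Compute per-user follower count, friend count and link reciprocity.

    edges: iterable of (a, b) pairs meaning "a follows b".
    Reciprocity here is an assumed definition: the share of a user's
    followers whom that user also follows.
    """
    friends = defaultdict(set)    # friends[u] = users that u follows
    followers = defaultdict(set)  # followers[u] = users that follow u
    for a, b in edges:
        friends[a].add(b)
        followers[b].add(a)

    stats = {}
    for user in set(friends) | set(followers):
        n_followers = len(followers[user])
        n_friends = len(friends[user])
        mutual = len(followers[user] & friends[user])
        stats[user] = {
            "followers": n_followers,
            "friends": n_friends,
            "reciprocity": mutual / n_followers if n_followers else 0.0,
        }
    return stats


# Toy data: "celeb" follows nobody back; "hub" follows everyone back.
edges = [
    ("a", "celeb"), ("b", "celeb"), ("c", "celeb"), ("d", "celeb"),
    ("a", "hub"), ("b", "hub"), ("c", "hub"),
    ("hub", "a"), ("hub", "b"), ("hub", "c"),
]
stats = follow_stats(edges)
print(stats["celeb"])
print(stats["hub"])
```

In the toy data, "celeb" resembles the first type of user (many followers, few friends) and "hub" resembles the second type (about as many friends as followers, high reciprocity).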