15,319 research outputs found
Predicting Native Language from Gaze
A fundamental question in language learning concerns the role of a speaker's
first language in second language acquisition. We present a novel methodology
for studying this question: analysis of eye-movement patterns in second
language reading of free-form text. Using this methodology, we demonstrate for
the first time that the native language of English learners can be predicted
from their gaze fixations when reading English. We provide analysis of
classifier uncertainty and learned features, which indicates that differences
in English reading are likely to be rooted in linguistic divergences across
native languages. The presented framework complements production studies and
offers new ground for advancing research on multilingualism. Comment: ACL 201
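The abstract above describes predicting a reader's native language from gaze-fixation patterns. A minimal sketch of that idea, assuming toy per-language fixation statistics and a simple nearest-centroid classifier (the feature names and values are illustrative, not from the paper):

```python
# Hypothetical sketch: predict a reader's native language from simple
# gaze-fixation statistics using a nearest-centroid classifier.
# Features and values are toy stand-ins, not the paper's actual model.

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nearest_centroid_predict(train, sample):
    """train: {language: [feature vectors]}. Returns the language whose
    centroid is closest (squared Euclidean distance) to `sample`."""
    best_lang, best_dist = None, float("inf")
    for lang, rows in train.items():
        c = centroid(rows)
        d = sum((a - b) ** 2 for a, b in zip(c, sample))
        if d < best_dist:
            best_lang, best_dist = lang, d
    return best_lang

# Toy features: [mean fixation duration (ms), mean saccade length (chars)]
train = {
    "Spanish": [[250.0, 7.1], [245.0, 7.4]],
    "Japanese": [[310.0, 5.2], [305.0, 5.0]],
}
print(nearest_centroid_predict(train, [300.0, 5.1]))  # prints "Japanese"
```

A real system would use richer fixation features and a learned classifier, but the routing logic is the same.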
Approaches to teaching writing
About the book: Student academic writing is at the heart of teaching and learning in higher education. Students are assessed largely by what they write, and need to learn both general academic conventions and discipline-specific writing requirements in order to succeed in higher education.
Teaching Academic Writing is a 'toolkit' designed to help higher education lecturers and tutors teach writing to their students. Containing a range of diverse teaching strategies, the book offers both practical activities to help students develop their writing abilities and guidelines to help lecturers and tutors think in more depth about the assessment tasks they set and the feedback they give to students. The authors explore a wide variety of text types, from essays and reflective diaries to research projects and laboratory reports. The book draws on recent research in the fields of academic literacy, second language learning, and linguistics. It is grounded in recent developments such as the increasing diversity of the student body, the use of the Internet, electronic tuition, and issues related to distance learning in an era of increasing globalisation.
Written by experienced teachers of writing, language, and linguistics, Teaching Academic Writing will be of interest to anyone involved in teaching academic writing in higher education.
Two-layer classification and distinguished representations of users and documents for grouping and authorship identification
Most studies on authorship identification report a drop in accuracy when the number of authors exceeds 20-25. In this paper, we introduce a new user representation to address this problem and split classification across two layers. The paper offers at least three novelties. First, the two-layer approach allows applying authorship identification over a larger number of authors (tested over 100 authors), and it is extendable. The authors are divided into groups that each contain a smaller number of authors. Given an anonymous document, the primary layer detects the group to which the document belongs; the secondary layer then determines the particular author inside the selected group. To extract the groups linking similar authors, clustering is applied over users rather than documents. Hence, the second novelty of this paper is a new user representation that differs from the document representation. Without it, clustering over documents would scatter an author's documents across several clusters instead of yielding a single cluster membership per author. Third, the extracted clusters are descriptive and meaningful, as their dimensions have psychological grounding. For authorship identification, documents are labelled with the extracted groups and fed into machine learning to build classification models that predict the group and author of a given document. The results show that the documents are highly correlated with their extracted groups, and the proposed model can be accurately trained to determine both the group and the author identity.
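The two-layer routing described above can be sketched as follows; the classifiers here are trivial keyword-lookup stand-ins (the paper uses learned models over its psychologically grounded user representation), and all group/author names are hypothetical:

```python
# Minimal sketch of two-layer authorship identification: a primary
# classifier routes a document to an author group, then a per-group
# secondary classifier picks the author. The keyword rules below are
# toy stand-ins for the paper's trained models.

def primary_layer(doc):
    """Layer 1: route a document to a group (stand-in classifier)."""
    return "group_a" if "science" in doc else "group_b"

SECONDARY = {
    # Layer 2: one stand-in classifier per group, over few authors each.
    "group_a": lambda doc: "author_1" if "quantum" in doc else "author_2",
    "group_b": lambda doc: "author_3" if "poetry" in doc else "author_4",
}

def identify_author(doc):
    group = primary_layer(doc)       # which group does the document belong to?
    author = SECONDARY[group](doc)   # which author inside that group?
    return group, author

print(identify_author("a science essay on quantum computing"))
```

Because each secondary classifier only discriminates among the authors of one group, each layer faces a much smaller label space than a flat 100-way classifier would.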
Native Language Identification with Big Bird Embeddings
Native Language Identification (NLI) intends to classify an author's native
language based on their writing in another language. Historically, the task has
heavily relied on time-consuming linguistic feature engineering, and
transformer-based NLI models have thus far failed to offer effective, practical
alternatives. The current work investigates if input size is a limiting factor,
and shows that classifiers trained using Big Bird embeddings outperform
linguistic feature engineering models by a large margin on the Reddit-L2
dataset. Additionally, we provide further insight into input length
dependencies, show consistent out-of-sample performance, and qualitatively
analyze the embedding space. Given the effectiveness and computational
efficiency of this method, we believe it offers a promising avenue for future
NLI work.
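The embedding-based pipeline the abstract describes reduces to: pool token embeddings into one document vector, then classify it. A minimal sketch using mean pooling and nearest-centroid classification by cosine similarity; in practice the token vectors would come from a pretrained Big Bird model, while the vectors and language centroids below are toy stand-ins:

```python
# Sketch of embedding-based NLI: mean-pool token embeddings into a
# document vector, then pick the native-language centroid with the
# highest cosine similarity. All vectors here are toy stand-ins for
# real Big Bird embeddings.

import math

def mean_pool(token_vectors):
    """Average a list of token embeddings into one document vector."""
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical per-language centroids in a 3-d toy embedding space.
centroids = {
    "German": [0.9, 0.1, 0.2],
    "French": [0.1, 0.8, 0.3],
}

def predict(token_vectors):
    doc = mean_pool(token_vectors)
    return max(centroids, key=lambda lang: cosine(doc, centroids[lang]))

print(predict([[1.0, 0.0, 0.2], [0.8, 0.2, 0.2]]))  # prints "German"
```

Big Bird's sparse attention is what makes the long Reddit-L2 documents tractable; the downstream classifier itself can stay simple.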
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field.
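Of the three matrix classes the survey names, the term-document matrix is the simplest: each document becomes a vector of term counts, and document similarity is the cosine between vectors. A minimal sketch with a toy corpus:

```python
# Minimal term-document VSM: each document is a vector of term counts
# over a shared vocabulary; similarity is the cosine of the angle
# between vectors. The corpus is a toy example.

import math
from collections import Counter

docs = {
    "d1": "cats chase mice",
    "d2": "dogs chase cats",
    "d3": "stocks fall fast",
}
vocab = sorted({w for text in docs.values() for w in text.split()})

def vector(text):
    """Term-count vector of `text` over the shared vocabulary."""
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# d1 and d2 share "cats" and "chase", so they are far more similar
# to each other than either is to the finance document d3.
print(cosine(vector(docs["d1"]), vector(docs["d2"])))
print(cosine(vector(docs["d1"]), vector(docs["d3"])))  # prints 0.0
```

Word-context and pair-pattern matrices follow the same pattern with different rows and columns (words × contexts, word pairs × joining patterns), which is exactly how the survey organizes the literature.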
Fluency estimation and prosodic error analysis for Japanese speakers' English using multi-resolution posteriorgrams
Degree type: Master's thesis, University of Tokyo (東京大学)
Measuring, Characterizing, and Detecting Facebook Like Farms
Social networks offer convenient ways to seamlessly reach out to large
audiences. In particular, Facebook pages are increasingly used by businesses,
brands, and organizations to connect with multitudes of users worldwide. As the
number of likes of a page has become a de-facto measure of its popularity and
profitability, an underground market of services artificially inflating page
likes, aka like farms, has emerged alongside Facebook's official targeted
advertising platform. Nonetheless, there is little work that systematically
analyzes Facebook pages' promotion methods. Aiming to fill this gap, we present
a honeypot-based comparative measurement study of page likes garnered via
Facebook advertising and from popular like farms. First, we analyze likes based
on demographic, temporal, and social characteristics, and find that some farms
seem to be operated by bots and do not really try to hide the nature of their
operations, while others follow a stealthier approach, mimicking regular users'
behavior. Next, we look at fraud detection algorithms currently deployed by
Facebook and show that they do not work well to detect stealthy farms which
spread likes over longer timespans and like popular pages to mimic regular
users. To overcome their limitations, we investigate the feasibility of
timeline-based detection of like farm accounts, focusing on characterizing
content generated by Facebook accounts on their timelines as an indicator of
genuine versus fake social activity. We analyze a range of features, grouped
into two main categories: lexical and non-lexical. We find that like farm
accounts tend to re-share content, use fewer words and poorer vocabulary, and
more often generate duplicate comments and likes compared to normal users.
Using relevant lexical and non-lexical features, we build a classifier to
detect like farms accounts that achieves precision higher than 99% and 93%
recall. Comment: To appear in ACM Transactions on Privacy and Security (TOPS)
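The timeline features the abstract lists (re-share rate, word counts, vocabulary richness, duplicate comments) can be sketched as follows; the field names, thresholds, and rule-based flagging are illustrative assumptions, whereas the paper trains a supervised classifier on such features:

```python
# Sketch of timeline-based like-farm signals: a few lexical and
# non-lexical features per account, plus a toy threshold rule that
# flags accounts matching most farm-like signals. Thresholds and
# timeline fields are hypothetical; the paper uses a trained classifier.

def features(timeline):
    posts = timeline["posts"]
    words = [w for p in posts for w in p.split()]
    return {
        "reshare_rate": timeline["reshares"] / max(len(posts), 1),
        "mean_words": len(words) / max(len(posts), 1),        # lexical
        "vocab_richness": len(set(words)) / max(len(words), 1),  # lexical
        "dup_comments": len(timeline["comments"]) - len(set(timeline["comments"])),
    }

def looks_like_farm(timeline):
    f = features(timeline)
    score = sum([
        f["reshare_rate"] > 0.5,   # heavy re-sharing
        f["mean_words"] < 5,       # very short posts
        f["vocab_richness"] < 0.5, # poor vocabulary
        f["dup_comments"] > 0,     # duplicated comments
    ])
    return score >= 3  # flag when most farm-like signals fire

farm = {"posts": ["buy now", "buy now", "click here"],
        "reshares": 9, "comments": ["nice", "nice", "nice"]}
print(looks_like_farm(farm))  # prints True for this toy account
```

The same feature vector could feed any standard classifier; the abstract's reported 99% precision and 93% recall come from exactly this kind of lexical/non-lexical feature combination.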