Automated assessment of non-native learner essays: Investigating the role of linguistic features
Automatic essay scoring (AES) refers to the process of scoring free text
responses to given prompts, considering human grader scores as the gold
standard. Writing such essays is an essential component of many language and
aptitude exams. Hence, AES has become an active and established area of research,
and many proprietary systems are used in real-life applications today.
However, not much is known about which specific linguistic features are useful
for prediction and how much of this is consistent across datasets. This article
addresses that by exploring the role of various linguistic features in
automatic essay scoring using two publicly available datasets of non-native
English essays written in test taking scenarios. The linguistic properties are
modeled by encoding lexical, syntactic, discourse and error types of learner
language in the feature set. Predictive models are then developed using these
features on both datasets and the most predictive features are compared. While
the results show that the feature set yields good predictive models on both
datasets, the question "what are the most predictive features?" has a
different answer for each dataset.
Comment: Article accepted for publication at: International Journal of
Artificial Intelligence in Education (IJAIED). To appear in early 2017
(journal url: http://www.springer.com/computer/ai/journal/40593)
Modeling Empathy and Distress in Reaction to News Stories
Computational detection and understanding of empathy is an important factor
in advancing human-computer interaction. Yet to date, text-based empathy
prediction has the following major limitations: It underestimates the
psychological complexity of the phenomenon, adheres to a weak notion of ground
truth where empathic states are ascribed by third parties, and lacks a shared
corpus. In contrast, this contribution presents the first publicly available
gold standard for empathy prediction. It is constructed using a novel
annotation methodology which reliably captures empathy assessments by the
writer of a statement using multi-item scales. This is also the first
computational work distinguishing between multiple forms of empathy, namely
empathic concern and personal distress, as recognized throughout psychology. Finally,
we present experimental results for three different predictive models, of which
a CNN performs the best.
Comment: To appear at EMNLP 201
A Neurobiologically Motivated Analysis of Distributional Semantic Models
The pervasive use of distributional semantic models or word embeddings in a
variety of research fields is due to their remarkable ability to represent the
meanings of words for both practical application and cognitive modeling.
However, little is known about what kind of information is encoded in
text-based word vectors. This lack of understanding is particularly problematic
when word vectors are regarded as a model of semantic representation for
abstract concepts. This paper attempts to reveal the internal information of
distributional word vectors by analyzing them with Binder et al.'s (2016)
brain-based vectors, explicitly structured conceptual representations based on
neurobiologically motivated attributes. In the analysis, the mapping from
text-based vectors to brain-based vectors is trained and prediction performance
is evaluated by comparing the estimated and original brain-based vectors. The
analysis demonstrates that social and cognitive information is better encoded
in text-based word vectors, but emotional information is not. This result is
discussed in terms of embodied theories for abstract concepts.
Comment: submitted to CogSci 201
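The evaluation step described in the abstract above, comparing estimated brain-based vectors with the originals, could be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the comparison metric, so a per-attribute Pearson correlation is assumed here, and the function names and toy vectors are hypothetical rather than taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def per_attribute_correlation(predicted, original):
    """One correlation per attribute (column), computed across words (rows)."""
    n_attributes = len(original[0])
    return [
        pearson([row[j] for row in predicted], [row[j] for row in original])
        for j in range(n_attributes)
    ]


# Toy data (hypothetical): three words, two attributes each.
original = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
predicted = [[1.1, 1.0], [2.1, 2.0], [3.1, 3.0]]

scores = per_attribute_correlation(predicted, original)
# The first attribute is recovered up to a constant offset; the second is inverted.
```

Scoring each attribute separately makes it possible to see which kinds of information a mapping recovers well and which it does not.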
Language Modeling by Clustering with Word Embeddings for Text Readability Assessment
We present a clustering-based language model using word embeddings for text
readability prediction. Presumably, a Euclidean semantic space hypothesis
holds true for word embeddings whose training is done by observing word
co-occurrences. We argue that clustering with word embeddings in the metric
space should yield feature representations in a higher semantic space
appropriate for text regression. Also, by representing features in terms of
histograms, our approach can naturally address documents of varying lengths. An
empirical evaluation using the Common Core Standards corpus reveals that the
features formed on our clustering-based language model significantly improve on
the previously known results for the same corpus in readability prediction. We
also evaluate the task of sentence matching based on semantic relatedness using
the Wiki-SimpleWiki corpus and find that our features lead to superior matching
performance.
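The histogram idea in the abstract above, which lets documents of different lengths share one fixed-size feature vector, might look roughly like this. It is a minimal sketch under assumptions: the centroids are taken as given (for example from a prior k-means run), and the toy embeddings, centroids, and function names are hypothetical, not the authors' implementation.

```python
import math
from collections import Counter


def nearest_cluster(vec, centroids):
    """Index of the centroid closest to vec in Euclidean distance."""
    return min(range(len(centroids)), key=lambda k: math.dist(vec, centroids[k]))


def cluster_histogram(tokens, embeddings, centroids):
    """Normalized histogram of cluster assignments for one document.

    Tokens without an embedding are skipped. The result always has
    len(centroids) entries, regardless of document length.
    """
    counts = Counter(
        nearest_cluster(embeddings[t], centroids) for t in tokens if t in embeddings
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(centroids)
    return [counts[k] / total for k in range(len(centroids))]


# Toy 2-D embeddings and two centroids (all values hypothetical).
embeddings = {
    "cat": (0.9, 0.1),
    "dog": (1.0, 0.0),
    "tax": (0.0, 1.0),
    "law": (0.1, 0.9),
}
centroids = [(1.0, 0.0), (0.0, 1.0)]

short_doc = ["cat", "tax"]
long_doc = ["cat", "dog", "dog", "law", "cat", "cat", "unknown", "tax"]

short_features = cluster_histogram(short_doc, embeddings, centroids)
long_features = cluster_histogram(long_doc, embeddings, centroids)
```

Both documents yield a two-element vector that sums to one, so either can be passed to the same regression model.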