Automated assessment of non-native learner essays: Investigating the role of linguistic features

Author: Vajjala Sowmya
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/12/2016

Automatic essay scoring (AES) refers to the process of scoring free-text responses to given prompts, treating human grader scores as the gold standard. Writing such essays is an essential component of many language and aptitude exams. AES has therefore become an active and established area of research, and many proprietary systems are used in real-life applications today. However, not much is known about which specific linguistic features are useful for prediction and how consistent their usefulness is across datasets. This article addresses that gap by exploring the role of various linguistic features in automatic essay scoring using two publicly available datasets of non-native English essays written in test-taking scenarios. The linguistic properties are modeled by encoding lexical, syntactic, discourse and error types of learner language in the feature set. Predictive models are then developed using these features on both datasets and the most predictive features are compared. While the results show that the feature set yields good predictive models on both datasets, the question "what are the most predictive features?" has a different answer for each dataset.

Comment: Article accepted for publication at: International Journal of Artificial Intelligence in Education (IJAIED). To appear in early 2017 (journal url: http://www.springer.com/computer/ai/journal/40593)
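To make the idea of encoding lexical properties as features concrete, here is a minimal sketch. The three features shown (token count, type-token ratio, mean word length) and the function name `lexical_features` are illustrative assumptions, not the feature set used in the article, which also covers syntactic, discourse and error features.

```python
import re
from typing import Dict


def lexical_features(essay: str) -> Dict[str, float]:
    """Compute a few simple lexical features for one essay.

    Assumption: these features are stand-ins chosen for illustration only.
    """
    # Lowercase and keep alphabetic tokens (apostrophes allowed inside words).
    tokens = re.findall(r"[a-z']+", essay.lower())
    if not tokens:
        return {"n_tokens": 0.0, "type_token_ratio": 0.0, "mean_word_length": 0.0}
    return {
        "n_tokens": float(len(tokens)),
        # Distinct word forms divided by total words: a rough lexical diversity measure.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
    }


features = lexical_features("The cat saw the cat.")
print(features)
```

A feature dictionary like this would be computed per essay and passed, together with the human score, to whatever regression or classification model is being trained.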

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)

Modeling Empathy and Distress in Reaction to News Stories

Authors: Buechel Sven, Buffone Anneke, Sedoc João, Slaff Barry, Ungar Lyle
Publication date: 01/01/2018

Computational detection and understanding of empathy is an important factor in advancing human-computer interaction. Yet to date, text-based empathy prediction has the following major limitations: it underestimates the psychological complexity of the phenomenon, adheres to a weak notion of ground truth where empathic states are ascribed by third parties, and lacks a shared corpus. In contrast, this contribution presents the first publicly available gold standard for empathy prediction. It is constructed using a novel annotation methodology which reliably captures empathy assessments by the writer of a statement using multi-item scales. This is also the first computational work distinguishing between multiple forms of empathy (empathic concern and personal distress), as recognized throughout psychology. Finally, we present experimental results for three different predictive models, of which a CNN performs the best.

Comment: To appear at EMNLP 201

arXiv.org e-Print Archive

Crossref

A Neurobiologically Motivated Analysis of Distributional Semantic Models

Author: Utsumi Akira
Publication date: 01/01/2018

The pervasive use of distributional semantic models or word embeddings in a variety of research fields is due to their remarkable ability to represent the meanings of words for both practical application and cognitive modeling. However, little is known about what kind of information is encoded in text-based word vectors. This lack of understanding is particularly problematic when word vectors are regarded as a model of semantic representation for abstract concepts. This paper attempts to reveal the internal information of distributional word vectors through an analysis using Binder et al.'s (2016) brain-based vectors, explicitly structured conceptual representations based on neurobiologically motivated attributes. In the analysis, a mapping from text-based vectors to brain-based vectors is trained, and prediction performance is evaluated by comparing the estimated and original brain-based vectors. The analysis demonstrates that social and cognitive information is better encoded in text-based word vectors, but emotional information is not. This result is discussed in terms of embodied theories for abstract concepts.

Comment: submitted to CogSci 201
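The train-then-compare procedure described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions: the mapping is a plain linear map fit by gradient descent, the comparison measure is Pearson correlation, and all vectors are synthetic. None of these choices is confirmed by the abstract as the paper's actual setup.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def predict(x: Vector, weights: Matrix) -> Vector:
    """Map one input vector to an estimated target vector (x times weights)."""
    n_out = len(weights[0])
    return [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(n_out)]


def fit_linear_map(inputs: List[Vector], targets: List[Vector],
                   lr: float = 0.1, epochs: int = 2000) -> Matrix:
    """Fit a linear map by batch gradient descent on the mean squared error."""
    n, n_in, n_out = len(inputs), len(inputs[0]), len(targets[0])
    weights = [[0.0] * n_out for _ in range(n_in)]
    for _ in range(epochs):
        grad = [[0.0] * n_out for _ in range(n_in)]
        for x, y in zip(inputs, targets):
            y_hat = predict(x, weights)
            for i in range(n_in):
                for j in range(n_out):
                    grad[i][j] += x[i] * (y_hat[j] - y[j]) / n
        for i in range(n_in):
            for j in range(n_out):
                weights[i][j] -= lr * grad[i][j]
    return weights


def pearson(a: Vector, b: Vector) -> float:
    """Pearson correlation between two equal-length vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    norm = math.sqrt(sum((p - mean_a) ** 2 for p in a) * sum((q - mean_b) ** 2 for q in b))
    return cov / norm


# Synthetic stand-ins: 2-d "text-based" vectors and 3-d "brain-based" vectors.
true_map = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]  # hypothetical ground-truth relation
train_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
train_y = [predict(x, true_map) for x in train_x]

learned = fit_linear_map(train_x, train_y)

# Evaluate on a held-out word by comparing estimated and original target vectors.
held_out_x = [0.3, 0.7]
original = predict(held_out_x, true_map)
estimated = predict(held_out_x, learned)
score = pearson(estimated, original)
print(score)
```

Because the toy targets are an exact linear function of the inputs, the held-out estimate matches the original almost perfectly. With real embeddings the per-attribute agreement would vary, and that variation is the kind of signal the analysis examines.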

arXiv.org e-Print Archive

eScholarship - University of California

Language Modeling by Clustering with Word Embeddings for Text Readability Assessment

Authors: Chall J.S., Flor Michael, Le Quoc V, Stenner A.J.
Publication date: 04/09/2017

We present a clustering-based language model using word embeddings for text readability prediction. Presumably, a Euclidean semantic space hypothesis holds true for word embeddings whose training is done by observing word co-occurrences. We argue that clustering with word embeddings in the metric space should yield feature representations in a higher semantic space appropriate for text regression. Also, by representing features in terms of histograms, our approach can naturally address documents of varying lengths. An empirical evaluation using the Common Core Standards corpus reveals that the features formed on our clustering-based language model significantly improve on the previously known results for the same corpus in readability prediction. We also evaluate the task of sentence matching based on semantic relatedness using the Wiki-SimpleWiki corpus and find that our features lead to superior matching performance.
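The point about histograms handling documents of varying lengths can be illustrated with a small sketch. It assumes the cluster centroids have already been computed (for example by k-means over the embedding vocabulary). The toy embeddings, the centroids and the function names are hypothetical and do not reproduce the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_centroid(vec: Vector, centroids: List[Vector]) -> int:
    """Return the index of the centroid closest to vec in Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))


def cluster_histogram(tokens: List[str], embeddings: Dict[str, Vector],
                      centroids: List[Vector]) -> List[float]:
    """Represent a document as a normalized histogram over word clusters."""
    counts = [0] * len(centroids)
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            continue  # skip words without an embedding
        counts[nearest_centroid(vec, centroids)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(centroids)
    return [c / total for c in counts]


# Toy 2-d embeddings and two assumed, precomputed centroids.
embeddings = {"cat": (0.0, 0.1), "dog": (0.1, 0.0), "tax": (1.0, 0.9), "law": (0.9, 1.0)}
centroids = [(0.0, 0.0), (1.0, 1.0)]

short_doc = cluster_histogram(["cat", "tax"], embeddings, centroids)
long_doc = cluster_histogram(["cat", "dog", "cat", "law"], embeddings, centroids)
print(short_doc)  # [0.5, 0.5]
print(long_doc)   # [0.75, 0.25]
```

Both documents map to a vector with one entry per cluster regardless of how many words they contain, so a fixed-size regressor can consume them directly.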

arXiv.org e-Print Archive

Crossref