Automated assessment of non-native learner essays: Investigating the role of linguistic features
Automatic essay scoring (AES) refers to the process of scoring free text
responses to given prompts, considering human grader scores as the gold
standard. Writing such essays is an essential component of many language and
aptitude exams. Hence, AES became an active and established area of research,
and there are many proprietary systems used in real life applications today.
However, not much is known about which specific linguistic features are useful
for prediction and how much of this is consistent across datasets. This article
addresses that by exploring the role of various linguistic features in
automatic essay scoring using two publicly available datasets of non-native
English essays written in test taking scenarios. The linguistic properties are
modeled by encoding lexical, syntactic, discourse and error types of learner
language in the feature set. Predictive models are then developed using these
features on both datasets and the most predictive features are compared. While
the results show that the feature set yields good predictive models
on both datasets, the question "what are the most predictive features?" has a
different answer for each dataset.
Comment: Article accepted for publication at: International Journal of
Artificial Intelligence in Education (IJAIED). To appear in early 2017
(journal url: http://www.springer.com/computer/ai/journal/40593)
Constrained multi-task learning for automated essay scoring
Supervised machine learning models for automated essay scoring (AES) usually
require substantial task-specific training data in order to make accurate
predictions for a particular writing task. This limitation hinders their
utility, and consequently their deployment in real-world settings. In this
paper, we overcome this shortcoming using a constrained multi-task
pairwise-preference learning approach that enables the data from multiple tasks
to be combined effectively. Furthermore, contrary to some recent research, we
show that high performance AES systems can be built with little or no
task-specific training data. We perform a detailed study of our approach on a
publicly available dataset in scenarios where we have varying amounts of
task-specific training data and in scenarios where the number of tasks
increases.
This is the author accepted manuscript. The final version is available from the Association for Computational Linguistics at http://acl2016.org/index.php?article_id=71
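As a rough illustration of pairwise-preference learning over several writing tasks, the sketch below turns scored essays into preference pairs. It assumes that "constrained" means pairs are formed only between essays from the same task; that reading, the `Essay` tuple layout, and the function name `within_task_preference_pairs` are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical layout: (task id, feature vector, human score).
Essay = Tuple[str, List[float], float]


def within_task_preference_pairs(essays: List[Essay]) -> List[Tuple[List[float], int]]:
    """Build (feature difference, label) pairs from essays of the same task.

    Label is +1 if the first essay has the higher score, -1 otherwise.
    Pairs across tasks and pairs with tied scores are skipped.
    """
    pairs = []
    for (task_a, x_a, y_a), (task_b, x_b, y_b) in combinations(essays, 2):
        if task_a != task_b or y_a == y_b:
            continue  # assumed constraint: compare only within one task
        diff = [a - b for a, b in zip(x_a, x_b)]
        label = 1 if y_a > y_b else -1
        pairs.append((diff, label))
    return pairs


essays = [
    ("t1", [1.0, 0.0], 3.0),
    ("t1", [0.0, 1.0], 5.0),
    ("t2", [2.0, 2.0], 4.0),
]
pairs = within_task_preference_pairs(essays)
```

A linear classifier trained on such difference vectors learns a ranking function that can be shared across tasks, since scores are never compared across tasks directly.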
Experiments with Universal CEFR Classification
The Common European Framework of Reference (CEFR) guidelines describe
language proficiency of learners on a scale of 6 levels. While the description
of CEFR guidelines is generic across languages, the development of automated
proficiency classification systems for different languages follows different
approaches. In this paper, we explore universal CEFR classification using
domain-specific and domain-agnostic, theory-guided as well as data-driven
features. We report the results of our preliminary experiments in monolingual,
cross-lingual, and multilingual classification with three languages: German,
Czech, and Italian. Our results show that both monolingual and multilingual
models achieve similar performance, and cross-lingual classification yields
lower, but comparable results to monolingual classification.
Comment: to appear in the proceedings of The 13th Workshop on Innovative Use
of NLP for Building Educational Applications
Modelling text meta-properties in automated text scoring for non-native English writing
Automated text scoring (ATS) is the task of automatically scoring a text based on some given grading criteria. This thesis focuses on ATS in the context of free-text writing exams aimed at learners of English as a foreign language (EFL). The benefit of an ATS system is primarily to provide instant and consistent feedback to language learners, and service reliability also forms a crucial part of an ATS system. Based on previous work, we investigated only partially explored meta-properties in text and integrated them into a machine learning based ATS model across multiple datasets:
In most previous work, the proposed models implicitly assume that texts produced by learners in an exam are written independently. However, this is not true for the exams where learners are required to compose multiple texts. We hence explicitly instructed our model which texts are written by the same learner, which boosts model performance in most cases.
We used three intra-exam properties within the same exam including prompt, genre and task as a starting point, and we showed that explicitly modelling these properties via frustratingly easy domain adaptation (FEDA) can positively affect model performance in some cases. Furthermore, modelling multiple intra-exam properties together is better than modelling any single property individually or no property in four out of five test sets.
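For readers unfamiliar with FEDA, a minimal sketch of its feature augmentation follows: each feature vector is copied into a shared block plus one block per domain, with zeros in the blocks of all other domains. The function name `feda_augment` and the example domain names are hypothetical; the thesis may organise prompts, genres and tasks differently.

```python
from typing import List, Sequence


def feda_augment(features: Sequence[float], domain: str,
                 domains: Sequence[str]) -> List[float]:
    """Return [shared copy | one block per domain], zero except the own domain."""
    if domain not in domains:
        raise ValueError(f"unknown domain: {domain!r}")
    augmented = list(features)  # shared block, active for every domain
    for d in domains:
        if d == domain:
            augmented.extend(features)  # domain-specific copy
        else:
            augmented.extend([0.0] * len(features))  # inactive block
    return augmented


# Example with two assumed genres acting as domains.
vec = feda_augment([1.0, 2.0], "letter", ["essay", "letter"])
```

A linear model trained on the augmented vectors can then learn weights shared by all domains alongside weights specific to each one.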
We studied how to utilise and combine learners' responses from multiple writing exams. We also proposed a new variant of the transfer-learning ATS model which mitigates the drawbacks of previous work. This variant first builds a ranking model across multiple datasets via FEDA, and the ranking score of each text predicted by the ranking model is used as an extra feature in the baseline model. This variant gives an improvement over the baseline model on the development sets in terms of root-mean-square error. Furthermore, the transfer-learning model utilising multiple datasets tuned on each development set is always better than the baseline model on the corresponding test set.
We found that different datasets favour different meta-properties. We therefore combined all the models looking at different meta-properties together using ensemble learning. Compared to the baseline model, the combined model has a statistically significant improvement on all the test sets in terms of root-mean-square error based on a permutation test.
The Institute for Automated Language Teaching and Assessment
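The abstract does not say which permutation test was used. As one possible reading, the sketch below implements a paired approximate randomisation test on the difference in root-mean-square error between two systems scored on the same texts. The function names, the two-sided statistic, the number of rounds and the toy scores are all assumptions for illustration.

```python
import math
import random
from typing import Sequence


def rmse(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Root-mean-square error between predictions and gold scores."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))


def paired_permutation_test(gold: Sequence[float], pred_a: Sequence[float],
                            pred_b: Sequence[float], n_rounds: int = 10000,
                            seed: int = 0) -> float:
    """Estimate a two-sided p-value for the RMSE difference of two systems."""
    rng = random.Random(seed)
    observed = abs(rmse(pred_a, gold) - rmse(pred_b, gold))
    hits = 0
    for _ in range(n_rounds):
        shuf_a, shuf_b = [], []
        for a, b in zip(pred_a, pred_b):
            if rng.random() < 0.5:  # swap the two systems' outputs for this text
                a, b = b, a
            shuf_a.append(a)
            shuf_b.append(b)
        if abs(rmse(shuf_a, gold) - rmse(shuf_b, gold)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (hits + 1) / (n_rounds + 1)


# Toy data, not results from the thesis.
gold = [1.0, 2.0, 3.0, 4.0]
p_value = paired_permutation_test(gold, [1.0, 2.0, 3.0, 4.0],
                                  [4.0, 3.0, 2.0, 1.0], n_rounds=1000)
```

A small p-value suggests the observed RMSE gap is unlikely to arise if the two systems were interchangeable on each text.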