Automatic text scoring using neural networks
Automated Text Scoring (ATS) provides a cost-effective and consistent alternative to human marking. However, to achieve good performance, the predictive features of such systems need to be manually engineered by human experts. We introduce a model that forms word representations by learning the extent to which specific words contribute to the text's score. Using Long Short-Term Memory networks to represent the meaning of texts, we demonstrate that a fully automated framework achieves excellent results compared with similar approaches. To make our results more interpretable, and inspired by recent advances in visualizing neural networks, we introduce a novel method for identifying the regions of the text that the model found most discriminative.
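The abstract does not include the model itself, but a minimal sketch of the general architecture it describes (an LSTM reading word embeddings and emitting a single score) might look like the following. All names, dimensions, and the sigmoid score range are illustrative assumptions, not the authors' configuration.

# Minimal sketch (assumption): an LSTM text-scoring model in PyTorch.
# Hyperparameters and the [0, 1] score range are illustrative; the paper's
# actual configuration is not given in the abstract.
import torch
import torch.nn as nn

class LSTMScorer(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # single regression output

    def forward(self, token_ids):
        emb = self.embed(token_ids)        # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(emb)       # final hidden state summarises the text
        return torch.sigmoid(self.head(h_n[-1])).squeeze(-1)  # score in [0, 1]

model = LSTMScorer(vocab_size=10_000)
scores = model(torch.randint(1, 10_000, (4, 50)))  # 4 dummy texts, 50 tokens each
print(scores.shape)  # torch.Size([4])

A gradient-based saliency map over the input tokens of such a model is one common way to visualise which regions of the text the scorer treats as discriminative.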
Exploring Automated Essay Scoring for Nonnative English Speakers
Automated Essay Scoring (AES) has become popular and is widely used. However, the lack of an appropriate methodology for rating nonnative English speakers' essays has left this part of the field lagging behind. In this paper, we report initial results of our experiments with nonnative AES that learns from manual evaluation of nonnative essays. For this purpose, we conducted an exercise in which essays written by nonnative English speakers in a test environment were rated both manually and by an automated system designed for the experiment. In the process, we experimented with several features to learn about the nuances of nonnative essay evaluation. The proposed methodology of automated essay evaluation yielded a correlation coefficient of 0.750 with the manual evaluation.
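The reported figure of 0.750 is a correlation between system and human scores. A minimal sketch of that validation step, using Pearson's r on dummy values (not the study's data), could look like this:

# Sketch (assumption): comparing automated scores against manual ratings
# with Pearson's correlation coefficient. Scores below are dummy values.
from scipy.stats import pearsonr

manual = [3.0, 4.5, 2.0, 5.0, 3.5]     # human ratings
automatic = [2.8, 4.2, 2.5, 4.8, 3.9]  # system predictions

r, p = pearsonr(manual, automatic)
print(f"correlation coefficient: {r:.3f} (p = {p:.3g})")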
Experiments with Universal CEFR Classification
The Common European Framework of Reference (CEFR) guidelines describe the language proficiency of learners on a scale of six levels. While the CEFR guidelines are generic across languages, the development of automated proficiency classification systems for different languages follows different approaches. In this paper, we explore universal CEFR classification using domain-specific and domain-agnostic, theory-guided as well as data-driven features. We report the results of our preliminary experiments in monolingual, cross-lingual, and multilingual classification with three languages: German, Czech, and Italian. Our results show that monolingual and multilingual models achieve similar performance, and that cross-lingual classification yields lower but comparable results to monolingual classification.
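The cross-lingual setting described here amounts to training a level classifier on one language and applying it to another via features that do not depend on the vocabulary of either. The abstract does not specify the feature sets, so the sketch below uses simple length and lexical-diversity statistics purely as illustrative stand-ins for the paper's domain-agnostic features.

# Sketch (assumption): cross-lingual CEFR classification in scikit-learn.
# The features are illustrative placeholders, not the paper's feature sets.
import numpy as np
from sklearn.linear_model import LogisticRegression

def agnostic_features(text):
    words = text.split()
    sents = max(text.count('.'), 1)
    return [len(words),                        # text length
            np.mean([len(w) for w in words]),  # mean word length
            len(words) / sents,                # words per sentence
            len(set(words)) / len(words)]      # type-token ratio

# Dummy training data: (text, CEFR level) pairs in the source language (German).
train_texts = ["Ich gehe heute in die Schule .", "Die Wirtschaft wächst stetig ."]
train_levels = ["A1", "B2"]

clf = LogisticRegression().fit([agnostic_features(t) for t in train_texts], train_levels)
# Cross-lingual step: the same features applied to a Czech text.
print(clf.predict([agnostic_features("Dnes jdu do školy .")]))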
Manhattan Distance and Dice Similarity Evaluation on Indonesian Essay Examination System
Every learning process requires an evaluation tool to measure students' level of understanding. Evaluations can take the form of multiple-choice questions, short answers, or essays, and some studies suggest that essay exams assess understanding better than the other types. Automated essay assessment is needed to save teachers time when correcting answers, but the development of such assessments is still ongoing, with the aim of obtaining better accuracy than the methods currently in use. Motivated by these problems, this study proposes a comparative analysis of similarity methods for assessing online essay exams. The methods compared are Dice similarity and Manhattan distance. Both methods produce coefficient values, which are then compared with manual assessments on the same scale. The data consisted of 2,162 answers, obtained from 50 students who each answered 40 questions (on politics, sports, lifestyle, and technology). The data obtained in this study can be used to support other research and can be accessed at www.indonesian-ir.org. This research shows that the Dice similarity scheme is more accurate than Manhattan distance.
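The two measures being compared are standard and easy to state: the Dice coefficient on token sets, and the L1 (Manhattan) distance between term-frequency vectors. A minimal sketch follows; tokenisation and any Indonesian-specific preprocessing (stemming, stop-word removal) used in the study are omitted, and the example texts are made up.

# Sketch (assumption): the two measures compared in the study, applied to a
# student answer and a reference answer.
from collections import Counter

def dice_similarity(a, b):
    """Dice coefficient on token sets: 2|A ∩ B| / (|A| + |B|)."""
    sa, sb = set(a.split()), set(b.split())
    return 2 * len(sa & sb) / (len(sa) + len(sb))

def manhattan_distance(a, b):
    """L1 distance between term-frequency vectors (lower = more similar)."""
    ca, cb = Counter(a.split()), Counter(b.split())
    return sum(abs(ca[t] - cb[t]) for t in set(ca) | set(cb))

reference = "pemilu adalah proses memilih pemimpin"
answer = "pemilu adalah cara memilih pemimpin negara"
print(dice_similarity(reference, answer))   # 0..1, higher = more similar
print(manhattan_distance(reference, answer))

Note that the two measures point in opposite directions (higher Dice means more similar, higher Manhattan means less similar), so each must be mapped to the grading scale before comparison with manual scores.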
Automated Feedback Generation for a Chemistry Database and Abstracting Exercise
Timely feedback is an important part of teaching and learning. Here we describe how a readily available transformer-based neural network (machine-learning) model, BERT, can be used to give feedback on the structure of responses to an abstracting exercise, in which students are asked to summarise the contents of a published article after finding it in a publication database. The dataset contained 207 submissions from two consecutive years of the course, summarising a total of 21 different papers from the primary literature. The model was pre-trained on an available dataset (approx. 15,000 samples) and then fine-tuned on 80% of the submitted dataset; this fine-tuning proved important. The sentences in the student submissions are classified into three classes (background, technique, and observation), which allows a comparison of how each submission is structured. Comparing the structure of the students' abstracts with a large collection of abstracts from the PubMed database shows that students in this exercise concentrate more on the background to the paper and less on the techniques and results than the papers' own abstracts do. These results allowed feedback for each submitted assignment to be generated automatically.
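The core step described, three-way sentence classification with a fine-tuned BERT, can be sketched with the Hugging Face transformers library as below. The checkpoint name and label order are assumptions; the paper's pre-training corpus and fine-tuned weights are not provided with this abstract, so the untrained classification head must be fine-tuned on labelled sentences before its predictions mean anything.

# Sketch (assumption): three-way sentence classification with BERT via
# Hugging Face transformers. Checkpoint and labels are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["background", "technique", "observation"]
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))  # head is untrained here

sentence = "Samples were analysed by mass spectrometry."
inputs = tok(sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(labels[logits.argmax(-1).item()])  # meaningless until fine-tuned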