53 research outputs found
Automated assessment of non-native learner essays: Investigating the role of linguistic features
Automatic essay scoring (AES) refers to the process of scoring free text
responses to given prompts, considering human grader scores as the gold
standard. Writing such essays is an essential component of many language and
aptitude exams. Hence, AES has become an active and established area of research,
and there are many proprietary systems used in real-life applications today.
However, not much is known about which specific linguistic features are useful
for prediction and how much of this is consistent across datasets. This article
addresses that by exploring the role of various linguistic features in
automatic essay scoring using two publicly available datasets of non-native
English essays written in test-taking scenarios. The linguistic properties are
modeled by encoding lexical, syntactic, discourse and error types of learner
language in the feature set. Predictive models are then developed using these
features on both datasets and the most predictive features are compared. While
the results show that the feature set used results in good predictive models
with both datasets, the question "what are the most predictive features?" has a
different answer for each dataset.
Comment: Article accepted for publication at: International Journal of Artificial Intelligence in Education (IJAIED). To appear in early 2017 (journal url: http://www.springer.com/computer/ai/journal/40593)
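As a purely illustrative sketch (not the system described in the abstract above), the snippet below shows how a handful of shallow lexical features can feed a predictive scoring model; the specific features, the ridge regression, and the toy essays and scores are assumptions made for the example.

```python
# Illustrative sketch: shallow lexical features + ridge regression for essay scoring.
# The feature set and model choice are assumptions for this example, not the
# system described in the abstract above.
import re
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def shallow_features(essay: str) -> list[float]:
    """A few shallow lexical proxies for the richer feature sets used in AES work."""
    tokens = re.findall(r"[A-Za-z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    n_tok = max(len(tokens), 1)
    return [
        n_tok,                                # essay length in tokens
        len(set(tokens)) / n_tok,             # type-token ratio (lexical diversity)
        sum(len(t) for t in tokens) / n_tok,  # mean word length
        n_tok / max(len(sentences), 1),       # mean sentence length
    ]

# Hypothetical toy data standing in for a scored learner-essay corpus.
essays = [
    "The internet help people to learn many thing every day.",
    "Access to education has improved dramatically because online resources are widely available.",
    "I think school is good. It is good for me. I like it.",
    "Although standardized testing is controversial, it remains a common measure of aptitude.",
]
scores = np.array([2.0, 4.5, 1.5, 4.0])  # hypothetical human grader scores

X = np.array([shallow_features(e) for e in essays])
model = Ridge(alpha=1.0)
# Cross-validated fit; with a real corpus one would inspect the coefficients to see
# which features are most predictive, as the article above does across datasets.
print(cross_val_score(model, X, scores, cv=2, scoring="r2"))
```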
Experiments with Universal CEFR Classification
The Common European Framework of Reference (CEFR) guidelines describe
language proficiency of learners on a scale of 6 levels. While the description
of CEFR guidelines is generic across languages, the development of automated
proficiency classification systems for different languages follows different
approaches. In this paper, we explore universal CEFR classification using
domain-specific and domain-agnostic, theory-guided as well as data-driven
features. We report the results of our preliminary experiments in monolingual,
cross-lingual, and multilingual classification with three languages: German,
Czech, and Italian. Our results show that both monolingual and multilingual
models achieve similar performance, and cross-lingual classification yields
lower but comparable results to monolingual classification.
Comment: to appear in the proceedings of The 13th Workshop on Innovative Use of NLP for Building Educational Applications
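For illustration only, the sketch below shows a minimal cross-lingual setup of the kind described above: train a proficiency classifier on texts from one language and test it on another, using character n-grams as a simple language-agnostic feature. The texts, labels, and classifier are hypothetical placeholders, not the experiments reported in the abstract.

```python
# Illustrative cross-lingual CEFR-style setup: train on one language, test on
# another, with character n-grams as a domain-agnostic feature. Toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical (text, CEFR level) pairs for two languages.
train_de = [("Ich gehe heute in die Schule und lerne.", "A2"),
            ("Die wirtschaftlichen Folgen der Globalisierung sind vielschichtig.", "C1")]
test_it = [("Oggi vado a scuola e imparo molte cose.", "A2"),
           ("Le conseguenze economiche della globalizzazione sono complesse.", "C1")]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),  # character n-gram features
    LogisticRegression(max_iter=1000),
)
clf.fit([t for t, _ in train_de], [y for _, y in train_de])
print(clf.predict([t for t, _ in test_it]))  # cross-lingual prediction: German -> Italian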
A Dependency Treebank for Telugu
In this paper, we describe the annotation and development of a Telugu treebank following the Universal Dependencies framework. We manually annotated 1328 sentences from a Telugu grammar textbook, and the treebank is freely available from Universal Dependencies version 2.1. We also discuss some language-specific annotation issues and decisions, and report preliminary experiments with POS tagging and dependency parsing. To the best of our knowledge, this is the first freely accessible and open dependency treebank for Telugu.
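Since Universal Dependencies treebanks are distributed in the standard CoNLL-U format, here is a minimal sketch of reading such a file and counting UPOS tags; the file path in the usage comment is a hypothetical placeholder.

```python
# Minimal sketch of reading a Universal Dependencies treebank in CoNLL-U format
# and counting UPOS tags. Any UD .conllu file would work; the path below is a
# hypothetical placeholder.
from collections import Counter

def upos_counts(conllu_path: str) -> Counter:
    counts = Counter()
    with open(conllu_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue                      # skip sentence metadata and blank lines
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue                      # skip multiword tokens and empty nodes
            counts[cols[3]] += 1              # fourth column is the UPOS tag
    return counts

# Example usage (path is hypothetical):
# print(upos_counts("te-ud-train.conllu").most_common(5))
```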
Towards grounding computational linguistic approaches to readability: Modeling reader-text interaction for easy and difficult texts
Computational approaches to readability assessment are generally built and evaluated using gold standard corpora labeled by publishers or teachers, rather than being grounded in observations about human performance. Considering that both the reading process and the outcome can be observed, there is a wealth of empirical data that could be used to ground computational analysis of text readability. This will also support explicit readability models connecting text complexity and the reader's language proficiency to the reading process and outcomes.
This paper takes a step in this direction by reporting on an experiment studying how the relation between text complexity and a reader's language proficiency affects the reading process and the performance outcomes of readers after reading. We modeled the reading process using three eye-tracking variables: fixation count, average fixation count, and second pass reading duration. Our models for these variables explained 78.9%, 74%, and 67.4% of the variance, respectively. Performance outcome was modeled through recall and comprehension questions, and these models explained 58.9% and 27.6% of the variance, respectively. While the online models give us a better understanding of the cognitive correlates of reading with text complexity and language proficiency, modeling the offline measures can be particularly relevant for incorporating user aspects into readability models.
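As a hedged illustration of the kind of regression behind the "variance explained" figures above, the sketch below fits an ordinary least-squares model predicting a reading-process measure (fixation count) from text complexity and reader proficiency; the data frame is a toy placeholder, and the published models were fit on real eye-tracking data with richer predictors.

```python
# Sketch of a regression relating text complexity and reader proficiency to a
# reading-process measure. The data are hypothetical; only the modeling pattern
# (fit a model, report variance explained) mirrors the abstract above.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "fixation_count":  [95, 140, 80, 170, 110, 200],    # hypothetical per-text counts
    "text_complexity": [2.1, 3.8, 1.9, 4.5, 2.7, 4.9],  # hypothetical complexity score
    "proficiency":     [80, 65, 90, 55, 75, 50],         # hypothetical proficiency score
})

model = smf.ols("fixation_count ~ text_complexity + proficiency", data=data).fit()
print(f"R^2 (variance explained): {model.rsquared:.3f}")
```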
A Multilingual Evaluation of NER Robustness to Adversarial Inputs
Adversarial evaluations of language models typically focus on English alone.
In this paper, we performed a multilingual evaluation of Named Entity
Recognition (NER) in terms of its robustness to small perturbations in the
input. Our results showed that the NER models we explored across three languages
(English, German and Hindi) are not very robust to such changes, as indicated
by the fluctuations in the overall F1 score as well as in a more fine-grained
evaluation. With that knowledge, we further explored whether it is possible to
improve the existing NER models using a part of the generated adversarial data
sets as augmented training data to train a new NER model or as fine-tuning data
to adapt an existing NER model. Our results showed that both these approaches
improve performance on the original as well as adversarial test sets. While
there is no significant difference between the two approaches for English,
re-training is significantly better than fine-tuning for German and Hindi.
Comment: Paper accepted at Repl4NLP workshop, ACL 2023
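To make the general idea concrete, here is a minimal sketch of generating small character-level perturbations of NER training sentences and adding them as augmented training data; the particular perturbation (swapping two inner characters of longer tokens) and the toy sentence are assumptions for illustration, not the perturbation types or models evaluated in the paper.

```python
# Sketch of character-level perturbation for NER data augmentation. The swap
# perturbation and the toy sentence are assumptions for illustration only.
import random

def perturb_tokens(tokens: list[str], rate: float = 0.3, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if len(tok) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(tok) - 2)
            tok = tok[:i] + tok[i + 1] + tok[i] + tok[i + 2:]  # swap two inner characters
        out.append(tok)
    return out

# Hypothetical CoNLL-style example: tokens with BIO entity tags.
sentence = (["Angela", "Merkel", "visited", "Ottawa", "yesterday"],
            ["B-PER", "I-PER", "O", "B-LOC", "O"])

augmented = (perturb_tokens(sentence[0]), sentence[1])  # token-level labels stay aligned
training_data = [sentence, augmented]                   # original + adversarial copy
print(augmented[0])
```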
Analyzing Text Complexity and Text Simplification: Connecting Linguistics, Processing and Educational Applications
Reading plays an important role in the process of learning and knowledge acquisition
for both children and adults. However, not all texts are accessible to every
prospective reader. Reading difficulties can arise when there is a mismatch between
a reader’s language proficiency and the linguistic complexity of the text
they read. In such cases, simplifying the text in its linguistic form while retaining
all the content could aid reader comprehension. In this thesis, we study text
complexity and simplification from a computational linguistic perspective.
We propose a new approach to automatically predict text complexity using
a wide range of word-level and syntactic features of the text. We show that this
approach results in accurate, generalizable models of text readability that work
across multiple corpora, genres and reading scales. Moving from documents to
sentences, we show that our text complexity features also accurately distinguish
different versions of the same sentence in terms of the degree of simplification
performed. This is useful for evaluating the quality of simplification, whether
performed by a human expert or generated by a machine, and for choosing targets to simplify
in a difficult text. We also experimentally show the effect of text complexity on
readers’ performance outcomes and cognitive processing through an eye-tracking
experiment.
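As a small illustration of the kind of word-level complexity features referred to here (a simple stand-in for the much wider feature set used in the thesis), the sketch below compares an original sentence with a simplified version on two measures.

```python
# Illustrative word-level complexity profile for an original vs. simplified
# sentence. The two features and the rough syllable estimate are assumptions
# for this example, not the thesis feature set.
import re

def complexity_profile(sentence: str) -> dict[str, float]:
    words = re.findall(r"[A-Za-z]+", sentence.lower())
    syllables = [max(len(re.findall(r"[aeiouy]+", w)), 1) for w in words]  # rough syllable count
    return {
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "mean_syllables": sum(syllables) / len(words),
    }

original = "The committee deliberated extensively before reaching a unanimous conclusion."
simplified = "The committee talked for a long time and then all agreed."

print("original:  ", complexity_profile(original))
print("simplified:", complexity_profile(simplified))
```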
Turning from analyzing text complexity and identifying sentential simplifications
to generating simplified text, one can view automatic text simplification as a
process of translation from English to simple English. In this thesis, we propose
a statistical machine translation based approach for text simplification, exploring
the role of focused training data and language models in the process.
Exploring the linguistic complexity analysis further, we show that our text
complexity features can be useful in assessing the language proficiency of English
learners. Finally, we analyze German school textbooks in terms of their
linguistic complexity across various grade levels, school types, and publishers by
applying a pre-existing set of text complexity features developed for German.
On Understanding the Relation between Expert Annotations of Text Readability and Target Reader Comprehension
Automatic readability assessment aims to ensure that readers read texts that they can comprehend. However, computational models are typically trained on texts created from the perspective of the text writer, not the target reader. There is little experimental research on the relationship between expert annotations of readability, the reader's language proficiency, and different levels of reading comprehension. To address this gap, we conducted a user study in which over 100 participants read texts of different reading levels and answered questions created to test three forms of comprehension. Our results indicate that, more than readability annotation or reader proficiency, it is the type of comprehension question asked that shows differences between reader responses: inferential questions were difficult for users of all levels of proficiency across reading levels. The data collected from this study will be released with this paper, providing, for the first time, a collection of 45 reader-benchmarked texts to evaluate readability assessment systems developed for adult learners of English. It can also potentially be useful for the development of question generation approaches in intelligent tutoring systems research.