Entity Recognition at First Sight: Improving NER with Eye Movement Information
Previous research shows that eye-tracking data contains information about the
lexical and syntactic properties of text, which can be used to improve natural
language processing models. In this work, we leverage eye movement features
from three corpora with recorded gaze information to augment a state-of-the-art
neural model for named entity recognition (NER) with gaze embeddings. These
corpora were manually annotated with named entity labels. Moreover, we show how
gaze features, generalized at the word-type level, eliminate the need for recorded
eye-tracking data at test time. The gaze-augmented NER models using
token-level and type-level features outperform the baselines. We present the
benefits of eye-tracking features by evaluating the NER models both on
individual datasets and in cross-domain settings.
Comment: Accepted at NAACL-HLT 201
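The type-level idea can be illustrated with a small sketch. It rests on several assumptions not stated in the abstract: token-level gaze features are averaged over every occurrence of a lowercased word type, unseen types fall back to a zero vector, and the three-value feature layout and function names are hypothetical. This is not the authors' implementation. Once such a lexicon is built from gaze corpora, new text needs only a lookup and no recorded eye movements.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature layout: (first_fixation_ms, total_reading_ms, n_fixations)
Feature = Tuple[float, ...]

def build_type_lexicon(observations: Sequence[Tuple[str, Feature]]) -> Dict[str, Feature]:
    """Average token-level gaze features over all occurrences of a lowercased word type."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for token, feats in observations:
        key = token.lower()
        if key not in sums:
            sums[key] = [0.0] * len(feats)
        for i, value in enumerate(feats):
            sums[key][i] += value
        counts[key] += 1
    return {k: tuple(v / counts[k] for v in vec) for k, vec in sums.items()}

def type_features(tokens: Sequence[str], lexicon: Dict[str, Feature], dim: int) -> List[Feature]:
    """Look up type-level features for unseen text; no gaze recording is needed here."""
    zero = tuple(0.0 for _ in range(dim))  # assumed fallback for out-of-lexicon types
    return [lexicon.get(t.lower(), zero) for t in tokens]

# Toy, made-up observations for illustration only
obs = [("Paris", (200.0, 450.0, 2.0)), ("paris", (180.0, 350.0, 1.0)), ("is", (90.0, 90.0, 1.0))]
lex = build_type_lexicon(obs)
print(lex["paris"])  # (190.0, 400.0, 1.5)
print(type_features(["Paris", "Berlin"], lex, dim=3))
```

Averaging is only one possible generalization. Medians or frequency-weighted means would fit the same lookup interface.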
Cross-Lingual Transfer of Cognitive Processing Complexity
When humans read a text, their eye movements are influenced by the structural complexity of the input sentences. This cognitive phenomenon holds across languages, and recent studies indicate that multilingual language models use structural similarities between languages to facilitate cross-lingual transfer. We use sentence-level eye-tracking patterns as a cognitive indicator of structural complexity and show that the multilingual model XLM-RoBERTa can successfully predict varied patterns for 13 typologically diverse languages, despite being fine-tuned only on English data. We quantify the sensitivity of the model to structural complexity and distinguish a range of complexity characteristics. Our results indicate that the model develops a meaningful bias towards sentence length but also integrates cross-lingual differences. In a control experiment with randomized word order, we find that the model also appears to capture more complex structural information.
WebQAmGaze: A Multilingual Webcam Eye-Tracking-While-Reading Dataset
We create WebQAmGaze, a multilingual low-cost eye-tracking-while-reading
dataset designed to support the development of fair and transparent NLP
models. WebQAmGaze includes webcam eye-tracking data from 332 participants
naturally reading English, Spanish, and German texts. Each participant performs
two reading tasks composed of five texts: a normal reading task and an
information-seeking task. After preprocessing the data, we find that fixations
on relevant spans seem to indicate correctness when answering the comprehension
questions. Additionally, we compare the collected data to high-quality
eye-tracking data. The results show a moderate correlation between the features
obtained with webcam eye tracking and those obtained with a commercial
eye-tracking device. We believe this data can advance webcam-based reading
studies and open the way to cheaper and more accessible data collection.
WebQAmGaze is useful for learning about the cognitive processes behind question
answering (QA) and for applying these insights to computational models of
language understanding.
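A comparison like the webcam-versus-commercial-device analysis can be sketched as a correlation between aligned per-word feature values from the two recordings. The abstract does not say which coefficient or features were used. Pearson's r, the function name, and the alignment of values by word are assumptions made for illustration.

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long, word-aligned feature vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

# Hypothetical per-word total reading times (ms) for the same words, two devices
webcam = [210.0, 95.0, 340.0, 180.0, 260.0]
lab_et = [230.0, 120.0, 300.0, 150.0, 310.0]
print(round(pearson_r(webcam, lab_et), 3))
```

Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient. The explicit version here only makes the formula visible.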
Dynamic Human Evaluation for Relative Model Comparisons
Collecting human judgements is currently the most reliable evaluation method
for natural language generation systems. Automatic metrics have documented flaws
when applied to measure quality aspects of generated text and have been shown
to correlate poorly with human judgements. However, human evaluation is
time- and cost-intensive, and we lack consensus on how to design and conduct
human evaluation experiments. There is thus a need for streamlined approaches
to collecting human judgements efficiently when evaluating natural language
generation systems. We therefore present a dynamic approach to measure the
required number of human annotations when evaluating generated outputs in
relative comparison settings. We propose an agent-based framework of human
evaluation to assess multiple labelling strategies and methods for deciding the
better model, in both a simulation and a crowdsourcing case study. The main
results indicate that a decision about the superior model can be made with high
probability across different labelling strategies, where assigning a single
random worker per task requires the least overall labelling effort and thus
incurs the least cost.
Comment: accepted at LREC 202
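The idea of labelling dynamically until a decision can be made with high probability can be illustrated with a simulated single-random-worker strategy. The abstract does not describe the decision method. The Beta-Bernoulli posterior with a uniform prior, the 0.95 threshold, and the names `prob_a_better` and `dynamic_comparison` are all assumptions standing in for whatever the paper actually uses. Each task draws one simulated worker who prefers system A with probability `p_true`, and labelling stops as soon as the posterior probability that A is preferred crosses the threshold in either direction.

```python
import random
from math import comb
from typing import Optional, Tuple

def prob_a_better(wins_a: int, wins_b: int) -> float:
    """P(p_A > 0.5) under a Beta(1 + wins_a, 1 + wins_b) posterior (assumed uniform prior)."""
    alpha, beta = 1 + wins_a, 1 + wins_b
    n = alpha + beta - 1
    # For integer parameters, the Beta CDF at 0.5 equals a binomial tail:
    # I_{0.5}(alpha, beta) = P(Binomial(n, 0.5) >= alpha) = P(p_A < 0.5)
    cdf_half = sum(comb(n, j) for j in range(alpha, n + 1)) / 2 ** n
    return 1.0 - cdf_half

def dynamic_comparison(p_true: float, threshold: float = 0.95,
                       max_tasks: int = 1000, seed: Optional[int] = 0) -> Tuple[Optional[str], int]:
    """Simulate one random worker per task; return (winner or None, labels used)."""
    rng = random.Random(seed)
    wins_a = wins_b = 0
    for n_labels in range(1, max_tasks + 1):
        # A single simulated worker judges one pair of generated outputs
        if rng.random() < p_true:
            wins_a += 1
        else:
            wins_b += 1
        p = prob_a_better(wins_a, wins_b)
        if p >= threshold:
            return "A", n_labels
        if p <= 1 - threshold:
            return "B", n_labels
    return None, max_tasks  # no confident decision within the labelling budget

winner, n_used = dynamic_comparison(p_true=0.7, seed=1)
print(winner, n_used)
```

Raising `threshold` trades more labels (higher cost) for a lower chance of picking the wrong system. Averaging `n_used` over many seeds estimates the expected annotation effort for a given gap between the systems.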