We present WebQAmGaze, a multilingual, low-cost eye-tracking-while-reading
dataset, designed to support the development of fair and transparent NLP
models. WebQAmGaze includes webcam eye-tracking data from 332 participants
naturally reading English, Spanish, and German texts. Each participant performs
two reading tasks composed of five texts: a normal reading task and an
information-seeking task. After preprocessing the data, we find that fixations
on relevant spans seem to indicate correctness when answering the comprehension
questions. Additionally, we compare the collected data with high-quality
eye-tracking data. The results show a moderate correlation between the features
obtained with webcam-based eye tracking (ET) and those obtained with a
commercial ET device. We believe this data can advance webcam-based
reading studies and open the way to cheaper and more accessible data collection.
WebQAmGaze is useful for studying the cognitive processes behind question
answering (QA) and for applying these insights to computational models of
language understanding.