Pre-Trained Language Models Augmented with Synthetic Scanpaths for Natural Language Understanding

Deng, Shuwen; Jäger, Lena A.; Prasse, Paul; Reich, David R.; Scheffer, Tobias

Authors: Shuwen Deng, Lena A. Jäger, Paul Prasse, David R. Reich, Tobias Scheffer
Publication date: 23 October 2023

Abstract

Human gaze data offer cognitive information that reflects natural language comprehension. Indeed, augmenting language models with human scanpaths has proven beneficial for a range of NLP tasks, including language understanding. However, the applicability of this approach is hampered because the abundance of text corpora is contrasted by a scarcity of gaze data. Although models for the generation of human-like scanpaths during reading have been developed, the potential of synthetic gaze data across NLP tasks remains largely unexplored. We develop a model that integrates synthetic scanpath generation with a scanpath-augmented language model, eliminating the need for human gaze data. Since the model's error gradient can be propagated throughout all parts of the model, the scanpath generator can be fine-tuned to downstream tasks. We find that the proposed model not only outperforms the underlying language model, but achieves a performance that is comparable to a language model augmented with real human gaze data. Our code is publicly available.

Comment: Pre-print for EMNLP 202

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2310.14676

Last time updated on 16/01/2024