Entity Recognition at First Sight: Improving NER with Eye Movement Information
Previous research shows that eye-tracking data contains information about the
lexical and syntactic properties of text, which can be used to improve natural
language processing models. In this work, we leverage eye movement features
from three corpora with recorded gaze information to augment a state-of-the-art
neural model for named entity recognition (NER) with gaze embeddings. These
corpora were manually annotated with named entity labels. Moreover, we show how
gaze features, generalized to the word-type level, eliminate the need for recorded
eye-tracking data at test time. The gaze-augmented models for NER using
token-level and type-level features outperform the baselines. We present the
benefits of eye-tracking features by evaluating the NER models both on
individual datasets and in cross-domain settings.
Comment: Accepted at NAACL-HLT 2019
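As a rough illustration of the type-level idea, the sketch below averages token-level gaze measures per word type and concatenates the result with a word embedding, so unseen text needs no eye-tracking at test time. The feature set, the back-off for unseen types, and the vector sizes are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch: aggregate token-level gaze features to word-type level,
# then concatenate with word embeddings. Feature names are hypothetical.
from collections import defaultdict
import numpy as np

# Hypothetical token-level gaze records:
# (word, [first_fixation_ms, total_fixation_ms, n_fixations])
gaze_records = [
    ("Obama", [210.0, 480.0, 2.0]),
    ("visited", [180.0, 300.0, 1.0]),
    ("Obama", [190.0, 420.0, 2.0]),
]

# Mean per word type, so recorded gaze is only needed at training time.
sums = defaultdict(lambda: np.zeros(3))
counts = defaultdict(int)
for word, feats in gaze_records:
    sums[word.lower()] += np.asarray(feats)
    counts[word.lower()] += 1
type_gaze = {w: sums[w] / counts[w] for w in sums}

# Back-off for word types never seen with gaze: the global mean.
unk = np.mean(list(type_gaze.values()), axis=0)

def gaze_embedding(word: str) -> np.ndarray:
    return type_gaze.get(word.lower(), unk)

# At input time, the gaze vector is concatenated with the word embedding
# before the sequence encoder (placeholder 50-dim embedding here).
word_vec = np.random.randn(50)
augmented = np.concatenate([word_vec, gaze_embedding("Obama")])
print(augmented.shape)  # (53,)
```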
What to do about non-standard (or non-canonical) language in NLP
Real world data differs radically from the benchmark corpora we use in
natural language processing (NLP). As soon as we apply our technologies to the
real world, performance drops. The reason for this problem is obvious: NLP
models are trained on samples from a limited set of canonical varieties that
are considered standard, most prominently English newswire. However, there are
many dimensions, e.g., socio-demographics, language, genre, and sentence type,
on which texts can differ from the standard. The solution is not obvious: we
cannot control for all factors, and it is not clear how to best go beyond the
current practice of training on homogeneous data from a single domain and
language.
In this paper, I review the notion of canonicity, and how it shapes our
community's approach to language. I argue for leveraging what I call fortuitous
data, i.e., non-obvious data that is hitherto neglected, hidden in plain sight,
or raw data that needs to be refined. If we embrace the variety of this
heterogeneous data by combining it with proper algorithms, we will not only
produce more robust models, but will also enable adaptive language technology
capable of addressing natural language variation.
Comment: KONVENS 2016
Predicting Native Language from Gaze
A fundamental question in language learning concerns the role of a speaker's
first language in second language acquisition. We present a novel methodology
for studying this question: analysis of eye-movement patterns in second
language reading of free-form text. Using this methodology, we demonstrate for
the first time that the native language of English learners can be predicted
from their gaze fixations when reading English. We provide analysis of
classifier uncertainty and learned features, which indicates that differences
in English reading are likely to be rooted in linguistic divergences across
native languages. The presented framework complements production studies and
offers new ground for advancing research on multilingualism.
Comment: ACL 2017
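A minimal sketch of such a classification setup, assuming one feature vector per reader built from aggregated fixation measures over English text; the features, the four-way label set, and the logistic-regression classifier are illustrative stand-ins, not the paper's pipeline.

```python
# Minimal sketch: predict a reader's native language from aggregated
# eye-movement features. Data here is synthetic, for shape only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical data: one row per reader; columns could be mean fixation
# duration, mean saccade length, regression rate, and skip rate.
X = rng.normal(size=(120, 4))
y = rng.integers(0, 4, size=120)  # native-language label (4 L1 classes)

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)
print(f"accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")

# An analysis like the paper's would then inspect classifier uncertainty
# (predict_proba) and learned feature weights (coef_) per L1 class.
```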
Sequence classification with human attention
Learning attention functions requires large volumes of data, but many NLP tasks simulate human behavior, and in this paper, we show that human attention really does provide a good inductive bias on many attention functions in NLP. Specifically, we use estimated human attention derived from eye-tracking corpora to regularize attention functions in recurrent neural networks. We show substantial improvements across a range of tasks, including sentiment analysis, grammatical error detection, and detection of abusive language.
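A minimal sketch of gaze-based attention regularization, assuming per-token human attention targets obtained by normalizing fixation durations; the bidirectional-LSTM classifier, the KL penalty, and the 0.1 weight are assumptions for illustration, not necessarily the paper's exact formulation.

```python
# Minimal sketch: add a penalty that pulls a model's attention
# distribution toward a human attention distribution from gaze.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnClassifier(nn.Module):
    def __init__(self, vocab=1000, emb=32, hid=64, classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.rnn = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid, 1)
        self.out = nn.Linear(2 * hid, classes)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))                  # (B, T, 2H)
        a = F.softmax(self.attn(h).squeeze(-1), -1)   # (B, T) attention
        ctx = torch.bmm(a.unsqueeze(1), h).squeeze(1)  # attention-weighted sum
        return self.out(ctx), a

model = AttnClassifier()
x = torch.randint(0, 1000, (8, 12))             # token ids
y = torch.randint(0, 2, (8,))                   # task labels
human = F.softmax(torch.rand(8, 12), dim=-1)    # normalized fixation durations

logits, attn = model(x)
task_loss = F.cross_entropy(logits, y)
# Regularizer: KL divergence from model attention to human attention.
reg_loss = F.kl_div(attn.clamp_min(1e-9).log(), human, reduction="batchmean")
loss = task_loss + 0.1 * reg_loss               # 0.1 is an assumed weight
loss.backward()
```

The same penalty drops in beside any per-token attention layer; only the human-attention targets and the weighting need to change per task.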