The ZuCo benchmark on cross-subject reading task classification with EEG and eye-tracking data
We present a new machine learning benchmark for reading task classification with the goal of advancing EEG and eye-tracking research at the intersection of computational language processing and cognitive neuroscience. The benchmark task is a cross-subject classification that distinguishes between two reading paradigms: normal reading and task-specific reading. The data for the benchmark is based on the Zurich Cognitive Language Processing Corpus (ZuCo 2.0), which provides simultaneous eye-tracking and EEG signals from natural reading of English sentences. The training dataset is publicly available, and we present a newly recorded hidden test set. We provide multiple solid baseline methods for this task and discuss future improvements. We release our code and provide an easy-to-use interface to evaluate new approaches with an accompanying public leaderboard: www.zuco-benchmark.com
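The cross-subject setting described above means a classifier is evaluated on a subject it never saw during training. The sketch below illustrates that evaluation pattern with leave-one-subject-out folds; the synthetic features, subject count, and nearest-centroid classifier are illustrative assumptions, not the benchmark's actual data or baselines.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for ZuCo-style per-sentence features (e.g. mean fixation
# duration, EEG band power). The feature semantics here are invented; the real
# benchmark supplies recorded eye-tracking and EEG signals.
def make_subject(n=40):
    normal = rng.normal(0.0, 1.0, size=(n, 4))  # "normal reading" sentences
    task = rng.normal(1.0, 1.0, size=(n, 4))    # "task-specific reading"
    X = np.vstack([normal, task])
    y = np.array([0] * n + [1] * n)
    return X, y

subjects = [make_subject() for _ in range(5)]

# Leave-one-subject-out evaluation: the held-out subject contributes no
# training data, matching the cross-subject protocol.
def loso_accuracy(subjects):
    accs = []
    for held_out in range(len(subjects)):
        train = [s for i, s in enumerate(subjects) if i != held_out]
        Xtr = np.vstack([X for X, _ in train])
        ytr = np.concatenate([y for _, y in train])
        # Nearest-centroid classifier: one centroid per reading paradigm.
        centroids = np.stack([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
        Xte, yte = subjects[held_out]
        dists = np.linalg.norm(Xte[:, None, :] - centroids[None, :, :], axis=2)
        accs.append((dists.argmin(axis=1) == yte).mean())
    return float(np.mean(accs))

acc = loso_accuracy(subjects)
print(f"mean leave-one-subject-out accuracy: {acc:.2f}")
```

With well-separated synthetic classes the accuracy lands well above the 0.5 chance level; on real EEG features the gap between subjects is what makes the task hard.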
Helping, I Mean Assessing Psychiatric Communication: An Application of Incremental Self-Repair Detection
18th SemDial Workshop on the Semantics and Pragmatics of Dialogue (DialWatt), 1-3 September 2014, Edinburgh, Scotland.
Self-repair is pervasive in dialogue, and models thereof have long been a focus of research, particularly for disfluency detection in speech recognition and spoken dialogue systems. However, the generality of such models across domains has received little attention. In this paper we investigate the application of an automatic incremental self-repair detection system, STIR, developed on the Switchboard corpus of telephone speech, to a new domain – psychiatric consultations. We find that word-level accuracy is reduced markedly by the differences in annotation schemes and transcription conventions between corpora, which has implications for the generalisability of all repair detection systems. However, overall rates of repair are detected accurately, promising a useful resource for clinical dialogue studies.
Helping, I Mean Assessing Psychiatric Communication: An Application of Incremental Self-Repair Detection
Howes was supported by the EPSRC-funded
PPAT project grant number EP/J501360/1 during this work. Hough is supported by the DUEL project financially supported by the Agence Nationale de la Recherche (grant number ANR-13-FRAL-0001) and the Deutsche Forschungsgemeinschaft. Much of the work was carried out under an EPSRC DTA scholarship at Queen Mary University of London. Purver is partly supported by ConCreTe: the project ConCreTe acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET grant number 611733.
Strongly Incremental Repair Detection
Hough J, Purver M. Strongly Incremental Repair Detection. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: ACL; 2014: 78-89.
We present STIR (STrongly Incremental Repair detection), a system that detects speech repairs and edit terms on transcripts incrementally with minimal latency. STIR uses information-theoretic measures from n-gram models as its principal decision features in a pipeline of classifiers detecting the different stages of repairs. Results on the Switchboard disfluency tagged corpus show utterance-final accuracy on a par with state-of-the-art incremental repair detection methods, but with better incremental accuracy, faster time-to-detection and less computational overhead. We evaluate its performance using incremental metrics and propose new repair processing evaluation standards.
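The "information-theoretic measures from n-gram models" can be illustrated with a toy sketch: compute add-one-smoothed bigram surprisal incrementally, word by word, and flag words whose surprisal jumps well above the running mean, a crude cue for repair onsets. This is an invented single-feature illustration, not a reimplementation of STIR's classifier pipeline; the corpus, utterance, and threshold are assumptions.

```python
import math
from collections import Counter

# Toy fluent corpus; the real system is trained on Switchboard transcripts.
corpus = [
    "i want to book a flight to boston".split(),
    "i want to fly to boston today".split(),
    "book a flight to denver please".split(),
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(p for sent in corpus for p in zip(sent, sent[1:]))
V = len(unigrams)

def surprisal(prev, word):
    # Add-one smoothed bigram surprisal in bits; high values mark words the
    # fluent language model finds unexpected.
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)
    return -math.log2(p)

# Incremental pass over a disfluent utterance containing a repair.
utterance = "i want to book a flight i mean a ticket".split()
scores = [surprisal(prev, w) for prev, w in zip(utterance, utterance[1:])]

# Flag words whose surprisal exceeds the running mean by a margin (1 bit here,
# an arbitrary illustrative threshold).
flags = []
for i, s in enumerate(scores, start=1):
    mean_so_far = sum(scores[:i]) / i
    if s > mean_so_far + 1.0:
        flags.append(utterance[i])
print(flags)
```

On this toy input the spike lands at the repair onset ("i mean ..."), which is the intuition behind using such measures as decision features.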
A study of model parameters for scaling up word to sentence similarity tasks in distributional semantics
PhD
Representation of sentences that captures semantics is an essential part of natural language
processing systems, such as information retrieval or machine translation. The representation
of a sentence is commonly built by combining the representations of the words that the sentence
consists of. Similarity between words is widely used as a proxy to evaluate semantic
representations. Word similarity models are well-studied and are shown to positively correlate
with human similarity judgements.
Current evaluation of models of sentential similarity builds on the results obtained in lexical
experiments. The main focus is how the lexical representations are used, rather than what
they should be. It is often assumed that the optimal representations for word similarity are
also optimal for sentence similarity. This work discards this assumption and systematically
looks for lexical representations that are optimal for similarity measurement between sentences.
We find that the best representation for word similarity is not always the best for sentence
similarity and vice versa. The best models in word similarity tasks perform best with additive
composition. However, the best result on compositional tasks is achieved with Kronecker-based
composition. There are representations that are equally good in both tasks when used
with multiplicative composition.
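The three composition operators compared above can be written directly in numpy; the 4-dimensional vectors below are invented toy values standing in for the count-based lexical representations studied in the thesis.

```python
import numpy as np

# Hypothetical 4-dimensional word vectors (illustrative values only).
cat = np.array([1.0, 0.5, 0.0, 2.0])
sleeps = np.array([0.5, 1.0, 1.0, 0.0])

additive = cat + sleeps            # additive composition: vector sum
multiplicative = cat * sleeps      # multiplicative: element-wise product
kronecker = np.kron(cat, sleeps)   # Kronecker-based: d dims -> d^2 dims

# Sentence similarity is then measured as cosine similarity between the
# composed sentence vectors.
def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(additive, multiplicative, kronecker.shape)
```

Note the dimensionality difference: additive and multiplicative composition stay in the word space, while the Kronecker product squares the dimensionality, which is part of why the optimal lexical representation can differ per operator.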
The systematic study of the parameters of similarity models reveals that the more information
lexical representations contain, the more attention should be paid to noise. In particular,
the word vectors in models with the feature size at the magnitude of the vocabulary size
should be sparse, but if a small number of context features is used then the vectors should be
dense.
Given the right lexical representations, compositional operators achieve state-of-the-art performance,
improving over models that use neural word embeddings. To avoid overfitting, either
several test datasets should be used or parameter selection should be based on parameters'
average behaviours.
EPSRC grant EP/J002607/1
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020
On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.
Tune your brown clustering, please
Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone predominantly unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyper-parameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamic between the input corpus size, chosen number of classes, and quality of the resulting clusters, which has implications for any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.
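Brown clustering greedily merges word classes to maximise the average mutual information (AMI) of adjacent class bigrams; that objective is what the corpus-size and class-count hyper-parameters trade off. The sketch below only evaluates the AMI objective for two hypothetical clusterings of a toy corpus, rather than running the greedy merge itself; the corpus and cluster assignments are invented for illustration.

```python
import math
from collections import Counter

# Toy corpus of adjacent tokens; Brown clustering scores a word-to-class
# assignment by the mutual information of adjacent class pairs.
corpus = "the cat sat the dog sat a cat ran a dog ran".split()

def class_ami(assign, tokens):
    # Map each token bigram to its class bigram, then compute
    # sum over (c1, c2) of p(c1, c2) * log2(p(c1, c2) / (p(c1) * p(c2))).
    pairs = [(assign[a], assign[b]) for a, b in zip(tokens, tokens[1:])]
    joint = Counter(pairs)
    left = Counter(c1 for c1, _ in pairs)
    right = Counter(c2 for _, c2 in pairs)
    n = len(pairs)
    return sum(
        (k / n) * math.log2(k * n / (left[c1] * right[c2]))
        for (c1, c2), k in joint.items()
    )

# A syntactically sensible clustering vs. an arbitrary one, both with 3 classes.
good = {"the": 0, "a": 0, "cat": 1, "dog": 1, "sat": 2, "ran": 2}
bad = {"the": 0, "cat": 0, "sat": 1, "dog": 1, "a": 2, "ran": 2}
print(class_ami(good, corpus), class_ami(bad, corpus))
```

The grouping that mirrors syntactic categories scores a much higher AMI, which is why the objective recovers useful clusters, and why the number of classes chosen changes what can be recovered.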
EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020
Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it).
Can structural priming answer the important questions about language? A commentary on Branigan and Pickering "An experimental approach to linguistic representation"
While structural priming makes a valuable contribution to psycholinguistics, it does not allow direct observation of representation, nor escape “source ambiguity.” Structural priming taps into implicit memory representations and processes that may differ from what is used online. We question whether implicit memory for language can and should be equated with linguistic representation or with language processing.