299 research outputs found

    A plea for more interactions between psycholinguistics and natural language processing research

    A new development in psycholinguistics is the use of regression analyses on tens of thousands of words, known as the megastudy approach. This development has led to the collection of processing times and subjective ratings (of age of acquisition, concreteness, valence, and arousal) for most of the existing words in English and Dutch. In addition, a crowdsourcing study in the Dutch language has resulted in information about how well 52,000 lemmas are known. This information is likely to be of interest to NLP researchers and computational linguists. At the same time, large-scale measures of word characteristics developed in the latter traditions are likely to be pivotal in bringing the megastudy approach to the next level.

    How do Spanish speakers read words? Insights from a crowdsourced lexical decision megastudy

    Published online: 18 February 2020.
    Vocabulary size seems to be affected by multiple factors, including those that belong to the properties of the words themselves and those that relate to the characteristics of the individuals assessing the words. In this study, we present results from a crowdsourced lexical decision megastudy in which more than 150,000 native speakers from around 20 Spanish-speaking countries performed a lexical decision task on 70 target word items selected from a list of about 45,000 Spanish words. We examined how demographic characteristics such as age, education level, and multilingualism affected participants’ vocabulary size. We also explored how common word-related factors such as frequency, length, and orthographic neighbourhood influenced the knowledge of a particular item. Results indicated an important contribution of age to overall vocabulary size, with vocabulary size increasing logarithmically with age. Furthermore, a contrast between monolingual and bilingual communities within Spain revealed no significant vocabulary size differences between the communities. Additionally, we replicated the standard effects of the words’ properties and their interactions, accurately accounting for the estimated knowledge of a particular word. These results highlight the value of crowdsourced approaches to uncover effects that are traditionally masked by small-sampled in-lab factorial experimental designs.
    This research is supported by the Basque Government through the BERC 2018-2021 program and by the Spanish State Research Agency through BCBL Severo Ochoa excellence accreditation SEV-2015-0490. This study was also partially supported by grants PGC2018-097145-B-I00, RED2018-102615-T, and RTI2018-093547-B-I00 from the Spanish State Research Agency. Work by JA was supported by “la Caixa” Foundation and the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 713673, and fellowship code LCF/BQ/IN17/116200154004. We would also like to thank the reviewers for their insightful comments and efforts towards improving this manuscript.
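    The reported logarithmic growth of vocabulary size with age corresponds to a simple model of the form vocabulary = a + b * log(age). A minimal sketch of such a fit is given below; the simulated data and variable names are illustrative assumptions, not the study's materials or pipeline.

        # Minimal sketch: fitting vocabulary size as a logarithmic function of age.
        # The simulated data and coefficient values are assumptions for illustration only.
        import numpy as np

        rng = np.random.default_rng(0)
        age = rng.integers(18, 80, size=1000).astype(float)
        vocab = 20000 + 8000 * np.log(age) + rng.normal(0, 1500, size=age.size)

        # Ordinary least squares fit of vocab ~ a + b * log(age).
        X = np.column_stack([np.ones_like(age), np.log(age)])
        coef, *_ = np.linalg.lstsq(X, vocab, rcond=None)
        print(f"intercept = {coef[0]:.0f}, slope on log(age) = {coef[1]:.0f}")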

    MEGALEX: A megastudy of visual and auditory word recognition

    Using the megastudy approach, we report a new database (MEGALEX) of visual and auditory lexical decision times and accuracy rates for tens of thousands of words. We collected visual lexical decision data for 28,466 French words and the same number of pseudowords, and auditory lexical decision data for 17,876 French words and the same number of pseudowords (synthesized tokens were used for the auditory modality). This constitutes the first large-scale database for auditory lexical decision, and the first database to enable a direct comparison of word recognition in different modalities. Different regression analyses were conducted to illustrate potential ways to exploit this megastudy database. First, we compared the proportions of variance accounted for by five word frequency measures. Second, we conducted item-level regression analyses to examine the relative importance of the lexical variables influencing performance in the different modalities (visual and auditory). Finally, we compared the similarities and differences between the two modalities. All data are freely available on our website (https://sedufau.shinyapps.io/megalex/) and are searchable at www.lexique.org, inside the Open Lexique search engine.
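    Comparing the variance accounted for by different frequency measures amounts to regressing item-level latencies on each (log-transformed) measure and comparing R². A hedged sketch is shown below; the file name and column names are hypothetical and not those of the actual MEGALEX distribution.

        # Sketch: compare word frequency measures by the variance (R^2) they explain
        # in item-level lexical decision times. File and column names are assumptions.
        import numpy as np
        import pandas as pd

        items = pd.read_csv("megalex_visual_items.csv")   # hypothetical item-level export
        frequency_measures = ["freq_books", "freq_subtitles", "freq_web",
                              "freq_twitter", "freq_news"]  # assumed column names

        for measure in frequency_measures:
            x = np.log10(items[measure] + 1)               # log-transform the raw counts
            r = np.corrcoef(x, items["mean_rt"])[0, 1]
            print(f"{measure}: R^2 = {r ** 2:.3f}")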

    Recognition times for 62 thousand English words: data from the English Crowdsourcing Project

    We present a new dataset of English word recognition times for a total of 62 thousand words, called the English Crowdsourcing Project. The data were collected via an internet vocabulary test in which more than one million people participated. The present dataset is limited to native English speakers. Participants were asked to indicate which words they knew. Their response times were registered, although at no point were the participants asked to respond as quickly as possible. Still, the response times correlate around .75 with the response times of the English Lexicon Project for the shared words. Also, the results of virtual experiments indicate that the new response times are a valid addition to the English Lexicon Project. This not only means that we have useful response times for some 35 thousand extra words, but also that we now have data on differences in response latencies as a function of education and age.
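    The reported correlation of about .75 with the English Lexicon Project can, in principle, be reproduced by merging the two data sets on the shared words and correlating the item-level response times. The sketch below assumes hypothetical file and column names.

        # Sketch: correlate crowdsourced recognition times with ELP lexical decision
        # times for the shared words. File and column names are assumptions.
        import pandas as pd

        ecp = pd.read_csv("ecp_items.csv")      # assumed columns: word, rt
        elp = pd.read_csv("elp_items.csv")      # assumed columns: word, rt

        shared = ecp.merge(elp, on="word", suffixes=("_ecp", "_elp"))
        r = shared["rt_ecp"].corr(shared["rt_elp"])
        print(f"{len(shared)} shared words, Pearson r = {r:.2f}")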

    The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words

    We present a new database of lexical decision times for English words and nonwords, for which two groups of British participants each responded to 14,365 monosyllabic and disyllabic words and the same number of nonwords, for a total duration of 16 h (divided over multiple sessions). This database, called the British Lexicon Project (BLP), fills an important gap between the Dutch Lexicon Project (DLP; Keuleers, Diependaele, & Brysbaert, Frontiers in Psychology, 1, 174, 2010) and the English Lexicon Project (ELP; Balota et al., 2007), because it applies the repeated-measures design of the DLP to the English language. The high correlation between the BLP and ELP data indicates that a high percentage of variance in lexical decision data sets is systematic variance, rather than noise, and that the results of megastudies are rather robust with respect to the selection and presentation of the stimuli. Because of its design, the BLP makes the same analyses possible as the DLP, offering researchers an interesting new data set of word-processing times for mixed-effects analyses and mathematical modeling. The BLP data are available at http://crr.ugent.be/blp and as Electronic Supplementary Materials.
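    Because each participant responded to the full item set, BLP-style data lend themselves to mixed-effects analyses of trial-level response times. The sketch below fits a random-intercept model by participant with statsmodels; it is a simplified illustration (a full crossed participant-by-item model would need a dedicated package), and the file and column names are assumptions.

        # Sketch: mixed-effects analysis of trial-level lexical decision times with a
        # by-participant random intercept. Column names are illustrative assumptions.
        import pandas as pd
        import statsmodels.formula.api as smf

        trials = pd.read_csv("blp_trials.csv")  # assumed: participant, word, rt, log_freq
        model = smf.mixedlm("rt ~ log_freq", data=trials, groups=trials["participant"])
        result = model.fit()
        print(result.summary())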

    Word prevalence norms for 62,000 English lemmas

    We present word prevalence data for 61,858 English words. Word prevalence refers to the number of people who know the word. The measure was obtained on the basis of an online crowdsourcing study involving over 220,000 people. Word prevalence data are useful for gauging the difficulty of words and, as such, for matching stimulus materials in experimental conditions or selecting stimulus materials for vocabulary tests. Word prevalence also predicts word processing times, over and above the effects of word frequency, word length, similarity to other words, and age of acquisition, in line with previous findings in the Dutch language.
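    The claim that prevalence predicts processing times over and above frequency, length, similarity to other words, and age of acquisition corresponds to a hierarchical regression: fit a baseline model with the standard predictors, add prevalence, and compare the explained variance. The sketch below assumes hypothetical column names for a merged item-level file.

        # Sketch: does word prevalence add predictive value beyond the standard
        # lexical predictors? File and column names are illustrative assumptions.
        import pandas as pd
        import statsmodels.formula.api as smf

        items = pd.read_csv("item_level_data.csv")   # hypothetical merged data set

        baseline = smf.ols("rt ~ log_freq + length + old20 + aoa", data=items).fit()
        extended = smf.ols("rt ~ log_freq + length + old20 + aoa + prevalence",
                           data=items).fit()
        print(f"R^2 baseline:        {baseline.rsquared:.3f}")
        print(f"R^2 with prevalence: {extended.rsquared:.3f}")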

    SPALEX: A Spanish Lexical Decision Database From a Massive Online Data Collection

    Published: 12 November 2018.
    This research has been partially funded by grants PSI2015-65689-P and SEV-2015-0490 from the Spanish Government, and AThEME-613465 from the European Union. Work by JA was supported by la Caixa Foundation and the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 713673.

    Practice Effects in Large-Scale Visual Word Recognition Studies: A Lexical Decision Study on 14,000 Dutch Mono- and Disyllabic Words and Nonwords

    In recent years, psycholinguistics has seen a remarkable growth of research based on the analysis of data from large-scale studies of word recognition, in particular lexical decision and word naming. We present the data of the Dutch Lexicon Project (DLP), in which a group of 39 participants made lexical decisions to 14,000 words and the same number of nonwords. To examine whether the extensive practice precludes comparison with the traditional short experiments, we look at the differences between the first and the last session, compare the results with the English Lexicon Project (ELP) and the French Lexicon Project (FLP), and examine to what extent established findings in Dutch psycholinguistics can be replicated in virtual experiments. Our results show that when good nonwords are used, practice effects are minimal in lexical decision experiments and do not invalidate the behavioral data. For instance, the word frequency curve is the same in DLP as in ELP and FLP. Also, the Dutch–English cognate effect is the same in DLP as in a previously published factorial experiment. This means that large-scale word recognition studies can make use of psychophysical and psychometrical approaches. In addition, our data represent an important collection of very long series of individual reaction times that may be of interest to researchers in other areas.
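    A straightforward way to examine practice effects of the kind studied here is to compare response latencies and accuracy between the first and the last session of each participant. The sketch below assumes hypothetical trial-level file and column names.

        # Sketch: compare the first and the last session per participant to check for
        # practice effects. File and column names are illustrative assumptions.
        import pandas as pd

        trials = pd.read_csv("dlp_trials.csv")  # assumed: participant, session, rt, accuracy
        sessions_of_interest = [trials["session"].min(), trials["session"].max()]
        subset = trials[trials["session"].isin(sessions_of_interest)]

        summary = (subset.groupby(["participant", "session"])
                   .agg(mean_rt=("rt", "mean"), accuracy=("accuracy", "mean"))
                   .reset_index())
        print(summary.groupby("session")[["mean_rt", "accuracy"]].mean())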