Words, Subwords, and Morphemes: What Really Matters in the Surprisal-Reading Time Relationship?
An important assumption that comes with using LLMs on psycholinguistic data has gone unverified: LLM-based predictions rest on subword tokenization, not on decomposition of words into morphemes. Does that matter? We test this carefully by comparing surprisal estimates derived from orthographic, morphological, and BPE tokenization against reading time data. Our results replicate previous findings and provide evidence that, in the aggregate, predictions using BPE tokenization do not suffer relative to morphological and orthographic segmentation. However, a finer-grained analysis points to potential issues with relying on BPE-based tokenization, while also yielding promising results for morphologically aware surprisal estimates and suggesting a new method for evaluating morphological prediction.
Comment: Accepted to Findings of EMNLP 2023; 10 pages, 5 figures
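The comparison in this abstract hinges on surprisal computed over subword pieces. As a minimal sketch of how such word-level estimates are typically derived (this is not the paper's own pipeline; the GPT-2 model, the Hugging Face transformers API, and the Ġ-based word-grouping heuristic are illustrative assumptions), a word's surprisal under a BPE model is the sum of its pieces' surprisals, by the chain rule:

```python
# Sketch: word-level surprisal from BPE subword pieces under a causal LM.
# By the chain rule, P(word | context) = prod_i P(piece_i | context, earlier pieces),
# so word surprisal = sum of subword surprisals.
# Model and grouping heuristic are assumptions, not the paper's setup.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def word_surprisals(sentence: str):
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits
    # Token i is predicted from position i-1; convert nats to bits.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    tok_surprisal = -log_probs.gather(1, ids[0, 1:, None]).squeeze(1) / math.log(2)
    # Group pieces into words: GPT-2 marks word-initial pieces with "Ġ".
    # The first token has no preceding context, so it is skipped.
    pieces = tokenizer.convert_ids_to_tokens(ids[0].tolist())[1:]
    words, totals = [], []
    for piece, s in zip(pieces, tok_surprisal.tolist()):
        if piece.startswith("Ġ") or not words:
            words.append(piece.lstrip("Ġ"))
            totals.append(s)
        else:
            words[-1] += piece  # continuation piece: extend current word
            totals[-1] += s     # and accumulate its surprisal
    return list(zip(words, totals))

print(word_surprisals("The psycholinguist measured reading times."))
```

Summing over pieces is what makes word-level surprisal comparable across segmentation schemes: orthographic, morphological, and BPE tokenizations each yield one number per word that can be regressed against reading times.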
Contextualized Word Embeddings Capture Human-Like Relations Between English Word Senses
CogALex 2020 Submission
Telephone: Evaluating Language Models with Serial Reproduction
Repository for "Evaluating Models of Robust Word Recognition with Serial Reproduction"