39 research outputs found
Large Language Model Augmented Exercise Retrieval for Personalized Language Learning
We study the problem of zero-shot exercise retrieval in the context of online
language learning, to give learners the ability to explicitly request
personalized exercises via natural language. Using real-world data collected
from language learners, we observe that vector similarity approaches poorly
capture the relationship between exercise content and the language that
learners use to express what they want to learn. This semantic gap between
queries and content dramatically reduces the effectiveness of general-purpose
retrieval models pretrained on large-scale information retrieval datasets like
MS MARCO. We leverage the generative capabilities of large language models to
bridge the gap by synthesizing hypothetical exercises based on the learner's
input, which are then used to search for relevant exercises. Our approach,
which we call mHyER, overcomes three challenges: (1) lack of relevance labels
for training, (2) unrestricted learner input content, and (3) low semantic
similarity between input and retrieval candidates. mHyER outperforms several
strong baselines on two novel benchmarks created from crowdsourced data and
publicly available data.
Comment: Presented at Learning Analytics and Knowledge 2024. 11 pages, 4 figures, 5 tables.
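The retrieval idea described in this abstract (generate hypothetical exercises from the learner's request, then search with those instead of the raw request) can be sketched roughly as follows. This is a minimal illustration and not the authors' mHyER implementation: `generate_hypothetical_exercises` is a hypothetical stand-in for a large language model call and returns canned text, `embed` is a toy bag-of-words encoder rather than a trained retrieval model, and the exercise pool is invented for the example.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real text encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def generate_hypothetical_exercises(query):
    """Hypothetical stand-in for an LLM call; returns canned exercises."""
    canned = {
        "I want to practice ordering food": [
            "I would like a coffee please",
            "can I have the menu please",
        ],
    }
    # Fall back to the raw query when nothing is "generated".
    return canned.get(query, [query])


def retrieve(query, exercises, k=1):
    """Rank exercises by their best similarity to any hypothetical exercise."""
    hypotheticals = [embed(h) for h in generate_hypothetical_exercises(query)]
    scored = []
    for exercise in exercises:
        vector = embed(exercise)
        score = max(cosine(h, vector) for h in hypotheticals)
        scored.append((score, exercise))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [exercise for _, exercise in scored[:k]]


# Invented exercise pool, for illustration only.
EXERCISES = [
    "the cat sleeps on the sofa",
    "I would like a tea please",
    "where is the train station",
]

top = retrieve("I want to practice ordering food", EXERCISES)
```

Because the search is run with exercise-like text rather than the request itself, the comparison happens between two items of the same kind, which is the intuition behind bridging the query-content gap.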
Detecting Language Impairments in Autism: A Computational Analysis of Semi-structured Conversations with Vector Semantics
Many of the most significant impairments faced by individuals with autism spectrum disorder (ASD) relate to pragmatic (i.e. social) language. There is also evidence that pragmatic language differences may map to ASD-related genes. Therefore, quantifying the social-linguistic features of ASD has the potential to both improve clinical treatment and help identify gene-behavior relationships in ASD. Here, we apply vector semantics to transcripts of semi-structured interactions with children with both idiopathic and syndromic ASD. We find that children with ASD are less semantically similar to a gold standard derived from typically developing participants, and are more semantically variable. We show that this semantic similarity measure is affected by transcript word length, but that these group differences persist after removing length differences via subsampling. These findings suggest that linguistic signatures of ASD pervade child speech broadly, and can be automatically detected even in less structured interactions.
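The measurement outlined in this abstract (similarity of each transcript to a gold standard built from reference transcripts, with subsampling to equalize length) might look something like the sketch below. The abstract does not specify the vector representation, so plain word-count vectors are assumed here purely for illustration; the function names, the example transcripts, and the subsample size are all hypothetical.

```python
import math
import random
from collections import Counter


def subsample(words, n, rng):
    """Randomly keep n words so that every transcript has the same length."""
    return rng.sample(words, n) if len(words) > n else list(words)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def gold_standard(reference_transcripts):
    """Sum the word-count vectors of all reference transcripts."""
    total = Counter()
    for words in reference_transcripts:
        total.update(words)
    return total


def similarity_to_gold(words, gold, n, seed=0):
    """Similarity of a length-controlled sample of a transcript to the gold vector."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return cosine(Counter(subsample(words, n, rng)), gold)


# Invented toy transcripts, for illustration only.
reference = [
    "we went to the park".split(),
    "the dog ran to the park".split(),
]
gold = gold_standard(reference)

on_topic = "we went to the park with the dog".split()
off_topic = "purple engines calculate quietly".split()

score_on = similarity_to_gold(on_topic, gold, n=4)
score_off = similarity_to_gold(off_topic, gold, n=4)
```

Fixing `n` across transcripts is what removes the length confound: without it, longer transcripts would tend to overlap more with the gold vector simply because they contain more words.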
Verb phrase ellipsis: The view from information structure.
Findings from three experimental studies are presented in support of the hypothesis that the reduced acceptability associated with antecedent mismatch under ellipsis reflects violation of an information structural constraint governing contrastive topic structures, and not an ellipsis-specific licensing constraint as previously assumed. Magnitude estimation data show that the penalty associated with a mismatched antecedent is larger for contrastive topic ellipses as compared to ellipses which exhibit simple (non-contrastive topic) focus. The same pattern of acceptability is also observed for non-ellipsis controls, however. Online reading times indicate increased processing costs associated with antecedent mismatch, and the cost is greater in contrastive topic as compared to simple focus ellipses. Elevated reading times for mismatched contrastive topics are observed throughout the target clause, however, including regions prior to the ellipsis site.
Do successor effects in reading reflect lexical parafoveal processing? Evidence from corpus-based and experimental eye movement data
In the past, most research on eye movements during reading involved a limited number of subjects reading sentences with specific experimental manipulations on target words. Such experiments usually only analyzed eye-movement measures on and around the target word. Recently, some researchers have started collecting larger data sets involving large and diverse groups of subjects reading large numbers of sentences, enabling them to consider a larger number of influences and study larger and more representative subject groups. In such corpus studies, most of the words in a sentence are analyzed. The complexity of the design of corpus studies and the many potentially uncontrolled influences in such studies pose new issues concerning the analysis methods and interpretability of the data. In particular, several corpus studies of reading have found an effect of successor word (n+1) frequency on current word (n) fixation times, while studies employing experimental manipulations tend not to. The general interpretation of corpus studies suggests that readers obtain parafoveal lexical information from the upcoming word before they have finished identifying the current word, while the experimental manipulations cast doubt on this claim. In the present study, we combined a corpus analysis approach with an experimental manipulation (i.e., a parafoveal modification of the moving mask technique, Rayner & Bertera, 1979), so that either (a) word n+1, (b) word n+2, (c) both words, or (d) neither word was masked. We found that denying preview for either or both parafoveal words increased average fixation times. Furthermore, we found successor effects similar to those reported in the corpus studies. Importantly, these successor effects were found even when the parafoveal word was masked, suggesting that apparent successor frequency effects may be due to causes that are unrelated to lexical parafoveal preprocessing.
We discuss the implications of this finding both for parallel and serial accounts of word identification and for the interpretability of large correlational studies of word identification in reading in general.
Bicknell, Levy, & Rayner
This contains data, analysis, and materials for the Bicknell, Levy, and Rayner paper in Psychological Science.
Online expectations for verbal arguments conditional on event knowledge
This paper provides support for the hypothesis that comprehenders form online expectations for upcoming verbal arguments using their knowledge of typical events. We test this hypothesis in a self-paced reading experiment and an experiment measuring event-related brain potentials. In both experiments, we use materials in which the likelihood of the verbal patient depends on event knowledge about the particular combination of agent and verb earlier in the sentence. By manipulating the agent for a given verb, we show that comprehenders experience more processing difficulty in sentences where the patient is less likely. Norming studies and a priming experiment provide evidence that this result is unlikely to have arisen from direct linguistic associations between patient and agent, suggesting that comprehenders use their event knowledge to form expectations.
Eye movements in reading as rational behavior
Moving one's eyes while reading is one of the most complex everyday tasks humans face. To perform efficiently, readers must make decisions about when and where to move their eyes every 200-300 ms. Over the past decades, it has been demonstrated that these fine-grained decisions are influenced by a range of linguistic properties of the text, and measuring eye movements during reading has become one of the primary methods of studying online sentence comprehension. However, it is still largely unclear why linguistic variables affect the eye movement record in the ways they do. The present work begins to answer this question by presenting a rational framework for understanding eye movement control in reading, in which probabilistic language knowledge plays a crucial role. Specifically, the task of reading is taken to be one of sentence identification: readers move their eyes to efficiently obtain visual input, which they combine with probabilistic language knowledge through Bayesian inference to yield posterior beliefs about sentence form and structure. Simulations with implemented models within this framework demonstrate that it can provide a principled account of many aspects of reading behavior, including the influence of a number of linguistic variables. In addition, the framework suggests a novel explanation for one of the least understood aspects of eye movements in reading - regressive eye movements - and we present evidence from an eye tracking corpus to support this proposal.
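The core inference step named in this abstract (combining prior language knowledge with noisy visual input through Bayes' rule to obtain posterior beliefs about the sentence) can be illustrated with a very small sketch. This is not the models from the work itself: the candidate sentences, their prior probabilities, the per-character noise model, and the parameters `accuracy` and `alphabet_size` are all assumptions made up for the example.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a finite set of candidate sentences."""
    unnormalized = {s: p * likelihood(s) for s, p in prior.items()}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


def char_likelihood(position, observed, accuracy=0.9, alphabet_size=27):
    """Assumed noise model: the true character is perceived correctly with
    probability `accuracy`; otherwise any other symbol is equally likely."""

    def likelihood(sentence):
        actual = sentence[position] if position < len(sentence) else " "
        if actual == observed:
            return accuracy
        return (1 - accuracy) / (alphabet_size - 1)

    return likelihood


# Prior beliefs standing in for probabilistic language knowledge (invented values).
prior = {"the cat sat": 0.6, "the cat sit": 0.1, "the car sat": 0.3}

# The reader fixates near character index 6 and perceives a "t".
belief = posterior(prior, char_likelihood(6, "t"))
```

After the observation, most of the probability mass that was on "the car sat" moves to the two candidates with a "t" at that position, while the relative ranking of those two is still set by the prior. A reader whose remaining uncertainty concerns an earlier position would, under this kind of account, have a reason to look back at it.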