
    How acoustically reduced forms activate the lexicon : evidence from eye-tracking

    Most research on spoken word comprehension has focused on carefully articulated speech that is read aloud by selected speakers (Cutler, 1998). But the type of speech we most often encounter is spontaneous speech, in which no attention is paid to careful pronunciation. The production of a word in a form shorter than its citation form is called reduction, and reduction is highly frequent in casual speech (Ernestus, 2000; Johnson, 2004). The challenge for models of word comprehension is to explain how listeners recognize reduced forms such as [pjutǝr], which deviates drastically from its canonical counterpart [kɔmpjutǝr] 'computer'.

    Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge

    Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text-based, even though the human sensory experience is much richer. In this paper we create visually grounded word embeddings by combining English text and images and compare them to popular text-based methods, to see if visual information allows our model to better capture cognitive aspects of word meaning. Our analysis shows that visually grounded embedding similarities are more predictive of the human reaction times in a large priming experiment than the purely text-based embeddings. The visually grounded embeddings also correlate well with human word similarity ratings. Importantly, in both experiments we show that the grounded embeddings account for a unique portion of explained variance, even when we include text-based embeddings trained on huge corpora. This shows that visual grounding allows our model to capture information that cannot be extracted using text as the only source of information.
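
    The claim that grounded embeddings explain a unique portion of variance can be illustrated with nested regressions: fit a model with text-based similarities only, then add grounded similarities and measure the gain in R². The sketch below assumes plain ordinary least squares on per-item similarity scores; the paper's actual statistical model (for example, a mixed-effects regression on priming reaction times) may differ, and the helper names `r_squared` and `unique_variance` are hypothetical.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def unique_variance(text_sim: np.ndarray, grounded_sim: np.ndarray, y: np.ndarray) -> float:
    """Extra R^2 gained by adding grounded similarities to a text-only model (hypothetical helper)."""
    base = r_squared(text_sim.reshape(-1, 1), y)
    full = r_squared(np.column_stack([text_sim, grounded_sim]), y)
    return full - base


if __name__ == "__main__":
    # Synthetic data for illustration only; these are not results from the paper.
    rng = np.random.default_rng(0)
    n = 200
    text_sim = rng.normal(size=n)
    grounded_sim = rng.normal(size=n)
    rt = 600 - 20 * text_sim - 15 * grounded_sim + rng.normal(scale=5, size=n)
    print(f"unique variance of grounded similarities: {unique_variance(text_sim, grounded_sim, rt):.3f}")
```

    If the grounded similarities merely duplicated the text-based ones, the gain would be close to zero; a clearly positive gain is what "unique portion of explained variance" refers to.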

    Semantic sentence similarity: size does not always matter

    This study addresses the question of whether visually grounded speech recognition (VGS) models learn to capture sentence semantics without access to any prior linguistic knowledge. We produce synthetic and natural spoken versions of a well-known semantic textual similarity database and show that our VGS model produces embeddings that correlate well with human semantic similarity judgements. Our results show that a model trained on a small image-caption database outperforms two models trained on much larger databases, indicating that database size is not all that matters. We also investigate the importance of having multiple captions per image and find that this is indeed helpful even if the total number of images is lower, suggesting that paraphrasing is a valuable learning signal. While the general trend in the field is to create ever larger datasets to train models on, our findings indicate that other characteristics of the database can be just as important. Comment: This paper has been accepted at Interspeech 2021, where it will be presented and appear in the conference proceedings in September 202
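
    Correlation with human similarity judgements is commonly computed by taking the cosine similarity between the two embeddings of each sentence pair and rank-correlating those scores with the human ratings. The sketch below assumes each spoken sentence has already been encoded as a fixed-length vector by a VGS model; Spearman's rho is a common choice on semantic textual similarity benchmarks but is an assumption here, as are the helper names.

```python
import numpy as np


def pairwise_cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n_pairs, dim) embedding matrices."""
    dots = (a * b).sum(axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dots / norms


def average_ranks(x: np.ndarray) -> np.ndarray:
    """1-based ranks, with tied values sharing the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1  # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

    With real data, `model_scores = pairwise_cosine(emb_left, emb_right)` would be compared against the human ratings via `spearman(model_scores, human_scores)`, where the two embedding matrices (hypothetical names) hold the encodings of the first and second sentence of each pair.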

    Probabilistic effects on French [t] duration

    The present study shows that [t] consonants are affected by probabilistic factors in a syllable-timed language such as French, in spontaneous as well as in journalistic speech. Study 1 showed a word bigram frequency effect in spontaneous French, but its exact nature depended on the corpus on which the probabilistic measures were based. Study 2 investigated journalistic speech and showed an effect of the joint frequency of the test word and its following word. We discuss the possibility that these probabilistic effects are due to the speaker's planning of upcoming words and to the speaker's adaptation to the listener's needs.
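
    Bigram measures of this kind can be computed from a tokenized corpus. The sketch below assumes simple whitespace tokenization and raw counts without smoothing or per-million normalization; it returns the joint frequency of a word and its following word together with the conditional probability of the following word. Running it on two different corpora illustrates why such measures, and hence the observed effects, can depend on the corpus used. The function name `bigram_measures` and the toy sentences are hypothetical.

```python
from collections import Counter


def bigram_measures(tokens: list[str], word: str, next_word: str) -> tuple[int, float]:
    """Return (joint frequency of word + next_word, P(next_word | word)) from raw counts."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Count `word` only where a following token exists, so the conditional probabilities sum to 1.
    contexts = Counter(tokens[:-1])
    joint = bigrams[(word, next_word)]
    conditional = joint / contexts[word] if contexts[word] else 0.0
    return joint, conditional


if __name__ == "__main__":
    # Toy corpora for illustration; not the corpora used in the study.
    spontaneous = "il est petit il est grand tout est petit".split()
    journalistic = "le ministre est venu le ministre est parti".split()
    for name, corpus in [("spontaneous", spontaneous), ("journalistic", journalistic)]:
        joint, cond = bigram_measures(corpus, "est", "petit")
        print(f"{name}: joint={joint}, P(petit | est)={cond:.2f}")
```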

    Prosodic structure affects the production and perception of voice-assimilated German fricatives

    Prosodic structure has long been known to constrain phonological processes [1]. More recently, it has also been recognized as a source of fine-grained phonetic variation of speech sounds. In particular, segments in domain-initial position undergo prosodic strengthening [2, 3], which also implies more resistance to coarticulation in higher prosodic domains [5]. The present study investigates the combined effects of prosodic strengthening and assimilatory devoicing on word-initial fricatives in German, the functional implications of both processes for cues to the fortis-lenis contrast, and the influence of prosodic structure on listeners’ compensation for assimilation. Results indicate that (1) prosodic structure modulates duration and the degree of assimilatory devoicing, (2) phonological contrasts are maintained by speakers but differ in phonetic detail across prosodic domains, and (3) compensation for assimilation in perception is moderated by prosodic structure and lexical constraints.

    Compensation for assimilatory devoicing and prosodic structure in German fricative perception

    An important source of phonetic variation in German fricatives is progressive voice assimilation: the lenis fricatives /v/ and /z/ are devoiced after /t/ across word boundaries. This process is gradient and moderated by prosodic structure: fricatives are more devoiced after smaller prosodic boundaries. We present three phoneme identification experiments investigating how listeners deal with assimilatory devoicing and its prosodic conditioning. Fully voiced, partially devoiced, and completely devoiced fricatives had to be identified as fortis or lenis in different segmental (assimilation versus non-assimilation context) and prosodic (after a word versus a phrase boundary) environments. Results indicate that (1) listeners compensate for assimilatory devoicing by judging partially devoiced fricatives more often as lenis in assimilation context than in non-assimilation context; (2) prosodic structure plays a role in compensation for assimilation: more devoiced fricatives are more often judged as lenis after word boundaries than after phrase boundaries in assimilation context; and (3) the influence of prosody is constrained by lexical effects: we found prosodic conditioning of compensation for the devoicing of /v/ (contrasting with /f/), but not of /z/. These findings suggest that an on-line prosodic analysis of spoken language contributes to the resolution of lexical ambiguity arising from progressive voice assimilation. This research was supported by a grant from the Max-Planck-Gesellschaft zur Förderung der Wissenschaften, München, Germany.

    Perceptual compensation for voice assimilation of German fricatives

    In German, word-initial lax fricatives may be produced with substantially reduced glottal vibration after voiceless obstruents. This assimilation occurs more frequently and to a larger extent across prosodic word boundaries than across phrase boundaries. Assimilatory devoicing makes the fricatives more similar to their tense counterparts and could thus hinder word recognition. The present study investigates how listeners cope with assimilatory devoicing. Results of a cross-modal priming experiment indicate that listeners compensate for assimilation in appropriate contexts. Prosodic structure moderates compensation for assimilation: compensation occurs especially after phrase boundaries, where devoiced fricatives are sufficiently long to be confused with their tense counterparts. This research was supported by the Max-Planck-Gesellschaft zur Förderung der Wissenschaften. We thank Jonathan Harrington and Ernst Dombrowski from the IPDS Kiel for providing facilities for data collection, and Petra van Alphen, Mirjam Broersma and James McQueen for helpful comments.

    Speech register influences listeners’ word expectations

    We used the N400 effect to investigate the influence of speech register on predictive language processing. Participants listened to long stretches (4–15 min) of naturalistic speech from different registers (dialogues, news broadcasts, and read-aloud books), totalling approximately 50,000 words, while the EEG signal was recorded. We estimated the surprisal of words in the speech materials with a statistical language model, configured to reflect three different predictive processing strategies: generic, register-specific, or recency-based. The N400 amplitude was best predicted by register-specific word surprisal, indicating that the statistics of the wider context (i.e., register) influence predictive language processing. Furthermore, adaptation to speech register cannot merely be explained by recency effects; instead, listeners adapt their word anticipations to the presented speech register.
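
    Word surprisal is the negative log probability of a word given its context, -log2 P(w_i | context). The abstract does not specify which language model was used, so the sketch below swaps in a simple add-one smoothed bigram model; training one instance on a single register and another on all registers pooled approximates the contrast between register-specific and generic surprisal. The recency-based strategy is not shown, and the class name `BigramLM` and the toy texts are hypothetical.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one (Laplace) smoothed bigram model; a stand-in, not the study's language model."""

    def __init__(self, tokens: list[str]):
        self.contexts = Counter(tokens[:-1])              # words that have a successor
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.vocab_size = len(set(tokens)) + 1            # +1 slot for unseen words

    def prob(self, prev: str, word: str) -> float:
        """Smoothed P(word | prev)."""
        return (self.bigrams[(prev, word)] + 1) / (self.contexts[prev] + self.vocab_size)

    def surprisal(self, prev: str, word: str) -> float:
        """Surprisal in bits: -log2 P(word | prev)."""
        return -math.log2(self.prob(prev, word))


if __name__ == "__main__":
    # Toy register samples for illustration only.
    news = "the minister said the minister".split()
    dialogue = "yeah the thing is the thing".split()
    register_lm = BigramLM(news)            # register-specific: news only
    # Generic: registers pooled (concatenation adds one spurious boundary bigram; fine for a toy).
    generic_lm = BigramLM(news + dialogue)
    for name, lm in [("register-specific", register_lm), ("generic", generic_lm)]:
        print(f"{name}: {lm.surprisal('the', 'minister'):.2f} bits")
```

    For a news word such as "minister", the register-specific model assigns lower surprisal than the pooled model, which is the kind of difference the N400 analysis compares.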

    Formant transitions in fricative identification: The role of native fricative inventory

    The distribution of energy across the noise spectrum provides the primary cues for the identification of a fricative. Formant transitions have been reported to play a role in the identification of some fricatives, but the combined results so far are conflicting. We report five experiments testing the hypothesis that listeners differ in their use of formant transitions as a function of the presence of spectrally similar fricatives in their native language. Dutch, English, German, Polish, and Spanish native listeners performed phoneme monitoring experiments with pseudowords containing either coherent or misleading formant transitions for the fricatives /s/ and /f/. Listeners of German and Dutch, both languages without spectrally similar fricatives, were not affected by the misleading formant transitions. Listeners of the remaining languages were misled by incorrect formant transitions. In an untimed labeling experiment, both Dutch and Spanish listeners provided goodness ratings that revealed sensitivity to the acoustic manipulation. We conclude that all listeners may be sensitive to mismatching information at a low auditory level, but that they do not necessarily take full advantage of all available systematic acoustic variation when identifying phonemes. Formant transitions may be most useful for listeners of languages with spectrally similar fricatives.