77 research outputs found
How acoustically reduced forms activate the lexicon : evidence from eye-tracking
Most research on spoken word comprehension has focused on carefully articulated speech that is read aloud
by selected speakers (Cutler, 1998). But the type of speech we most often encounter is spontaneous speech,
in which no attention is paid to careful pronunciation. The production of a word shorter than its citation form
is called reduction, which is highly frequent in casual speech (Ernestus, 2000; Johnson, 2004). The challenge
for models of word comprehension is to explain how listeners recognize reduced forms such as [pjutǝr],
which deviate drastically from their canonical counterpart [kɔmpjutǝr] 'computer'. Peer-reviewed.
Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge
Distributional semantic models capture word-level meaning that is useful in
many natural language processing tasks and have even been shown to capture
cognitive aspects of word meaning. The majority of these models are purely text
based, even though the human sensory experience is much richer. In this paper
we create visually grounded word embeddings by combining English text and
images and compare them to popular text-based methods, to see if visual
information allows our model to better capture cognitive aspects of word
meaning. Our analysis shows that visually grounded embedding similarities are
more predictive of the human reaction times in a large priming experiment than
the purely text-based embeddings. The visually grounded embeddings also
correlate well with human word similarity ratings. Importantly, in both
experiments we show that the grounded embeddings account for a unique portion
of explained variance, even when we include text-based embeddings trained on
huge corpora. This shows that visual grounding allows our model to capture
information that cannot be extracted using text as the only source of
information.
Semantic sentence similarity: size does not always matter
This study addresses the question of whether visually grounded speech
recognition (VGS) models learn to capture sentence semantics without access to
any prior linguistic knowledge. We produce synthetic and natural spoken
versions of a well known semantic textual similarity database and show that our
VGS model produces embeddings that correlate well with human semantic
similarity judgements. Our results show that a model trained on a small
image-caption database outperforms two models trained on much larger databases,
indicating that database size is not all that matters. We also investigate the
importance of having multiple captions per image and find that this is indeed
helpful even if the total number of images is lower, suggesting that
paraphrasing is a valuable learning signal. While the general trend in the
field is to create ever larger datasets to train models on, our findings
indicate that other characteristics of the database can be just as important. Comment: This paper has been accepted at Interspeech 2021 where it will be
presented and appear in the conference proceedings in September 202
Probabilistic effects on French [t] duration
The present study shows that [t] consonants are affected by probabilistic factors in a syllable-timed language such as French, in spontaneous as well as in journalistic speech. Study 1 showed a word bigram frequency effect in spontaneous French, but its exact nature depended on the corpus on which the probabilistic measures were based. Study 2 investigated journalistic speech and showed an effect of the joint frequency of the test word and its following word. We discuss the possibility that these probabilistic effects are due to the speaker's planning of upcoming words, and to the speaker's adaptation to the listener's needs.
Prosodic structure affects the production and perception of voice-assimilated German fricatives
Prosodic structure has long been known to constrain
phonological processes [1]. More recently, it has also been
recognized as a source of fine-grained phonetic variation of
speech sounds. In particular, segments in domain-initial
position undergo prosodic strengthening [2, 3], which also
implies more resistance to coarticulation in higher prosodic
domains [5]. The present study investigates the combined
effects of prosodic strengthening and assimilatory devoicing
on word-initial fricatives in German, the functional
implication of both processes for cues to the fortis-lenis
contrast, and the influence of prosodic structure on listeners’
compensation for assimilation. Results indicate that 1.
prosodic structure modulates duration and the degree of
assimilatory devoicing, 2. phonological contrasts are
maintained by speakers, but differ in phonetic detail across
prosodic domains, and 3. compensation for assimilation in
perception is moderated by prosodic structure and lexical
constraints. Peer-reviewed.
Compensation for assimilatory devoicing and prosodic structure in German fricative perception
An important source of phonetic variation in German fricatives is progressive voice
assimilation: the lenis fricatives /v/ and /z/ are devoiced after /t/ across word
boundaries. This process is gradient and moderated by prosodic structure: fricatives
are more devoiced after smaller prosodic boundaries.
We present three phoneme identification experiments, investigating how
listeners deal with assimilatory devoicing and its prosodic conditioning. Fully voiced,
partially devoiced and completely devoiced fricatives had to be identified as fortis or
lenis in different segmental (assimilation versus non-assimilation context) and
prosodic (after a word versus a phrase boundary) environments. Results indicate that
1. listeners compensate for assimilatory devoicing in judging partially devoiced
fricatives more often as lenis in assimilation context than in non-assimilation context;
2. prosodic structure plays a role in compensation for assimilation: more devoiced
fricatives are more often judged as lenis after word boundaries than after phrase
boundaries in assimilation context, and 3. the influence of prosody is constrained by
lexical effects: we found prosodic conditioning of compensation for the devoicing of
/v/, contrasting with /f/, but not of /z/. These findings suggest that an on-line prosodic
analysis of spoken language contributes to the resolution of lexical ambiguity arising
from progressive voice assimilation. This research was supported by a grant from the Max-Planck-
Gesellschaft zur Förderung der Wissenschaften, München, Germany. Peer-reviewed.
Perceptual compensation for voice assimilation of German fricatives
This research was supported by the Max Planck Gesellschaft zur Förderung der Wissenschaften. We thank Jonathan Harrington and Ernst Dombrowski from the IPDS Kiel for providing facilities for data collection, and Petra van Alphen, Mirjam Broersma and James McQueen for helpful comments. In German, word-initial lax fricatives may be produced with substantially reduced glottal vibration after voiceless obstruents. This assimilation occurs more frequently and to a larger extent across prosodic word boundaries than across phrase boundaries. Assimilatory devoicing makes the fricatives more similar to their tense counterparts and could thus hinder word recognition. The present study investigates how listeners cope with assimilatory devoicing. Results of a cross-modal priming experiment indicate that listeners compensate for assimilation in appropriate contexts. Prosodic structure moderates compensation for assimilation: compensation occurs especially after phrase boundaries, where devoiced fricatives are sufficiently long to be confused with their tense counterparts. Peer-reviewed.
Speech register influences listeners’ word expectations
We used the N400 effect to investigate the influence of speech register on predictive language processing. Participants listened to long stretches (4–15 min) of naturalistic speech from different registers (dialogues, news broadcasts, and read-aloud books), totalling approximately 50,000 words, while the EEG signal was recorded. We estimated the surprisal of words in the speech materials with the aid of a statistical language model in such a manner that it reflected different predictive processing strategies: generic, register-specific, or recency-based. The N400 amplitude was best predicted by register-specific word surprisal, indicating that the statistics of the wider context (i.e., register) influence predictive language processing. Furthermore, adaptation to speech register cannot be explained by recency effects alone; instead, listeners adapt their word anticipations to the presented speech register.
Formant transitions in fricative identification: The role of native fricative inventory
The distribution of energy across the noise spectrum provides the primary cues for the identification of a fricative. Formant transitions have been reported to play a role in identification of some fricatives, but the combined results so far are conflicting. We report five experiments testing the hypothesis that listeners differ in their use of formant transitions as a function of the presence of spectrally similar fricatives in their native language. Dutch, English, German, Polish, and Spanish native listeners performed phoneme monitoring experiments with pseudowords containing either coherent or misleading formant transitions for the fricatives /s/ and /f/. Listeners of German and Dutch, both languages without spectrally similar fricatives, were not affected by the misleading formant transitions. Listeners of the remaining languages were misled by incorrect formant transitions. In an untimed labeling experiment both Dutch and Spanish listeners provided goodness ratings that revealed sensitivity to the acoustic manipulation. We conclude that all listeners may be sensitive to mismatching information at a low auditory level, but that they do not necessarily take full advantage of all available systematic acoustic variation when identifying phonemes. Formant transitions may be most useful for listeners of languages with spectrally similar fricatives.