
    Perceptually relevant speech tracking in auditory and motor cortex reflects distinct linguistic features

    During online speech processing, our brain tracks the acoustic fluctuations in speech at different timescales. Previous research has focused on generic timescales (for example, delta or theta bands) that are assumed to map onto linguistic features such as prosody or syllables. However, given the high intersubject variability in speaking patterns, such a generic association between the timescales of brain activity and speech properties can be ambiguous. Here, we analyse speech tracking in source-localised magnetoencephalographic data by directly focusing on timescales extracted from statistical regularities in our speech material. This revealed widespread significant tracking at the timescales of phrases (0.6–1.3 Hz), words (1.8–3 Hz), syllables (2.8–4.8 Hz), and phonemes (8–12.4 Hz). Importantly, when examining its perceptual relevance, we found stronger tracking for correctly comprehended trials in the left premotor (PM) cortex at the phrasal scale as well as in left middle temporal cortex at the word scale. Control analyses using generic bands confirmed that these effects were specific to the speech regularities in our stimuli. Furthermore, we found that the phase at the phrasal timescale coupled to power at beta frequency (13–30 Hz) in motor areas. This cross-frequency coupling presumably reflects top-down temporal prediction in ongoing speech perception. Together, our results reveal specific functional and perceptually relevant roles of distinct tracking and cross-frequency processes along the auditory–motor pathway.
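The phase-to-power coupling reported here is an instance of phase-amplitude coupling analysis. A minimal sketch of one common coupling measure (the mean-vector-length modulation index) is shown below; the band edges come from the abstract, while the filter design and the synthetic test signals are assumptions, not the authors' actual pipeline:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def bandpass(x, lo, hi, fs):
    """Zero-phase band-pass filter."""
    sos = butter(2, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def modulation_index(x, fs, phase_band=(0.6, 1.3), amp_band=(13.0, 30.0)):
    """Mean-vector-length coupling between phrasal-scale (0.6-1.3 Hz)
    phase and beta-band (13-30 Hz) amplitude; returns a value in [0, 1]."""
    phase = np.angle(hilbert(bandpass(x, *phase_band, fs)))
    amp = np.abs(hilbert(bandpass(x, *amp_band, fs)))
    # Length of the amplitude-weighted mean phase vector, normalised.
    return np.abs(np.sum(amp * np.exp(1j * phase))) / np.sum(amp)

# Synthetic check: beta whose amplitude follows a 1 Hz phase couples;
# beta with constant amplitude does not.
fs = 200.0
t = np.arange(0, 60, 1 / fs)
slow = np.sin(2 * np.pi * 1.0 * t)
coupled = slow + (1 + slow) * np.sin(2 * np.pi * 20.0 * t)
uncoupled = slow + np.sin(2 * np.pi * 20.0 * t)
```

In practice the input would be a source-localised MEG time course and significance would be assessed against surrogate data; this sketch only illustrates the measure itself.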

    On segments and syllables in the sound structure of language: Curve-based approaches to phonology and the auditory representation of speech.

    http://msh.revues.org/document7813.html
    Recent approaches to the syllable reintroduce continuous, mathematically describable accounts of sound objects conceived as "curves". Psycholinguistic research on spoken-language perception usually refers to symbolic and highly hierarchized accounts of the syllable which strongly differentiate segments (phones) from syllables. Recent work on the auditory bases of speech perception demonstrates the ability of listeners to extract phonetic information even when strong degradations of the speech signal have been produced in the spectro-temporal domain. Implications of these observations for the modelling of syllables in the fields of speech perception and phonology are discussed.

    Local Temporal Regularities in Child-Directed Speech in Spanish

    Published online: Oct 4, 2022. Purpose: The purpose of this study is to characterize the local (utterance-level) temporal regularities of child-directed speech (CDS) that might facilitate phonological development in Spanish, classically termed a syllable-timed language. Method: Eighteen female adults addressed their 4-year-old children versus other adults spontaneously and also read aloud (CDS vs. adult-directed speech [ADS]). We compared CDS and ADS speech productions using a spectrotemporal model (Leong & Goswami, 2015), obtaining three temporal metrics: (a) distribution of modulation energy, (b) temporal regularity of stressed syllables, and (c) syllable rate. Results: CDS was characterized by (a) significantly greater modulation energy in the lower frequencies (0.5–4 Hz), (b) more regular rhythmic occurrence of stressed syllables, and (c) a slower syllable rate than ADS, across both the spontaneous and read conditions. Discussion: CDS is characterized by a robust local temporal organization (i.e., within utterances), with amplitude modulation bands aligning with delta and theta electrophysiological frequency bands, respectively, showing greater phase synchronization than in ADS and facilitating the parsing of stress units and syllables. These temporal regularities, together with the slower production rate of CDS, might support the automatic extraction of phonological units in speech and hence support the phonological development of children.
Supplemental Material: https://doi.org/10.23641/asha.21210893
This study was supported by Formación de Personal Investigador Grant BES-2016-078125, awarded to Jose Pérez-Navarro by the Ministerio Español de Economía, Industria y Competitividad and the Fondo Social Europeo; through Project RTI2018-096242-B-I00 (Ministerio de Ciencia, Innovación y Universidades [MCIU]/Agencia Estatal de Investigación [AEI]/Fondo Europeo de Desarrollo Regional [FEDER], Unión Europea), funded by the MCIU, the AEI, and FEDER, awarded to Marie Lallier; by the Basque Government through the Basque Excellence Research Centre 2018-2021 Program; and by the Spanish State Research Agency through the Basque Center on Cognition, Brain and Language Severo Ochoa Excellence Accreditation SEV-2015-0490. We thank the participants and their children for their voluntary contribution to our study.
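The first of the three temporal metrics, band-restricted modulation energy, can be illustrated with a minimal envelope-spectrum computation. This is not the Leong & Goswami S-AMPH model (which uses a spectro-temporal filterbank); the 0.5–4 Hz band is taken from the abstract, and the synthetic test signal is an assumption:

```python
import numpy as np
from scipy.signal import hilbert

def modulation_energy(signal, fs, band=(0.5, 4.0)):
    """Fraction of amplitude-envelope modulation energy within `band` (Hz)."""
    env = np.abs(hilbert(signal))          # amplitude envelope
    env = env - env.mean()                 # remove the DC component
    spec = np.abs(np.fft.rfft(env)) ** 2   # envelope power spectrum
    freqs = np.fft.rfftfreq(len(env), 1 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return spec[in_band].sum() / spec.sum()

# A carrier modulated at 2 Hz concentrates envelope energy in 0.5-4 Hz.
fs = 8000
t = np.arange(0, 5, 1 / fs)
x = (1 + 0.9 * np.sin(2 * np.pi * 2 * t)) * np.sin(2 * np.pi * 500 * t)
```

For real CDS/ADS recordings the envelope would be extracted per channel of a cochlear-style filterbank before computing band energies; the single-envelope version above only shows the principle of the metric.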

    Phonological complexity, segment rate and speech tempo perception

    Studies of speech tempo commonly use syllable or segment rate as a proxy measure for perceived tempo. In languages whose phonologies allow substantial syllable complexity these measures can produce figures on quite different scales; however, little is known about the correlation between syllable and segment rate measurements on the one hand and naïve listeners’ tempo judgements on the other. We follow up on the findings of one relevant study on German [1], which suggest that listeners attend to both syllable and segment rates in making tempo estimates, through a weighted average of the rates in which syllable rate carries more weight. We report on an experiment in which we manipulate phonological complexity in English utterance pairs that are constant in syllable rate. Listeners decide for each pair which utterance sounds faster. Our results suggest that differences in segment rate that do not correspond to differences in syllable rate have little impact on perceived speech tempo in English

    Listeners’ sensitivity to syllable complexity in spontaneous speech tempo perception

    Studies of speech tempo commonly use syllable or segment rate as a proxy measure for perceived tempo. While listeners’ sensitivity to syllable rate is well established [1-4], evidence for listeners’ additional sensitivity to segment rate--that is, to syllable complexity alongside syllable rate--is as yet lacking. In [5, 6] we reported experiments that yielded no evidence of listeners orienting to segment rate differences between stimuli with the same syllable rate. In those experiments, we kept syllable rate constant by equalizing phrase durations. As phrase duration is a temporal parameter separate from syllable rate, we must complement this work with experiments using less homogeneous stimulus sets. In this paper we report on an experiment that uses stimuli selected from a corpus of spontaneous British English speech. Within crucial subsets there was minimal variation in one of the two rates (syllable or segment) and substantial variation in the other. Stimulus duration varied independently. Listeners ranked stimuli for perceived tempo. Results suggest that, faced with these more variable stimuli, listeners do orient to segment rate when ranking stimuli with near-identical syllable rates--presumably reflecting the influence of syllable complexity. Moreover, stimulus duration emerges as a separate factor influencing listeners’ rankings, alongside f0 and intensity.

    Rhythmic unit extraction and modelling for automatic language identification

    This paper deals with an approach to Automatic Language Identification based on rhythmic modelling. Besides phonetics and phonotactics, rhythm is one of the most promising features to consider for language identification, even if its extraction and modelling are not a straightforward issue. Indeed, one of the main problems to address is what to model. In this paper, a rhythm extraction algorithm is described: using a vowel detection algorithm, rhythmic units related to syllables are segmented. Several parameters are extracted (consonantal and vowel duration, cluster complexity) and modelled with a Gaussian mixture. Experiments are performed on read speech for 7 languages (English, French, German, Italian, Japanese, Mandarin and Spanish), and results reach up to 86 ± 6% correct discrimination between stress-timed, mora-timed and syllable-timed classes of languages, and 67 ± 8% correct language identification on average for the 7 languages with utterances of 21 seconds. These results are commented on and compared with those obtained with a standard acoustic Gaussian mixture modelling approach (88 ± 5% correct identification for the 7-language identification task).
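The modelling step (one Gaussian mixture per rhythm class over duration and complexity features) can be sketched as follows. The feature values here are entirely synthetic stand-ins for the output of the vowel-detection stage, and the two-component mixtures are an illustrative choice, not the paper's configuration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical per-unit features: [consonant duration (s), vowel duration (s),
# cluster complexity]; real values would come from the vowel detector.
def sample(mean, n=300):
    return rng.normal(mean, [0.02, 0.02, 0.3], size=(n, 3))

train = {
    "stress-timed":   sample([0.09, 0.07, 2.2]),
    "syllable-timed": sample([0.06, 0.08, 1.3]),
    "mora-timed":     sample([0.05, 0.09, 1.0]),
}

# Fit one Gaussian mixture per rhythm class, as in the modelling step.
models = {label: GaussianMixture(n_components=2, random_state=0).fit(feats)
          for label, feats in train.items()}

def classify(feats):
    """Pick the class whose mixture assigns the highest mean log-likelihood."""
    scores = {lbl: m.score(feats) for lbl, m in models.items()}
    return max(scores, key=scores.get)
```

An unseen utterance is then classified by pooling the likelihoods of all its rhythmic units, e.g. `classify(sample([0.09, 0.07, 2.2], 100))` for a stress-timed-like sample.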

    Does training with amplitude modulated tones affect tone-vocoded speech perception?

    Temporal-envelope cues are essential for successful speech perception. We asked here whether training on stimuli containing temporal-envelope cues but no speech content can improve the perception of spectrally degraded (vocoded) speech, in which the temporal envelope (but not the temporal fine structure) is largely preserved. Two groups of listeners were trained on different amplitude-modulation (AM) based tasks, either AM detection or AM-rate discrimination (21 blocks of 60 trials over two days, 1260 trials; frequencies: 4 Hz, 8 Hz, and 16 Hz), while an additional control group did not undertake any training. Consonant identification in vocoded vowel-consonant-vowel stimuli was tested before and after training on the AM tasks (or at an equivalent time interval for the control group). Following training, only the trained groups showed a significant improvement in the perception of vocoded speech, but the improvement did not differ significantly from that observed for controls. Thus, we find no convincing evidence that this amount of training with temporal-envelope cues lacking speech content provides a significant benefit for vocoded speech intelligibility. Alternative training regimens using vocoded speech along the linguistic hierarchy should be explored.
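Training stimuli of this kind are typically sinusoidally amplitude-modulated (SAM) tones, which can be generated in a few lines. The modulation rates come from the abstract; the carrier frequency, duration and modulation depth below are assumptions for illustration:

```python
import numpy as np

def am_tone(mod_rate, depth=1.0, carrier=1000.0, dur=1.0, fs=44100):
    """Sinusoidally amplitude-modulated pure tone (SAM tone).

    mod_rate: modulation rate in Hz (e.g. 4, 8 or 16, as in the AM tasks).
    Carrier frequency, depth and duration are hypothetical defaults.
    """
    t = np.arange(int(dur * fs)) / fs
    envelope = 1.0 + depth * np.sin(2 * np.pi * mod_rate * t)
    x = envelope * np.sin(2 * np.pi * carrier * t)
    return x / np.max(np.abs(x))   # normalise to unit peak amplitude

# An AM-rate discrimination trial would contrast, e.g., a 4 Hz and an 8 Hz tone.
standard, comparison = am_tone(4.0), am_tone(8.0)
```

In an AM-detection task the comparison stimulus would instead be the unmodulated carrier (`depth=0`), with depth varied adaptively to track threshold.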

    Entraining the Brain: Applications to Language Research and Links to Musical Entrainment

    Clayton’s paper provides a clear and accessible summary of the significance of entrainment for music making, and for human behaviour in general. He notes the central role of metrical structure in musical entrainment, the possible role of oscillatory neural activity, and the core notion of phase alignment. Here I show how these same factors are central to speech processing by the human brain. I argue that entrainment to metrical structure is core to linguistic as well as musical human behaviour. I illustrate this view using entrainment data from developmental dyslexia. The core role of entrainment in efficient speech processing suggests that language difficulties in childhood may benefit from music-based remediation that focuses on multi-modal rhythmic entrainment. Alignment of linguistic and musical metrical structure seems likely to be fundamental to successful remediation

    The Evolution of Rhythm Processing

    Behavioral and brain rhythms in the millisecond-to-second range are central in human music, speech, and movement. A comparative approach can further our understanding of the evolution of rhythm processing by identifying behavioral and neural similarities and differences across cognitive domains and across animal species. We provide an overview of research into rhythm cognition in music, speech, and animal communication. Rhythm has received considerable attention within each individual field, but to date, little integration. This review article on rhythm processing incorporates and extends existing ideas on temporal processing in speech and music and offers suggestions about the neural, biological, and evolutionary bases of human abilities in these domains

    The effect of literacy on the speech temporal modulation structure

    The temporal modulation structure of adult-directed speech is conceptualised as a modulation hierarchy comprising four temporal bands: delta (1–3 Hz), theta (4–8 Hz), beta (15–30 Hz) and low gamma (30–50 Hz). Neuronal oscillatory entrainment to amplitude modulations (AMs) in these four bands may provide a basis for encoding speech and parsing the continuous signal into linguistic units (delta – syllable stress patterns, theta – syllables, beta – onset-rime units, low gamma – phonetic information). While adult-directed speech is theta-dominant and shows tighter theta–beta/low-gamma phase alignment, infant-directed speech is delta-dominant and shows tighter delta–theta phase alignment. Although this change in speech representations could be maturational, we hypothesized that literacy may also influence the structure of speech. In fact, literacy and schooling are known to change auditory speech entrainment, enhancing phonemic specification and augmenting the phonological detail of lexical representations. We therefore hypothesized that a corresponding difference in speech production could also emerge. In this work, spontaneous speech samples were recorded from literate (with lower and higher literacy) and illiterate subjects, and their energy modulation spectra across delta, theta and beta/low-gamma AMs, as well as the phase synchronization between nested AMs, were analysed. Measures of the participants’ phonology skills and vocabulary were also collected, and a specific task was conducted to confirm the sensitivity of the analysis method used (S-AMPH) to speech rhythm. Results showed no differences in the energy of the delta, theta or beta/low-gamma AMs in spontaneous speech. However, phase alignment between slower and faster speech AMs was significantly enhanced by literacy, showing moderately strong correlations with the phonology measures and literacy. Our data suggest that literacy affects not only cortical entrainment and speech perception but also the physical/rhythmic properties of speech production.
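Phase synchronization between nested AM bands is commonly quantified as an n:m phase-locking value (PLV). A minimal sketch for the delta–theta pairing follows; the band edges and the synthetic envelopes are assumptions, not the S-AMPH implementation:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_phase(x, lo, hi, fs):
    """Instantaneous phase of x within a band, via zero-phase filtering."""
    sos = butter(2, [lo, hi], btype="band", fs=fs, output="sos")
    return np.angle(hilbert(sosfiltfilt(sos, x)))

def nm_plv(x, fs, slow=(1.0, 3.0), fast=(4.0, 8.0), n=1, m=2):
    """n:m phase-locking value between two AM bands of the same envelope.

    PLV of the generalised phase difference m*phi_slow - n*phi_fast;
    1 = perfect nesting, 0 = no consistent phase relation.
    """
    phi_s = band_phase(x, *slow, fs)
    phi_f = band_phase(x, *fast, fs)
    return np.abs(np.mean(np.exp(1j * (m * phi_s - n * phi_f))))

# Synthetic envelopes: a 4 Hz component harmonically locked to 2 Hz
# synchronizes (1:2); a 5 Hz component does not.
fs = 100
t = np.arange(0, 60, 1 / fs)
locked = np.sin(2 * np.pi * 2 * t) + np.sin(2 * np.pi * 4 * t + 1.0)
unlocked = np.sin(2 * np.pi * 2 * t) + np.sin(2 * np.pi * 5 * t)
```

For speech, `x` would be the amplitude envelope of a recording, and the statistic would be compared across literacy groups against surrogate (phase-shuffled) baselines.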