
    Speech rhythms and multiplexed oscillatory sensory coding in the human brain

    Cortical oscillations are likely candidates for the segmentation and coding of continuous speech. Here, we monitored continuous speech processing with magnetoencephalography (MEG) to unravel the principles of speech segmentation and coding. We demonstrate that speech entrains the phase of low-frequency (delta, theta) oscillations and the amplitude of high-frequency (gamma) oscillations in the auditory cortex. Phase entrainment is stronger in the right auditory cortex, whereas amplitude entrainment is stronger in the left. Furthermore, edges in the speech envelope phase-reset auditory cortex oscillations, thereby enhancing their entrainment to speech. This mechanism adapts to the changing physical features of the speech envelope and enables efficient, stimulus-specific speech sampling. Finally, we show that within the auditory cortex, coupling between delta, theta, and gamma oscillations increases following speech edges. Importantly, all couplings (i.e., both brain–speech and within-cortex) attenuate for backward-presented speech, suggesting top-down control. We conclude that the segmentation and coding of speech rely on a nested hierarchy of entrained cortical oscillations.
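    Phase entrainment of the kind reported here is commonly quantified with a phase-locking value (PLV) between the speech envelope and band-limited cortical activity. Below is a minimal sketch of such a measure on synthetic signals; the sampling rate, frequencies, and signals are illustrative assumptions, not the study's data or analysis pipeline.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based Hilbert transform: returns the complex analytic signal."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    h[1:(n + 1) // 2] = 2
    if n % 2 == 0:
        h[n // 2] = 1
    return np.fft.ifft(X * h)

def phase_locking_value(x, y):
    """PLV = |mean(exp(i*(phase_x - phase_y)))|, in [0, 1]."""
    px = np.angle(analytic_signal(x))
    py = np.angle(analytic_signal(y))
    return np.abs(np.mean(np.exp(1j * (px - py))))

fs = 200
t = np.arange(0, 10, 1 / fs)
envelope = np.sin(2 * np.pi * 3 * t)         # 3 Hz "speech envelope" (delta/theta range)
entrained = np.sin(2 * np.pi * 3 * t - 0.4)  # oscillation phase-locked at a constant lag
rng = np.random.default_rng(0)
# a rhythm whose phase drifts randomly relative to the envelope
unrelated = np.sin(2 * np.pi * 3 * t + np.cumsum(rng.normal(0, 0.3, t.size)))

print(phase_locking_value(envelope, entrained))  # near 1: constant phase lag
print(phase_locking_value(envelope, unrelated))  # lower: drifting phase
```

    A constant phase lag gives a PLV near 1 regardless of the lag's size, which is why PLV (rather than plain correlation) is the usual entrainment index.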

    Computationally Efficient and Robust BIC-Based Speaker Segmentation

    An algorithm for automatic speaker segmentation based on the Bayesian information criterion (BIC) is presented. BIC tests are not performed for every window shift, as in previous approaches, but only when a speaker change is most probable to occur. This is done by estimating the next probable change point with a model of utterance durations; the inverse Gaussian is found to fit the distribution of utterance durations best. As a result, fewer BIC tests are needed, making the proposed system less demanding in computation time and memory, and considerably more efficient with respect to missed speaker change points. A feature selection algorithm based on a branch-and-bound search strategy is applied to identify the most efficient features for speaker segmentation. Furthermore, a new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC when the covariance matrices are estimated by estimators other than the usual maximum-likelihood ones. Two commonly used pairs of figures of merit are employed and their relationship is established. Computational efficiency is achieved through the speaker utterance modeling, whereas robustness is achieved by feature selection and by applying BIC tests at appropriately selected time instants. Experimental results indicate that the proposed modifications yield superior performance compared to existing approaches.
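    A single BIC change test compares modeling a window of feature vectors with one full-covariance Gaussian against two Gaussians split at a candidate change point, with a complexity penalty. The sketch below shows the standard ΔBIC test on synthetic features; the feature dimensionality, penalty weight λ, and data are illustrative assumptions, not the paper's optimized formulation.

```python
import numpy as np

def delta_bic(x, t, lam=1.0):
    """BIC change test at sample t: positive values favour a speaker change.
    x: (N, d) feature matrix (e.g. MFCCs); t: hypothesised change index.
    delta_bic = N/2 log|S| - t/2 log|S1| - (N-t)/2 log|S2| - penalty."""
    n, d = x.shape
    logdet = lambda a: np.linalg.slogdet(np.cov(a, rowvar=False))[1]
    # penalty: lam/2 * (#mean params + #covariance params) * log N
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet(x)
            - 0.5 * t * logdet(x[:t])
            - 0.5 * (n - t) * logdet(x[t:])
            - penalty)

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, (500, 4))  # "speaker 1" features
b = rng.normal(3.0, 1.0, (500, 4))  # "speaker 2" features, shifted mean

print(delta_bic(np.vstack([a, b]), 500) > 0)  # change present: positive
print(delta_bic(np.vstack([a, a]), 500) > 0)  # no change: negative
```

    In a full system, the test above would be run only at the time instants the utterance-duration model flags as probable change points, which is precisely where the paper's computational savings come from.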

    Constrained structure of ancient Chinese poetry facilitates speech content grouping

    Ancient Chinese poetry is constituted by structured language that deviates from ordinary language usage [1, 2]; its poetic genres impose unique combinatory constraints on linguistic elements [3]. How does the constrained poetic structure facilitate speech segmentation when common linguistic [4, 5, 6, 7, 8] and statistical cues [5, 9] are unreliable to listeners in poems? We generated artificial Jueju, which arguably has the most constrained structure in ancient Chinese poetry, and presented each poem twice as an isochronous sequence of syllables to native Mandarin speakers while recording magnetoencephalography (MEG). We found that listeners deployed their prior knowledge of Jueju to build the line structure and to establish the conceptual flow of Jueju. For the first time, we observed a phase precession phenomenon indicating predictive processes in speech segmentation: the neural phase advanced faster after listeners acquired knowledge of the incoming speech. The statistical co-occurrence of monosyllabic words in Jueju correlated negatively with speech segmentation, which provides an alternative perspective on how statistical cues facilitate speech segmentation. Our findings suggest that constrained poetic structures serve as a temporal map for listeners to group speech contents and to predict incoming speech signals. Listeners can parse speech streams using not only grammatical and statistical cues but also their prior knowledge of the form of language.
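    The phase precession result rests on measuring how fast the instantaneous neural phase advances over time. A minimal illustration on synthetic signals follows: the phase-advance rate is the slope of the unwrapped instantaneous phase. The sampling rate, frequencies, and Hilbert-based phase estimate are assumptions for demonstration, not the study's MEG analysis.

```python
import numpy as np

def inst_phase(x):
    """Instantaneous phase via an FFT-based Hilbert transform."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    h[1:(n + 1) // 2] = 2
    if n % 2 == 0:
        h[n // 2] = 1
    return np.angle(np.fft.ifft(X * h))

def phase_rate(x, fs):
    """Mean phase-advance rate in Hz: slope of the unwrapped phase / 2*pi."""
    phi = np.unwrap(inst_phase(x))
    slope = np.polyfit(np.arange(len(x)) / fs, phi, 1)[0]
    return slope / (2 * np.pi)

fs = 250
t = np.arange(0, 8, 1 / fs)
first_listen = np.sin(2 * np.pi * 1.00 * t)   # phase tracks a 1 Hz line rate
second_listen = np.sin(2 * np.pi * 1.08 * t)  # phase advances slightly faster

print(phase_rate(first_listen, fs))                          # about 1.0 Hz
print(phase_rate(second_listen, fs) > phase_rate(first_listen, fs))
```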

    Irregular speech rate dissociates auditory cortical entrainment, evoked responses, and frontal alpha

    The entrainment of slow rhythmic auditory cortical activity to the temporal regularities in speech is considered a central mechanism underlying auditory perception. Previous work has shown that entrainment is reduced when the quality of the acoustic input is degraded, but has also linked rhythmic activity at similar time scales to the encoding of temporal expectations. To disentangle these bottom-up and top-down contributions to rhythmic entrainment, we manipulated the temporal predictive structure of speech by parametrically altering the distribution of pauses between syllables or words, thereby rendering the local speech rate irregular while preserving intelligibility and the envelope fluctuations of the acoustic signal. Recording EEG activity in human participants, we found that this manipulation did not alter neural processes reflecting the encoding of individual sound transients, such as evoked potentials. However, it significantly reduced the fidelity of auditory delta-band (but not theta-band) entrainment to the speech envelope. It also reduced left frontal alpha power, and this alpha reduction was predictive of the reduced delta entrainment across participants. Our results show that rhythmic auditory entrainment in the delta and theta bands reflects functionally distinct processes. Furthermore, they reveal that delta entrainment is under top-down control and likely reflects prefrontal processes that are sensitive to acoustic regularities rather than the bottom-up encoding of acoustic features.
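    Entrainment fidelity in a given band can be approximated as the correlation between the band-limited speech envelope and the band-limited neural signal, computed separately per band. The sketch below demonstrates the idea on synthetic data with a "neural" signal that tracks only the delta component of the envelope; the FFT-mask filter, signals, and band edges are simplifying assumptions rather than the study's EEG methods.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Zero-phase band-pass by zeroing FFT bins outside [lo, hi] Hz."""
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    X = np.fft.rfft(x)
    X[(freqs < lo) | (freqs > hi)] = 0
    return np.fft.irfft(X, len(x))

def entrainment(envelope, neural, fs, lo, hi):
    """Band-limited Pearson correlation as a simple fidelity index."""
    a = bandpass_fft(envelope, fs, lo, hi)
    b = bandpass_fft(neural, fs, lo, hi)
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(2)
fs = 100
t = np.arange(0, 20, 1 / fs)
# envelope with a delta (2 Hz) and a theta (6 Hz) component
envelope = np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 6 * t)
# "neural" signal tracks only the delta part, plus noise
neural = 0.8 * np.sin(2 * np.pi * 2 * t) + rng.normal(0, 1, t.size)

print(entrainment(envelope, neural, fs, 1, 4))  # high: delta is tracked
print(entrainment(envelope, neural, fs, 4, 8))  # near zero: theta is not
```

    Separating bands before correlating is what allows a manipulation to reduce delta fidelity while leaving theta fidelity intact, as reported above.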

    Atypical MEG inter-subject correlation during listening to continuous natural speech in dyslexia

    Listening to speech elicits brain activity time-locked to the speech sounds. This so-called neural entrainment to speech has been found to be atypical in dyslexia, a reading impairment associated with neural speech-processing deficits. We hypothesized that the brain responses of dyslexic vs. typical readers to real-life speech would differ, and thus that the strength of inter-subject correlation (ISC) would differ between groups and be reflected in reading-related measures. We recorded magnetoencephalograms (MEG) of 23 dyslexic and 21 typically reading adults while they listened to ∼10 min of natural Finnish speech consisting of excerpts from radio news, a podcast, a self-recorded audiobook chapter, and small talk. The amplitude envelopes of band-pass-filtered MEG source signals were correlated between subjects in a cortically constrained source space in six frequency bands. The resulting ISCs of dyslexic and typical readers were compared with a permutation-based t-test. Neuropsychological measures of phonological processing, technical reading, and working memory were correlated with the ISCs using the Mantel test. During listening to speech, ISCs were mainly reduced in dyslexic compared to typical readers in the delta (0.5–4 Hz) and high gamma (55–90 Hz) frequency bands. In the theta (4–8 Hz), beta (12–25 Hz), and low gamma (25–45 Hz) bands, dyslexic readers had enhanced ISC to speech compared to controls. Furthermore, we found that ISCs across both groups were associated with phonological processing, technical reading, and working memory. The atypical ISC to natural speech in dyslexic readers supports the temporal sampling deficit theory of dyslexia. It also suggests over-synchronization to phoneme-rate information in speech, which could indicate more effort-demanding sampling of phonemes from speech in dyslexia. These irregularities in parsing speech are likely among the complex neural factors contributing to dyslexia. The associations between neural coupling and reading-related skills further support this notion.
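    The ISC pipeline described above (band-limited amplitude envelopes correlated pairwise across subjects) can be sketched as follows on synthetic data; the band edges, envelope extraction, and signals are simplifying assumptions, not the study's source-space MEG analysis.

```python
import numpy as np

def band_envelope(x, fs, lo, hi):
    """Amplitude envelope of the band-limited analytic signal.
    Keeps positive frequencies in [lo, hi] Hz (doubled), zeros the rest."""
    n = len(x)
    freqs = np.fft.fftfreq(n, 1 / fs)
    mask = np.zeros(n)
    mask[(freqs >= lo) & (freqs <= hi)] = 2
    return np.abs(np.fft.ifft(np.fft.fft(x) * mask))

def isc(subjects, fs, lo, hi):
    """Mean pairwise correlation of band envelopes across subjects."""
    envs = [band_envelope(s, fs, lo, hi) for s in subjects]
    r = [np.corrcoef(envs[i], envs[j])[0, 1]
         for i in range(len(envs)) for j in range(i + 1, len(envs))]
    return np.mean(r)

rng = np.random.default_rng(3)
fs = 100
t = np.arange(0, 30, 1 / fs)
# a shared "stimulus-driven" rhythm: 2 Hz carrier, slowly modulated amplitude
stimulus = np.sin(2 * np.pi * 2 * t) * (1 + 0.5 * np.sin(2 * np.pi * 0.3 * t))
group = [stimulus + rng.normal(0, 0.5, t.size) for _ in range(4)]  # shared drive
noise = [rng.normal(0, 1, t.size) for _ in range(4)]               # nothing shared

print(isc(group, fs, 0.5, 4))  # clearly positive: envelopes co-fluctuate
print(isc(noise, fs, 0.5, 4))  # near zero: no shared signal
```

    Group differences in such an index per band are what the permutation-based t-test above evaluates.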

    Reading acquisition: from digital screening to neurocognitive bases in a transparent orthography

    Learning to read is an active area of research in psychology and cognitive neuroscience. In recent decades, enormous progress has been made in understanding the neurocognitive processes underlying reading acquisition and its difficulties. However, there are at least two dimensions in which substantial work remains. On the one hand, current knowledge about reading acquisition has not yet influenced educational practice. On the other hand, the diversity of reading acquisition across different orthographies is not fully understood. This thesis studies reading acquisition by combining strategies for the timely identification of children at reading risk in the school context with laboratory studies aimed at understanding the neurocognitive bases of reading acquisition in a transparent orthography such as Spanish. These objectives were pursued through a longitudinal design beginning in preschool and following the same group of approximately 600 children through the second year of school. The results show, on the one hand, that it is feasible to identify children at reading risk even before primary education and, on the other, that learning to read in a transparent orthography such as Spanish has both characteristics shared with, and characteristics distinct from, opaque orthographies. These results demonstrate the feasibility of early identification of reading risk and underscore the importance of considering the characteristics of the orthography during reading acquisition.

    Deep Neural Networks for the Recognition and Classification of Heart Murmurs Using Neuromorphic Auditory Sensors

    Auscultation is one of the most widely used techniques for detecting cardiovascular diseases, which are among the main causes of death worldwide. Heart murmurs are the most common abnormal finding when a patient visits the physician for auscultation. These heart sounds can either be innocent, which are harmless, or abnormal, which may be a sign of a more serious heart condition. However, the accuracy of primary care physicians and expert cardiologists when auscultating is not high enough to avoid most type-I errors (healthy patients sent for an echocardiogram) and type-II errors (pathological patients sent home without medication or treatment). In this paper, the authors present a novel convolutional-neural-network-based tool for classifying between healthy people and pathological patients using a neuromorphic auditory sensor for FPGA that is able to decompose the audio into frequency bands in real time. For this purpose, different networks were trained with the heart murmur information contained in heart sound recordings obtained from nine heart sound databases sourced from multiple research groups. These samples are segmented and preprocessed with the neuromorphic auditory sensor to decompose their audio information into frequency bands, after which sonogram images of a fixed size are generated. These images were used to train and test different convolutional neural network architectures. The best results were obtained with a modified version of the AlexNet model, achieving 97% accuracy (specificity: 95.12%, sensitivity: 93.20%, PhysioNet/CinC Challenge 2016 score: 0.9416). This tool could aid cardiologists and primary care physicians in the auscultation process, improving decision making and reducing type-I and type-II errors. Funding: Ministerio de Economía y Competitividad TEC2016-77785-
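    The sensor's role in the pipeline is to turn each heart-sound segment into a fixed-size image of energy per frequency band over time, which the CNN then classifies. The sketch below approximates that decomposition in software with a framed FFT summed into log-spaced bands; the band count, frame sizes, and toy signal are assumptions for illustration, not the neuromorphic FPGA implementation.

```python
import numpy as np

def sonogram(audio, fs, n_bands=16, frame_len=256, hop=128):
    """Frame the audio and sum FFT power into log-spaced frequency bands,
    yielding a (frames x bands) image analogous to the sensor's output."""
    edges = np.geomspace(20, fs / 2, n_bands + 1)  # log-spaced band edges (Hz)
    freqs = np.fft.rfftfreq(frame_len, 1 / fs)
    frames = []
    for start in range(0, len(audio) - frame_len + 1, hop):
        spec = np.abs(np.fft.rfft(audio[start:start + frame_len])) ** 2
        frames.append([spec[(freqs >= lo) & (freqs < hi)].sum()
                       for lo, hi in zip(edges[:-1], edges[1:])])
    return np.log1p(np.array(frames))  # log compression for a CNN-friendly range

fs = 2000
t = np.arange(0, 1, 1 / fs)
# toy heart-sound-like signal: ~60 Hz tone gated into periodic bursts
heart = np.sin(2 * np.pi * 60 * t) * (np.sin(2 * np.pi * 1.2 * t) > 0)

img = sonogram(heart, fs)
print(img.shape)  # (frames, bands) image that would be fed to the CNN
```

    Because every segment is rendered at the same frame and band counts, the resulting images all share one shape, satisfying the fixed input size a CNN such as AlexNet requires.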