1,230 research outputs found

    Does training with amplitude modulated tones affect tone-vocoded speech perception?

    Temporal-envelope cues are essential for successful speech perception. We asked here whether training on stimuli containing temporal-envelope cues without speech content can improve the perception of spectrally degraded (vocoded) speech in which the temporal envelope (but not the temporal fine structure) is mainly preserved. Two groups of listeners were trained on different amplitude-modulation (AM) based tasks, either AM detection or AM-rate discrimination (21 blocks of 60 trials over two days, 1260 trials in total; frequencies: 4 Hz, 8 Hz, and 16 Hz), while an additional control group did not undertake any training. Consonant identification in vocoded vowel-consonant-vowel stimuli was tested before and after training on the AM tasks (or at an equivalent time interval for the control group). Following training, only the trained groups showed a significant improvement in the perception of vocoded speech, but the improvement did not significantly differ from that observed for controls. Thus, we do not find convincing evidence that this amount of training with temporal-envelope cues without speech content provides a significant benefit for vocoded speech intelligibility. Alternative training regimens using vocoded speech along the linguistic hierarchy should be explored.
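For readers unfamiliar with the stimulus type, the following is a minimal sketch of a sinusoidally amplitude-modulated tone, not the study's actual stimulus code. It assumes the 4, 8 and 16 Hz values in the abstract are modulation rates; the 1000 Hz carrier, full modulation depth, 0.5 s duration, 16 kHz sample rate and the function name `am_tone` are all illustrative assumptions.

```python
import math


def am_tone(carrier_hz, mod_hz, depth, duration_s, sample_rate=16000):
    """Return samples of a sinusoidally amplitude-modulated sine tone.

    s(t) = (1 + depth * sin(2*pi*mod_hz*t)) * sin(2*pi*carrier_hz*t)
    All parameter values used below are assumptions, not taken from the study.
    """
    n_samples = int(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        envelope = 1.0 + depth * math.sin(2 * math.pi * mod_hz * t)
        samples.append(envelope * math.sin(2 * math.pi * carrier_hz * t))
    return samples


# One tone per rate listed in the abstract (assumed to be modulation rates).
tones = {rate: am_tone(1000.0, rate, 1.0, 0.5) for rate in (4, 8, 16)}
```

With `depth=0.0` the envelope is constant and the function returns an unmodulated sine, which is the kind of reference stimulus an AM-detection task would contrast against.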

    Lip2AudSpec: Speech reconstruction from silent lip movements video

    In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos. We use the auditory spectrogram as the spectral representation of speech, together with its corresponding sound generation method, resulting in more natural-sounding reconstructed speech. Our proposed network consists of an autoencoder that extracts bottleneck features from the auditory spectrogram, which are then used as targets for our main lip reading network comprising CNN, LSTM and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with a 98% correlation and also improves the quality of reconstructed speech from the main lip reading network. Our model, trained jointly on different speakers, is able to extract individual speaker characteristics and gives promising results in reconstructing intelligible speech with superior word recognition accuracy.

    Speech dynamics

    The effect of literacy in the speech temporal modulation structure

    The temporal modulation structure of adult-directed speech is conceptualised as a modulation hierarchy comprising four temporal bands: delta (1–3 Hz), theta (4–8 Hz), beta (15–30 Hz), and low gamma (30–50 Hz). Neuronal oscillatory entrainment to amplitude modulations (AMs) in these four bands may provide a basis for speech encoding and for parsing the continuous signal into linguistic units (delta: syllable stress patterns; theta: syllables; beta: onset-rime units; low gamma: phonetic information). While adult-directed speech is theta-dominant and shows tighter theta-beta/low gamma phase alignment, infant-directed speech is delta-dominant and shows tighter delta-theta phase alignment. Although this change in speech representations could be maturational, it was hypothesized that literacy may also influence the structure of speech. In fact, literacy and schooling are known to change auditory speech entrainment, enhancing phonemic specification and augmenting the phonological detail of the lexicon's representations. Thus, we hypothesized that a corresponding difference in speech production could also emerge. In this work, spontaneous speech samples were recorded from literate (with lower and higher literacy) and illiterate subjects, and their energy modulation spectrum across delta, theta and beta/low gamma AMs, as well as the phase synchronization between nested AMs, were analysed. Measures of the participants' phonology skills and vocabulary were also collected, and a specific task was conducted to confirm that the analysis method used (S-AMPH) is sensitive to speech rhythm. Results showed no differences in the energy of delta, theta and beta/low gamma AMs in spontaneous speech. However, phase alignment between slower and faster speech AMs was significantly enhanced by literacy, showing moderately strong correlations with the phonology measures and literacy. 
Our data suggest that literacy affects not only cortical entrainment and speech perception but also the physical/rhythmic properties of speech production.
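The four-band hierarchy described in this abstract can be written down as a small lookup. This is only an illustrative sketch using the band edges quoted in the abstract; the names `BANDS` and `band_for` are hypothetical, and two choices are assumptions: rates that fall in the gaps between bands return `None`, and the shared 30 Hz edge is assigned to beta.

```python
# Band edges in Hz as quoted in the abstract; inclusive on both ends.
BANDS = [
    ("delta", 1.0, 3.0),       # syllable stress patterns
    ("theta", 4.0, 8.0),       # syllables
    ("beta", 15.0, 30.0),      # onset-rime units
    ("low gamma", 30.0, 50.0), # phonetic information
]


def band_for(rate_hz):
    """Return the name of the first band containing rate_hz, or None.

    Assumption: the 30 Hz edge shared by beta and low gamma resolves to
    beta because bands are checked in ascending order.
    """
    for name, low, high in BANDS:
        if low <= rate_hz <= high:
            return name
    return None
```

For example, `band_for(5)` returns `"theta"`, while `band_for(10)` returns `None` because 10 Hz lies between the theta and beta ranges as stated.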

    Improving the Speech Intelligibility By Cochlear Implant Users

    In this thesis, we focus on improving the intelligibility of speech for cochlear implant (CI) users. As an auditory prosthetic device, a CI can restore hearing sensations in a quiet background for most patients with profound hearing loss in both ears. However, CI users still have serious problems understanding speech in noisy and reverberant environments. In addition, bandwidth limitation, missing temporal fine structure, and reduced spectral resolution due to a limited number of electrodes make hearing in noisy conditions more difficult for CI users, regardless of the type of noise. To mitigate these difficulties for CI listeners, we investigate several contributing factors, such as the effects of low harmonics on tone identification in natural and vocoded speech, the contribution of matched envelope dynamic range to binaural benefits, and the contribution of low-frequency harmonics to tone identification in quiet and in six-talker babble background. These results revealed several promising methods for improving speech intelligibility for CI patients. In addition, we investigate the benefits of voice conversion in improving speech intelligibility for CI users, which was motivated by an earlier study showing that familiarity with a talker's voice can improve understanding of the conversation. Research has shown that when adults are familiar with someone's voice, they can process and understand what the person is saying more accurately, and even more quickly. This effect, known as the "familiar talker advantage", was our motivation to examine its impact on CI patients using a voice conversion technique. In the present research, we propose a new method based on multi-channel voice conversion to improve the intelligibility of transformed speech for CI patients.