
    Light Gated Recurrent Units for Speech Recognition

    A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR). Despite the great achievements of the past decades, however, a natural and robust human-machine speech interaction still appears to be out of reach, especially in challenging environments characterized by significant noise and reverberation. To improve robustness, modern speech recognizers often employ acoustic models based on Recurrent Neural Networks (RNNs), which are naturally able to exploit large time contexts and long-term speech modulations. It is thus of great interest to continue the study of proper techniques for improving the effectiveness of RNNs in processing speech signals. In this paper, we revise one of the most popular RNN models, namely Gated Recurrent Units (GRUs), and propose a simplified architecture that turned out to be very effective for ASR. The contribution of this work is two-fold: First, we analyze the role played by the reset gate, showing that a significant redundancy with the update gate occurs. As a result, we propose to remove the former from the GRU design, leading to a more efficient and compact single-gate model. Second, we propose to replace hyperbolic tangent with ReLU activations. This variation couples well with batch normalization and could help the model learn long-term dependencies without numerical issues. Results show that the proposed architecture, called Light GRU (Li-GRU), not only reduces the per-epoch training time by more than 30% over a standard GRU, but also consistently improves the recognition accuracy across different tasks, input features, noisy conditions, as well as across different ASR paradigms, ranging from standard DNN-HMM speech recognizers to end-to-end CTC models. (Comment: Copyright 2018 IEEE)
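    From the description above, the Li-GRU recurrence can be sketched as follows. This is a minimal PyTorch reconstruction based only on the abstract (a single update gate, a ReLU candidate state, and batch normalization on the feed-forward connections), not the authors' reference implementation; the class and variable names are my own.

```python
import torch
import torch.nn as nn

class LiGRUCell(nn.Module):
    """Sketch of a Li-GRU cell: a GRU with the reset gate removed and
    a ReLU + batch-norm candidate state, reconstructed from the abstract."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Feed-forward weights for the update gate (z) and candidate state
        self.wz = nn.Linear(input_size, hidden_size, bias=False)
        self.wh = nn.Linear(input_size, hidden_size, bias=False)
        # Recurrent weights
        self.uz = nn.Linear(hidden_size, hidden_size, bias=False)
        self.uh = nn.Linear(hidden_size, hidden_size, bias=False)
        # Batch norm on the feed-forward contributions only
        self.bn_z = nn.BatchNorm1d(hidden_size)
        self.bn_h = nn.BatchNorm1d(hidden_size)

    def forward(self, x_t, h_prev):
        # Single update gate; the reset gate is removed entirely
        z = torch.sigmoid(self.bn_z(self.wz(x_t)) + self.uz(h_prev))
        # ReLU candidate state, stabilized by batch normalization
        h_cand = torch.relu(self.bn_h(self.wh(x_t)) + self.uh(h_prev))
        return z * h_prev + (1.0 - z) * h_cand
```

    Dropping the reset gate removes one of the three weight-matrix pairs of a standard GRU, which is consistent with the reported per-epoch training-time reduction of more than 30%.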

    Evaluation of the neo-glottal closure based on the source description in esophageal voice

    The characteristics of esophageal voice make its study by traditional acoustic means limited and complicated. These limitations are even stronger when working with patients who lack the minimal skills to control the required technique. Nevertheless, the speech therapist needs to know the performance and mechanics developed by the patient in producing esophageal voice, as the specific techniques required in this case are not as universal and well-known as the ones for normal voicing. Each patient develops different strategies for producing esophageal voice due to the anatomical changes affecting the crico-pharyngeal sphincter (CPS) and the functional losses resulting from surgery. Therefore it is of fundamental importance that practitioners can count on new instruments to evaluate esophageal voice quality, which in turn could help in the enhancement of CPS dynamics. The present work describes the voice of four patients who underwent laryngectomy, based on data obtained from the study of the neo-glottal wave profile. Results obtained after analyzing the open-close phases and the tension of the muscular body of the CPS are shown.
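    As a rough illustration of what an open-close phase analysis of the neo-glottal wave involves, the sketch below estimates an open quotient (open-phase duration over cycle period) for one glottal cycle. The thresholding criterion and function name are hypothetical illustrations; the paper's actual source-description method is not specified in the abstract.

```python
import numpy as np

def open_quotient(cycle, threshold_ratio=0.1):
    """Estimate the open quotient of one neo-glottal cycle.

    cycle: 1-D array of glottal flow samples covering exactly one cycle.
    A sample counts as 'open' when flow exceeds threshold_ratio times the
    peak flow (a hypothetical criterion, for illustration only).
    """
    cycle = np.asarray(cycle, dtype=float)
    threshold = threshold_ratio * cycle.max()
    return np.count_nonzero(cycle > threshold) / cycle.size
```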

    Bone conductive implants in single sided deafness

    Conclusion: Bone conductive implants (BCI) were shown to partly restore some of the functions lost when binaural hearing is missing, as in single-sided deafness (SSD) subjects. The adoption of a single BCI should be advised by the clinician on the basis of thorough counselling with the SSD subject. Objectives: To provide an overview of the present possibilities of BCI in SSD and to evaluate the reliability of the audiological evaluation for assessing speech recognition in noise and sound localization cues, the major problems related to the loss of binaural hearing. Method: Nine SSD subjects who received a BCI underwent a pre-operative audiological evaluation, consisting of sound-field speech audiometry, measured as word recognition score (WRS) and sound localization, in quiet and in noise. They were also tested for the accuracy of directional word recognition in noise and evaluated subjectively with the APHAB questionnaire. Results: The mean maximum percentage of word discrimination was 65.5% in the unaided condition and 78.9% in the BCI condition. Sound localization in noise with the BCI was better than in the unaided condition, especially when stimulus and noise were on the same side as the implanted ear. The accuracy of directional word recognition improved with the BCI relative to the unaided condition on the BCI side, either with the stimulus at the implanted ear and the noise at the contralateral ear, or when both stimulus and noise were delivered to the implanted ear.
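    For reference, the word recognition score used here is simply the percentage of list words repeated correctly. A minimal sketch of the scoring arithmetic (the word list and exact-match rule below are hypothetical, not the study's clinical protocol):

```python
def word_recognition_score(presented, repeated):
    """WRS: percentage of presented words the subject repeated correctly."""
    correct = sum(p == r for p, r in zip(presented, repeated))
    return 100.0 * correct / len(presented)

# Hypothetical three-word list: two of three correct gives a WRS of ~66.7%
print(word_recognition_score(["boat", "chair", "lamp"],
                             ["boat", "share", "lamp"]))
```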

    Investigating the Neural Basis of Audiovisual Speech Perception with Intracranial Recordings in Humans

    Speech is inherently multisensory, containing auditory information from the voice and visual information from the mouth movements of the talker. Hearing the voice is usually sufficient to understand speech; however, in noisy environments or when audition is impaired due to aging or disabilities, seeing mouth movements greatly improves speech perception. Although behavioral studies have well established this perceptual benefit, it is still not clear how the brain processes visual information from mouth movements to improve speech perception. To clarify this issue, I studied the neural activity recorded from the brain surfaces of human subjects using intracranial electrodes, a technique known as electrocorticography (ECoG). First, I studied responses to noisy speech in the auditory cortex, specifically in the superior temporal gyrus (STG). Previous studies identified the anterior parts of the STG as unisensory, responding only to auditory stimuli. On the other hand, posterior parts of the STG are known to be multisensory, responding to both auditory and visual stimuli, which makes them a key region for audiovisual speech perception. I examined how these different parts of the STG respond to clear versus noisy speech. I found that noisy speech decreased the amplitude and increased the across-trial variability of the response in the anterior STG. However, possibly due to its multisensory composition, the posterior STG was not as sensitive to auditory noise as the anterior STG and responded similarly to clear and noisy speech. I also found that these two response patterns in the STG were separated by a sharp boundary demarcated by the posterior-most portion of Heschl's gyrus. Second, I studied responses to silent speech in the visual cortex. Previous studies demonstrated that the visual cortex shows response enhancement when the auditory component of speech is noisy or absent; however, it was not clear which regions of the visual cortex specifically show this response enhancement and whether it is a result of top-down modulation from a higher region. To test this, I first mapped the receptive fields of different regions in the visual cortex and then measured their responses to visual (silent) and audiovisual speech stimuli. I found that visual regions with central receptive fields show greater response enhancement to visual speech, possibly because these regions receive more visual information from mouth movements. I found similar response enhancement to visual speech in the frontal cortex, specifically in the inferior frontal gyrus and the premotor and dorsolateral prefrontal cortices, which have been implicated in speech reading in previous studies. I showed that these frontal regions display strong functional connectivity with visual regions that have central receptive fields during speech perception.
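    The amplitude and across-trial variability measures mentioned above reduce to simple statistics over the trial dimension of the recorded response. A minimal sketch, assuming a trials-by-samples array from one electrode (the array shape and function name are my assumptions; ECoG analyses of this kind typically operate on high-gamma band power):

```python
import numpy as np

def response_statistics(trials):
    """trials: (n_trials, n_samples) array of single-trial responses
    from one electrode. Returns the mean response amplitude and the
    across-trial variability (standard deviation) at each time point."""
    trials = np.asarray(trials, dtype=float)
    return trials.mean(axis=0), trials.std(axis=0)
```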

    OPA1-related auditory neuropathy: site of lesion and outcome of cochlear implantation.

    Hearing impairment is the second most prevalent clinical feature after optic atrophy in Dominant Optic Atrophy associated with mutations in the OPA1 gene. In this study we characterized the hearing dysfunction in OPA1-linked disorders and provided effective rehabilitative options to improve speech perception. We studied two groups of OPA1 subjects, one comprising 11 patients (7 males; age range 13-79 years) carrying OPA1 mutations inducing haploinsufficiency, the other comprising 10 subjects (3 males; age range 5-58 years) carrying OPA1 missense mutations. Both groups underwent audiometric assessment with pure-tone and speech perception evaluation, as well as otoacoustic emission and auditory brainstem response recording. Cochlear potentials were recorded through transtympanic electrocochleography from the group of patients harboring OPA1 missense mutations and were compared to recordings obtained from 20 normally-hearing controls and from 19 subjects with cochlear hearing loss. Eight patients carrying OPA1 missense mutations underwent cochlear implantation. Speech perception measures and electrically-evoked auditory nerve and brainstem responses were obtained after one year of cochlear implant use. Nine out of 11 patients carrying OPA1 mutations inducing haploinsufficiency had normal hearing function. In contrast, all but one subject harboring OPA1 missense mutations displayed impaired speech perception, abnormal brainstem responses, and presence of otoacoustic emissions consistent with auditory neuropathy. In electrocochleography recordings, the cochlear microphonic had enhanced amplitudes while the summating potential showed normal latency and peak amplitude, consistent with preservation of both outer and inner hair cell activities. After cancelling the cochlear microphonic, the synchronized neural response seen in both normally-hearing controls and subjects with cochlear hearing loss was replaced by a prolonged, low-amplitude negative potential that decreased in both amplitude and duration during rapid stimulation, consistent with neural generation. The use of a cochlear implant improved speech perception in all but one patient. Brainstem potentials were recorded in response to electrical stimulation in five subjects out of six, whereas no compound action potential was evoked from the auditory nerve through the cochlear implant. These findings indicate that the hearing impairment in patients carrying OPA1 missense mutations results from disordered synchrony in auditory nerve fiber activity, caused by neural degeneration affecting the terminal dendrites. Cochlear implantation improves speech perception and synchronous activation of auditory pathways by bypassing the site of lesion.
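    The cochlear microphonic cancellation step relies on a standard property of electrocochleography: the cochlear microphonic (CM) inverts when stimulus polarity is flipped, while neural and summating potentials do not. A minimal sketch of the arithmetic, assuming averaged traces recorded to condensation and rarefaction stimuli (the variable and function names are mine, not from the paper):

```python
import numpy as np

def split_ecochg(condensation, rarefaction):
    """Separate the cochlear microphonic from non-inverting components.

    condensation, rarefaction: averaged ECochG traces to opposite-polarity
    stimuli. The CM inverts with polarity, so it cancels in the sum and
    survives in the difference; non-inverting components do the opposite.
    """
    condensation = np.asarray(condensation, dtype=float)
    rarefaction = np.asarray(rarefaction, dtype=float)
    neural = (condensation + rarefaction) / 2.0  # CM cancelled
    cm = (condensation - rarefaction) / 2.0      # neural part cancelled
    return neural, cm
```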

    Training of Working Memory Impacts Neural Processing of Vocal Pitch Regulation

    Working memory training can improve performance on tasks that were not trained. Whether auditory-motor integration for voice control can benefit from working memory training, however, remains unclear. The present event-related potential (ERP) study examined the impact of working memory training on the auditory-motor processing of vocal pitch. Trained participants underwent adaptive working memory training using a digit span backwards paradigm, while control participants did not receive any training. Before and after training, both trained and control participants were exposed to frequency-altered auditory feedback while producing vocalizations. After training, trained participants exhibited significantly decreased N1 amplitudes and increased P2 amplitudes in response to pitch errors in voice auditory feedback. In addition, there was a significant positive correlation between the degree of improvement in working memory capacity and the post-pre difference in P2 amplitudes. Training-related changes in vocal compensation, however, were not observed. There was no systematic change in either vocal or cortical responses for control participants. These findings provide evidence that working memory training impacts the cortical processing of feedback errors in vocal pitch regulation. This enhanced cortical processing may be the result of increased neural efficiency in the detection of pitch errors between the intended and actual feedback.
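    The reported brain-behavior relationship is a straightforward correlation between each participant's working-memory gain and the post-pre change in P2 amplitude. A sketch of that analysis with placeholder data (the arrays below are randomly generated stand-ins, not the study's measurements; the group size is hypothetical):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 20  # hypothetical number of trained participants

# Placeholder per-participant values: working-memory improvement
# (post - pre digit span backwards) and P2 amplitude change (uV).
wm_gain = rng.normal(1.5, 0.8, size=n)
p2_change = 0.5 * wm_gain + rng.normal(0.0, 0.5, size=n)

r, p = pearsonr(wm_gain, p2_change)
print(f"Pearson r = {r:.2f}, p = {p:.3g}")
```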

    Development of neural responses to hearing their own name in infants at low and high risk for autism spectrum disorder

    One's own name is a salient stimulus, used by others to initiate social interaction. Typically developing infants orient towards the sound of their own name and exhibit enhanced event-related potentials (ERPs) at 5 months. A lack of orientation to one's own name is considered to be one of the earliest signs of autism spectrum disorder (ASD). In this study, we investigated ERPs to hearing one's own name in infants at high and low risk for ASD, at 10 and 14 months. We hypothesized that low-risk infants would exhibit enhanced frontal ERP responses to their own name compared to an unfamiliar name, while high-risk infants were expected to show attenuation or absence of this difference in their ERP responses. Contrary to expectations, we did not find enhanced ERPs to the own name in the low-risk group. However, the high-risk group exhibited attenuated frontal positive-going activity to their own name compared to an unfamiliar name and compared to the low-risk group, at the age of 14 months. These results suggest that infants at high risk for ASD start to process their own name differently shortly after one year of age, a period of rapid frontal brain development.