45 research outputs found

    Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition

    Long short-term memory (LSTM) based acoustic modeling has recently been shown to give state-of-the-art performance on some speech recognition tasks. To achieve further performance improvements, this work investigates deep extensions of the LSTM, motivated by the observation that deep hierarchical models have proven more effective than shallow ones. Drawing on previous research on constructing deep recurrent neural networks (RNNs), alternative deep LSTM architectures are proposed and empirically evaluated on a large-vocabulary conversational telephone speech recognition task. The training process for LSTM networks on multi-GPU devices is also introduced and discussed. Experimental results demonstrate that the deep LSTM networks benefit from depth and yield state-of-the-art performance on this task.
    Comment: submitted to ICASSP 2015, which does not perform blind review.
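The deep LSTM architectures discussed above stack recurrent layers so that each layer consumes the hidden-state sequence of the layer below. A minimal NumPy sketch of that stacking idea (not the paper's actual architecture; the dimensions, initialization, and three-layer depth are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """One LSTM layer: four gates computed jointly from the input and previous hidden state."""
    def __init__(self, input_dim, hidden_dim, rng):
        # Single weight matrix holding input, forget, cell, and output gate parameters.
        self.W = rng.standard_normal((4 * hidden_dim, input_dim + hidden_dim)) * 0.1
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_dim
        i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])        # input and forget gates
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])  # candidate cell and output gate
        c_new = f * c + i * g
        h_new = o * np.tanh(c_new)
        return h_new, c_new

def deep_lstm_forward(layers, inputs):
    """Run a stack of LSTM layers over a sequence; layer k feeds layer k+1 its hidden states."""
    states = [(np.zeros(l.hidden_dim), np.zeros(l.hidden_dim)) for l in layers]
    outputs = []
    for x in inputs:
        for k, layer in enumerate(layers):
            h, c = layer.step(x, *states[k])
            states[k] = (h, c)
            x = h  # hidden state becomes the input to the next layer up
        outputs.append(x)
    return np.stack(outputs)

rng = np.random.default_rng(0)
layers = [LSTMCell(13, 32, rng), LSTMCell(32, 32, rng), LSTMCell(32, 32, rng)]
seq = rng.standard_normal((20, 13))  # e.g. 20 frames of 13-dim acoustic features
out = deep_lstm_forward(layers, seq)
```

Because the final hidden state passes through an output gate in (0, 1) and a tanh, every entry of `out` lies strictly inside (-1, 1).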

    Automatic speech recognition with deep neural networks for impaired speech

    The final publication is available at https://link.springer.com/chapter/10.1007%2F978-3-319-49169-1_10
    Automatic speech recognition has reached almost-human performance in some controlled scenarios. However, recognition of impaired speech remains a difficult task for two main reasons: data is (i) scarce and (ii) heterogeneous. In this work we train different architectures on a database of dysarthric speech. A comparison between architectures shows that, even with a small database, hybrid DNN-HMM models outperform classical GMM-HMM models as measured by word error rate. A DNN improves the word error rate by 13% for subjects with dysarthria relative to the best classical architecture, a larger improvement than that given by other deep neural networks such as CNNs, TDNNs, and LSTMs. All experiments were done with the Kaldi speech recognition toolkit, for which we adapted several recipes to handle dysarthric speech and to work on the TORGO database. These recipes are publicly available.
    Peer reviewed. Postprint (author's final draft).
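The word error rate used to compare architectures above is the word-level edit distance between the reference transcript and the recognizer's hypothesis, normalized by reference length. A self-contained sketch of the standard computation:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed by dynamic-programming edit distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat", "the cat sat on")` counts one insertion over three reference words.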

    Real time end-to-end glass break detection system using LSTM deep recurrent neural network

    Glass windows are now common in commercial and residential buildings. While glass has its benefits, it also poses security risks. Almost all glass break detectors decide whether breakage has occurred using a pre-determined frequency of breaking-glass sound and a vibration-threshold signal for the pane. However, sounds such as thunder, shouting, gunshots, and objects being struck are similar in frequency and threshold value to glass-breaking events, and may consequently cause false positives in the alarm system. This paper proposes a new design for a glass break detection system that uses LSTM deep recurrent neural networks in an end-to-end approach to reduce the false-positive alarms of state-of-the-art glass break detectors. We use raw waveform audio to detect glass break events; the key benefit of end-to-end learning is that it avoids hand-crafted audio features. To address the vanishing and exploding gradient problems of conventional recurrent neural networks, this paper proposes a deep long short-term memory (LSTM) recurrent neural network to handle the input audio sequence. In real-time detection, the proposed approach has a clear advantage over conventional glass break detection systems: it yields significantly higher precision (99.999988%) and suffers less from environmental noise that might cause a false alarm.
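The vanishing-gradient motivation for the LSTM above can be illustrated numerically: in a plain RNN the gradient through T time steps is scaled by a product of recurrent-weight-times-tanh-derivative factors, which decays geometrically when that factor is below 1, whereas the LSTM cell-state path is scaled by the forget gates, which can stay near 1. The weight 0.9 and forget-gate value 0.99 below are illustrative assumptions, not measured values:

```python
import math

T = 100                # number of time steps to backpropagate through
w = 0.9                # hypothetical recurrent weight of a plain RNN
forget_gate = 0.99     # hypothetical near-open LSTM forget gate
tanh_deriv = 1 - math.tanh(0.5) ** 2  # tanh'(0.5), about 0.786

rnn_grad, lstm_grad = 1.0, 1.0
for _ in range(T):
    rnn_grad *= w * tanh_deriv   # plain RNN path: shrinks geometrically
    lstm_grad *= forget_gate     # LSTM cell-state path: decays far more slowly
```

After 100 steps the plain-RNN gradient factor is vanishingly small (around 1e-15), while the LSTM cell-state factor is still roughly 0.99^100 ≈ 0.37, which is why the LSTM can carry error signals across long audio sequences.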

    Deep learning with convolutional neural networks for decoding and visualization of EEG pathology

    We apply convolutional neural networks (ConvNets) to the task of distinguishing pathological from normal EEG recordings in the Temple University Hospital EEG Abnormal Corpus. We use two basic ConvNet architectures, one shallow and one deep, recently shown to decode task-related information from EEG at least as well as established algorithms designed for this purpose. In decoding EEG pathology, both ConvNets reached substantially better accuracies (about 6% better, ~85% vs. ~79%) than the only published result for this dataset, and remained better when using only 1 minute of each recording for training and only six seconds of each recording for testing. We used automated methods to optimize architectural hyperparameters and found intriguingly different ConvNet architectures, e.g., with max pooling as the only nonlinearity. Visualizations of the ConvNet decoding behavior showed that they used spectral power changes in the delta (0-4 Hz) and theta (4-8 Hz) frequency ranges, possibly alongside other features, consistent with expectations derived from spectral analysis of the EEG data and from the textual medical reports. Analysis of the textual medical reports also highlighted the potential for accuracy increases by integrating contextual information, such as the age of subjects. In summary, the ConvNets and visualization techniques used in this study constitute a next step towards clinically useful automated EEG diagnosis and establish a new baseline for future work on this topic.
    Comment: Published at IEEE SPMB 2017 https://www.ieeespmb.org/2017
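The delta (0-4 Hz) and theta (4-8 Hz) spectral power features that the visualizations pointed to can be computed directly from an EEG trace with an FFT. A minimal sketch (the 128 Hz sampling rate and the synthetic signal are illustrative assumptions, not details of the corpus):

```python
import numpy as np

def band_power(signal, fs, lo, hi):
    """Mean spectral power of `signal` (sampled at `fs` Hz) in the band [lo, hi) Hz."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)  # simple periodogram
    mask = (freqs >= lo) & (freqs < hi)
    return psd[mask].mean()

fs = 128                      # hypothetical EEG sampling rate in Hz
t = np.arange(0, 4, 1 / fs)   # 4 seconds of signal
rng = np.random.default_rng(0)
# Synthetic trace dominated by a 2 Hz (delta-band) oscillation plus noise.
eeg = np.sin(2 * np.pi * 2 * t) + 0.1 * rng.standard_normal(len(t))

delta = band_power(eeg, fs, 0, 4)  # captures the 2 Hz component
theta = band_power(eeg, fs, 4, 8)  # mostly noise here
```

Since the dominant oscillation lies in the delta band, `delta` comes out much larger than `theta` for this synthetic trace.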