Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition
Long short-term memory (LSTM) based acoustic modeling methods have recently
been shown to give state-of-the-art performance on some speech recognition
tasks. To improve performance further, this research investigates deep
extensions of LSTM, considering that deep hierarchical models have
turned out to be more efficient than shallow ones. Motivated by previous
research on constructing deep recurrent neural networks (RNNs), alternative
deep LSTM architectures are proposed and empirically evaluated on a large
vocabulary conversational telephone speech recognition task. In addition,
the training process for LSTM networks on multi-GPU devices is
introduced and discussed. Experimental results demonstrate that the deep LSTM
networks benefit from the depth and yield state-of-the-art performance on
this task.
Comment: submitted to ICASSP 2015, which does not perform blind review
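To make the idea of depth in an LSTM network concrete, the following is a minimal sketch of plain layer stacking, where each layer's hidden state becomes the input of the layer above. It is not one of the alternative architectures the paper proposes (the abstract does not describe them), and the function names, weight ranges, and sizes are assumptions chosen only for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(input_size, hidden_size, rng):
    """Random weights for one LSTM layer: four gates acting on [x, h] (hypothetical init)."""
    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
    return {name: matrix() for name in ("input", "forget", "output", "cell")}

def lstm_step(layer, x, h, c):
    """One time step of a standard LSTM cell (biases and peepholes omitted)."""
    xh = x + h  # concatenate the input with the previous hidden state
    def affine(name):
        return [sum(w * v for w, v in zip(row, xh)) for row in layer[name]]
    i = [sigmoid(a) for a in affine("input")]
    f = [sigmoid(a) for a in affine("forget")]
    o = [sigmoid(a) for a in affine("output")]
    g = [math.tanh(a) for a in affine("cell")]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def stacked_lstm(layers, hidden_size, sequence):
    """Run a sequence through stacked layers; return the top layer's output per frame."""
    states = [([0.0] * hidden_size, [0.0] * hidden_size) for _ in layers]
    outputs = []
    for x in sequence:
        layer_input = x
        for idx, layer in enumerate(layers):
            h, c = lstm_step(layer, layer_input, *states[idx])
            states[idx] = (h, c)
            layer_input = h  # the hidden state feeds the next layer up
        outputs.append(layer_input)
    return outputs

# Illustrative sizes only: 3 features per frame, 4 hidden units, 2 layers, 5 frames.
rng = random.Random(0)
feature_size, hidden_size, depth = 3, 4, 2
layers = [make_layer(feature_size if d == 0 else hidden_size, hidden_size, rng)
          for d in range(depth)]
frames = [[rng.uniform(-1, 1) for _ in range(feature_size)] for _ in range(5)]
top_outputs = stacked_lstm(layers, hidden_size, frames)
```

In an acoustic model, the top layer's outputs would typically feed a classification layer over phonetic states; that part is left out here.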
Automatic speech recognition with deep neural networks for impaired speech
The final publication is available at https://link.springer.com/chapter/10.1007%2F978-3-319-49169-1_10
Automatic Speech Recognition has reached almost human performance in some controlled scenarios. However, recognition of impaired speech is a difficult task for two main reasons: data is (i) scarce and (ii) heterogeneous. In this work we train different architectures on a database of dysarthric speech. A comparison between architectures shows that, even with a small database, hybrid DNN-HMM models outperform classical GMM-HMM models according to word error rate measures. A DNN is able to improve the recognition word error rate by 13% for subjects with dysarthria with respect to the best classical architecture. This improvement is higher than the one given by other deep neural networks such as CNNs, TDNNs and LSTMs. All the experiments have been done with the Kaldi toolkit for speech recognition, for which we have adapted several recipes to deal with dysarthric speech and work on the TORGO database. These recipes are publicly available.
Peer Reviewed. Postprint (author's final draft)
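The comparison above rests on word error rate. As a minimal sketch, assuming the usual definition (word-level edit distance divided by the number of reference words), it can be computed as follows. The function name and the example sentences are hypothetical and are not taken from the TORGO experiments or the Kaldi scoring scripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical transcripts: one word of six is dropped by the recognizer.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.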
Real time end-to-end glass break detection system using LSTM deep recurrent neural network
Glass windows are currently very popular in commercial and residential buildings. While glass has its benefits, it is also prone to security risks. Almost all glass break detectors use a predetermined frequency of the breaking
glass sound and vibration threshold signals of a pane to determine whether or not breakage has occurred. However, sounds such as thunder, shouting, gunshots, and objects being hit are similar in frequency and threshold
value to glass breaking sound events, and may consequently cause false positives in the alarm system. The aim of this paper is to propose a new design for a glass break detection system using LSTM deep recurrent
neural networks in an end-to-end approach to reduce the false positive alarms of state-of-the-art glass break detectors. We used raw wave audio data to detect glass break events in an end-to-end learning approach. The
key benefit of end-to-end learning is that it avoids the need for hand-crafted audio features. To address the vanishing and exploding gradient problems in conventional recurrent neural networks, this paper proposes
a deep long short-term memory (LSTM) recurrent neural network to handle the sequence of input audio data. In real-time detection, the proposed glass break detection approach has a clear advantage over the conventional glass break detection system, as it yields significantly higher
precision accuracy (99.999988%) and suffers less from environmental noise that might cause a false alarm.
Deep learning with convolutional neural networks for decoding and visualization of EEG pathology
We apply convolutional neural networks (ConvNets) to the task of
distinguishing pathological from normal EEG recordings in the Temple University
Hospital EEG Abnormal Corpus. We use two basic, shallow and deep ConvNet
architectures recently shown to decode task-related information from EEG at
least as well as established algorithms designed for this purpose. In decoding
EEG pathology, both ConvNets reached substantially better accuracies (about 6%
better, ~85% vs. ~79%) than the only published result for this dataset, and
were still better when using only 1 minute of each recording for training and
only six seconds of each recording for testing. We used automated methods to
optimize architectural hyperparameters and found intriguingly different ConvNet
architectures, e.g., with max pooling as the only nonlinearity. Visualizations
of the ConvNet decoding behavior showed that they used spectral power changes
in the delta (0-4 Hz) and theta (4-8 Hz) frequency range, possibly alongside
other features, consistent with expectations derived from spectral analysis of
the EEG data and from the textual medical reports. Analysis of the textual
medical reports also highlighted the potential for accuracy increases by
integrating contextual information, such as the age of subjects. In summary,
the ConvNets and visualization techniques used in this study constitute a next
step towards clinically useful automated EEG diagnosis and establish a new
baseline for future work on this topic.
Comment: Published at IEEE SPMB 2017 https://www.ieeespmb.org/2017
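The delta (0-4 Hz) and theta (4-8 Hz) ranges mentioned above are ordinary spectral bands. The following is a minimal sketch of measuring power in those bands with a direct discrete Fourier transform on a synthetic signal. It is not the ConvNet visualization technique used in the study; the function name, the half-open band edges, the sampling rate, and the test signal are all assumptions made for illustration.

```python
import cmath
import math

def band_power(signal, sample_rate, low_hz, high_hz):
    """Sum of squared DFT magnitudes for bins with low_hz <= frequency < high_hz."""
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):  # non-negative frequencies only
        freq = k * sample_rate / n
        if low_hz <= freq < high_hz:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(signal))
            power += abs(coeff) ** 2
    return power

# Synthetic two-second signal (not EEG data): a 2 Hz component with amplitude 1.0
# plus a 6 Hz component with amplitude 0.5, sampled at an assumed 64 Hz.
sample_rate = 64
n_samples = 128
signal = [math.sin(2 * math.pi * 2 * t / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 6 * t / sample_rate)
          for t in range(n_samples)]

delta_power = band_power(signal, sample_rate, 0.0, 4.0)  # contains the 2 Hz component
theta_power = band_power(signal, sample_rate, 4.0, 8.0)  # contains the 6 Hz component
```

Since power scales with the square of amplitude, the delta band here carries four times the theta band's power. Real recordings would normally call for windowing and an FFT library instead of this direct transform.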