724 research outputs found
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that still
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and, with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks.
Knock-Knock: Acoustic Object Recognition using Stacked Denoising Autoencoders
This paper presents a successful application of deep learning for object
recognition based on acoustic data. Previously employed approaches rely on
handcrafted features describing the acoustic data; their shortcomings include
limited generality of the resulting representation and the risk of capturing
only characteristics that are insignificant for the task. In contrast, there
is no need to define the feature representation
format when using multilayer/deep learning architecture methods: features can
be learned from raw sensor data without defining discriminative characteristics
a-priori. In this paper, stacked denoising autoencoders are applied to train a
deep learning model. Thirty different objects were successfully classified in
our experiment; each object was knocked 120 times with a marker pen to obtain
the auditory data. By employing the proposed deep
learning framework, a high accuracy of 91.50% was achieved. A traditional
method using handcrafted features with a shallow classifier was taken as a
benchmark and the attained recognition rate was only 58.22%. Interestingly, a
recognition rate of 82.00% was achieved when using a shallow classifier with
raw acoustic data as input. In addition, we show that classifying one object
with deep learning was more than 6 times faster than with the traditional
method. It was also explored how different
model parameters in our deep architecture affect the recognition performance.
Comment: 6 pages, 10 figures, Neurocomputin
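As a rough illustration of the denoising-autoencoder idea named in the abstract above, the following is a minimal sketch of a single layer trained to reconstruct a clean input from a corrupted copy. It is not the authors' model: the names `corrupt` and `DenoisingAutoencoder`, the use of masking noise, the layer sizes, and the learning rate are all assumptions chosen for illustration. A stacked version would train a second such layer on the hidden codes produced by the first.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def corrupt(x, drop_prob, rng):
    """Masking noise (an assumed noise model): zero each component
    of x independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


class DenoisingAutoencoder:
    """One hidden layer: sigmoid encoder, linear decoder, squared error."""

    def __init__(self, n_in, n_hidden, rng):
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h):
        return [sum(w * v for w, v in zip(row, h)) + b
                for row, b in zip(self.W2, self.b2)]

    def train_step(self, x_clean, x_noisy, lr):
        """One SGD step: encode the noisy input, but measure the
        reconstruction error against the clean input."""
        h = self.encode(x_noisy)
        y = self.decode(h)
        err = [yi - xi for yi, xi in zip(y, x_clean)]
        # Backpropagate to the hidden layer before touching W2.
        dh = [sum(err[i] * self.W2[i][j] for i in range(len(err)))
              * h[j] * (1.0 - h[j]) for j in range(len(h))]
        for i, e in enumerate(err):
            for j, hj in enumerate(h):
                self.W2[i][j] -= lr * e * hj
            self.b2[i] -= lr * e
        for j, d in enumerate(dh):
            for k, xk in enumerate(x_noisy):
                self.W1[j][k] -= lr * d * xk
            self.b1[j] -= lr * d
        return 0.5 * sum(e * e for e in err)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for acoustic feature vectors (hypothetical data).
    data = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
    dae = DenoisingAutoencoder(n_in=4, n_hidden=3, rng=rng)
    for epoch in range(500):
        for x in data:
            dae.train_step(x, corrupt(x, 0.25, rng), lr=0.1)
    print([round(v, 2) for v in dae.decode(dae.encode(data[0]))])
```

The key design point is in `train_step`: the network sees `x_noisy` but is penalised against `x_clean`, which is what distinguishes a denoising autoencoder from a plain one.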
RawNet: Fast End-to-End Neural Vocoder
Neural-network-based vocoders have recently demonstrated a powerful
ability to synthesize high-quality speech. These models usually generate
samples by conditioning on spectral features, such as the Mel-spectrum.
However, these features are extracted by a speech analysis module that includes
processing based on human knowledge. In this work, we propose RawNet,
a truly end-to-end neural vocoder, which uses a coder network to learn a
higher-level representation of the signal and an autoregressive voder network
to generate speech sample by sample. The coder and voder together act like an
auto-encoder network and can be jointly trained directly on the raw waveform
without any human-designed features. Experiments on the Copy-Synthesis
task show that RawNet achieves synthesized speech quality comparable
to LPCNet, with a smaller model architecture and faster speech generation at
the inference step.
Comment: Submitted to Interspeech 2019, Graz, Austri
Auto-encoder based deep learning for surface electromyography signal processing
© 2018 Advances in Science, Technology and Engineering Systems. All Rights Reserved. Feature extraction is a vital part of bio-signal processing. There are two paths for identifying and selecting features in any system. The most popular is handcrafted feature engineering, which depends mainly on the user's experience and the field of application. The other is feature learning, in which the system is trained to recognise and pick the features that best match the application. The main idea of feature learning is to create a model that learns the best features without human intervention, rather than resorting to traditional methods of feature extraction or reduction that depend on the researcher's experience. In this paper, an auto-encoder is used as the feature learning algorithm with which the proposed deep learning model extracts useful features from the surface electromyography signal. Wavelet packet, spectrogram, and wavelet representations of the surface electromyography signal are used in the proposed model. The newly represented bio-signal is then fed to a stacked auto-encoder (2 stages) to learn features, and finally the behaviour of the proposed algorithm is evaluated using different classifiers such as Extreme Learning Machine, Support Vector Machine, and a SoftMax layer. The Rectified Linear Unit (ReLU) is introduced as an activation function for the extreme learning machine classifier, alongside existing functions such as sigmoid and the radial basis function (RBF). ReLU shows better classification ability than sigmoid and RBF for the wavelet, wavelet scale 5, and wavelet packet signal representations.
For the spectrogram signal representation, ReLU shows better classification ability than sigmoid but poorer than RBF. Confidence intervals and Analysis of Variance are computed for the different classifiers. A classifier fusion layer is implemented to select the classifier that gives the best accuracy for both testing and training, in order to improve the results. The classifier fusion layer gave encouraging values for both training and testing accuracy.
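To make the extreme learning machine with a ReLU activation mentioned in the abstract above more concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation: the class name `ReluELM`, the helper `solve`, the uniform weight initialisation, and the small ridge term added for numerical stability are all hypothetical choices. The sketch only shows the general ELM recipe: hidden weights are drawn at random and left fixed, and only the output weights are fitted, in closed form, by regularised least squares.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.
    A is n x n, B is n x m; returns X as an n x m list of lists."""
    n = len(A)
    M = [list(a) + list(b) for a, b in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


class ReluELM:
    """Single-hidden-layer ELM classifier with ReLU hidden units."""

    def __init__(self, n_in, n_hidden, n_out, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        # Random hidden layer; never trained.
        self.W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.ridge = ridge  # assumed regulariser, keeps the system solvable
        self.n_out = n_out
        self.beta = None

    def hidden(self, x):
        return [relu(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.W, self.b)]

    def fit(self, X, labels):
        H = [self.hidden(x) for x in X]
        # One-hot targets.
        T = [[1.0 if k == y else 0.0 for k in range(self.n_out)]
             for y in labels]
        L = len(self.W)
        # Normal equations: (H^T H + ridge * I) beta = H^T T
        A = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
              for j in range(L)] for i in range(L)]
        B = [[sum(h[i] * t[k] for h, t in zip(H, T))
              for k in range(self.n_out)] for i in range(L)]
        self.beta = solve(A, B)
        return self

    def predict(self, x):
        h = self.hidden(x)
        scores = [sum(h[i] * self.beta[i][k] for i in range(len(h)))
                  for k in range(self.n_out)]
        return max(range(self.n_out), key=lambda k: scores[k])
```

In a pipeline like the one the abstract describes, the inputs to `fit` would be the features learned by the stacked auto-encoder rather than raw signal values; swapping `relu` for a sigmoid or an RBF unit is the kind of comparison the abstract reports.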