3 research outputs found

Convolutional Neural Networks for Raw Speech Recognition

Author: Rajesh Kumar Aggarwal
Vishal Passricha
Publication venue: 'IntechOpen'
Publication date: 12/12/2018

State-of-the-art automatic speech recognition (ASR) systems map a speech signal to its corresponding text. Traditional ASR systems are based on Gaussian mixture models. The emergence of deep learning has drastically improved the recognition rate of ASR systems, and deep learning-based systems are replacing traditional ones. These systems can also be trained in an end-to-end manner. End-to-end ASR systems are gaining popularity because they simplify the model-building process and can map speech directly to text without any predefined alignments. The three major types of end-to-end architectures for ASR are attention-based methods, connectionist temporal classification, and convolutional neural network (CNN)-based direct raw speech models. This chapter discusses a CNN-based acoustic model for raw speech signals. The model establishes the relation between the raw speech signal and phones in a data-driven manner: the relevant features and the classifier are learned jointly from the raw speech. The first convolutional layer processes the raw speech to learn a feature representation. Its output, an intermediate representation, is more discriminative and is further processed by the remaining convolutional layers. The system uses only a few parameters and performs better than traditional cepstral feature-based systems. The performance of the system is evaluated on TIMIT and claimed similar performance as MFCC

Multi-level region-of-interest CNNs for end to end speech recognition

Author: GE Dahl
H Hermansky
HA Bourlard
Jan Vaněk
JS Bridle
K-F Lee
Kaiming He
L Toth
LR Rabiner
M Dua
M Gales
N Srivastava
P Swietojanski
Pooja Sharma
Rajesh Kumar Aggarwal
S Lee
Sandeep Rathor
SB Davis
Shubhanshi Singhal
TN Sainath
Vishal Passricha
Y Huang
Publication venue: 'Springer Science and Business Media LLC'

Convolutional support vector machines for speech recognition

Author: A Robinson
AR Mohamed
D Vasquez
G Dahl
HA Bourlard
J Nagi
JK Suykens
K Crammer
K-F Lee
M Gales
O Abdel-Hamid
O Viikki
Rajesh Kumar Aggarwal
S Shalev-Shwartz
S-X Zhang
SX Zhang
T Joachims
TN Sainath
Vishal Passricha
VN Vapnik
Y Hifny
Publication venue: 'Springer Science and Business Media LLC'