Method and apparatus for obtaining complete speech signals for speech recognition applications
The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
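The circular-buffer idea above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the class name, the frame capacity, and the assumed ~0.5 s pre-roll are all hypothetical choices for the example.

```python
from collections import deque

PRE_ROLL_FRAMES = 25  # hypothetical: frames kept from before the command (~0.5 s at 20 ms/frame)

class SpeechCapture:
    """Continuously buffers audio frames so that speech occurring just
    before a start-recognition command is not lost (sketch only)."""

    def __init__(self, capacity=500):
        # deque with maxlen acts as a circular buffer: oldest frames
        # are discarded automatically once capacity is reached
        self.ring = deque(maxlen=capacity)

    def on_frame(self, frame):
        self.ring.append(frame)

    def on_start_command(self):
        # Augment the signal with frames recorded before the command arrived;
        # endpoint detection (e.g. via an HMM) would then run on this span
        return list(self.ring)[-PRE_ROLL_FRAMES:]
```

A downstream endpointer would scan the returned frames for the starting speech boundary rather than assuming speech begins exactly at the command.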
Pathological speech classification using a convolutional neural network
Convolutional Neural Networks (CNNs) have enabled significant improvements across a number of applications in computer vision such as object detection, face recognition and image classification. An audio signal can be visually represented as a spectrogram that captures the time-varying frequency content of the signal. This paper describes how a CNN can be applied to the spectrogram of an audio signal to distinguish pathological from healthy speech. We propose a CNN structure and implement it using Keras to test the approach. A classification accuracy of over 95% is obtained in experiments on two public pathological speech datasets.
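The spectrogram representation the abstract relies on can be computed directly with NumPy. This is a generic short-time Fourier transform sketch, assuming an arbitrary frame length and hop size; the paper's actual preprocessing parameters and Keras model are not specified here.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: time-varying frequency content of a 1-D signal."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Hann window reduces spectral leakage at frame edges
    frames = frames * np.hanning(frame_len)
    # Magnitude of the real FFT per frame; transpose to (freq_bins, time)
    return np.abs(np.fft.rfft(frames, axis=1)).T
```

The resulting 2-D array can be treated as an image and fed to a CNN (e.g. a Keras `Conv2D` stack) for pathological-versus-healthy classification.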
TasNet: time-domain audio separation network for real-time, single-channel speech separation
Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, short-latency applications. Most methods attempt to construct a mask for each source in the time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation. In addition, time-frequency decomposition results in inherent problems such as phase/magnitude decoupling and the long time window required to achieve sufficient frequency resolution. We propose the Time-domain Audio Separation Network (TasNet) to overcome these limitations. We directly model the signal in the time domain using an encoder-decoder framework and perform the source separation on nonnegative encoder outputs. This method removes the frequency decomposition step and reduces the separation problem to the estimation of source masks on encoder outputs, which are then synthesized by the decoder. Our system outperforms the current state-of-the-art causal and noncausal speech separation algorithms, reduces the computational cost of speech separation, and significantly reduces the minimum required latency of the output. This makes TasNet suitable for applications where low-power, real-time implementation is desirable, such as in hearable and telecommunication devices.

Comment: Camera ready version for ICASSP 2018, Calgary, Canada.
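The mask-on-encoder-outputs idea can be illustrated with a toy NumPy sketch. This is not TasNet itself: the random bases stand in for the learned encoder/decoder filters, the segment length and basis count are arbitrary, and the masks here are fixed rather than predicted by a separation network.

```python
import numpy as np

rng = np.random.default_rng(0)

L, N = 40, 64  # hypothetical: 40-sample segments, 64 basis signals
encoder_basis = rng.standard_normal((N, L))  # stand-in for learned encoder filters
decoder_basis = rng.standard_normal((N, L))  # stand-in for learned decoder filters

def encode(segment):
    # Nonnegative encoder output (ReLU), mirroring TasNet's nonnegative mixture weights
    return np.maximum(encoder_basis @ segment, 0.0)

def separate(mixture_segment, masks):
    w = encode(mixture_segment)  # nonnegative representation of the mixture
    # Each source = its mask applied to the mixture weights, then decoded
    # back to the time domain; no time-frequency decomposition is involved
    return [decoder_basis.T @ (m * w) for m in masks]

mixture = rng.standard_normal(L)
# Two softmax-style masks that sum to one per basis coefficient
logits = rng.standard_normal((2, N))
masks = np.exp(logits) / np.exp(logits).sum(axis=0)
sources = separate(mixture, masks)
```

In the real model the masks come from a separation module (an LSTM in the paper) and the bases are trained end-to-end, which is what lets the system run causally with low latency.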