13,928 research outputs found
End to End Deep Neural Network Frequency Demodulation of Speech Signals
Frequency modulation (FM) is a form of radio broadcasting which is widely
used nowadays and has been for almost a century. We suggest a
software-defined-radio (SDR) receiver for FM demodulation that adopts an
end-to-end learning based approach and utilizes the prior information of
transmitted speech message in the demodulation process. The receiver detects
and enhances speech from the in-phase and quadrature components of its base
band version. The new system yields high performance detection for both
acoustical disturbances, and communication channel noise and is foreseen to
out-perform the established methods for low signal to noise ratio (SNR)
conditions in both mean square error and in perceptual evaluation of speech
quality score
Group-Sparse Signal Denoising: Non-Convex Regularization, Convex Optimization
Convex optimization with sparsity-promoting convex regularization is a
standard approach for estimating sparse signals in noise. In order to promote
sparsity more strongly than convex regularization, it is also standard practice
to employ non-convex optimization. In this paper, we take a third approach. We
utilize a non-convex regularization term chosen such that the total cost
function (consisting of data consistency and regularization terms) is convex.
Therefore, sparsity is more strongly promoted than in the standard convex
formulation, but without sacrificing the attractive aspects of convex
optimization (unique minimum, robust algorithms, etc.). We use this idea to
improve the recently developed 'overlapping group shrinkage' (OGS) algorithm
for the denoising of group-sparse signals. The algorithm is applied to the
problem of speech enhancement with favorable results in terms of both SNR and
perceptual quality.Comment: 14 pages, 11 figure
Reconstructing intelligible audio speech from visual speech features
This work describes an investigation into the feasibility of producing intelligible audio speech from only visual speech fea- tures. The proposed method aims to estimate a spectral enve- lope from visual features which is then combined with an arti- ficial excitation signal and used within a model of speech pro- duction to reconstruct an audio signal. Different combinations of audio and visual features are considered, along with both a statistical method of estimation and a deep neural network. The intelligibility of the reconstructed audio speech is measured by human listeners, and then compared to the intelligibility of the video signal only and when combined with the reconstructed audio
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that stills
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks
- …