End to End Deep Neural Network Frequency Demodulation of Speech Signals
Frequency modulation (FM) is a form of radio broadcasting which is widely
used nowadays and has been for almost a century. We suggest a
software-defined-radio (SDR) receiver for FM demodulation that adopts an
end-to-end learning based approach and utilizes the prior information of
transmitted speech message in the demodulation process. The receiver detects
and enhances speech from the in-phase and quadrature components of its
baseband version. The new system yields high-performance detection under both
acoustical disturbances and communication channel noise, and is foreseen to
outperform the established methods in low signal-to-noise ratio (SNR)
conditions in both mean square error and perceptual evaluation of speech
quality score.
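
For context, the "established methods" that the abstract compares against typically recover the message from the phase difference between consecutive in-phase/quadrature samples. The sketch below shows that conventional discriminator only, not the learned receiver the abstract proposes. The function names, sample rate, frequency deviation, and test tone are all illustrative assumptions, and the signal is noise-free.

```python
import numpy as np

def fm_modulate(message, fs, freq_dev):
    """Return complex baseband (I + jQ) FM samples for a real message."""
    # Instantaneous phase is the running integral of the message.
    phase = 2.0 * np.pi * freq_dev * np.cumsum(message) / fs
    return np.exp(1j * phase)

def fm_demodulate(iq, fs, freq_dev):
    """Conventional phase-difference discriminator on I/Q samples."""
    # The angle of x[n] * conj(x[n-1]) is the phase step between samples,
    # which is proportional to the message value at sample n.
    dphi = np.angle(iq[1:] * np.conj(iq[:-1]))
    return dphi * fs / (2.0 * np.pi * freq_dev)

# Assumed illustrative parameters (not taken from the paper).
fs = 48_000.0        # sample rate in Hz
freq_dev = 5_000.0   # peak frequency deviation in Hz
t = np.arange(0, 0.05, 1.0 / fs)
message = 0.5 * np.sin(2.0 * np.pi * 300.0 * t)

iq = fm_modulate(message, fs, freq_dev)
recovered = fm_demodulate(iq, fs, freq_dev)

# The discriminator output at index n-1 corresponds to message[n].
max_err = float(np.max(np.abs(recovered - message[1:])))
```

In this noise-free setting the discriminator reproduces the message almost exactly. Its known weakness at low SNR, where phase noise dominates the sample-to-sample difference, is the regime the abstract targets.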
Statistical models for natural sounds
It is important to understand the rich structure of natural sounds in order to solve important
tasks, like automatic speech recognition, and to understand auditory processing
in the brain. This thesis takes a step in this direction by characterising the statistics of
simple natural sounds. We focus on the statistics because perception often appears to
depend on them, rather than on the raw waveform. For example, the perception of auditory
textures, like running water, wind, fire and rain, depends on summary statistics,
like the rate of falling rain droplets, rather than on the exact details of the physical
source.
In order to analyse the statistics of sounds accurately it is necessary to improve a
number of traditional signal processing methods, including those for amplitude demodulation,
time-frequency analysis, and sub-band demodulation. These estimation tasks
are ill-posed and therefore it is natural to treat them as Bayesian inference problems.
The new probabilistic versions of these methods have several advantages. For example,
they perform more accurately on natural signals, are more robust to noise,
can fill in missing sections of data, and provide error bars. Furthermore,
free parameters can be learned from the signal. Using these new algorithms we demonstrate
that the energy, sparsity, modulation depth and modulation time-scale in each
sub-band of a signal are critical statistics, together with the dependencies between the
sub-band modulators. In order to validate this claim, a model containing co-modulated
coloured noise carriers is shown to be capable of generating a range of realistic sounding
auditory textures.
Finally, we explore the connection between the statistics of natural sounds and perception.
We demonstrate that inference in the model for auditory textures qualitatively
replicates the primitive grouping rules that listeners use to understand simple acoustic
scenes. This suggests that the auditory system is optimised for the statistics of natural
sounds.
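
As a point of reference for the "traditional signal processing methods" this abstract sets out to improve, amplitude demodulation is commonly done by taking the magnitude of the analytic signal (the Hilbert envelope). The sketch below illustrates that conventional estimator only, not the Bayesian version the thesis develops. The function name, sample rate, and test signal are assumptions chosen for illustration.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real sequence, built in the frequency domain."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

# Assumed test signal: a 1 kHz carrier with a slow 5 Hz modulator.
fs = 8000
t = np.arange(fs) / fs                      # one second of samples
true_envelope = 1.0 + 0.5 * np.cos(2.0 * np.pi * 5.0 * t)
carrier = np.cos(2.0 * np.pi * 1000.0 * t)
signal = true_envelope * carrier

# Hilbert envelope: magnitude of the analytic signal.
estimated_envelope = np.abs(analytic_signal(signal))
envelope_err = float(np.max(np.abs(estimated_envelope - true_envelope)))
```

The estimate is accurate here because the modulator is smooth, much slower than the carrier, and the signal is noise-free. This estimator returns a single point estimate with no error bars and cannot fill in missing data, which are the limitations the abstract says the probabilistic versions address.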
A generative model for natural sounds based on latent force modelling
Generative models based on subband amplitude envelopes of natural sounds have resulted in convincing synthesis, showing subband amplitude modulation to be a crucial component of auditory perception. Probabilistic latent variable analysis can be particularly insightful, but existing approaches do not incorporate prior knowledge about the physical behaviour of amplitude envelopes, such as exponential decay or feedback. We use latent force modelling, a probabilistic learning paradigm that encodes physical knowledge into Gaussian process regression, to model correlation across spectral subband envelopes. We augment the standard latent force model approach by explicitly modelling dependencies across multiple time steps. Incorporating this prior knowledge strengthens the interpretation of the latent functions as the source that generated the signal. We examine this interpretation via an experiment showing that sounds generated by sampling from our probabilistic model are perceived to be more realistic than those generated by comparative models based on nonnegative matrix factorisation, even in cases where our model is outperformed from a reconstruction error perspective.