Wavenet based low rate speech coding
Traditional parametric coding of speech facilitates low rates but provides
poor reconstruction quality because of the inadequacy of the model used. We
describe how a WaveNet generative speech model can be used to generate high-quality
speech from the bit stream of a standard parametric coder operating at
2.4 kb/s. We compare this parametric coder with a waveform coder based on the
same generative model and show that approximating the signal waveform incurs a
large rate penalty. Our experiments confirm the high performance of the
WaveNet-based coder and show that the system additionally performs implicit
bandwidth extension and that its output does not significantly impair
recognition of the original speaker by human listeners, even when that speaker
was not used in training the generative model.
Comment: 5 pages, 2 figures
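To put the 2.4 kb/s figure in perspective, the bit budget per frame is simply rate × frame duration. The sketch below is illustrative arithmetic only: the abstract does not state the coder's frame length, so the frame durations and the 16 kHz, 16-bit PCM reference are assumptions, and the helper names are hypothetical.

```python
def bits_per_frame(rate_bps: float, frame_ms: float) -> float:
    """Bits available to encode one frame at a constant bit rate."""
    return rate_bps * frame_ms / 1000.0


def pcm_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of uncompressed mono PCM."""
    return sample_rate_hz * bits_per_sample


CODER_RATE = 2400  # b/s, the parametric coder rate quoted in the abstract

# Assumed, illustrative frame lengths; not taken from the paper.
for frame_ms in (20.0, 22.5, 40.0):
    print(f"{frame_ms:4.1f} ms frame -> {bits_per_frame(CODER_RATE, frame_ms):.0f} bits")

# Assumed reference format: 16 kHz, 16-bit mono PCM.
print(f"ratio vs. 16 kHz/16-bit PCM: {pcm_rate(16000, 16) / CODER_RATE:.1f}x")
```

At 20 ms per frame, for example, the coder has only 48 bits per frame, which is why such a coder transmits model parameters rather than the waveform itself.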
A Tutorial on Speech Synthesis Models
For speech synthesis, an understanding of the physical and mathematical models of speech is essential. Speech modeling is a large field and is well documented in the literature. The aim of this paper is to provide a background review of several speech models used in speech synthesis, specifically the Source Filter Model, the Linear Prediction Model, the Sinusoidal Model, and the Harmonic/Noise Model. The most important models of speech signals are described in order, from the earliest to the most recent, to highlight the major improvements made along the way. A desirable parametric model of speech is relatively simple, flexible, high-quality, and robust in re-synthesis. Emphasis is given to the Harmonic/Noise Model, since it appears to be a more promising and robust model of speech. (C) 2015 The Authors. Published by Elsevier B.V.
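As a concrete illustration of the Linear Prediction Model named above, the sketch below uses the standard autocorrelation method with the Levinson-Durbin recursion, then inverse-filters the signal to recover the excitation (the "source" in source-filter terms). This is a generic textbook formulation, not code from the tutorial; the function names and the test signal are hypothetical, and the recursion assumes a nonzero signal (it divides by the prediction-error energy).

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by Levinson-Durbin recursion.

    Returns (coeffs, err) where x[n] is predicted as
    sum_{j=1..order} coeffs[j - 1] * x[n - j], and err is the final
    prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] unused; a[j] multiplies x[n - j]
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection (PARCOR) coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(x, coeffs):
    """Inverse filter: e[n] = x[n] - sum_j coeffs[j - 1] * x[n - j], zero history."""
    return [
        x[n] - sum(c * x[n - j] for j, c in enumerate(coeffs, start=1) if n - j >= 0)
        for n in range(len(x))
    ]


# Hypothetical test signal: the impulse response of the all-pole filter 1 / (1 - 0.9 z^-1).
x = [0.9 ** n for n in range(200)]
coeffs, err = levinson_durbin(autocorrelation(x, 2), order=2)
residual = lpc_residual(x, coeffs)
print(coeffs)        # close to [0.9, 0.0]
print(residual[:3])  # close to an impulse: [1.0, 0.0, 0.0]
```

Because the test signal is an impulse passed through a one-pole filter, the recovered predictor is essentially [0.9, 0] and the residual is essentially the original impulse, which is the source-filter decomposition in miniature.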
Software and hardware implementation techniques for digital communications-related algorithms
This thesis addresses essentially three areas. (a) The first is a theoretical investigation into the design and development of a practically realizable
implementation of a maximum-likelihood detection process to deal with digital data transmission over
HF radio links. These links exhibit multipath properties with delay spreads that can easily extend over 12
to 15 milliseconds. The project was sponsored by the Ministry of Defence through the auspices of the Science
and Engineering Research Council. The primary objective was to transmit voice band data at a minimum
rate of 2.4 kb/s continuously for long periods of time during the day or night. Computer simulation
models of HF propagation channels were created to simulate atmospheric and multipath effects of transmission
from London to Washington DC, Ankara, and as far as Melbourne, Australia. Investigations into
HF channel estimation are not the subject of this thesis. The detection process assumed accurate knowledge
of the channel. [Continues.]
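The abstract says only that the detector performed maximum-likelihood detection given accurate channel knowledge. A standard way to do this over a multipath (intersymbol-interference) channel is maximum-likelihood sequence estimation with the Viterbi algorithm. The sketch below assumes binary antipodal symbols, a known three-tap channel, and white Gaussian noise, so that maximum likelihood reduces to minimum squared error; the function name, channel taps, and data are hypothetical, and this is not the detector developed in the thesis.

```python
def mlse_viterbi(received, h, alphabet=(-1.0, 1.0)):
    """Maximum-likelihood sequence estimate for y[n] = sum_k h[k] s[n-k] + noise.

    Assumes known channel taps h, white Gaussian noise, and s[n] = 0 for n < 0.
    """
    memory = len(h) - 1
    # A state holds the `memory` most recent symbols, newest first.
    start = tuple([0.0] * memory)
    metrics = {start: 0.0}
    paths = {start: []}
    for y in received:
        new_metrics, new_paths = {}, {}
        for state, metric in metrics.items():
            for s in alphabet:
                # Noise-free channel output if symbol s is sent from this state.
                predicted = h[0] * s + sum(h[k] * state[k - 1] for k in range(1, len(h)))
                cost = metric + (y - predicted) ** 2
                next_state = ((s,) + state)[:memory]
                # Keep only the survivor (lowest-cost path) into each state.
                if next_state not in new_metrics or cost < new_metrics[next_state]:
                    new_metrics[next_state] = cost
                    new_paths[next_state] = paths[state] + [s]
        metrics, paths = new_metrics, new_paths
    best = min(metrics, key=metrics.get)
    return paths[best]


# Hypothetical channel and data, with a small deterministic disturbance as "noise".
h = [1.0, 0.6, 0.3]
sent = [1, -1, -1, 1, 1, -1, 1, -1, 1, 1]
clean = [sum(h[k] * sent[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(sent))]
received = [c + (0.1 if n % 2 else -0.1) for n, c in enumerate(clean)]
print(mlse_viterbi(received, h))
```

The trellis has M^L states for an alphabet of size M and a channel memory of L symbols, which is why long delay spreads make exact detection expensive. As a purely hypothetical illustration, a 15 ms delay spread at 2400 symbols/s spans 0.015 × 2400 = 36 symbol periods, which for binary symbols implies on the order of 2^36 states.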
The DESAM toolbox: spectral analysis of musical audio
This paper presents the DESAM Toolbox, a set of Matlab functions dedicated to estimating widely used spectral models for music signals. Although these models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. Rather, the toolbox aims to provide a range of state-of-the-art signal processing tools that decompose music files according to different signal models, giving rise to different "mid-level" representations. After motivating the need for such a toolbox, this paper gives an overview of its overall organization and describes all available functionalities.
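One of the simplest "mid-level" representations of this kind is a sinusoidal one: a list of (frequency, amplitude) peaks per analysis frame. The sketch below is a generic illustration in Python, not DESAM's Matlab API; the function names, threshold, and test frame are hypothetical, and it uses a rectangular window with on-bin sinusoids, whereas practical estimators apply a window and interpolate between bins.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the DFT bins 0..N/2 of a real frame (direct O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def pick_peaks(frame, sample_rate, rel_threshold=0.1):
    """Return (frequency_hz, amplitude) for local spectral maxima above a relative floor."""
    mags = dft_magnitudes(frame)
    n = len(frame)
    floor = rel_threshold * max(mags)
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k] > floor and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            # For an on-bin cosine of amplitude a, |X[k]| = a * N / 2.
            peaks.append((k * sample_rate / n, 2.0 * mags[k] / n))
    return peaks


# Hypothetical frame: two on-bin partials at bins 5 and 12 of a 64-point frame.
N, FS = 64, 8000
frame = [math.cos(2 * math.pi * 5 * t / N) + 0.5 * math.cos(2 * math.pi * 12 * t / N)
         for t in range(N)]
print(pick_peaks(frame, FS))
```

With a 64-point frame at 8 kHz the bin spacing is 125 Hz, so the two partials are reported at 625 Hz and 1500 Hz with amplitudes 1.0 and 0.5.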
Individual identity in songbirds: signal representations and metric learning for locating the information in complex corvid calls
Bird calls range from simple tones to rich dynamic multi-harmonic structures.
The more complex calls, such as those of the scientifically important corvid
family (jackdaws, crows, ravens, etc.), are very poorly understood at present.
Individual birds can recognise familiar individuals from calls, but where in
the signal is this identity encoded? We studied this question by applying a
combination of feature representations to a dataset of jackdaw calls, including
linear predictive coding (LPC) and the adaptive discrete Fourier transform (aDFT).
Using a classification paradigm, we demonstrate that we can strongly
outperform a standard spectrogram representation for identifying individuals,
and we apply metric learning to determine which time-frequency regions
contribute most strongly to robust individual identification. Computational
methods can help direct our search for an understanding of these complex
biological signals.