2,045 research outputs found

Wavenet based low rate speech coding

Author: Kleijn W. Bastiaan
Lim Felicia S. C.
Luebs Alejandro
Skoglund Jan
Stimberg Florian
Walters Thomas C.
Wang Quan
Publication date: 01/12/2017

Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet based coder and show that the speech produced by the system is able to additionally perform implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener, even when that speaker has not been used during the training of the generative model. Comment: 5 pages, 2 figures

arXiv.org e-Print Archive

Crossref

A Tutorial on Speech Synthesis Models

Author: Affifi Sadek
Boughazi Mohamed
Tabet Youcef
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

For speech synthesis, an understanding of the physical and mathematical models of speech is essential. Speech modeling is therefore a large field and is well documented in the literature. The aim of this paper is to provide a background review of several speech models used in speech synthesis, specifically the Source Filter Model, Linear Prediction Model, Sinusoidal Model, and Harmonic/Noise Model. The most important models of speech signals are described, from the earliest to the most recent, in order to highlight the major improvements between them. A parametric model of speech that is relatively simple, flexible, high quality, and robust in re-synthesis would be desirable. Emphasis is given to the Harmonic/Noise Model, since it appears to be the more promising and robust model of speech. (C) 2015 The Authors. Published by Elsevier B.V.

Archives ouvertes de l'Université M'hamed Bougara Boumerdes

Software and hardware implementation techniques for digital communications-related algorithms

Author: Safdar M. Asghar (7202186)
Publication date: 01/01/2001

There are essentially three areas addressed in the body of this thesis. (a) The first is a theoretical investigation into the design and development of a practically realizable implementation of a maximum-likelihood detection process to deal with digital data transmission over HF radio links. These links exhibit multipath properties with delay spreads that can easily extend over 12 to 15 milliseconds. The project was sponsored by the Ministry of Defence through the auspices of the Science and Engineering Research Council. The primary objective was to transmit voice band data at a minimum rate of 2.4 kb/s continuously for long periods of time during the day or night. Computer simulation models of HF propagation channels were created to simulate atmospheric and multipath effects of transmission from London to Washington DC, Ankara, and as far as Melbourne, Australia. Investigations into HF channel estimation are not the subject of this thesis. The detection process assumed accurate knowledge of the channel. [Continues.]

Loughborough University Institutional Repository

The DESAM toolbox: spectral analysis of musical audio

Author: Badeau Roland
Bertin Nancy
Daudet Laurent
David Bertrand
Derrien Olivier
Echeveste Jose
Lagrange Mathieu
Marchand Sylvain
Publication venue: HAL CCSD
Publication date: 01/09/2010

This paper presents the DESAM Toolbox, a set of Matlab functions dedicated to the estimation of widely used spectral models for music signals. Although those models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. Rather, the toolbox aims to provide a range of state-of-the-art signal processing tools that decompose music files according to different signal models, giving rise to different "mid-level" representations. After motivating the need for such a toolbox, this paper offers an overview of its overall organization and describes all available functionalities.

HAL-CentraleSupelec

HAL AMU

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Individual identity in songbirds: signal representations and metric learning for locating the information in complex corvid calls

Author: Assoc ISC
Gill LF
Morfi V
Stowell D
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2016

Bird calls range from simple tones to rich dynamic multi-harmonic structures. The more complex calls are very poorly understood at present, such as those of the scientifically important corvid family (jackdaws, crows, ravens, etc.). Individual birds can recognise familiar individuals from calls, but where in the signal is this identity encoded? We studied the question by applying a combination of feature representations to a dataset of jackdaw calls, including linear predictive coding (LPC) and adaptive discrete Fourier transform (aDFT). We demonstrate through a classification paradigm that we can strongly outperform a standard spectrogram representation for identifying individuals, and we apply metric learning to determine which time-frequency regions contribute most strongly to robust individual identification. Computational methods can help to direct our search for understanding of these complex biological signals.

arXiv.org e-Print Archive

Crossref

Queen Mary Research Online

MPG.PuRe