
DeepVoCoder: A CNN model for compression and coding of narrow band speech

Authors: Ilk Hakki Gokhan; Keles Hacer Yalim; Rozhon Jan; Vozňák Miroslav
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

This paper proposes a convolutional neural network (CNN)-based encoder model to compress and code speech signals directly from raw input speech. Although the model can synthesize wideband speech by implicit bandwidth extension, narrowband is preferred for IP telephony and telecommunications purposes. The model takes time-domain speech samples as inputs and encodes them using a cascade of convolutional filters in multiple layers, where pooling is applied after some layers to downsample the encoded speech by half. The final bottleneck layer of the CNN encoder provides an abstract and compact representation of the speech signal. In this paper, it is demonstrated that this compact representation is sufficient to reconstruct the original speech signal with high quality using the CNN decoder. This paper also discusses the theoretical background of why and how CNNs may be used for end-to-end speech compression and coding. The complexity, delay, memory requirements, and bit rate versus quality are discussed in the experimental results.

DSpace at VSB Technical University of Ostrava

Network planning for third-generation mobile radio systems

Authors: Beach MA; Cheung JCS; McGeehan JP
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/1994

Explore Bristol Research

Wavenet based low rate speech coding

Authors: Kleijn W. Bastiaan; Lim Felicia S. C.; Luebs Alejandro; Skoglund Jan; Stimberg Florian; Walters Thomas C.; Wang Quan
Publication date: 01/12/2017

Traditional parametric coding of speech facilitates a low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high-quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet-based coder and show that the system additionally performs implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener, even when that speaker has not been used during the training of the generative model.
Comment: 5 pages, 2 figures

arXiv.org e-Print Archive

Crossref


Error resilient video transcoding for robust inter-network communications using GPRS

Authors: Cellatoglu A; Dogan S; Kondoz AM; Sadka AH; Uyguroglu M
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2002

A novel, fully comprehensive mobile video communications system is proposed in this paper. This system exploits the useful rate management features of video transcoders and combines them with error resilience for transmissions of coded video streams over general packet radio service (GPRS) mobile access networks. The error-resilient video transcoding operation takes place at a centralized point, referred to as a video proxy, which provides the necessary output transmission rates with the required amount of robustness. With the use of this proposed algorithm, error resilience can be added to an already compressed video stream at an intermediate stage at the edge of two or more different networks through two resilience schemes, namely the adaptive intra refresh (AIR) and feedback control signaling (FCS) methods. Both resilience tools impose an output rate increase, which can also be prevented with the novel technique proposed in this paper. Thus, an error-resilient video transcoding scheme is presented to give robust video outputs at near-target transmission rates that require only the same number of GPRS timeslots as the nonresilient schemes. Moreover, ultimate robustness is accomplished by combining the two resilience algorithms at the video proxy. Extensive computer simulations demonstrate the effectiveness of the proposed system.

Brunel University Research Archive

Micro protocol engineering for unstructured carriers: On the embedding of steganographic control protocols into audio transmissions

Authors: Keller Jörg; Mazurczyk Wojciech; Naumann Matthias; Wendzel Steffen
Publication date: 28/05/2015

Network steganography conceals the transfer of sensitive information within unobtrusive data in computer networks. So-called micro protocols are communication protocols placed within the payload of a network steganographic transfer. They enrich this transfer with features such as reliability, dynamic overlay routing, or performance optimization, to mention just a few. We present different design approaches for the embedding of hidden channels with micro protocols in digitized audio signals under consideration of different requirements. On the basis of experimental results, our design approaches are compared and introduced into a protocol engineering approach for micro protocols.
Comment: 20 pages, 7 figures, 4 tables

arXiv.org e-Print Archive

Fraunhofer-ePrints