Search CORE

186 research outputs found

Speech coding at medium bit rates using analysis by synthesis techniques

Author: Nikolaos Gouvianakis (7201001)
Publication venue
Publication date: 12/08/2019
Field of study

Speech coding at medium bit rates using analysis by synthesis technique

Loughborough University Institutional Repository

Novel Pitch Detection Algorithm With Application to Speech Coding

Author: Kura Vijay
Publication venue: ScholarWorks@UNO
Publication date: 19/12/2003
Field of study

This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT uses feature extraction, and ACR applied on Linear Predictive Coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions

University of New Orleans

Perceptual models in speech quality assessment and coding

Author: Vasos E. Savvides (7202612)
Publication venue
Publication date: 01/01/1988
Field of study

The ever-increasing demand for good communications/toll quality speech has created a renewed interest into the perceptual impact of rate compression. Two general areas are investigated in this work, namely speech quality assessment and speech coding. In the field of speech quality assessment, a model is developed which simulates the processing stages of the peripheral auditory system. At the output of the model a "running" auditory spectrum is obtained. This represents the auditory (spectral) equivalent of any acoustic sound such as speech. Auditory spectra from coded speech segments serve as inputs to a second model. This model simulates the information centre in the brain which performs the speech quality assessment. [Continues.

Loughborough University Institutional Repository

Interaction of Working Memory, Compressor Speed and Background Noise Characteristics

Author: MacDonald Ewen
Ohlenforst Barbara
Souza Pamela
Publication venue
Publication date: 01/01/2014
Field of study

Online Research Database In Technology

Efficient multipulse approximation of speech excitation using the most singular manifold

Author: Khalid Daoudi
Khanagha Vahid
Publication venue: HAL CCSD
Publication date: 02/04/2012
Field of study

INTERSPEECH 2012We propose a novel approach to find the locations of the multipulse sequence that approximates the speech source excitation. This approach is based on the notion of Most Singular Manifold (MSM) which is associated to the set of less predictable events. The MSM is formed by identifying (directly from the speech waveform) multiscale singularities which may correspond to significant impulsive excitations of the vocal tract. This identification is done through a multiscale measure of local predictability and the estimation of its associated singularity exponents. Once the pulse locations are found using the MSM, their amplitudes are computed using the second stage of the classical MultiPulse Excitation (MPE) coder. The multipulse sequence is then fed to the classical LPC synthesizer to reconstruct speech. The resulting MSM-based algorithm is shown to be significantly more efficient than MPE. We evaluate our algorithm using 1 hour of speech from the TIMIT database and compare its performances to MPE and a recent approach based on compressed sensing (CS). The results show that our algorithm yields similar perceptual quality as MPE and outperforms the CS method when the number of pulses is low

INRIA a CCSD electronic archive server

Sparsity in Linear Predictive Coding of Speech

Author: Giacobello Daniele
Publication venue: Multimedia Information and Signal Processing, Institute of Electronic Systems, Aalborg University
Publication date: 01/01/2010
Field of study

nrpages: 197status: publishe

Lirias

VBN

Low Delay Sparse and Mixed Excitation CELP Coders for Wideband Speech Coding

Author: Dymarski Przemyslaw Grzegorz
Publication venue: Electronics and Telecommunications Committee
Publication date: 01/01/2020
Field of study

Code Excited Linear Prediction (CELP) algorithmsare proposed for compression of speech in 8 kHz band atswitched or variable bit rate and algorithmic delay not exceeding2 msec. Two structures of Low-Delay CELP coders are analyzed:Low-delay sparse excitation and mixed excitation CELP. Sparseexcitation is based on MP-MLQ and multilayer models. Mixedexcitation CELP algorithm stems from the narrowband G.728standard. As opposed to G.728 LD-CELP coder, mixed excitationcodebook consists of pseudorandom vectors and sequencesobtained with Long-Term Prediction (LTP). Variable rate codingconsists in maximizing vector dimension while keeping therequired speech quality. Good speech quality (MOS=3.9according to PESQ algorithm) is obtained at average bit rate 33.5kbit/sec

Biblioteka Nauki - repozytorium artykuÅÃ³w

International Journal of Electronics and Telecommunications (Warsaw University of Technology)

Design of a smartphone with a Digital Signal Processor

Author: Lecluse Joep
Publication venue
Publication date: 01/01/1996
Field of study

Repository TU/e

Pure OAI Repository

New Directions in Subband Coding

Author: Cox R. V.
Grant Steven L.
Jayant N. S.
Quackenbush S. R.
Seshadri N.
Shoham Y.
Publication venue: Scholars\u27 Mine
Publication date: 01/01/1988
Field of study

Two very different subband coders are described. The first is a modified dynamic bit-allocation-subband coder (D-SBC) designed for variable rate coding situations and easily adaptable to noisy channel environments. It can operate at rates as low as 12 kb/s and still give good quality speech. The second coder is a 16-kb/s waveform coder, based on a combination of subband coding and vector quantization (VQ-SBC). The key feature of this coder is its short coding delay, which makes it suitable for real-time communication networks. The speech quality of both coders has been enhanced by adaptive postfiltering. The coders have been implemented on a single AT&T DSP32 signal processo

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

DeepVoCoder: A CNN model for compression and coding of narrow band speech

Author: Ilk Hakki Gokhan
Keles Hacer Yalim
Rozhon Jan
Vozňák Miroslav
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

This paper proposes a convolutional neural network (CNN)-based encoder model to compress and code speech signal directly from raw input speech. Although the model can synthesize wideband speech by implicit bandwidth extension, narrowband is preferred for IP telephony and telecommunications purposes. The model takes time domain speech samples as inputs and encodes them using a cascade of convolutional filters in multiple layers, where pooling is applied after some layers to downsample the encoded speech by half. The final bottleneck layer of the CNN encoder provides an abstract and compact representation of the speech signal. In this paper, it is demonstrated that this compact representation is sufficient to reconstruct the original speech signal in high quality using the CNN decoder. This paper also discusses the theoretical background of why and how CNN may be used for end-to-end speech compression and coding. The complexity, delay, memory requirements, and bit rate versus quality are discussed in the experimental results.Web of Science7750897508

DSpace at VSB Technical University of Ostrava