Search CORE

4 research outputs found

Revisiting the Linear Prediction Analysis-by-Synthesis Speech Coding Paradigm using Real-time Convex Optimization

Authors: Christensen Mads Græsbøll; Giacobello Daniele; Jensen Tobias Lindstrøm; Murthi Manohar
Publication date: 01/11/2018

Crossref

VBN

Fast Algorithms for High-Order Sparse Linear Prediction with Applications to Speech Processing

Authors: Christensen Mads Græsbøll; Giacobello Daniele; Jensen Tobias Lindstrøm; van Waterschoot Toon
Publication venue: Elsevier BV
Publication date: 01/02/2016

Crossref

VBN

Bayes meets Bach: applications of Bayesian statistics to audio restoration

Author: Carvalho Hugo Tremonte de
Publication venue: Programa de Pos-graduacao em Ciencias Contabeis da UFRJ
Publication date: 01/01/2017

Memoryless nonlinear distortion can be introduced anywhere between the recording and the reproduction of an audio signal: poor-quality or improperly operated equipment, physically degraded media, and low-quality playback devices are some examples where nonlinearities naturally appear. Another common defect in old recordings is long pulses, generally caused by playing back disks with deep scratches or severely degraded magnetic tapes. Such defects are characterized by an initial discontinuity in the waveform, followed by a low-frequency transient of long duration. In both cases audible artifacts can be created, causing an unpleasant listening experience. It is therefore important to develop techniques that mitigate such defects and recover the original signal when only the degraded signal is available. In this thesis, techniques to deal with both problems are presented: the restoration of nonlinearly degraded recordings is tackled in a Bayesian context, considering both autoregressive models and sparsity in the DCT domain for the original signal, as well as through a deterministic solution also based on sparsity; for the suppression of long pulses, a parametric approach is revisited with the addition of an efficient initialization procedure, and a nonparametric model based on Gaussian processes is also presented.
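The abstract mentions autoregressive (AR) models for the original signal. As a minimal, non-authoritative illustration of what such a model is, the sketch below fits AR coefficients with the standard Yule-Walker equations, solved by the Levinson-Durbin recursion. This is a classical point estimator, not the Bayesian procedure of the thesis; `autocorrelation` and `levinson_durbin` are hypothetical helpers written only for this example.

```python
def autocorrelation(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag]; assumes x is (roughly) zero-mean."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations with the Levinson-Durbin recursion.

    Model: x[t] = a[0]*x[t-1] + a[1]*x[t-2] + ... + a[order-1]*x[t-order] + e[t].
    Returns the coefficient list a and the prediction-error variance.
    """
    a = []
    err = r[0]
    for i in range(1, order + 1):
        # correlation at lag i not yet explained by the order-(i-1) predictor
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err                                   # reflection coefficient
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k                              # updated error variance
    return a, err
```

For example, `levinson_durbin([1.0, 0.5, 0.25], 2)` returns coefficients `[0.5, 0.0]`: those autocorrelation values are exactly those of an AR(1) process with coefficient 0.5, so the second-order term vanishes.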

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Pantheon

Embedded Optimization Algorithms for Perceptual Enhancement of Audio Signals (Ingebedde optimalisatie-algoritmes voor de perceptuele verbetering van geluidssignalen)

Author: Defraene Bruno
Publication date: 20/12/2013

This thesis investigates the design and evaluation of an embedded optimization framework for the perceptual enhancement of audio signals degraded by linear and/or nonlinear distortion. In general, audio signal enhancement aims to improve the perceived audio quality, speech intelligibility, or another desired perceptual attribute of a distorted audio signal by applying a real-time digital signal processing algorithm. In the proposed embedded optimization framework, the audio signal enhancement problem is formulated and solved as a per-frame numerical optimization problem, so that each enhanced audio signal frame is optimal with respect to the desired perceptual attribute. The first stage of the framework consists of formulating the per-frame optimization problem so that it maximally enhances the desired perceptual attribute, by explicitly incorporating a suitable model of human sound perception. The second stage consists of solving the formulated per-frame optimization problem online, using a fast and reliable optimization method that exploits the inherent structure of the problem. This embedded optimization framework is applied to four common and challenging audio signal enhancement problems: hard clipping precompensation, loudspeaker precompensation, declipping, and multi-microphone dereverberation. The first part of this thesis focuses on precompensation algorithms, in which the enhancement operation is applied before the distortion process affects the audio signal. More specifically, the problems of hard clipping precompensation and loudspeaker precompensation are tackled in the embedded optimization framework.
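A per-frame formulation like the one described above needs some way to cut the signal into frames and stitch the per-frame solutions back together. The abstract does not specify the framing, so the sketch below assumes one common choice: 50%-overlapping frames, a periodic Hann synthesis window, and overlap-add. `solve_frame` is a hypothetical placeholder for any per-frame optimizer.

```python
import math


def hann_periodic(n):
    """Periodic Hann window; two copies shifted by n/2 sum to exactly one."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def process_in_frames(x, frame_len, solve_frame):
    """Apply a per-frame solver to 50%-overlapping frames and overlap-add the results.

    solve_frame maps a list of frame_len samples to a list of frame_len samples.
    """
    if frame_len % 2:
        raise ValueError("frame_len must be even")
    hop = frame_len // 2
    n_frames = -(-len(x) // hop) + 1              # ceil(len(x) / hop) + 1 frames
    padded = [0.0] * hop + list(x)                # leading pad: every sample gets two frames
    padded += [0.0] * ((n_frames - 1) * hop + frame_len - len(padded))
    out = [0.0] * len(padded)
    win = hann_periodic(frame_len)
    for f in range(n_frames):
        start = f * hop
        frame_out = solve_frame(padded[start:start + frame_len])
        for i, v in enumerate(frame_out):
            out[start + i] += win[i] * v          # synthesis window + overlap-add
    return out[hop:hop + len(x)]                  # drop the padding
```

Because the windows of the two frames overlapping any sample are non-negative and sum to one, each output sample is a convex combination of two frame outputs. A per-frame amplitude bound (for instance, |y| <= U in clipping precompensation) therefore also holds for the overlap-added signal.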
In the context of hard clipping precompensation, an objective function reflecting the perceptible nonlinear hard clipping distortion is constructed by including frequency weights based on the instantaneous masking threshold, which is computed on a frame-by-frame basis by applying a perceptual model. The resulting per-frame convex quadratic optimization problems are solved efficiently using an optimal projected gradient method, for which theoretical complexity bounds are derived. Moreover, a fixed-point hardware implementation of this optimal projected gradient method on a field-programmable gate array (FPGA) shows that the algorithm is capable of running in real time, without perceptible audio quality loss, on a small, portable audio device. In the context of loudspeaker precompensation, an objective function reflecting the perceptible combined linear and nonlinear loudspeaker distortion is constructed in a similar fashion as for hard clipping precompensation. The loudspeaker is modeled using a Hammerstein model, i.e., a cascade of a memoryless nonlinearity and a linear FIR filter. The resulting per-frame nonconvex optimization problems are solved efficiently using gradient optimization methods that exploit knowledge of the invertibility and smoothness of the memoryless nonlinearity in the Hammerstein loudspeaker model. Objective and subjective evaluation experiments show, with statistical significance, that the embedded optimization algorithms for hard clipping and loudspeaker precompensation improve the resulting audio quality compared with standard precompensation algorithms.

The second part of this thesis focuses on recovery algorithms, in which the audio signal enhancement operation is applied after the distortion process affects the audio signal. More specifically, the problems of declipping and multi-microphone dereverberation are tackled in the embedded optimization framework.
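To make the hard clipping precompensation problem concrete, here is a minimal sketch under stated assumptions. The per-frame problem is taken to be: minimize sum_k w_k |DFT(y - x)_k|^2 subject to |y_m| <= U, where x is the input frame, U the clipping level, and w_k hypothetical non-negative weights standing in for the masking-threshold-based frequency weights. It is solved with a FISTA-style accelerated projected gradient iteration, used here as a stand-in for the thesis's optimal projected gradient method, with step size 1/L from the bound L = 2N max_k w_k on the gradient's Lipschitz constant. All function names are illustrative.

```python
import math


def dft_tables(n):
    """Cosine/sine tables for an unnormalised n-point DFT."""
    cos_t = [[math.cos(2 * math.pi * k * m / n) for m in range(n)] for k in range(n)]
    sin_t = [[math.sin(2 * math.pi * k * m / n) for m in range(n)] for k in range(n)]
    return cos_t, sin_t


def _spectrum(e, cos_t, sin_t):
    """Return (c, s) with DFT(e)_k = c_k - j*s_k."""
    n = len(e)
    c = [sum(cos_t[k][m] * e[m] for m in range(n)) for k in range(n)]
    s = [sum(sin_t[k][m] * e[m] for m in range(n)) for k in range(n)]
    return c, s


def weighted_energy(e, w, cos_t, sin_t):
    """Frequency-weighted error energy: sum_k w_k * |DFT(e)_k|^2."""
    c, s = _spectrum(e, cos_t, sin_t)
    return sum(wk * (ck * ck + sk * sk) for wk, ck, sk in zip(w, c, s))


def precompensate_frame(x, w, clip_level, iters=300):
    """Minimise weighted_energy(y - x) subject to |y[m]| <= clip_level."""
    n = len(x)
    cos_t, sin_t = dft_tables(n)
    lip = 2.0 * n * max(w)          # bound on the Lipschitz constant of the gradient

    def project(v):                 # Euclidean projection onto the box constraint
        return [min(max(vi, -clip_level), clip_level) for vi in v]

    def cost(y):
        return weighted_energy([yi - xi for yi, xi in zip(y, x)], w, cos_t, sin_t)

    y_prev = project(x)             # start from plain hard clipping
    z, t = list(y_prev), 1.0
    best, best_cost = y_prev, cost(y_prev)
    for _ in range(iters):
        c, s = _spectrum([zi - xi for zi, xi in zip(z, x)], cos_t, sin_t)
        grad = [2.0 * sum(w[k] * (c[k] * cos_t[k][m] + s[k] * sin_t[k][m])
                          for k in range(n)) for m in range(n)]
        y = project([zi - gi / lip for zi, gi in zip(z, grad)])   # projected step
        t_next = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        z = [yi + (t - 1.0) / t_next * (yi - yp) for yi, yp in zip(y, y_prev)]  # momentum
        y_prev, t = y, t_next
        y_cost = cost(y)
        if y_cost < best_cost:      # FISTA is not monotone, so keep the best iterate
            best, best_cost = y, y_cost
    return best
```

Starting from plain hard clipping and keeping the best feasible iterate guarantees that the result is never worse, under this weighted criterion, than simply clipping x. With uniform weights the criterion reduces to the plain squared error, so the optimum is exactly the hard-clipped frame; non-uniform weights let the solver move the unavoidable error into less important frequency bins.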
Declipping is formulated as a sparse signal recovery problem, in which recovery is performed by solving a per-frame l1-norm minimization problem that includes frequency weights based on the instantaneous masking threshold. As a result, the declipping algorithm focuses on maximizing the perceived audio quality rather than the physical reconstruction quality of the declipped audio signal. Comparative objective and subjective evaluation experiments show, with statistical significance, that the proposed embedded optimization declipping algorithm improves the resulting audio quality compared with existing declipping algorithms. Multi-microphone dereverberation is formulated as a nonconvex optimization problem, allowing joint estimation of the clean audio signal and the room acoustics model parameters. It is shown that the nonconvex optimization problem can be smoothed by including regularization terms based on a statistical late reverberation model and a sparsity prior for the clean audio signal, which is demonstrated to improve dereverberation performance.

Contents:

I INTRODUCTION
--------------
1. Introduction

II PRECOMPENSATION ALGORITHMS
-----------------------------
2. Hard Clipping Precompensation
3. Loudspeaker Precompensation
4. Subjective Audio Quality Evaluation
5. Embedded Hardware Implementation

III RECOVERY ALGORITHMS
-----------------------
6. Declipping Using Perceptual Compressed Sensing
7. Multi-Microphone Dereverberation
8. Conclusions and Suggestions for Future Research

Number of pages: 221
Status: published
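The declipping formulation summarized in Defraene's abstract, a per-frame weighted l1-norm minimization, can be illustrated with a simplified stand-in. The sketch assumes a DCT sparsity basis, treats samples with |y_n| < theta (theta being the clipping level) as reliable, and solves the penalized problem minimize 0.5 * ||M (C^T z - y)||^2 + lam * sum_k w_k |z_k| with ISTA (iterative soft thresholding). Here C is the orthonormal DCT-II matrix, M keeps only reliable samples, and the weights w_k are hypothetical placeholders for masking-threshold-based weights. The sketch omits the constraint, commonly used in declipping, that reconstructed clipped samples should lie at or beyond the clipping level, and it is not the thesis's algorithm.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix C (row k = k-th basis vector), so C C^T = I."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (m + 0.5) * k / n) for m in range(n)]
            for k in range(n)]


def analysis(C, x):
    """DCT coefficients z = C x."""
    return [sum(row[m] * x[m] for m in range(len(x))) for row in C]


def synthesis(C, z):
    """Signal x = C^T z."""
    return [sum(C[k][m] * z[k] for k in range(len(z))) for m in range(len(C[0]))]


def objective(C, z, y, reliable, w, lam):
    """0.5*||M(C^T z - y)||^2 + lam * sum_k w_k |z_k|, M = reliable-sample mask."""
    x = synthesis(C, z)
    fit = 0.5 * sum((xi - yi) ** 2 for xi, yi, r in zip(x, y, reliable) if r)
    return fit + lam * sum(wk * abs(zk) for wk, zk in zip(w, z))


def declip_frame(y, clip_level, w, lam=1e-3, iters=500):
    """Weighted-l1 declipping sketch solved with ISTA (iterative soft thresholding)."""
    C = dct_matrix(len(y))
    reliable = [abs(v) < clip_level for v in y]   # assume unclipped samples are exact
    z = analysis(C, y)
    for _ in range(iters):
        x = synthesis(C, z)
        resid = [(xi - yi) if r else 0.0 for xi, yi, r in zip(x, y, reliable)]
        g = analysis(C, resid)                    # gradient of the data-fit term
        # Step size 1 is valid: ||M C^T|| <= 1 because C is orthonormal.
        v = [zk - gk for zk, gk in zip(z, g)]
        # Soft thresholding: proximal operator of the weighted l1 penalty
        z = [math.copysign(max(abs(vk) - lam * wk, 0.0), vk) for vk, wk in zip(v, w)]
    return synthesis(C, z)
```

With a step size of 1, ISTA never increases the penalized objective, and larger weights w_k discourage energy in the corresponding DCT coefficients, which is how frequency weighting steers the reconstruction.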

Lirias