Gramophone noise detection and reconstruction using time delay artificial neural networks
Gramophone records were the main recording medium for more than seven decades and have regained widespread popularity over the past several years. Being an analog storage medium, gramophone records are subject to distortions caused by scratches, dust particles, degradation, and other kinds of improper handling. The resulting noise often leads to an unpleasant listening experience and requires a filtering process to remove the unwanted disruptions and improve the audio quality. This paper proposes a novel approach that employs various feed-forward time-delay artificial neural networks to detect and reconstruct noise in musical sound waves. A set of 800 songs from eight different genres was used to validate the performance of the neural networks. The performance was analyzed according to the outlier detection and interpolation accuracy, the computational time, and the tradeoff between accuracy and time. The empirical results of both the detection and reconstruction neural networks were compared to a number of other algorithms, including various statistical measurements, duplication approaches, trigonometric processes, polynomials, and time series models. It was found that the neural networks' outlier detection accuracy was slightly lower than that of some other noise identification algorithms, but they achieved a more efficient tradeoff by detecting most of the noise in real time. The reconstruction process favored neural networks, with an increase in interpolation accuracy compared to other widely used time series models. It was also found that certain genres, such as classical, country, and jazz music, were interpolated more accurately. Volatile signals, such as electronic, metal, and pop music, were more challenging to reconstruct and were substantially better interpolated by neural networks than by the other examined algorithms.
http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6221021
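As a rough illustration of the detect-and-reconstruct pipeline described above, the sketch below substitutes a linear least-squares time-delay predictor for the paper's feed-forward time-delay neural networks; the window length, threshold factor, and toy signal are my own assumptions, not values from the paper.

```python
import numpy as np

# Hedged sketch: a linear least-squares time-delay predictor stands in for
# the paper's feed-forward time-delay neural networks. Window length,
# threshold factor, and the toy signal are illustrative assumptions.

def fit_tdnn(signal, order=16):
    """Fit a linear time-delay predictor: x[n] ~ w . x[n-order:n]."""
    X = np.lib.stride_tricks.sliding_window_view(signal[:-1], order)
    y = signal[order:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def detect_and_reconstruct(signal, w, k=6.0):
    """Flag samples whose prediction error exceeds k sigma; replace them."""
    order = len(w)
    pred = np.lib.stride_tricks.sliding_window_view(signal[:-1], order) @ w
    err = signal[order:] - pred
    mask = np.abs(err) > k * err.std()
    out = signal.copy()
    out[order:][mask] = pred[mask]          # reconstruct flagged samples
    return out, np.flatnonzero(mask) + order

# Toy example: a sine wave corrupted by a single impulsive "scratch"
t = np.arange(2000)
clean = np.sin(2 * np.pi * t / 64)
noisy = clean.copy()
noisy[1000] += 5.0
w = fit_tdnn(clean)                          # trained on clean audio here
restored, idx = detect_and_reconstruct(noisy, w)
```

With a real corpus the predictor would be a trained nonlinear network and the threshold tuned per genre; the structure (predict, compare, replace) is the same.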
Zero-Shot Blind Audio Bandwidth Extension
Audio bandwidth extension involves the realistic reconstruction of
high-frequency spectra from bandlimited observations. In cases where the
lowpass degradation is unknown, such as in restoring historical audio
recordings, this becomes a blind problem. This paper introduces a novel method
called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem
in a zero-shot setting, leveraging the generative priors of a pre-trained
unconditional diffusion model. During the inference process, BABE utilizes a
generalized version of diffusion posterior sampling, where the degradation
operator is unknown but parametrized and inferred iteratively. The performance
of the proposed method is evaluated using objective and subjective metrics, and
the results show that BABE surpasses state-of-the-art blind bandwidth extension
baselines and achieves competitive performance compared to non-blind
filter-informed methods when tested with synthetic data. Moreover, BABE
exhibits robust generalization capabilities when enhancing real historical
recordings, effectively reconstructing the missing high-frequency content while
maintaining coherence with the original recording. Subjective preference tests
confirm that BABE significantly improves the audio quality of historical music
recordings. Examples of historical recordings restored with the proposed method
are available on the companion webpage:
(http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/)

Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing.
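The blind inference loop can be illustrated with a deliberately simplified analogue (my construction, not BABE's diffusion sampler): the generative prior is replaced by a known harmonic template, and the unknown parametrized degradation operator, here a one-pole lowpass with cutoff `fc`, is inferred by matching the observed spectral rolloff, echoing how BABE iteratively estimates the operator alongside the signal.

```python
import numpy as np

# Toy analogue (my construction, not BABE's diffusion sampler): the "prior"
# asserts the clean signal is a unit square wave with known 1/k harmonic
# amplitudes, and the unknown parametrized degradation is a one-pole lowpass
# with gain 1/sqrt(1 + (f/fc)^2). Blind operator inference then reduces to
# matching the observed harmonic rolloff against the prior's spectrum.

n = 1024
t = np.arange(n)
x_true = np.sign(np.sin(2 * np.pi * 8 * t / n))    # fundamental at bin 8

def degrade(x, fc):
    f = np.fft.rfftfreq(n, d=1 / n)                # f equals the bin index
    return np.fft.irfft(np.fft.rfft(x) / np.sqrt(1 + (f / fc) ** 2), n)

y = degrade(x_true, fc=30.0)                       # true cutoff is "unknown"

# Observed operator response at the first odd harmonics
bins = np.array([8, 24, 40, 56, 72])
gains = np.abs(np.fft.rfft(y))[bins] / np.abs(np.fft.rfft(x_true))[bins]

# Infer fc on a grid (a stand-in for BABE's iterative operator updates)
grid = np.linspace(5, 100, 400)
err = [np.sum((gains - 1 / np.sqrt(1 + (bins / fc) ** 2)) ** 2) for fc in grid]
fc_hat = float(grid[int(np.argmin(err))])
```

In the real method the prior is a trained diffusion model and the operator estimate is refined jointly at every posterior-sampling step; the toy only shows why a parametrized operator can be identified from the degraded observation alone.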
Doing Experimental Media Archaeology
The hands-on and experimental approach of DEMA offers the unique opportunity to 'grasp' media and communication technologies in their concrete materiality and tangibility, and to (re-)sensitize historians and communication scholars to the material qualities and performative dimension of past media devices and practices.
Inpainting of Missing Audio Signal Samples
Recently, sparse representations of signals have become very popular in the field of signal processing.
Sparse representation means that the signal is represented exactly, or very well approximated, by a linear combination of only a few vectors from a specific representation system. This thesis deals with the utilization of sparse representations of signals for audio restoration, of either historical or recent recordings. Old audio recordings in particular suffer from defects such as crackles or noise. Until now, short gaps in audio signals were repaired by interpolation techniques, especially autoregressive modeling. A few years ago, an algorithm termed Audio Inpainting was introduced. This algorithm fills in missing audio signal samples using sparse representations, through greedy algorithms for sparse approximation. This thesis aims to compare the state-of-the-art interpolation methods with Audio Inpainting. Besides this, l1-relaxation methods are utilized for sparse approximation, incorporating both analysis and synthesis models. The algorithms used for sparse approximation are so-called proximal algorithms. These algorithms treat the coefficients either separately or in relation to their neighbourhood (structured sparsity). Structured sparsity is further used for audio denoising. In the experimental part of the thesis, the parameters of each algorithm are evaluated in terms of the tradeoff between restoration quality and processing time. All of the algorithms described in the thesis are compared on practical examples using the objective evaluation methods Signal-to-Noise Ratio (SNR) and PEMO-Q. Finally, an overall conclusion and a discussion of the restoration results are presented.
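A minimal sketch of the synthesis-model l1-relaxation described above, using ISTA, one of the simplest proximal algorithms; the DCT dictionary, gap position, and regularization weight are illustrative assumptions rather than the thesis's actual setup.

```python
import numpy as np

# Hedged sketch (my illustration, not the thesis code): synthesis-model
# audio inpainting by l1-relaxation, solved with ISTA, one of the basic
# proximal algorithms the thesis compares. The DCT dictionary, gap position,
# and regularization weight lam are illustrative assumptions.

n = 256
k = np.arange(n)
# Orthonormal DCT-II dictionary (columns are atoms)
D = np.sqrt(2 / n) * np.cos(np.pi * (2 * k[:, None] + 1) * k[None, :] / (2 * n))
D[:, 0] /= np.sqrt(2)

c_true = np.zeros(n)
c_true[[3, 17, 42]] = [2.0, -1.5, 1.0]       # sparse synthesis coefficients
x_true = D @ c_true

mask = np.ones(n, dtype=bool)
mask[100:120] = False                        # a gap of missing samples
y = x_true[mask]
A = D[mask]                                  # dictionary rows at known samples

lam, step = 0.01, 1.0        # D orthonormal => ||A|| <= 1, so step 1 is safe
c = np.zeros(n)
for _ in range(500):         # ISTA: gradient step + soft threshold (prox of l1)
    c = c - step * (A.T @ (A @ c - y))
    c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)
x_hat = D @ c                # reconstruction; the gap is filled by synthesis
```

Structured-sparsity variants replace the elementwise soft threshold with a prox that acts on groups of neighbouring coefficients, which is the distinction the thesis evaluates.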
Exploring visual representation of sound in computer music software through programming and composition
Presented through contextualisation of the portfolio works are developments of a practice in which the acts of programming and composition are intrinsically connected. This practice-based research (conducted 2009–2013) explores visual representation of sound in computer music software.
Towards greater understanding of composing with the software medium, initial questions are taken as stimulus to explore the subject through artistic practice and critical thinking. The project begins by asking: How might the ways in which sound is visually represented influence the choices that are made while those representations are being manipulated and organised as music? Which aspects of sound are represented visually, and how are those aspects shown?
Recognising sound as a psychophysical phenomenon, the physical and psychological aspects of aesthetic interest to my work are identified. Technological factors of mediating these aspects for the interactive visual-domain of software are considered, and a techno-aesthetic understanding developed.
Through compositional studies of different approaches to the problem of looking at sound in software, on screen, a number of conceptual themes emerge in this work: the idea of software as substance, both as a malleable material (such as in live coding) and in terms of outcome artefacts; the direct mapping between audio data and screen pixels; the use of colour that maintains awareness of its discrete (as opposed to continuous) basis; the need for integrated display of parameter controls with their target data; and the tildegraph concept, which began as a conceptual model of a gramophone and is a spatio-visual sound synthesis technique related to wave terrain synthesis. The spiroid-frequency-space representation is introduced, contextualised, and combined with those themes and a bespoke geometrical drawing system (named thisis) to create a new modular computer music software environment named sdfsys.
Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods
Speech signals radiated in confined spaces are subject to reverberation due to reflections off surrounding walls and obstacles. Reverberation leads to severe degradation of speech intelligibility and can be prohibitive for applications where speech is digitally recorded, such as audio conferencing or hearing aids. Dereverberation of speech is therefore an important field in speech enhancement.

Driven by consumer demand, blind speech dereverberation has become a popular field in the research community and has led to many interesting approaches in the literature. However, most existing methods are dictated by their underlying models and hence suffer from assumptions that constrain them to specific subproblems of blind speech dereverberation. For example, many approaches limit dereverberation to voiced speech sounds, leading to poor results for unvoiced speech. Few approaches tackle single-sensor blind speech dereverberation, and only a very limited subset allows for dereverberation of speech from moving speakers.

The aim of this dissertation is therefore the development of a flexible and extensible framework for blind speech dereverberation accommodating different speech sound types, single or multiple sensors, and both stationary and moving speakers. Bayesian methods benefit from, rather than being dictated by, appropriate model choices. The problem of blind speech dereverberation is therefore considered from a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach accommodating a multitude of models for the speech production mechanism and the room transfer function is consequently derived. In this approach, both the anechoic source signal and the reverberant channel are estimated with their optimal estimators by means of Rao-Blackwellisation of the state space of unknown variables. The remaining model parameters are estimated using sequential importance resampling.

The proposed approach is implemented for two different speech production models for stationary speakers, demonstrating a substantial reduction in reverberation for both unvoiced and voiced speech sounds. Furthermore, the channel model is extended to facilitate blind dereverberation of speech from moving speakers. Owing to the structure of the measurement model, both single- and multi-microphone processing are facilitated, accommodating physically constrained scenarios where only a single sensor can be used as well as allowing for the exploitation of spatial diversity in scenarios where the physical size of microphone arrays is of no concern.

The dissertation concludes with a survey of possible directions for future research, including the use of switching Markov source models, joint target tracking and enhancement, and an extension to subband processing for improved computational efficiency.
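The Rao-Blackwellised estimation scheme can be sketched on a toy conditionally linear model (my construction, far simpler than the dissertation's speech and channel models): the linear state is marginalised exactly with a per-particle Kalman filter, while the nonlinear parameter is handled by sequential importance resampling.

```python
import numpy as np

# Hedged toy (my construction, much simpler than the dissertation's speech
# and channel models): a Rao-Blackwellised particle filter. The conditionally
# linear state x_t is marginalised exactly by a per-particle Kalman filter,
# while the nonlinear time-varying gain r_t (a stand-in for the channel) is
# tracked by sequential importance resampling.

rng = np.random.default_rng(1)
T, P = 200, 300
a, q, s2 = 0.95, 0.1, 0.05            # AR coefficient, state and obs noise

# Simulate ground truth: y_t = r_t * x_t + noise, with r_t a slow random walk
r = np.cumsum(np.concatenate(([1.0], 0.02 * rng.standard_normal(T - 1))))
x = np.zeros(T)
for n in range(1, T):
    x[n] = a * x[n - 1] + np.sqrt(q) * rng.standard_normal()
y = r * x + np.sqrt(s2) * rng.standard_normal(T)

# RBPF: each particle carries (gain r, Kalman mean m, Kalman variance v)
rp, m, v = np.ones(P), np.zeros(P), np.ones(P)
r_est = np.zeros(T)
for n in range(T):
    rp = rp + 0.02 * rng.standard_normal(P)     # propagate nonlinear part
    mp, vp = a * m, a * a * v + q               # Kalman predict (linear part)
    S = rp * rp * vp + s2                       # innovation variance
    logw = -0.5 * (y[n] - rp * mp) ** 2 / S - 0.5 * np.log(S)
    w = np.exp(logw - logw.max())
    w /= w.sum()                                # normalised importance weights
    K = rp * vp / S                             # per-particle Kalman gain
    m, v = mp + K * (y[n] - rp * mp), (1 - K * rp) * vp
    r_est[n] = np.dot(w, rp)                    # Rao-Blackwellised estimate
    idx = rng.choice(P, P, p=w)                 # resampling step (SIR)
    rp, m, v = rp[idx], m[idx], v[idx]
```

In the dissertation's setting the analytically marginalised substate is the source/channel pair and the resampled variables include the room-model parameters; the marginalise-then-resample structure is the same.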