285 research outputs found

    Blind Single Channel Deconvolution using Nonstationary Signal Processing

    Gramophone noise detection and reconstruction using time delay artificial neural networks

    Gramophone records were the main recording medium for more than seven decades and have regained widespread popularity over the past several years. Being an analog storage medium, gramophone records are subject to distortions caused by scratches, dust particles, degradation, and other forms of improper handling. The resulting noise often leads to an unpleasant listening experience and requires a filtering process to remove the unwanted disruptions and improve the audio quality. This paper proposes a novel approach that employs feedforward time-delay artificial neural networks to detect and reconstruct noise in musical sound waves. A set of 800 songs from eight different genres was used to validate the performance of the neural networks. Performance was analyzed in terms of outlier detection and interpolation accuracy, computational time, and the tradeoff between accuracy and computation time. The empirical results of both the detection and the reconstruction neural networks were compared to a number of other algorithms, including various statistical measurements, duplication approaches, trigonometric processes, polynomials, and time series models. It was found that the neural networks' outlier detection accuracy was slightly lower than that of some of the other noise identification algorithms, but they achieved a more efficient tradeoff by detecting most of the noise in real time. The reconstruction process favored neural networks, with an increase in interpolation accuracy compared to other widely used time series models. It was also found that certain genres, such as classical, country, and jazz music, were interpolated more accurately. Volatile signals, such as electronic, metal, and pop music, were more challenging to reconstruct and were substantially better interpolated using neural networks than the other examined algorithms. (http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6221021, 2017)
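
    The paper is not accompanied by code here, but the reconstruction idea can be illustrated with a small sketch: a feedforward time-delay network that predicts a sample flagged as noise from a window of surrounding samples. The layer sizes, context length, and the synthetic sine-wave training signal below are illustrative assumptions, not the configuration used in the paper.

    import torch
    import torch.nn as nn

    CONTEXT = 16  # samples taken on each side of the corrupted sample (assumed value)

    class TimeDelayInterpolator(nn.Module):
        """Feedforward net fed by a tapped delay line around the missing sample."""
        def __init__(self, context=CONTEXT, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * context, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
                nn.Linear(hidden, 1),  # estimate of the centre sample
            )

        def forward(self, window):            # window: (batch, 2 * context)
            return self.net(window).squeeze(-1)

    def make_batch(signal, context=CONTEXT, batch=256):
        """Random positions -> (surrounding samples, true centre sample)."""
        idx = torch.randint(context, len(signal) - context, (batch,))
        windows = torch.stack([
            torch.cat([signal[i - context:i], signal[i + 1:i + context + 1]])
            for i in idx.tolist()
        ])
        return windows, signal[idx]

    model = TimeDelayInterpolator()
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    t = torch.linspace(0.0, 1.0, 8000)
    clean = torch.sin(2 * torch.pi * 440.0 * t)   # toy stand-in for a music excerpt
    for step in range(500):
        windows, targets = make_batch(clean)
        loss = nn.functional.mse_loss(model(windows), targets)
        optimiser.zero_grad(); loss.backward(); optimiser.step()
    print(f"final training MSE: {loss.item():.2e}")

    In the paper's pipeline a separate detection network first flags noisy samples and only those are replaced by the interpolator's output; the sketch above covers only the reconstruction half.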

    Zero-Shot Blind Audio Bandwidth Extension

    Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem in a zero-shot setting, leveraging the generative priors of a pre-trained unconditional diffusion model. During the inference process, BABE utilizes a generalized version of diffusion posterior sampling, where the degradation operator is unknown but parametrized and inferred iteratively. The performance of the proposed method is evaluated using objective and subjective metrics, and the results show that BABE surpasses state-of-the-art blind bandwidth extension baselines and achieves competitive performance compared to non-blind, filter-informed methods when tested with synthetic data. Moreover, BABE exhibits robust generalization capabilities when enhancing real historical recordings, effectively reconstructing the missing high-frequency content while maintaining coherence with the original recording. Subjective preference tests confirm that BABE significantly improves the audio quality of historical music recordings. Examples of historical recordings restored with the proposed method are available on the companion webpage: http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/
    Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing
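
    For intuition only, here is a heavily simplified sketch of the kind of loop the abstract describes: a reverse-diffusion sampler whose data-consistency term uses a parametrized lowpass filter, with the filter parameters updated alongside the signal. The toy_denoiser, the soft_lowpass parameterisation, the noise schedule, and the step sizes are my own placeholders, not the BABE implementation.

    import torch

    def toy_denoiser(x, sigma):
        """Placeholder for a pre-trained unconditional diffusion model D(x; sigma)."""
        return x / (1.0 + sigma ** 2)          # trivial shrinkage, illustration only

    def soft_lowpass(x, cutoff):
        """Differentiable 'lowpass': attenuate FFT bins above a learnable cutoff."""
        freqs = torch.fft.rfftfreq(x.shape[-1])
        mask = torch.sigmoid(50.0 * (cutoff - freqs))   # smooth transition band
        return torch.fft.irfft(torch.fft.rfft(x) * mask, n=x.shape[-1])

    def blind_posterior_sampling(y, sigmas, zeta=0.3, lr=1e-2):
        x = sigmas[0] * torch.randn_like(y)              # start from pure noise
        cutoff = torch.tensor(0.25, requires_grad=True)  # initial guess of the degradation
        optimiser = torch.optim.Adam([cutoff], lr=lr)
        for i in range(len(sigmas) - 1):
            sigma, sigma_next = sigmas[i], sigmas[i + 1]
            x = x.detach().requires_grad_(True)
            x0_hat = toy_denoiser(x, sigma)              # current estimate of the clean signal
            residual = torch.sum((soft_lowpass(x0_hat, cutoff) - y) ** 2)
            grad_x, = torch.autograd.grad(residual, x, retain_graph=True)
            optimiser.zero_grad(); residual.backward(); optimiser.step()  # refine the filter
            d = (x - x0_hat) / sigma                     # simplified reverse-diffusion direction
            x = (x + (sigma_next - sigma) * d - zeta * grad_x).detach()
        return x, cutoff.detach()

    y = soft_lowpass(torch.randn(1024), torch.tensor(0.1))   # synthetic bandlimited observation
    sigmas = torch.linspace(1.0, 0.01, 50)
    x_hat, cutoff_hat = blind_posterior_sampling(y, sigmas)
    print("inferred cutoff (normalised frequency):", float(cutoff_hat))

    In BABE the denoiser is a full pre-trained diffusion model and the degradation is a more carefully parametrized lowpass filter, but the alternation between posterior sampling of the signal and iterative inference of the unknown operator is the part the abstract emphasises.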

    Doing Experimental Media Archaeology

    The hands-on and experimental approach of DEMA offers the unique opportunity to ‘grasp’ media and communication technologies in their concrete materiality and tangibility, and to (re-)sensitize historians and communication scholars to the material qualities and performative dimension of past media devices and practices.

    Inpainting of Missing Audio Signal Samples

    Recently, sparse representations of signals have become very popular in the field of signal processing. A sparse representation means that a signal is expressed exactly, or approximated very well, by a linear combination of only a few vectors from a chosen representation system. This thesis deals with the use of sparse representations for the restoration of damaged audio recordings, both historical and recent. Historical recordings in particular suffer from defects such as crackle and noise. Until now, short gaps in audio signals have mostly been repaired with interpolation techniques, especially autoregressive modeling. A few years ago, an algorithm called Audio Inpainting was introduced, which fills in missing audio samples using sparse representations computed with greedy algorithms for sparse approximation. This thesis compares the state-of-the-art interpolation methods with Audio Inpainting. In addition, l1-relaxation methods are used for the sparse approximation, with both the analysis and the synthesis model; the solvers employed are proximal algorithms. These algorithms treat the coefficients either individually or in relation to their neighbourhood (structured sparsity). Structured sparsity is further used for audio denoising. In the experimental part of the thesis, the parameters of each algorithm are evaluated in terms of the trade-off between restoration quality and processing time. All algorithms described in the thesis are compared on practical examples using the objective evaluation methods signal-to-noise ratio (SNR) and PEMO-Q. Finally, an overall conclusion and a discussion of the restoration results are presented.
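
    As a concrete illustration of the synthesis-model l1 route mentioned above, the following sketch fills a gap in a test signal with ISTA, a basic proximal (gradient-plus-shrinkage) algorithm, using an orthonormal DCT as the dictionary. The dictionary choice, step size, regularisation weight, and toy signal are illustrative assumptions rather than the exact configuration used in the thesis.

    import numpy as np
    from scipy.fft import dct, idct

    def soft(z, t):
        """Soft thresholding: the proximal operator of t * ||.||_1."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def inpaint_ista(y, mask, lam=0.01, n_iter=500):
        """Estimate samples where mask == 0 so that the DCT coefficients stay sparse."""
        z = dct(y * mask, norm='ortho')                      # initial coefficient estimate
        for _ in range(n_iter):
            x = idct(z, norm='ortho')                        # synthesis: signal from coefficients
            residual = mask * (x - y)                        # misfit on the known samples only
            z = soft(z - dct(residual, norm='ortho'), lam)   # gradient step + prox (step size 1)
        return idct(z, norm='ortho')

    # Toy example: a sinusoid with a 100-sample dropout.
    n = 2048
    t = np.arange(n) / 8000.0
    clean = np.sin(2 * np.pi * 440.0 * t)
    mask = np.ones(n)
    mask[1000:1100] = 0.0
    restored = inpaint_ista(clean * mask, mask)
    err = clean[1000:1100] - restored[1000:1100]
    gap_snr = 10 * np.log10(np.sum(clean[1000:1100] ** 2) / np.sum(err ** 2))
    print(f"SNR inside the gap: {gap_snr:.1f} dB")

    Structured-sparsity variants replace the element-wise threshold with a neighbourhood-aware shrinkage while keeping the same gradient-plus-proximal pattern.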

    Exploring visual representation of sound in computer music software through programming and composition

    Presented through contextualisation of the portfolio works are developments of a practice in which the acts of programming and composition are intrinsically connected. This practice-based research (conducted 2009–2013) explores visual representation of sound in computer music software. Towards greater understanding of composing with the software medium, initial questions are taken as stimulus to explore the subject through artistic practice and critical thinking. The project begins by asking: How might the ways in which sound is visually represented influence the choices that are made while those representations are being manipulated and organised as music? Which aspects of sound are represented visually, and how are those aspects shown? Recognising sound as a psychophysical phenomenon, the physical and psychological aspects of aesthetic interest to my work are identified. Technological factors of mediating these aspects for the interactive visual-domain of software are considered, and a techno-aesthetic understanding developed. Through compositional studies of different approaches to the problem of looking at sound in software, on screen, a number of conceptual themes emerge in this work: the idea of software as substance, both as a malleable material (such as in live coding), and in terms of outcome artefacts; the direct mapping between audio data and screen pixels; the use of colour that maintains awareness of its discrete (as opposed to continuous) basis; the need for integrated display of parameter controls with their target data; and the tildegraph concept that began as a conceptual model of a gramophone and which is a spatio-visual sound synthesis technique related to wave terrain synthesis. The spiroid-frequency-space representation is introduced, contextualised, and combined both with those themes and a bespoke geometrical drawing system (named thisis), to create a new modular computer music software environment named sdfsys

    Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods

    Speech signals radiated in confined spaces are subject to reverberation due to reflections off surrounding walls and obstacles. Reverberation leads to severe degradation of speech intelligibility and can be prohibitive for applications where speech is digitally recorded, such as audio conferencing or hearing aids. Dereverberation of speech is therefore an important field in speech enhancement. Driven by consumer demand, blind speech dereverberation has become a popular field in the research community and has led to many interesting approaches in the literature. However, most existing methods are dictated by their underlying models and hence suffer from assumptions that constrain the approaches to specific subproblems of blind speech dereverberation. For example, many approaches limit the dereverberation to voiced speech sounds, leading to poor results for unvoiced speech. Few approaches tackle single-sensor blind speech dereverberation, and only a very limited subset allows for dereverberation of speech from moving speakers. Therefore, the aim of this dissertation is the development of a flexible and extendible framework for blind speech dereverberation accommodating different speech sound types, single or multiple sensors, as well as stationary and moving speakers. Bayesian methods benefit from – rather than being dictated by – appropriate model choices. Therefore, the problem of blind speech dereverberation is considered from a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach accommodating a multitude of models for the speech production mechanism and room transfer function is consequently derived. In this approach both the anechoic source signal and the reverberant channel are estimated using their optimal estimators by means of Rao-Blackwellisation of the state-space of unknown variables. The remaining model parameters are estimated using sequential importance resampling. The proposed approach is implemented for two different speech production models for stationary speakers, demonstrating substantial reduction in reverberation for both unvoiced and voiced speech sounds. Furthermore, the channel model is extended to facilitate blind dereverberation of speech from moving speakers. Due to the structure of the measurement model, single- as well as multi-microphone processing is facilitated, accommodating physically constrained scenarios where only a single sensor can be used as well as allowing for the exploitation of spatial diversity in scenarios where the physical size of microphone arrays is of no concern. This dissertation is concluded with a survey of possible directions for future research, including the use of switching Markov source models, joint target tracking and enhancement, as well as an extension to subband processing for improved computational efficiency.
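
    The dissertation is the reference for the full Rao-Blackwellised scheme; as a reduced illustration of the sequential importance resampling machinery it builds on, the sketch below runs a bootstrap particle filter on a toy scalar state-space model of my own. In the dissertation, the linear source and channel states are handled analytically by per-particle Kalman recursions (Rao-Blackwellisation) rather than being sampled directly.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate(T=200, q=0.1, r=0.1):
        """Toy nonlinear model: x_t = 0.9 x_{t-1} + v_t,  y_t = arctan(x_t) + w_t."""
        x = np.zeros(T)
        y = np.zeros(T)
        for t in range(1, T):
            x[t] = 0.9 * x[t - 1] + np.sqrt(q) * rng.standard_normal()
            y[t] = np.arctan(x[t]) + np.sqrt(r) * rng.standard_normal()
        return x, y

    def particle_filter(y, n_particles=500, q=0.1, r=0.1):
        particles = rng.standard_normal(n_particles)
        estimates = np.zeros(len(y))
        for t in range(len(y)):
            # propagate through the transition prior (bootstrap proposal)
            particles = 0.9 * particles + np.sqrt(q) * rng.standard_normal(n_particles)
            # importance weights from the observation likelihood
            w = np.exp(-0.5 * (y[t] - np.arctan(particles)) ** 2 / r) + 1e-12
            w /= w.sum()
            estimates[t] = np.sum(w * particles)
            # sequential importance resampling against weight degeneracy
            particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
        return estimates

    x_true, y_obs = simulate()
    x_hat = particle_filter(y_obs)
    print("posterior-mean RMSE:", np.sqrt(np.mean((x_true - x_hat) ** 2)))

    Rao-Blackwellisation would replace part of the sampled state with an analytically filtered (Kalman) linear sub-state per particle, reducing the variance of the resulting estimates.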