Gramophone noise detection and reconstruction using time delay artificial neural networks
Gramophone records were the main recording medium for more than seven decades and have regained widespread popularity over the past several years. Being an analog storage medium, gramophone records are subject to distortions caused by scratches, dust particles, degradation, and other kinds of improper handling. The resulting noise often leads to an unpleasant listening experience and requires a filtering process to remove the unwanted disruptions and improve the audio quality. This paper proposes a novel approach that employs various feed-forward time-delay artificial neural networks to detect and reconstruct noise in musical sound waves. A set of 800 songs from eight different genres was used to validate the performance of the neural networks. The performance was analyzed according to the outlier detection and interpolation accuracy, the computational time, and the tradeoff between accuracy and time. The empirical results of both the detection and reconstruction neural networks were compared to a number of other algorithms, including various statistical measurements, duplication approaches, trigonometric processes, polynomials, and time series models. It was found that the neural networks' outlier detection accuracy was slightly lower than that of some other noise identification algorithms, but they achieved a more efficient tradeoff by detecting most of the noise in real time. The reconstruction process favored neural networks, with an increase in interpolation accuracy compared to other widely used time series models. It was also found that certain genres, such as classical, country, and jazz music, were interpolated more accurately. Volatile signals, such as electronic, metal, and pop music, were more challenging to reconstruct and were substantially better interpolated by neural networks than by the other examined algorithms.
http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6221021
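As a rough illustration of the detect-and-reconstruct pipeline described above, the sketch below substitutes a linear least-squares time-delay predictor for the paper's feed-forward time-delay neural networks; the window length, threshold factor, and toy signal are my own assumptions, not values from the paper.

```python
import numpy as np

# Hedged sketch: a linear least-squares time-delay predictor stands in for
# the paper's feed-forward time-delay neural networks. Window length,
# threshold factor, and the toy signal are illustrative assumptions.

def fit_tdnn(signal, order=16):
    """Fit a linear time-delay predictor: x[n] ~ w . x[n-order:n]."""
    X = np.lib.stride_tricks.sliding_window_view(signal[:-1], order)
    y = signal[order:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def detect_and_reconstruct(signal, w, k=6.0):
    """Flag samples whose prediction error exceeds k sigma; replace them."""
    order = len(w)
    pred = np.lib.stride_tricks.sliding_window_view(signal[:-1], order) @ w
    err = signal[order:] - pred
    mask = np.abs(err) > k * err.std()
    out = signal.copy()
    out[order:][mask] = pred[mask]          # reconstruct flagged samples
    return out, np.flatnonzero(mask) + order

# Toy example: a sine wave corrupted by a single impulsive "scratch"
t = np.arange(2000)
clean = np.sin(2 * np.pi * t / 64)
noisy = clean.copy()
noisy[1000] += 5.0
w = fit_tdnn(clean)                          # trained on clean audio here
restored, idx = detect_and_reconstruct(noisy, w)
```

With a real corpus the predictor would be a trained nonlinear network and the threshold tuned per genre; the structure (predict, compare, replace) is the same.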
Zero-Shot Blind Audio Bandwidth Extension
Audio bandwidth extension involves the realistic reconstruction of
high-frequency spectra from bandlimited observations. In cases where the
lowpass degradation is unknown, such as in restoring historical audio
recordings, this becomes a blind problem. This paper introduces a novel method
called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem
in a zero-shot setting, leveraging the generative priors of a pre-trained
unconditional diffusion model. During the inference process, BABE utilizes a
generalized version of diffusion posterior sampling, where the degradation
operator is unknown but parametrized and inferred iteratively. The performance
of the proposed method is evaluated using objective and subjective metrics, and
the results show that BABE surpasses state-of-the-art blind bandwidth extension
baselines and achieves competitive performance compared to non-blind
filter-informed methods when tested with synthetic data. Moreover, BABE
exhibits robust generalization capabilities when enhancing real historical
recordings, effectively reconstructing the missing high-frequency content while
maintaining coherence with the original recording. Subjective preference tests
confirm that BABE significantly improves the audio quality of historical music
recordings. Examples of historical recordings restored with the proposed method
are available on the companion webpage:
(http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/)

Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing.
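The blind inference loop can be illustrated with a deliberately simplified analogue (my construction, not BABE's diffusion sampler): the generative prior is replaced by a known harmonic template, and the unknown parametrized degradation operator, here a one-pole lowpass with cutoff `fc`, is inferred by matching the observed spectral rolloff, echoing how BABE iteratively estimates the operator alongside the signal.

```python
import numpy as np

# Toy analogue (my construction, not BABE's diffusion sampler): the "prior"
# asserts the clean signal is a unit square wave with known 1/k harmonic
# amplitudes, and the unknown parametrized degradation is a one-pole lowpass
# with gain 1/sqrt(1 + (f/fc)^2). Blind operator inference then reduces to
# matching the observed harmonic rolloff against the prior's spectrum.

n = 1024
t = np.arange(n)
x_true = np.sign(np.sin(2 * np.pi * 8 * t / n))    # fundamental at bin 8

def degrade(x, fc):
    f = np.fft.rfftfreq(n, d=1 / n)                # f equals the bin index
    return np.fft.irfft(np.fft.rfft(x) / np.sqrt(1 + (f / fc) ** 2), n)

y = degrade(x_true, fc=30.0)                       # true cutoff is "unknown"

# Observed operator response at the first odd harmonics
bins = np.array([8, 24, 40, 56, 72])
gains = np.abs(np.fft.rfft(y))[bins] / np.abs(np.fft.rfft(x_true))[bins]

# Infer fc on a grid (a stand-in for BABE's iterative operator updates)
grid = np.linspace(5, 100, 400)
err = [np.sum((gains - 1 / np.sqrt(1 + (bins / fc) ** 2)) ** 2) for fc in grid]
fc_hat = float(grid[int(np.argmin(err))])
```

In the real method the prior is a trained diffusion model and the operator estimate is refined jointly at every posterior-sampling step; the toy only shows why a parametrized operator can be identified from the degraded observation alone.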
Doing Experimental Media Archaeology
The hands-on and experimental approach of DEMA offers the unique opportunity to 'grasp' media and communication technologies in their concrete materiality and tangibility, and to (re-)sensitize historians and communication scholars to the material qualities and performative dimension of past media devices and practices.
Inpainting of Missing Audio Signal Samples
Recently, sparse representations of signals have become very popular in the field of signal processing.
Sparse representation means that the signal is represented exactly, or very well approximated, by a linear combination of only a few vectors from a specific representation system. This thesis deals with the utilization of sparse representations of signals for audio restoration, of either historical or recent recordings. Old audio recordings in particular suffer from defects such as crackles or noise. Until now, short gaps in audio signals were repaired by interpolation techniques, especially autoregressive modeling. A few years ago, an algorithm termed Audio Inpainting was introduced. This algorithm fills in missing audio signal samples using sparse representations, through greedy algorithms for sparse approximation. This thesis aims to compare the state-of-the-art interpolation methods with Audio Inpainting. Besides this, l1-relaxation methods are utilized for sparse approximation, incorporating both analysis and synthesis models. The algorithms used for sparse approximation are so-called proximal algorithms. These algorithms treat the coefficients either separately or in relation to their neighbourhood (structured sparsity). Structured sparsity is further used for audio denoising. In the experimental part of the thesis, the parameters of each algorithm are evaluated in terms of the tradeoff between restoration quality and processing time. All of the algorithms described in the thesis are compared on practical examples using the objective evaluation methods Signal-to-Noise Ratio (SNR) and PEMO-Q. Finally, an overall conclusion and a discussion of the restoration results are presented.
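A minimal sketch of the synthesis-model l1-relaxation described above, using ISTA, one of the simplest proximal algorithms; the DCT dictionary, gap position, and regularization weight are illustrative assumptions rather than the thesis's actual setup.

```python
import numpy as np

# Hedged sketch (my illustration, not the thesis code): synthesis-model
# audio inpainting by l1-relaxation, solved with ISTA, one of the basic
# proximal algorithms the thesis compares. The DCT dictionary, gap position,
# and regularization weight lam are illustrative assumptions.

n = 256
k = np.arange(n)
# Orthonormal DCT-II dictionary (columns are atoms)
D = np.sqrt(2 / n) * np.cos(np.pi * (2 * k[:, None] + 1) * k[None, :] / (2 * n))
D[:, 0] /= np.sqrt(2)

c_true = np.zeros(n)
c_true[[3, 17, 42]] = [2.0, -1.5, 1.0]       # sparse synthesis coefficients
x_true = D @ c_true

mask = np.ones(n, dtype=bool)
mask[100:120] = False                        # a gap of missing samples
y = x_true[mask]
A = D[mask]                                  # dictionary rows at known samples

lam, step = 0.01, 1.0        # D orthonormal => ||A|| <= 1, so step 1 is safe
c = np.zeros(n)
for _ in range(500):         # ISTA: gradient step + soft threshold (prox of l1)
    c = c - step * (A.T @ (A @ c - y))
    c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)
x_hat = D @ c                # reconstruction; the gap is filled by synthesis
```

Structured-sparsity variants replace the elementwise soft threshold with a prox that acts on groups of neighbouring coefficients, which is the distinction the thesis evaluates.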
Exploring visual representation of sound in computer music software through programming and composition
Presented through contextualisation of the portfolio works are developments of a practice in which the acts of programming and composition are intrinsically connected. This practice-based research (conducted 2009–2013) explores visual representation of sound in computer music software.
Towards greater understanding of composing with the software medium, initial questions are taken as stimulus to explore the subject through artistic practice and critical thinking. The project begins by asking: How might the ways in which sound is visually represented influence the choices that are made while those representations are being manipulated and organised as music? Which aspects of sound are represented visually, and how are those aspects shown?
Recognising sound as a psychophysical phenomenon, the physical and psychological aspects of aesthetic interest to my work are identified. Technological factors of mediating these aspects for the interactive visual-domain of software are considered, and a techno-aesthetic understanding developed.
Through compositional studies of different approaches to the problem of looking at sound in software, on screen, a number of conceptual themes emerge in this work: the idea of software as substance, both as a malleable material (such as in live coding) and in terms of outcome artefacts; the direct mapping between audio data and screen pixels; the use of colour that maintains awareness of its discrete (as opposed to continuous) basis; the need for integrated display of parameter controls with their target data; and the tildegraph concept, which began as a conceptual model of a gramophone and is a spatio-visual sound synthesis technique related to wave terrain synthesis. The spiroid-frequency-space representation is introduced, contextualised, and combined with those themes and a bespoke geometrical drawing system (named thisis) to create a new modular computer music software environment named sdfsys.
Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods
Speech signals radiated in confined spaces are subject to reverberation due to reflections off surrounding walls and obstacles. Reverberation leads to severe degradation of speech intelligibility and can be prohibitive for applications where speech is digitally recorded, such as audio conferencing or hearing aids. Dereverberation of speech is therefore an important field in speech enhancement.

Driven by consumer demand, blind speech dereverberation has become a popular field in the research community and has led to many interesting approaches in the literature. However, most existing methods are dictated by their underlying models and hence suffer from assumptions that constrain them to specific subproblems of blind speech dereverberation. For example, many approaches limit dereverberation to voiced speech sounds, leading to poor results for unvoiced speech. Few approaches tackle single-sensor blind speech dereverberation, and only a very limited subset allows for dereverberation of speech from moving speakers.

The aim of this dissertation is therefore the development of a flexible and extensible framework for blind speech dereverberation accommodating different speech sound types, single or multiple sensors, and both stationary and moving speakers. Bayesian methods benefit from, rather than being dictated by, appropriate model choices. The problem of blind speech dereverberation is therefore considered from a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach accommodating a multitude of models for the speech production mechanism and the room transfer function is consequently derived. In this approach, both the anechoic source signal and the reverberant channel are estimated with their optimal estimators by means of Rao-Blackwellisation of the state space of unknown variables. The remaining model parameters are estimated using sequential importance resampling.

The proposed approach is implemented for two different speech production models for stationary speakers, demonstrating a substantial reduction in reverberation for both unvoiced and voiced speech sounds. Furthermore, the channel model is extended to facilitate blind dereverberation of speech from moving speakers. Owing to the structure of the measurement model, both single- and multi-microphone processing are facilitated, accommodating physically constrained scenarios where only a single sensor can be used as well as allowing for the exploitation of spatial diversity in scenarios where the physical size of microphone arrays is of no concern.

The dissertation concludes with a survey of possible directions for future research, including the use of switching Markov source models, joint target tracking and enhancement, and an extension to subband processing for improved computational efficiency.
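The Rao-Blackwellised estimation scheme can be sketched on a toy conditionally linear model (my construction, far simpler than the dissertation's speech and channel models): the linear state is marginalised exactly with a per-particle Kalman filter, while the nonlinear parameter is handled by sequential importance resampling.

```python
import numpy as np

# Hedged toy (my construction, much simpler than the dissertation's speech
# and channel models): a Rao-Blackwellised particle filter. The conditionally
# linear state x_t is marginalised exactly by a per-particle Kalman filter,
# while the nonlinear time-varying gain r_t (a stand-in for the channel) is
# tracked by sequential importance resampling.

rng = np.random.default_rng(1)
T, P = 200, 300
a, q, s2 = 0.95, 0.1, 0.05            # AR coefficient, state and obs noise

# Simulate ground truth: y_t = r_t * x_t + noise, with r_t a slow random walk
r = np.cumsum(np.concatenate(([1.0], 0.02 * rng.standard_normal(T - 1))))
x = np.zeros(T)
for n in range(1, T):
    x[n] = a * x[n - 1] + np.sqrt(q) * rng.standard_normal()
y = r * x + np.sqrt(s2) * rng.standard_normal(T)

# RBPF: each particle carries (gain r, Kalman mean m, Kalman variance v)
rp, m, v = np.ones(P), np.zeros(P), np.ones(P)
r_est = np.zeros(T)
for n in range(T):
    rp = rp + 0.02 * rng.standard_normal(P)     # propagate nonlinear part
    mp, vp = a * m, a * a * v + q               # Kalman predict (linear part)
    S = rp * rp * vp + s2                       # innovation variance
    logw = -0.5 * (y[n] - rp * mp) ** 2 / S - 0.5 * np.log(S)
    w = np.exp(logw - logw.max())
    w /= w.sum()                                # normalised importance weights
    K = rp * vp / S                             # per-particle Kalman gain
    m, v = mp + K * (y[n] - rp * mp), (1 - K * rp) * vp
    r_est[n] = np.dot(w, rp)                    # Rao-Blackwellised estimate
    idx = rng.choice(P, P, p=w)                 # resampling step (SIR)
    rp, m, v = rp[idx], m[idx], v[idx]
```

In the dissertation's setting the analytically marginalised substate is the source/channel pair and the resampled variables include the room-model parameters; the marginalise-then-resample structure is the same.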