Signal separation of musical instruments: simulation-based methods for musical signal decomposition and transcription
This thesis presents techniques for the modelling of musical signals, with particular regard to monophonic and polyphonic pitch estimation. Musical signals are modelled as a set of notes, each comprising a set of harmonically related sinusoids. A hierarchical model is presented that is very general and applicable to any signal that can be decomposed as the sum of basis functions. Parameter estimation is posed within a Bayesian framework, allowing for the incorporation of prior information about model parameters. The resulting posterior distribution is of variable dimension, so reversible jump MCMC simulation techniques are employed for the parameter estimation task. The extension of the model to time-varying signals with high posterior correlations between model parameters is described. The parameters and hyperparameters of several frames of data are estimated jointly to achieve more robust detection. A general model for the description of time-varying homogeneous and heterogeneous multiple-component signals is developed and then applied to the analysis of musical signals. The importance of high-level musical and perceptual psychological knowledge in the formulation of the model is highlighted, and attention is drawn to the limitations of pure signal processing techniques for dealing with musical signals. Gestalt psychological grouping principles motivate the hierarchical signal model, and component identifiability is considered in terms of perceptual streaming, where each component establishes its own context. A major emphasis of this thesis is the practical application of MCMC techniques, which are generally deemed too slow for many applications. Through the design of efficient transition kernels highly optimised for harmonic models, and by careful choice of assumptions and approximations, implementations approaching real-time speed are viable.
Engineering and Physical Sciences Research Council
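The note model above is linear in the partial amplitudes once the fundamental is fixed, which is what makes a basis-function decomposition workable. The sketch below illustrates only that structure under stated assumptions: a single stationary note, a known and fixed number of harmonics, and a plain grid search over candidate fundamentals with a least-squares amplitude fit swapped in for the thesis's reversible jump MCMC. All function names and the candidate grid are hypothetical.

```python
import math


def harmonic_basis(f0, n_harmonics, n_samples, fs):
    """Cosine and sine columns at h * f0 for h = 1..n_harmonics (one note, no decay)."""
    cols = []
    for h in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * h * f0 / fs
        cols.append([math.cos(w * n) for n in range(n_samples)])
        cols.append([math.sin(w * n) for n in range(n_samples)])
    return cols


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - tail) / aug[r][r]
    return x


def residual_energy(x, f0, n_harmonics, fs):
    """Energy left over after a least-squares fit of the harmonic model at fundamental f0."""
    cols = harmonic_basis(f0, n_harmonics, len(x), fs)
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    proj = [sum(u * v for u, v in zip(ci, x)) for ci in cols]
    weights = solve(gram, proj)  # cos/sin weights of each partial
    fit = [sum(w * col[n] for w, col in zip(weights, cols)) for n in range(len(x))]
    return sum((xi - fi) ** 2 for xi, fi in zip(x, fit))


def estimate_f0(x, candidates, n_harmonics, fs):
    """Grid search: the candidate whose harmonic fit leaves the least residual energy.

    Under white Gaussian noise this is the joint maximum-likelihood choice of
    f0 and amplitudes (a stand-in for reversible jump MCMC, not a reproduction of it).
    """
    return min(candidates, key=lambda f0: residual_energy(x, f0, n_harmonics, fs))
```

For example, a 400-sample note at fs = 8000 Hz with partials at 220, 440 and 660 Hz is assigned 220.0 from the candidates [110.0, 165.0, 220.0, 330.0, 440.0] with n_harmonics = 3. Because the number of harmonics is fixed here, a sub-multiple of the true fundamental with more harmonics could fit just as well; the variable-dimension model of the thesis is not reproduced.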
Computer Models for Musical Instrument Identification
PhD
One aspect of sound perception concerns what is commonly
termed texture or timbre. From a perceptual perspective, timbre is what allows us
to distinguish sounds that have similar pitch and loudness. Indeed, most people are
able to tell a piano tone from a violin tone, or to distinguish different voices
or singers.
This thesis deals with timbre modelling. Specifically, the formant theory of timbre
is the main theme throughout. This theory states that acoustic musical instrument
sounds can be characterised by their formant structures. Following this principle, the
central point of our approach is to propose a computer implementation for building
musical instrument identification and classification systems.
Although the main thrust of this thesis is to propose a coherent and unified
approach to the musical instrument identification problem, it is oriented towards the
development of algorithms that can be used in Music Information Retrieval (MIR)
frameworks. Drawing on research in speech processing, a complete supervised system
taking into account both physical and perceptual aspects of timbre is described.
The approach is composed of three distinct processing layers. Parametric models
that allow us to represent signals through mid-level physical and perceptual representations
are considered. Next, the use of the Line Spectrum Frequencies as spectral
envelope and formant descriptors is emphasised. Finally, the use of generative and
discriminative techniques for building instrument and database models is investigated.
Our system is evaluated under realistic recording conditions using databases of isolated
notes and melodic phrases.
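The step that turns an analysis frame into Line Spectrum Frequencies can be sketched as follows, assuming the common route of autocorrelation-method linear prediction (Levinson-Durbin) followed by root-finding on the symmetric and antisymmetric polynomials P(z) and Q(z). The prediction order, grid size and function names are hypothetical illustrations rather than the thesis's settings, and the root search assumes an even prediction order so that the trivial zeros at ω = 0 and ω = π can be skipped.

```python
import math


def autocorr(x, order):
    """Autocorrelation r[0..order] of one (already windowed) analysis frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion: coefficients of A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0]
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
    return a


def line_spectral_frequencies(a, grid=4096, iters=60):
    """LSFs in radians, strictly inside (0, pi), for an even-order A(z).

    P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z) have their
    zeros on the unit circle; the trivial zeros at w = pi (P) and w = 0 (Q) are skipped.
    """
    ext = list(a) + [0.0]
    n = len(ext) - 1  # n = p + 1
    p_coef = [ext[k] + ext[n - k] for k in range(n + 1)]  # symmetric
    q_coef = [ext[k] - ext[n - k] for k in range(n + 1)]  # antisymmetric

    # Real-valued versions of exp(j*w*n/2) * P(e^{jw}) and exp(j*w*n/2) * Q(e^{jw}) / j.
    def f_p(w):
        return sum(c * math.cos(w * (n / 2 - k)) for k, c in enumerate(p_coef))

    def f_q(w):
        return sum(c * math.sin(w * (n / 2 - k)) for k, c in enumerate(q_coef))

    ws = [math.pi * (i + 0.5) / grid for i in range(grid)]
    roots = []
    for f in (f_p, f_q):
        vals = [f(w) for w in ws]
        for i in range(grid - 1):
            if vals[i] * vals[i + 1] < 0:  # sign change brackets a zero
                lo, hi, f_lo = ws[i], ws[i + 1], vals[i]
                for _ in range(iters):  # bisection refinement
                    mid = 0.5 * (lo + hi)
                    f_mid = f(mid)
                    if f_lo * f_mid <= 0:
                        hi = mid
                    else:
                        lo, f_lo = mid, f_mid
                roots.append(0.5 * (lo + hi))
    return sorted(roots)
```

For a minimum-phase A(z), which the autocorrelation method yields, an order-10 frame gives ten increasing values in (0, π) that alternate between zeros of P and Q; this ordering is part of what makes LSFs convenient spectral-envelope descriptors.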
Comparison of modelled pursuits with ESPRIT and the matrix pencil method in the modelling of medical percussion signals
The objective of this paper is to compare Modelled Pursuits (MoP), a recently developed iterative signal decomposition method, with more established matrix-based subspace methods used to aid or automate medical percussion diagnoses. Medical percussion is a technique used by clinicians to aid the diagnosis of pulmonary disease. It requires considerable expertise, so it is desirable to automate the process where possible. Previous work has examined the application of modal decomposition techniques, since medical percussion signals (MPS) can be intuitively characterised as combinations of exponentially decaying sinusoidal (EDS) vibrations. The best results have typically been reported with matrix-based subspace methods such as Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) and the Matrix Pencil Method (MPM). Since ESPRIT and MPM are computationally expensive, this paper investigates whether an iterative method such as MoP can produce similar results with lower computation and/or memory overheads. Using randomly generated synthetic signals designed to replicate typical ‘tympanic’ and ‘resonant’ percussion signals, we compared the three methods (MoP, ESPRIT and MPM) for accuracy, speed and memory usage. We find that at low signal-to-noise ratios (SNRs) MoP is less accurate than both ESPRIT and MPM; however, at high SNRs (as would typically be encountered in a clinical setting) it is more accurate than MPM but less accurate than ESPRIT. We conclude that in embedded clinical applications where both operations per second and memory usage are a factor, MoP is less computationally intensive than ESPRIT and is thus worth considering for use in those contexts.
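To make the EDS signal model concrete, the sketch below synthesises one exponentially decaying sinusoid and recovers its decay rate and frequency. It deliberately uses Prony's linear-prediction method for a single real mode, a much simpler technique than MoP, ESPRIT or MPM, and it assumes a noise-free signal; the function names and parameter values are hypothetical.

```python
import cmath
import math


def eds(n_samples, amp, decay, omega, phase):
    """Exponentially decaying sinusoid: amp * exp(-decay * n) * cos(omega * n + phase)."""
    return [amp * math.exp(-decay * n) * math.cos(omega * n + phase) for n in range(n_samples)]


def prony_single_mode(x):
    """Fit x[n] = c1*x[n-1] + c2*x[n-2] by least squares and return (decay, omega)."""
    r11 = r12 = r22 = b1 = b2 = 0.0
    for n in range(2, len(x)):
        u, v, y = x[n - 1], x[n - 2], x[n]
        r11 += u * u
        r12 += u * v
        r22 += v * v
        b1 += u * y
        b2 += v * y
    # Solve the 2x2 normal equations by Cramer's rule.
    det = r11 * r22 - r12 * r12
    c1 = (b1 * r22 - b2 * r12) / det
    c2 = (r11 * b2 - r12 * b1) / det
    # Poles are the roots of z^2 - c1*z - c2 = 0; a damped cosine gives a conjugate pair
    # z = exp(-decay +/- j*omega), so |z| encodes the decay and arg(z) the frequency.
    z = (c1 + cmath.sqrt(c1 * c1 + 4 * c2)) / 2
    return -math.log(abs(z)), abs(cmath.phase(z))
```

A damped cosine obeys x[n] = 2e^(-decay) cos(omega) x[n-1] - e^(-2 decay) x[n-2] exactly, so in the noise-free case the fitted poles reproduce the generating parameters; with noise and several overlapping modes the estimate degrades quickly, which is part of why subspace methods are preferred.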
Statistical models for natural sounds
Understanding the rich structure of natural sounds is important both for solving practical
tasks, such as automatic speech recognition, and for understanding auditory processing
in the brain. This thesis takes a step in this direction by characterising the statistics of
simple natural sounds. We focus on the statistics because perception often appears to
depend on them, rather than on the raw waveform. For example, the perception of auditory
textures, like running water, wind, fire and rain, depends on summary-statistics,
like the rate of falling rain droplets, rather than on the exact details of the physical
source.
In order to analyse the statistics of sounds accurately, it is necessary to improve a
number of traditional signal processing methods, including those for amplitude demodulation,
time-frequency analysis, and sub-band demodulation. These estimation tasks
are ill-posed, so it is natural to treat them as Bayesian inference problems.
The new probabilistic versions of these methods have several advantages. For example,
they perform more accurately on natural signals and are more robust to noise;
they can also fill in missing sections of data and provide error bars. Furthermore,
free parameters can be learned from the signal. Using these new algorithms, we demonstrate
that the energy, sparsity, modulation depth and modulation time-scale in each
sub-band of a signal are critical statistics, together with the dependencies between the
sub-band modulators. In order to validate this claim, a model containing co-modulated
coloured noise carriers is shown to be capable of generating a range of realistic sounding
auditory textures.
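The idea that a texture can be summarised by a few sub-band statistics can be illustrated with a toy generator and a toy analyser. This is a minimal sketch under loud assumptions: it models a single sub-band, uses an exponentiated low-pass-noise modulator and a one-pole coloured-noise carrier of our own choosing, and estimates the envelope by conventional rectify-and-smooth demodulation rather than the probabilistic demodulation developed in the thesis. Kurtosis stands in for sparsity and envelope std/mean for modulation depth; all names and constants are hypothetical.

```python
import math
import random
import statistics


def smooth(x, width):
    """Centred moving average via prefix sums (window truncated at the edges)."""
    half = width // 2
    csum = [0.0]
    for v in x:
        csum.append(csum[-1] + v)
    out = []
    for n in range(len(x)):
        lo, hi = max(0, n - half), min(len(x), n + half + 1)
        out.append((csum[hi] - csum[lo]) / (hi - lo))
    return out


def synth_subband(n_samples, mod_width, depth, seed=0):
    """One sub-band of a toy texture: slow positive modulator times coloured-noise carrier."""
    rng = random.Random(seed)
    # Modulator: exponentiated low-pass Gaussian noise, scaled so log-modulator std == depth.
    slow = smooth([rng.gauss(0.0, 1.0) for _ in range(n_samples)], mod_width)
    scale = depth / statistics.pstdev(slow)
    modulator = [math.exp(scale * s) for s in slow]
    # Carrier: white Gaussian noise through a one-pole low-pass filter (coloured noise).
    carrier, state = [], 0.0
    for _ in range(n_samples):
        state = 0.7 * state + rng.gauss(0.0, 1.0)
        carrier.append(state)
    return [m * c for m, c in zip(modulator, carrier)]


def subband_stats(x, env_width):
    """Energy, sparsity (kurtosis) and modulation depth (envelope std / mean)."""
    energy = statistics.fmean(v * v for v in x)
    mean = statistics.fmean(x)
    var = statistics.pvariance(x, mu=mean)
    kurtosis = statistics.fmean((v - mean) ** 4 for v in x) / var ** 2
    envelope = smooth([abs(v) for v in x], env_width)  # rectify-and-smooth demodulation
    mod_depth = statistics.pstdev(envelope) / statistics.fmean(envelope)
    return {"energy": energy, "kurtosis": kurtosis, "mod_depth": mod_depth}
```

With the same seed, setting depth to 0.0 gives an unmodulated coloured noise whose kurtosis stays near the Gaussian value and whose modulation depth is small, while depth 1.0 raises both, which is the kind of contrast the statistics above are meant to capture.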
Finally, we explore the connection between the statistics of natural sounds and perception.
We demonstrate that inference in the model for auditory textures qualitatively
replicates the primitive grouping rules that listeners use to understand simple acoustic
scenes. This suggests that the auditory system is optimised for the statistics of natural
sounds.
Transient and steady-state component separation for audio signals
In this work, the problem of separating an audio signal into its transient and steady-state components was addressed. In particular, a recently proposed separation method based on the median filter was investigated. To gain a better understanding of the processes involved, a modification of the filtering stage of the algorithm was proposed. This modification was evaluated subjectively through listening tests and objectively through an application-based comparison. Some extensions to the model were also presented, together with possible applications of the transient/steady-state decomposition in audio editing and processing.
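For orientation, the median-filter idea can be sketched on a magnitude spectrogram: steady-state components form horizontal ridges that survive a median filter along time, while transients form vertical ridges that survive a median filter along frequency. The sketch assumes the spectrogram has already been computed as a list of frequency-bin rows, uses soft masks with a hypothetical kernel width and power, and does not reproduce the modified filtering stage proposed in this work; the function names are illustrative.

```python
from statistics import median


def median_filter_time(spec, width):
    """Median over neighbouring frames (time axis) for each frequency bin."""
    half = width // 2
    return [[median(row[max(0, t - half): t + half + 1]) for t in range(len(row))]
            for row in spec]


def median_filter_freq(spec, width):
    """Median over neighbouring bins (frequency axis) for each frame."""
    half = width // 2
    n_bins, n_frames = len(spec), len(spec[0])
    out = [[0.0] * n_frames for _ in range(n_bins)]
    for t in range(n_frames):
        column = [spec[k][t] for k in range(n_bins)]
        for k in range(n_bins):
            out[k][t] = median(column[max(0, k - half): k + half + 1])
    return out


def separation_masks(spec, width=5, power=2.0, eps=1e-12):
    """Soft masks for the steady-state (time-smooth) and transient (frequency-smooth) parts."""
    steady = median_filter_time(spec, width)
    transient = median_filter_freq(spec, width)
    m_steady, m_transient = [], []
    for srow, trow in zip(steady, transient):
        ms, mt = [], []
        for s, t in zip(srow, trow):
            total = s ** power + t ** power + eps
            ms.append(s ** power / total)
            mt.append(t ** power / total)
        m_steady.append(ms)
        m_transient.append(mt)
    return m_steady, m_transient
```

The two masks sum to (almost exactly) one in every cell, so multiplying them with the complex STFT and inverting would split the signal into two parts that add back to the original; the STFT and its inverse are assumed to be handled elsewhere.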