242 research outputs found

    Computer Models for Musical Instrument Identification

    PhD thesis. A particular aspect of sound perception concerns what is commonly termed texture or timbre. From a perceptual perspective, timbre is what allows us to distinguish sounds that have similar pitch and loudness. Indeed, most people can tell a piano tone from a violin tone, or distinguish different voices or singers. This thesis deals with timbre modelling. Specifically, the formant theory of timbre is the main theme throughout. This theory states that acoustic musical instrument sounds can be characterised by their formant structures. Following this principle, the central point of our approach is to propose a computer implementation for building musical instrument identification and classification systems. Although the main thrust of this thesis is to propose a coherent and unified approach to the musical instrument identification problem, it is oriented towards the development of algorithms that can be used in Music Information Retrieval (MIR) frameworks. Drawing on research in speech processing, a complete supervised system taking into account both physical and perceptual aspects of timbre is described. The approach is composed of three distinct processing layers. First, parametric models that allow us to represent signals through mid-level physical and perceptual representations are considered. Next, the use of the Line Spectrum Frequencies as spectral envelope and formant descriptors is emphasised. Finally, the use of generative and discriminative techniques for building instrument and database models is investigated. Our system is evaluated under realistic recording conditions using databases of isolated notes and melodic phrases.

    Comparison of modelled pursuits with ESPRIT and the matrix pencil method in the modelling of medical percussion signals

    The objective of this paper is to compare Modelled Pursuits (MoP), a recently developed iterative signal decomposition method, with more established matrix-based subspace methods used to aid or automate medical percussion diagnoses. Medical percussion is a technique used by clinicians to aid the diagnosis of pulmonary disease. It requires considerable expertise, so it is desirable to automate this process where possible. Previous work has examined the application of modal decomposition techniques, since medical percussion signals (MPS) can be intuitively characterised as combinations of exponentially decaying sinusoidal (EDS) vibrations. The best results have typically been reported with matrix-based subspace methods such as Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) and the Matrix Pencil Method (MPM). Since ESPRIT and MPM are computationally expensive, this paper investigates whether an iterative method such as MoP can produce similar results with lower computational and/or memory overheads. Using randomly generated synthetic signals designed to replicate typical ‘tympanic’ and ‘resonant’ percussion signals, we compared the three methods (MoP, ESPRIT and MPM) for accuracy, speed and memory usage. We find that for low Signal to Noise Ratios (SNRs) MoP is less accurate than both ESPRIT and MPM; however, for high SNRs (as would typically be encountered in a clinical setting) it is more accurate than MPM but less accurate than ESPRIT. We conclude that in embedded clinical applications where both operations per second and memory usage are a factor, MoP is less computationally intensive than ESPRIT and is thus worth considering for use in those contexts.
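As a reading aid, the EDS characterisation mentioned in this abstract can be written in the standard damped-sinusoid form. The symbols below (K components, amplitudes a_k, damping factors d_k, angular frequencies ω_k, phases φ_k, noise w[n]) are notation assumed here for illustration and are not taken from the paper itself.

```latex
x[n] = \sum_{k=1}^{K} a_k \, e^{-d_k n} \cos(\omega_k n + \phi_k) + w[n],
\qquad n = 0, 1, \dots, N-1
% Equivalent complex form: each real component contributes a
% conjugate pair of poles z_k = e^{-d_k \pm j\omega_k}.
% Subspace methods such as ESPRIT and MPM estimate these poles;
% d_k and \omega_k then follow from |z_k| and \arg z_k.
```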

    Statistical models for natural sounds

    It is important to understand the rich structure of natural sounds in order to solve important tasks, like automatic speech recognition, and to understand auditory processing in the brain. This thesis takes a step in this direction by characterising the statistics of simple natural sounds. We focus on the statistics because perception often appears to depend on them, rather than on the raw waveform. For example, the perception of auditory textures, like running water, wind, fire and rain, depends on summary statistics, like the rate of falling rain droplets, rather than on the exact details of the physical source. In order to analyse the statistics of sounds accurately it is necessary to improve a number of traditional signal processing methods, including those for amplitude demodulation, time-frequency analysis, and sub-band demodulation. These estimation tasks are ill-posed and therefore it is natural to treat them as Bayesian inference problems. The new probabilistic versions of these methods have several advantages. For example, they perform more accurately on natural signals, are more robust to noise, can fill in missing sections of data, and provide error bars. Furthermore, free parameters can be learned from the signal. Using these new algorithms we demonstrate that the energy, sparsity, modulation depth and modulation time-scale in each sub-band of a signal are critical statistics, together with the dependencies between the sub-band modulators. In order to validate this claim, a model containing co-modulated coloured noise carriers is shown to be capable of generating a range of realistic-sounding auditory textures. Finally, we explore the connection between the statistics of natural sounds and perception. We demonstrate that inference in the model for auditory textures qualitatively replicates the primitive grouping rules that listeners use to understand simple acoustic scenes. This suggests that the auditory system is optimised for the statistics of natural sounds.

    Transient and steady-state component separation for audio signals

    In this work, the problem of separating the transient and steady-state components of an audio signal was addressed. In particular, a recently proposed method for separating transient and steady-state components based on the median filter was investigated. For a better understanding of the processes involved, a modification of the filtering stage of the algorithm was proposed. This modification was evaluated subjectively by listening tests and objectively by an application-based comparison. Some extensions to the model were also presented, together with possible applications of the transient and steady-state decomposition in audio editing and processing.
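The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind median-filter-based separation, under stated assumptions: the input is a magnitude spectrogram stored as a list of frequency rows, a median taken along time keeps sustained (steady-state) energy, a median taken along frequency keeps broadband (transient) energy, and soft masks are formed from the two. The function names, window width, exponent and toy data are all hypothetical and are not taken from the work itself.

```python
from statistics import median


def median_filter_1d(values, width=3):
    """Sliding median; windows are truncated at the edges."""
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def separate(mag, width=3, power=2, eps=1e-12):
    """Return (steady_mask, transient_mask) for mag[freq][time].

    Assumption: mag is a non-negative magnitude spectrogram given as a
    list of rows, one row per frequency bin.
    """
    n_freq, n_time = len(mag), len(mag[0])

    # Median along time (per frequency bin): short events are suppressed,
    # sustained tones survive.
    steady = [median_filter_1d(row, width) for row in mag]

    # Median along frequency (per time frame): narrowband tones are
    # suppressed, broadband clicks survive.
    per_frame = [median_filter_1d([mag[f][t] for f in range(n_freq)], width)
                 for t in range(n_time)]
    transient = [[per_frame[t][f] for t in range(n_time)]
                 for f in range(n_freq)]

    # Soft masks: each bin is shared in proportion to the two estimates.
    steady_mask = [[0.0] * n_time for _ in range(n_freq)]
    transient_mask = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            s = steady[f][t] ** power
            p = transient[f][t] ** power
            steady_mask[f][t] = s / (s + p + eps)
            transient_mask[f][t] = p / (s + p + eps)
    return steady_mask, transient_mask


# Toy spectrogram: a sustained tone in bin 2 and a click in frame 3.
toy = [[1.0 if (f == 2 or t == 3) else 0.0 for t in range(7)]
       for f in range(5)]
steady_mask, transient_mask = separate(toy)
```

In a full pipeline the masks would be multiplied with a complex short-time Fourier transform and inverted back to audio; that step is omitted here to keep the sketch free of external dependencies.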