Computer Models for Musical Instrument Identification
PhD
A particular aspect in the perception of sound is concerned with what is commonly
termed texture or timbre. From a perceptual perspective, timbre is what allows us
to distinguish sounds that have similar pitch and loudness. Indeed, most people are
able to tell a piano tone from a violin tone, or to distinguish different voices
or singers.
This thesis deals with timbre modelling. Specifically, the formant theory of timbre
is the main theme throughout. This theory states that acoustic musical instrument
sounds can be characterised by their formant structures. Following this principle, the
central point of our approach is to propose a computer implementation for building
musical instrument identification and classification systems.
Although the main thrust of this thesis is to propose a coherent and unified
approach to the musical instrument identification problem, it is oriented towards the
development of algorithms that can be used in Music Information Retrieval (MIR)
frameworks. Drawing on research in speech processing, a complete supervised system
taking into account both physical and perceptual aspects of timbre is described.
The approach is composed of three distinct processing layers. Parametric models
that allow us to represent signals through mid-level physical and perceptual representations
are considered. Next, the use of the Line Spectrum Frequencies as spectral
envelope and formant descriptors is emphasised. Finally, the use of generative and
discriminative techniques for building instrument and database models is investigated.
Our system is evaluated under realistic recording conditions using databases of isolated
notes and melodic phrases.
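Line Spectrum Frequencies are conventionally derived from linear-prediction (LPC) coefficients. The following is a minimal sketch that assumes the textbook route: autocorrelation-method LPC followed by the sum and difference polynomials P(z) and Q(z), whose unit-circle root angles are the LSFs. The abstract does not specify the thesis's own front end, so the frame, window and model order used here are illustrative only.

```python
import numpy as np


def lpc(frame, order):
    """LPC polynomial [1, a1, ..., ap] by the autocorrelation method (Levinson-Durbin)."""
    x = np.asarray(frame, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient of stage i
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k  # remaining prediction-error power
    return a


def lpc_to_lsf(a):
    """Line spectrum frequencies in radians, sorted, from an LPC polynomial."""
    ext = np.concatenate([np.asarray(a, dtype=float), [0.0]])
    p_poly = ext + ext[::-1]  # P(z) = A(z) + z^-(p+1) A(1/z)
    q_poly = ext - ext[::-1]  # Q(z) = A(z) - z^-(p+1) A(1/z)
    angles = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    # keep one root per conjugate pair and drop the trivial roots at z = 1 and z = -1
    eps = 1e-6
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])
```

For a 10th-order model, `lpc_to_lsf(lpc(frame * np.hamming(len(frame)), 10))` returns ten increasing frequencies in (0, π). Multiplying by fs / (2π) converts them to hertz. Closely spaced LSF pairs tend to mark formant regions, which is one reason LSFs serve as envelope descriptors.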
Perceptual models in speech quality assessment and coding
The ever-increasing demand for good communications-quality or toll-quality
speech has created a renewed interest in the
perceptual impact of rate compression. Two general areas are
investigated in this work, namely speech quality assessment
and speech coding.
In the field of speech quality assessment, a model is
developed which simulates the processing stages of the
peripheral auditory system. At the output of the model a
"running" auditory spectrum is obtained. This represents
the auditory (spectral) equivalent of any acoustic sound such
as speech. Auditory spectra from coded speech segments serve
as inputs to a second model. This model simulates the
information centre in the brain which performs the speech
quality assessment. [Continues.]
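A "running" auditory spectrum can be roughly approximated by short-time power spectra that are pooled into critical bands. The sketch below is a deliberately crude stand-in for the peripheral-auditory model described above, not a reproduction of it. It assumes Hann-windowed frames, Zwicker and Terhardt's analytic Bark formula, one-Bark-wide bands and dB levels. Frame length and hop are hypothetical parameters.

```python
import numpy as np


def hz_to_bark(f):
    # Zwicker & Terhardt analytic approximation of the Bark (critical-band) scale
    f = np.asarray(f, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


def running_bark_spectrum(x, fs, frame_len=256, hop=128):
    """Array of shape (n_frames, n_bands): critical-band levels in dB, one row per frame."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    band = np.floor(hz_to_bark(freqs)).astype(int)  # integer Bark band of each FFT bin
    n_bands = band.max() + 1
    n_frames = 1 + (len(x) - frame_len) // hop
    out = np.empty((n_frames, n_bands))
    for t in range(n_frames):
        seg = x[t * hop : t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(seg)) ** 2
        # integrate power within each critical band
        energy = np.bincount(band, weights=power, minlength=n_bands)
        out[t] = 10.0 * np.log10(energy + 1e-12)
    return out
```

A real auditory model would add outer/middle-ear weighting, spreading across bands and compressive loudness. These stages are omitted here to keep the framing-and-pooling idea visible.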
Three-dimensional morphanalysis of the face.
The aim of the work reported in this thesis was to determine the extent to which orthogonal two-dimensional morphanalytic (universally relatable) craniofacial imaging methods can be extended into the realm of computer-based three-dimensional imaging. New methods are presented for capturing universally relatable laser-video surface data, for inter-relating facial surface scans and for constructing probabilistic facial averages. Universally relatable surface scans are captured using the fixed relations principle combined with a new laser-video scanner calibration method. Inter-subject comparison of facial surface scans is achieved using interactive feature labelling and warping methods. These methods have been extended to groups of subjects to allow the construction of three-dimensional probabilistic facial averages. The potential of universally relatable facial surface data for applications such as growth studies and patient assessment is demonstrated. In addition, new methods for scattered data interpolation, for controlling overlap in image warping and a fast, high-resolution method for simulating craniofacial surgery are described. The results demonstrate that it is not only possible to extend universally relatable imaging into three dimensions, but that the extension also enhances the established methods, providing a wide
range of new applications.
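Scattered data interpolation is the kind of step that underlies landmark-driven warping. As a generic illustration, and not the thesis's new method (which the abstract does not describe), here is a 2-D thin-plate spline. It interpolates values known at irregular points, such as one displacement component at labelled facial landmarks, and reproduces affine data exactly.

```python
import numpy as np


def _tps_kernel(r):
    # U(r) = r^2 log r, with U(0) = 0 by continuity
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r ** 2 * np.log(r)
    return np.where(r > 0, u, 0.0)


def fit_tps(points, values):
    """Solve for kernel weights w and affine part a so the spline passes through every value."""
    pts = np.asarray(points, dtype=float)
    v = np.asarray(values, dtype=float)
    n = len(pts)
    K = _tps_kernel(np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1))
    P = np.hstack([np.ones((n, 1)), pts])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T  # side conditions: weights orthogonal to affine functions
    coef = np.linalg.solve(A, np.concatenate([v, np.zeros(3)]))
    return pts, coef[:n], coef[n:]


def eval_tps(model, query):
    """Evaluate the fitted spline at query points of shape (m, 2)."""
    pts, w, a = model
    q = np.atleast_2d(np.asarray(query, dtype=float))
    r = np.linalg.norm(q[:, None, :] - pts[None, :, :], axis=-1)
    return _tps_kernel(r) @ w + a[0] + q @ a[1:]
```

Fitting one spline per coordinate of the displacement gives a smooth warp that maps the landmarks exactly. The points must not all be collinear, or the linear system becomes singular.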
Multiresolution techniques for audio signal restoration
This thesis describes a study of techniques for the restoration of musical audio signals using a multiresolution signal representation called the multiresolution Fourier transform (MFT), a time-frequency-scale representation. This representation allows the restoration to adapt to the local signal structure, which typically consists of a set of approximately sinusoidal partials, each consisting of an “onset” of rapid energy variation followed by more slowly varying “sustain” and “decay” phases.
It must be decided which components of a noisy audio signal are to be kept in the restored version and, conversely, which must be removed. A simple filter is introduced that retains only musical signal (that is, signal which adheres to the musical model) and rejects everything else. It is shown that this filter, used in conjunction with the MFT, has a low computational complexity. The MFT is used to capture the transient energy present at the onset of notes by splitting the time axis of a musical signal into steady-state and transient zones using a simple onset detector, which measures the expected energy at a given time against the actual energy present.
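The onset detector compares the energy expected at a given time with the energy actually present. A minimal sketch, assuming plain time-domain frames rather than MFT coefficients, with the expected energy taken as the mean of the preceding frames. The threshold `ratio` and the `refractory` hold-off are hypothetical parameters, since the abstract does not give the detector's exact rule.

```python
import numpy as np


def detect_onsets(x, frame_len=512, history=8, ratio=4.0, refractory=8):
    """Indices of frames whose energy jumps well above the level expected from recent frames."""
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)
    onsets, last = [], -refractory
    for t in range(1, n_frames):
        # expected energy: mean over the preceding (up to) `history` frames
        expected = energy[max(0, t - history):t].mean()
        # the refractory period stops the decay of one note being flagged again
        if t - last >= refractory and energy[t] > ratio * expected + 1e-12:
            onsets.append(t)
            last = t
    return onsets
```

Frames flagged this way would form the transient zones, and the remaining frames would form the steady-state zones handled by the musical-signal filter.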
Past audio signal restoration systems have relied on estimating a restored audio signal’s spectrum from the noisy audio signal presented to the algorithm. In this thesis, the idea of having more than one version of a recording is used in order to gain further information about the ideal spectrum of the noisy signal. This poses a number of problems with regard to matching the time scales of two versions of the same piece. These are addressed, and solutions are offered based on a novel multiresolution warping algorithm.
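The time-scale matching problem can be illustrated with classic dynamic time warping (DTW). This is a swapped-in, single-resolution technique, not the thesis's multiresolution warping algorithm. It assumes each version of the recording has already been reduced to a sequence of feature frames, for example per-frame spectra.

```python
import numpy as np


def _as_frames(x):
    # treat a 1-D sequence as n frames of one feature each
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x


def dtw_path(a, b):
    """Align feature sequences a (n, d) and b (m, d); return total cost and warping path."""
    a, b = _as_frames(a), _as_frames(b)
    n, m = len(a), len(b)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # backtrack from the end to recover which frames were matched
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return acc[n, m], path[::-1]
```

The path gives, for each frame of one version, the matching frame of the other. The quadratic cost of this table is one reason a coarse-to-fine (multiresolution) scheme is attractive for full-length recordings.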
Finally, various methods are shown for using the detected signal spectrum of a clean modern signal to restore a noisy signal, using the warping techniques and musical event detection filters. These account for variations in scale and input signal-to-noise ratio (SNR) in the noisy signal. It is also shown how the simple adaptive filter introduced earlier can be used to restore audio signals with impulse noise as well as white additive noise. This filter and the time-warping technique are compared with adaptive Wiener filtering as an audio restoration method.
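For reference, the Wiener-filtering baseline can be sketched as a frame-wise spectral gain. This assumes the noise power spectrum is known from a noise-only excerpt and uses the common form G = max(|X|² − N, 0) / |X|². The exact adaptive Wiener configuration used for comparison is not given in the abstract, and non-overlapping rectangular frames are used purely for brevity.

```python
import numpy as np


def estimate_noise_psd(noise, frame_len=256):
    """Average periodogram of a noise-only excerpt, one value per rfft bin."""
    noise = np.asarray(noise, dtype=float)
    n_frames = len(noise) // frame_len
    frames = noise[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)


def wiener_denoise(noisy, noise_psd, frame_len=256):
    """Apply a per-frame, per-bin Wiener-style gain; the gain adapts to each frame's spectrum."""
    noisy = np.asarray(noisy, dtype=float)
    out = np.zeros_like(noisy)
    for start in range(0, len(noisy) - frame_len + 1, frame_len):
        spec = np.fft.rfft(noisy[start:start + frame_len])
        power = np.abs(spec) ** 2
        # estimated clean power over noisy power, clipped at zero
        gain = np.maximum(power - noise_psd, 0.0) / np.maximum(power, 1e-12)
        out[start:start + frame_len] = np.fft.irfft(gain * spec, n=frame_len)
    return out
```

In practice, overlapping windowed frames and a smoothed SNR estimate reduce the frame-boundary artefacts and "musical noise" this bare version produces.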
Time and frequency domain algorithms for speech coding
The promise of digital hardware economies (due to recent advances in
VLSI technology) has focussed much attention on more complex and sophisticated
speech coding algorithms which offer improved quality at relatively
low bit rates.
This thesis describes the results (obtained from computer simulations)
of research into various efficient (time and frequency domain) speech
encoders operating at a transmission bit rate of 16 kbps.
In the time domain, Adaptive Differential Pulse Code Modulation (ADPCM)
systems employing both forward and backward adaptive prediction were
examined. A number of algorithms were proposed and evaluated, including
several variants of the Stochastic Approximation Predictor (SAP). A
Backward Block Adaptive (BBA) predictor was also developed and found to
outperform the conventional stochastic methods, even though its complexity
in terms of signal processing requirements is lower. A simplified
Adaptive Predictive Coder (APC) employing a single-tap pitch predictor,
considered next, provided a slight improvement in performance over ADPCM,
but at the cost of greater complexity.
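Backward adaptive prediction means that the predictor is updated only from reconstructed samples, so the decoder can repeat every update without receiving any coefficients. A minimal sketch, assuming a normalised-LMS stochastic-gradient update (one member of the stochastic-approximation family, not necessarily any of the SAP variants studied) and a fixed 4-bit uniform quantizer. The order, step and `mu` values are hypothetical.

```python
import numpy as np

LEVELS = 16  # 4-bit uniform mid-rise quantizer


def quantize(e, step):
    return min(max(int(np.floor(e / step)) + LEVELS // 2, 0), LEVELS - 1)


def dequantize(code, step):
    return (code - LEVELS // 2 + 0.5) * step


def _update(w, hist, e_q, y, mu):
    # normalised LMS step driven only by data the decoder also has
    w += mu * e_q * hist / (hist @ hist + 1e-6)
    hist = np.roll(hist, 1)
    hist[0] = y
    return w, hist


def adpcm_encode(x, order=4, step=0.05, mu=0.2):
    w, hist, codes, recon = np.zeros(order), np.zeros(order), [], []
    for sample in x:
        pred = w @ hist
        code = quantize(sample - pred, step)
        y = pred + dequantize(code, step)  # encoder-side reconstruction
        w, hist = _update(w, hist, dequantize(code, step), y, mu)
        codes.append(code)
        recon.append(y)
    return codes, np.array(recon)


def adpcm_decode(codes, order=4, step=0.05, mu=0.2):
    w, hist, out = np.zeros(order), np.zeros(order), []
    for code in codes:
        pred = w @ hist
        y = pred + dequantize(code, step)
        w, hist = _update(w, hist, dequantize(code, step), y, mu)
        out.append(y)
    return np.array(out)
```

Because encoder and decoder run identical updates on identical data, their reconstructions match sample for sample. A forward-adaptive scheme would instead estimate coefficients per block and transmit them as side information.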
The ultimate test of any speech coding system is the perceptual performance
of the received speech. Recent research has indicated that this
may be enhanced by suitable control of the noise spectrum according to
the theory of auditory masking. Various noise-shaping ADPCM
configurations were examined, and it was demonstrated that a proposed
pre-/post-filtering arrangement, which advantageously exploits the
predictor-quantizer interaction, leads to the best subjective
performance in both forward and backward prediction systems.
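The basic principle of pre-/post-filtering can be shown with a first-order toy example. It assumes a fixed pre-filter F(z) = 1 − g·z⁻¹ and its inverse post-filter 1/F(z), with a hypothetical g. The thesis's arrangement, which exploits the predictor-quantizer interaction, is not reproduced here. The signal passes through both filters unchanged, while white coding noise injected between them emerges shaped by 1/F(z). For g > 0 this pushes the noise towards low frequencies, where voiced speech concentrates its energy.

```python
import numpy as np


def pre_filter(x, g):
    """FIR pre-filter F(z) = 1 - g z^-1."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= g * x[:-1]
    return y


def post_filter(y, g):
    """IIR post-filter 1 / F(z): out[n] = y[n] + g * out[n-1]."""
    out = np.zeros(len(y))
    prev = 0.0
    for n, v in enumerate(y):
        prev = v + g * prev
        out[n] = prev
    return out
```

In a real coder, F(z) would be derived from the short-term spectral envelope (for example the LPC predictor), so the noise follows the formant structure and is better masked.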
Adaptive quantization is instrumental to the performance of ADPCM systems.
Both the forward adaptive quantizer (AQF) and the backward one-word
memory adaptive quantizer (AQJ) were examined. In addition, a novel method
of decreasing quantization noise in ADPCM-AQJ coders, which involves
applying a correction to the decoded speech samples, provided
reduced output noise across the spectrum, with considerable high-frequency
noise suppression.
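One-word memory adaptation (Jayant-style) scales the quantizer step after every sample by a multiplier chosen from the code just sent. The decoder can therefore track the step with no side information. A minimal sketch for a 3-bit quantizer, applied here to a bare signal rather than inside an ADPCM loop. The multiplier values and step limits are illustrative assumptions, not figures from the thesis.

```python
import numpy as np

MULTIPLIERS = (0.9, 0.9, 1.25, 1.75)  # illustrative, one per magnitude level


def _next_step(step, mag, lo=1e-4, hi=10.0):
    # one-word memory: the new step depends only on the old step and the code just sent
    return min(max(step * MULTIPLIERS[mag], lo), hi)


def aqj_encode(e_seq, step0=0.1):
    """Signed codes (+/-1..+/-4), the reconstruction and the step size used for each sample."""
    step, codes, recon, steps = step0, [], [], []
    for e in e_seq:
        mag = min(int(abs(e) / step), len(MULTIPLIERS) - 1)
        sign = 1 if e >= 0 else -1
        codes.append(sign * (mag + 1))
        recon.append(sign * (mag + 0.5) * step)
        steps.append(step)
        step = _next_step(step, mag)
    return codes, np.array(recon), np.array(steps)


def aqj_decode(codes, step0=0.1):
    step, out = step0, []
    for c in codes:
        mag, sign = abs(c) - 1, (1 if c > 0 else -1)
        out.append(sign * (mag + 0.5) * step)
        step = _next_step(step, mag)
    return np.array(out)
```

Outer-level codes expand the step and inner-level codes contract it, so the quantizer follows the envelope of its input within a few samples.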
More powerful (and inevitably more complex) frequency domain speech
coders such as the Adaptive Transform Coder (ATC) and the Sub-band Coder
(SBC) offer good-quality speech at 16 kbps. To reduce complexity and
coding delay, whilst retaining the advantage of sub-band coding, a novel
transform based split-band coder (TSBC) was developed and found to compare
closely in performance with the SBC.
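The core of a transform-based split-band scheme is that a single block transform replaces the filter bank, and its coefficients are grouped into bands that can be coded separately. A sketch, assuming an orthonormal DCT-II and equal-width bands. The abstract names neither the TSBC's transform nor its band layout, so both are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c


def split_bands(frame, n_bands):
    """Transform one frame and split its coefficients into equal-width bands."""
    C = dct_matrix(len(frame))
    coeffs = C @ np.asarray(frame, dtype=float)
    return np.array_split(coeffs, n_bands), C


def merge_bands(bands, C):
    """Inverse transform: C is orthonormal, so its transpose undoes it."""
    return C.T @ np.concatenate(bands)
```

Because the transform is orthonormal, the band energies sum to the frame energy and, before quantization, the bands merge back without loss. The coding delay is one block, with no filter-bank delay.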
To prevent the heavy side information requirement associated with a
large number of bands in split-band coding schemes from impairing coding
accuracy, without forgoing the efficiency provided by adaptive bit
allocation, a method employing AQJs to code the sub-band signals together
with vector quantization of the bit allocation patterns was also
proposed.
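Vector quantization of bit-allocation patterns replaces per-band side information with a single codebook index per frame. The sketch uses nearest-codeword search under squared Euclidean distance. The 4-entry, 4-band codebook is entirely hypothetical, and in practice it would be trained on allocation patterns from real speech.

```python
import numpy as np


def vq_encode(patterns, codebook):
    """Index of the nearest codeword (squared Euclidean distance) for each pattern."""
    p = np.atleast_2d(np.asarray(patterns, dtype=float))
    cb = np.asarray(codebook, dtype=float)
    dist = ((p[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)


def vq_decode(indices, codebook):
    """Bit-allocation patterns recovered at the receiver from the transmitted indices."""
    return np.asarray(codebook)[np.asarray(indices)]


# Hypothetical 4-entry codebook for 4 bands: 2 bits of side information per frame.
CODEBOOK = np.array([[4, 4, 4, 4], [6, 5, 3, 2], [8, 4, 2, 2], [2, 3, 5, 6]])
```

With K codewords the side information is log2(K) bits per frame, independent of the number of bands. That independence is what keeps a many-band scheme from being swamped by allocation overhead.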
Finally, 'pipeline' methods of bit allocation and step size estimation
(using the Fast Fourier Transform (FFT) on the input signal) were examined.
Such methods, although less accurate, are nevertheless useful in
limiting the coding delay associated with SBC schemes employing Quadrature
Mirror Filters (QMF).
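A "pipeline" allocation can be sketched as follows. Band energies are estimated directly from an FFT of the input frame, so allocation need not wait for the sub-band filters. Bits are then handed out greedily under the classic rule that each extra bit lowers a band's quantization noise by about 6 dB. Equal-width bands, the greedy rule and `max_bits` are assumptions for illustration, not the thesis's procedure.

```python
import numpy as np


def band_energies(frame, n_bands):
    """Energy of one input frame in equal-width FFT bands (DC bin excluded)."""
    power = np.abs(np.fft.rfft(np.asarray(frame, dtype=float))) ** 2
    return np.array([band.sum() for band in np.array_split(power[1:], n_bands)])


def allocate_bits(energies, total_bits, max_bits=8):
    """Greedy allocation: each bit goes to the band with the largest remaining noise."""
    bits = np.zeros(len(energies), dtype=int)
    noise = np.asarray(energies, dtype=float).copy()
    for _ in range(total_bits):
        if np.all(bits >= max_bits):
            break
        candidates = np.where(bits < max_bits, noise, -np.inf)
        b = int(np.argmax(candidates))
        bits[b] += 1
        noise[b] /= 4.0  # each extra bit lowers quantization noise by about 6 dB
    return bits
```

For example, energies of 16, 4, 1 and 1 with a 4-bit budget receive 3, 1, 0 and 0 bits. The FFT estimate only approximates the true sub-band energies after QMF filtering, which matches the abstract's remark that such methods are less accurate.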
Multiple Track Performance of a Digital Magnetic Tape System : Experimental Study and Simulation using Parallel Processing Techniques
The primary aim of the magnetic recording industry is to
increase storage capacities and transfer rates whilst maintaining or
reducing costs. In multiple-track tape systems, as recorded track
dimensions decrease, higher precision tape transport mechanisms and
dedicated coding circuitry are required. This leads to increased
manufacturing costs and a loss of flexibility. This thesis reports on
the performance of a low-precision, low-cost multiple-track tape
transport system. Software-based techniques to study system
performance, and to compensate for the mechanical deficiencies of
this system, were developed using occam and the transputer.
The inherent parallelism of the multiple-track format was
exploited by integrating a transputer into the recording channel
to perform the signal processing tasks. An innovative model of the
recording channel, written exclusively in occam, was developed.
The effect of parameters such as data rate, track dimensions and
head misregistration on system performance was determined from the
detailed error profile produced. This model may be run on
a network of transputers, allowing its speed of execution to be
scaled to suit the investigation. These features, combined with its
modular flexibility, make it a powerful tool that may be applied to
other multiple-track systems, such as digital HDTV.
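To make the parallel structure concrete, here is a toy Python stand-in. It is neither occam nor the thesis's channel model. Each track's bit-error rate is estimated independently under a hypothetical model in which head misregistration reduces the read-back amplitude in proportion to the lost track overlap. Because tracks do not interact, a worker pool can map over them, loosely mirroring how the model is distributed across a network of transputers.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def track_ber(offset, seed, width=1.0, noise_sigma=0.4, n_bits=20000):
    """Toy bit-error rate for one track whose head is offset laterally by `offset`."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, n_bits)
    # hypothetical: read-back amplitude falls linearly with the lost track overlap
    amplitude = max(0.0, 1.0 - abs(offset) / width)
    received = amplitude * (2 * bits - 1) + noise_sigma * rng.standard_normal(n_bits)
    return float(np.mean((received > 0) != (bits == 1)))


def simulate_tracks(offsets, workers=4):
    """Tracks are independent, so each one can be handed to a separate worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(track_ber, offsets, range(len(offsets))))
```

Threads are used here only to show the decomposition. For real speed-ups on CPU-bound Python code, a process pool or a compiled kernel is the usual choice.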
A greater understanding of the effects of mechanical
deficiencies on the performance of multiple-track systems was gained
from this study. This led to the development of a software-based
compensation scheme to reduce the effects of Lateral Head
Displacement and allow low-cost tape transport mechanisms to be used
with narrow, closely spaced tracks, facilitating higher packing
densities.
The experimental and simulated investigation of system
performance, together with the development of the model and compensation scheme
using parallel processing techniques, has led to the publication of a
paper, and two further publications are expected.
Thorn EMI,
Central Research Laboratories,
Hayes, Middlesex
The assessment and development of methods in (spatial) sound ecology
As vital ecosystems across the globe come under unprecedented pressure from climate change and industrial land use, understanding the processes driving ecosystem viability has never been more critical. Nuanced ecosystem understanding comes from well-collected field data and a wealth of associated interpretations. In recent years, the most popular methods of ecosystem monitoring have shifted from often damaging and labour-intensive manual data collection to automated methods of data collection and analysis. Sound ecology describes the school of research that uses information transmitted through sound to infer properties of an area's species, biodiversity, and health. In this thesis, we explore and develop state-of-the-art automated monitoring with sound, specifically relating to data storage practice and to spatial acoustic recording and data analysis.
In the first chapter, we explore the necessity and methods of ecosystem monitoring, focusing on acoustic monitoring, and then examine how and why sound is recorded and the current state of the art in acoustic monitoring. Chapter one concludes by setting out the aims and overall content of the following chapters. We begin the second chapter by exploring methods used to mitigate data storage expense, a widespread issue because automated methods quickly amass vast amounts of data that can be expensive and impractical to manage. Importantly, I explain how these data management practices are often used without their consequences being known, a gap I then address. Specifically, I present evidence that the most used data reduction methods (namely compression and temporal subsetting) have a surprisingly small impact on the information content of recorded sound compared with the method of analysis. This work also adds to the increasing evidence that deep learning-based methods of environmental sound quantification are more powerful and more robust to experimental variation than more traditional acoustic indices.
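Temporal subsetting usually means a duty cycle: the recorder keeps only the first part of every fixed cycle. A minimal sketch applied after the fact to a continuous recording. The on-time and cycle length are hypothetical schedule parameters, and the storage saving is simply `on_s / cycle_s`.

```python
import numpy as np


def duty_cycle(audio, fs, on_s, cycle_s):
    """Keep the first `on_s` seconds of every `cycle_s`-second cycle."""
    audio = np.asarray(audio)
    on, cycle = int(round(on_s * fs)), int(round(cycle_s * fs))
    idx = np.arange(len(audio))
    return audio[(idx % cycle) < on]
```

For instance, keeping 1 minute in every 5 retains 20% of the data. Whether a given schedule loses ecologically relevant information is the question the chapter examines empirically.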
In the latter chapters, I focus on using multichannel acoustic recording for sound-source localisation. Knowing where a sound originated has a range of ecological uses, including counting individuals, locating threats, and monitoring habitat use. While an exciting application of acoustic technology, spatial acoustics has had minimal uptake owing to the expense, impracticality and inaccessibility of equipment. In my third chapter, I introduce MAARU (Multichannel Acoustic Autonomous Recording Unit), a low-cost, easy-to-use and accessible solution to this problem. I explain the software and hardware necessary for spatial recording and show how MAARU can be used to accurately localise the direction of a sound to within ±10°. In the fourth chapter, I explore how MAARU devices deployed in the field can be used for enhanced ecosystem monitoring, by spatially clustering individuals by calling direction for more accurate abundance approximations and for crude species-specific habitat usage monitoring. Most literature on spatial acoustics cites the need for many accurately synced recording devices over an area. This chapter provides the first evidence of advances made with just one recorder.
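Direction finding with a single multichannel unit typically rests on the time difference of arrival (TDOA) between microphones. A sketch for one microphone pair, assuming a far-field source, a known spacing `mic_distance`, a speed of sound of 343 m/s, and the lag that maximises the cross-correlation. The abstract does not describe MAARU's actual geometry or algorithm, so none of these choices should be read as MAARU's.

```python
import numpy as np


def tdoa_angle(sig_a, sig_b, fs, mic_distance, c=343.0):
    """Arrival angle in degrees from broadside, from the cross-correlation peak."""
    a = np.asarray(sig_a, dtype=float)
    b = np.asarray(sig_b, dtype=float)
    corr = np.correlate(b, a, mode="full")
    lag = int(np.argmax(corr)) - (len(a) - 1)  # samples by which b lags a
    s = np.clip(c * (lag / fs) / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

With 0.2 m spacing at 48 kHz, one sample of lag corresponds to several degrees near broadside, so the sampling rate and spacing bound the achievable resolution. A single pair also cannot tell front from back; more microphones resolve that ambiguity.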
Finally, I conclude this thesis by restating my aims and discussing my success in achieving them. Specifically, in the thesis's conclusion, I reiterate the contributions made to the field as a direct result of this work and outline some possible development avenues.
Separation of musical sources and structure from single-channel polyphonic recordings
EThOS - Electronic Theses Online Service, United Kingdom
Directional edge and texture representations for image processing
An efficient representation for natural images is of fundamental importance in image processing and analysis. The commonly used separable transforms such as wavelets are not best suited for images due to their inability to exploit directional regularities such as edges and oriented textural patterns, while most of the recently proposed directional schemes cannot represent these two types of features in a unified transform. This thesis focuses on the development of directional representations for images which can capture both edges and textures in a multiresolution manner. The thesis first considers the problem of extracting linear features with the multiresolution Fourier transform (MFT). Based on a previous MFT-based linear feature model, the work extends the extraction method to the case where the image is corrupted by noise. The problem is tackled by the combination of a "Signal+Noise" frequency model, a refinement stage and a robust classification scheme. As a result, the MFT is able to perform linear feature analysis on noisy images on which previous methods failed. A new set of transforms called the multiscale polar cosine transforms (MPCT) is also proposed in order to represent textures. The MPCT can be regarded as a real-valued MFT with similar basis functions of oriented sinusoids. It is shown that the transform can represent textural patches more efficiently than the conventional Fourier basis. With a directional best cosine basis, the MPCT packet (MPCPT) is shown to be an efficient representation for edges and textures, despite its high computational burden. The problem of representing edges and textures in a fixed transform with less complexity is then considered. This is achieved by applying a Gaussian frequency filter, which matches the dispersion of the magnitude spectrum, to the local MFT coefficients. This is particularly effective in denoising natural images, due to its ability to preserve both types of feature.
Further improvements can be made by employing the information given by the linear feature extraction process in the filter's configuration. The denoising results compare favourably with other state-of-the-art directional representations.
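A simplified reading of the Gaussian frequency filter works on a single local block, using a plain 2-D FFT in place of MFT coefficients. Under this reading, the filter is a zero-centred Gaussian whose covariance equals the magnitude-weighted second moments of the block's spectrum. An oriented edge or texture yields an elongated Gaussian that keeps energy along its orientation and suppresses the rest. This interpretation of "matches the dispersion of the magnitude spectrum", and the regulariser `reg`, are assumptions.

```python
import numpy as np


def gaussian_spectral_filter(block, reg=1e-3):
    """Filter one image block with a Gaussian fitted to the spread of its magnitude spectrum."""
    block = np.asarray(block, dtype=float)
    F = np.fft.fft2(block)
    fy, fx = np.meshgrid(np.fft.fftfreq(block.shape[0]), np.fft.fftfreq(block.shape[1]), indexing="ij")
    mag = np.abs(F)
    mag[0, 0] = 0.0  # ignore DC when measuring dispersion
    total = mag.sum()
    if total == 0.0:
        return block.copy()
    w = mag / total
    # magnitude-weighted second moments (the spectrum of a real block is symmetric about 0)
    cov = np.array([[np.sum(w * fy * fy), np.sum(w * fy * fx)],
                    [np.sum(w * fx * fy), np.sum(w * fx * fx)]]) + reg * np.eye(2)
    inv = np.linalg.inv(cov)
    q = inv[0, 0] * fy ** 2 + 2.0 * inv[0, 1] * fy * fx + inv[1, 1] * fx ** 2
    G = np.exp(-0.5 * q)  # equals 1 at DC, so the block mean is preserved
    # np.real drops a tiny imaginary residue from the unpaired Nyquist row/column
    return np.real(np.fft.ifft2(G * F))
```

Since the gain never exceeds 1, the filter can only remove energy. In a full system, blocks would overlap and be blended, as the local MFT does by construction.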
Relative-fuzzy: a novel approach for handling complex ambiguity for software engineering of data mining models
There are two main classes of uncertainty, namely fuzziness and ambiguity, where ambiguity is a ‘one-to-many’ relationship between the syntax and semantics of a proposition. This definition appears to ignore the ‘many-to-many’ relationship type of ambiguity. In this thesis, we use the term complex-uncertainty for this many-to-many relationship type of ambiguity.
This research proposes a new approach for handling the complex ambiguity type of uncertainty that may exist in data, for the software engineering of predictive Data Mining (DM) classification models. The proposed approach is based on Relative-Fuzzy Logic (RFL), a novel type of fuzzy logic. RFL reformulates the problem of the ambiguity type of uncertainty in terms of States Of Proposition (SOP). RFL describes its membership (semantic) value using a new definition of Domain of Proposition (DOP), which is based on the relativity principle as defined by possible-worlds logic.
Proposing RFL requires answering one question: how can these two approaches, fuzzy logic and possible-worlds logic, be combined to produce a new set of membership values (and, later, a logic) able to handle fuzziness and multiple viewpoints at the same time? This is achieved by giving possible-worlds logic the ability to quantify multiple viewpoints, to model fuzziness within each of these viewpoints, and to express the result as a new set of membership values.
Furthermore, a new Hierarchical Neural Network (HNN) architecture called ML/RFL-Based Net has been developed in this research, along with new learning and recall algorithms. The architecture, learning algorithm and recall algorithm of ML/RFL-Based Net follow the principles of RFL, so this new type of HNN can be considered an RFL computation machine.
The ability of the Relative Fuzzy-based DM prediction model to tackle the complex ambiguity type of uncertainty has been tested. Special-purpose Integrated Development Environment (IDE) software called RFL4ASR, which generates a DM prediction model for speech recognition, has also been developed in this research. This special-purpose IDE extends the traditional definition of an IDE.
Using multiple sets of TIMIT speech data, the ML/RFL-Based Net prediction model achieved a classification accuracy of 69.2308%, higher than the best results achieved by the WEKA data mining classifiers on the same speech data.