
    Computer Models for Musical Instrument Identification

    A particular aspect of the perception of sound is concerned with what is commonly termed texture or timbre. From a perceptual perspective, timbre is what allows us to distinguish sounds that have similar pitch and loudness. Indeed, most people are able to discern a piano tone from a violin tone, or to distinguish different voices or singers. This thesis deals with timbre modelling. Specifically, the formant theory of timbre is the main theme throughout. This theory states that acoustic musical instrument sounds can be characterised by their formant structures. Following this principle, the central point of our approach is to propose a computer implementation for building musical instrument identification and classification systems. Although the main thrust of this thesis is to propose a coherent and unified approach to the musical instrument identification problem, it is oriented towards the development of algorithms that can be used in Music Information Retrieval (MIR) frameworks. Drawing on research in speech processing, a complete supervised system taking into account both physical and perceptual aspects of timbre is described. The approach is composed of three distinct processing layers. Parametric models that allow us to represent signals through mid-level physical and perceptual representations are considered. Next, the use of the Line Spectrum Frequencies as spectral envelope and formant descriptors is emphasised. Finally, the use of generative and discriminative techniques for building instrument and database models is investigated. Our system is evaluated under realistic recording conditions using databases of isolated notes and melodic phrases.
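
    As a rough illustration of the kind of descriptor mentioned above, the sketch below computes Line Spectrum Frequencies from LPC coefficients for one audio frame. The thesis's actual frame lengths, model order and windowing are not given here; those values, and the random stand-in frame, are assumptions for illustration only.

```python
# Illustrative sketch: Line Spectrum Frequencies (LSFs) from an LPC analysis
# of a single frame. Frame length, order and windowing are assumed values.
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order):
    """LPC polynomial A(z) = 1 + a1 z^-1 + ... + ap z^-p (autocorrelation method)."""
    frame = frame * np.hanning(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:order], -r[1:order + 1])
    return np.concatenate(([1.0], a))

def lsf(a):
    """LSFs (radians in (0, pi)) of an LPC polynomial."""
    # Symmetric and antisymmetric polynomials P(z) and Q(z)
    P = np.concatenate((a, [0.0])) + np.concatenate(([0.0], a[::-1]))
    Q = np.concatenate((a, [0.0])) - np.concatenate(([0.0], a[::-1]))
    # LSFs are the angles of the unit-circle roots of P and Q, interleaved
    angles = np.angle(np.concatenate((np.roots(P), np.roots(Q))))
    return np.sort(angles[(angles > 1e-6) & (angles < np.pi - 1e-6)])

# Example: a 12-value LSF envelope descriptor for one 25 ms frame at 16 kHz
frame = np.random.randn(400)          # stand-in for a real instrument frame
print(lsf(lpc(frame, order=12)))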

    Perceptual models in speech quality assessment and coding

    The ever-increasing demand for good communications/toll quality speech has created renewed interest in the perceptual impact of rate compression. Two general areas are investigated in this work, namely speech quality assessment and speech coding. In the field of speech quality assessment, a model is developed which simulates the processing stages of the peripheral auditory system. At the output of the model a "running" auditory spectrum is obtained. This represents the auditory (spectral) equivalent of any acoustic sound such as speech. Auditory spectra from coded speech segments serve as inputs to a second model. This model simulates the information centre in the brain which performs the speech quality assessment. [Continues.]
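
    The sketch below is only a crude stand-in for the idea of a "running" auditory spectrum: short-time power spectra mapped onto Bark-scale critical bands and amplitude-compressed. The thesis's peripheral model has more stages; the band mapping, frame size and compression exponent here are illustrative assumptions.

```python
# Crude "running" auditory spectrum: framewise power spectra summed into
# Bark-scale critical bands, then compressed. All constants are assumptions.
import numpy as np

def bark(f_hz):
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def running_auditory_spectrum(x, fs=8000, frame=256, hop=128, n_bands=18):
    freqs = np.fft.rfftfreq(frame, 1.0 / fs)
    band_of_bin = np.minimum(bark(freqs).astype(int), n_bands - 1)
    window = np.hanning(frame)
    spectra = []
    for start in range(0, len(x) - frame, hop):
        power = np.abs(np.fft.rfft(x[start:start + frame] * window)) ** 2
        bands = np.bincount(band_of_bin, weights=power, minlength=n_bands)
        spectra.append(bands ** 0.23)        # crude loudness-style compression
    return np.array(spectra)                 # shape: (n_frames, n_bands)

# Example on a dummy one-second signal
print(running_auditory_spectrum(np.random.randn(8000)).shape)
```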

    Three-dimensional morphanalysis of the face.

    The aim of the work reported in this thesis was to determine the extent to which orthogonal two-dimensional morphanalytic (universally relatable) craniofacial imaging methods can be extended into the realm of computer-based three-dimensional imaging. New methods are presented for capturing universally relatable laser-video surface data, for inter-relating facial surface scans and for constructing probabilistic facial averages. Universally relatable surface scans are captured using the fixed relations principle combined with a new laser-video scanner calibration method. Inter-subject comparison of facial surface scans is achieved using interactive feature labelling and warping methods. These methods have been extended to groups of subjects to allow the construction of three-dimensional probabilistic facial averages. The potential of universally relatable facial surface data for applications such as growth studies and patient assessment is demonstrated. In addition, new methods for scattered data interpolation, for controlling overlap in image warping and a fast, high-resolution method for simulating craniofacial surgery are described. The results demonstrate that it is not only possible to extend universally relatable imaging into three dimensions, but that the extension also enhances the established methods, providing a wide range of new applications.
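
    One component mentioned above is scattered data interpolation of surface points. The thesis describes its own interpolation and warping methods; the sketch below only illustrates the general idea with a thin-plate-spline RBF from SciPy, applied to made-up landmark data.

```python
# Sketch: thin-plate-spline scattered data interpolation of a surface, as a
# generic stand-in for interpolating scanned facial surface points.
import numpy as np
from scipy.interpolate import RBFInterpolator

# Hypothetical scattered (x, y) landmarks with a measured depth value z
rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(200, 2))
z = np.cos(2 * xy[:, 0]) * np.sin(2 * xy[:, 1])   # stand-in for scanned depth

surface = RBFInterpolator(xy, z, kernel="thin_plate_spline", smoothing=0.0)

# Resample the interpolated surface on a regular grid, e.g. for rendering
gx, gy = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
grid = surface(np.column_stack([gx.ravel(), gy.ravel()])).reshape(64, 64)
print(grid.shape)
```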

    Multiresolution techniques for audio signal restoration

    This thesis describes a study of techniques for the restoration of musical audio signals using a multiresolution signal representation called the multiresolution Fourier transform (MFT), a time-frequency-scale representation. This representation allows the restoration to adapt to the local signal structure, which typically consists of a set of approximately sinusoidal partials, each consisting of an “onset” of rapid energy variation followed by more slowly varying “sustain” and “decay” phases. It must be decided which components of a noisy audio signal are to be kept in the restored version and, conversely, which must be removed. A simple filter is introduced that retains only musical signal, that is, signal which adheres to the musical model, and rejects everything else. It is shown that this filter, used in conjunction with the MFT, has a low computational complexity. The MFT is used to capture the transient energy present at the onset of notes by splitting the time axis of a musical signal into steady-state and transient zones using a simple onset detector, which measures the expected energy at a given time against the actual energy present. Past audio signal restoration systems have relied on estimating a restored audio signal’s spectrum from the noisy audio signal presented to the algorithm. In this thesis the idea of having more than one version of a recording is used in order to gain further information about the ideal spectrum of the noisy signal. This poses a number of problems with regard to matching the time scales of two versions of the same piece. These are addressed and solutions are offered, based on a novel multiresolution warping algorithm. Finally, various methods for using the detected signal spectrum of a clean modern signal to restore a noisy signal using the warping techniques and musical event detection filters are shown. These account for variations in scale and input signal-to-noise ratio (SNR) in the noisy signal. It is also shown how the simple adaptive filter introduced earlier can be used to restore audio signals corrupted by impulse noise as well as additive white noise. This filter and the time-warping technique are compared to adaptive Wiener filtering as an audio restoration method.
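
    A minimal sketch of the kind of onset detector described above: compare each frame's actual short-time energy with the energy "expected" from the recent past, and flag frames where the actual value greatly exceeds the expected one as transient zones. The frame size, smoothing factor and threshold are guesses, not the thesis's settings, and the MFT itself is not reproduced here.

```python
# Toy energy-based onset detector: expected vs. actual frame energy.
import numpy as np

def detect_onsets(x, frame=512, hop=256, alpha=0.9, threshold=2.0):
    onsets = []
    expected = None
    for i, start in enumerate(range(0, len(x) - frame, hop)):
        energy = float(np.sum(x[start:start + frame] ** 2))
        if expected is not None and energy > threshold * expected:
            onsets.append(i)                       # transient / onset frame
        expected = energy if expected is None else alpha * expected + (1 - alpha) * energy
    return onsets

# Example: impulses embedded in low-level noise show up as onset frames
sig = 0.01 * np.random.randn(44100)
sig[10000], sig[30000] = 1.0, 1.0
print(detect_onsets(sig))
```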

    Time and frequency domain algorithms for speech coding

    The promise of digital hardware economies (due to recent advances in VLSI technology) has focussed much attention on more complex and sophisticated speech coding algorithms which offer improved quality at relatively low bit rates. This thesis describes the results (obtained from computer simulations) of research into various efficient (time and frequency domain) speech encoders operating at a transmission bit rate of 16 kbps. In the time domain, Adaptive Differential Pulse Code Modulation (ADPCM) systems employing both forward and backward adaptive prediction were examined. A number of algorithms were proposed and evaluated, including several variants of the Stochastic Approximation Predictor (SAP). A Backward Block Adaptive (BBA) predictor was also developed and found to outperform the conventional stochastic methods, even though its complexity in terms of signal processing requirements is lower. A simplified Adaptive Predictive Coder (APC) employing a single-tap pitch predictor, considered next, provided a slight improvement in performance over ADPCM, but with rather greater complexity. The ultimate test of any speech coding system is the perceptual performance of the received speech. Recent research has indicated that this may be enhanced by suitable control of the noise spectrum according to the theory of auditory masking. Various noise shaping ADPCM configurations were examined, and it was demonstrated that a proposed pre-/post-filtering arrangement, which exploits the predictor-quantizer interaction to advantage, leads to the best subjective performance in both forward and backward prediction systems. Adaptive quantization is instrumental to the performance of ADPCM systems. Both the forward adaptive quantizer (AQF) and the backward one-word memory adaptation (AQJ) were examined. In addition, a novel method of decreasing quantization noise in ADPCM-AQJ coders, which involves the application of correction to the decoded speech samples, provided reduced output noise across the spectrum, with considerable high-frequency noise suppression. More powerful (and inevitably more complex) frequency domain speech coders such as the Adaptive Transform Coder (ATC) and the Sub-band Coder (SBC) offer good quality speech at 16 kbps. To reduce complexity and coding delay, whilst retaining the advantage of sub-band coding, a novel transform-based split-band coder (TSBC) was developed and found to compare closely in performance with the SBC. To prevent the heavy side information requirement associated with a large number of bands in split-band coding schemes from impairing coding accuracy, without forgoing the efficiency provided by adaptive bit allocation, a method employing AQJs to code the sub-band signals together with vector quantization of the bit allocation patterns was also proposed. Finally, 'pipeline' methods of bit allocation and step size estimation (using the Fast Fourier Transform (FFT) on the input signal) were examined. Such methods, although less accurate, are nevertheless useful in limiting the coding delay associated with SBC schemes employing Quadrature Mirror Filters (QMF).
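
    To make the ADPCM idea concrete, the sketch below runs a backward-adaptive ADPCM loop with a one-word-memory (Jayant-style) step-size adaptation, which is the kind of scheme the AQJ label refers to, and a simple fixed first-order predictor. The thesis's coders are far more elaborate; the quantiser levels, step-size multipliers and predictor coefficient here are illustrative assumptions.

```python
# Toy backward-adaptive ADPCM loop with Jayant-style step-size adaptation.
import numpy as np

def adpcm_encode_decode(x, mult=(0.9, 0.9, 1.25, 1.75), a1=0.9):
    step, pred = 0.02, 0.0
    decoded = np.zeros_like(x)
    for n, sample in enumerate(x):
        e = sample - pred                                      # prediction error
        code = int(np.clip(np.floor(abs(e) / step), 0, 3))     # coarse 4-level magnitude index
        eq = np.sign(e) * (code + 0.5) * step                  # dequantised error
        decoded[n] = pred + eq                                 # local reconstruction
        pred = a1 * decoded[n]                                 # backward first-order prediction
        step = float(np.clip(step * mult[code], 1e-4, 1.0))    # one-word-memory adaptation
    return decoded

x = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)
y = adpcm_encode_decode(x)
print("SNR (dB):", 10 * np.log10(np.sum(x ** 2) / np.sum((x - y) ** 2)))
```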

    Multiple Track Performance of a Digital Magnetic Tape System : Experimental Study and Simulation using Parallel Processing Techniques

    The primary aim of the magnetic recording industry is to increase storage capacities and transfer rates whilst maintaining or reducing costs. In multiple-track tape systems, as recorded track dimensions decrease, higher precision tape transport mechanisms and dedicated coding circuitry are required. This leads to increased manufacturing costs and a loss of flexibility. This thesis reports on the performance of a low-precision, low-cost multiple-track tape transport system. Software-based techniques to study system performance, and to compensate for the mechanical deficiencies of this system, were developed using occam and the transputer. The inherent parallelism of the multiple-track format was exploited by integrating a transputer into the recording channel to perform the signal processing tasks. An innovative model of the recording channel, written exclusively in occam, was developed. The effect of parameters such as data rate, track dimensions and head misregistration on system performance was determined from the detailed error profile produced. This model may be run on a network of transputers, allowing its speed of execution to be scaled to suit the investigation. These features, combined with its modular flexibility, make it a powerful tool that may be applied to other multiple-track systems, such as digital HDTV. A greater understanding of the effects of mechanical deficiencies on the performance of multiple-track systems was gained from this study. This led to the development of a software-based compensation scheme to reduce the effects of Lateral Head Displacement and allow low-cost tape transport mechanisms to be used with narrow, closely spaced tracks, facilitating higher packing densities. The experimental and simulated investigation of system performance, and the development of the model and compensation scheme using parallel processing techniques, have led to the publication of a paper; two further publications are expected. Thorn EMI, Central Research Laboratories, Hayes, Middlesex.
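
    The toy sketch below is not the thesis's occam/transputer channel model; it only illustrates, under crude made-up assumptions, how lateral head displacement can be turned into a per-track readback amplitude loss and an error count, which is the kind of error profile such a model produces.

```python
# Toy multiple-track readback model: displacement -> track overlap -> errors.
# All dimensions, noise levels and thresholds are invented for illustration.
import numpy as np

def track_overlap(displacement_um, track_width_um=100.0):
    """Fraction of the head still over its own track after displacement."""
    return max(0.0, 1.0 - abs(displacement_um) / track_width_um)

def simulate_track(bits, displacement_um, noise_std=0.2, rng=None):
    rng = rng or np.random.default_rng()
    amplitude = track_overlap(displacement_um)
    readback = amplitude * (2 * bits - 1) + noise_std * rng.standard_normal(len(bits))
    detected = (readback > 0).astype(int)
    return np.count_nonzero(detected != bits)        # bit errors on this track

rng = np.random.default_rng(1)
data = rng.integers(0, 2, size=(8, 10000))           # 8 parallel tracks
for track, bits in enumerate(data):
    print(track, simulate_track(bits, displacement_um=30.0 * track / 7, rng=rng))
```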

    The assessment and development of methods in (spatial) sound ecology

    As vital ecosystems across the globe come under uncharted pressure from climate change and industrial land use, understanding the processes driving ecosystem viability has never been more critical. Nuanced ecosystem understanding comes from well-collected field data and a wealth of associated interpretations. In recent years the most popular methods of ecosystem monitoring have shifted from often damaging and labour-intensive manual data collection to automated methods of data collection and analysis. Sound ecology describes the school of research that uses information transmitted through sound to infer properties about an area's species, biodiversity, and health. In this thesis, we explore and develop state-of-the-art automated monitoring with sound, specifically relating to data storage practice and spatial acoustic recording and data analysis. In the first chapter, we explore the necessity and methods of ecosystem monitoring, focusing on acoustic monitoring, later exploring how and why sound is recorded and the current state of the art in acoustic monitoring. Chapter one concludes with us setting out the aims and overall content of the following chapters. We begin the second chapter by exploring methods used to mitigate data storage expense, a widespread issue as automated methods quickly amass vast amounts of data which can be expensive and impractical to manage. Importantly, I explain how these data management practices are often used without known consequence, something I then address. Specifically, I present evidence that the most used data reduction methods (namely compression and temporal subsetting) have a surprisingly small impact on the information content of recorded sound compared to the method of analysis. This work also adds to the increasing evidence that deep learning-based methods of environmental sound quantification are more powerful and robust to experimental variation than more traditional acoustic indices. In the latter chapters, I focus on using multichannel acoustic recording for sound-source localisation. Knowing where a sound originated has a range of ecological uses, including counting individuals, locating threats, and monitoring habitat use. While an exciting application of acoustic technology, spatial acoustics has had minimal uptake owing to the expense, impracticality and inaccessibility of equipment. In my third chapter, I introduce MAARU (Multichannel Acoustic Autonomous Recording Unit), a low-cost, easy-to-use and accessible solution to this problem. I explain the software and hardware necessary for spatial recording and show how MAARU can be used to localise the direction of a sound to within ±10°. In the fourth chapter, I explore how MAARU devices deployed in the field can be used for enhanced ecosystem monitoring by spatially clustering individuals by calling direction, giving more accurate abundance approximations and crude species-specific habitat usage monitoring. Most literature on spatial acoustics cites the need for many accurately synced recording devices over an area; this chapter provides the first evidence of advances made with just one recorder. Finally, I conclude this thesis by restating my aims and discussing my success in achieving them. Specifically, in the thesis’ conclusion, I reiterate the contributions made to the field as a direct result of this work and outline some possible development avenues.
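
    One standard building block for the kind of direction estimation mentioned above is GCC-PHAT time-delay estimation between a microphone pair. MAARU's actual array geometry and localisation pipeline are not reproduced here; the mic spacing, sample rate and test signal below are assumptions.

```python
# Sketch: bearing of a far-field source from the GCC-PHAT delay between two mics.
import numpy as np

def gcc_phat_delay(sig, ref, fs):
    """Time delay (s) of sig relative to ref via phase-transform cross-correlation."""
    n = 2 * len(sig)
    X = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    cc = np.fft.irfft(X / (np.abs(X) + 1e-12), n)
    cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))      # centre zero lag
    return (np.argmax(np.abs(cc)) - n // 2) / fs

def direction_of_arrival(delay_s, mic_spacing_m=0.2, c=343.0):
    """Bearing (degrees) for a two-microphone pair, far-field assumption."""
    return float(np.degrees(np.arcsin(np.clip(delay_s * c / mic_spacing_m, -1, 1))))

# Example: a source delayed by 5 samples between the two channels
fs = 48000
ref = np.random.randn(48000)
sig = np.roll(ref, 5)
print(direction_of_arrival(gcc_phat_delay(sig, ref, fs)))   # about 10.3 degrees
```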

    Separation of musical sources and structure from single-channel polyphonic recordings

    EThOS - Electronic Theses Online Service, United Kingdom.

    Directional edge and texture representations for image processing

    An efficient representation for natural images is of fundamental importance in image processing and analysis. The commonly used separable transforms such as wavelets are not best suited for images due to their inability to exploit directional regularities such as edges and oriented textural patterns, while most of the recently proposed directional schemes cannot represent these two types of features in a unified transform. This thesis focuses on the development of directional representations for images which can capture both edges and textures in a multiresolution manner. The thesis first considers the problem of extracting linear features with the multiresolution Fourier transform (MFT). Based on a previous MFT-based linear feature model, the work extends the extraction method to the situation where the image is corrupted by noise. The problem is tackled by the combination of a "Signal+Noise" frequency model, a refinement stage and a robust classification scheme. As a result, the MFT is able to perform linear feature analysis on noisy images on which previous methods failed. A new set of transforms called the multiscale polar cosine transforms (MPCT) are also proposed in order to represent textures. The MPCT can be regarded as a real-valued MFT with similar basis functions of oriented sinusoids. It is shown that the transform can represent textural patches more efficiently than the conventional Fourier basis. With a directional best cosine basis, the MPCT packet (MPCPT) is shown to be an efficient representation for edges and textures, despite its high computational burden. The problem of representing edges and textures in a fixed transform with less complexity is then considered. This is achieved by applying a Gaussian frequency filter, which matches the dispersion of the magnitude spectrum, to the local MFT coefficients. This is particularly effective in denoising natural images, due to its ability to preserve both types of feature. Further improvements can be made by employing the information given by the linear feature extraction process in the filter's configuration. The denoising results compare favourably against other state-of-the-art directional representations.
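
    The sketch below gestures at the directional Gaussian frequency filter idea: estimate the orientation and spread (dispersion) of a local patch's magnitude spectrum from its second moments, then weight the coefficients with an anisotropic Gaussian aligned to that orientation. The patch size is ad hoc and a plain FFT patch stands in for the local MFT coefficients used in the thesis.

```python
# Sketch: anisotropic Gaussian frequency weighting matched to the spectral
# dispersion of a local image patch (plain FFT used as a stand-in for the MFT).
import numpy as np

def directional_gaussian_filter(patch):
    F = np.fft.fftshift(np.fft.fft2(patch))
    mag2 = np.abs(F) ** 2
    n = patch.shape[0]
    u, v = np.meshgrid(np.arange(n) - n // 2, np.arange(n) - n // 2, indexing="ij")
    w = mag2 / mag2.sum()
    # Second-moment (covariance) matrix of spectral energy -> orientation + spread
    cov = np.array([[np.sum(w * u * u), np.sum(w * u * v)],
                    [np.sum(w * u * v), np.sum(w * v * v)]])
    eigvals, eigvecs = np.linalg.eigh(cov)
    coords = np.stack([u, v], axis=-1) @ eigvecs          # spectral principal axes
    sigma = np.sqrt(np.maximum(eigvals, 1e-6))
    gauss = np.exp(-0.5 * ((coords[..., 0] / sigma[0]) ** 2 +
                           (coords[..., 1] / sigma[1]) ** 2))
    return np.fft.ifft2(np.fft.ifftshift(F * gauss)).real

patch = np.random.randn(32, 32)
print(directional_gaussian_filter(patch).shape)
```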

    Relative-fuzzy: a novel approach for handling complex ambiguity for software engineering of data mining models

    There are two main defined classes of uncertainty, namely fuzziness and ambiguity, where ambiguity is a ‘one-to-many’ relationship between the syntax and semantics of a proposition. This definition appears to ignore the ‘many-to-many’ relationship type of ambiguity. In this thesis, we use the term complex uncertainty for this many-to-many relationship type of ambiguity. This research proposes a new approach for handling the complex ambiguity type of uncertainty that may exist in data, for the software engineering of predictive Data Mining (DM) classification models. The proposed approach is based on Relative-Fuzzy Logic (RFL), a novel type of fuzzy logic. RFL gives a new formulation of the ambiguity type of uncertainty in terms of States Of Proposition (SOP). RFL describes its membership (semantic) value by using the new definition of Domain of Proposition (DOP), which is based on the relativity principle as defined by possible-worlds logic. To achieve the goal of proposing RFL, a question needs to be answered: how can these two approaches, i.e. fuzzy logic and possible-worlds logic, be combined to produce a new membership value set (and later a logic) that is able to handle fuzziness and multiple viewpoints at the same time? Achieving this goal comes from giving possible-worlds logic the ability to quantify multiple viewpoints, to model fuzziness within each of these viewpoints, and to express the result as a new set of membership values. Furthermore, a new architecture of Hierarchical Neural Network (HNN) called ML/RFL-Based Net has been developed in this research, along with a new learning algorithm and a new recalling algorithm. The architecture, learning algorithm and recalling algorithm of ML/RFL-Based Net follow the principles of RFL. This new type of HNN is considered to be an RFL computation machine. The ability of the Relative-Fuzzy-based DM prediction model to tackle the complex ambiguity type of uncertainty has been tested. Special-purpose Integrated Development Environment (IDE) software called RFL4ASR, which generates a DM prediction model for speech recognition, has also been developed in this research. This special-purpose IDE is an extension of the definition of the traditional IDE. Using multiple sets of TIMIT speech data, the ML/RFL-Based Net prediction model achieves a classification accuracy of 69.2308%. This accuracy is higher than the best achieved by WEKA data mining machines given the same speech data.
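
    The toy below only illustrates the general notion of keeping a separate fuzzy membership value per "possible world" (viewpoint) rather than a single scalar; it is not the thesis's Relative-Fuzzy Logic formalism, and the aggregation rule shown is an arbitrary placeholder.

```python
# Loose illustration: per-viewpoint fuzzy membership of a proposition.
from dataclasses import dataclass

@dataclass
class RelativeMembership:
    # viewpoint name -> membership degree of the proposition in that viewpoint
    by_world: dict

    def supporting_worlds(self, alpha=0.5):
        """Viewpoints in which the proposition holds to at least degree alpha."""
        return {w for w, mu in self.by_world.items() if mu >= alpha}

    def overall(self):
        """One possible aggregate: the mean degree across viewpoints."""
        return sum(self.by_world.values()) / len(self.by_world)

tall = RelativeMembership({"speaker_A": 0.8, "speaker_B": 0.4, "speaker_C": 0.6})
print(tall.supporting_worlds(0.5), round(tall.overall(), 2))
```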