Computer Models for Musical Instrument Identification
PhD
A particular aspect of sound perception concerns what is commonly
termed texture or timbre. From a perceptual perspective, timbre is what allows us
to distinguish sounds that have similar pitch and loudness. Indeed, most people are
able to tell a piano tone from a violin tone, or to distinguish different voices
or singers.
This thesis deals with timbre modelling. Specifically, the formant theory of timbre
is the main theme throughout. This theory states that acoustic musical instrument
sounds can be characterised by their formant structures. Following this principle, the
central point of our approach is to propose a computer implementation for building
musical instrument identification and classification systems.
Although the main thrust of this thesis is to propose a coherent and unified
approach to the musical instrument identification problem, it is oriented towards the
development of algorithms that can be used in Music Information Retrieval (MIR)
frameworks. Drawing on research in speech processing, a complete supervised system
taking into account both physical and perceptual aspects of timbre is described.
The approach is composed of three distinct processing layers. First, parametric models
that allow us to represent signals through mid-level physical and perceptual representations
are considered. Next, the use of Line Spectrum Frequencies (LSFs) as spectral
envelope and formant descriptors is emphasised. Finally, the use of generative and
discriminative techniques for building instrument and database models is investigated.
Our system is evaluated under realistic recording conditions using databases of isolated
notes and melodic phrases.
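The LSF step can be sketched as follows. This is a minimal illustration, assuming that each analysis frame is first described by an all-pole (LPC) model estimated with the autocorrelation method and the Levinson-Durbin recursion, and that the LSFs are then taken as the angles of the unit-circle roots of the symmetric and antisymmetric polynomials P(z) and Q(z). The function names `lpc` and `lpc_to_lsf`, the model order and the use of NumPy are illustrative choices, not details taken from the thesis.

```python
import numpy as np


def lpc(frame, order):
    """Return [1, a1, ..., ap] for the prediction-error filter A(z).

    Autocorrelation method solved with the Levinson-Durbin recursion.
    """
    x = np.asarray(frame, dtype=float)
    n = len(x)
    # Biased autocorrelation r[0..order]
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        # Update a_1..a_{i-1} from the previous stage, then set a_i = k
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_to_lsf(a, eps=1e-6):
    """Line spectral frequencies (radians in (0, pi)) of A(z).

    P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z);
    for a minimum-phase A(z) their roots lie on the unit circle and
    interlace, and their angles are the LSFs.
    """
    a_ext = np.concatenate([np.asarray(a, dtype=float), [0.0]])
    p_poly = a_ext + a_ext[::-1]
    q_poly = a_ext - a_ext[::-1]
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    # Keep one root of each conjugate pair; drop the trivial roots at z = +1 and z = -1
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])
```

For a minimum-phase A(z) the roots of P(z) and Q(z) interlace on the unit circle, so the sorted angles form a compact, ordered descriptor of the spectral envelope; closely spaced LSF pairs tend to lie near spectral peaks, which is what makes them usable as formant cues.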
Image coding employing vector quantisation
The work described in this thesis is concerned with the coding of digitised images employing vector quantisation (VQ). A new VQ-based coding system, named Directional Classified Gain-Shape Vector Quantisation (DCGSVQ), has been developed. It combines vector quantisation with transform coding techniques and exploits various properties of the human visual system (HVS), such as frequency sensitivity, the masking effect, and orientation sensitivity, to produce reconstructed images with good subjective quality at low bit rates (0.48 bit per pixel).
A content classifier, operating in the spatial domain, is employed to classify each 8×8 pixel image block into one of several classes representing various image patterns (edges in various directions, monotone areas, complex texture, etc.). A classified gain-shape vector quantiser is then employed in the cosine domain to encode vectors of AC transform coefficients, while either a scalar quantiser or a gain-shape vector quantiser encodes the DC coefficients. A new vector configuration strategy for defining AC vectors in the cosine domain has been proposed to better adapt the system to the local statistics of the image blocks. In this strategy, the AC coefficients are first weighted by an equivalent modulation transfer function (MTF) that represents the filtering characteristics of the HVS, and are then grouped into directional vectors according to their direction in the cosine domain. An optional, simple method for feature enhancement, based on inherent properties of the proposed strategy, has also been proposed, enabling further image processing at the receiver.
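A gain-shape quantiser splits each vector into a scalar gain and a unit-norm shape. The sketch below assumes the common sequential search: pick the shape codeword with the largest inner product with the input (which minimises the squared error for a non-negative, unquantised gain), then scalar-quantise that projection as the gain. In a classified scheme such as the one described, a separate shape codebook would be selected per block class; the function names and the toy codebooks are hypothetical, not the DCGSVQ codebooks.

```python
import math


def normalise(v):
    """Scale v to unit Euclidean norm (used to build shape codewords)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def gain_shape_encode(x, shape_codebook, gain_levels):
    """Return (shape_index, gain_index) for the input vector x.

    Shape search: maximise <x, s> over unit-norm codewords s.
    Gain search: nearest gain level to that projection.
    """
    best_i, best_proj = 0, -math.inf
    for i, s in enumerate(shape_codebook):
        proj = sum(a * b for a, b in zip(x, s))
        if proj > best_proj:
            best_i, best_proj = i, proj
    gain_i = min(range(len(gain_levels)),
                 key=lambda j: abs(gain_levels[j] - best_proj))
    return best_i, gain_i


def gain_shape_decode(shape_i, gain_i, shape_codebook, gain_levels):
    """Reconstruct the vector as gain * shape."""
    g = gain_levels[gain_i]
    return [g * c for c in shape_codebook[shape_i]]
```

Only the two indices are transmitted; a joint search over all (shape, gain) pairs would be optimal for quantised gains but costs more, which is why the sequential search is the usual choice.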
A new algorithm for designing the various DCGSVQ codebooks has been developed in two steps. First, a new general-purpose algorithm for classified VQ (CVQ) codebook design has been developed as an alternative to empirical methods proposed in the literature. The new algorithm provides a simple and systematic method for codebook design and considerably reduces the total number of mathematical operations required. We have named this new algorithm Classified Nearest Neighbour Clustering (CNNC). A fast search algorithm has also been developed to further reduce the computational effort of codebook design.
Secondly, a new optimisation criterion, more suitable for shape codebook design, has been developed and employed within the CNNC algorithm to design classified shape codebooks for the DCGSVQ. We have named this algorithm modified CNNC. The new algorithm designs the various shape codebooks simultaneously, giving the designer full freedom to assign more importance to certain classes of vectors or to certain training vectors. The DCGSVQ system has been shown to outperform full-search VQ, CVQ, and transform coding CVQ (TC-CVQ), producing visually better coded images with higher signal-to-noise ratio (SNR) figures at various bit rates.
To further improve the perceived quality of coded images, a new postprocessing algorithm that can be applied at the decoder without increasing the bit rate has been developed. The proposed algorithm is based on various characteristics of the signal spectrum and the noise spectrum, and exploits various properties of the HVS. It is a general-purpose algorithm that can be applied to block-coded images produced by various systems such as VQ, transform coding (TC), and Block Truncation Coding (BTC). The algorithm is modular and can be applied adaptively depending on the quality of the block-coded image.
The last theme of this work has been the identification of useful fidelity criteria for image quality assessment. Quality predictors in the form of subjectively weighted error measures were sought such that a smooth functional relationship exists between them and quality ratings made by human viewers. Quality predictors that incorporate simplified models of the HVS have been proposed and tested on a large set of VQ-coded images. Two such predictors have been shown to be better suited for image quality assessment than the commonly used mean square error (MSE) measure.
Fractal image compression and the self-affinity assumption: a stochastic signal modelling perspective
Bibliography: p. 208-225.
Fractal image compression is a comparatively new technique which has gained considerable attention in the popular technical press, and more recently in the research literature. The most significant advantages claimed are high reconstruction quality at low coding rates, rapid decoding, and "resolution independence" in the sense that an encoded image may be decoded at a higher resolution than the original. While many of the claims published in the popular technical press are clearly extravagant, the rapidly growing body of published research suggests that fractal image compression is capable of performance comparable with that of other techniques enjoying the benefit of a considerably more robust theoretical foundation. So called because of the similarities between the form of image representation and a mechanism widely used in generating deterministic fractal images, fractal compression represents an image by the parameters of a set of affine transforms on image blocks under which the image is approximately invariant. Although the conditions imposed on these transforms may be shown to be sufficient to guarantee that an approximation of the original image can be reconstructed, there is no obvious theoretical reason to expect this to be an efficient representation for image coding purposes. The usual analogy with vector quantisation, in which each image is considered to be represented in terms of code vectors extracted from the image itself, is instructive, but transforms the fundamental problem into one of understanding why this construction results in an efficient codebook. The signal property required for such a codebook to be effective, termed "self-affinity", is poorly understood. An examination of this property based on stochastic signal models is the primary contribution of this dissertation.
The most significant findings (subject to some important restrictions) are that "self-affinity" is not a natural consequence of common statistical assumptions but requires particular conditions which are inadequately characterised by second order statistics, and that "natural" images are only marginally "self-affine", to the extent that fractal image compression is effective, but not more so than comparable standard vector quantisation techniques.
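The block-matching step at the heart of fractal encoding can be illustrated with a minimal sketch. It assumes the common formulation in which a range block r is approximated by a spatially contracted domain block d through a grey-level affine map s*d + o, with the contrast s and offset o fitted by least squares and |s| clamped below 1 to keep the map contractive. The 2×2 averaging, the clamp value `s_max=0.9` and the function names are illustrative assumptions; practical encoders also search the eight block isometries, which are omitted here.

```python
def downsample(block):
    """Average 2x2 pixel groups so a 2N x 2N domain block matches an N x N range block."""
    n = len(block) // 2
    return [[(block[2 * i][2 * j] + block[2 * i][2 * j + 1]
              + block[2 * i + 1][2 * j] + block[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(n)] for i in range(n)]


def fit_affine(domain, range_block, s_max=0.9):
    """Least-squares fit of range ~= s * domain + o; returns (s, o, squared error)."""
    d = [p for row in domain for p in row]
    r = [p for row in range_block for p in row]
    n = len(d)
    sd, sr = sum(d), sum(r)
    sdd = sum(x * x for x in d)
    sdr = sum(x * y for x, y in zip(d, r))
    denom = n * sdd - sd * sd
    s = 0.0 if denom == 0 else (n * sdr - sd * sr) / denom
    s = max(-s_max, min(s_max, s))  # keep the grey-level map contractive
    o = (sr - s * sd) / n           # optimal offset for the (possibly clamped) s
    err = sum((s * x + o - y) ** 2 for x, y in zip(d, r))
    return s, o, err


def best_domain(range_block, domains):
    """Return (index, s, o, err) of the domain block that best matches the range block."""
    return min(((i,) + fit_affine(downsample(D), range_block)
                for i, D in enumerate(domains)), key=lambda t: t[3])
```

The encoded image is then just the list of (domain index, s, o) triples per range block, which is the sense in which the image acts as its own codebook in the VQ analogy above.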
Novel transmission and beamforming strategies for multiuser MIMO with various CSIT types
In multiuser multi-antenna wireless systems, the transmission and beamforming strategies that achieve the sum rate capacity depend critically on the acquisition of perfect Channel State Information at the Transmitter (CSIT).
Accordingly, a high-rate, low-latency feedback link between the receiver and the transmitter is required to keep the latter accurately and instantaneously informed about the CSI.
In realistic wireless systems, however, only imperfect CSIT is achievable due to pilot contamination, estimation error, limited feedback and delay, etc.
As an intermediate solution, this thesis investigates novel transmission strategies suitable for various imperfect CSIT scenarios and the associated beamforming techniques to optimise the rate performance.
First, we consider a two-user Multiple-Input-Single-Output (MISO) Broadcast Channel (BC) under statistical and delayed CSIT.
We mainly focus on linear beamforming and power allocation designs for ergodic sum rate maximisation.
The proposed designs achieve a higher sum rate than conventional designs.
Furthermore, we propose a novel transmission framework which makes better use of statistical and delayed CSIT and smoothly bridges statistical CSIT-based strategies and delayed CSIT-based strategies.
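To make the role of CSIT concrete, the following sketch evaluates the sum rate of a two-user, two-antenna MISO BC, the sum over users k of log2(1 + SINR_k), for a textbook zero-forcing (ZF) beamformer computed from a channel estimate. It is a generic baseline, not one of the designs proposed in the thesis: with perfect CSIT the ZF beamformers null the inter-user interference exactly, whereas with an imperfect estimate residual interference remains. The equal power split, unit noise variance and function names are assumptions made for illustration.

```python
import math


def inner(h, w):
    """h^H w for complex vectors stored as lists."""
    return sum(a.conjugate() * b for a, b in zip(h, w))


def sum_rate(H, W, noise=1.0):
    """Sum of log2(1 + SINR_k) when user k has channel H[k] and beamformer W[k]."""
    total = 0.0
    for k, h in enumerate(H):
        signal = abs(inner(h, W[k])) ** 2
        interference = sum(abs(inner(h, w)) ** 2 for j, w in enumerate(W) if j != k)
        total += math.log2(1.0 + signal / (interference + noise))
    return total


def zf_beamformers(H_est, power):
    """Zero-forcing for 2 users and 2 transmit antennas, equal power per user.

    The rows of G are h_k^H; the columns of G^-1 null the other user's channel.
    """
    h1, h2 = H_est
    a, b = h1[0].conjugate(), h1[1].conjugate()
    c, d = h2[0].conjugate(), h2[1].conjugate()
    # Columns of G^-1 up to the common factor 1/det(G), which normalisation removes
    W = [[d, -c], [-b, a]]
    scaled = []
    for w in W:
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        scaled.append([x * math.sqrt(power / 2) / norm for x in w])
    return scaled
```

Passing the true channel to `sum_rate` but an estimated one to `zf_beamformers` reproduces, in miniature, the mismatch that imperfect CSIT causes.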
Second, we consider a multiuser massive MIMO system under partial and statistical CSIT.
In order to tackle multiuser interference incurred by partial CSIT, a Rate-Splitting (RS) transmission strategy has been proposed recently.
We generalise the idea of RS to large-scale arrays.
By further exploiting statistical CSIT, we propose a novel framework, Hierarchical-Rate-Splitting, that is particularly suited to massive MIMO systems.
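The basic one-layer rate-splitting idea can be sketched for the multiuser MISO case. Each user's message is split into a common part and a private part; the common parts are jointly encoded into one stream that every user decodes first, treating all private streams as noise, and then removes by successive interference cancellation (SIC) before decoding its own private stream. The common rate is therefore limited by the weakest user. This is the generic scheme the text builds on, not the Hierarchical-Rate-Splitting framework itself; the precoders are assumed to already satisfy the power constraint, and the function names are hypothetical.

```python
import math


def inner(h, w):
    """h^H w for vectors stored as lists (complex or real entries)."""
    return sum(a.conjugate() * b for a, b in zip(h, w))


def rs_sum_rate(H, p_common, P_private, noise=1.0):
    """Sum rate of one-layer rate-splitting with linear precoders.

    H[k] is user k's channel, p_common the common-stream precoder and
    P_private[k] user k's private precoder. Power allocation is assumed
    to be already folded into the precoders.
    """
    common_rates = []
    private_sum = 0.0
    for k, h in enumerate(H):
        gains = [abs(inner(h, p)) ** 2 for p in P_private]
        all_private = sum(gains)
        # Common stream decoded first, with all private streams treated as noise
        common_rates.append(
            math.log2(1.0 + abs(inner(h, p_common)) ** 2 / (all_private + noise)))
        # After SIC removes the common stream, the other private streams remain as noise
        private_sum += math.log2(1.0 + gains[k] / (all_private - gains[k] + noise))
    # The common stream must be decodable by every user
    return min(common_rates) + private_sum
```

Setting the common precoder to zero recovers the sum rate of conventional linear precoding without rate-splitting.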
Third, we consider a multiuser Millimetre Wave (mmWave) system with hybrid analog/digital precoding under statistical and quantised CSIT.
We leverage statistical CSIT to design the digital precoder for interference mitigation, while all feedback overhead is reserved for precise analog beamforming.
For very limited feedback and/or very sparse channels, the proposed precoding scheme yields higher sum rate than the conventional precoding schemes under a fixed total feedback constraint.
Moreover, an RS transmission strategy is introduced to further tackle the multiuser interference, enabling substantial savings in feedback overhead compared with conventional transmission strategies.
Finally, we investigate the downlink hybrid precoding for physical layer multicasting with a limited number of RF chains.
We propose a low complexity algorithm to compute the analog precoder that achieves near-optimal max-min performance.
Moreover, we derive a simple condition under which the hybrid precoding driven by a limited number of RF chains incurs no loss of optimality with respect to the fully digital precoding case.
Digital watermark technology in security applications
With the rising emphasis on security and the growing number of fraud-related crimes
around the world, authorities are looking for new technologies to tighten
identity security. Among many modern electronic technologies, digital
watermarking has unique advantages for enhancing document authenticity.
At their current stage of development, however, digital watermarking technologies
are not as mature as competing technologies for supporting identity authentication
systems. This work presents performance improvements for
two classes of digital watermarking techniques and investigates the issue of
watermark synchronisation.
Optimal performance can be obtained if the spreading sequences are designed
to be orthogonal to the cover vector. In this thesis, two classes of
orthogonalisation methods that generate binary sequences quasi-orthogonal
to the cover vector are presented. One method, "Sorting and Cancelling",
generates sequences that have a high level of orthogonality to the
cover vector. The Hadamard-matrix-based orthogonalisation method,
"Hadamard Matrix Search", is able to realise overlapped embedding, so the
watermarking capacity and image fidelity can be improved compared with using
short watermark sequences. The results are compared with traditional
pseudo-randomly generated binary sequences. Both classes of
orthogonalisation methods show significant advantages.
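The benefit of orthogonality follows from the usual correlation detector: for a cover vector c, a watermark w of length N with ±1 entries and an embedding strength a, the normalised correlation of c + a*w with w equals a + <c, w>/N, so any correlation between w and the cover acts as host interference. The sketch below builds a Sylvester Hadamard matrix and simply picks the row least correlated with the cover. It is a hypothetical, simplified stand-in that illustrates the principle only; it is not the thesis's "Sorting and Cancelling" or "Hadamard Matrix Search" procedure, whose details are not given here.

```python
def sylvester_hadamard(order):
    """Sylvester construction H_2n = [[H, H], [H, -H]]; order must be a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def correlate(a, b):
    return sum(x * y for x, y in zip(a, b))


def pick_least_correlated(cover, H):
    """Hypothetical search: index of the Hadamard row with the smallest |<cover, row>|."""
    return min(range(len(H)), key=lambda i: abs(correlate(cover, H[i])))


def embed(cover, w, strength):
    """Additive spread-spectrum embedding c + a * w."""
    return [c + strength * x for c, x in zip(cover, w)]


def detect(signal, w):
    """Normalised correlation detector: equals a + <c, w> / N for signal = c + a * w."""
    return correlate(signal, w) / len(w)


if __name__ == "__main__":
    H = sylvester_hadamard(8)
    cover = [3, 1, 4, 1, 5, 9, 2, 6]
    best = pick_least_correlated(cover, H)
    print(best, detect(embed(cover, H[best], 2.0), H[best]))  # 3 1.875
    print(detect(embed(cover, H[0], 2.0), H[0]))  # 5.875
```

With the all-ones row the host term <c, w>/N = 31/8 swamps the embedded strength of 2, whereas the selected row reduces it to -1/8.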
Another watermarking method introduced in the thesis is based
on writing-on-dirty-paper theory. The method is presented with biorthogonal
codes, which give the best robustness. The advantages and trade-offs of
using biorthogonal codes with this watermark coding method are analysed
comprehensively. Orthogonal and non-orthogonal codes used in this
watermarking method are also compared. It is found
that fidelity and robustness are conflicting requirements and cannot be
optimised simultaneously.
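A biorthogonal code of length N consists of the N rows of a Hadamard matrix together with their negatives, giving 2N codewords and hence log2(2N) bits per codeword; decoding picks the row with the largest absolute correlation and reads one extra bit from its sign. The sketch below shows only this coding layer, assuming a Sylvester Hadamard matrix; the dirty-paper embedding that the thesis combines it with is not reproduced, and the function names are hypothetical.

```python
def sylvester_hadamard(order):
    """Sylvester construction H_2n = [[H, H], [H, -H]]; order must be a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def biorthogonal_encode(symbol, H):
    """Map a symbol in [0, 2N) to +H[symbol] or -H[symbol - N]."""
    n = len(H)
    if not 0 <= symbol < 2 * n:
        raise ValueError("symbol out of range")
    row = H[symbol % n]
    return list(row) if symbol < n else [-x for x in row]


def biorthogonal_decode(received, H):
    """The largest |correlation| selects the row; its sign selects +row or -row."""
    corrs = [sum(r * h for r, h in zip(received, row)) for row in H]
    best = max(range(len(H)), key=lambda j: abs(corrs[j]))
    return best if corrs[best] >= 0 else best + len(H)
```

Because the rows are mutually orthogonal, noise must shift a correlation by about N before a wrong row can win, which is the source of the robustness mentioned above.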
Comparisons are also made between all the proposed methods. The comparisons
focus on three major performance criteria: fidelity, capacity and
robustness. From two different viewpoints, the conclusions differ. From a
fidelity-centric viewpoint, the dirty-paper coding method using biorthogonal
codes has a very strong advantage in preserving image fidelity, and its advantage
in capacity is also significant. However, from the power-ratio
point of view, the orthogonalisation methods demonstrate a significant
advantage in capacity and robustness. The conclusions are contradictory,
but together they summarise the performance resulting from different design
considerations.
Watermark synchronisation is first provided by high-contrast
frames around the watermarked image. Edge detection filters are used
to detect the high-contrast borders of the captured image. By scanning
the pixels from the border to the centre, the locations of detected edges
are stored. An optimal linear regression algorithm is then used to estimate the
watermarked image frames. The rotation angle is obtained from the slope of the
estimated regression line through the rotated frame. The scaling is corrected by
re-sampling the upright image to the original size. A theoretically studied
method that is able to synchronise the captured image to sub-pixel accuracy
is also presented. By using invariant transforms and the "symmetric
phase only matched filter", the captured image can be corrected accurately
to its original geometric size. The method uses repeated watermarks to form an
array in the spatial domain of the watermarked image; the locations of the array
elements reveal the rotation, translation and scaling through two
filtering processes.
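The frame-based rotation estimate can be sketched as follows, assuming a binary edge map has already been produced by an edge detection filter. Each column is scanned from the top border towards the centre, the first edge pixel is recorded, an ordinary least-squares line is fitted through the recorded points, and the rotation angle is taken as the arctangent of the fitted slope. Ordinary least squares is assumed here for the "optimal linear regression"; the single-border scan and the function names are illustrative only.

```python
import math


def top_edge_points(edge_map):
    """Scan each column from the top border towards the centre; keep the first edge pixel."""
    xs, ys = [], []
    rows = len(edge_map)
    for x in range(len(edge_map[0])):
        for y in range(rows // 2):
            if edge_map[y][x]:
                xs.append(x)
                ys.append(y)
                break
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rotation_angle_deg(xs, ys):
    """Rotation of the frame edge, in degrees, from the fitted slope."""
    slope, _ = fit_line(xs, ys)
    return math.degrees(math.atan(slope))
```

Scale could then be corrected, as the text describes, by resampling the de-rotated image so that the detected frame matches the original size.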