Search CORE

156 research outputs found

Computer Models for Musical Instrument Identification

Author: Chetry Nicolas D.
Publication venue
Publication date: 01/01/2006
Field of study

PhDA particular aspect in the perception of sound is concerned with what is commonly termed as texture or timbre. From a perceptual perspective, timbre is what allows us to distinguish sounds that have similar pitch and loudness. Indeed most people are able to discern a piano tone from a violin tone or able to distinguish different voices or singers. This thesis deals with timbre modelling. Specifically, the formant theory of timbre is the main theme throughout. This theory states that acoustic musical instrument sounds can be characterised by their formant structures. Following this principle, the central point of our approach is to propose a computer implementation for building musical instrument identification and classification systems. Although the main thrust of this thesis is to propose a coherent and unified approach to the musical instrument identification problem, it is oriented towards the development of algorithms that can be used in Music Information Retrieval (MIR) frameworks. Drawing on research in speech processing, a complete supervised system taking into account both physical and perceptual aspects of timbre is described. The approach is composed of three distinct processing layers. Parametric models that allow us to represent signals through mid-level physical and perceptual representations are considered. Next, the use of the Line Spectrum Frequencies as spectral envelope and formant descriptors is emphasised. Finally, the use of generative and discriminative techniques for building instrument and database models is investigated. Our system is evaluated under realistic recording conditions using databases of isolated notes and melodic phrases

Queen Mary Research Online

OpenGrey Repository

Recommended from our members

Image coding employing vector quantisation

Author: Kubrick A. H.
Publication venue
Publication date
Field of study

The work described in this thesis is concerned with the coding of digitised images employing vector quantisation (VQ). A new VQ-based coding system, named Directional Classified Gain-Shape Vector Quantisation (DCGSVQ), has been developed. It combines vector quantisation with transform coding tech-niques and exploits various properties of the human visual system (HVS) like frequency sensitivity, the masking effect, and orientation sensitivity, to produce reconstructed images with good subjective quality at low bit rates (0.48 bit per pixel). A content classifier, operating in the spatial domain, is employed to classify each image block of 8x8 pixels into one of several classes which represent various image patterns (edges in various directions, monotone areas, complex texture, etc.). Then a classified gain-shape vector quantiser is employed in the cosine domain to encode vectors of AC transform coefficients, while using either a scalar quantiser or a gain-shape vector quantiser to encode the DC coefficients. A new vector configuration strategy for defining AC vectors in the cosine domain has been proposed to better adapt the system to the local statistics of the image blocks. Accordingly, the AC coefficients are first weighted by an equivalent modulation transfer function (MTF) that represents the filtering characteristics of the HVS, and then they are grouped into directional vectors according to their direction in the cosine domain. An optional simple method for feature enhancement, based on inherent properties of the proposed strategy, has also been proposed enabling further image processing at the receiver. A new algorithm for designing the various DCGSVQ codebooks has been developed in two steps. First, a general-purpose new algorithm for classified VQ (CVQ) codebook design has been developed as an alternative to empirical methods proposed in the literature. The new algorithm provides a simple and systematic method for codebook design and reduces considerably the total num-ber of mathematical operations during codebook design. We have named this new algorithm Classified Nearest Neighbour Clustering (CNNC). A fast search algorithm has also been developed to reduce further computational efforts during codebook design. Secondly, a new optimisation criterion which is more suitable for shape code-book design has been developed and employed within the CNNC algorithm to design classified shape codebooks for the DCGSVQ. We have named this algo-rithm modified CNNC. The new algorithm designs the various shape codebooks simultaneously giving the designer full freedom to assign more importance to certain classes of vectors or to certain training vectors. The DCGSVQ system has been shown to outperform the full search VQ, the CVQ, and the transform coding CVQ (TC-CVQ) producing nicer coded images with better signal to noise ratio (SNR) figures at various bit rates. To improve further the perceived quality of coded images, a new postpro-cessing algorithm that can be applied at the decoder without increasing the bit rate has been developed. The proposed algorithm is based on various charac-teristics of the signal spectrum and the noise spectrum, and exploits various properties of the HVS. The proposed algorithm is a general-purpose algorithm that can be applied to block-coded images produced by various systems like VQ, transform coding (TC), and Block Truncation Coding (BTC). The algorithm is modular and can be applied in an adaptive way depending on the quality of the block-coded image. The last theme of this work has been the identification of useful fidelity criteria for image quality assessment. Quality predictors in the form of some subjectively weighted error measures were sought such that a smooth functional relationship exists between them and quality ratings made by human viewers. Quality predictors that incorporate simplified models of the HVS have been proposed and tested on a large set of VQ-coded images. Two such predictors have been shown to be better suited for image quality assessment than the commonly used mean square error (MSE) measure

City Research Online

Fractal image compression and the self-affinity assumption : a stochastic signal modelling perspective

Author: Wohlberg Brendt
Publication venue: Department of Electrical Engineering
Publication date: 01/01/1996
Field of study

Bibliography: p. 208-225.Fractal image compression is a comparatively new technique which has gained considerable attention in the popular technical press, and more recently in the research literature. The most significant advantages claimed are high reconstruction quality at low coding rates, rapid decoding, and "resolution independence" in the sense that an encoded image may be decoded at a higher resolution than the original. While many of the claims published in the popular technical press are clearly extravagant, it appears from the rapidly growing body of published research that fractal image compression is capable of performance comparable with that of other techniques enjoying the benefit of a considerably more robust theoretical foundation. . So called because of the similarities between the form of image representation and a mechanism widely used in generating deterministic fractal images, fractal compression represents an image by the parameters of a set of affine transforms on image blocks under which the image is approximately invariant. Although the conditions imposed on these transforms may be shown to be sufficient to guarantee that an approximation of the original image can be reconstructed, there is no obvious theoretical reason to expect this to represent an efficient representation for image coding purposes. The usual analogy with vector quantisation, in which each image is considered to be represented in terms of code vectors extracted from the image itself is instructive, but transforms the fundamental problem into one of understanding why this construction results in an efficient codebook. The signal property required for such a codebook to be effective, termed "self-affinity", is poorly understood. A stochastic signal model based examination of this property is the primary contribution of this dissertation. The most significant findings (subject to some important restrictions} are that "self-affinity" is not a natural consequence of common statistical assumptions but requires particular conditions which are inadequately characterised by second order statistics, and that "natural" images are only marginally "self-affine", to the extent that fractal image compression is effective, but not more so than comparable standard vector quantisation techniques

Cape Town University OpenUCT

Novel transmission and beamforming strategies for multiuser MIMO with various CSIT types

Author: Dai Mingbo
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/12/2016
Field of study

In multiuser multi-antenna wireless systems, the transmission and beamforming strategies that achieve the sum rate capacity depend critically on the acquisition of perfect Channel State Information at the Transmitter (CSIT). Accordingly, a high-rate low-latency feedback link between the receiver and the transmitter is required to keep the latter accurately and instantaneously informed about the CSI. In realistic wireless systems, however, only imperfect CSIT is achievable due to pilot contamination, estimation error, limited feedback and delay, etc. As an intermediate solution, this thesis investigates novel transmission strategies suitable for various imperfect CSIT scenarios and the associated beamforming techniques to optimise the rate performance. First, we consider a two-user Multiple-Input-Single-Output (MISO) Broadcast Channel (BC) under statistical and delayed CSIT. We mainly focus on linear beamforming and power allocation designs for ergodic sum rate maximisation. The proposed designs enable higher sum rate than the conventional designs. Interestingly, we propose a novel transmission framework which makes better use of statistical and delayed CSIT and smoothly bridges between statistical CSIT-based strategies and delayed CSIT-based strategies. Second, we consider a multiuser massive MIMO system under partial and statistical CSIT. In order to tackle multiuser interference incurred by partial CSIT, a Rate-Splitting (RS) transmission strategy has been proposed recently. We generalise the idea of RS into the large-scale array. By further exploiting statistical CSIT, we propose a novel framework Hierarchical-Rate-Splitting that is particularly suited to massive MIMO systems. Third, we consider a multiuser Millimetre Wave (mmWave) system with hybrid analog/digital precoding under statistical and quantised CSIT. We leverage statistical CSIT to design digital precoder for interference mitigation while all feedback overhead is reserved for precise analog beamforming. For very limited feedback and/or very sparse channels, the proposed precoding scheme yields higher sum rate than the conventional precoding schemes under a fixed total feedback constraint. Moreover, a RS transmission strategy is introduced to further tackle the multiuser interference, enabling remarkable saving in feedback overhead compared with conventional transmission strategies. Finally, we investigate the downlink hybrid precoding for physical layer multicasting with a limited number of RF chains. We propose a low complexity algorithm to compute the analog precoder that achieves near-optimal max-min performance. Moreover, we derive a simple condition under which the hybrid precoding driven by a limited number of RF chains incurs no loss of optimality with respect to the fully digital precoding case.Open Acces

Spiral - Imperial College Digital Repository

Image indexing and retrieval in the compressed domain

Author: Armstrong Andrew
Publication venue
Publication date: 01/09/2003
Field of study

University of South Wales Research Explorer

Digital watermark technology in security applications

Author: Xu Xin
Publication venue: 'University of Plymouth'
Publication date: 01/01/2008
Field of study

With the rising emphasis on security and the number of fraud related crimes around the world, authorities are looking for new technologies to tighten security of identity. Among many modern electronic technologies, digital watermarking has unique advantages to enhance the document authenticity. At the current status of the development, digital watermarking technologies are not as matured as other competing technologies to support identity authentication systems. This work presents improvements in performance of two classes of digital watermarking techniques and investigates the issue of watermark synchronisation. Optimal performance can be obtained if the spreading sequences are designed to be orthogonal to the cover vector. In this thesis, two classes of orthogonalisation methods that generate binary sequences quasi-orthogonal to the cover vector are presented. One method, namely "Sorting and Cancelling" generates sequences that have a high level of orthogonality to the cover vector. The Hadamard Matrix based orthogonalisation method, namely "Hadamard Matrix Search" is able to realise overlapped embedding, thus the watermarking capacity and image fidelity can be improved compared to using short watermark sequences. The results are compared with traditional pseudo-randomly generated binary sequences. The advantages of both classes of orthogonalisation inethods are significant. Another watermarking method that is introduced in the thesis is based on writing-on-dirty-paper theory. The method is presented with biorthogonal codes that have the best robustness. The advantage and trade-offs of using biorthogonal codes with this watermark coding methods are analysed comprehensively. The comparisons between orthogonal and non-orthogonal codes that are used in this watermarking method are also made. It is found that fidelity and robustness are contradictory and it is not possible to optimise them simultaneously. Comparisons are also made between all proposed methods. The comparisons are focused on three major performance criteria, fidelity, capacity and robustness. aom two different viewpoints, conclusions are not the same. For fidelity-centric viewpoint, the dirty-paper coding methods using biorthogonal codes has very strong advantage to preserve image fidelity and the advantage of capacity performance is also significant. However, from the power ratio point of view, the orthogonalisation methods demonstrate significant advantage on capacity and robustness. The conclusions are contradictory but together, they summarise the performance generated by different design considerations. The synchronisation of watermark is firstly provided by high contrast frames around the watermarked image. The edge detection filters are used to detect the high contrast borders of the captured image. By scanning the pixels from the border to the centre, the locations of detected edges are stored. The optimal linear regression algorithm is used to estimate the watermarked image frames. Estimation of the regression function provides rotation angle as the slope of the rotated frames. The scaling is corrected by re-sampling the upright image to the original size. A theoretically studied method that is able to synchronise captured image to sub-pixel level accuracy is also presented. By using invariant transforms and the "symmetric phase only matched filter" the captured image can be corrected accurately to original geometric size. The method uses repeating watermarks to form an array in the spatial domain of the watermarked image and the the array that the locations of its elements can reveal information of rotation, translation and scaling with two filtering processes

Plymouth Electronic Archive and Research Library