
    Self-Organising Stochastic Encoders

    The processing of mega-dimensional data, such as images, scales linearly with image size only if fixed-size processing windows are used. It would be very useful to automate the process of sizing and interconnecting the processing windows. A stochastic encoder that extends the standard Linde-Buzo-Gray vector quantiser, called a stochastic vector quantiser (SVQ), includes this required behaviour amongst its emergent properties, because it automatically splits the input space into statistically independent subspaces, which it then separately encodes. Various optimal SVQs have been obtained, both analytically and numerically. Analytic solutions that demonstrate how the input space is split into independent subspaces may be obtained when an SVQ is used to encode data that lives on a 2-torus (e.g. the superposition of a pair of uncorrelated sinusoids). Many numerical solutions have also been obtained, using both SVQs and chains of linked SVQs: (1) images of multiple independent targets (encoders for single targets emerge), (2) images of multiple correlated targets (various types of encoder for single and multiple targets emerge), (3) superpositions of various waveforms (encoders for the separate waveforms emerge; this is a type of independent component analysis (ICA)), (4) maternal and foetal ECGs (another example of ICA), (5) images of textures (orientation maps and dominance stripes emerge). Overall, SVQs exhibit a rich variety of self-organising behaviour, which effectively discovers the internal structure of the training data. This should have an immediate impact on "intelligent" computation, because it reduces the need for expert human intervention in the design of data processing algorithms.
    Comment: 23 pages, 23 figures
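The analytic 2-torus case mentioned above (the superposition of a pair of uncorrelated sinusoids) is easy to reproduce as test data. A minimal generator is sketched below; the frequencies, window length and sample count are illustrative choices, not values taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def torus_samples(n, f1=3.0, f2=7.0):
    """Training vectors living on a 2-torus: each sample is a short window of
    the superposition of two sinusoids with independent, uniformly random
    phases (the two phases are the two torus angles)."""
    t = np.linspace(0.0, 1.0, 32)            # 32-sample window (illustrative)
    phi1 = rng.uniform(0, 2 * np.pi, n)      # independent phase -> 1st torus angle
    phi2 = rng.uniform(0, 2 * np.pi, n)      # independent phase -> 2nd torus angle
    return (np.sin(2 * np.pi * f1 * t + phi1[:, None])
            + np.sin(2 * np.pi * f2 * t + phi2[:, None]))

x = torus_samples(1000)
print(x.shape)  # (1000, 32): 1000 training vectors on the 2-torus
```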

    Stochastic Vector Quantisers

    In this paper a stochastic generalisation of the standard Linde-Buzo-Gray (LBG) approach to vector quantiser (VQ) design is presented. The encoder is implemented as the sampling of a vector of code indices from a probability distribution derived from the input vector, and the decoder is implemented as a superposition of reconstruction vectors; the stochastic VQ is optimised using a minimum mean Euclidean reconstruction distortion criterion, as in the LBG case. Numerical simulations are used to demonstrate how this leads to self-organisation of the stochastic VQ, where different stochastically sampled code indices become associated with different input subspaces. This property may be used to automate the process of splitting high-dimensional input vectors into low-dimensional blocks before encoding them.
    Comment: 22 pages, 12 figures
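A toy model of the encode/decode cycle described above may help fix ideas. The sketch below samples code indices from a softmax distribution over squared distances to the codebook and decodes by superposing (here, averaging) the selected reconstruction vectors. The softmax form, the inverse temperature beta and the number of sampled indices are illustrative assumptions, not the paper's exact parameterisation:

```python
import numpy as np

rng = np.random.default_rng(0)

def svq_round_trip(x, codebook, n_samples=4, beta=5.0):
    """One encode/decode pass of a toy stochastic VQ.
    Encoder: sample a vector of code indices from a probability distribution
    derived from the input (here: softmax of -beta * squared distance).
    Decoder: superpose (average) the corresponding reconstruction vectors."""
    d2 = ((codebook - x) ** 2).sum(axis=1)               # squared distance to each code
    p = np.exp(-beta * d2)
    p /= p.sum()                                         # encoding distribution Pr(index | x)
    idx = rng.choice(len(codebook), size=n_samples, p=p) # stochastically sampled code indices
    return codebook[idx].mean(axis=0)                    # superposed reconstruction

codebook = rng.normal(size=(8, 2))                       # 8 reconstruction vectors in 2-D (toy)
x = np.array([0.5, -0.2])
x_hat = svq_round_trip(x, codebook)
print(np.sum((x - x_hat) ** 2))                          # Euclidean reconstruction distortion
```

Training would adjust the codebook (and the encoding distribution) to minimise the mean of this distortion over the data, as in the LBG case.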

    The Development of Novel Pulse Shape Analysis Algorithms for AGATA

    In the field of Nuclear Physics, the use of large-scale γ-Ray Tracking (GRT) for arrays like the Advanced Gamma Tracking Array (AGATA) is critical in pushing the envelope of our understanding of the complex phenomena that govern our universe. GRT allows AGATA to track γ-rays across crystals within the array, enabling unrivalled Doppler correction and Compton add-back. For GRT to function effectively, the interaction positions and energy depositions of γ-rays within the array must be determined accurately using Pulse Shape Analysis (PSA). Within AGATA, optimisation-based PSA methods are used to localise γ-ray interactions by comparing experimental detector signals against a simulated basis. A simulated basis has been produced for the A005 AGATA detector crystal, which was used to underpin the development and evaluation of novel PSA methods. Machine Learning was also utilised to perform signal discrimination, compression, correction and regression. Graph-accelerated k-Nearest Neighbour (kNN) techniques for PSA were profiled and found to offer significant improvements in execution rate and accuracy. An extensive investigation into the performance of the PSA algorithms with respect to noise level, time-shifting and embedded dimensionality was performed to determine the most effective PSA algorithm for AGATA. By utilising the GPU- and graph-accelerated algorithm Facebook AI Similarity Search (FAISS) on a principal-component-analysis-reduced 100-dimensional embedding, accuracy comparable to the accepted standard was found with an ∼43,000% increase in execution rate. A mathematical framework for the efficient precomputation of the responses of γ-rays that interact multiple times across the crystal (High-Fold) is proposed, which should allow Fold-1 kNN search to be augmented to work on High-Fold events with minimal penalty to execution rate. It has also been demonstrated that FAISS can successfully reconstruct a variety of experimental data acquired with AGATA detector crystals.
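The PCA-then-nearest-neighbour pipeline described above can be modelled in a few lines. The sketch below is a brute-force numpy stand-in, not FAISS itself: it projects a simulated basis onto its leading principal components and matches a query signal by nearest neighbour in the reduced embedding. Basis size, trace length and the 10-component embedding are toy values; the thesis uses a 100-dimensional embedding:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a simulated basis of detector pulse shapes:
# 500 traces, 300 samples each.
basis = rng.normal(size=(500, 300))

# PCA via SVD: keep the top 10 principal components.
centred = basis - basis.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
components = vt[:10]                    # (10, 300) projection matrix
embedded = centred @ components.T       # (500, 10) reduced basis

def nearest_basis_point(signal, k=1):
    """Embed an experimental signal and return the indices of the k nearest
    basis entries (brute force here; FAISS accelerates exactly this search)."""
    q = (signal - basis.mean(axis=0)) @ components.T
    d2 = ((embedded - q) ** 2).sum(axis=1)
    return np.argsort(d2)[:k]

query = basis[42] + 0.01 * rng.normal(size=300)   # noisy copy of basis entry 42
print(nearest_basis_point(query, k=1))            # recovers index 42
```

In the real pipeline the nearest basis entry's known interaction position and energy deposition become the PSA answer for the measured pulse.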

    Advanced signal processing techniques for pitch synchronous sinusoidal speech coders

    Recent trends in commercial and consumer demand have led to the increasing use of multimedia applications in mobile and Internet telephony. Although audio, video and data communications are becoming more prevalent, a major application is, and will remain, the transmission of speech. Speech coding techniques suited to these new trends must be developed, not only to provide high-quality speech communication but also to minimise the bandwidth required for speech, so as to maximise that available for the new audio, video and data services. The majority of current speech coders employed in mobile and Internet applications use a Code Excited Linear Prediction (CELP) model. These coders attempt to reproduce the input speech signal and can produce high-quality synthetic speech at bit rates above 8 kbps. Sinusoidal speech coders tend to dominate at rates below 6 kbps, but due to limitations in the sinusoidal speech coding model, their synthetic speech quality cannot be significantly improved even if their bit rate is increased. Recent developments have seen the emergence and application of Pitch Synchronous (PS) speech coding techniques to these coders in order to remove the limitations of the sinusoidal speech coding model. The aim of the research presented in this thesis is to investigate and eliminate the factors that limit the quality of the synthetic speech produced by PS sinusoidal coders. In order to achieve this, innovative signal processing techniques have been developed. New parameter analysis and quantisation techniques have been produced which overcome many of the problems associated with applying PS techniques to sinusoidal coders. In sinusoidal-based coders, two of the most important elements are the correct formulation of pitch and voicing values from the input speech. The techniques introduced here have greatly improved these calculations, resulting in a higher-quality PS sinusoidal speech coder than was previously available. A new quantisation method which reduces the distortion from quantising speech spectral information has also been developed. When these new techniques are utilised, they effectively raise the synthetic speech quality of sinusoidal coders to a level comparable to that produced by CELP-based schemes, making PS sinusoidal coders a promising alternative at low to medium bit rates.
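For context on the pitch estimation problem the abstract highlights, the textbook autocorrelation estimator (the kind of baseline such coders build on, not the thesis's own algorithm) can be sketched as follows; the sample rate, frame length and pitch search range are illustrative:

```python
import numpy as np

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Textbook autocorrelation pitch estimate: the lag in the plausible
    pitch-period range [fs/fmax, fs/fmin] with the largest autocorrelation
    peak is taken as the pitch period."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag                                                # pitch in Hz

fs = 8000
t = np.arange(int(0.04 * fs)) / fs              # 40 ms analysis frame
frame = np.sin(2 * np.pi * 120.0 * t)           # synthetic 120 Hz "voiced" frame
print(autocorr_pitch(frame, fs))                # close to 120 Hz
```

Pitch-synchronous coders depend on getting this value (and the voicing decision) right per cycle, which is why the thesis devotes new analysis techniques to it.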

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before such applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable, both from a real-time operation and a battery-life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to offloading from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry-save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking and object-based shape encoding, a novel energy-efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.
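The energy saving in binary motion estimation comes from replacing the multi-bit sum of absolute differences with an XOR-and-count on 1-bit pixels. A minimal software model of that matching criterion is sketched below; the block size, search range and frame size are hypothetical, and a hardware realisation would of course pack the bits and use population-count logic rather than numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_block_match(ref, cur, bx, by, bs=8, sr=4):
    """Full-search motion estimation on binary frames: the cost of each
    candidate displacement is the XOR population count (number of differing
    pixels), the 1-bit analogue of the sum of absolute differences."""
    block = cur[by:by + bs, bx:bx + bs]
    best, best_mv = bs * bs + 1, (0, 0)
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            y, x = by + dy, bx + dx
            if 0 <= y and y + bs <= ref.shape[0] and 0 <= x and x + bs <= ref.shape[1]:
                cost = np.count_nonzero(ref[y:y + bs, x:x + bs] ^ block)  # XOR popcount
                if cost < best:
                    best, best_mv = cost, (dx, dy)
    return best_mv, best

ref = rng.integers(0, 2, size=(32, 32), dtype=np.uint8)   # toy binary frames
cur = np.roll(ref, shift=(0, 2), axis=(0, 1))             # content shifted right by 2
mv, cost = binary_block_match(ref, cur, bx=8, by=8)
print(mv, cost)   # motion vector (-2, 0) with zero cost
```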