
    Graph-based Estimation of Information Divergence Functions

    abstract: Information divergence functions, such as the Kullback-Leibler divergence or the Hellinger distance, play a critical role in statistical signal processing and information theory; however, estimating them can be challenging. Most often, parametric assumptions are made about the two distributions to estimate the divergence of interest. In cases where no parametric model fits the data, non-parametric density estimation is used. In statistical signal processing applications, Gaussianity is usually assumed, since closed-form expressions for common divergence measures have been derived for this family of distributions. Parametric assumptions are preferred when it is known that the data follow the model; however, this is rarely the case in real-world scenarios. Non-parametric density estimators are characterized by a very large number of parameters that have to be tuned with costly cross-validation. In this dissertation we focus on a specific family of non-parametric estimators, called direct estimators, that bypass density estimation completely and directly estimate the quantity of interest from the data. We introduce a new divergence measure, the D_p-divergence, that can be estimated directly from samples without parametric assumptions on the distribution. We show that the D_p-divergence bounds the binary, cross-domain, and multi-class Bayes error rates and, in certain cases, provides provably tighter bounds than the Hellinger divergence. In addition, we propose a new methodology that allows the experimenter to construct direct estimators for existing divergence measures or to construct new divergence measures with custom properties that are tailored to the application. To examine the practical efficacy of these new methods, we evaluate them in a statistical learning framework on a series of real-world data science problems involving speech-based monitoring of neuro-motor disorders. Dissertation/Thesis: Doctoral Dissertation, Electrical Engineering, 201
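    The direct, graph-based estimation described above can be sketched as follows. This is a minimal illustration assuming the Friedman-Rafsky-style construction, in which the divergence is estimated from the number of edges of a Euclidean minimal spanning tree over the pooled sample that connect points from the two different samples; the function names and the hand-rolled MST routine are ours, not the dissertation's.

```python
import math

def mst_edges(points):
    # Prim's algorithm on the complete Euclidean graph over the points.
    dist = lambda a, b: math.dist(points[a], points[b])
    n = len(points)
    best = {j: (dist(0, j), 0) for j in range(1, n)}  # j -> (cost, parent)
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        _, i = best.pop(j)
        edges.append((i, j))
        for k in best:
            dk = dist(j, k)
            if dk < best[k][0]:
                best[k] = (dk, j)
    return edges

def dp_divergence(X, Y):
    # Build the MST over the pooled sample and count "cross" edges that
    # join an X point to a Y point; few cross edges means the samples
    # are well separated and the divergence estimate is close to 1.
    pooled = list(X) + list(Y)
    labels = [0] * len(X) + [1] * len(Y)
    C = sum(labels[i] != labels[j] for i, j in mst_edges(pooled))
    m, n = len(X), len(Y)
    return max(0.0, 1.0 - C * (m + n) / (2.0 * m * n))
```

    For two well-separated samples the cross-edge count collapses toward one and the estimate approaches 1; for samples drawn from the same distribution roughly half the MST edges are cross-edges and the estimate falls near 0.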

    Learning and recognition by a dynamical system with a plastic velocity field

    Learning is a mechanism intrinsic to all sentient biological systems. Despite the diverse range of paradigms that exist, it appears that an artificial system has yet to be developed that can emulate learning with a comparable degree of accuracy or efficiency to the human brain. With the development of new approaches comes the opportunity to reduce this disparity in performance. A model presented by Janson and Marsden [arXiv:1107.0674 (2011)] (the memory foam model) redefines the critical features that an intelligent system should demonstrate. Rather than focussing on the topological constraints of a rigid neuron structure, the emphasis is placed on the on-line, unsupervised classification, retention and recognition of stimuli. In contrast to traditional AI approaches, the system's memory is not plagued by spurious attractors or the curse of dimensionality. The ability to continuously learn, whilst simultaneously recognising aspects of a stimulus, ensures that this model more closely embodies the operations occurring in the brain than many other AI approaches. Here we consider the pertinent deficiencies of classical artificial learning models before introducing and developing this memory foam self-shaping system. As this model is relatively new, its limitations are not yet apparent; these must be established by testing the model in various complex environments. Here we consider its ability to learn and recognise the RGB colours composing cartoons as observed via a web-camera. The self-shaping vector field of the system is shown to adjust its composition to reflect the distribution of three-dimensional inputs. The model builds a memory of its experiences and is shown to recognise unfamiliar colours by locating the most appropriate class with which to associate a stimulus. In addition, we discuss a method to map a three-dimensional RGB input onto a line spectrum of colours. The corresponding reduction of the model's dimensions is shown to dramatically improve computational speed; however, the model is then restricted to a much smaller set of representable colours. This model's prototype offers a gradient description of recognition; it is evident that a more complex, non-linear alternative may be used to better characterize the classes of the system. It is postulated that non-linear attractors may be utilized to convey the concept of hierarchy that relates the different classes of the system. We relate the dynamics of the van der Pol oscillator to this plastic self-shaping system, first demonstrating the recognition of stimuli with limit cycle trajectories. The location and frequency of each cycle depend on the topology of the system's energy potential. For a one-dimensional stimulus the dynamics are restricted to the cycle; the extension of the model to an N-dimensional stimulus is approached via the coupling of N oscillators. Here we study systems of up to three mutually coupled oscillators and relate limit cycles, fixed points and quasi-periodic orbits to the recognition of stimuli.
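    The limit-cycle behaviour underlying the recognition dynamics can be illustrated with a short simulation of a single van der Pol oscillator. This is an illustrative sketch only; the integrator, step size and parameter values are our choices, not the thesis's:

```python
def van_der_pol(mu=1.0, x0=0.1, v0=0.0, dt=0.001, steps=60000):
    # Integrate x'' - mu*(1 - x^2)*x' + x = 0 with semi-implicit Euler.
    # Trajectories from almost any initial condition are attracted onto
    # a stable limit cycle whose amplitude is close to 2 for moderate mu.
    x, v = x0, v0
    xs = []
    for _ in range(steps):
        v += (mu * (1.0 - x * x) * v - x) * dt
        x += v * dt
        xs.append(x)
    return xs
```

    Starting from a small perturbation (x0 = 0.1), the trajectory spirals out onto the cycle; in the recognition setting, each such cycle would correspond to a stored stimulus class.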

    Sparse Modeling for Image and Vision Processing

    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection, that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, and computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts. Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision
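    Sparse coding as described, representing a signal with a linear combination of a few dictionary atoms, is commonly solved with iterative shrinkage-thresholding (ISTA) on the l1-penalised least-squares objective. The sketch below is a generic, dependency-free illustration of that idea, not code from the monograph:

```python
def soft_threshold(z, t):
    # Proximal operator of t*|.|: shrink z toward zero by t.
    return z - t if z > t else z + t if z < -t else 0.0

def ista(D, x, lam=0.1, iters=1000):
    # Minimise 0.5*||x - sum_j a_j D[j]||^2 + lam*||a||_1 by iterative
    # shrinkage-thresholding.  D is a list of dictionary atoms; the
    # step size 1/L uses the trace of D^T D as a crude Lipschitz bound.
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    n = len(D)
    L = sum(dot(atom, atom) for atom in D)
    a = [0.0] * n
    for _ in range(iters):
        # residual r = x - D a under the current code a
        recon = [sum(a[j] * D[j][i] for j in range(n)) for i in range(len(x))]
        r = [xi - ri for xi, ri in zip(x, recon)]
        # gradient step on each coefficient, then shrink
        a = [soft_threshold(a[j] + dot(D[j], r) / L, lam / L) for j in range(n)]
    return a
```

    The shrinkage step is what produces sparsity: atoms whose correlation with the residual stays below the threshold are driven exactly to zero rather than merely made small.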

    Estimating the Rate-Distortion Function by Wasserstein Gradient Descent

    In the theory of lossy compression, the rate-distortion (R-D) function R(D) describes how much a data source can be compressed (in bit-rate) at any given level of fidelity (distortion). Obtaining R(D) for a given data source establishes the fundamental performance limit for all compression algorithms. We propose a new method to estimate R(D) from the perspective of optimal transport. Unlike the classic Blahut--Arimoto algorithm, which fixes the support of the reproduction distribution in advance, our Wasserstein gradient descent algorithm learns the support of the optimal reproduction distribution by moving particles. We prove its local convergence and analyze the sample complexity of our R-D estimator based on a connection to entropic optimal transport. Experimentally, we obtain comparable or tighter bounds than state-of-the-art neural network methods on low-rate sources while requiring considerably less tuning and computation effort. We also highlight a connection to maximum-likelihood deconvolution and introduce a new class of sources that can be used as test cases with known solutions to the R-D problem. Comment: Accepted as conference paper at NeurIPS 202
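    For contrast with the particle-based method, the classic Blahut-Arimoto iteration mentioned in the abstract can be sketched for a discrete source with a fixed reproduction support; the function and its parameter names below are illustrative, not from the paper:

```python
import math

def blahut_arimoto(p_x, d, s, iters=200):
    # Trace one point of the R-D curve at slope parameter s > 0.
    # p_x: source distribution; d[x][y]: distortion matrix whose columns
    # are the (fixed) reproduction support -- unlike the particle method,
    # which moves this support during optimisation.
    n, m = len(p_x), len(d[0])
    q = [1.0 / m] * m  # output marginal, initialised uniform
    for _ in range(iters):
        # conditional q(y|x) proportional to q(y) * exp(-s * d(x, y))
        cond = []
        for x in range(n):
            row = [q[y] * math.exp(-s * d[x][y]) for y in range(m)]
            z = sum(row)
            cond.append([r / z for r in row])
        # re-estimate the output marginal
        q = [sum(p_x[x] * cond[x][y] for x in range(n)) for y in range(m)]
    D = sum(p_x[x] * cond[x][y] * d[x][y] for x in range(n) for y in range(m))
    R = sum(p_x[x] * cond[x][y] * math.log2(cond[x][y] / q[y])
            for x in range(n) for y in range(m) if cond[x][y] > 0)
    return D, R
```

    For a uniform binary source under Hamming distortion, the slope s = ln 9 yields D = 0.1 and R = 1 - h(0.1) bits, matching the known curve R(D) = 1 - h(D).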

    Viterbi algorithm in continuous-phase frequency shift keying

    The Viterbi algorithm, an application of dynamic programming, is widely used for estimation and detection problems in digital communications and signal processing. It is used to detect signals in communication channels with memory, and to decode sequential error-control codes that are used to enhance the performance of digital communication systems. The Viterbi algorithm is also used in speech and character recognition tasks where the speech signals or characters are modeled by hidden Markov models. This project explains the basics of the Viterbi algorithm as applied to digital communication systems, and to speech and character recognition. It also focuses on the operations and the practical memory requirements needed to implement the Viterbi algorithm in real time. A forward error correction technique known as convolutional coding with Viterbi decoding was explored. In this project, the basic Viterbi decoder behavioural model was built and simulated. The convolutional encoder, BPSK modulator and AWGN channel were implemented in MATLAB code, and the bit error rate (BER) was measured to evaluate the decoding performance. The theory of the Viterbi algorithm is introduced in the context of convolutional coding, and its application to continuous-phase frequency shift keying (CPFSK) is presented. The performance is analysed and compared with that of the conventional coherent estimator. The main goal of this thesis is to implement an RTL-level model of the Viterbi decoder, comprising the branch metric block, the add-compare-select block, the trace-back block, the decoding block and the next-state block. Together, these components provide a practical understanding of the Viterbi decoding algorithm.
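    The decoder structure described (branch metrics, add-compare-select, traceback) can be sketched in software for the standard rate-1/2, constraint-length-3 convolutional code with octal generators (7, 5). This is a generic hard-decision illustration, not the thesis's RTL model:

```python
def conv_encode(bits):
    # Rate-1/2 convolutional encoder, generators 7 (111) and 5 (101).
    s1 = s2 = 0  # shift register: s1 is the newer past bit
    out = []
    for b in bits + [0, 0]:  # two tail bits flush the register to state 0
        out.append(b ^ s1 ^ s2)  # generator 111
        out.append(b ^ s2)       # generator 101
        s2, s1 = s1, b
    return out

def viterbi_decode(rx, n_msg):
    # Hard-decision Viterbi: Hamming branch metrics, add-compare-select,
    # and (for brevity) explicit survivor paths instead of a traceback RAM.
    INF = 10 ** 9
    pm = [0, INF, INF, INF]          # path metrics, state = (s1 << 1) | s2
    paths = [[] for _ in range(4)]   # survivor input sequences
    for t in range(0, len(rx), 2):
        r0, r1 = rx[t], rx[t + 1]
        new_pm = [INF] * 4
        new_paths = [None] * 4
        for st in range(4):
            if pm[st] >= INF:
                continue
            s1, s2 = st >> 1, st & 1
            for b in (0, 1):
                bm = ((b ^ s1 ^ s2) != r0) + ((b ^ s2) != r1)
                ns = (b << 1) | s1   # next state after shifting in b
                if pm[st] + bm < new_pm[ns]:       # compare-select
                    new_pm[ns] = pm[st] + bm
                    new_paths[ns] = paths[st] + [b]
        pm, paths = new_pm, new_paths
    best = min(range(4), key=lambda st: pm[st])
    return paths[best][:n_msg]  # drop the two tail bits
```

    Because this code's free distance is 5, any single bit error in the received stream is corrected by the decoder.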

    Fast high-dimensional Bayesian classification and clustering

    We introduce a fast approach to classification and clustering applicable to high-dimensional continuous data, based on Bayesian mixture models for which explicit computations are available. This permits us to treat classification and clustering in a single framework, and allows calculation of unobserved class probabilities. The new classifier is robust to the addition of noise variables, owing to the built-in spike-and-slab structure of the proposed Bayesian model. The usefulness of classification using our method is shown on a metabolomic example, and on the Iris data with and without noise variables. Agglomerative hierarchical clustering is used to construct a dendrogram based on the posterior probabilities of particular partitions, providing a dendrogram with a probabilistic interpretation. An extension to variable selection is proposed which summarises the importance of variables for classification or clustering and has a probabilistic interpretation. Having a simple model permits estimation of the model parameters by maximum likelihood and therefore yields a fully automatic algorithm. The new clustering method is applied to metabolomic, microarray, and image data, and is studied using simulated data motivated by real datasets. The computational difficulties of the new approach are discussed, solutions for algorithm acceleration are proposed, and the accompanying computer code is briefly analysed. Simulations show that the quality of the estimated model parameters depends on the parametric distribution assumed for the effects, but after fixing the model parameters to reasonable values, the distribution of the effects influences clustering very little. Simulations also confirm that the clustering algorithm and the proposed variable selection method are reliable even when the model assumptions are violated. The new approach is compared with MCLUST, a popular Bayesian clustering alternative, fitted on the principal components, using two loss functions; our proposed approach is found to be more efficient in almost every situation.
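    The "explicit computations" that make such Bayesian classifiers fast boil down to evaluating class posterior probabilities in closed form. Below is a minimal sketch for a Gaussian model with diagonal covariance; the parameter layout and function names are ours, not the paper's:

```python
import math

def gauss_logpdf(x, mu, var):
    # Log density of a univariate Gaussian N(mu, var).
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def class_posteriors(x, classes):
    # classes: list of (prior, [(mu, var) per dimension]).  With a
    # diagonal covariance the per-dimension log densities simply add up.
    logs = []
    for prior, params in classes:
        lp = math.log(prior) + sum(gauss_logpdf(xi, mu, var)
                                   for xi, (mu, var) in zip(x, params))
        logs.append(lp)
    # normalise in log space for numerical stability
    mx = max(logs)
    w = [math.exp(l - mx) for l in logs]
    z = sum(w)
    return [wi / z for wi in w]
```

    Working in log space and subtracting the maximum before exponentiating keeps the computation stable in high dimensions, where the summed log-densities become large negative numbers.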