37 research outputs found

    Optimization of Alpha-Beta Log-Det Divergences and their Application in the Spatial Filtering of Two Class Motor Imagery Movements

    The Alpha-Beta Log-Det divergences for positive definite matrices are flexible divergences, parameterized by two real constants, that specialize to several relevant classical cases such as the squared Riemannian metric, Stein's loss and the S-divergence. A novel classification criterion based on these divergences is optimized to address the problem of classifying motor imagery movements. The paper addresses this problem in three main parts: (1) first, it is proven that a suitable scaling of the class-conditional covariance matrices links the Common Spatial Pattern (CSP) solution with a predefined number of spatial filters per class to its representation as a divergence optimization problem, by making their different filter-selection policies compatible; (2) a closed-form formula for the gradient of the Alpha-Beta Log-Det divergences is derived, which enables their optimization and eases their use in many practical applications; (3) finally, following the work of Samek et al. (2014), which proposed robust spatial filtering of motor imagery movements based on the beta-divergence, the optimization of the Alpha-Beta Log-Det divergences is applied to this problem. The resulting subspace algorithm provides a unified framework for testing the performance and robustness of the several divergences in different scenarios.
    Ministerio de Economía y Competitividad TEC2014-53103-
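    To make the family concrete, below is a minimal numerical sketch of the Alpha-Beta Log-Det divergence in the parameterization of Cichocki, Cruces and Amari (2015), which this line of work builds on; the function name and the sanity check are illustrative, and the formula assumes alpha, beta and alpha + beta are all nonzero (the classical cases above arise as limits).

```python
import numpy as np
from scipy.linalg import eigh

def ab_logdet_divergence(P, Q, alpha, beta):
    """Alpha-Beta Log-Det divergence between SPD matrices P and Q,
    computed through the generalized eigenvalues of (P, Q), i.e. the
    eigenvalues of Q^{-1} P (valid for alpha, beta, alpha+beta != 0)."""
    lam = eigh(P, Q, eigvals_only=True)  # all > 0 for SPD inputs
    terms = np.log((alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta))
    return terms.sum() / (alpha * beta)

# Sanity check: the divergence vanishes when P == Q (all eigenvalues are 1).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
P = A @ A.T + 4 * np.eye(4)  # a random SPD matrix
print(ab_logdet_divergence(P, P, alpha=0.5, beta=0.5))  # ~0.0
```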

    A Survey on Metric Learning for Feature Vectors and Structured Data

    The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting good metrics for specific problems is generally difficult. This has led to the emergence of metric learning, which aims to learn a metric from data automatically and has attracted a lot of interest in machine learning and related fields over the past ten years. This survey proposes a systematic review of the metric learning literature, highlighting the pros and cons of each approach. We pay particular attention to Mahalanobis distance metric learning, a well-studied and successful framework, but additionally present a wide range of methods that have recently emerged as powerful alternatives, including nonlinear metric learning, similarity learning and local metric learning. Recent trends and extensions, such as semi-supervised metric learning, metric learning for histogram data and the derivation of generalization guarantees, are also covered. Finally, the survey addresses metric learning for structured data, in particular edit distance learning, and gives an overview of the remaining challenges in metric learning for the years to come.
    Comment: Technical report, 59 pages.
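    As a concrete illustration of the Mahalanobis framework the survey focuses on, here is a minimal sketch of the parameterization d_M(x, y) = sqrt((x - y)^T M (x - y)) with M = L^T L kept positive semi-definite by construction; the toy contrastive update is a generic illustration, not a specific algorithm from the survey.

```python
import numpy as np

def mahalanobis_dist(x, y, L):
    """d_M(x, y) with M = L^T L, which keeps M positive semi-definite."""
    diff = L @ (x - y)
    return np.sqrt(diff @ diff)

# Toy learning loop: pull a similar pair together and push a dissimilar
# pair beyond a unit margin by gradient steps on L (generic contrastive
# objective for illustration; not a specific method from the survey).
rng = np.random.default_rng(0)
L = np.eye(3)
x, x_sim, x_dis = rng.standard_normal((3, 3))
lr = 0.01
for _ in range(100):
    d_sim, d_dis = L @ (x - x_sim), L @ (x - x_dis)
    grad = 2 * np.outer(d_sim, x - x_sim)      # pull: d(||L(x-y)||^2)/dL
    if d_dis @ d_dis < 1.0:                    # hinge: only push inside margin
        grad -= 2 * np.outer(d_dis, x - x_dis)
    L -= lr * grad
print(mahalanobis_dist(x, x_sim, L), mahalanobis_dist(x, x_dis, L))
```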

    On robust spatial filtering of EEG in nonstationary environments


    Geometry Aware Deep Metric Learning

    A diverse range of applications in computer vision benefit from data representations that are dense and compact, yet discriminative enough to capture subtle changes in the data. Such representation learning is especially necessary in Zero Shot Learning applications, where the training and test classes are mutually exclusive. In other words, the learned representations should be discriminative enough to identify minute cues in the data samples so that unseen data can be categorized properly. With the advent of Deep Neural Networks over the last few years, several metric learning algorithms have been developed to address this challenging objective. These algorithms learn an embedding space while considering the relative similarity/dissimilarity relationships between data points across the various classes. Although successful, they suffer from a number of serious drawbacks, some of which have been addressed in this thesis. As our first objective, we extended two popular optimizers, Stochastic Gradient Descent with Momentum (SGD-M) and RMSProp, to their respective Riemannian counterparts. Such an extension is necessary when optimizing a model under constrained problem settings. Our proposal reaps the benefits of standard manifold operations while optimizing the parameters of the network that are constrained to reside on a Riemannian manifold. The experimental evaluations showed that the constrained optimizers clearly outperform their unconstrained equivalents over a wide range of datasets and application settings with regard to the improved learning of the embedding space. We then turn our attention to the general training protocol of Siamese Neural Networks (SiNNs) and address a major yet obvious drawback in their training practice. SiNNs are characterized by a Positive Semi-Definite (PSD) matrix M which is invariant to the action of the orthogonal group O(p), resulting in an equivalence class of solutions for M. Taking such invariances into account, we proposed a novel matrix manifold, qConv, and used it along with the popular Stiefel manifold to exploit the invariances in Siamese networks. We made use of our constrained optimizers to optimize over these two manifolds. Our empirical evaluations clearly showed that the training of SiNNs benefits from invoking such geometrical constraints over the search space. As our final contribution, we designed and developed a novel yet effective loss function that incorporates class-wise dissimilarity relationships when learning a discriminative embedding space. Such class-wise dissimilarity relationships have not been considered in the loss functions developed to date, which results in the learning of a sub-optimal embedding space. We therefore integrated and maximized such dissimilarity constraints using two standard variants of Sinkhorn Divergences. Our experimental evaluations demonstrated the importance of enforcing such constraints for learning a superior embedding space, both in the presence and absence of noise.
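    As a rough illustration of the first contribution, the sketch below shows one Riemannian SGD-with-momentum step on the Stiefel manifold (matrices with orthonormal columns): project the Euclidean gradient onto the tangent space, update a momentum buffer, and retract back onto the manifold via QR. The projection, transport and retraction choices are common textbook ones, assumed here for illustration rather than taken from the thesis.

```python
import numpy as np

def stiefel_project(X, G):
    """Project a Euclidean gradient G onto the tangent space of the
    Stiefel manifold {X : X^T X = I} at the point X."""
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2

def qr_retract(Y):
    """Map an ambient-space point back onto the manifold (QR retraction)."""
    Q, R = np.linalg.qr(Y)
    return Q * np.sign(np.diag(R))  # fix column signs for uniqueness

def riemannian_sgd_m_step(X, G, V, lr=0.1, momentum=0.9):
    """One SGD-with-momentum step constrained to the Stiefel manifold.
    The momentum buffer V is kept approximately tangent by re-projecting
    it at the current point -- a simple stand-in for vector transport."""
    V = momentum * stiefel_project(X, V) + stiefel_project(X, G)
    X = qr_retract(X - lr * V)
    return X, V

# Demo: a 5x2 frame stays orthonormal while following random gradients.
rng = np.random.default_rng(0)
X, V = qr_retract(rng.standard_normal((5, 2))), np.zeros((5, 2))
for _ in range(10):
    X, V = riemannian_sgd_m_step(X, rng.standard_normal((5, 2)), V)
print(np.allclose(X.T @ X, np.eye(2)))  # True: the constraint is preserved
```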

    Probabilistic methods for high dimensional signal processing

    This thesis investigates the use of probabilistic and Bayesian methods for analysing high dimensional signals. The work proceeds in three main parts sharing similar objectives. Throughout, we focus on building data-efficient inference mechanisms geared toward high dimensional signal processing. This is achieved by using probabilistic models on top of informative data representation operators. We also improve on the fitting objective to make it better suited to our requirements.

    Variational Inference. We introduce a variational approximation framework using direct optimisation of what is known as the scale invariant Alpha-Beta divergence (sAB-divergence). This new objective encompasses most variational objectives that use the Kullback-Leibler, the Rényi or the gamma divergences. It also gives access to objective functions never exploited before in the context of variational inference. This is achieved via two easy-to-interpret control parameters, which allow for a smooth interpolation over the divergence space while trading off properties such as mass-covering of a target distribution and robustness to outliers in the data. Furthermore, the sAB variational objective can be optimised directly by re-purposing existing methods for Monte Carlo computation of complex variational objectives, leading to estimates of the divergence instead of variational lower bounds. We show the advantages of this objective on Bayesian models for regression problems.

    Roof-Edge Hidden Markov Random Field. We propose a method for semi-local Hurst estimation that incorporates a Markov random field model to constrain a wavelet-based pointwise Hurst estimator. This results in an estimator which is able to exploit the spatial regularities of a piecewise, parametrically varying Hurst parameter. The pointwise estimates are jointly inferred along with the parametric form of the underlying Hurst function, which characterises how the Hurst parameter varies deterministically over the spatial support of the data. Unlike recent Hurst regularisation methods, the proposed approach is flexible, in that arbitrary parametric forms can be considered, and extensible, in that the associated gradient descent algorithm can accommodate a broad class of distributional assumptions without significant modifications. The potential benefits of the approach are illustrated with simulations of various first-order polynomial forms.

    Scattering Hidden Markov Tree. Here we combine the rich, over-complete signal representation afforded by the scattering transform with a probabilistic graphical model which captures hierarchical dependencies between coefficients at different layers. The wavelet scattering network results in a high-dimensional representation which is translation invariant and stable to deformations whilst preserving informative content. Such properties are achieved by cascading wavelet transform convolutions with non-linear modulus and averaging operators. The network structure and its distributions are described using a Hidden Markov Tree. This yields a generative model for high dimensional inference and offers a means to perform various inference tasks such as prediction. Our proposed scattering convolutional hidden Markov tree displays promising results on classification tasks of complex images in the challenging case where the number of training examples is extremely small. We also apply variational methods to this model, leveraging the sAB variational objective defined earlier to improve the quality of the approximation.
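    The Monte Carlo machinery mentioned above can be illustrated on the Rényi bound of Li and Turner (2016), one of the special cases the sAB objective is said to encompass; the sketch below estimates that bound from importance log-weights. The exact sAB estimator depends on the thesis's parameterization and is not reproduced here.

```python
import numpy as np
from scipy.special import logsumexp

def renyi_vi_bound(log_p_joint, log_q, alpha):
    """Monte Carlo estimate of the Renyi variational bound
    L_alpha = 1/(1-alpha) * log E_q[(p(x,z)/q(z))^(1-alpha)]
    from samples z_i ~ q (Li & Turner, 2016); alpha -> 1 recovers
    the standard ELBO."""
    log_w = log_p_joint - log_q  # log importance weights
    return (logsumexp((1 - alpha) * log_w) - np.log(log_w.size)) / (1 - alpha)

# Toy check: q = N(0,1) approximating a normalized p = N(0.5,1), so the
# estimate should lie between the ELBO (-0.125 here) and log Z = 0.
rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)
log_q = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
log_p = -0.5 * (z - 0.5) ** 2 - 0.5 * np.log(2 * np.pi)
print(renyi_vi_bound(log_p, log_q, alpha=0.5))
```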

    EEG Signal Processing in Motor Imagery Brain Computer Interfaces with Improved Covariance Estimators

    The research and development in the field of Brain Computer Interfaces (BCI) has been growing during the last years, motivated by several factors. As our knowledge of how the human brain is structured and works (of which we still know very little) grows, new advances in BCI systems emerge that, in turn, serve as motivation for more research into this organ. In addition, BCI systems open a door for anyone to interact with their environment regardless of the physical disabilities they may have, by simply using their thoughts. Recently, the technology industry has begun to show its interest in these systems, motivated both by the advances in what we know of the brain and how it works, and by the constant use we make of technology nowadays, whether through our smartphones, tablets or computers, among many other devices. This motivates companies like Facebook to invest in the development of BCI systems so that people (with or without disabilities) can communicate with their devices using only their brain. The work developed in this thesis focuses on BCI systems based on motor imagery movements. This means that the user thinks of certain motor movements that are interpreted by a computer as commands. The brain signals that we need to translate to commands are obtained by an EEG device that is placed on the scalp and measures the electromagnetic activity produced by the brain. Working with these signals is complex since they are non-stationary and, in addition, they are usually heavily contaminated by noise or artifacts. We have approached this subject from the point of view of statistical signal processing and through machine learning algorithms. For this, the BCI system has been split into three blocks: preprocessing, feature extraction and classification. After reviewing the state of the art of these blocks, we summarize and attach a set of publications from recent years containing the contributions that, from our point of view, improve each of the blocks previously mentioned. As a brief summary: for the preprocessing block we propose a method that lets us normalize the sources of the EEG signals; by equalizing the effective sources, we are able to improve the estimation of the covariance matrices. For the feature extraction block, we have extended the CSP algorithm to unsupervised cases. Finally, in the classification block we have also managed to perform a separation of classes in a blind way, and we have observed an improvement when the LDA algorithm is regularized by a method specific to Gaussian distributions.
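    For context, the feature-extraction block referenced above typically builds on CSP. Below is a minimal sketch of textbook CSP (joint diagonalization of the two class covariance matrices followed by log-variance features), not the improved covariance estimators or the unsupervised extension proposed in the thesis.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(cov_a, cov_b, n_filters=3):
    """Textbook CSP: jointly diagonalize the two class covariance
    matrices by solving cov_a w = lambda (cov_a + cov_b) w and keep
    the eigenvectors with the most extreme eigenvalues, i.e. the
    most discriminative variance ratios between the classes."""
    vals, vecs = eigh(cov_a, cov_a + cov_b)  # eigenvalues ascending in [0, 1]
    idx = np.r_[np.arange(n_filters), np.arange(len(vals) - n_filters, len(vals))]
    return vecs[:, idx]  # (channels, 2 * n_filters) spatial filters

def csp_features(trial, W):
    """Log-variance features of one (channels x samples) EEG trial."""
    return np.log((W.T @ trial).var(axis=1))
```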

    Integration of Auxiliary Data Knowledge in Prototype Based Vector Quantization and Classification Models

    This thesis deals with the integration of auxiliary data knowledge into machine learning methods, especially prototype based classification models. The problem of classification is diverse, and evaluating the result by accuracy alone is inadequate in many applications. Therefore, the classification tasks are analyzed more deeply. Possibilities to extend prototype based methods to integrate extra knowledge about the data or the classification goal are presented, in order to obtain problem-adequate models. One of the proposed extensions is Generalized Learning Vector Quantization for direct optimization of statistical measures besides the classification accuracy. Modifying the metric adaptation of Generalized Learning Vector Quantization for functional data, i.e. data with lateral dependencies in the features, is also considered.
    Contents:
    Symbols and Abbreviations
    1 Introduction
    1.1 Motivation and Problem Description
    1.2 Utilized Data Sets
    2 Prototype Based Methods
    2.1 Unsupervised Vector Quantization
    2.1.1 C-means
    2.1.2 Self-Organizing Map
    2.1.3 Neural Gas
    2.1.4 Common Generalizations
    2.2 Supervised Vector Quantization
    2.2.1 The Family of Learning Vector Quantizers - LVQ
    2.2.2 Generalized Learning Vector Quantization
    2.3 Semi-Supervised Vector Quantization
    2.3.1 Learning Associations by Self-Organization
    2.3.2 Fuzzy Labeled Self-Organizing Map
    2.3.3 Fuzzy Labeled Neural Gas
    2.4 Dissimilarity Measures
    2.4.1 Differentiable Kernels in Generalized LVQ
    2.4.2 Dissimilarity Adaptation for Performance Improvement
    3 Deeper Insights into Classification Problems - From the Perspective of Generalized LVQ
    3.1 Classification Models
    3.2 The Classification Task
    3.3 Evaluation of Classification Results
    3.4 The Classification Task as an Ill-Posed Problem
    4 Auxiliary Structure Information and Appropriate Dissimilarity Adaptation in Prototype Based Methods
    4.1 Supervised Vector Quantization for Functional Data
    4.1.1 Functional Relevance/Matrix LVQ
    4.1.2 Enhancement Generalized Relevance/Matrix LVQ
    4.2 Fuzzy Information About the Labels
    4.2.1 Fuzzy Semi-Supervised Self-Organizing Maps
    4.2.2 Fuzzy Semi-Supervised Neural Gas
    5 Variants of Classification Costs and Class Sensitive Learning
    5.1 Border Sensitive Learning in Generalized LVQ
    5.1.1 Border Sensitivity by Additive Penalty Function
    5.1.2 Border Sensitivity by Parameterized Transfer Function
    5.2 Optimizing Different Validation Measures by the Generalized LVQ
    5.2.1 Attention Based Learning Strategy
    5.2.2 Optimizing Statistical Validation Measurements for Binary Class Problems in the GLVQ
    5.3 Integration of Structural Knowledge about the Labeling in Fuzzy Supervised Neural Gas
    6 Conclusion and Future Work
    My Publications
    A Appendix
    A.1 Stochastic Gradient Descent (SGD)
    A.2 Support Vector Machine
    A.3 Fuzzy Supervised Neural Gas Algorithm Solved by SGD
    Bibliography
    Acknowledgements
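    For reference, the sketch below shows the core of standard GLVQ (Sato and Yamada, 1996), which the extensions above build on: the classifier function mu(x) = (d+ - d-)/(d+ + d-) and one stochastic gradient step on it, using an identity transfer function and squared Euclidean distance for simplicity.

```python
import numpy as np

def glvq_mu(x, w_plus, w_minus):
    """GLVQ classifier function: mu < 0 means x is closer to the
    closest correct prototype w_plus than to the closest wrong one."""
    d_plus = np.sum((x - w_plus) ** 2)
    d_minus = np.sum((x - w_minus) ** 2)
    return (d_plus - d_minus) / (d_plus + d_minus)

def glvq_update(x, w_plus, w_minus, lr=0.05):
    """One stochastic gradient descent step on mu (identity transfer
    function): attract the correct prototype, repel the wrong one."""
    d_plus = np.sum((x - w_plus) ** 2)
    d_minus = np.sum((x - w_minus) ** 2)
    denom = (d_plus + d_minus) ** 2
    w_plus = w_plus + lr * (4 * d_minus / denom) * (x - w_plus)
    w_minus = w_minus - lr * (4 * d_plus / denom) * (x - w_minus)
    return w_plus, w_minus
```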