537 research outputs found

    On Recursive Edit Distance Kernels with Application to Time Series Classification

    This paper proposes some extensions to the work on kernels dedicated to string or time series global alignment based on the aggregation of scores obtained by local alignments. The extensions we propose make it possible to construct, from the classical recursive definition of elastic distances, recursive edit distance (or time-warp) kernels that are positive definite if some sufficient conditions are satisfied. The sufficient conditions we end up with are original and weaker than those proposed in earlier works, although a recursive regularizing term is required to obtain the proof of positive definiteness as a direct consequence of Haussler's convolution theorem. The classification experiment we conducted on three classical time warp distances (two of which are metrics), using a Support Vector Machine classifier, leads us to conclude that, when the pairwise distance matrix obtained from the training data is far from definiteness, the positive definite recursive elastic kernels generally outperform the distance substituting kernels for the classical elastic distances we have tested. Comment: 14 pages
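
    To make the notion of a distance matrix being "far from definiteness" concrete, the sketch below builds a distance substituting kernel from an elastic distance and inspects the smallest eigenvalue of its Gram matrix. It is only an illustration under stated assumptions: the abstract does not name its three distances, so dynamic time warping (DTW) is used as a stand-in, and the form exp(-d / sigma), the names `dtw` and `substitution_gram`, and the random data are all hypothetical. It does not implement the paper's recursive kernels.

```python
import numpy as np

def dtw(a, b):
    """Classical DTW cost with absolute difference as local cost (assumed stand-in)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Recursive definition: extend the cheapest of the three predecessor alignments.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def substitution_gram(series, sigma=1.0):
    """Gram matrix of the distance substituting kernel exp(-dtw / sigma) (assumed form)."""
    n = len(series)
    G = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            G[i, j] = G[j, i] = np.exp(-dtw(series[i], series[j]) / sigma)
    return G

# Hypothetical data: 15 random series of varying length.
rng = np.random.default_rng(0)
series = [rng.standard_normal(int(rng.integers(5, 12))) for _ in range(15)]
G = substitution_gram(series)

# A negative smallest eigenvalue means this Gram matrix is not positive semidefinite.
min_eig = np.linalg.eigvalsh(G).min()
print("smallest eigenvalue:", min_eig)
```

    The sign and size of the smallest eigenvalue depend on the data and on `sigma`, so the printed value is only a diagnostic for one sample, not a general statement about DTW.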

    Optimization with Sparsity-Inducing Penalties

    Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.
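
    As an illustration of the proximal methods mentioned above, here is a minimal sketch of proximal gradient descent (ISTA) for the $\ell_1$-penalized least-squares problem. The choice of this particular problem, the function names `soft_threshold`, `objective` and `ista`, the step size 1/L, and the synthetic data are assumptions made for the example; it is not taken from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(X, y, w, lam):
    """0.5 * ||y - X w||^2 + lam * ||w||_1."""
    return 0.5 * np.sum((y - X @ w) ** 2) + lam * np.sum(np.abs(w))

def ista(X, y, lam, n_iter=500):
    """Proximal gradient descent with constant step 1/L."""
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient of the smooth part
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                         # gradient of the least-squares term
        w = soft_threshold(w - grad / L, lam / L)        # gradient step, then prox step
    return w

# Hypothetical synthetic problem with a 3-sparse ground truth.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(50)

w_hat = ista(X, y, lam=1.0)
print("non-zero coefficients:", np.count_nonzero(w_hat))
```

    The non-smooth penalty is handled entirely through its proximal operator, which is why the iterates can contain exact zeros rather than merely small values.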

    Kernel Methods for Surrogate Modeling

    This chapter deals with kernel methods as a special class of techniques for surrogate modeling. Kernel methods have proven to be efficient in machine learning, pattern recognition and signal analysis due to their flexibility, excellent experimental performance and elegant functional analytic background. These data-based techniques provide so-called kernel expansions, i.e., linear combinations of kernel functions which are generated from given input-output point samples that may be arbitrarily scattered. In particular, these techniques are meshless, do not require or depend on a grid, and hence are less prone to the curse of dimensionality, even for high-dimensional problems. In contrast to projection-based model reduction, we do not necessarily assume a high-dimensional model, but a general function that models input-output behavior within some simulation context. This could be a micro-model in a multiscale simulation, a submodel in a coupled system, an initialization function for solvers, a coefficient function in PDEs, etc. First, kernel surrogates can be useful if the input-output function is expensive to evaluate, e.g. if it is the result of a finite element simulation. Here, acceleration can be obtained by sparse kernel expansions. Second, if a function is available only via measurements or a few function evaluation samples, kernel approximation techniques can provide function surrogates that allow global evaluation. We present some important kernel approximation techniques, which are kernel interpolation, greedy kernel approximation and support vector regression. Pseudo-code is provided for ease of reproducibility. In order to illustrate the main features, commonalities and differences, we compare these techniques on a real-world application. The experiments clearly indicate the enormous acceleration potential.
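
    The kernel expansion described above can be sketched for the simplest of the three techniques, kernel interpolation. This is a generic illustration, not the chapter's pseudo-code: the Gaussian kernel, its shape parameter, the small diagonal regularization, the names `gaussian_kernel` and `fit_interpolant`, and the sine target function are all assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel(A, B, shape=1.0):
    """Kernel matrix k(a, b) = exp(-shape^2 * ||a - b||^2) for row-wise points."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-(shape**2) * np.maximum(sq, 0.0))

def fit_interpolant(X, y, shape=1.0, reg=1e-12):
    """Solve K alpha = y and return the surrogate s(z) = sum_i alpha_i k(z, x_i)."""
    K = gaussian_kernel(X, X, shape)
    # A tiny diagonal shift guards against ill-conditioning (assumption, not from the chapter).
    alpha = np.linalg.solve(K + reg * np.eye(len(X)), y)
    return lambda Z: gaussian_kernel(Z, X, shape) @ alpha

# Hypothetical "expensive" function sampled at 8 points in [0, 1].
X_train = np.linspace(0.0, 1.0, 8)[:, None]
y_train = np.sin(2.0 * np.pi * X_train[:, 0])

surrogate = fit_interpolant(X_train, y_train, shape=5.0)

# The surrogate can now be evaluated anywhere, not only at the samples.
X_new = np.array([[0.25], [0.5]])
print(surrogate(X_new))
```

    Evaluating the surrogate costs one kernel row per query point, which is the source of the speed-up when the original function requires a full simulation run.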

    Using Mean Embeddings for State Estimation and Reinforcement Learning

    To act in complex, high-dimensional environments, autonomous systems require versatile state estimation techniques and compact state representations. State estimation is crucial when the system only has access to stochastic measurements or partial observations. Furthermore, in combination with models of the system, such techniques make it possible to predict the future, which enables the system to assess the outcome of possible decisions. Compact state representations alleviate the curse of dimensionality by distilling the important information from high-dimensional observations. Due to noisy sensory information and imperfect models of the system, estimates of the state never reflect the true state perfectly but are always subject to errors. The natural choice to incorporate the uncertainty about the state estimate is to use a probability distribution as representation. This results in the so-called belief state. High-dimensional observations, for example images, often contain much less information than conveyed by their dimensionality. But even if all the information is necessary to describe the state of the system (for example, the state of a swarm given by the positions of all agents), a less complex description might be a sufficient representation. In such situations, finding the generative distribution that explains the state would give a much more compact yet informative representation. Traditionally, parametric distributions have been used as state representations, most prevalently the Gaussian distribution. However, in many cases a unimodal distribution might not be sufficient to represent the belief state. Using multi-modal probability distributions instead requires more advanced approaches such as mixture models or particle-based Monte Carlo methods. Learning mixture models is, however, not straightforward and often results in locally optimal solutions. 
Similarly, maintaining a good population of particles during inference is a complicated and cumbersome process. A third approach is kernel density estimation, which is located at the intersection of mixture models and particle-based approaches. Still, performing inference with any of these approaches requires heuristics that lead to poor performance and limited scalability to higher-dimensional spaces. A recent technique that alleviates this problem is the embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS). Conditional distributions can be embedded as operators, based on which a framework for inference has been presented that allows the sum rule, the product rule and Bayes' rule to be applied entirely in Hilbert space. Using sample-based estimators and the kernel trick of the representer theorem makes it possible to represent the operations as vector-matrix manipulations. The contributions of this thesis are based on or inspired by the embeddings of distributions into reproducing kernel Hilbert spaces. In the first part of this thesis, I propose additions to the framework for nonparametric inference that allow the inference operators to scale more gracefully with the number of samples in the training set. The first contribution is an alternative approach to the conditional embedding operator, formulated as a least-squares problem, which makes it possible to use only a subset of the data as representation while using the full data set to learn the conditional operator. I call this operator the subspace conditional embedding operator. Inspired by the least-squares derivations of the Kalman filter, I furthermore propose an alternative operator for Bayesian updates in Hilbert space, the kernel Kalman rule. This alternative approach is numerically more robust than the kernel Bayes rule presented in the framework for nonparametric inference and scales better with the number of samples. 
Based on the kernel Kalman rule, I derive the kernel Kalman filter and the kernel forward-backward smoother to perform state estimation, prediction and smoothing based on Hilbert space embeddings of the belief state. This representation is able to capture multi-modal distributions, and inference reduces, thanks to the kernel trick, to simple matrix manipulations. In the second part of this thesis, I propose a representation for large sets of homogeneous observations. Specifically, I consider the problem of learning a controller for object assembly and object manipulation with a robotic swarm. I assume a swarm of homogeneous robots that are controlled by a common input signal, e.g., the gradient of a light source or a magnetic field. Learning policies for swarms is a challenging problem since the state space grows with the number of agents and quickly becomes very high-dimensional. Furthermore, the exact number of agents and the order of the agents in the observation are not important for solving the task. To approach this issue, I propose the swarm kernel, which uses a Hilbert space embedding to represent the swarm. Instead of the exact positions of the agents in the swarm, the embedding estimates the generative distribution behind the swarm configuration. The specific agent positions are regarded as samples of this distribution. Since the swarm kernel compares the embeddings of distributions, it can compare swarm configurations with varying numbers of individuals and is invariant to the permutation of the agents. I present a hierarchical approach for solving the object manipulation task where I assume a high-level object assembly policy as given. To learn the low-level object pushing policy, I use the swarm kernel with an actor-critic policy search method. The policies which I learn in simulation can be directly transferred to a real robotic system. 
In the last part of this thesis, I investigate how the idea of kernel mean embeddings can be applied to deep reinforcement learning. As in the previous part, I consider a variable number of homogeneous observations, such as robot swarms where the number of agents can change. Another example is the representation of 3D structures as point clouds. The number of points in such clouds can vary strongly and the order of the points in a vectorized representation is arbitrary. The common architectures for neural networks have a fixed structure that requires the dimensionality of inputs and outputs to be known in advance. A variable number of inputs can only be processed by applying tricks. To approach this problem, I propose the deep M-embeddings, which are inspired by the kernel mean embeddings. The deep M-embeddings provide a network structure to compute a fixed-length representation from a variable number of inputs. Additionally, the deep M-embeddings exploit the homogeneous nature of the inputs to reduce the number of parameters in the network and thus make learning easier. Similar to the swarm kernel, the policies learned with the deep M-embeddings can be transferred to different swarm sizes and different numbers of objects in the environment without further learning.
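
    The idea of comparing swarm configurations through embeddings of their generative distributions can be sketched with empirical kernel mean embeddings. This is an assumed, simplified construction and not necessarily the thesis's swarm kernel: each set of agent positions is embedded as the average of RBF kernel functions, two embeddings are compared through their squared RKHS distance (the maximum mean discrepancy), and a Gaussian kernel is applied to that distance. The names `rbf`, `embedding_inner`, `mmd2` and `set_kernel` and all parameter values are hypothetical.

```python
import numpy as np

def rbf(A, B, bandwidth=1.0):
    """RBF kernel matrix between row-wise point sets A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def embedding_inner(P, Q, bandwidth=1.0):
    """Inner product of two empirical mean embeddings: mean of all pairwise kernel values."""
    return rbf(P, Q, bandwidth).mean()

def mmd2(P, Q, bandwidth=1.0):
    """Squared RKHS distance between the mean embeddings of P and Q."""
    return (embedding_inner(P, P, bandwidth)
            + embedding_inner(Q, Q, bandwidth)
            - 2.0 * embedding_inner(P, Q, bandwidth))

def set_kernel(P, Q, bandwidth=1.0, gamma=1.0):
    """Gaussian kernel on the embedding distance (assumed form of a set-level kernel)."""
    return np.exp(-gamma * max(mmd2(P, Q, bandwidth), 0.0))

# Hypothetical swarms: 10 and 25 agents in the plane, drawn around different centres.
rng = np.random.default_rng(0)
swarm_a = rng.standard_normal((10, 2))
swarm_b = rng.standard_normal((25, 2)) + np.array([2.0, 0.0])

# Shuffling the agents does not change the embedding, so the kernel value is unchanged.
shuffled_a = swarm_a[rng.permutation(len(swarm_a))]
print(set_kernel(swarm_a, swarm_b), set_kernel(shuffled_a, swarm_b))
```

    Because every quantity is an average over pairs of agents, the two sets may have different sizes and any ordering, which mirrors the invariances described in the abstract.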

    Convolutional Sparse Kernel Network for Unsupervised Medical Image Analysis

    The availability of large-scale annotated image datasets and recent advances in supervised deep learning methods enable the end-to-end derivation of representative image features that can impact a variety of image analysis problems. Such supervised approaches, however, are difficult to implement in the medical domain, where large volumes of labelled data are difficult to obtain due to the complexity of manual annotation and inter- and intra-observer variability in label assignment. We propose a new convolutional sparse kernel network (CSKN), which is a hierarchical unsupervised feature learning framework that addresses the challenge of learning representative visual features in medical image analysis domains where there is a lack of annotated training data. Our framework has three contributions: (i) We extend kernel learning to identify and represent invariant features across image sub-patches in an unsupervised manner. (ii) We initialise our kernel learning with a layer-wise pre-training scheme that leverages the sparsity inherent in medical images to extract initial discriminative features. (iii) We adapt a multi-scale spatial pyramid pooling (SPP) framework to capture subtle geometric differences between learned visual features. We evaluated our framework in medical image retrieval and classification on three public datasets. Our results show that our CSKN had better accuracy than other conventional unsupervised methods and comparable accuracy to methods that used state-of-the-art supervised convolutional neural networks (CNNs). Our findings indicate that our unsupervised CSKN provides an opportunity to leverage unannotated big data in medical imaging repositories. Comment: Accepted by Medical Image Analysis (with a new title 'Convolutional Sparse Kernel Network for Unsupervised Medical Image Analysis'). The manuscript is available from the following link (https://doi.org/10.1016/j.media.2019.06.005)