18,879 research outputs found

    Random Feature-based Online Multi-kernel Learning in Environments with Unknown Dynamics

    Get PDF
    Kernel-based methods exhibit well-documented performance in various nonlinear learning tasks. Most of them rely on a preselected kernel, whose prudent choice presumes task-specific prior information. Especially when the latter is not available, multi-kernel learning has gained popularity thanks to its flexibility in choosing kernels from a prescribed kernel dictionary. Leveraging the random feature approximation and its recent orthogonality-promoting variant, the present contribution develops a scalable multi-kernel learning scheme (termed Raker) to obtain the sought nonlinear learning function `on the fly,' first for static environments. To further boost performance in dynamic environments, an adaptive multi-kernel learning scheme (termed AdaRaker) is developed. AdaRaker accounts not only for data-driven learning of kernel combination, but also for the unknown dynamics. Performance is analyzed in terms of both static and dynamic regrets. AdaRaker is uniquely capable of tracking nonlinear learning functions in environments with unknown dynamics, and with with analytic performance guarantees. Tests with synthetic and real datasets are carried out to showcase the effectiveness of the novel algorithms.Comment: 36 page

    Learning with SGD and Random Features

    Get PDF
    Sketching and stochastic gradient methods are arguably the most common techniques to derive efficient large scale learning algorithms. In this paper, we investigate their application in the context of nonparametric statistical learning. More precisely, we study the estimator defined by stochastic gradient with mini batches and random features. The latter can be seen as form of nonlinear sketching and used to define approximate kernel methods. The considered estimator is not explicitly penalized/constrained and regularization is implicit. Indeed, our study highlights how different parameters, such as number of features, iterations, step-size and mini-batch size control the learning properties of the solutions. We do this by deriving optimal finite sample bounds, under standard assumptions. The obtained results are corroborated and illustrated by numerical experiments

    A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning

    Full text link
    Learning sparse combinations is a frequent theme in machine learning. In this paper, we study its associated optimization problem in the distributed setting where the elements to be combined are not centrally located but spread over a network. We address the key challenges of balancing communication costs and optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW) algorithm. We obtain theoretical guarantees on the optimization error ϵ\epsilon and communication cost that do not depend on the total number of combining elements. We further show that the communication cost of dFW is optimal by deriving a lower-bound on the communication cost required to construct an ϵ\epsilon-approximate solution. We validate our theoretical analysis with empirical studies on synthetic and real-world data, which demonstrate that dFW outperforms both baselines and competing methods. We also study the performance of dFW when the conditions of our analysis are relaxed, and show that dFW is fairly robust.Comment: Extended version of the SIAM Data Mining 2015 pape

    Hashing for Similarity Search: A Survey

    Full text link
    Similarity search (nearest neighbor search) is a problem of pursuing the data items whose distances to a query item are the smallest from a large database. Various methods have been developed to address this problem, and recently a lot of efforts have been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work locality sensitive hashing. We divide the hashing algorithms two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution and learning to hash, which learns hash functions according the data distribution, and review them from various aspects, including hash function design and distance measure and search scheme in the hash coding space

    Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations

    Full text link
    Deep CCA is a recently proposed deep neural network extension to the traditional canonical correlation analysis (CCA), and has been successful for multi-view representation learning in several domains. However, stochastic optimization of the deep CCA objective is not straightforward, because it does not decouple over training examples. Previous optimizers for deep CCA are either batch-based algorithms or stochastic optimization using large minibatches, which can have high memory consumption. In this paper, we tackle the problem of stochastic optimization for deep CCA with small minibatches, based on an iterative solution to the CCA objective, and show that we can achieve as good performance as previous optimizers and thus alleviate the memory requirement.Comment: in 2015 Annual Allerton Conference on Communication, Control and Computin
    • …
    corecore