11 research outputs found

    An Identity for Kernel Ridge Regression

    This paper derives an identity connecting the square loss of ridge regression in on-line mode with the loss of the retrospectively best regressor. Some corollaries about the properties of the cumulative loss of on-line ridge regression are also obtained. Comment: 35 pages; extended version of ALT 2010 paper (Proceedings of ALT 2010, LNCS 6331, Springer, 2010).
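    The on-line protocol behind this identity is simple to state: at each round the learner outputs the kernel ridge regression prediction fitted to all previously seen examples, and then incurs the square loss on the newly revealed label. Below is a minimal sketch of that protocol only (not the identity itself or its proof); the RBF kernel, the regularization parameter `lam`, and the toy data are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def online_kernel_ridge_loss(X, y, lam=1.0):
    """Run kernel ridge regression in on-line mode and return the cumulative
    square loss sum_t (y_t - yhat_t)^2, where yhat_t is the ridge prediction
    built from rounds 1..t-1 only."""
    cumulative_loss = 0.0
    for t in range(len(y)):
        if t == 0:
            yhat = 0.0                      # nothing seen yet: predict zero
        else:
            K = rbf_kernel(X[:t], X[:t])
            alpha = np.linalg.solve(K + lam * np.eye(t), y[:t])
            yhat = float(rbf_kernel(X[t:t + 1], X[:t]) @ alpha)
        cumulative_loss += (y[t] - yhat) ** 2
    return cumulative_loss

# Toy stream for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
print(online_kernel_ridge_loss(X, y, lam=1.0))
```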

    Editors' Introduction to Algorithmic Learning Theory: 21st International Conference, ALT 2010, Canberra, Australia, October 6-8, 2010. Proceedings

    Learning theory is an active research area that incorporates ideas, problems, and techniques from a wide range of disciplines including statistics, artificial intelligence, information theory, pattern recognition, and theoretical computer science. The research reported at the 21st International Conference on Algorithmic Learning Theory (ALT 2010) ranges over areas such as query models, online learning, inductive inference, boosting, kernel methods, complexity and learning, reinforcement learning, unsupervised learning, grammatical inference, and algorithmic forecasting. In this introduction we give an overview of the five invited talks and the regular contributions of ALT 2010.

    Efficient second-order online kernel learning with adaptive embedding

    Online kernel learning (OKL) is a flexible framework for prediction problems, since the large approximation space provided by reproducing kernel Hilbert spaces can contain an accurate function for the problem. Nonetheless, optimizing over this space is computationally expensive. Not only do first-order methods accumulate $O(\sqrt{T})$ more loss than the optimal function, but the curse of kernelization results in an $O(t)$ per-step complexity. Second-order methods get closer to the optimum much faster, suffering only $O(\log T)$ regret, but second-order updates are even more expensive, with an $O(t^2)$ per-step cost. Existing approximate OKL methods try to reduce this complexity either by limiting the support vectors (SVs) introduced in the predictor, or by avoiding the kernelization process altogether using an embedding. Nonetheless, as long as the size of the approximation space or the number of SVs does not grow over time, an adversary can always exploit the approximation process. In this paper, we propose PROS-N-KONS, a method that combines Nyström sketching, which projects each input point into a small, accurate embedded space, with efficient second-order updates performed in that space. The embedded space is continuously updated to guarantee that the embedding remains accurate, and we show that the per-step cost grows only with the effective dimension of the problem and not with $T$. Moreover, the second-order updates allow us to achieve logarithmic regret. We empirically compare our algorithm on recent large-scale benchmarks and show that it performs favorably.
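    As a rough illustration of the two ingredients the abstract describes, the sketch below embeds each input through a Nyström dictionary and then applies a second-order (online-Newton-style) update with a Sherman-Morrison inverse. This is not the actual PROS-N-KONS algorithm: the dictionary is fixed here rather than continuously updated, and the RBF kernel, regularizer, and toy data are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class NystromSecondOrderLearner:
    """Fixed Nystrom embedding + online-Newton-style updates (illustration only)."""

    def __init__(self, dictionary, lam=1.0, gamma=1.0):
        self.D = dictionary                        # Nystrom anchor points, shape (m, d)
        self.gamma = gamma
        K_mm = rbf_kernel(self.D, self.D, gamma)
        # Embedding phi(x) = K_mm^{-1/2} k(D, x), via an eigendecomposition of K_mm.
        w_eig, V = np.linalg.eigh(K_mm + 1e-8 * np.eye(len(self.D)))
        self.K_inv_sqrt = V @ np.diag(w_eig ** -0.5) @ V.T
        m = len(self.D)
        self.w = np.zeros(m)                       # weight vector in the embedded space
        self.A_inv = np.eye(m) / lam               # inverse of the second-order matrix

    def embed(self, x):
        return self.K_inv_sqrt @ rbf_kernel(self.D, x[None, :], self.gamma)[:, 0]

    def predict(self, x):
        return float(self.w @ self.embed(x))

    def update(self, x, y):
        phi = self.embed(x)
        err = self.w @ phi - y                     # err * phi is proportional to the square-loss gradient
        A_phi = self.A_inv @ phi
        # Sherman-Morrison rank-one update of A^{-1}, then a Newton-like step.
        self.A_inv -= np.outer(A_phi, A_phi) / (1.0 + phi @ A_phi)
        self.w -= err * (self.A_inv @ phi)

# Toy stream: predict before each update and accumulate the square loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
learner = NystromSecondOrderLearner(dictionary=X[:20], lam=1.0, gamma=1.0)
loss = 0.0
for x_t, y_t in zip(X, y):
    loss += (learner.predict(x_t) - y_t) ** 2
    learner.update(x_t, y_t)
print(loss)
```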

    Efficient online learning with kernels for adversarial large scale problems

    We are interested in a framework of online learning with kernels for low-dimensional but large-scale and potentially adversarial datasets. We study the computational and theoretical performance of online variations of kernel ridge regression. Despite its simplicity, the algorithm we study is the first to achieve the optimal regret for a wide range of kernels with a per-round complexity of order $n^\alpha$ with $\alpha < 2$. The algorithm we consider is based on approximating the kernel with the linear span of basis functions. Our contribution is two-fold: 1) For the Gaussian kernel, we propose to build the basis beforehand (independently of the data) through Taylor expansion. For $d$-dimensional inputs, we provide a (close to) optimal regret of order $O((\log n)^{d+1})$ with per-round time and space complexity $O((\log n)^{2d})$. This makes the algorithm a suitable choice as soon as $n \gg e^d$, which is likely to happen for small-dimensional, large-scale datasets; 2) For general kernels with low effective dimension, the basis functions are updated sequentially in a data-adaptive fashion by sampling Nyström points. In this case, our algorithm improves the computational trade-off known for online kernel regression.
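    The first contribution can be pictured as follows: expand the Gaussian kernel into an explicit, data-independent Taylor feature map truncated at some degree, and then run ordinary (linear) online ridge regression on those features. The sketch below is only an illustration under assumed choices of truncation degree, bandwidth `gamma`, and regularizer `lam`; the paper's exact basis and analysis may differ.

```python
from itertools import product
from math import exp, factorial, sqrt
import numpy as np

def taylor_feature_map(d, degree, gamma=1.0):
    """Feature map whose inner products approximate the Gaussian kernel
    exp(-gamma * ||x - z||^2):
    phi_alpha(x) = exp(-gamma ||x||^2) * sqrt((2 gamma)^{|alpha|} / alpha!) * x^alpha
    over all multi-indices alpha with |alpha| <= degree."""
    multi_indices = [a for a in product(range(degree + 1), repeat=d)
                     if sum(a) <= degree]
    coeffs = np.array([sqrt((2 * gamma) ** sum(a)
                            / np.prod([factorial(ai) for ai in a]))
                       for a in multi_indices])
    def phi(x):
        monomials = np.array([np.prod(x ** np.array(a)) for a in multi_indices])
        return exp(-gamma * float(x @ x)) * coeffs * monomials
    return phi

def online_ridge_on_features(X, y, phi, lam=1.0):
    """Online ridge regression in the explicit feature space; Sherman-Morrison
    updates keep the per-round cost quadratic in the number of basis functions."""
    m = len(phi(X[0]))
    A_inv = np.eye(m) / lam          # inverse of  lam*I + sum_t f_t f_t^T
    b = np.zeros(m)                  # sum_t y_t f_t
    cumulative_loss = 0.0
    for x_t, y_t in zip(X, y):
        f = phi(x_t)
        cumulative_loss += (y_t - f @ (A_inv @ b)) ** 2   # predict, then update
        A_f = A_inv @ f
        A_inv -= np.outer(A_f, A_f) / (1.0 + f @ A_f)
        b += y_t * f
    return cumulative_loss

# Toy stream for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.sin(X[:, 0] * X[:, 1]) + 0.1 * rng.normal(size=300)
print(online_ridge_on_features(X, y, taylor_feature_map(d=2, degree=4), lam=1.0))
```

    With truncation degree $M$ the basis has $\binom{M+d}{d} = O(M^d)$ functions, so quadratic-cost updates in this explicit space are consistent with the $O((\log n)^{2d})$ per-round complexity quoted above when $M$ grows like $\log n$.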