
23,953 research outputs found

Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP)

Author: Nickisch Hannes
Wilson Andrew Gordon
Publication date: 03/03/2015

We introduce a new structured kernel interpolation (SKI) framework, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs). SKI methods produce kernel approximations for fast computations through kernel interpolation. The SKI framework clarifies how the quality of an inducing point approach depends on the number of inducing (aka interpolation) points, the interpolation strategy, and the GP covariance kernel. SKI also provides a mechanism for creating new scalable kernel methods by choosing different kernel interpolation strategies. Using SKI with local cubic kernel interpolation, we introduce KISS-GP, which 1) is more scalable than inducing point alternatives, 2) naturally enables Kronecker and Toeplitz algebra for substantial additional gains in scalability, without requiring any grid data, and 3) can be used for fast and expressive kernel learning. KISS-GP costs O(n) time and storage for GP inference. We evaluate KISS-GP for kernel matrix approximation, kernel learning, and natural sound modelling. Comment: 19 pages, 4 figures
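The core SKI approximation, K_XX ≈ W K_UU W^T, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it swaps in linear interpolation (two nonzero weights per row of W) for the local cubic interpolation KISS-GP uses, assumes a 1-D squared-exponential kernel, and uses a hypothetical regular grid of 200 inducing points. Because the grid is regular and the kernel stationary, K_UU is Toeplitz, so it is applied with an FFT instead of being formed densely.

```python
import numpy as np

def rbf(a, b, ell=0.5):
    # Squared-exponential kernel exp(-(a - b)^2 / (2 ell^2)) for 1-D inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def interp_weights(x, grid):
    # n x m interpolation matrix W with two nonzeros per row (linear
    # interpolation onto a regular grid). Stored dense here for brevity;
    # a scalable version would use a sparse matrix.
    h = grid[1] - grid[0]
    idx = np.clip(((x - grid[0]) // h).astype(int), 0, len(grid) - 2)
    t = (x - grid[idx]) / h
    W = np.zeros((len(x), len(grid)))
    rows = np.arange(len(x))
    W[rows, idx] = 1.0 - t
    W[rows, idx + 1] = t
    return W

def toeplitz_matvec(first_col, v):
    # Symmetric Toeplitz matrix-vector product in O(m log m): embed the
    # m x m matrix in a (2m - 2) circulant and multiply with the FFT.
    m = len(first_col)
    c = np.concatenate([first_col, first_col[-2:0:-1]])
    padded = np.concatenate([v, np.zeros(m - 2)])
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(padded)))[:m]

def ski_matvec(W, grid, v, ell=0.5):
    # K_SKI v = W (K_UU (W^T v)); only the first column of K_UU is needed.
    first_col = rbf(grid[:1], grid, ell)[0]
    return W @ toeplitz_matvec(first_col, W.T @ v)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 500))   # training inputs
grid = np.linspace(-0.1, 10.1, 200)        # hypothetical inducing grid
W = interp_weights(x, grid)
v = rng.standard_normal(len(x))

exact = rbf(x, x) @ v                      # dense O(n^2) product
approx = ski_matvec(W, grid, v)            # structured SKI product
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative error: {rel_err:.2e}")
```

With W stored sparsely, each product costs roughly O(n + m log m), which is where the scalability comes from; the dense W above only keeps the sketch short.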

arXiv.org e-Print Archive

CiteSeerX

Measuring Blood Glucose Concentrations in Photometric Glucometers Requiring Very Small Sample Volumes

Author: Demitri Nevine
Zoubir Abdelhak M.
Publication date: 01/01/2016

Glucometers are an important self-monitoring tool for diabetes patients and therefore must exhibit high accuracy as well as good usability. Based on an invasive, photometric measurement principle that drastically reduces the volume of the blood sample needed from the patient, we present a framework that is capable of dealing with small blood samples while maintaining the required accuracy. The framework consists of two major parts: 1) image segmentation; and 2) convergence detection. Step 1) is based on iterative mode-seeking methods to estimate the intensity value of the region of interest. We present several variations of these methods and give theoretical proofs of their convergence. Our approach is able to deal with changes in the number and position of clusters without any prior knowledge. Furthermore, we propose a method based on sparse approximation to decrease the computational load while maintaining accuracy. Step 2) is achieved by employing temporal tracking and prediction, thereby decreasing the measurement time and thus improving usability. Our framework is validated on several real data sets with different characteristics. We show that we are able to estimate the underlying glucose concentration from much smaller blood samples than the current state of the art, with sufficient accuracy according to the most recent ISO standards, and to reduce measurement time significantly compared to state-of-the-art methods.
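As an illustration of the iterative mode-seeking idea in Step 1), here is a generic Gaussian-kernel mean-shift iteration on pixel intensities. It is a hedged sketch, not one of the variants the authors propose: the bandwidth, the percentile seeds, and the synthetic intensity data are assumptions chosen for illustration. Running the iteration from several seeds lets separate intensity clusters be found without fixing their number in advance.

```python
import numpy as np

def mean_shift_mode(values, start, bandwidth=5.0, tol=1e-6, max_iter=500):
    # Gaussian-kernel mean shift in 1-D: move the estimate to the
    # kernel-weighted mean of the samples until it stops moving.
    m = float(start)
    for _ in range(max_iter):
        w = np.exp(-0.5 * ((values - m) / bandwidth) ** 2)
        m_new = float(np.sum(w * values) / np.sum(w))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# Synthetic intensities (not data from the paper): a darker region of
# interest around 120 and a brighter background around 200.
rng = np.random.default_rng(1)
pixels = np.concatenate([rng.normal(120.0, 4.0, 800),
                         rng.normal(200.0, 4.0, 3000)])

# Hypothetical seeding: start from a few intensity percentiles.
seeds = np.percentile(pixels, [5, 50, 95])
modes = [mean_shift_mode(pixels, s) for s in seeds]
print(np.round(modes, 1))
```

Seeds that fall in the same cluster converge to the same mode, so the distinct modes can be read off afterwards; the lowest one plays the role of the region-of-interest intensity in this toy setup.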

arXiv.org e-Print Archive

TUbiblio

Understanding and Comparing Scalable Gaussian Process Regression for Big Data

Author: Cai Jianfei
Liu Haitao
Ong Yew-Soon
Wang Yi
Publication date: 01/01/2018

As a non-parametric Bayesian model that produces informative predictive distributions, the Gaussian process (GP) has been widely used in various fields, such as regression, classification and optimization. However, the cubic complexity of the standard GP leads to poor scalability, which poses challenges in the era of big data. Hence, various scalable GPs have been developed in the literature to improve scalability while retaining desirable prediction accuracy. This paper investigates the methodological characteristics and performance of representative global and local scalable GPs, including sparse approximations and local aggregations, from four main perspectives: scalability, capability, controllability and robustness. Numerical experiments on two toy examples and five real-world datasets with up to 250K points yield the following findings. In terms of scalability, most of the scalable GPs have a time complexity that is linear in the training size. In terms of capability, the sparse approximations capture long-term spatial correlations, whereas the local aggregations capture local patterns but suffer from over-fitting in some scenarios. In terms of controllability, the performance of sparse approximations can be improved simply by increasing the inducing size, but this is not the case for local aggregations. In terms of robustness, local aggregations are robust to various hyperparameter initializations due to their local attention mechanism. Finally, we highlight that a proper hybrid of global and local scalable GPs may be a promising way to improve both model capability and scalability for big data. Comment: 25 pages, 15 figures, preprint submitted to KB
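To make the controllability finding concrete, the following sketch compares one sparse approximation, the subset-of-regressors (SoR) predictive mean, with the exact GP mean on synthetic 1-D data. The kernel, noise level, evenly spaced inducing inputs and data are assumptions for illustration; the paper evaluates several sparse and local methods, and this is not its code. The exact mean needs an O(n^3) solve, while SoR only solves an m x m system built in O(n m^2) time.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel for 1-D inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def exact_gp_mean(x, y, xs, noise=0.1):
    # Full GP predictive mean: O(n^3) solve with the n x n kernel matrix.
    K = rbf(x, x) + noise**2 * np.eye(len(x))
    return rbf(xs, x) @ np.linalg.solve(K, y)

def sor_gp_mean(x, y, xs, z, noise=0.1):
    # Subset-of-regressors predictive mean with m inducing inputs z:
    # mu_* = K_*u (noise^2 K_uu + K_ux K_xu)^{-1} K_ux y, O(n m^2) time.
    Kux = rbf(z, x)
    A = noise**2 * rbf(z, z) + Kux @ Kux.T + 1e-9 * np.eye(len(z))  # small jitter
    return rbf(xs, z) @ np.linalg.solve(A, Kux @ y)

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 10.0, 500)
y = np.sin(x) + 0.1 * rng.standard_normal(len(x))
xs = np.linspace(0.0, 10.0, 101)
full = exact_gp_mean(x, y, xs)

errors = {}
for m in (3, 15):
    z = np.linspace(0.0, 10.0, m)   # hypothetical evenly spaced inducing inputs
    errors[m] = np.max(np.abs(sor_gp_mean(x, y, xs, z) - full))
print(errors)
```

Increasing the number of inducing inputs from 3 to 15 brings the sparse mean much closer to the exact one, which mirrors the "simply increase the inducing size" observation for sparse approximations.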

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Scalable Sparse Cox's Regression for Large-Scale Survival Data via Broken Adaptive Ridge

Author: Kawaguchi Eric S.
Li Gang
Liu Zhenqiu
Suchard Marc A.
Publication venue: Wiley
Publication date: 25/07/2018

This paper develops a new scalable sparse Cox regression tool for sparse high-dimensional massive sample size (sHDMSS) survival data. The method is a local $L_0$-penalized Cox regression via repeatedly performing reweighted $L_2$-penalized Cox regression. We show that the resulting estimator enjoys the best of $L_0$- and $L_2$-penalized Cox regressions while overcoming their limitations. Specifically, the estimator is selection consistent, oracle for parameter estimation, and possesses a grouping property for highly correlated covariates. Simulation results suggest that when the sample size is large, the proposed method with pre-specified tuning parameters performs comparably to or better than some popular penalized regression methods. More importantly, because the method naturally enables adaptation of efficient algorithms for massive $L_2$-penalized optimization and does not require costly data-driven tuning parameter selection, it has a significant computational advantage for sHDMSS data, offering an average 5-fold speedup over its closest competitor in empirical studies.
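The reweighting idea can be sketched for ordinary least squares. This is a hypothetical illustration that swaps in a linear-regression loss for the Cox partial likelihood used in the paper: starting from a ridge estimate, each step solves an $L_2$-penalized problem with penalty $\lambda \sum_j \beta_j^2 / \hat\beta_j^2$, where $\hat\beta$ is the previous estimate. Coefficients that are already small are therefore penalized ever more heavily and are driven to zero. The penalty values, iteration limits and simulated data are assumptions.

```python
import numpy as np

def bar_linear(X, y, lam=5.0, xi=1.0, max_iter=100, tol=1e-10):
    # Broken-adaptive-ridge style iteration for linear least squares
    # (a stand-in for the Cox partial likelihood used in the paper).
    p = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    beta = np.linalg.solve(XtX + xi * np.eye(p), Xty)  # ridge starting value
    for _ in range(max_iter):
        # Minimising ||y - X b||^2 + lam * sum_j b_j^2 / beta_j^2 gives
        # b = G (G X'X G + lam I)^{-1} G X'y with G = diag(beta); this form
        # avoids dividing by coefficients that are shrinking to zero.
        G = np.diag(beta)
        new = G @ np.linalg.solve(G @ XtX @ G + lam * np.eye(p), G @ Xty)
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta

# Simulated sparse problem: only the first three coefficients are nonzero.
rng = np.random.default_rng(3)
n, p = 200, 10
X = rng.standard_normal((n, p))
true_beta = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ true_beta + 0.5 * rng.standard_normal(n)

beta_hat = bar_linear(X, y)
selected = np.flatnonzero(np.abs(beta_hat) > 1e-6)
print(selected, np.round(beta_hat[:3], 2))
```

Each step is only a ridge-type solve, which is why such a scheme can reuse efficient $L_2$-penalized solvers, as the abstract notes.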

arXiv.org e-Print Archive

eScholarship - University of California