
Supporting User-Defined Functions on Uncertain Data

Author: Adler R. J.
Antova L.
Bishop C. M.
Chaudhuri S.
Dalvi N. N.
Denny M.
Deshpande A.
Gibbs A. L.
Girard A.
Kurose J. F.
McLachlan G.
Nguyen D. T.
O'Hagan A.
Ranganathan A.
Rasmussen C. E.
Sen P.
Singh S.
Szalay A. S.
Tran T.
Tran T. T. L.
Publication date: 01/01/2013

Uncertain data management has become crucial in many sensing and scientific applications. As user-defined functions (UDFs) become widely used in these applications, an important task is to capture result uncertainty for queries that evaluate UDFs on uncertain data. In this work, we provide a general framework for supporting UDFs on uncertain data. Specifically, we propose a learning approach based on Gaussian processes (GPs) to compute approximate output distributions of a UDF when evaluated on uncertain input, with guaranteed error bounds. We also devise an online algorithm to compute such output distributions, which employs a suite of optimizations to improve accuracy and performance. Our evaluation using both real-world and synthetic functions shows that our proposed GP approach can outperform the state-of-the-art sampling approach with up to two orders of magnitude improvement for a variety of UDFs.
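For orientation, the sampling baseline that the abstract compares against can be sketched as plain Monte Carlo: draw inputs from the uncertain input's distribution, evaluate the UDF on each draw, and read probabilities off the empirical output distribution. This is an illustrative sketch, not the paper's GP method. The Gaussian input model and the names `sample_udf_outputs`, `empirical_cdf`, and `square` are assumptions made for this example.

```python
import bisect
import random


def sample_udf_outputs(udf, mean, std, num_samples, seed=0):
    """Evaluate `udf` on Gaussian draws of the uncertain input (assumed model)."""
    rng = random.Random(seed)
    # Sorting once makes later CDF queries a binary search.
    return sorted(udf(rng.gauss(mean, std)) for _ in range(num_samples))


def empirical_cdf(sorted_outputs, threshold):
    """Fraction of sampled outputs that are <= threshold."""
    return bisect.bisect_right(sorted_outputs, threshold) / len(sorted_outputs)


def square(x):
    """Hypothetical stand-in for an expensive user-defined function."""
    return x * x


outputs = sample_udf_outputs(square, mean=0.0, std=1.0, num_samples=20000, seed=0)
prob_below_one = empirical_cdf(outputs, 1.0)
print(f"Estimated P(udf(X) <= 1) = {prob_below_one:.3f}")
```

Every sample costs one UDF call, which is why a surrogate learned from far fewer evaluations can be attractive when the UDF is expensive.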

CiteSeerX

Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

Author: Cichocki A.
Lee N.
Mandic D.
Oseledets I. V.
Phan A-H.
Sugiyama M.
Zhao Q.
Publication venue: 'Now Publishers'
Publication date: 01/01/2017

Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and in the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions.
Comment: 232 pages
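The storage argument behind the tensor train format can be made concrete with a small sketch. In TT form, each entry of an order-N tensor is a product of N small matrices, one per mode, so only the cores are stored. The example below is an assumption chosen for illustration: it uses the tensor whose entry is the sum of its indices, which has TT rank 2, and the helper names `sum_tensor_cores`, `tt_element`, and `tt_parameter_count` are hypothetical.

```python
def sum_tensor_cores(num_modes, mode_size):
    """TT cores for T[i1, ..., iN] = i1 + ... + iN (TT rank 2).

    cores[k][i] is the matrix selected by index i along mode k.
    """
    first = [[[float(i), 1.0]] for i in range(mode_size)]                   # 1 x 2
    middle = [[[1.0, 0.0], [float(i), 1.0]] for i in range(mode_size)]      # 2 x 2
    last = [[[1.0], [float(i)]] for i in range(mode_size)]                  # 2 x 1
    return [first] + [middle] * (num_modes - 2) + [last]


def tt_element(cores, index):
    """Evaluate one tensor entry by chaining row-vector times matrix products."""
    row = [1.0]
    for core, i in zip(cores, index):
        matrix = core[i]
        row = [
            sum(row[a] * matrix[a][b] for a in range(len(row)))
            for b in range(len(matrix[0]))
        ]
    return row[0]


def tt_parameter_count(cores):
    """Number of stored values: mode size x left rank x right rank, per core."""
    return sum(len(core) * len(core[0]) * len(core[0][0]) for core in cores)


cores = sum_tensor_cores(num_modes=10, mode_size=4)
index = (3, 0, 1, 2, 3, 3, 0, 1, 2, 1)
print(tt_element(cores, index), sum(index))      # both equal 16
print(tt_parameter_count(cores), 4 ** 10)        # 144 stored values vs 1048576 entries
```

For fixed ranks the stored parameter count grows linearly in the number of modes, while the full tensor grows exponentially, which is the sense in which low-rank tensor networks alleviate the curse of dimensionality.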

arXiv.org e-Print Archive

Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

Author: Cichocki A.
Phan A-H.
Zhao Q.
Lee N.
Oseledets I. V.
Sugiyama M.
Mandic D.
Publication date: 01/01/2017

arXiv.org e-Print Archive

FigShare

Large-scale Nonlinear Variable Selection via Kernel Random Features

Author: A Beck
A Rakotomamonjy
B Schölkopf
DX Zhou
F Bach
GI Allen
I Guyon
J Weston
K Muandet
L Rosasco
M Yamada
P Gurram
R Kohavi
S Maldonado
S Mosci
T Hastie
T Hastie
V Bolón-Canedo
V Bolón-Canedo
V Koltchinskii
Y Lin
Publication date: 01/09/2018

We propose a new method for input variable selection in nonlinear regression. The method is embedded into a kernel regression machine that can model general nonlinear functions, not being a priori limited to additive models. This is the first kernel-based variable selection method applicable to large datasets. It sidesteps the typical poor scaling properties of kernel methods by mapping the inputs into a relatively low-dimensional space of random features. The algorithm discovers the variables relevant for the regression task together with learning the prediction model through learning the appropriate nonlinear random feature maps. We demonstrate the outstanding performance of our method on a set of large-scale synthetic and real datasets.
Comment: Final version for proceedings of ECML/PKDD 201
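The scaling idea mentioned in the abstract, replacing a kernel with an explicit low-dimensional random feature map, can be illustrated with standard random Fourier features for the Gaussian (RBF) kernel. This is a minimal sketch of that generic technique only. It does not implement the paper's variable selection step, which learns the feature map, and the names `rbf_kernel` and `make_feature_map` and all parameter values are assumptions for the example.

```python
import math
import random


def rbf_kernel(x, y, lengthscale):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * lengthscale ** 2))


def make_feature_map(dim, num_features, lengthscale, seed=0):
    """Random Fourier features: z(x) = sqrt(2/D) * cos(W x + b)."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 1/lengthscale^2), phases from U[0, 2*pi].
    weights = [
        [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
        for _ in range(num_features)
    ]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return feature_map


phi = make_feature_map(dim=3, num_features=2000, lengthscale=1.0, seed=0)
x = [0.2, -0.4, 1.0]
y = [0.0, 0.3, 0.5]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = rbf_kernel(x, y, 1.0)
print(f"approximate kernel = {approx:.3f}, exact kernel = {exact:.3f}")
```

Once inputs are mapped through `phi`, a linear model on the features stands in for the kernel machine, so training cost depends on the number of features rather than on the square or cube of the number of data points.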

arXiv.org e-Print Archive

Fast matrix computations for functional additive models

Author: Barthelme Simon
Publication date: 01/01/2014

It is common in functional data analysis to look at a set of related functions: a set of learning curves, a set of brain signals, a set of spatial maps, etc. One way to express relatedness is through an additive model, whereby each individual function $g_i(x)$ is assumed to be a variation around some shared mean $f(x)$. Gaussian processes provide an elegant way of constructing such additive models, but suffer from computational difficulties arising from the matrix operations that need to be performed. Recently Heersink & Furrer have shown that functional additive models give rise to covariance matrices that have a specific form they called quasi-Kronecker (QK), whose inverses are relatively tractable. We show that under additional assumptions the two-level additive model leads to a class of matrices we call restricted quasi-Kronecker (rQK), which enjoy many interesting properties. In particular, we formulate matrix factorisations whose complexity scales only linearly in the number of functions in the latent field, an enormous improvement over the cubic scaling of naïve approaches. We describe how to leverage the properties of rQK matrices for inference in Latent Gaussian Models.
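One way to see where Kronecker-type structure comes from is to write out the covariance of a simple two-level additive model. The derivation below is a sketch under assumptions not stated in the abstract: independent zero-mean Gaussian process priors on the shared mean and on each deviation, with the hypothetical symbols $d_i$, $k_f$, $k_d$, $K_f$, $K_d$, and $m$ functions observed on a common grid of $n$ points. It is not claimed to match the paper's exact rQK definition.

```latex
\begin{align}
g_i(x) &= f(x) + d_i(x), \qquad
  f \sim \mathcal{GP}(0, k_f), \quad
  d_i \sim \mathcal{GP}(0, k_d) \ \text{independently}, \\
\operatorname{Cov}\bigl(g_i(x),\, g_j(x')\bigr)
  &= k_f(x, x') + \delta_{ij}\, k_d(x, x'), \\
\Sigma &= \mathbf{1}_m \mathbf{1}_m^{\top} \otimes K_f + I_m \otimes K_d
  \;\in\; \mathbb{R}^{mn \times mn}.
\end{align}
```

A dense factorisation of $\Sigma$ costs $O\bigl((mn)^3\bigr)$, which is cubic in the number of functions $m$. Exploiting the repeated block structure is what makes scaling that is linear in $m$ plausible.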

arXiv.org e-Print Archive

CiteSeerX

Hal-Diderot

Regularizing Portfolio Optimization

Author: Acerbi C
Acerbi C Nordio C Sirtori C
Bengio Y
Bertsekas D P
Bordes A
Bottou L
Bouchaud J-Ph
Burda Z
Chopra V K
DeMiguel V
Elton E J
Embrechts P
Frahm G
Frahm G Memmel Ch
Gulyas N Kondor I
Imre Kondor
Jobson J D
Jorion P
Kempf A
Kondor I Varga-Haszonits I
Macrae R
Markowitz H
Morgan J P Reuters Riskmetrics
Perez-Cruz F
Potters M
Rockafellar R T
Schölkopf B
Schölkopf B
Susanne Still
Tibshirani R
Vanderbei R J
Vapnik V
Vapnik V
Vapnik V
Varga-Haszonits I
Publication venue: 'IOP Publishing'
Publication date: 09/11/2009

The optimization of large portfolios displays an inherent instability to estimation error. This poses a fundamental problem, because solutions that are not stable under sample fluctuations may look optimal for a given sample, but are, in effect, very far from optimal with respect to the average risk. In this paper, we approach the problem from the point of view of statistical learning theory. The occurrence of the instability is intimately related to over-fitting, which can be avoided using known regularization methods. We show how regularized portfolio optimization with the expected shortfall as a risk measure is related to support vector regression. The budget constraint dictates a modification. We present the resulting optimization problem and discuss the solution. The L2 norm of the weight vector is used as a regularizer, which corresponds to a diversification "pressure". This means that diversification, besides counteracting downward fluctuations in some assets by upward fluctuations in others, is also crucial because it improves the stability of the solution. The approach we provide here allows for the simultaneous treatment of optimization and diversification in one framework that enables the investor to trade off between the two, depending on the size of the available data set.
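A regularized expected-shortfall problem of the kind described can be written, as a sketch, using the standard linear-programming form of expected shortfall with an added L2 penalty. All symbols here are assumptions introduced for illustration and may differ from the paper's notation: $x_t$ is the vector of asset returns in period $t$ of $T$, $w$ the portfolio weights, $\beta$ the confidence level, $\epsilon$ the auxiliary threshold variable, $u_t$ slack variables, and $C$ the trade-off parameter.

```latex
\begin{align}
\min_{w,\, \epsilon,\, u} \quad
  & \tfrac{1}{2}\,\lVert w \rVert_2^2
    + C \Bigl[\, \epsilon + \frac{1}{(1-\beta)\,T} \sum_{t=1}^{T} u_t \Bigr] \\
\text{subject to} \quad
  & u_t \ge 0, \qquad u_t \ge -\epsilon - w^{\top} x_t, \qquad t = 1, \dots, T, \\
  & \sum_{i} w_i = 1 .
\end{align}
```

The bracketed term is the sample expected shortfall of the loss $-w^{\top} x_t$, the quadratic term is the diversification "pressure", and the final constraint is the budget constraint. A larger $C$ puts more weight on the in-sample risk estimate, a smaller $C$ on stability.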

arXiv.org e-Print Archive

ELTE Digital Institutional Repository (EDIT)

Bayesian optimization for materials design

Author: A Booker
A Forrester
AB Gelman
AIJ Forrester
B Ankenman
BE Stuckman
CE Rasmussen
D Huang
D Huang
David Ginsbourger
Diana M. Negoescu
DR Jones
HJ Kushner
J Bect
J Knowles
J Mockus
J Villemonteix
J Xie
LP Kaelbling
Noel Cressie
PI Frazier
PI Frazier
PI Frazier
R Waeber
RA Howard
RS Sutton
Sethuraman Sankaran
TJ Santner
W Scott
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/06/2015

We introduce Bayesian optimization, a technique developed for optimizing time-consuming engineering simulations and for fitting machine learning models on large datasets. Bayesian optimization guides the choice of experiments during materials design and discovery to find good material designs in as few experiments as possible. We focus on the case when materials designs are parameterized by a low-dimensional vector. Bayesian optimization is built on a statistical technique called Gaussian process regression, which allows predicting the performance of a new design based on previously tested designs. After providing a detailed introduction to Gaussian process regression, we introduce two Bayesian optimization methods: expected improvement, for design problems with noise-free evaluations; and the knowledge-gradient method, which generalizes expected improvement and may be used in design problems with noisy evaluations. Both methods are derived using a value-of-information analysis, and enjoy one-step Bayes-optimality.
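The expected improvement criterion has a closed form when the predicted performance of a design is Gaussian, which makes it easy to sketch. The example below assumes maximization and noise-free evaluations, and it takes the posterior mean and standard deviation as given rather than computing them with Gaussian process regression. The candidate designs and their numbers are made up for illustration.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_so_far):
    """E[max(Y - best_so_far, 0)] for Y ~ N(mean, std^2), maximization."""
    if std <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    return (mean - best_so_far) * normal_cdf(z) + std * normal_pdf(z)


# Hypothetical posterior (mean, std) for three untested designs.
candidates = {"A": (1.0, 0.1), "B": (0.8, 0.6), "C": (0.5, 0.2)}
best_so_far = 1.0

scores = {
    name: expected_improvement(mean, std, best_so_far)
    for name, (mean, std) in candidates.items()
}
next_design = max(scores, key=scores.get)
print(scores, next_design)
```

Design B is selected even though its predicted mean is below the current best, because its larger uncertainty leaves more room for improvement. This is the exploration and exploitation balance that the criterion encodes.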

arXiv.org e-Print Archive