bsamGP: An R Package for Bayesian Spectral Analysis Models Using Gaussian Process Priors
The Bayesian spectral analysis model (BSAM) is a powerful tool for semiparametric regression and density estimation based on the spectral representation of Gaussian process priors. The bsamGP package for R provides a comprehensive set of programs implementing fully Bayesian semiparametric methods based on BSAM. Currently, bsamGP includes semiparametric additive models for regression, generalized models, and density estimation. In particular, bsamGP handles constrained regression models with monotone, convex/concave, S-shaped, and U-shaped functions by modeling the derivatives of regression functions as squared Gaussian processes. bsamGP also contains Bayesian model selection procedures for testing the adequacy of a parametric model relative to a nonspecific semiparametric alternative and for testing the existence of a shape restriction. To maximize computational efficiency, the posterior sampling algorithms of all models are carried out in compiled Fortran code. The package is illustrated through Bayesian semiparametric analyses of synthetic and benchmark data sets.
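The shape-restriction device mentioned above can be illustrated outside the package: if the derivative of a regression function is modeled as a squared Gaussian process, the function itself is monotone by construction. A minimal NumPy sketch of this idea (not using bsamGP; the kernel choice and all names are illustrative):

```python
import numpy as np

def rbf_kernel(x, lengthscale=0.3, var=1.0):
    # Squared-exponential covariance matrix on a 1-D grid.
    d = x[:, None] - x[None, :]
    return var * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
K = rbf_kernel(x) + 1e-8 * np.eye(x.size)  # jitter for numerical stability

# Draw g ~ GP(0, K); its square is a nonnegative "derivative" process.
g = rng.multivariate_normal(np.zeros(x.size), K)
deriv = g ** 2

# f(x) = integral_0^x g(u)^2 du, approximated with the trapezoid rule;
# squaring forces deriv >= 0, so f is monotone nondecreasing.
dx = x[1] - x[0]
f = np.concatenate([[0.0], np.cumsum(0.5 * (deriv[1:] + deriv[:-1]) * dx)])
```

Convexity and S-shape constraints follow the same pattern by placing the squared process on a higher derivative.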
Quality vs. Quantity of Data in Contextual Decision-Making: Exact Analysis under Newsvendor Loss
When building datasets, one needs to invest time, money and energy to either
aggregate more data or to improve their quality. The most common practice
favors quantity over quality without necessarily quantifying the trade-off that
emerges. In this work, we study data-driven contextual decision-making and the
performance implications of quality and quantity of data. We focus on
contextual decision-making with a Newsvendor loss. This loss is that of a
central capacity planning problem in Operations Research, but also that
associated with quantile regression. We consider a model in which outcomes
observed in similar contexts have similar distributions and analyze the
performance of a classical class of kernel policies that weight data according
to their similarity in a contextual space. We develop a series of results that
lead to an exact characterization of the worst-case expected regret of these
policies. This exact characterization applies to any sample size and any
observed contexts. The model we develop is flexible and captures the case of
partially observed contexts. This exact analysis unveils new
structural insights into the learning behavior of uniform kernel methods: i) the
specialized analysis yields very large improvements in the quantification of
performance compared to state-of-the-art general-purpose bounds; ii) we show an
important non-monotonicity of performance as a function of data size that is not
captured by previous bounds; and iii) we show that in some regimes, a small
increase in the quality of the data can dramatically reduce the number of
samples required to reach a performance target. All in all, our work
demonstrates that it is possible to quantify precisely the interplay
of data quality, data quantity, and performance in a central problem class. It
also highlights the need for problem-specific bounds in order to understand the
trade-offs at play.
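A concrete instance of the policy class studied here: a uniform-kernel newsvendor decision keeps only observations whose context lies within a bandwidth of the target context and orders the critical-ratio quantile of their outcomes, which minimizes the empirical newsvendor loss on the selected sample. A hedged Python sketch (the bandwidth, the fallback rule, and all names are illustrative, not from the paper):

```python
import numpy as np

def uniform_kernel_newsvendor(contexts, demands, x, bandwidth, underage, overage):
    """Order quantity at context x under a uniform (box) kernel policy.

    Only samples with ||context - x|| <= bandwidth receive positive weight;
    the decision is the empirical critical-ratio quantile of their demands.
    """
    contexts = np.atleast_2d(contexts)
    demands = np.asarray(demands)
    critical_ratio = underage / (underage + overage)  # target quantile level
    mask = np.linalg.norm(contexts - np.asarray(x), axis=1) <= bandwidth
    if not mask.any():
        # Illustrative fallback: use all data when no context is close enough.
        return float(np.quantile(demands, critical_ratio))
    return float(np.quantile(demands[mask], critical_ratio))

# Toy usage: demand grows with a 1-D context, so the local quantile
# near x = 0.8 sits well above the unconditional one.
rng = np.random.default_rng(1)
ctx = rng.uniform(0.0, 1.0, size=(500, 1))
dem = 10 * ctx[:, 0] + rng.normal(0.0, 1.0, size=500)
q = uniform_kernel_newsvendor(ctx, dem, x=[0.8], bandwidth=0.1,
                              underage=3.0, overage=1.0)
```

Shrinking the bandwidth trades bias (contexts become more similar) against variance (fewer samples receive weight), which is the quality/quantity tension the abstract analyzes.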
Estimation of an Order Book Dependent Hawkes Process for Large Datasets
A point process for event arrivals in high frequency trading is presented.
The intensity is the product of a Hawkes process and high dimensional functions
of covariates derived from the order book. Conditions for stationarity of the
process are stated. An algorithm is presented to estimate the model even in the
presence of billions of data points, possibly mapping covariates into a high
dimensional space. The large sample size can be common for high frequency data
applications using multiple liquid instruments. Convergence of the algorithm is
shown, consistency results are established under weak conditions, and a test
statistic to assess out-of-sample performance of different model specifications
is suggested. The methodology is applied to the study of four stocks that trade
on the New York Stock Exchange (NYSE). The out-of-sample testing procedure
suggests that capturing the nonlinearity of the order book information adds
value to the self-exciting nature of high frequency trading events.
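The intensity structure described above can be sketched as a self-exciting Hawkes term multiplied by a function of order-book covariates. A small Python sketch, assuming an exponential excitation kernel and a queue-imbalance covariate (both are illustrative choices, not the paper's specification); for this kernel, a branching ratio alpha/beta < 1 is the usual stationarity condition for the Hawkes part:

```python
import numpy as np

def hawkes_intensity(t, event_times, mu, alpha, beta):
    # Baseline plus exponentially decaying excitation from past events:
    # lambda_H(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).
    past = event_times[event_times < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

def covariate_factor(bid_size, ask_size):
    # Illustrative order-book covariate: a positive function of the
    # queue imbalance (bid - ask) / (bid + ask); more bid pressure
    # scales the arrival intensity up.
    imbalance = (bid_size - ask_size) / (bid_size + ask_size)
    return np.exp(imbalance)

mu, alpha, beta = 0.2, 0.8, 1.5
assert alpha / beta < 1  # branching ratio below one: Hawkes part is stationary

events = np.array([0.1, 0.4, 0.9])
lam = hawkes_intensity(1.0, events, mu, alpha, beta) * covariate_factor(120, 80)
```

For billions of events, the exponential kernel matters computationally: the excitation sum admits the usual recursive update per event, so the intensity can be evaluated in a single pass rather than by re-summing history.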