7,906 research outputs found

Efficient posterior sampling for high-dimensional imbalanced logistic regression

Authors: Dunson David; Lu Jianfeng; Sachs Matthias; Sen Deborshee
Publication date: 14/11/2019

High-dimensional data are routinely collected in many areas. We are particularly interested in Bayesian classification models in which one or more variables are imbalanced. Current Markov chain Monte Carlo algorithms for posterior computation are inefficient as n and/or p increase due to worsening time per step and mixing rates. One strategy is to use a gradient-based sampler to improve mixing while using data sub-samples to reduce per-step computational complexity. However, usual sub-sampling breaks down when applied to imbalanced data. Instead, we generalize piecewise deterministic Markov chain Monte Carlo algorithms to include importance-weighted and mini-batch sub-sampling. These approaches maintain the correct stationary distribution with arbitrarily small sub-samples, and substantially outperform current competitors. We provide theoretical support and illustrate gains in simulated and real data applications.

Comment: 4 figures

arXiv.org e-Print Archive

University of Birmingham Research Portal

PubMed Central
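
As a rough illustration of the importance-weighted sub-sampling idea mentioned in the abstract above, the sketch below builds an unbiased estimate of a logistic-regression log-likelihood gradient from a few data points drawn with unequal probabilities. It is not the authors' piecewise deterministic sampler. It only shows the reweighting step, and every name in it (`importance_weights`, `minority_boost`, `subsampled_gradient`) is a hypothetical choice made for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def per_point_gradient(theta, x, y):
    """Gradient of the log-likelihood of one observation, with y in {0, 1}."""
    residual = y - sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return [residual * xi for xi in x]


def full_gradient(theta, X, Y):
    """Exact gradient: the sum of all per-point gradients."""
    grad = [0.0] * len(theta)
    for x, y in zip(X, Y):
        for j, g in enumerate(per_point_gradient(theta, x, y)):
            grad[j] += g
    return grad


def importance_weights(Y, minority_boost=10.0):
    """Assumed weighting scheme: draw the rare class (y == 1) more often."""
    raw = [minority_boost if y == 1 else 1.0 for y in Y]
    total = sum(raw)
    return [r / total for r in raw]


def subsampled_gradient(theta, X, Y, probs, batch_size, rng):
    """Average of g_i / p_i over indices drawn with probability p_i.

    Dividing by p_i cancels the sampling bias, so the expected value of
    the estimate equals the full-data gradient for any batch size.
    """
    idx = rng.choices(range(len(Y)), weights=probs, k=batch_size)
    grad = [0.0] * len(theta)
    for i in idx:
        g = per_point_gradient(theta, X[i], Y[i])
        for j in range(len(theta)):
            grad[j] += g[j] / (probs[i] * batch_size)
    return grad
```

With uniform sampling, a small batch from imbalanced data often contains no minority-class points at all, which makes the estimate very noisy. Raising the draw probability of the rare class and dividing it back out keeps the estimate unbiased while reducing that noise.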

Scalable Audience Reach Estimation in Real-time Online Advertising

Authors: Dasdan Ali; Foldes Peter; Jalali Ali; Kolay Santanu
Publication date: 13/05/2013

Online advertising has emerged in recent years as one of the most efficient methods of advertising. Yet advertisers are concerned about the efficiency of their online advertising campaigns and consequently would like to restrict their ad impressions to certain websites and/or certain audience groups. These restrictions, known as targeting criteria, limit reachability in exchange for better performance. This trade-off between reachability and performance creates a need for a forecasting system that can quickly estimate it with good accuracy. Designing such a system is challenging due to (a) the huge amount of data to process and (b) the need for fast and accurate estimates. In this paper, we propose a distributed, fault-tolerant system that can generate such estimates quickly and with good accuracy. The main idea is to keep a small representative sample in memory across multiple machines and formulate the forecasting problem as queries against the sample. The key challenge is to find the best strata across the past data and perform multivariate stratified sampling while ensuring fuzzy fall-back to cover the small minorities. Our results show a significant improvement over the uniform and simple stratified sampling strategies that are currently widely used in the industry.

arXiv.org e-Print Archive

CiteSeerX

Crossref
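
The sample-and-query idea in the abstract above can be sketched in a few lines. The example below is a single-machine, single-attribute simplification, not the distributed system the paper describes. The per-stratum minimum is an assumed stand-in for covering small groups and is not the paper's "fuzzy fall-back". The names `stratified_sample`, `estimate_reach`, and `min_per_stratum` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, rate, min_per_stratum, rng):
    """Sample each stratum at `rate`, keeping at least `min_per_stratum` rows.

    Returns (record, weight) pairs, where weight = stratum size / sample size,
    so each sampled row stands in for `weight` rows of the full data.
    Assumes min_per_stratum >= 1 so that no stratum ends up empty.
    """
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample = []
    for rows in strata.values():
        k = min(len(rows), max(min_per_stratum, round(rate * len(rows))))
        weight = len(rows) / k
        sample.extend((rec, weight) for rec in rng.sample(rows, k))
    return sample


def estimate_reach(sample, matches):
    """Estimate how many full-data records satisfy the targeting predicate."""
    return sum(weight for rec, weight in sample if matches(rec))
```

A query such as `estimate_reach(sample, lambda r: r["country"] == "IS")` then runs against the in-memory sample only. Under uniform sampling at the same rate, a stratum with a handful of records could easily be missed entirely, which is the failure the minimum guards against.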

Anti-aliasing with stratified B-spline filters of arbitrary degree

Authors: Anderson J.; Gamito M. N.; Glassner A. S.; Press W. H.; Stark M.; Yellot J. I.
Publication venue: Wiley
Publication date: 01/06/2006

A simple and elegant method is presented to perform anti-aliasing in ray-traced images. The method uses stratified sampling to reduce the occurrence of artefacts in an image and features a B-spline filter to compute the final luminous intensity at each pixel. The method is scalable through the specification of the filter degree. A B-spline filter of degree one amounts to a simple anti-aliasing scheme with box filtering. Increasing the degree of the B-spline generates progressively smoother filters. Computation of the filter values is done in a recursive way, as part of a sequence of Newton-Raphson iterations, to obtain the optimal sample positions in screen space. The proposed method can perform anti-aliasing in both space and time, the latter being more commonly known as motion blur. We show an application of the method to the ray casting of implicit procedural surfaces.

Crossref

White Rose Research Online
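
The stratified sampling step in the abstract above can be illustrated with plain jittered sampling and uniform (box) weighting, the simplest case the abstract mentions. This sketch leaves out the higher-degree B-spline filters and the Newton-Raphson placement of samples, which the abstract does not spell out. The function name `render_pixel` and the `scene(x, y)` callback are assumptions made for this example.

```python
import random


def render_pixel(scene, px, py, n, rng):
    """Average n*n jittered samples inside the unit pixel at (px, py).

    The pixel is split into an n-by-n grid of sub-cells (the strata) and
    one sample is placed uniformly at random inside each sub-cell. All
    samples get equal weight, which corresponds to a box filter.
    """
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = px + (i + rng.random()) / n
            y = py + (j + rng.random()) / n
            total += scene(x, y)
    return total / (n * n)
```

Because every sub-cell receives exactly one sample, the samples cannot clump in one corner of the pixel the way fully random samples can, which lowers the variance of edge pixels for the same sample count.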

Subsampling MCMC - An introduction for the survey statistician

Authors: Dang Khue-Dung; Kohn Robert; Quiroz Matias; Tran Minh-Ngoc; Villani Mattias
Publication date: 20/09/2018

The rapid development of computing power and efficient Markov Chain Monte Carlo (MCMC) simulation algorithms have revolutionized Bayesian statistics, making it a highly practical inference method in applied work. However, MCMC algorithms tend to be computationally demanding, and are particularly slow for large datasets. Data subsampling has recently been suggested as a way to make MCMC methods scalable on massively large data, utilizing efficient sampling schemes and estimators from the survey sampling literature. These developments tend to be unknown to many survey statisticians, who traditionally work with non-Bayesian methods and rarely use MCMC. Our article explains the idea of data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a so-called pseudo-marginal MCMC approach to speeding up MCMC through data subsampling. The review is written for a survey statistician without previous knowledge of MCMC methods, since our aim is to motivate survey sampling experts to contribute to the growing Subsampling MCMC literature.

Comment: Accepted for publication in Sankhya A. Previously uploaded version contained a bug in generating the figures and references

arXiv.org e-Print Archive

OPUS - University of Technology Sydney