Efficient posterior sampling for high-dimensional imbalanced logistic regression
High-dimensional data are routinely collected in many areas. We are
particularly interested in Bayesian classification models in which one or more
variables are imbalanced. Current Markov chain Monte Carlo algorithms for
posterior computation are inefficient as the sample size $n$ and/or the number of predictors $p$ increase due to
worsening time per step and mixing rates. One strategy is to use a
gradient-based sampler to improve mixing while using data sub-samples to reduce
per-step computational complexity. However, usual sub-sampling breaks down when
applied to imbalanced data. Instead, we generalize piece-wise deterministic
Markov chain Monte Carlo algorithms to include importance-weighted and
mini-batch sub-sampling. These approaches maintain the correct stationary
distribution with arbitrarily small sub-samples, and substantially outperform
current competitors. We provide theoretical support and illustrate gains in
simulated and real data applications. Comment: 4 figures
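As a rough illustration of the importance-weighted sub-sampling idea in this abstract, the sketch below estimates the full-data gradient of a logistic log-likelihood from a small weighted sub-sample. It is not the authors' piece-wise deterministic sampler: it shows only the unbiased gradient estimate, and the function names and the `minority_boost` weighting rule are assumptions made for this example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_term(x, y, beta):
    """Gradient of one observation's logistic log-likelihood, with y in {0, 1}."""
    resid = y - sigmoid(sum(xj * bj for xj, bj in zip(x, beta)))
    return [resid * xj for xj in x]

def full_gradient(X, Y, beta):
    """Exact gradient: the sum of all per-observation terms."""
    total = [0.0] * len(beta)
    for x, y in zip(X, Y):
        for j, g in enumerate(grad_term(x, y, beta)):
            total[j] += g
    return total

def sampling_probs(Y, minority_boost=10.0):
    """Hypothetical weighting rule: rows of the rarer class are drawn more often."""
    n_pos = sum(Y)
    rare = 1 if n_pos <= len(Y) - n_pos else 0
    raw = [minority_boost if y == rare else 1.0 for y in Y]
    norm = sum(raw)
    return [r / norm for r in raw]

def subsampled_gradient(X, Y, beta, probs, m, rng):
    """Unbiased estimate of full_gradient from m draws with replacement.

    Each drawn term is divided by m * probs[i], which cancels the bias
    introduced by sampling some rows more often than others.
    """
    idx = rng.choices(range(len(Y)), weights=probs, k=m)
    est = [0.0] * len(beta)
    for i in idx:
        for j, g in enumerate(grad_term(X[i], Y[i], beta)):
            est[j] += g / (m * probs[i])
    return est
```

With uniform `probs` this reduces to ordinary mini-batch sub-sampling. With only a handful of minority rows, uniform draws often miss them entirely, which is one way the usual scheme can degrade on imbalanced data.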
Scalable Audience Reach Estimation in Real-time Online Advertising
Online advertising has emerged in recent years as one of the most efficient
methods of advertising. Yet advertisers are concerned about the efficiency of
their online advertising campaigns and, consequently, would like to restrict
their ad impressions to certain websites and/or certain audience groups. These
restrictions, known as targeting criteria, limit reachability in exchange for
better performance. This trade-off between reachability and
performance illustrates a need for a forecasting system that can quickly
predict/estimate (with good accuracy) this trade-off. Designing such a system
is challenging due to (a) the huge amount of data to process and (b) the need
for fast and accurate estimates. In this paper, we propose a distributed,
fault-tolerant system that can generate such estimates quickly and with good
accuracy. The
main idea is to keep a small representative sample in memory across multiple
machines and formulate the forecasting problem as queries against the sample.
The key challenge is to find the best strata across the past data and to
perform multivariate stratified sampling while ensuring a fuzzy fall-back that
covers small minorities. Our results show a significant improvement over the
uniform and simple stratified sampling strategies that are currently widely
used in the industry.
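The sample-then-query idea described in this abstract can be sketched as follows. This is a minimal single-machine illustration, not the distributed system from the paper: each stratum is sampled at a fixed rate, a per-stratum floor (`min_per_stratum`, an assumption standing in for the paper's fuzzy fall-back, whose details the abstract does not give) keeps small groups represented, and reach is estimated as a weighted count over the sample. All names here are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, rate, min_per_stratum, rng):
    """Sample each stratum at `rate`, keeping at least `min_per_stratum` rows.

    Returns (row, weight) pairs, where weight is the number of original
    rows that each kept row stands for.
    """
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        k = min(len(members), max(min_per_stratum, round(rate * len(members))))
        weight = len(members) / k
        sample.extend((row, weight) for row in rng.sample(members, k))
    return sample

def estimate_reach(sample, matches):
    """Weighted count of sampled rows that satisfy the targeting predicate."""
    return sum(weight for row, weight in sample if matches(row))
```

A query such as `estimate_reach(sample, lambda r: r["country"] == "IS")` then touches only the in-memory sample. When the predicate is constant within each stratum the estimate is exact; otherwise it carries sampling error that shrinks as the strata become more homogeneous.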
Anti-aliasing with stratified B-spline filters of arbitrary degree
A simple and elegant method is presented to perform anti-aliasing in ray-traced images. The method uses stratified
sampling to reduce the occurrence of artefacts in an image and features a B-spline filter to compute the final
luminous intensity at each pixel. The method is scalable through the specification of the filter degree. A B-spline
filter of degree one amounts to a simple anti-aliasing scheme with box filtering. Increasing the degree of the B-spline generates progressively smoother filters. Computation of the filter values is done in a recursive way, as part of a sequence of Newton-Raphson iterations, to obtain the optimal sample positions in screen space. The proposed method can perform anti-aliasing both in space and in time, the latter being more commonly known as motion blur. We show an application of the method to the ray casting of implicit procedural surfaces.
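The recursive evaluation of B-spline filter weights mentioned in this abstract can be illustrated with the standard recursion for centred uniform B-splines, combined with one jittered sample per stratum. This is a sketch under assumptions, not the paper's method: it omits the Newton-Raphson placement of samples, the function names are hypothetical, and it uses the common convention in which degree 0 is the box filter, which may differ by one from the abstract's numbering.

```python
import random

def bspline(degree, x):
    """Centred uniform B-spline of the given degree, evaluated at x.

    Degree 0 is the unit box on [-0.5, 0.5); each higher degree is built
    from two shifted copies of the previous one.
    """
    if degree == 0:
        return 1.0 if -0.5 <= x < 0.5 else 0.0
    half = (degree + 1) / 2
    return ((half + x) * bspline(degree - 1, x + 0.5)
            + (half - x) * bspline(degree - 1, x - 0.5)) / degree

def filtered_pixel(shade, px, py, degree, grid, rng):
    """Filter `shade` around (px, py) with grid x grid jittered samples.

    The filter support is split into equal cells (the strata) and one
    random sample is taken inside each cell.
    """
    radius = (degree + 1) / 2
    cell = 2 * radius / grid
    total = 0.0
    weight_sum = 0.0
    for i in range(grid):
        for j in range(grid):
            dx = -radius + (i + rng.random()) * cell
            dy = -radius + (j + rng.random()) * cell
            w = bspline(degree, dx) * bspline(degree, dy)
            total += w * shade(px + dx, py + dy)
            weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0
```

Here `shade(x, y)` is any function returning the intensity at a screen position, for example the result of casting a ray through that point. Dividing by `weight_sum` keeps a constant image constant regardless of where the jittered samples fall.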
Subsampling MCMC - An introduction for the survey statistician
The rapid development of computing power and efficient Markov Chain Monte
Carlo (MCMC) simulation algorithms have revolutionized Bayesian statistics,
making it a highly practical inference method in applied work. However, MCMC
algorithms tend to be computationally demanding, and are particularly slow for
large datasets. Data subsampling has recently been suggested as a way to make
MCMC methods scalable on massively large data, utilizing efficient sampling
schemes and estimators from the survey sampling literature. These developments
tend to be unknown by many survey statisticians who traditionally work with
non-Bayesian methods, and rarely use MCMC. Our article explains the idea of
data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a
so-called pseudo-marginal MCMC approach to speeding up MCMC through data
subsampling. The review is written for a survey statistician without previous
knowledge of MCMC methods since our aim is to motivate survey sampling experts
to contribute to the growing Subsampling MCMC literature. Comment: Accepted for publication in Sankhya A. Previous uploaded version
contained a bug in generating the figures and references.
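The link to survey sampling in this abstract can be made concrete with a small sketch: the full-data log-likelihood is a population total, so it can be estimated from a simple random sample without replacement by scaling the sample total by n/m. The Gaussian model and all function names are assumptions chosen for illustration. The sketch covers only this estimation step, not the pseudo-marginal MCMC algorithm built on top of it.

```python
import math
import random

def loglik_term(x, mu, sigma=1.0):
    """Log-density of one observation under a Normal(mu, sigma^2) model."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def estimate_from_indices(data, mu, idx):
    """Expansion estimator of the full log-likelihood from sampled indices."""
    return len(data) / len(idx) * sum(loglik_term(data[i], mu) for i in idx)

def subsampled_loglik(data, mu, m, rng):
    """Draw m of the n observations without replacement and scale up."""
    idx = rng.sample(range(len(data)), m)
    return estimate_from_indices(data, mu, idx)
```

Averaged over all possible sub-samples of a given size, this estimator equals the full log-likelihood, which is the usual design-unbiasedness property of the expansion estimator. Only `m` of the `n` terms are evaluated per call, which is where the computational saving comes from.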