Exploration vs. Exploitation in the Information Filtering Problem
We consider information filtering, in which we face a stream of items too
voluminous to process by hand (e.g., scientific articles, blog posts, emails),
and must rely on a computer system to automatically filter out irrelevant
items. Such systems face the exploration vs. exploitation tradeoff, in which it
may be beneficial to present an item despite a low probability of relevance,
just to learn about future items with similar content. We present a Bayesian
sequential decision-making model of this problem, show how it may be solved to
optimality using a decomposition to a collection of two-armed bandit problems,
and show structural results for the optimal policy. We show that the resulting
method is especially useful when facing the cold start problem, i.e., when
filtering items for new users without a long history of past interactions. We
then present an application of this information filtering method to a
historical dataset from the arXiv.org repository of scientific articles. Comment: 36 pages, 5 figures
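The following is a minimal sketch of the exploration vs. exploitation tradeoff for a single stream of similar items. It is a simplified one-armed instance solved by finite-horizon dynamic programming, not the paper's model or its two-armed bandit decomposition. The assumptions are all hypothetical. Relevance is Bernoulli with unknown probability under a Beta(a, b) prior. Showing an item earns 1 if it is relevant, minus a fixed cost COST. Skipping earns 0 and reveals nothing. The names COST, DISCOUNT, HORIZON, value, present_value, should_present and myopic_present are illustrative.

```python
from functools import lru_cache

COST = 0.3       # hypothetical cost charged for every item shown to the user
DISCOUNT = 0.95  # hypothetical per-item discount factor
HORIZON = 50     # hypothetical number of items remaining in the stream


@lru_cache(maxsize=None)
def value(a: int, b: int, n: int) -> float:
    """Optimal expected discounted reward with a Beta(a, b) belief and n items left."""
    if n == 0:
        return 0.0
    # Skipping earns 0 and leaves the belief unchanged. Because the value of
    # showing can only shrink as the horizon shortens, skipping once means
    # skipping for good, so the problem reduces to optimal stopping.
    return max(0.0, present_value(a, b, n))


def present_value(a: int, b: int, n: int) -> float:
    """Expected reward of showing the next item, then acting optimally."""
    m = a / (a + b)  # posterior mean probability that the item is relevant
    future = m * value(a + 1, b, n - 1) + (1 - m) * value(a, b + 1, n - 1)
    return m - COST + DISCOUNT * future


def should_present(a: int, b: int, n: int = HORIZON) -> bool:
    """Bayes-optimal decision: show the item if doing so beats stopping."""
    return present_value(a, b, n) > 0.0


def myopic_present(a: int, b: int) -> bool:
    """Greedy decision that ignores the value of learning."""
    return a / (a + b) > COST
```

With a Beta(1, 3) belief, the posterior mean relevance is 0.25, below COST, so the greedy rule skips. The optimal rule still shows the item, because a relevant outcome would make the next items worth showing. With a Beta(1, 50) belief, both rules skip.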
Asymptotic Validity of the Bayes-Inspired Indifference Zone Procedure: The Non-Normal Known Variance Case
We consider the indifference-zone (IZ) formulation of the ranking and
selection problem in which the goal is to choose an alternative with the
largest mean with guaranteed probability, as long as the difference between
this mean and the second largest exceeds a threshold. Conservatism leads
classical IZ procedures to take too many samples in problems with many
alternatives. The Bayes-inspired Indifference Zone (BIZ) procedure, proposed in
Frazier (2014), is less conservative than previous procedures, but its proof of
validity requires strong assumptions, specifically that samples are normal, and
variances are known with an integer multiple structure. In this paper, we show
asymptotic validity of a slight modification of the original BIZ procedure as
the difference between the best alternative and the second best goes to
zero, when the variances are known and finite, and samples are independent and
identically distributed, but not necessarily normal.
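The sketch below shows what an indifference-zone guarantee means in practice. It gives a Monte Carlo estimate of the probability of correct selection (PCS) under the slippage configuration: the best mean exceeds all others by exactly delta, and the others tie. The procedure being checked is a naive fixed-sample rule that samples each alternative n times and picks the largest sample mean. It is not the BIZ procedure. The non-normal noise model (a centered Exp(1), with mean 0 and variance 1) is an assumption chosen only to mirror the abstract's non-normal setting. estimate_pcs and centered_exponential are hypothetical names.

```python
import random
from typing import Callable, Optional


def centered_exponential(rng: random.Random) -> float:
    """Non-normal noise with mean 0 and variance 1: Exp(1) shifted by its mean."""
    return rng.expovariate(1.0) - 1.0


def estimate_pcs(k: int, delta: float, n: int, reps: int,
                 noise: Callable[[random.Random], float] = centered_exponential,
                 rng: Optional[random.Random] = None) -> float:
    """Estimate P(correct selection) for 'sample each alternative n times, pick max mean'."""
    rng = rng or random.Random(0)
    # Slippage configuration: alternative 0 is best by exactly delta, the rest tie.
    means = [delta] + [0.0] * (k - 1)
    correct = 0
    for _ in range(reps):
        averages = [sum(mu + noise(rng) for _ in range(n)) / n for mu in means]
        chosen = max(range(k), key=averages.__getitem__)
        correct += chosen == 0
    return correct / reps
```

An IZ procedure is valid at level P* if its PCS is at least P* in every configuration where the gap is at least delta. A simulation like this one can only spot-check that property. It cannot prove it.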
Distance Dependent Chinese Restaurant Processes
We develop the distance dependent Chinese restaurant process (CRP), a
flexible class of distributions over partitions that allows for
non-exchangeability. This class can be used to model many kinds of dependencies
between data in infinite clustering models, including dependencies across time
or space. We examine the properties of the distance dependent CRP, discuss its
connections to Bayesian nonparametric mixture models, and derive a Gibbs
sampler for both observed and mixture settings. We study its performance with
three text corpora. We show that relaxing the assumption of exchangeability
with distance dependent CRPs can provide a better fit to sequential data. We
also show that its alternative formulation of the traditional CRP leads to a
faster-mixing Gibbs sampling algorithm than the one based on the original
formulation.
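The following is a minimal sketch of drawing a partition from a distance dependent CRP prior over sequentially ordered data. Each customer i links either to itself, with weight alpha, or to another customer j, with weight given by a decay function f of the distance d_ij. Clusters are the connected components of the resulting link graph. The sketch assumes sequential distances (d_ij = i - j for j < i, and infinite, hence weight 0, for later customers). With a constant decay, it reduces to the traditional CRP. The function names and the specific decay choices are illustrative, not taken from the paper.

```python
import math
import random
from typing import Callable, List, Optional

Decay = Callable[[float], float]


def window_decay(a: float) -> Decay:
    """Link only to customers fewer than a steps back."""
    return lambda d: 1.0 if d < a else 0.0


def exponential_decay(a: float) -> Decay:
    """Link weight decays as exp(-d / a)."""
    return lambda d: math.exp(-d / a)


def crp_decay(d: float) -> float:
    """Constant decay: uniform links to earlier customers recover the traditional CRP."""
    return 1.0


def sample_ddcrp(n: int, alpha: float, decay: Decay,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Sample cluster labels (0, 1, ... in order of first appearance) for n customers."""
    rng = rng or random.Random(0)
    links = []
    for i in range(n):
        # Earlier customers j get weight decay(i - j); the final entry is the self-link.
        targets = list(range(i)) + [i]
        weights = [decay(i - j) for j in range(i)] + [alpha]
        links.append(rng.choices(targets, weights=weights)[0])

    # Clusters = connected components of the undirected link graph (union-find).
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)

    seen: dict = {}
    return [seen.setdefault(find(i), len(seen)) for i in range(n)]
```

alpha must be positive, because the first customer can only link to itself. With a window or exponential decay, customers far apart in the sequence rarely share a cluster unless a chain of links connects them. This is the kind of non-exchangeable dependency the abstract describes.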
Hierarchical Knowledge-Gradient for Sequential Sampling
We consider the problem of selecting the best of a finite but very large set of alternatives. Each alternative may be characterized by a multi-dimensional vector and has independent normal rewards. This problem arises in various settings such as (i) ranking and selection, (ii) simulation optimization where the unknown mean of each alternative is estimated with stochastic simulation output, and (iii) approximate dynamic programming where we need to estimate values based on Monte Carlo simulation. We use a Bayesian probability model for the unknown reward of each alternative and follow a fully sequential sampling policy called the knowledge-gradient policy. This policy myopically optimizes the expected increment in the value of sampling information in each time period. Because the number of alternatives is large, we propose a hierarchical aggregation technique that uses the common features shared by alternatives to learn about many alternatives from even a single measurement, thus greatly reducing the measurement effort required. We demonstrate how this hierarchical knowledge-gradient policy can be applied to efficiently maximize a continuous function and prove that this policy finds a globally optimal alternative in the limit.
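The sketch below illustrates the basic knowledge-gradient step for independent normal beliefs with known measurement noise. It omits the hierarchical aggregation that is the paper's contribution. For alternative x with prior mean mu_x, prior variance var_x and noise variance noise_x, one more sample changes the posterior mean by a normal amount with standard deviation var_x / sqrt(var_x + noise_x). The KG factor is that standard deviation times f(z), where f(z) = z*Phi(z) + phi(z) and z = -|mu_x - max_{y != x} mu_y| / (that standard deviation). The names kg_factors, update and run_kg, and the N(0, 1) prior in run_kg, are hypothetical.

```python
import math
import random
from typing import List, Optional


def _pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_factors(mu: List[float], var: List[float], noise: List[float]) -> List[float]:
    """KG factor per alternative under independent normal beliefs (needs >= 2 alternatives)."""
    factors = []
    for x in range(len(mu)):
        # Std. dev. of the change in x's posterior mean caused by one more sample.
        s = var[x] / math.sqrt(var[x] + noise[x])
        if s == 0.0:
            factors.append(0.0)
            continue
        best_other = max(m for y, m in enumerate(mu) if y != x)
        z = -abs(mu[x] - best_other) / s
        factors.append(s * (z * _cdf(z) + _pdf(z)))
    return factors


def update(mu: List[float], var: List[float], noise: List[float], x: int, y: float) -> None:
    """In-place conjugate normal update of alternative x after observing y."""
    precision = 1.0 / var[x] + 1.0 / noise[x]
    mu[x] = (mu[x] / var[x] + y / noise[x]) / precision
    var[x] = 1.0 / precision


def run_kg(truth: List[float], noise_sd: float, steps: int,
           rng: Optional[random.Random] = None) -> int:
    """Sample by KG for `steps` rounds, then return the alternative with the best posterior mean."""
    rng = rng or random.Random(0)
    k = len(truth)
    mu, var, noise = [0.0] * k, [1.0] * k, [noise_sd ** 2] * k
    for _ in range(steps):
        f = kg_factors(mu, var, noise)
        x = max(range(k), key=f.__getitem__)
        update(mu, var, noise, x, rng.gauss(truth[x], noise_sd))
    return max(range(k), key=mu.__getitem__)
```

When all posterior means are equal, the KG factor reduces to the standard deviation above times phi(0), so the policy measures the most uncertain alternative first. Here each measurement informs only the alternative that was sampled. Hierarchical aggregation removes that limitation by also sharing the observation across alternatives with common features.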
- …