Trustworthy Experimentation Under Telemetry Loss
Failure to accurately measure the outcomes of an experiment can lead to bias
and incorrect conclusions. Online controlled experiments (also known as A/B tests) are
increasingly being used to make decisions to improve websites as well as mobile
and desktop applications. We argue that loss of telemetry data (during upload
or post-processing) can skew the results of experiments, leading to loss of
statistical power and inaccurate or erroneous conclusions. By systematically
investigating the causes of telemetry loss, we argue that it is not practical
to entirely eliminate it. Consequently, experimentation systems need to be
robust to its effects. Furthermore, we note that it is nontrivial to measure
the absolute level of telemetry loss in an experimentation system. In this
paper, we take a top-down approach towards solving this problem. We motivate
the impact of loss qualitatively using experiments in real applications
deployed at scale, and formalize the problem by presenting a theoretical
breakdown of the bias introduced by loss. Based on this foundation, we present
a general framework for quantitatively evaluating the impact of telemetry loss,
and present two solutions to measure the absolute levels of loss. This
framework is used by well-known applications at Microsoft, with millions of
users and billions of sessions. These general principles can be adopted by any
application to improve the overall trustworthiness of experimentation and
data-driven decision making.
Comment: Proceedings of the 27th ACM International Conference on Information
and Knowledge Management, October 201
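The claim that telemetry loss can bias an experiment can be illustrated with a small calculation. The sketch below is not the paper's framework; it assumes a hypothetical binary per-session event and hypothetical loss probabilities that depend on whether the event occurred. Both variants have the same true event rate, so any difference in the observed rate is produced by loss alone.

```python
def observed_rate(true_rate, loss_if_event, loss_if_no_event):
    """Expected event rate among the sessions whose telemetry survives.

    All three arguments are illustrative assumptions, not values from the paper.
    """
    kept_event = true_rate * (1.0 - loss_if_event)
    kept_no_event = (1.0 - true_rate) * (1.0 - loss_if_no_event)
    return kept_event / (kept_event + kept_no_event)


# Same true rate in both variants, i.e. no real treatment effect.
true_rate = 0.10

# Control: loss is independent of the outcome, so the observed rate is unbiased.
control = observed_rate(true_rate, loss_if_event=0.05, loss_if_no_event=0.05)

# Treatment: sessions with the event lose telemetry more often,
# so the observed rate is pulled below the true rate.
treatment = observed_rate(true_rate, loss_if_event=0.20, loss_if_no_event=0.05)

print(f"control={control:.4f} treatment={treatment:.4f}")
```

Under these assumed numbers the treatment variant appears to lower the event rate even though nothing changed, which is the kind of spurious movement an experimentation system would need to detect.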
Online Model Evaluation in a Large-Scale Computational Advertising Platform
Online media provides opportunities for marketers through which they can
deliver effective brand messages to a wide range of audiences. Advertising
technology platforms enable advertisers to reach their target audience by
delivering ad impressions to online users in real time. In order to identify
the best marketing message for a user and to purchase impressions at the right
price, we rely heavily on bid prediction and optimization models. Even though
the bid prediction models are well studied in the literature, the equally
important subject of model evaluation is usually overlooked. Effective and
reliable evaluation of an online bidding model is crucial for making faster
model improvements as well as for utilizing the marketing budgets more
efficiently. In this paper, we present an experimentation framework for bid
prediction models where our focus is on the practical aspects of model
evaluation. Specifically, we outline the unique challenges we encounter in our
platform due to a variety of factors such as heterogeneous goal definitions,
varying budget requirements across different campaigns, high seasonality and
the auction-based environment for inventory purchasing. Then, we introduce
return on investment (ROI) as a unified model performance (i.e., success)
metric and explain its merits over more traditional metrics such as
click-through rate (CTR) or conversion rate (CVR). Most importantly, we discuss
commonly used evaluation and metric summarization approaches in detail and
propose a more accurate method for online evaluation of new experimental models
against the baseline. Our meta-analysis-based approach addresses various
shortcomings of other methods and yields statistically robust conclusions that
allow us to conclude experiments more quickly in a reliable manner. We
demonstrate the effectiveness of our evaluation strategy on real campaign data
through some experiments.
Comment: Accepted to ICDM201
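The abstract does not say which meta-analysis model is used. As a rough illustration of the general idea, the sketch below pools per-campaign effect estimates with fixed-effect inverse-variance weighting, one standard meta-analysis technique. The function name, the effect sizes, and the variances are all hypothetical.

```python
import math


def pooled_effect(effects, variances):
    """Fixed-effect inverse-variance pooling of per-campaign estimates.

    effects[i] is the estimated lift of the experimental model in campaign i
    and variances[i] is the variance of that estimate (both assumed inputs).
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error


# Hypothetical per-campaign ROI lifts and their variances.
effects = [0.04, 0.10, -0.02]
variances = [0.0004, 0.0025, 0.0016]

estimate, std_error = pooled_effect(effects, variances)
z_score = estimate / std_error
print(f"pooled lift={estimate:.4f} se={std_error:.4f} z={z_score:.2f}")
```

Precise campaigns dominate the pooled estimate, so a noisy campaign with a large apparent lift cannot decide the outcome on its own.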
Generalized Team Draft Interleaving
Interleaving is an online evaluation method that compares two ranking
functions by mixing their results and interpreting the users' click feedback.
An important property of an interleaving method is its sensitivity, i.e. the
ability to obtain reliable comparison outcomes with few user interactions.
Several methods have been proposed so far to improve interleaving sensitivity,
which can be roughly divided into two areas: (a) methods that optimize the
credit assignment function (how the click feedback is interpreted), and (b)
methods that achieve higher sensitivity by controlling the interleaving policy
(how often a particular interleaved result page is shown).
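For orientation, the classic team draft interleaving baseline that the title refers to can be sketched as follows. This is the standard method with uniform coin flips and one credit per click, not the generalized framework proposed in the paper. The function names and the document identifiers are hypothetical.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng):
    """Return (interleaved_docs, teams), where teams[i] is "A" or "B"."""
    interleaved, teams, used = [], [], set()
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}

    def next_unused(ranking):
        return next((doc for doc in ranking if doc not in used), None)

    while len(interleaved) < length:
        # The team with fewer picks goes next; ties are broken by a coin flip.
        if picks["A"] != picks["B"]:
            first = "A" if picks["A"] < picks["B"] else "B"
        else:
            first = "A" if rng.random() < 0.5 else "B"
        second = "B" if first == "A" else "A"
        for team in (first, second):
            doc = next_unused(rankings[team])
            if doc is not None:
                interleaved.append(doc)
                teams.append(team)
                used.add(doc)
                picks[team] += 1
                break
        else:
            break  # both rankings are exhausted
    return interleaved, teams


def credit_clicks(teams, clicked_positions):
    """Credit each click to the team that contributed the clicked result."""
    credit = {"A": 0, "B": 0}
    for position in clicked_positions:
        credit[teams[position]] += 1
    return credit


ranking_a = ["d1", "d2", "d3", "d4"]
ranking_b = ["d3", "d1", "d5", "d2"]
interleaved, teams = team_draft_interleave(ranking_a, ranking_b, 4, random.Random(0))
credit = credit_clicks(teams, clicked_positions=[0, 2])
print(interleaved, teams, credit)
```

In terms of the two areas named above, `credit_clicks` plays the role of the credit assignment function and the coin flip inside `team_draft_interleave` plays the role of the interleaving policy.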
In this paper, we propose an interleaving framework that generalizes the
previously studied interleaving methods in two aspects. First, it achieves a
higher sensitivity by performing a joint data-driven optimization of the credit
assignment function and the interleaving policy. Second, we formulate the
framework to be general w.r.t. the search domain where the interleaving
experiment is deployed, so that it can be applied in domains with grid-based
presentation, such as image search. In order to simplify the optimization, we
additionally introduce a stratified estimate of the experiment outcome. This
stratification is also useful on its own, as it reduces the variance of the
outcome and thus increases the interleaving sensitivity.
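The abstract does not define its strata, so the following is only a generic sketch of why a stratified estimate can have lower variance than a pooled mean. It assumes two hypothetical strata with known weights and made-up per-impression outcome scores; none of these values come from the paper.

```python
from statistics import mean, variance


def stratified_estimate(strata):
    """strata is a list of (weight, samples); the weights should sum to 1.

    Returns the stratified mean and the estimated variance of that mean.
    """
    estimate = sum(w * mean(xs) for w, xs in strata)
    var = sum(w ** 2 * variance(xs) / len(xs) for w, xs in strata)
    return estimate, var


# Hypothetical outcome scores: the strata differ strongly from each other
# but are homogeneous internally.
strata = [
    (0.5, [1.0, 1.2, 0.8, 1.0]),
    (0.5, [-1.0, -0.8, -1.2, -1.0]),
]

strat_mean, strat_var = stratified_estimate(strata)

# Naive alternative: ignore the strata and treat all samples as one pool.
pooled = [x for _, xs in strata for x in xs]
pooled_var = variance(pooled) / len(pooled)

print(f"stratified variance={strat_var:.4f} pooled variance={pooled_var:.4f}")
```

The pooled estimate carries the between-strata spread in its variance, while the stratified estimate only carries the within-strata spread, which is why fewer interactions are needed for the same confidence.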
We perform an extensive experimental study using large-scale document and
image search datasets obtained from a commercial search engine. The experiments
show that our proposed framework achieves marked improvements in sensitivity
over effective baselines on both datasets.
Continuous phase amplification with a Sagnac interferometer
We describe a weak value inspired phase amplification technique in a Sagnac
interferometer. We monitor the relative phase between two paths of a slightly
misaligned interferometer by measuring the average position of a split-Gaussian
mode in the dark port. Although we monitor only the dark port, we show that the
signal varies linearly with phase and that we can obtain similar sensitivity to
balanced homodyne detection. We derive the source of the amplification both
with classical wave optics and as an inverse weak value.
Comment: 5 pages, 4 figures, previously submitted for publicatio