Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling
We evaluate the impact of probabilistically-constructed digital identity data
collected from Sep. to Dec. 2017 (approx.), in the context of
Lookalike-targeted campaigns. The backbone of this study is a large set of
probabilistically-constructed "identities", represented as small bags of
cookies and mobile ad identifiers with associated metadata, that are likely all
owned by the same underlying user. The identity data allows us to generate
"identity-based", rather than "identifier-based", user models, giving a fuller
picture of the interests of the users underlying the identifiers. We employ
off-policy techniques to evaluate the potential of identity-powered lookalike
models without incurring the risk of allowing untested models to direct large
amounts of ad spend or the large cost of performing A/B tests. We add to
historical work on off-policy evaluation by noting a significant type of
"finite-sample bias" that occurs for studies combining modestly-sized datasets
and evaluation metrics involving rare events (e.g., conversions). We illustrate
this bias using a simulation study that later informs the handling of inverse
propensity weights in our analyses on real data. We demonstrate significant
lift in identity-powered lookalikes versus an identity-ignorant baseline: on
average ~70% lift in conversion rate. This rises to factors of ~(4-32)x for
identifiers having little data themselves, but that can be inferred to belong
to users with substantial data to aggregate across identifiers. This implies
that identity-powered user modeling is especially important in the context of
identifiers having very short lifespans (i.e., frequently churned cookies). Our
work motivates and informs the use of probabilistically-constructed identities
in marketing. It also deepens the canon of examples in which off-policy
learning has been employed to evaluate the complex systems of the internet
economy.
Comment: Accepted by WSDM 201
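The role of inverse propensity weights in the abstract above can be illustrated with a minimal sketch. It is not the authors' estimator: the function name `ips_estimate`, the log layout `(context, action, reward, propensity)`, and the `clip` and `self_normalize` options are assumptions chosen for illustration. Clipping and self-normalization are two common ways to tame large weights, which matter most when rewards are rare events such as conversions.

```python
def ips_estimate(logs, target_prob, clip=None, self_normalize=False):
    """Estimate a target policy's mean reward from logged data.

    logs: iterable of (context, action, reward, logging_propensity),
          a hypothetical log layout used only for this sketch.
    target_prob: function (context, action) -> probability that the
          target policy takes `action` in `context`.
    clip: optional upper bound applied to each importance weight.
    self_normalize: divide by the sum of weights instead of the count.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    n = 0
    for context, action, reward, propensity in logs:
        # Importance weight: how much more (or less) likely the target
        # policy is to take the logged action than the logging policy.
        w = target_prob(context, action) / propensity
        if clip is not None:
            w = min(w, clip)
        weighted_sum += w * reward
        weight_total += w
        n += 1
    denom = weight_total if self_normalize else n
    return weighted_sum / denom if denom else 0.0


def always_action_one(context, action):
    """Toy target policy: always choose action 1."""
    return 1.0 if action == 1 else 0.0


# Four logged impressions from a uniform logging policy over two actions.
toy_logs = [
    (None, 1, 1, 0.5),
    (None, 0, 0, 0.5),
    (None, 1, 0, 0.5),
    (None, 0, 1, 0.5),
]
```

With so few conversions in the log, a single heavily weighted record can dominate the plain estimate, which is one intuition for why the handling of the weights deserves care on modestly sized datasets.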
BayesBeat: A Bayesian Deep Learning Approach for Atrial Fibrillation Detection from Noisy Photoplethysmography Data
The increasing popularity of smartwatches as affordable and longitudinal
monitoring devices enables us to capture photoplethysmography (PPG) sensor data
for detecting Atrial Fibrillation (AF) in real-time. A significant challenge in
AF detection from PPG signals comes from the inherent noise in the smartwatch
PPG signals. In this paper, we propose a novel deep-learning-based approach,
BayesBeat, that leverages the power of Bayesian deep learning to accurately
infer AF risks from noisy PPG signals while also providing an
uncertainty estimate for each prediction. BayesBeat is efficient, robust,
flexible, and highly scalable, which makes it particularly suitable for
deployment in commercially available wearable devices. Extensive experiments on
a recently published large dataset reveal that our proposed method BayesBeat
substantially outperforms the existing state-of-the-art methods.
Comment: 8 pages, 5 figures
The Limits of Trust in Economic Transactions - Investigations of Perfect Reputation Systems
trust, reputation systems, eBay
Hands-on Bayesian Neural Networks -- a Tutorial for Deep Learning Users
Modern deep learning methods constitute incredibly powerful tools to tackle a
myriad of challenging problems. However, since deep learning methods operate as
black boxes, the uncertainty associated with their predictions is often
challenging to quantify. Bayesian statistics offer a formalism to understand
and quantify the uncertainty associated with deep neural network predictions.
This tutorial provides an overview of the relevant literature and a complete
toolset to design, implement, train, use and evaluate Bayesian Neural Networks,
i.e. Stochastic Artificial Neural Networks trained using Bayesian methods.
Comment: 35 pages, 15 figures
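As a rough illustration of what "stochastic" means here, the sketch below draws weights from an assumed mean-field Gaussian posterior over a single logistic unit and summarizes the resulting predictions. It is not taken from the tutorial: the function `predictive`, the posterior parameters `w_mean` and `w_std`, and the one-unit model are all hypothetical, and the posterior is assumed to have been fitted already.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predictive(x, w_mean, w_std, n_samples=1000, seed=0):
    """Monte Carlo estimate of the posterior predictive for one input.

    x: list of input features.
    w_mean, w_std: per-weight mean and standard deviation of an assumed
        independent Gaussian posterior (hypothetical values).
    Returns (mean probability, standard deviation across samples).
    """
    rng = random.Random(seed)
    probs = []
    for _ in range(n_samples):
        # Each forward pass uses a fresh weight sample from the posterior.
        w = [rng.gauss(m, s) for m, s in zip(w_mean, w_std)]
        z = sum(wi * xi for wi, xi in zip(w, x))
        probs.append(sigmoid(z))
    mean = sum(probs) / n_samples
    var = sum((p - mean) ** 2 for p in probs) / n_samples
    return mean, math.sqrt(var)
```

The spread of the sampled predictions serves as the uncertainty estimate: a posterior with zero variance collapses to an ordinary deterministic prediction, while a wider posterior produces a wider spread.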
Machine learning for acquiring knowledge in astro-particle physics
This thesis explores the fundamental aspects of machine learning that are involved in acquiring knowledge in the research field of astro-particle physics. This research field relies substantially on machine learning methods, which reconstruct the properties of astro-particles from the raw data that specialized telescopes record. These methods are typically trained on resource-intensive simulations, which reflect the existing knowledge about the particles, knowledge that physicists strive to expand. We study three fundamental machine learning tasks that emerge from this goal.
First, we address ordinal quantification, the task of estimating the prevalences of ordered classes in sets of unlabeled data. This task emerges from the need for testing the agreement of astro-physical theories with the class prevalences that a telescope observes. To this end, we unify existing methods on quantification, propose an alternative optimization process, and develop regularization techniques to address ordinality in quantification problems, both in and outside of astro-particle physics. These advancements provide more accurate reconstructions of the energy spectra of cosmic gamma ray sources and, hence, support physicists in drawing conclusions from their telescope data.
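For readers unfamiliar with quantification, the simplest correction can be sketched in a few lines. This is the binary "adjusted classify and count" baseline, not the ordinal methods developed in the thesis; the function names and the assumption that the classifier's true and false positive rates (`tpr`, `fpr`) are known from held-out data are illustrative choices.

```python
def classify_and_count(predictions):
    """Fraction of items that the classifier labels positive (0/1 labels)."""
    return sum(predictions) / len(predictions)


def adjusted_count(predictions, tpr, fpr):
    """Correct the raw positive rate for known classifier errors.

    The observed rate q satisfies q = tpr * p + fpr * (1 - p), where p is
    the true prevalence, so p = (q - fpr) / (tpr - fpr).
    """
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the correction to exist")
    q = classify_and_count(predictions)
    p = (q - fpr) / (tpr - fpr)
    # Clip to [0, 1] because sampling noise can push the estimate outside.
    return min(1.0, max(0.0, p))
```

For instance, if 40% of a sample is predicted positive by a classifier with `tpr=0.8` and `fpr=0.2`, the corrected prevalence is (0.4 - 0.2) / 0.6, or about one third. Multi-class and ordinal settings replace this scalar inversion with a system of equations over all classes.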
Second, we address learning under class-conditional label noise. More particularly, we focus on a novel setting, in which one of the class-wise noise rates is known and one is not. This setting emerges from a data acquisition protocol, through which astro-particle telescopes simultaneously observe a region of interest and several background regions. We enable learning under this type of label noise with algorithms for consistent, noise-aware decision thresholding. These algorithms yield binary classifiers, which outperform the existing state-of-the-art in gamma hadron classification with the FACT telescope. Moreover, unlike the state-of-the-art, our classifiers are entirely trained from the real telescope data and thus do not require any resource-intensive simulation.
Third, we address active class selection, the task of actively finding those proportions of classes which optimize the classification performance. In astro-particle physics, this task emerges from the simulation, which produces training data in any desired class proportions. We clarify the implications of this setting from two theoretical perspectives, one of which provides us with bounds of the resulting classification performance. We employ these bounds in a certificate of model robustness, which declares a set of class proportions for which the model is accurate with a high probability. We also employ these bounds in an active strategy for class-conditional data acquisition. Our strategy uniquely considers existing uncertainties about those class proportions that have to be handled during the deployment of the classifier, while being theoretically well-justified.