4,996 research outputs found

    Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling

    Full text link
    We evaluate the impact of probabilistically-constructed digital identity data collected from Sep. to Dec. 2017 (approx.), in the context of Lookalike-targeted campaigns. The backbone of this study is a large set of probabilistically-constructed "identities", represented as small bags of cookies and mobile ad identifiers with associated metadata, that are likely all owned by the same underlying user. The identity data allows to generate "identity-based", rather than "identifier-based", user models, giving a fuller picture of the interests of the users underlying the identifiers. We employ off-policy techniques to evaluate the potential of identity-powered lookalike models without incurring the risk of allowing untested models to direct large amounts of ad spend or the large cost of performing A/B tests. We add to historical work on off-policy evaluation by noting a significant type of "finite-sample bias" that occurs for studies combining modestly-sized datasets and evaluation metrics involving rare events (e.g., conversions). We illustrate this bias using a simulation study that later informs the handling of inverse propensity weights in our analyses on real data. We demonstrate significant lift in identity-powered lookalikes versus an identity-ignorant baseline: on average ~70% lift in conversion rate. This rises to factors of ~(4-32)x for identifiers having little data themselves, but that can be inferred to belong to users with substantial data to aggregate across identifiers. This implies that identity-powered user modeling is especially important in the context of identifiers having very short lifespans (i.e., frequently churned cookies). Our work motivates and informs the use of probabilistically-constructed identities in marketing. It also deepens the canon of examples in which off-policy learning has been employed to evaluate the complex systems of the internet economy.Comment: Accepted by WSDM 201

    BayesBeat: A Bayesian Deep Learning Approach for Atrial Fibrillation Detection from Noisy Photoplethysmography Data

    Full text link
    The increasing popularity of smartwatches as affordable and longitudinal monitoring devices enables us to capture photoplethysmography (PPG) sensor data for detecting Atrial Fibrillation (AF) in real-time. A significant challenge in AF detection from PPG signals comes from the inherent noise in the smartwatch PPG signals. In this paper, we propose a novel deep learning based approach, BayesBeat that leverages the power of Bayesian deep learning to accurately infer AF risks from noisy PPG signals, and at the same time provide the uncertainty estimate of the prediction. Bayesbeat is efficient, robust, flexible, and highly scalable which makes it particularly suitable for deployment in commercially available wearable devices. Extensive experiments on a recently published large dataset reveal that our proposed method BayesBeat substantially outperforms the existing state-of-the-art methods.Comment: 8 pages, 5 figure

    Hands-on Bayesian Neural Networks -- a Tutorial for Deep Learning Users

    Full text link
    Modern deep learning methods constitute incredibly powerful tools to tackle a myriad of challenging problems. However, since deep learning methods operate as black boxes, the uncertainty associated with their predictions is often challenging to quantify. Bayesian statistics offer a formalism to understand and quantify the uncertainty associated with deep neural network predictions. This tutorial provides an overview of the relevant literature and a complete toolset to design, implement, train, use and evaluate Bayesian Neural Networks, i.e. Stochastic Artificial Neural Networks trained using Bayesian methods.Comment: 35 pages, 15 figure

    Machine learning for acquiring knowledge in astro-particle physics

    Get PDF
    This thesis explores the fundamental aspects of machine learning, which are involved with acquiring knowledge in the research field of astro-particle physics. This research field substantially relies on machine learning methods, which reconstruct the properties of astro-particles from the raw data that specialized telescopes record. These methods are typically trained from resource-intensive simulations, which reflect the existing knowledge about the particles—knowledge that physicists strive to expand. We study three fundamental machine learning tasks, which emerge from this goal. First, we address ordinal quantification, the task of estimating the prevalences of ordered classes in sets of unlabeled data. This task emerges from the need for testing the agreement of astro-physical theories with the class prevalences that a telescope observes. To this end, we unify existing methods on quantification, propose an alternative optimization process, and develop regularization techniques to address ordinality in quantification problems, both in and outside of astro-particle physics. These advancements provide more accurate reconstructions of the energy spectra of cosmic gamma ray sources and, hence, support physicists in drawing conclusions from their telescope data. Second, we address learning under class-conditional label noise. More particularly, we focus on a novel setting, in which one of the class-wise noise rates is known and one is not. This setting emerges from a data acquisition protocol, through which astro-particle telescopes simultaneously observe a region of interest and several background regions. We enable learning under this type of label noise with algorithms for consistent, noise-aware decision thresholding. These algorithms yield binary classifiers, which outperform the existing state-of-the-art in gamma hadron classification with the FACT telescope. Moreover, unlike the state-of-the-art, our classifiers are entirely trained from the real telescope data and thus do not require any resource-intensive simulation. Third, we address active class selection, the task of actively finding those proportions of classes which optimize the classification performance. In astro-particle physics, this task emerges from the simulation, which produces training data in any desired class proportions. We clarify the implications of this setting from two theoretical perspectives, one of which provides us with bounds of the resulting classification performance. We employ these bounds in a certificate of model robustness, which declares a set of class proportions for which the model is accurate with a high probability. We also employ these bounds in an active strategy for class-conditional data acquisition. Our strategy uniquely considers existing uncertainties about those class proportions that have to be handled during the deployment of the classifier, while being theoretically well-justified
    • 

    corecore