516 research outputs found

    DiPPS: Differentially Private Propensity Scores for Bias Correction

    In surveys, it is typically up to the individuals to decide if they want to participate or not, which leads to participation bias: the individuals willing to share their data might not be representative of the entire population. Similarly, there are cases where one does not have direct access to any data of the target population and has to resort to publicly available proxy data sampled from a different distribution. In this paper, we present Differentially Private Propensity Scores for Bias Correction (DiPPS), a method for approximating the true data distribution of interest in both of the above settings. We assume that the data analyst has access to a dataset $\tilde{D}$ that was sampled from the distribution of interest in a biased way. As individuals may be more willing to share their data when given a privacy guarantee, we further assume that the analyst is allowed locally differentially private access to a set of samples $D$ from the true, unbiased distribution. Each data point from the private, unbiased dataset $D$ is mapped to a probability distribution over clusters (learned from the biased dataset $\tilde{D}$), from which a single cluster is sampled via the exponential mechanism and shared with the data analyst. This way, the analyst gathers a distribution over clusters, which they use to compute propensity scores for the points in the biased $\tilde{D}$, which are in turn used to reweight the points in $\tilde{D}$ to approximate the true data distribution. It is now possible to compute any function on the resulting reweighted dataset without further access to the private $D$. In experiments on datasets from various domains, we show that DiPPS successfully brings the distribution of the available dataset closer to the distribution of interest in terms of Wasserstein distance. We further show that this results in improved estimates for different statistics. Comment: 11 pages, 2 figures. Current version: conference versio
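The pipeline described in this abstract (private points report one cluster each via the exponential mechanism, and the analyst turns the reported cluster histogram into weights for the biased points) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 1-D data, the fixed centroids, the clipped-distance utility, the sensitivity of 1, and the simple ratio weighting are all hypothetical choices, and the noisy histogram is used as-is without any debiasing step.

```python
import math
import random
from collections import Counter


def nearest_cluster(x, centroids):
    """Hard-assign a point to the index of its closest centroid."""
    return min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))


def exponential_mechanism(x, centroids, epsilon, rng):
    """Sample one cluster index for a private point x.

    Assumed utility: negative distance to the centroid, clipped to [-1, 0],
    so the utility range (and hence the sensitivity) is 1.
    """
    utilities = [-min(abs(x - c), 1.0) for c in centroids]
    weights = [math.exp(epsilon * u / 2.0) for u in utilities]
    return rng.choices(range(len(centroids)), weights=weights, k=1)[0]


def reweight(biased, private, centroids, epsilon, seed=0):
    """Return normalized weights for the biased points."""
    rng = random.Random(seed)
    k = len(centroids)

    # Each private individual shares only one sampled cluster index.
    reports = Counter(
        exponential_mechanism(x, centroids, epsilon, rng) for x in private
    )
    q = [reports[c] / len(private) for c in range(k)]  # estimated target shares

    # Cluster shares in the biased dataset.
    assignments = [nearest_cluster(x, centroids) for x in biased]
    counts = Counter(assignments)
    p = [counts[c] / len(biased) for c in range(k)]

    # Assumed weighting: ratio of target share to biased share per cluster.
    raw = [q[c] / p[c] for c in assignments]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical toy data: the biased sample over-represents the cluster at 0.
centroids = [0.0, 1.0]
biased = [0.0] * 80 + [1.0] * 20
private = [0.0] * 50 + [1.0] * 50
weights = reweight(biased, private, centroids, epsilon=5.0)
weighted_mean = sum(w * x for w, x in zip(weights, biased))
```

Once the weights are computed, any statistic (here a weighted mean) is evaluated on the biased data alone, which mirrors the abstract's point that no further access to the private samples is needed.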

    The eyes know it: FakeET -- An Eye-tracking Database to Understand Deepfake Perception

    We present FakeET, an eye-tracking database to understand human visual perception of deepfake videos. Given that the principal purpose of deepfakes is to deceive human observers, FakeET is designed to understand and evaluate the ease with which viewers can detect synthetic video artifacts. FakeET contains viewing patterns compiled from 40 users via the Tobii desktop eye-tracker for 811 videos from the Google Deepfake dataset, with a minimum of two viewings per video. Additionally, EEG responses acquired via the Emotiv sensor are also available. The compiled data confirms (a) distinct eye movement characteristics for real vs fake videos; (b) utility of the eye-track saliency maps for spatial forgery localization and detection, and (c) Error Related Negativity (ERN) triggers in the EEG responses, and the ability of the raw EEG signal to distinguish between real and fake videos. Comment: 8 page

    Learning to Measure: Distance Metric Learning with Structured Sparsity

    Many important machine learning and data mining algorithms rely on a measure to provide a notion of distance or dissimilarity. Naive metrics such as the Euclidean distance are incapable of leveraging task-specific information, and consider all features as equal. A learned distance metric can become much more effective by homing in on structure specific to a task. Additionally, it is often extremely desirable for a metric to be sparse, as this vastly increases the ability to interpret or explain the measures produced by the distance metric. In this dissertation, we explore several current problems in distance metric learning and put forth solutions which make use of structured sparsity. The contributions of this dissertation may be broadly divided into two portions. In the first portion (chapter 2) we begin with a classic approach in distance metric learning and address a scenario where distance metric learning is typically inapplicable, i.e., the case of learning on heterogeneous data in a high-dimensional input space. We construct a projection-free distance metric learning algorithm which utilizes structured sparse updates and successfully demonstrate its application to learn a metric with over a billion parameters. The second portion (chapters 3 & 4) of this dissertation focuses on a new and intriguing regression-based approach to distance metric learning. Under this regression approach there are two sets of parameters to learn: those that parameterize the metric, and those that define the so-called "virtual points". We begin with an exploration of the metric parameterization and develop a structured sparse approach to robustify the metric to noisy, corrupted, or irrelevant data. We then focus on the virtual points and develop a new method for learning the metric and constraints simultaneously. We demonstrate empirically that our approach results in a distance metric which is much more effective than the current state of the art. Machine learning algorithms have recently become ingrained in an incredibly diverse range of technologies. The primary focus of this dissertation is to develop more effective techniques to learn a distance metric. We believe that this work has the potential for rather broad-reaching impacts, as learning a more effective metric typically results in more accurate metric-based machine learning algorithms.
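To make the contrast between the Euclidean distance and a sparse learned metric concrete, the sketch below compares the two on a toy example. It assumes the simplest possible form, a diagonal Mahalanobis-style metric whose nonzero weights are stored in a dictionary; the function names and the weights are hypothetical, and the abstract does not say which form of structured sparsity the dissertation actually uses.

```python
import math


def euclidean(x, y):
    """Plain Euclidean distance: every feature counts equally."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def sparse_diagonal_metric(x, y, weights):
    """Diagonal Mahalanobis-style distance with sparse weights.

    `weights` maps a feature index to a nonnegative weight; features
    missing from the dictionary have weight zero and are ignored.
    """
    return math.sqrt(sum(w * (x[i] - y[i]) ** 2 for i, w in weights.items()))


# Hypothetical example: only feature 0 is treated as task-relevant.
x = [1.0, 0.0, 5.0]
y = [0.0, 0.0, 0.0]
weights = {0: 4.0}

d_euclid = euclidean(x, y)
d_sparse = sparse_diagonal_metric(x, y, weights)
```

Because only one weight is nonzero, the sparse distance depends on a single feature, which is the kind of interpretability the abstract attributes to sparse metrics.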