2,776 research outputs found

    XRay: Enhancing the Web's Transparency with Differential Correlation

    Get PDF
    Today's Web services - such as Google, Amazon, and Facebook - leverage user data for varied purposes, including personalizing recommendations, targeting advertisements, and adjusting prices. At present, users have little insight into how their data is being used. Hence, they cannot make informed choices about the services they choose. To increase transparency, we developed XRay, the first fine-grained, robust, and scalable personal data tracking system for the Web. XRay predicts which data in an arbitrary Web account (such as emails, searches, or viewed products) is being used to target which outputs (such as ads, recommended products, or prices). XRay's core functions are service agnostic and easy to instantiate for new services, and they can track data within and across services. To make predictions independent of the audited service, XRay relies on the following insight: by comparing outputs from different accounts with similar, but not identical, subsets of data, one can pinpoint targeting through correlation. We show both theoretically, and through experiments on Gmail, Amazon, and YouTube, that XRay achieves high precision and recall by correlating data from a surprisingly small number of extra accounts.Comment: Extended version of a paper presented at the 23rd USENIX Security Symposium (USENIX Security 14

    Modeling Dynamic User Interests: A Neural Matrix Factorization Approach

    Full text link
    In recent years, there has been significant interest in understanding users' online content consumption patterns. But, the unstructured, high-dimensional, and dynamic nature of such data makes extracting valuable insights challenging. Here we propose a model that combines the simplicity of matrix factorization with the flexibility of neural networks to efficiently extract nonlinear patterns from massive text data collections relevant to consumers' online consumption patterns. Our model decomposes a user's content consumption journey into nonlinear user and content factors that are used to model their dynamic interests. This natural decomposition allows us to summarize each user's content consumption journey with a dynamic probabilistic weighting over a set of underlying content attributes. The model is fast to estimate, easy to interpret and can harness external data sources as an empirical prior. These advantages make our method well suited to the challenges posed by modern datasets. We use our model to understand the dynamic news consumption interests of Boston Globe readers over five years. Thorough qualitative studies, including a crowdsourced evaluation, highlight our model's ability to accurately identify nuanced and coherent consumption patterns. These results are supported by our model's superior and robust predictive performance over several competitive baseline methods

    Gender and Interest Targeting for Sponsored Post Advertising at Tumblr

    Full text link
    As one of the leading platforms for creative content, Tumblr offers advertisers a unique way of creating brand identity. Advertisers can tell their story through images, animation, text, music, video, and more, and promote that content by sponsoring it to appear as an advertisement in the streams of Tumblr users. In this paper we present a framework that enabled one of the key targeted advertising components for Tumblr, specifically gender and interest targeting. We describe the main challenges involved in development of the framework, which include creating the ground truth for training gender prediction models, as well as mapping Tumblr content to an interest taxonomy. For purposes of inferring user interests we propose a novel semi-supervised neural language model for categorization of Tumblr content (i.e., post tags and post keywords). The model was trained on a large-scale data set consisting of 6.8 billion user posts, with very limited amount of categorized keywords, and was shown to have superior performance over the bag-of-words model. We successfully deployed gender and interest targeting capability in Yahoo production systems, delivering inference for users that cover more than 90% of daily activities at Tumblr. Online performance results indicate advantages of the proposed approach, where we observed 20% lift in user engagement with sponsored posts as compared to untargeted campaigns.Comment: 10 pages, 9 figures, Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015), Sydney, Australi
    • …
    corecore