XRay: Enhancing the Web's Transparency with Differential Correlation
Today's Web services - such as Google, Amazon, and Facebook - leverage user
data for varied purposes, including personalizing recommendations, targeting
advertisements, and adjusting prices. At present, users have little insight
into how their data is being used. Hence, they cannot make informed choices
about the services they use. To increase transparency, we developed XRay,
the first fine-grained, robust, and scalable personal data tracking system for
the Web. XRay predicts which data in an arbitrary Web account (such as emails,
searches, or viewed products) is being used to target which outputs (such as
ads, recommended products, or prices). XRay's core functions are service
agnostic and easy to instantiate for new services, and they can track data
within and across services. To make predictions independent of the audited
service, XRay relies on the following insight: by comparing outputs from
different accounts with similar, but not identical, subsets of data, one can
pinpoint targeting through correlation. We show both theoretically, and through
experiments on Gmail, Amazon, and YouTube, that XRay achieves high precision
and recall by correlating data from a surprisingly small number of extra
accounts.
Comment: Extended version of a paper presented at the 23rd USENIX Security Symposium (USENIX Security '14).
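The differential-correlation insight can be sketched in a few lines: given shadow accounts holding overlapping but non-identical subsets of inputs, score each (output, input) pair by how much more often the output appears in accounts that contain the input than in accounts that do not. This is a toy illustration, not XRay's actual algorithm (which uses Bayesian scoring and a careful account-allocation scheme); the account and item names are hypothetical:

```python
def correlation_scores(accounts, observed):
    """Score each (output, input) pair by differential correlation.

    accounts: list of sets, the data items placed in each shadow account.
    observed: list of sets, the outputs (e.g. ads) seen in each account.
    """
    items = set().union(*accounts)
    outputs = set().union(*observed)
    scores = {}
    for out in outputs:
        for item in items:
            with_item = [i for i, acc in enumerate(accounts) if item in acc]
            without = [i for i, acc in enumerate(accounts) if item not in acc]
            p_with = sum(out in observed[i] for i in with_item) / max(len(with_item), 1)
            p_without = sum(out in observed[i] for i in without) / max(len(without), 1)
            # a large positive gap suggests the output is targeted at the item
            scores[(out, item)] = p_with - p_without
    return scores
```

With enough overlap between accounts, an output that tracks one input perfectly scores 1.0 for that input and near 0 for unrelated ones, which is why a small number of extra accounts suffices.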
Modeling Dynamic User Interests: A Neural Matrix Factorization Approach
In recent years, there has been significant interest in understanding users'
online content consumption patterns. But, the unstructured, high-dimensional,
and dynamic nature of such data makes extracting valuable insights challenging.
Here we propose a model that combines the simplicity of matrix factorization
with the flexibility of neural networks to efficiently extract nonlinear
patterns from massive text data collections relevant to consumers' online
consumption patterns. Our model decomposes a user's content consumption journey
into nonlinear user and content factors that are used to model their dynamic
interests. This natural decomposition allows us to summarize each user's
content consumption journey with a dynamic probabilistic weighting over a set
of underlying content attributes. The model is fast to estimate, easy to
interpret and can harness external data sources as an empirical prior. These
advantages make our method well suited to the challenges posed by modern
datasets. We use our model to understand the dynamic news consumption interests
of Boston Globe readers over five years. Thorough qualitative studies,
including a crowdsourced evaluation, highlight our model's ability to
accurately identify nuanced and coherent consumption patterns. These results
are supported by our model's superior and robust predictive performance over
several competitive baseline methods.
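The decomposition described above, nonlinear user and content factors combined and mapped to a probabilistic weighting over content attributes, might be sketched as a single NumPy forward pass. All dimensions and parameters here are arbitrary and untrained, and the sketch omits the time dynamics and the estimation procedure of the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items, k, n_topics = 5, 20, 8, 4
U = rng.normal(size=(n_users, k))    # user factors (random, untrained)
V = rng.normal(size=(n_items, k))    # content factors
W = rng.normal(size=(k, n_topics))   # maps combined factors to topic logits

def topic_weights(user_id, item_ids):
    """Probabilistic weighting over content attributes for one user's
    consumption history (time dynamics omitted in this sketch)."""
    h = np.tanh(U[user_id] + V[item_ids].mean(axis=0))  # nonlinear combination
    logits = h @ W
    e = np.exp(logits - logits.max())                   # stable softmax
    return e / e.sum()

w = topic_weights(0, [1, 2, 3])  # one user, three consumed items
```

The softmax output is what makes the summary interpretable: each user-period reduces to a probability distribution over a small set of content attributes.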
Gender and Interest Targeting for Sponsored Post Advertising at Tumblr
As one of the leading platforms for creative content, Tumblr offers
advertisers a unique way of creating brand identity. Advertisers can tell their
story through images, animation, text, music, video, and more, and promote that
content by sponsoring it to appear as an advertisement in the streams of Tumblr
users. In this paper we present a framework that enabled one of the key
targeted advertising components for Tumblr, specifically gender and interest
targeting. We describe the main challenges involved in development of the
framework, which include creating the ground truth for training gender
prediction models, as well as mapping Tumblr content to an interest taxonomy.
For purposes of inferring user interests we propose a novel semi-supervised
neural language model for categorization of Tumblr content (i.e., post tags and
post keywords). The model was trained on a large-scale data set consisting of
6.8 billion user posts, with a very limited amount of categorized keywords, and
was shown to have superior performance over the bag-of-words model. We
successfully deployed gender and interest targeting capability in Yahoo
production systems, delivering inference for users that cover more than 90% of
daily activities at Tumblr. Online performance results indicate advantages of
the proposed approach, where we observed 20% lift in user engagement with
sponsored posts as compared to untargeted campaigns.
Comment: 10 pages, 9 figures, Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015), Sydney, Australia.
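The semi-supervised flavor of the categorization task can be illustrated with a toy sketch: represent each tag by the tags it co-occurs with in posts, then propagate category labels from a small seed set of categorized keywords to the rest by nearest-neighbor similarity. This stands in for, and is far simpler than, the neural language model the paper actually trains; all tag and category names below are hypothetical:

```python
import numpy as np

def context_vectors(posts):
    """Toy distributional representation: each tag is described by the
    co-occurrence counts of the other tags it shares posts with."""
    tags = sorted({t for p in posts for t in p})
    idx = {t: i for i, t in enumerate(tags)}
    C = np.zeros((len(tags), len(tags)))
    for p in posts:
        for a in p:
            for b in p:
                if a != b:
                    C[idx[a], idx[b]] += 1
    return {t: C[idx[t]] for t in tags}

def propagate(vecs, seeds):
    """Give each unlabeled tag the category of its most similar seed tag."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    labels = dict(seeds)                     # seeds: tag -> category
    for t, v in vecs.items():
        if t not in labels:
            nearest = max(seeds, key=lambda s: cos(v, vecs[s]))
            labels[t] = seeds[nearest]
    return labels
```

The design point carries over: a handful of categorized keywords plus unsupervised structure learned from billions of unlabeled posts is enough to label the long tail of tags.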
The Design and Implementation of Low-Latency Prediction Serving Systems
Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and cost-efficient predictions under heavy query load. These applications employ a variety of machine learning frameworks and models, often composing several models within the same application. However, most machine learning frameworks and systems are optimized for model training and not deployment.

In this thesis, I discuss three prediction serving systems designed to meet the needs of modern interactive machine learning applications. The key idea in this work is to utilize a decoupled, layered design that interposes systems on top of training frameworks to build low-latency, scalable serving systems. Velox introduced this decoupled architecture to enable fast online learning and model personalization in response to feedback. Clipper generalized this system architecture to be framework-agnostic and introduced a set of optimizations to reduce and bound prediction latency and improve prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. And InferLine provisions and manages the individual stages of prediction pipelines to minimize cost while meeting end-to-end tail latency constraints.
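The layered, framework-agnostic idea can be illustrated with a minimal sketch: an adapter gives every framework's model a uniform predict interface, and a serving tier interposed in front of it adds optimizations, here just an LRU prediction cache, without touching the framework itself. This is a hypothetical toy, not the code of Velox, Clipper, or InferLine:

```python
from collections import OrderedDict

class ModelAdapter:
    """Hides the underlying framework behind a uniform batch-predict call."""
    def __init__(self, predict_fn):
        self._predict = predict_fn           # any framework's inference function
    def predict(self, batch):
        return [self._predict(x) for x in batch]

class ServingLayer:
    """Serving tier on top of the adapter. An LRU prediction cache is one of
    the latency optimizations such a layer can add; batching, load shedding,
    and pipeline provisioning are omitted from this sketch."""
    def __init__(self, adapter, cache_size=1024):
        self.adapter = adapter
        self.cache = OrderedDict()
        self.cache_size = cache_size

    def predict(self, x):
        if x in self.cache:                  # hit: skip the model entirely
            self.cache.move_to_end(x)
            return self.cache[x]
        y = self.adapter.predict([x])[0]
        self.cache[x] = y
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)   # evict least recently used
        return y
```

Because every optimization lives in the serving tier, swapping the wrapped model for one from a different framework requires changing only the adapter's function, which is the decoupling the thesis argues for.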