104 research outputs found
Recommending with an Agenda: Active Learning of Private Attributes using Matrix Factorization
Recommender systems leverage user demographic information, such as age,
gender, etc., to personalize recommendations and better place their targeted
ads. Oftentimes, users do not volunteer this information due to privacy
concerns, or due to a lack of initiative in filling out their online profiles.
We illustrate a new threat in which a recommender learns private attributes of
users who do not voluntarily disclose them. We design both passive and active
attacks that solicit ratings for strategically selected items, and could thus
be used by a recommender system to pursue this hidden agenda. Our methods are
based on a novel usage of Bayesian matrix factorization in an active learning
setting. Evaluations on multiple datasets illustrate that such attacks are
indeed feasible and use significantly fewer rated items than static inference
methods. Importantly, they succeed without sacrificing the quality of
recommendations to users.Comment: This is the extended version of a paper that appeared in ACM RecSys
201
Privacy Tradeoffs in Predictive Analytics
Online services routinely mine user data to predict user preferences, make
recommendations, and place targeted ads. Recent research has demonstrated that
several private user attributes (such as political affiliation, sexual
orientation, and gender) can be inferred from such data. Can a
privacy-conscious user benefit from personalization while simultaneously
protecting her private attributes? We study this question in the context of a
rating prediction service based on matrix factorization. We construct a
protocol of interactions between the service and users that has remarkable
optimality properties: it is privacy-preserving, in that no inference algorithm
can succeed in inferring a user's private attribute with a probability better
than random guessing; it has maximal accuracy, in that no other
privacy-preserving protocol improves rating prediction; and, finally, it
involves a minimal disclosure, as the prediction accuracy strictly decreases
when the service reveals less information. We extensively evaluate our protocol
using several rating datasets, demonstrating that it successfully blocks the
inference of gender, age and political affiliation, while incurring less than
5% decrease in the accuracy of rating prediction.Comment: Extended version of the paper appearing in SIGMETRICS 201
Efficient Online Crowdsourcing with Complex Annotations
Crowdsourcing platforms use various truth discovery algorithms to aggregate
annotations from multiple labelers. In an online setting, however, the main
challenge is to decide whether to ask for more annotations for each item to
efficiently trade off cost (i.e., the number of annotations) for quality of the
aggregated annotations. In this paper, we propose a novel approach for general
complex annotation (such as bounding boxes and taxonomy paths), that works in
an online crowdsourcing setting. We prove that the expected average similarity
of a labeler is linear in their accuracy \emph{conditional on the reported
label}. This enables us to infer reported label accuracy in a broad range of
scenarios. We conduct extensive evaluations on real-world crowdsourcing data
from Meta and show the effectiveness of our proposed online algorithms in
improving the cost-quality trade-off.Comment: full version of a paper accepted to AAAI'2
GraphSC: Parallel secure computation made easy
Abstract-We propose introducing modern parallel programming paradigms to secure computation, enabling their secure execution on large datasets. To address this challenge, we present GraphSC, a framework that (i) provides a programming paradigm that allows non-cryptography experts to write secure code; (ii) brings parallelism to such secure implementations; and (iii) meets the needs for obliviousness, thereby not leaking any private information. Using GraphSC, developers can efficiently implement an oblivious version of graph-based algorithms (including sophisticated data mining and machine learning algorithms) that execute in parallel with minimal communication overhead. Importantly, our secure version of graph-based algorithms incurs a small logarithmic overhead in comparison with the non-secure parallel version. We build GraphSC and demonstrate, using several algorithms as examples, that secure computation can be brought into the realm of practicality for big data analysis. Our secure matrix factorization implementation can process 1 million ratings in 13 hours, which is a multiple order-of-magnitude improvement over the only other existing attempt, which requires 3 hours to process 16K ratings
Understanding Malvertising Through Ad-Injecting Browser Extensions
Malvertising is a malicious activity that leverages advertising to distribute various forms of malware. Because advertising is the key revenue generator for numerous Internet companies, large ad networks, such as Google, Yahoo and Microsoft, invest a lot of effort to mitigate malicious ads from their ad networks. This drives adversaries to look for alternative methods to deploy malvertising. In this paper, we show that browser extensions that use ads as their monetization strategy often facilitate the deployment of malver-tising. Moreover, while some extensions simply serve ads from ad networks that support malvertising, other extensions maliciously alter the content of visited webpages to force users into installing malware. To measure the extent of these behaviors we developed Expector, a system that automatically inspects and identifies browser extensions that inject ads, and then classifies these ads as malicious or benign based on their landing pages. Using Expector, we auto-matically inspected over 18,000 Chrome browser extensions. We found 292 extensions that inject ads, and detected 56 extensions that participate in malvertising using 16 different ad networks and with a total user base of 602,417
- …