138 research outputs found
Recommending with an Agenda: Active Learning of Private Attributes using Matrix Factorization
Recommender systems leverage user demographic information, such as age,
gender, etc., to personalize recommendations and better place their targeted
ads. Oftentimes, users do not volunteer this information due to privacy
concerns, or due to a lack of initiative in filling out their online profiles.
We illustrate a new threat in which a recommender learns private attributes of
users who do not voluntarily disclose them. We design both passive and active
attacks that solicit ratings for strategically selected items, and could thus
be used by a recommender system to pursue this hidden agenda. Our methods are
based on a novel usage of Bayesian matrix factorization in an active learning
setting. Evaluations on multiple datasets illustrate that such attacks are
indeed feasible and use significantly fewer rated items than static inference
methods. Importantly, they succeed without sacrificing the quality of
recommendations to users. Comment: This is the extended version of a paper that appeared in ACM RecSys
201
ShutUp: End-to-End Containment of Unwanted Traffic
While the majority of Denial-of-Service (DoS) defense proposals
assume a purely infrastructure-based architecture, some recent
proposals suggest that the attacking endhost may be enlisted
as part of the solution, through tamper-proof software,
network-imposed incentives, or user altruism. While intriguing,
these proposals ultimately raise the deployment bar by requiring
both the infrastructure and endhosts to cooperate. In this
paper, we explore the design of a pure end-to-end architecture
based on tamper-proof endhost software implemented, for instance,
with trusted platforms and virtual machines. We present
the design of a "Shutup Service", whereby the recipient of unwanted
traffic can ask the sender to slow down or stop. We show
that this service is effective in stopping DoS attacks, and in significantly
slowing down other types of unwanted traffic such as
worms. The Shutup service is incrementally deployable with
buy-in from OS or antivirus vendors, requiring only minimal
changes to the endhost software stack and no changes to the protocol
stack. We show through experimentation that the service
is effective and has little impact on legitimate traffic.
Private Decayed Sum Estimation under Continual Observation
In monitoring applications, recent data is more important than distant data.
How does this affect privacy of data analysis? We study a general class of data
analyses - computing predicate sums - with privacy. Formally, we study the
problem of estimating predicate sums privately, for sliding windows (and
other well-known decay models of data, i.e. exponential and polynomial decay).
We extend the recently proposed continual privacy model of Dwork et al.
We present algorithms for decayed sum which are ε-differentially
private, and are accurate. For window and exponential decay sums, our
algorithms are accurate up to additive 1/ε and polylog terms in the range
of the computed function; for polynomial decay sums which are technically more
challenging because partial solutions do not compose easily, our algorithms
incur additional relative error. Further, we show lower bounds, tight within
polylog factors and tight with respect to the dependence on the probability of
error.
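To make the window and exponential decay models in the abstract above concrete, here is a minimal Python sketch of a naive baseline. It is not the paper's algorithm: it perturbs each 0/1 predicate value once with Laplace noise of scale 1/ε and then computes both decayed sums from the noisy stream, so the released sums are post-processing of an ε-differentially private (event-level) input. Its error grows with the window size rather than polylogarithmically. The function names, the `alpha` decay factor, and the `seed` parameter are assumptions made for illustration.

```python
import random
from collections import deque


def laplace(scale, rng):
    # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1); rescale it.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_decayed_sums(bits, eps, window, alpha, seed=None):
    """Naive baseline (assumption, not the paper's method).

    bits   : stream of 0/1 predicate values, one per time step
    eps    : privacy parameter; each bit gets Laplace(1/eps) noise once
    window : sliding-window length W
    alpha  : exponential decay factor, 0 < alpha < 1
    Returns a list of (window_sum, exp_decay_sum) estimates, one per step.
    """
    if eps <= 0 or window <= 0 or not 0 < alpha < 1:
        raise ValueError("need eps > 0, window > 0 and 0 < alpha < 1")
    rng = random.Random(seed)
    recent = deque(maxlen=window)  # last W noisy values
    exp_sum = 0.0
    out = []
    for x in bits:
        # Changing one bit moves x by at most 1, so scale 1/eps suffices.
        noisy = x + laplace(1.0 / eps, rng)
        recent.append(noisy)
        # Exponential decay: S_t = alpha * S_{t-1} + x_t (on noisy values).
        exp_sum = alpha * exp_sum + noisy
        out.append((sum(recent), exp_sum))
    return out
```

Because the noise is added to the input once, every estimate derived from the noisy stream inherits the same guarantee; the cost is that a window of W values accumulates W independent noise terms.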
Privacy Tradeoffs in Predictive Analytics
Online services routinely mine user data to predict user preferences, make
recommendations, and place targeted ads. Recent research has demonstrated that
several private user attributes (such as political affiliation, sexual
orientation, and gender) can be inferred from such data. Can a
privacy-conscious user benefit from personalization while simultaneously
protecting her private attributes? We study this question in the context of a
rating prediction service based on matrix factorization. We construct a
protocol of interactions between the service and users that has remarkable
optimality properties: it is privacy-preserving, in that no inference algorithm
can succeed in inferring a user's private attribute with a probability better
than random guessing; it has maximal accuracy, in that no other
privacy-preserving protocol improves rating prediction; and, finally, it
involves a minimal disclosure, as the prediction accuracy strictly decreases
when the service reveals less information. We extensively evaluate our protocol
using several rating datasets, demonstrating that it successfully blocks the
inference of gender, age and political affiliation, while incurring less than
5% decrease in the accuracy of rating prediction. Comment: Extended version of the paper appearing in SIGMETRICS 201
Analysis of OD Flows (Raw Data)
In a recent paper, Structural Analysis of Network Traffic Flows, we analyzed the set of Origin Destination traffic flows from the Sprint-Europe and Abilene backbone networks. This report presents the complete set of results from analyzing data from both networks. The results in this report are specific to the Sprint-1 and Abilene datasets studied in the above paper. The following results are presented here:
1 Rows of Principal Matrix (V)
1.1 Sprint-1 Dataset
1.2 Abilene Dataset
2 Set of Eigenflows
2.1 Sprint-1 Dataset
2.2 Abilene Dataset
3 Classifying Eigenflows
3.1 Sprint-1 Dataset
3.2 Abilene Dataset
Centre National de la Recherche Scientifique (CNRS) France; Sprint Labs; Office of Naval Research (N000140310043); National Science Foundation (ANI-9986397, CCR-0325701)
Impact of IT Monoculture on Behavioral End Host Intrusion Detection
In this paper, we study the impact of today's IT policies, defined based upon a monoculture approach, on the performance of endhost anomaly detectors. This approach leads to the uniform configuration of host intrusion detection systems (HIDS) across all hosts in an enterprise network. We assess the performance impact this policy has from the individual's point of view by analyzing network traces collected from 350 enterprise users. We uncover a great deal of diversity in the user population in terms of the “tail” behavior, i.e., the component which matters for anomaly detection systems. We demonstrate that the monoculture approach to HIDS configuration results in users that experience wildly different false positive and false negative rates. We then introduce new policies, based upon leveraging this diversity, and show that not only do they dramatically improve performance for the vast majority of users, but they also reduce the number of false positives arriving in centralized IT operation centers, and can reduce attack strength.
Performance of Networked Applications: The Challenges in Capturing the User's Perception
There is much interest recently in doing automated performance diagnosis on user laptops or desktops. One interesting aspect of performance diagnosis that has received little attention is the user perspective on performance. To conduct research on both end-host performance diagnosis and user perception of network and application performance, we designed an end-host data collection tool, called HostView. HostView not only collects network, application and machine level data, but also gathers feedback directly from users. User feedback is obtained via two mechanisms, a system-triggered questionnaire and a user-triggered feedback form, which, for example, ask users to rate the performance of their network and applications. In this paper, we describe our experience with the first deployment of HostView. Using data from 40 users, we illustrate the diversity of our users, articulate the challenges in this line of research, and report on initial findings in correlating user data to system-level data.
HostView: Annotating end-host performance measurements with user feedback
Network disruptions can adversely impact a user's web browsing, cause video/audio interruptions, or render web sites and services unreachable. Such problems are frustrating to Internet users, who are oblivious to the underlying problems, but completely exposed to the service degradations. Ideally, users' end systems would have diagnostic tools that can automatically detect, diagnose, and possibly repair performance degradations. Hopefully, this can be done without user intervention. Clearly, the first step for any such (end-host) diagnostic tool is a methodology to automatically detect performance degradations in the network that can affect a user's perception of application performance.