11,723 research outputs found
Synthetic sequence generator for recommender systems - memory biased random walk on sequence multilayer network
Personalized recommender systems rely on each user's personal usage data in
the system, in order to assist in decision making. However, privacy policies
protecting users' rights prevent these highly personal data from being publicly
available to a wider researcher audience. In this work, we propose a memory
biased random walk model on multilayer sequence network, as a generator of
synthetic sequential data for recommender systems. We demonstrate the
applicability of the synthetic data in training recommender system models for
cases when privacy policies restrict clickstream publishing.Comment: The new updated version of the pape
Privacy and Confidentiality in an e-Commerce World: Data Mining, Data Warehousing, Matching and Disclosure Limitation
The growing expanse of e-commerce and the widespread availability of online
databases raise many fears regarding loss of privacy and many statistical
challenges. Even with encryption and other nominal forms of protection for
individual databases, we still need to protect against the violation of privacy
through linkages across multiple databases. These issues parallel those that
have arisen and received some attention in the context of homeland security.
Following the events of September 11, 2001, there has been heightened attention
in the United States and elsewhere to the use of multiple government and
private databases for the identification of possible perpetrators of future
attacks, as well as an unprecedented expansion of federal government data
mining activities, many involving databases containing personal information. We
present an overview of some proposals that have surfaced for the search of
multiple databases which supposedly do not compromise possible pledges of
confidentiality to the individuals whose data are included. We also explore
their link to the related literature on privacy-preserving data mining. In
particular, we focus on the matching problem across databases and the concept
of ``selective revelation'' and their confidentiality implications.Comment: Published at http://dx.doi.org/10.1214/088342306000000240 in the
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Location Privacy in Spatial Crowdsourcing
Spatial crowdsourcing (SC) is a new platform that engages individuals in
collecting and analyzing environmental, social and other spatiotemporal
information. With SC, requesters outsource their spatiotemporal tasks to a set
of workers, who will perform the tasks by physically traveling to the tasks'
locations. This chapter identifies privacy threats toward both workers and
requesters during the two main phases of spatial crowdsourcing, tasking and
reporting. Tasking is the process of identifying which tasks should be assigned
to which workers. This process is handled by a spatial crowdsourcing server
(SC-server). The latter phase is reporting, in which workers travel to the
tasks' locations, complete the tasks and upload their reports to the SC-server.
The challenge is to enable effective and efficient tasking as well as reporting
in SC without disclosing the actual locations of workers (at least until they
agree to perform a task) and the tasks themselves (at least to workers who are
not assigned to those tasks). This chapter aims to provide an overview of the
state-of-the-art in protecting users' location privacy in spatial
crowdsourcing. We provide a comparative study of a diverse set of solutions in
terms of task publishing modes (push vs. pull), problem focuses (tasking and
reporting), threats (server, requester and worker), and underlying technical
approaches (from pseudonymity, cloaking, and perturbation to exchange-based and
encryption-based techniques). The strengths and drawbacks of the techniques are
highlighted, leading to a discussion of open problems and future work
Measuring and mitigating AS-level adversaries against Tor
The popularity of Tor as an anonymity system has made it a popular target for
a variety of attacks. We focus on traffic correlation attacks, which are no
longer solely in the realm of academic research with recent revelations about
the NSA and GCHQ actively working to implement them in practice.
Our first contribution is an empirical study that allows us to gain a high
fidelity snapshot of the threat of traffic correlation attacks in the wild. We
find that up to 40% of all circuits created by Tor are vulnerable to attacks by
traffic correlation from Autonomous System (AS)-level adversaries, 42% from
colluding AS-level adversaries, and 85% from state-level adversaries. In
addition, we find that in some regions (notably, China and Iran) there exist
many cases where over 95% of all possible circuits are vulnerable to
correlation attacks, emphasizing the need for AS-aware relay-selection.
To mitigate the threat of such attacks, we build Astoria--an AS-aware Tor
client. Astoria leverages recent developments in network measurement to perform
path-prediction and intelligent relay selection. Astoria reduces the number of
vulnerable circuits to 2% against AS-level adversaries, under 5% against
colluding AS-level adversaries, and 25% against state-level adversaries. In
addition, Astoria load balances across the Tor network so as to not overload
any set of relays.Comment: Appearing at NDSS 201
- …