17,276 research outputs found
Synthetic sequence generator for recommender systems - memory biased random walk on sequence multilayer network
Personalized recommender systems rely on each user's personal usage data in
the system, in order to assist in decision making. However, privacy policies
protecting users' rights prevent these highly personal data from being publicly
available to a wider researcher audience. In this work, we propose a memory
biased random walk model on multilayer sequence network, as a generator of
synthetic sequential data for recommender systems. We demonstrate the
applicability of the synthetic data in training recommender system models for
cases when privacy policies restrict clickstream publishing.Comment: The new updated version of the pape
Knowing Your Population: Privacy-Sensitive Mining of Massive Data
Location and mobility patterns of individuals are important to environmental
planning, societal resilience, public health, and a host of commercial
applications. Mining telecommunication traffic and transactions data for such
purposes is controversial, in particular raising issues of privacy. However,
our hypothesis is that privacy-sensitive uses are possible and often beneficial
enough to warrant considerable research and development efforts. Our work
contends that peoples behavior can yield patterns of both significant
commercial, and research, value. For such purposes, methods and algorithms for
mining telecommunication data to extract commonly used routes and locations,
articulated through time-geographical constructs, are described in a case study
within the area of transportation planning and analysis. From the outset, these
were designed to balance the privacy of subscribers and the added value of
mobility patterns derived from their mobile communication traffic and
transactions data. Our work directly contrasts the current, commonly held
notion that value can only be added to services by directly monitoring the
behavior of individuals, such as in current attempts at location-based
services. We position our work within relevant legal frameworks for privacy and
data protection, and show that our methods comply with such requirements and
also follow best-practice
All liaisons are dangerous when all your friends are known to us
Online Social Networks (OSNs) are used by millions of users worldwide.
Academically speaking, there is little doubt about the usefulness of
demographic studies conducted on OSNs and, hence, methods to label unknown
users from small labeled samples are very useful. However, from the general
public point of view, this can be a serious privacy concern. Thus, both topics
are tackled in this paper: First, a new algorithm to perform user profiling in
social networks is described, and its performance is reported and discussed.
Secondly, the experiments --conducted on information usually considered
sensitive-- reveal that by just publicizing one's contacts privacy is at risk
and, thus, measures to minimize privacy leaks due to social graph data mining
are outlined.Comment: 10 pages, 5 table
Injecting Uncertainty in Graphs for Identity Obfuscation
Data collected nowadays by social-networking applications create fascinating
opportunities for building novel services, as well as expanding our
understanding about social structures and their dynamics. Unfortunately,
publishing social-network graphs is considered an ill-advised practice due to
privacy concerns. To alleviate this problem, several anonymization methods have
been proposed, aiming at reducing the risk of a privacy breach on the published
data, while still allowing to analyze them and draw relevant conclusions. In
this paper we introduce a new anonymization approach that is based on injecting
uncertainty in social graphs and publishing the resulting uncertain graphs.
While existing approaches obfuscate graph data by adding or removing edges
entirely, we propose using a finer-grained perturbation that adds or removes
edges partially: this way we can achieve the same desired level of obfuscation
with smaller changes in the data, thus maintaining higher utility. Our
experiments on real-world networks confirm that at the same level of identity
obfuscation our method provides higher usefulness than existing randomized
methods that publish standard graphs.Comment: VLDB201
Privacy-Preserving Data Publishing in the Cloud: A Multi-level Utility Controlled Approach
Conventional private data publication schemes are targeted at publication of sensitive datasets with the objective of retaining as much utility as possible for statistical (aggregate) queries while ensuring the privacy of individuals' information. However, such an approach to data publishing is no longer applicable in shared multi-tenant cloud scenarios where users often have different levels of access to the same data. In this paper, we present a privacy-preserving data publishing framework for publishing large datasets with the goals of providing different levels of utility to the users based on their access privileges. We design and implement our proposed multi-level utility-controlled data anonymization schemes in the context of large association graphs considering three levels of user utility namely: (i) users having access to only the graph structure (ii) users having access to graph structure and aggregate query results and (iii) users having access to graph structure, aggregate query results as well as individual associations. Our experiments on real large association graphs show that the proposed techniques are effective, scalable and yield the required level of privacy and utility for user-specific utility and access privilege levels
Big Data and the Internet of Things
Advances in sensing and computing capabilities are making it possible to
embed increasing computing power in small devices. This has enabled the sensing
devices not just to passively capture data at very high resolution but also to
take sophisticated actions in response. Combined with advances in
communication, this is resulting in an ecosystem of highly interconnected
devices referred to as the Internet of Things - IoT. In conjunction, the
advances in machine learning have allowed building models on this ever
increasing amounts of data. Consequently, devices all the way from heavy assets
such as aircraft engines to wearables such as health monitors can all now not
only generate massive amounts of data but can draw back on aggregate analytics
to "improve" their performance over time. Big data analytics has been
identified as a key enabler for the IoT. In this chapter, we discuss various
avenues of the IoT where big data analytics either is already making a
significant impact or is on the cusp of doing so. We also discuss social
implications and areas of concern.Comment: 33 pages. draft of upcoming book chapter in Japkowicz and Stefanowski
(eds.) Big Data Analysis: New algorithms for a new society, Springer Series
on Studies in Big Data, to appea
- …