Anomaly Detection in Streaming Sensor Data
In this chapter we consider a cell phone network as a set of automatically
deployed sensors that record movement and interaction patterns of the
population. We discuss methods for detecting anomalies in the streaming data
produced by the cell phone network. We motivate this discussion by describing
the Wireless Phone Based Emergency Response (WIPER) system, a proof-of-concept
decision support system for emergency response managers. We also discuss some
of the scientific work enabled by this type of sensor data and the related
privacy issues. We describe scientific studies that use the cell phone data set
and steps we have taken to ensure the security of the data. We describe the
overall decision support system and discuss three methods of anomaly detection
that we have applied to the data.
Comment: 35 pages. Book chapter to appear in "Intelligent Techniques for Warehousing and Mining Sensor Network Data" (IGI Global), edited by A. Cuzzocrea.
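A common baseline for anomaly detection in streaming sensor data of this kind is a sliding-window z-score test on a per-sensor signal such as call volume. The sketch below is a generic illustration under that assumption; it is not one of the three WIPER methods, and the window size and threshold are made up for the example.

```python
from collections import deque
import math

def stream_anomalies(values, window=50, threshold=3.0):
    """Flag stream values that deviate from a trailing-window baseline.

    Yields (index, value) pairs whose z-score against the previous
    `window` observations exceeds `threshold`. Illustrative only:
    window size and threshold are arbitrary choices here.
    """
    buf = deque(maxlen=window)
    for i, v in enumerate(values):
        if len(buf) == window:
            mean = sum(buf) / window
            var = sum((x - mean) ** 2 for x in buf) / window
            std = math.sqrt(var)
            if std > 0 and abs(v - mean) / std > threshold:
                yield (i, v)
        buf.append(v)
```

On a stream that oscillates around a stable baseline, a single large spike is flagged while the surrounding values pass; real deployments would also need to handle baseline drift and periodic (e.g. daily) patterns, which this sketch ignores.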
Nonadaptive Mastermind Algorithms for String and Vector Databases, with Case Studies
In this paper, we study sparsity-exploiting Mastermind algorithms for
attacking the privacy of an entire database of character strings or vectors,
such as DNA strings, movie ratings, or social network friendship data. Based on
reductions to nonadaptive group testing, our methods are able to take advantage
of minimal amounts of privacy leakage, such as that contained in a single bit
indicates if two people in a medical database have any common genetic
mutations, or if two people have any common friends in an online social
network. We analyze our Mastermind attack algorithms using theoretical
characterizations that provide sublinear bounds on the number of queries needed
to clone the database, as well as experimental tests on genomic information,
collaborative filtering data, and online social networks. By taking advantage
of the generally sparse nature of these real-world databases and modulating a
parameter that controls query sparsity, we demonstrate that relatively few
nonadaptive queries are needed to recover a large majority of each database.
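The group-testing reduction can be illustrated with a toy decoder. Each nonadaptive query below is a random subset of positions, and the only bit leaked is whether the secret has a 1 anywhere in that subset (analogous to "do these two people share any mutation/friend"); a "no" answer proves every queried position is 0. This is a simplified sketch, not the paper's algorithm, and it assumes the secret's sparsity is known when choosing the query density.

```python
import random

def recover_sparse_vector(secret, num_queries=300, rng=None):
    """Toy nonadaptive group-testing decoder (not the paper's exact
    algorithm). Each query leaks one bit: does the secret contain a 1
    anywhere in a random subset of positions? Negative answers rule
    out every position in the queried subset."""
    rng = rng or random.Random(0)
    n = len(secret)
    # Sparse queries: include each position with probability ~1/k,
    # where k is the number of 1s (assumed known here for simplicity).
    p = 1.0 / max(1, sum(secret))
    candidates = set(range(n))
    for _ in range(num_queries):
        subset = [i for i in range(n) if rng.random() < p]
        if not any(secret[i] for i in subset):   # the single leaked bit
            candidates.difference_update(subset)
    return [1 if i in candidates else 0 for i in range(n)]
```

Because each negative query eliminates a whole subset at once, the number of one-bit queries needed grows far more slowly than the database size when the data is sparse, which is the effect the sublinear bounds in the paper formalize.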
DPWeka: Achieving Differential Privacy in WEKA
Organizations in the government, commercial, and non-profit sectors collect and store large amounts of sensitive data, including medical, financial, and personal information. They use data mining methods to formulate business strategies that yield long-term and short-term financial benefits. While analyzing such data, the private information of the individuals present in the data must be protected for moral and legal reasons. Current practices such as redacting sensitive attributes, releasing only aggregate values, and query auditing do not provide sufficient protection against an adversary armed with auxiliary information. The differential privacy framework provides mathematical guarantees against adversarial attacks even in the presence of such background information.
Existing platforms for differential privacy employ specific mechanisms for a limited set of data mining applications. Additionally, widely used data mining tools do not include differentially private algorithms. As a result, awareness of differentially private methods for analyzing sensitive data remains limited outside the research community.
This thesis examines various mechanisms to realize differential privacy in practice and investigates methods to integrate them with a popular machine learning toolkit, WEKA. We present DPWeka, a package that provides differential privacy capabilities to WEKA for practical data mining. DPWeka includes a suite of differentially private algorithms that support a variety of data mining tasks, including attribute selection and regression analysis. It has provisions for users to control privacy and model parameters, such as the privacy mechanism, the privacy budget, and other algorithm-specific variables. We evaluate the private algorithms on real-world datasets, such as genetic data and census data, to demonstrate the practical applicability of DPWeka.
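The workhorse behind most differentially private numeric releases is the Laplace mechanism: add noise drawn from a Laplace distribution whose scale is the query's sensitivity divided by the privacy budget epsilon. The sketch below is a generic illustration of that mechanism, not DPWeka's actual API; function and parameter names are made up.

```python
import random

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Release a numeric query answer with epsilon-differential privacy
    by adding Laplace(sensitivity / epsilon) noise. Generic sketch,
    not DPWeka's interface."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # A Laplace variate is the difference of two i.i.d. exponentials.
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return true_answer + noise

# Example: a counting query has sensitivity 1, since adding or removing
# one person changes the count by at most 1.
noisy_count = laplace_mechanism(1234, sensitivity=1.0, epsilon=0.5)
```

Smaller epsilon means stronger privacy but noisier answers, which is exactly the privacy/utility trade-off a user-facing privacy-budget knob exposes.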
Privacy-preserving Context-aware Recommender Systems: Analysis and New Solutions
Nowadays, recommender systems have become an indispensable part of our daily life and provide personalized services for almost everything. However, nothing is for free -- such systems also raise severe privacy concerns because they accumulate large amounts of personal information in order to provide recommendations. In this work, we construct privacy-preserving recommendation protocols by combining cryptographic techniques with the inherent data characteristics of recommender systems. We first revisit the protocols by Jeckmans et al. at ESORICS 2013 and show a number of security and usability issues. Then, we propose two privacy-preserving protocols, which compute predicted ratings for a user based on inputs from both the user's friends and a set of randomly chosen strangers. A user has the flexibility to retrieve either a predicted rating for an unrated item or the Top-N unrated items. The proposed protocols prevent information leakage from both protocol executions and protocol outputs: a somewhat homomorphic encryption scheme is used to run all computations in encrypted form, and inputs from the randomly chosen strangers guarantee that the inputs of a user's friends will not be compromised even if this user's outputs are leaked. Finally, we use the well-known MovieLens 100k dataset to evaluate performance for different parameter sizes.
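Running computations "in encrypted form" relies on homomorphic encryption. As a stand-in for the somewhat homomorphic scheme the paper uses, here is a textbook Paillier sketch, which is additively homomorphic only: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a server can total encrypted ratings it cannot read. The key sizes are toy values and completely insecure; this is an illustration of the principle, not the paper's construction.

```python
import math
import random

def paillier_keygen(p=10007, q=10009):
    """Textbook Paillier with toy primes (insecure, illustration only)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)          # valid because we fix g = n + 1
    return (n,), (n, lam, mu)     # (public key, secret key)

def encrypt(pk, m, rng=None):
    (n,) = pk
    rng = rng or random.Random()
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:    # r must be a unit mod n
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(sk, c):
    n, lam, mu = sk
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

def add_ciphertexts(pk, c1, c2):
    """Additive homomorphism: the product of ciphertexts decrypts to
    the sum of the underlying plaintexts."""
    (n,) = pk
    return (c1 * c2) % (n * n)
```

A somewhat homomorphic scheme additionally supports a limited number of multiplications on ciphertexts, which is what allows similarity-weighted rating predictions, rather than just sums, to be computed under encryption.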
Efficient distributed privacy preserving clustering
With growing concerns about data privacy, researchers have focused their attention on developing algorithms for privacy-preserving data mining. However, the methods proposed so far are either too inefficient for large datasets or trade away the accuracy of the mining results for privacy. Secure multiparty computation lets researchers develop privacy-preserving data mining algorithms without compromising result quality, and it provides formal privacy guarantees. On the other hand, algorithms based on secure multiparty computation often rely on computationally expensive cryptographic operations, making them infeasible in real-world scenarios. In this thesis, we study the problem of privacy-preserving distributed clustering and propose an efficient and secure algorithm for this problem based on secret sharing, comparing it to the state of the art. Experiments show that our algorithm has a lower communication overhead and a much lower computation overhead than the state of the art.
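Additive secret sharing, the primitive such an algorithm builds on, avoids expensive public-key operations entirely: a value is split into random shares that individually reveal nothing, and parties only ever combine shares. The secure-sum helper below mirrors the kind of aggregation a distributed clustering step needs (e.g., summing partial centroid contributions); it is a minimal sketch, not the thesis's actual protocol, and all names are illustrative.

```python
import random

def share(value, num_parties, modulus=2**31, rng=None):
    """Split `value` into additive shares modulo `modulus`. Each share
    alone is uniformly random; only the full set reveals the value."""
    rng = rng or random.Random()
    shares = [rng.randrange(modulus) for _ in range(num_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares

def reconstruct(shares, modulus=2**31):
    return sum(shares) % modulus

def secure_sum(values, modulus=2**31, rng=None):
    """Each party shares its private value with all others; every party
    sums the shares it holds, and only the combined total is revealed."""
    all_shares = [share(v, len(values), modulus, rng) for v in values]
    partial = [sum(col) % modulus for col in zip(*all_shares)]
    return reconstruct(partial, modulus)
```

Sharing and reconstruction use only modular additions, which is why secret-sharing-based protocols can have much lower computation overhead than ones built on homomorphic encryption, at the cost of more communication rounds among the parties.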
Technical Privacy Metrics: a Systematic Survey
The file attached to this record is the author's final peer-reviewed version.
The goal of privacy metrics is to measure the degree of privacy enjoyed by users in a system and the amount of protection offered by privacy-enhancing technologies. In this way, privacy metrics contribute to improving user privacy in the digital world. The diversity and complexity of privacy metrics in the literature make an informed choice of metrics challenging. As a result, instead of using existing metrics, new metrics are proposed frequently, and privacy studies are often incomparable. In this survey we alleviate these problems by structuring the landscape of privacy metrics. To this end, we explain and discuss a selection of over eighty privacy metrics and introduce categorizations based on the aspect of privacy they measure, their required inputs, and the type of data that needs protection. In addition, we present a method for choosing privacy metrics based on nine questions that help identify the right metrics for a given scenario, and highlight topics where additional work on privacy metrics is needed. Our survey spans multiple privacy domains and can be understood as a general framework for privacy measurement.
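Two concrete metrics of the kinds such a survey categorizes are an uncertainty metric (the entropy of the adversary's posterior over candidate users) and a data-similarity metric (k-anonymity). Both can be computed in a few lines; the function names and example data below are illustrative, not taken from the survey.

```python
import math
from collections import Counter

def adversary_entropy_bits(posterior):
    """Uncertainty-type metric: Shannon entropy (in bits) of the
    adversary's posterior distribution over candidate users.
    Higher entropy means more anonymity."""
    return -sum(p * math.log2(p) for p in posterior if p > 0)

def k_anonymity(records, quasi_identifiers):
    """Data-similarity metric: the size of the smallest group of
    records sharing identical quasi-identifier values (the 'k').
    A record unique on its quasi-identifiers gives k = 1."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers)
                     for r in records)
    return min(groups.values())
```

For example, a uniform posterior over four candidates yields 2 bits of anonymity, while a dataset containing one unique quasi-identifier combination has k = 1; the two numbers measure different aspects of privacy (adversary uncertainty versus data indistinguishability), which is exactly why choosing the right metric for a scenario matters.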