A Data Perturbation Approach to Privacy Protection in Data Mining
Advances in data mining techniques have raised growing concerns about the privacy of personal information. Organizations that use their customers’ records in data mining activities are forced to take actions to protect the privacy of the individuals involved. A common practice for many organizations today is to remove the identity-related attributes from customer records before releasing them to data miners or analysts. In this study, we investigate the effect of this practice and demonstrate that a majority of the records in a dataset can be uniquely identified even after identity-related attributes are removed. We propose a data perturbation method that can be used by organizations to prevent such unique identification of individual records while still providing the data to analysts for data mining. The proposed method attempts to preserve the statistical properties of the data based on privacy protection parameters specified by the organization. We show that the problem can be solved in two phases, with a linear programming formulation in phase one (to preserve the marginal distribution), followed by a simple Bayes-based swapping procedure in phase two (to preserve the joint distribution). The proposed method is compared with a random perturbation method in terms of classification performance on two real-world datasets. The results of the experiments indicate that it significantly outperforms the random method.
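The paper's actual method combines a linear program with a Bayes-based swapping procedure; as a much simpler illustration of why swapping is attractive for this purpose, the sketch below (not the paper's algorithm; all names are hypothetical) swaps a quasi-identifier between random pairs of records, which leaves every column's marginal distribution exactly unchanged while breaking the attribute combinations that make records uniquely identifiable:

```python
import random

def swap_attribute(records, attr, swap_fraction=0.3, seed=0):
    """Randomly swap `attr` values between pairs of records.

    Swapping within one column preserves that column's marginal
    distribution exactly; only the joint distribution with the other
    columns is perturbed.
    """
    rng = random.Random(seed)
    idx = list(range(len(records)))
    rng.shuffle(idx)
    n_pairs = int(len(records) * swap_fraction) // 2
    perturbed = [dict(r) for r in records]  # copy; leave input intact
    for k in range(n_pairs):
        i, j = idx[2 * k], idx[2 * k + 1]
        perturbed[i][attr], perturbed[j][attr] = (
            perturbed[j][attr], perturbed[i][attr])
    return perturbed
```

The paper's phase-two procedure is more careful than this, using Bayes-based swapping to also preserve the joint distribution up to the organization's privacy parameters.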
The misty crystal ball: Efficient concealment of privacy-sensitive attributes in predictive analytics
Individuals are becoming increasingly concerned with privacy. This curtails their willingness to share sensitive attributes like age, gender or personal preferences; yet firms largely rely upon customer data in any type of predictive analytics. Hence, organizations are confronted with a dilemma in which they must trade off a sparse use of data against the utility of better predictive analytics. This paper proposes a masking mechanism that obscures sensitive attributes while maintaining a large degree of predictive power. More precisely, we efficiently identify data partitions that are best suited for (i) shuffling, (ii) swapping and, as a form of randomization, (iii) perturbing attributes by conditional replacement. By operating on data partitions that are derived from a predictive algorithm, we achieve the objective of masking privacy-sensitive attributes with marginal downsides for predictive modeling. The resulting trade-off between masking and predictive utility is empirically evaluated in the context of customer churn where, for instance, a stratified shuffling of attribute values rarely reduces predictive accuracy by more than a percentage point. Our proposed framework entails direct managerial implications, as a growing share of firms adopts predictive analytics and thus requires mechanisms that better adhere to user demands for information privacy.
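To make the stratified-shuffling idea concrete, here is a minimal sketch (an assumption-laden illustration, not the paper's implementation): the sensitive attribute is shuffled only among records in the same partition, where `partition_of` is an assumed function mapping a record to its partition label (in the paper, partitions are derived from a predictive algorithm). Because values never leave their partition, the attribute's relationship to the model's predictions is largely preserved while individual values are decoupled from identities:

```python
import random
from collections import defaultdict

def stratified_shuffle(records, sensitive, partition_of, seed=0):
    """Shuffle `sensitive` values within each partition of the data."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for i, r in enumerate(records):
        groups[partition_of(r)].append(i)
    out = [dict(r) for r in records]  # copy; leave input intact
    for idx in groups.values():
        vals = [records[i][sensitive] for i in idx]
        rng.shuffle(vals)  # permute values inside this partition only
        for i, v in zip(idx, vals):
            out[i][sensitive] = v
    return out
```

Within each partition the multiset of sensitive values is unchanged, which is why aggregate statistics used by a per-partition predictor survive the masking.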
Identity Disclosure Protection: A Data Reconstruction Approach for Preserving Privacy in Data Mining
User's Privacy in Recommendation Systems Applying Online Social Network Data, A Survey and Taxonomy
Recommender systems have become an integral part of many social networks and extract knowledge from a user's personal and sensitive data, both explicitly, with the user's knowledge, and implicitly. This trend has created major privacy concerns, as users are mostly unaware of what data is being used, how much of it, and how securely. In this context, several works have addressed privacy concerns arising from the use of online social network data and from recommender systems. This paper surveys the main privacy concerns, measurements, and privacy-preserving techniques used in large-scale online social networks and recommender systems. It builds on prior work on security, privacy preservation, statistical modeling, and datasets to provide an overview of the technical difficulties and problems associated with privacy preservation in online social networks.
Comment: 26 pages, IET book chapter on big data recommender system
Why They Self-Disclose? Examining Factors Influencing People's Personal Information Disclosure in Online Healthcare Communities (Research-in-Progress)
Online healthcare communities (OHCs) encourage people to disclose their personal information to others in order to seek support, accelerate research, and help create better treatments. However, disclosing personal information can expose individuals to privacy risks. This paper aims to explore which factors affect people's intention to disclose personal information in OHCs, and how. Based on the "risk-motivation" perspective, we identify perceived usefulness as an extrinsic motivation and social support as an intrinsic motivation, and distinguish four kinds of risk, testing the effects of these motivation and risk factors on people's personal information disclosure intention in OHCs. Two constructs describing the characteristics of OHCs, expected disease severity and common identity, are posited to moderate the effects of the motivation and risk factors. The theoretical contribution of this paper is a model that explains people's personal information disclosure intention in OHCs and integrates constructs describing the characteristics of OHCs; the practical implication is insight for OHC managers into operating communities for viability while protecting members' privacy. Finally, limitations and future work are also presented.
Sharing Patient Disease Data with Privacy Preservation
When patient data are shared for studying a specific disease, a privacy disclosure occurs as soon as an individual is known to be in the shared data. Individuals in such specific disease data are thus subject to higher disclosure risk than those in datasets covering different diseases. This problem has been overlooked in privacy research and practice. In this study, we analyze disclosure risks for this problem and identify appropriate risk measures. An efficient algorithm is developed for anonymizing the data. An experimental study is conducted to demonstrate the effectiveness of the proposed approach.
Customer-Base Analysis using Repeated Cross-Sectional Summary (RCSS) Data
We address a critical question that many firms are facing today: Can customer data be stored and analyzed in an easy-to-manage and scalable manner without significantly compromising the inferences that can be made about the customers’ transaction activity? We address this question in the context of customer-base analysis. A number of researchers have developed customer-base analysis models that perform very well given detailed individual-level data. We explore the possibility of estimating these models using aggregated data summaries alone, namely repeated cross-sectional summaries (RCSS) of the transaction data. Such summaries are easy to create, visualize, and distribute, irrespective of the size of the customer base. An added advantage of the RCSS data structure is that individual customers cannot be identified, which makes it desirable from a data privacy and security viewpoint as well. We focus on the widely used Pareto/NBD model and carry out a comprehensive simulation study covering a vast spectrum of market scenarios. We find that the RCSS format of four quarterly histograms serves as a suitable substitute for individual-level data. We confirm the results of the simulations on a real dataset of purchases from an online fashion retailer.
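A repeated cross-sectional summary of the kind described can be sketched in a few lines (a hypothetical illustration, not the authors' code; the input format is an assumption): for each quarter, count how many customers made 0, 1, 2, … transactions. The resulting histograms contain no customer identifiers, yet, per the paper, four such quarterly histograms suffice to estimate models like Pareto/NBD:

```python
from collections import Counter

def rcss_histograms(transactions, customers, quarters):
    """Build per-quarter histograms of transaction counts.

    `transactions` is a list of (customer_id, quarter) pairs;
    `customers` lists the full customer base so that customers with
    zero transactions in a quarter are counted in the 0 bucket.
    """
    histograms = {}
    for q in quarters:
        per_customer = Counter(c for c, qq in transactions if qq == q)
        # histogram: number of customers with each transaction count
        counts = Counter(per_customer.get(c, 0) for c in customers)
        histograms[q] = dict(counts)
    return histograms
```

Note that the zero-transaction bucket is essential: models of customer activity need to know how many customers were silent in each period, not just how the active ones behaved.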
Protecting Privacy Against Regression Attacks in Predictive Data Mining
Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-mining technique, can be used to effectively reveal individuals' sensitive data. This problem, which we call a regression attack, has been overlooked in the literature. Existing privacy-preserving techniques are not appropriate for coping with this problem. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach adopts a novel measure that considers the tradeoff between disclosure risk and data utility in a regression tree pruning process. We also propose a dynamic value-concatenation method, which overcomes the limitation of requiring a user-defined generalization hierarchy in traditional k-anonymity approaches. Our approach can be used for anonymizing both numeric and categorical data. An experimental study is conducted to demonstrate the effectiveness of the proposed approach.
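The disclosure risk behind a regression attack can be illustrated without fitting an actual tree (a simplified sketch under assumed field names, not the paper's method): a tree leaf effectively groups records that share the same public attributes, and whenever the sensitive value is constant within such a group, an attacker who knows only the public attributes learns the sensitive value exactly:

```python
from collections import defaultdict

def disclosed_groups(records, public_attrs, sensitive):
    """Return attribute combinations that fully determine `sensitive`.

    Records are grouped by their public-attribute values (standing in
    for tree leaves); a group with a single distinct sensitive value
    discloses that value to anyone who knows the public attributes.
    """
    groups = defaultdict(set)
    for r in records:
        key = tuple(r[a] for a in public_attrs)
        groups[key].add(r[sensitive])
    return {k for k, vals in groups.items() if len(vals) == 1}
```

The paper's pruning-based defense can be read as deliberately merging or coarsening such leaves until the disclosure risk drops, at the cost of some predictive utility.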