10 research outputs found

    Exploring Privacy-Preserving Methods via Perturbation Data Mining Employing Diverse Noise Strategies

    Knowledge discovery from data, commonly referred to as data mining, involves extracting significant information, which may be previously unknown, hidden, or relevant, from large data sets or databases using statistical methods. With the introduction of improved hardware technologies, the amount of personal data that is stored and recorded about individuals has grown rapidly. Sophisticated organizations employ data mining algorithms to uncover hidden patterns or insights within data. Data mining techniques are applied in diverse fields such as marketing, medical diagnosis, forecasting systems, and national security. However, in scenarios where data privacy is paramount, mining certain types of data without violating the privacy of data owners is a formidable challenge, and it has sparked growing concern among privacy advocates. To address these concerns, it is imperative to develop data mining procedures that are sensitive to individual privacy. Data perturbation plays a pivotal role in Privacy-Preserving Data Mining (PPDM). Additive data perturbation safeguards data privacy by adding random noise to the original values. In contrast, multiplicative data perturbation involves a series of transformations, including rotation, translation, and the addition of noise components to the perturbed copy of the data.
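The additive case described above can be illustrated with a minimal sketch, assuming numeric records and independent zero-mean Gaussian noise. The function name `additive_perturbation` and the synthetic ages are hypothetical and only illustrate the general idea, not the paper's specific noise strategies. Individual values are masked, while aggregates such as the mean survive approximately.

```python
import random
from statistics import mean

def additive_perturbation(values, sigma, rng):
    """Hypothetical helper: release y_i = x_i + e_i with e_i ~ N(0, sigma^2).

    Individual values are masked, but because the noise has zero mean,
    aggregates such as the sample mean stay approximately intact for large samples.
    """
    return [x + rng.gauss(0.0, sigma) for x in values]

# Illustrative, synthetic data (assumption): 10,000 ages between 18 and 90.
rng = random.Random(42)
ages = [rng.randint(18, 90) for _ in range(10_000)]
released = additive_perturbation(ages, sigma=10.0, rng=rng)
print(round(mean(ages), 1), round(mean(released), 1))
```

Larger `sigma` hides each record better but widens the error on any statistic the miner tries to recover, which is the privacy/utility trade-off the abstract refers to.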

    Hybrid microaggregation for privacy preserving data mining

    k-Anonymity by microaggregation is one of the most commonly used anonymization techniques. Its success is due to the worthwhile trade-off it achieves between information loss and identity disclosure risk. However, the method has some drawbacks. On the disclosure limitation side, it lacks protection against attribute disclosure. On the data utility side, handling real datasets is challenging, because they are characterized by a large number of attributes and by noisy data such as outliers or even missing values. Generating anonymized individual data that remain useful for data mining tasks while reducing the influence of noisy data is therefore a compelling challenge. In this paper, we introduce a new microaggregation method, called HM-pfsom, based on fuzzy possibilistic clustering. Our proposed method operates in a hybrid manner: the anonymization process is applied per block of similar data, which helps to decrease the information loss during anonymization. The HM-pfsom approach studies the distribution of confidential attributes within each sub-dataset. Then, according to that distribution, the privacy parameter k is determined so as to preserve the diversity of confidential attributes within the anonymized microdata. This reduces the disclosure risk of confidential information.
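To make the baseline concrete, here is a minimal sketch of plain univariate fixed-size microaggregation, not HM-pfsom: it assumes a single numeric quasi-identifier and a fixed k, whereas HM-pfsom uses fuzzy possibilistic clustering and chooses k per block. All function names are hypothetical. The diversity helper shows the attribute-disclosure weakness the abstract mentions: a group can be k-anonymous yet share one confidential value.

```python
from statistics import mean

def fixed_size_groups(values, k):
    """Hypothetical helper: sort record indices by value and cut them into
    consecutive groups of at least k; the last group absorbs the remainder,
    so every group holds between k and 2k-1 records."""
    if k < 1 or len(values) < k:
        raise ValueError("need 1 <= k <= len(values)")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    groups = [order[g * k:(g + 1) * k] for g in range(n_groups - 1)]
    groups.append(order[(n_groups - 1) * k:])
    return groups

def microaggregate(values, k):
    """Replace each value by its group mean (k-anonymity on this one attribute)."""
    result = [0.0] * len(values)
    for members in fixed_size_groups(values, k):
        centroid = mean(values[i] for i in members)
        for i in members:
            result[i] = centroid
    return result

def min_sensitive_diversity(values, sensitive, k):
    """Smallest number of distinct confidential values in any group;
    a result of 1 means some group is open to attribute disclosure."""
    return min(len({sensitive[i] for i in members})
               for members in fixed_size_groups(values, k))
```

For example, with `ages = [1, 2, 3, 10, 11, 12, 13]` and `k = 3`, the released ages are `[2, 2, 2, 11.5, 11.5, 11.5, 11.5]`. If the first three records all carry the same diagnosis, the diversity check returns 1, which is the kind of situation that choosing k from the confidential-attribute distribution is meant to avoid.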

    Privacy by Design in Distributed Mobility Data

    Movement data are sensitive, because people’s whereabouts may allow re-identification of individuals in a de-identified database and thus can potentially reveal intimate personal traits, such as religious or sexual preferences. In this thesis, we focus on a distributed setting in which movement data from individual vehicles are collected and aggregated by a centralized station. We propose a novel approach to privacy-preserving analytical processing within such a distributed setting and tackle the problem of obtaining aggregated traffic information while preventing privacy leakage from data collection and aggregation. We study and analyze three different solutions based on the differential privacy model and on sketching techniques for efficient data compression. Each solution achieves a different trade-off between privacy protection and utility of the transformed data. Using real-life data, we demonstrate the effectiveness of our approaches in terms of the data utility preserved by the data transformation, thus providing empirical evidence that the privacy-by-design paradigm in big data analysis can deliver strong data protection combined with high data quality, even in massively distributed techno-social systems.
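As a rough illustration of the differential privacy side, the sketch below releases per-road-segment vehicle counts with the standard Laplace mechanism. It is not one of the thesis's three solutions and omits sketching entirely. It assumes a public list of segments, neighbouring datasets that differ by one vehicle, and a hypothetical `max_segments` cap that bounds each vehicle's contribution; all names are illustrative.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_segment_counts(trajectories, segments, max_segments, epsilon, rng):
    """Hypothetical helper: per-segment vehicle counts via the Laplace mechanism.

    Each vehicle counts at most once per segment and on at most `max_segments`
    segments (assumed truncation), so adding or removing one vehicle changes the
    count vector by at most `max_segments` in L1 norm; the scale is max_segments / epsilon.
    """
    domain = set(segments)
    counts = Counter()
    for trip in trajectories:
        visited = [s for s in dict.fromkeys(trip) if s in domain][:max_segments]
        counts.update(visited)
    scale = max_segments / epsilon
    # Noise goes on every public segment, including unvisited ones,
    # so the set of visited segments is not revealed by which keys are noisy.
    return {s: counts[s] + laplace_noise(scale, rng) for s in segments}

rng = random.Random(7)
trips = [["A", "B", "C"], ["A", "C"], ["B", "B", "D"]]
noisy = dp_segment_counts(trips, ["A", "B", "C", "D", "E"],
                          max_segments=3, epsilon=1.0, rng=rng)
```

A smaller `epsilon` or a larger `max_segments` increases the noise scale, which is one concrete form of the privacy/utility trade-off each of the studied solutions balances differently.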

    Non-Metric Multi-Dimensional Scaling for Distance-Based Privacy-Preserving Data Mining

    Recent advances in the field of data mining have led to major concerns about privacy. Sharing data with external parties for analysis puts private information at risk. The original data are often perturbed before external release to protect private information. However, data perturbation can decrease the utility of the output. A good perturbation technique requires balance between privacy and utility. This study proposes a new method for data perturbation in the context of distance-based data mining. We propose the use of non-metric multi-dimensional scaling (MDS) as a suitable technique to perturb data that are intended for distance-based data mining. The basic premise of this approach is to transform the original data into a lower-dimensional space and generate new data that protect private details while maintaining good utility for distance-based data mining analysis. We investigate the extent to which the perturbed data preserve useful statistics for distance-based analysis and provide protection against malicious attacks. We demonstrate that our method provides an adequate alternative to data randomisation approaches and other dimensionality reduction approaches. Testing is conducted on a wide range of benchmark datasets and against some existing perturbation methods. The results confirm that our method has very good overall performance, is competitive with other techniques, and produces clustering and classification results at least as good as, and in some cases better than, the results obtained from the original data.
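Non-metric MDS looks for a low-dimensional configuration whose pairwise distances follow the same rank order as the original dissimilarities, rather than matching their exact values. The sketch below does not fit an MDS model (scikit-learn's `sklearn.manifold.MDS`, for instance, offers a non-metric mode). It only measures, under the assumption of Euclidean distances and small datasets, how much of that ordinal structure a perturbed release keeps. The function names are hypothetical.

```python
import math
from itertools import combinations

def pairwise_distances(points):
    """Euclidean distance for every unordered pair of records, in a fixed pair order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]

def rank_order_agreement(original, perturbed):
    """Hypothetical utility score: fraction of distance pairs whose relative order
    (which distance is larger) is the same before and after perturbation.
    1.0 means the ordinal structure that distance-based mining relies on is fully kept.
    The perturbed data may have fewer dimensions than the original."""
    d0 = pairwise_distances(original)
    d1 = pairwise_distances(perturbed)
    concordant = total = 0
    for (a0, a1), (b0, b1) in combinations(zip(d0, d1), 2):
        if a0 == b0:
            continue  # ties in the original carry no ordering information
        total += 1
        # Ties introduced by the perturbation count as disagreement.
        if (a0 - b0) * (a1 - b1) > 0:
            concordant += 1
    return concordant / total if total else 1.0
```

A uniformly scaled copy of the data scores 1.0 because scaling preserves distance order, whereas projecting onto a single axis typically scores lower. The loop is quartic in the number of records, so this is only meant for small illustrative checks.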

    Enforcing privacy via access control and data perturbation.

    With the increasing availability of large collections of personal and sensitive information to a wide range of user communities, services should take more responsibility for data privacy when disseminating information, which requires data sharing control. In most cases, data are stored in a repository at the site of the domain server, which takes full responsibility for their management. The data can be provided to known recipients, or published without restriction on recipients. To ensure that such data is used without breaching privacy, proper access control models and privacy protection methods are needed. This thesis presents an approach to protect personal and sensitive information that is stored on one or more data servers. There are three main privacy requirements that need to be considered when designing a system for privacy-preserving data access. The first requirement is privacy-aware access control. In traditional privacy-aware contexts, built-in conditions or granular access control are used to assign user privileges at a fine-grained level. Very frequently, users and their privileges are diverse. Hence, it is necessary to deploy proper access control on both subject and object servers that impose the conditions on carrying out user operations. This thesis defines a dual privacy-aware access control model, consisting of a subject server that manages user privileges and an object server that deals with granular data. Both servers extract user operations and server conditions from the original requests and convert them to privacy labels that contain access control attributes. In cross-domain cases, traditional solutions adopt roaming tables to support multiple-domain access. However, building roaming tables for all domains is costly and maintaining these tables can become an issue. Furthermore, when roaming occurs, the party responsible for multi-domain data management has to be clearly identified. 
    In this thesis, a roaming adjustment mechanism is presented for both subject and object servers. By defining such a dual server control model and request process flow, the responsibility for data administration can be properly managed. The second requirement is the consideration of access purpose, namely why the subject requests access to the object and how the subject is going to use the object. Existing solutions overlook the different interpretations of purposes in distinct domains. This thesis proposes a privilege-oriented, purpose-based method that enhances the privacy-aware access control model described above. It includes a component that interprets the subject's intention and the conditions imposed by the servers on operations, and a component that caters for object types and the object owner's intention. The third requirement is maintaining data utility while protecting privacy when data are shared without restriction on recipients. Most existing approaches achieve a high level of privacy at the expense of data usability; to the best of our knowledge, no existing solution achieves both. This thesis combines data privacy protection with data utility by building a framework that defines a privacy protection process flow. It also includes two data privacy protection algorithms, based on Chebyshev polynomials and fractal sequences, respectively. Experiments show that both algorithms are resistant to two main data privacy attacks, with little loss of accuracy.

    A Survey of Multiplicative Perturbation for Privacy-Preserving Data Mining

    The major challenge of data perturbation is to achieve the desired balance between the level of privacy guarantee and the level of data utility. Data privacy and data utility are commonly considered a pair of conflicting requirements in privacy-preserving data mining systems and applications. Multiplicative perturbation algorithms aim at improving data privacy while maintaining the desired level of data utility by selectively preserving task- and model-specific information during the data perturbation process. By preserving this task- and model-specific information, a set of “transformation-invariant data mining models” can be applied to the perturbed data directly, achieving the required model accuracy. A multiplicative perturbation algorithm may often find multiple data transformations that preserve the required data utility. Thus the next major challenge is to find a good transformation that provides a satisfactory level of privacy guarantee. In this chapter, we review three representative multiplicative perturbation methods: rotation perturbation, projection perturbation, and geometric perturbation, and discuss the technical issues and research challenges. We first describe the task- and model-specific information for a class of data mining models, and the transformations that can (approximately) preserve this information. Then we discuss the design of appropriate privacy evaluation models for multiplicative perturbations, and give an overview of how we use the privacy evaluation model to measure the level of privacy guarantee in the context of different types of attacks.
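Rotation perturbation works because an orthogonal matrix preserves Euclidean distances, so distance-based models such as k-nearest-neighbour classifiers are transformation-invariant. The sketch below combines a random rotation with a shared translation and optional per-record Gaussian noise, following the general form rotation + translation + noise. It assumes dense numeric records, uses hypothetical function names, and does not implement the chapter's procedure for selecting a transformation with a satisfactory privacy guarantee.

```python
import math
import random

def random_orthogonal(d, rng):
    """Hypothetical helper: random d x d orthogonal matrix (orthonormal rows)
    built by Gram-Schmidt orthogonalization of Gaussian random vectors."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip degenerate draws (practically never happens)
            rows.append([a / norm for a in v])
    return rows

def geometric_perturbation(records, rng, noise_sigma=0.0):
    """Hypothetical sketch: y = R x + t + noise for every record, with one random
    rotation R, one shared random translation t, and i.i.d. N(0, noise_sigma^2) noise."""
    d = len(records[0])
    R = random_orthogonal(d, rng)
    t = [rng.gauss(0.0, 1.0) for _ in range(d)]
    out = []
    for x in records:
        y = [sum(R[i][j] * x[j] for j in range(d)) + t[i] for i in range(d)]
        out.append([yi + rng.gauss(0.0, noise_sigma) for yi in y])
    return out
```

With `noise_sigma=0` the transformation is an isometry, so pairwise distances, and hence k-NN results, are unchanged up to floating-point error. A positive `noise_sigma` gives up some of that invariance and is intended to make it harder for an attacker who knows a few original records to estimate `R` and `t`.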
