10 research outputs found

    Concealment Conserving the Data Mining of Groups & Individual

    Get PDF
    We present an overview of privacy preserving data mining, one of the most popular directions in the data mining research community. In the first part of the chapter, we presented approaches that have been proposed for the protection of either the sensitive data itself in the course of data mining or the sensitive data mining results, in the context of traditional (relational) datasets. Following that, in the second part of the chapter, we focused our attention on one of the most recent as well as prominent directions in privacy preserving data mining: the mining of user mobility data. Although still in its infancy, privacy preserving data mining of mobility data has attracted a lot of research attention and already counts a number of methodologies both with respect to sensitive data protection and to sensitive knowledge hiding. Finally, in the end of the chapter, we provided some roadmap along the field of privacy preserving mobility data mining as well as the area of privacy preserving data mining at large

    CẢI THIỆN THUẬT GIẢI CUCKOO TRONG VẤN ĐỀ ẨN LUẬT KẾT HỢP

    Get PDF
    Nowadays, the problem of data security in the process of data mining receives more attention. The question is how to balance between exploiting legal data and avoiding revealing sensitive information. There have been many approaches, and one remarkable approach is privacy preservation in association rule mining to hide sensitive rules. Recently, a meta-heuristic algorithm is relatively effective for this purpose, which is cuckoo optimization algorithm (COA4ARH). In this paper, an improved version of COA4ARH is presented for calculating the minimum number of sensitive items which should be removed to hide sensitive rules, as well as limit the loss of non-sensitive rules. The experimental results gained from three real datasets showed that the proposed method has better results compared to the original algorithm in several cases.Hiện nay, vấn đề bảo mật dữ liệu ngày càng được quan tâm hơn trong quá trình khai thác dữ liệu. Làm sao để vừa có thể khai thác hợp pháp mà vừa tránh lộ ra các thông tin nhạy cảm. Có rất nhiều hướng tiếp cận nhưng nổi trội trong số đó là khai thác luật kết hợp đảm bảo sự riêng tư nhằm ẩn các luật nhạy cảm. Gần đây, có một thuật toán meta heuristic khá hiệu quả để đạt mục đích này, đó là thuật toán tối ưu hóa Cuckoo (COA4ARH). Trong bài báo này, một đề xuất cải tiến của COA4ARH được đưa ra để tính toán số lượng tối thiểu các item nhạy cảm cần được xóa để ẩn luật, từ đó hạn chế việc mất các luật không nhạy cảm. Các kết quả thực nghiệm tiến hành trên ba tập dữ liệu thực cho thấy trong một số trường hợp thì cải tiến đề xuất có kết quả khá tốt so với thuật toán ban đầu

    State of the Art in Privacy Preserving Data Mining

    Get PDF
    Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when Data Mining techniques are used. Such a trend, especially in the context of public databases, or in the context of sensible information related to critical infrastructures, represents, nowadays a not negligible thread. Privacy Preserving Data Mining (PPDM) algorithms have been recently introduced with the aim of modifying the database in such a way to prevent the discovery of sensible information. This is a very complex task and there exist in the scientific literature some different approaches to the problem. In this work we present a "Survey" of the current PPDM methodologies which seem promising for the future.JRC.G.6-Sensors, radar technologies and cybersecurit

    Using unknowns to prevent discovery of association rules

    Get PDF
    Data mining technology has given us new capabilities to identify correlations in large data sets. This introduces risks when the data is to be made public, but the correlations are private. We introduce a method for selectively removing individual values from a database to prevent the discovery of a set of rules, while preserving the data for other applications. The efficacy and complexity of this method are discussed. We also present an experiment showing an example of this methodology

    Privacy Preserving Data Mining, Evaluation Methodologies

    Get PDF
    Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when datamining techniques are used. Privacy Preserving Data Mining (PPDM) algorithms have been recently introduced with the aim of modifying the database in such a way to prevent the discovery of sensible information. Due to the large amount of possible techniques that can be used to achieve this goal, it is necessary to provide some standard evaluation metrics to determine the best algorithms for a specific application or context. Currently, however, there is no common set of parameters that can be used for this purpose. Moreover, because sanitization modifies the data, an important issue, especially for critical data, is to preserve the quality of data. However, to the best of our knowledge, no approaches have been developed dealing with the issue of data quality in the context of PPDM algorithms. This report explores the problem of PPDM algorithm evaluation, starting from the key goal of preserving of data quality. To achieve such goal, we propose a formal definition of data quality specifically tailored for use in the context of PPDM algorithms, a set of evaluation parameters and an evaluation algorithm. Moreover, because of the "environment related" nature of data quality, a structure to represent constraints and information relevance related to data is presented. The resulting evaluation core process is then presented as a part of a more general three step evaluation framework, taking also into account other aspects of the algorithm evaluation such as efficiency, scalability and level of privacy.JRC.G.6-Sensors, radar technologies and cybersecurit

    Privacy Preserving Data Mining, A Data Quality Approach

    Get PDF
    Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when datamining techniques are used. Privacy Preserving Data Mining (PPDM) algorithms have been recently introduced with the aim of sanitizing the database in such a way to prevent the discovery of sensible information (e.g. association rules). A drawback of such algorithms is that the introduced sanitization may disrupt the quality of data itself. In this report we introduce a new methodology and algorithms for performing useful PPDM operations, while preserving the data quality of the underlying database.JRC.G.6-Sensors, radar technologies and cybersecurit

    Parsimonious Downgrading and Decision Trees Applied to the Inference Problem

    No full text
    In this paper we present our new paradigm for dealing with the inference problem which arises from downgrading. Our new paradigm has two main parts: the application of decision tree analysis to the inference problem, and the concept of parsimonious downgrading. We also include a new thermodynamically motivated way of dealing with the deduction of inference rules from partial data. Keywords Data mining, inference, downgrading, rules. 1. A NEW PARADIGM Our new paradigm is a combination of decision tree analysis and parsimonious downgrading. Decision tree analysis has existed in the field of AI since the 1980's [3]. In brief, decision trees are graphs associated to data, with the goal of deducing rules from the data. Our new paradigm applies decision trees to the inference problem. In this paper we introduce the new concept of parsimonious downgrading. When High wishes to downgrade a set of data to Low, it may be necessary, because of inference channels, to trim the set. Parsimonious..

    The inference problem in multilevel secure databases

    Get PDF
    Conventional access control models, such as role-based access control, protect sensitive data from unauthorized disclosure via direct accesses, however, they fail to prevent unauthorized disclosure happening through indirect accesses. Indirect data disclosure via inference channels occurs when sensitive information can be inferred from nonsensitive data and metadata, which is also known as “the inference problem”. This problem has draw n much attention from researcher in the database community due to its great compromise of data security. It has been studied under four settings according to where it occurs. They are statistical databases, multilevel secure databases, data mining, and web-based applications. This thesis investigates previous efforts dedicated to inference problems in multilevel secure databases, and presents the latest findings of our research on this problem. Our contribution includes two methods. One is a dynamic control over this problem, which designs a set of accessing key distribution schemes to remove inference after all inference channels in the database has been identified. The other combines rough sets and entropies to form a computational solution to detect and remove inferences, which for the first time provides an integrated solution to the inference problem. Comparison with previous work has also been done, and we have proved both of them are effective and easy to implement. Since the inference problem is described as a problem of detecting and removing inference channels, this thesis contains two main parts: inference detecting techniques and inference removing techniques. In both two aspects, some techniques are selectively but extensively examined

    Privacy by Design in Data Mining

    Get PDF
    Privacy is ever-growing concern in our society: the lack of reliable privacy safeguards in many current services and devices is the basis of a diffusion that is often more limited than expected. Moreover, people feel reluctant to provide true personal data, unless it is absolutely necessary. Thus, privacy is becoming a fundamental aspect to take into account when one wants to use, publish and analyze data involving sensitive information. Many recent research works have focused on the study of privacy protection: some of these studies aim at individual privacy, i.e., the protection of sensitive individual data, while others aim at corporate privacy, i.e., the protection of strategic information at organization level. Unfortunately, it is in- creasingly hard to transform the data in a way that it protects sensitive information: we live in the era of big data characterized by unprecedented opportunities to sense, store and analyze complex data which describes human activities in great detail and resolution. As a result anonymization simply cannot be accomplished by de-identification. In the last few years, several techniques for creating anonymous or obfuscated versions of data sets have been proposed, which essentially aim to find an acceptable trade-off between data privacy on the one hand and data utility on the other. So far, the common result obtained is that no general method exists which is capable of both dealing with “generic personal data” and preserving “generic analytical results”. In this thesis we propose the design of technological frameworks to counter the threats of undesirable, unlawful effects of privacy violation, without obstructing the knowledge discovery opportunities of data mining technologies. Our main idea is to inscribe privacy protection into the knowledge discovery technol- ogy by design, so that the analysis incorporates the relevant privacy requirements from the start. Therefore, we propose the privacy-by-design paradigm that sheds a new light on the study of privacy protection: once specific assumptions are made about the sensitive data and the target mining queries that are to be answered with the data, it is conceivable to design a framework to: a) transform the source data into an anonymous version with a quantifiable privacy guarantee, and b) guarantee that the target mining queries can be answered correctly using the transformed data instead of the original ones. This thesis investigates on two new research issues which arise in modern Data Mining and Data Privacy: individual privacy protection in data publishing while preserving specific data mining analysis, and corporate privacy protection in data mining outsourcing
    corecore