47 research outputs found

    Exploring the Existing and Unknown Side Effects of Privacy Preserving Data Mining Algorithms

    The data mining sanitization process transforms data by masking sensitive information before releasing it to the public domain. During sanitization, side effects such as hiding failure, missing cost, and artificial cost have been observed. Privacy Preserving Data Mining (PPDM) algorithms were developed to sanitize data while limiting information loss and maintaining data integrity. Besides providing privacy preservation, these algorithms also aimed to reduce the side effects that arise during sanitization, and many PPDM algorithms, based on different PPDM techniques, have been developed for that purpose. However, previous studies have neither explored nor justified why non-traditional side effects received little attention. This study reports the side effects found for PPDM algorithms in a newly created web repository. The research methodology adopted was Design Science Research (DSR), and the research was conducted in four phases. The first phase addressed the characteristics, similarities, differences, and relationships of existing side effects. The second phase identified the characteristics of non-traditional side effects. The third phase used the Privacy Preservation and Security Framework (PPSF) tool to test whether non-traditional side effects occur in PPDM algorithms, and also attempted to find additional side effects not reported in prior studies. The PPDM algorithms considered were Greedy, POS2DT, SIF_IDF, cpGA2DT, pGA2DT, and sGA2DT; the associated PPDM techniques were anonymization, perturbation, randomization, condensation, heuristic, reconstruction, and cryptography. The final phase created a new online web repository to report all the side effects found for the PPDM algorithms; it was built with full-stack web development using the AngularJS, Spring, Spring Boot, and Hibernate frameworks. The results catalogue various PPDM algorithms and their side effects. They also clarify how hiding failure, missing cost, and artificial cost relate to and affect one another, and how the side effects relate to the type of data (sensitive, non-sensitive, or new). The web repository serves as a quick reference for PPDM algorithms, since developing, improving, inventing, and reporting PPDM algorithms remains necessary. This study is intended to encourage researchers and organizations to report, use, reuse, or develop better PPDM algorithms.
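The three traditional side effects named above are commonly quantified as ratios over sets of mined patterns. The sketch below assumes the widely used definitions: hiding failure is the fraction of sensitive patterns still minable after sanitization, missing cost is the fraction of non-sensitive patterns lost, and artificial cost is the fraction of patterns mined from the sanitized data that did not exist originally. The function name `side_effects` and the set-of-frozensets representation are illustrative assumptions, not taken from the study or from the PPSF tool.

```python
from typing import Dict, FrozenSet, Set

Pattern = FrozenSet[str]  # an itemset or rule, represented as a frozen set of item labels


def side_effects(original: Set[Pattern],
                 sanitized: Set[Pattern],
                 sensitive: Set[Pattern]) -> Dict[str, float]:
    """Compute hiding failure, missing cost and artificial cost as ratios (assumed definitions)."""
    non_sensitive = original - sensitive
    # Hiding failure: sensitive patterns that can still be mined after sanitization.
    hf = len(sensitive & sanitized) / len(sensitive) if sensitive else 0.0
    # Missing cost: non-sensitive patterns that were lost during sanitization.
    mc = len(non_sensitive - sanitized) / len(non_sensitive) if non_sensitive else 0.0
    # Artificial cost: patterns mined from the sanitized data that never existed originally.
    ac = len(sanitized - original) / len(sanitized) if sanitized else 0.0
    return {"hiding_failure": hf, "missing_cost": mc, "artificial_cost": ac}


if __name__ == "__main__":
    original = {frozenset("ab"), frozenset("bc"), frozenset("cd"), frozenset("ac")}
    sensitive = {frozenset("ab")}
    sanitized = {frozenset("bc"), frozenset("ac"), frozenset("ad")}
    print(side_effects(original, sanitized, sensitive))
```

In this toy case the sensitive pattern is fully hidden, one of three non-sensitive patterns ("cd") is lost, and one new pattern ("ad") appears, which illustrates how lowering hiding failure can trade off against the other two costs.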

    Association rule hiding using integer linear programming

    Privacy preserving data mining has become a focus of attention for government statistical agencies and the database security research community, who are concerned with preventing privacy disclosure during data mining. Repositories of large datasets contain sensitive rules that need to be concealed from unauthorized access. Hence, association rule hiding has emerged as a powerful technique for hiding sensitive knowledge in data before it is published. In this paper, we present a constraint-based optimization approach for hiding a set of sensitive association rules, using a well-structured integer linear program formulation. The proposed approach reduces the database sanitization problem to an instance of the integer linear programming problem. The solution of the integer linear program determines the transactions that need to be sanitized in order to conceal the sensitive rules while minimizing the impact of sanitization on the non-sensitive rules. We also present a heuristic sanitization algorithm that performs hiding by reducing the support or the confidence of the sensitive rules. Experimental evaluation of the proposed approach on real-life datasets indicates promising performance in terms of side effects on the original database.
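The abstract mentions a heuristic that hides rules by reducing their support. The sketch below illustrates generic support reduction only; it is not the paper's integer linear program or its exact heuristic. For each sensitive itemset it repeatedly deletes a hypothetical "victim" item from the shortest supporting transaction until the itemset's absolute support drops below `min_sup`. The victim choice, the shortest-transaction rule, and the function names are assumptions made for illustration.

```python
from typing import Iterable, List, Set


def support(db: List[Set[str]], itemset: Set[str]) -> int:
    """Absolute support: number of transactions containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def hide_itemsets(db: Iterable[Set[str]], sensitive: Iterable[Set[str]],
                  min_sup: int) -> List[Set[str]]:
    """Lower each sensitive itemset's support below `min_sup` by deleting one of its items.

    Hiding an itemset also hides every association rule generated from it.
    """
    if min_sup < 1:
        raise ValueError("min_sup must be at least 1")
    sanitized = [set(t) for t in db]  # work on a copy; the input stays untouched
    for itemset in sensitive:
        itemset = set(itemset)
        victim = min(itemset)  # hypothetical victim choice: smallest item label
        while support(sanitized, itemset) >= min_sup:
            supporting = [t for t in sanitized if itemset <= t]
            # Editing the shortest supporting transaction tends to disturb fewer
            # non-sensitive patterns (a common heuristic, assumed here).
            target = min(supporting, key=len)
            target.discard(victim)  # target no longer supports itemset, so support drops by 1
    return sanitized


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "d"}]
    out = hide_itemsets(db, [{"a", "b"}], min_sup=2)
    print(support(db, {"a", "b"}), support(out, {"a", "b"}))
```

Each deletion removes exactly one supporting transaction, so the loop always terminates; the missing-cost and artificial-cost side effects come from the other itemsets those deletions happen to break.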

    A Survey on Particle Swarm Optimization for Association Rule Mining

    Association rule mining (ARM) is one of the core data mining techniques for discovering potentially valuable association relationships in mixed datasets. Various heuristic algorithms have been introduced into ARM to address the high computation time of traditional ARM. Although detailed reviews of heuristic ARM algorithms already exist, this paper differs from them in aiming to provide a more comprehensive, multi-faceted survey of emerging research that can serve as a reference for researchers seeking to understand state-of-the-art particle swarm optimization (PSO) based ARM algorithms. In this paper, we review existing research results. Heuristic algorithms for ARM are divided into three main groups: biologically inspired, physically inspired, and other algorithms. Different types of ARM and their evaluation metrics are also described, and the current state of improvements to PSO algorithms is discussed stage by stage: swarm initialization, algorithm parameter optimization, optimal particle update, and velocity and position updates. Finally, we discuss applications of PSO-based ARM algorithms and propose further research directions based on the existing problems.
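The velocity and position updates discussed in the survey follow, in the canonical global-best PSO formulation, v ← w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x) and x ← x + v. The sketch below is a minimal continuous PSO that minimizes a test function; it does not encode association rules as particles, which the surveyed algorithms do in various ways. The parameter values (w = 0.7, c1 = c2 = 1.5), the bounds, and the name `pso` are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(f: Callable[[Sequence[float]], float], dim: int, n_particles: int = 20,
        iters: int = 100, w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        bounds: Tuple[float, float] = (-5.0, 5.0), seed: int = 0) -> Tuple[List[float], float]:
    """Minimise f with canonical global-best PSO; returns (best position, best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Swarm initialisation: random positions, zero velocities.
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull (pbest) + social pull (gbest).
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                # Position update, clamped to the search bounds.
                xs[i][d] = min(max(xs[i][d] + vs[i][d], lo), hi)
            val = f(xs[i])
            # Update the personal best and, if it beats the swarm, the global best.
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    best, value = pso(lambda x: sum(v * v for v in x), dim=2)
    print(best, value)
```

The inertia weight w and the acceleration coefficients c1, c2 are exactly the "algorithm parameters" whose tuning the survey treats as a separate improvement stage.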

    Sensor data fusion for the industrial artificial intelligence of things

    The emergence of smart sensors, artificial intelligence, and deep learning technologies has given rise to the artificial intelligence of things (AIoT). Sophisticated cooperation among these technologies is vital for the effective processing of industrial sensor data. This paper introduces a new framework for addressing the different challenges of AIoT applications. The proposed framework intelligently combines multi-agent systems, knowledge graphs, and deep learning. Deep learning architectures are used to build models from different kinds of sensor data. Multi-agent systems simulate the collective behaviour of the smart sensors in IoT settings, and communication among the agents is realized by integrating knowledge graphs. Different optimizations based on constraint satisfaction and on evolutionary computation are also investigated. An experimental analysis compares the presented methodology with state-of-the-art AIoT technologies and shows that the designed framework achieves good performance compared with baseline solutions.

    Privacy preserving association rule mining using attribute-identity mapping

    Association rule mining uncovers hidden yet important patterns in data. Discovering these patterns helps data owners make the right decisions to enhance efficiency, increase profit, and reduce loss. However, privacy concerns arise, especially when the data owner is not the miner or when many parties are involved. This research studied privacy preserving association rule mining (PPARM) of horizontally partitioned and outsourced data. Existing research in the area has concentrated mainly on privacy and paid very little attention to data quality, yet the higher the data quality, the more accurate and reliable the association rules. Consequently, this research proposed Attribute-Identity Mapping (AIM) as a PPARM technique that addresses data quality. Given a dataset, AIM identifies the set of attributes and the values of each attribute. It then assigns a unique identity to each attribute and to each of its values, and generates a sanitized dataset by replacing every attribute and value with its corresponding identity. To preserve privacy, the sanitization is carried out by the data owners, who then send the sanitized data, made up only of identities, to the data miner. When any or all of the data owners need ARM results from the aggregate data, they send a query to the data miner. The query consists of the attributes (in the form of identities), the minSup and minConf thresholds, and the number of rules they want. Results obtained on the Census Income dataset show that the PPARM technique maintains 100% data quality without compromising privacy.
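A minimal sketch of the attribute-identity mapping described above, assuming integer identities assigned in order of first appearance; the abstract does not say how identities are generated, and a real deployment would presumably use identities that reveal nothing about the data. The function names are hypothetical. Because the mapping is one-to-one, every (attribute, value) item keeps exactly the same support, so rules mined over the identities translate back to the rules that would be mined from the raw data.

```python
from typing import Dict, FrozenSet, List, Tuple

Record = Dict[str, str]
SanitizedRecord = FrozenSet[Tuple[int, int]]  # set of (attribute id, value id) pairs


def build_identity_maps(records: List[Record]) -> Tuple[Dict[str, int], Dict[Tuple[str, str], int]]:
    """Assign a unique integer identity to every attribute and every (attribute, value) pair."""
    attr_ids: Dict[str, int] = {}
    value_ids: Dict[Tuple[str, str], int] = {}
    next_id = 1
    for rec in records:
        for attr, val in sorted(rec.items()):
            if attr not in attr_ids:
                attr_ids[attr] = next_id
                next_id += 1
            if (attr, val) not in value_ids:
                value_ids[(attr, val)] = next_id
                next_id += 1
    return attr_ids, value_ids


def sanitize(records: List[Record], attr_ids: Dict[str, int],
             value_ids: Dict[Tuple[str, str], int]) -> List[SanitizedRecord]:
    """Data-owner side: replace every attribute and value with its identity."""
    return [frozenset((attr_ids[a], value_ids[(a, v)]) for a, v in rec.items())
            for rec in records]


def decode(sanitized: List[SanitizedRecord],
           value_ids: Dict[Tuple[str, str], int]) -> List[Record]:
    """Owner side: map identities (e.g. in mined rules) back to attributes and values."""
    inverse = {vid: pair for pair, vid in value_ids.items()}
    return [dict(inverse[vid] for _, vid in rec) for rec in sanitized]


if __name__ == "__main__":
    records = [{"age": "30-39", "sex": "F"}, {"age": "40-49", "sex": "F"}]
    attr_ids, value_ids = build_identity_maps(records)
    shared = sanitize(records, attr_ids, value_ids)  # what the data miner receives
    print(shared)
    print(decode(shared, value_ids) == records)
```

The miner only ever sees integers, while the owner, who keeps the two maps, can decode any mined rule.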

    Data Mining and Machine Learning for Software Engineering

    Software engineering is one of the research areas in which data mining is most widely applied. Developers have attempted to improve software quality by mining and analyzing software data. In every phase of the software development life cycle (SDLC), large amounts of data are produced, and design, security, or other software problems may occur. Analyzing software data in the early phases of development helps to handle these problems and leads to more accurate and timely delivery of software projects. Various data mining and machine learning studies have addressed software engineering tasks such as defect prediction and effort estimation. This study identifies open issues in software engineering and presents related solutions and recommendations that apply data mining and machine learning techniques.

    A Generalized Wine Quality Prediction Framework by Evolutionary Algorithms

    Wine is an exciting and complex product with distinctive qualities that make it different from other manufactured products. Consequently, the testing approach used to determine wine quality is complex and diverse. Several elements influence wine quality, but expert opinion has the greatest influence on how people perceive it. Expert judgments of quality are highly subjective and may not match consumer taste; in addition, experts are not always available for wine testing. To overcome these issues, many machine learning approaches have been proposed and have attracted the attention of the wine industry. However, they focused on a particular classifier with a specific wine dataset. In this paper, we first propose a generalized wine quality prediction framework that provides a mechanism for finding a useful hybrid model for wine quality prediction. Second, based on the framework, we propose a generalized wine quality prediction algorithm using genetic algorithms. It first encodes the classifiers, together with their hyperparameters, into a chromosome. The fitness of a chromosome is then evaluated as the average accuracy of the employed classifiers. Genetic operations are performed to generate new offspring, and the evolution process continues until a stopping criterion is reached. As a result, the proposed approach can automatically find an appropriate hybrid set of classifiers and their hyperparameters that optimizes the prediction result independently of the dataset. Finally, experiments on wine datasets demonstrate the merits and effectiveness of the proposed approach.
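The chromosome encoding and fitness described above can be sketched as follows, under stated assumptions: each gene is a hypothetical (include flag, hyperparameter index) pair for one candidate classifier, fitness is the plain average accuracy of the included classifiers, and the genetic operators (binary tournament selection, one-point crossover, bit-flip and resample mutation, elitism, a fixed generation count as the stopping criterion) are common textbook choices rather than the paper's confirmed ones. The accuracy function is a stand-in; in practice it would train and validate each classifier on the wine data.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Gene = Tuple[int, int]  # (include flag 0/1, index into the classifier's hyperparameter list)
Chromosome = List[Gene]
AccuracyFn = Callable[[str, object], float]  # hypothetical: accuracy of classifier `name` with `param`


def fitness(chrom: Chromosome, names: Sequence[str], space: Dict[str, list],
            accuracy: AccuracyFn) -> float:
    """Average accuracy of the classifiers switched on in the chromosome (0 if none)."""
    scores = [accuracy(n, space[n][idx]) for n, (on, idx) in zip(names, chrom) if on]
    return sum(scores) / len(scores) if scores else 0.0


def evolve(space: Dict[str, list], accuracy: AccuracyFn, pop_size: int = 20,
           generations: int = 30, p_mut: float = 0.1,
           seed: int = 0) -> Tuple[Dict[str, object], float]:
    rng = random.Random(seed)
    names = sorted(space)

    def fit(c: Chromosome) -> float:
        return fitness(c, names, space, accuracy)

    def random_chrom() -> Chromosome:
        return [(rng.randint(0, 1), rng.randrange(len(space[n]))) for n in names]

    def select(pop: List[Chromosome]) -> Chromosome:
        a, b = rng.sample(pop, 2)  # binary tournament
        return a if fit(a) >= fit(b) else b

    def mutate(c: Chromosome) -> Chromosome:
        out = []
        for n, (on, idx) in zip(names, c):
            if rng.random() < p_mut:
                on = 1 - on  # toggle the classifier in or out
            if rng.random() < p_mut:
                idx = rng.randrange(len(space[n]))  # resample its hyperparameter
            out.append((on, idx))
        return out

    pop = [random_chrom() for _ in range(pop_size)]
    for _ in range(generations):  # stopping criterion: fixed number of generations
        nxt = [max(pop, key=fit)]  # elitism: carry the best chromosome over
        while len(nxt) < pop_size:
            p1, p2 = select(pop), select(pop)
            cut = rng.randrange(1, len(names)) if len(names) > 1 else 0  # one-point crossover
            nxt.append(mutate(p1[:cut] + p2[cut:]))
        pop = nxt
    best = max(pop, key=fit)
    chosen = {n: space[n][idx] for n, (on, idx) in zip(names, best) if on}
    return chosen, fit(best)


if __name__ == "__main__":
    # Hypothetical accuracy table standing in for real validation scores; not results.
    space = {"knn": [1, 3, 5], "tree": [2, 4, 8], "svm": [0.1, 1.0, 10.0]}
    table = {("knn", 1): 0.70, ("knn", 3): 0.74, ("knn", 5): 0.72,
             ("tree", 2): 0.65, ("tree", 4): 0.71, ("tree", 8): 0.69,
             ("svm", 0.1): 0.68, ("svm", 1.0): 0.76, ("svm", 10.0): 0.73}
    print(evolve(space, lambda n, p: table[(n, p)]))
```

With a plain average as fitness, dropping weaker classifiers always raises the score, so the search can collapse to a single model; the abstract does not say how the original work combines the selected classifiers, so a real hybrid would likely score the combined ensemble instead.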

    Privacy reinforcement learning for faults detection in the smart grid

    Anticipated advances in ad hoc Wireless Mesh Networks (WMN) make them strong natural candidates for the Smart Grid’s Neighborhood Area Network (NAN) and for ongoing work on Advanced Metering Infrastructure (AMI). Fault detection in these energy systems, which identifies anomalous behavior of energy platforms, has recently attracted considerable interest in the data science community. This paper develops a new framework based on privacy reinforcement learning to accurately identify anomalous patterns in a distributed and heterogeneous energy environment. The local outlier factor (LOF) is first applied to derive simple local anomalous patterns at each site of the distributed energy platform. Privacy reinforcement learning, built on blockchain technology, is then used to merge the local anomalous patterns into global complex anomalous patterns. In addition, different optimization strategies are proposed to improve the overall outlier detection process. To demonstrate the applicability of the proposed framework, extensive experiments were carried out on the well-known CASAS (Center of Advanced Studies in Adaptive Systems) platform. The results show that the proposed framework outperforms baseline fault detection solutions.
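The local step of this framework applies the local outlier factor at each site. The sketch below is a minimal from-scratch implementation of the standard LOF definition on points in Euclidean space; the blockchain-based privacy reinforcement learning that merges local patterns is not shown. The function name, the choice of k, and the simplification of cutting ties at the k-th neighbour are assumptions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def lof_scores(points: List[Point], k: int = 3) -> List[float]:
    """Local outlier factor of every point; scores well above 1 suggest outliers."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding itself (ties at the k-th distance are cut off).
    neighbours = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
                  for i in range(n)]
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]

    def reach(i: int, j: int) -> float:
        # Reachability distance of point i from neighbour j.
        return max(k_dist[j], dist[i][j])

    # Local reachability density: inverse mean reachability distance to the neighbours.
    # The small floor avoids division by zero when points are duplicated.
    lrd = [1.0 / max(sum(reach(i, j) for j in neighbours[i]) / k, 1e-12) for i in range(n)]
    # LOF: average neighbour density relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


if __name__ == "__main__":
    readings = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(lof_scores(readings, k=2))
```

On the four corners of a unit square plus a distant point (10, 10) with k = 2, the corners score exactly 1 and the distant point scores about 13, which is the kind of local anomalous pattern each site would report.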