9 research outputs found

    Privacy Preserving Utility Mining: A Survey

    Full text link
    In big data era, the collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analysis of these data with sensitive private information raises privacy concerns. To achieve better trade-off between utility maximizing and privacy preserving, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM.Comment: 2018 IEEE International Conference on Big Data, 10 page

    Exploring the Existing and Unknown Side Effects of Privacy Preserving Data Mining Algorithms

    Get PDF
    The data mining sanitization process involves converting the data by masking the sensitive data and then releasing it to public domain. During the sanitization process, side effects such as hiding failure, missing cost and artificial cost of the data were observed. Privacy Preserving Data Mining (PPDM) algorithms were developed for the sanitization process to overcome information loss and yet maintain data integrity. While these PPDM algorithms did provide benefits for privacy preservation, they also made sure to solve the side effects that occurred during the sanitization process. Many PPDM algorithms were developed to reduce these side effects. There are several PPDM algorithms created based on different PPDM techniques. However, previous studies have not explored or justified why non-traditional side effects were not given much importance. This study reported the findings of the side effects for the PPDM algorithms in a newly created web repository. The research methodology adopted for this study was Design Science Research (DSR). This research was conducted in four phases, which were as follows. The first phase addressed the characteristics, similarities, differences, and relationships of existing side effects. The next phase found the characteristics of non-traditional side effects. The third phase used the Privacy Preservation and Security Framework (PPSF) tool to test if non-traditional side effects occur in PPDM algorithms. This phase also attempted to find additional unknown side effects which have not been found in prior studies. PPDM algorithms considered were Greedy, POS2DT, SIF_IDF, cpGA2DT, pGA2DT, sGA2DT. PPDM techniques associated were anonymization, perturbation, randomization, condensation, heuristic, reconstruction, and cryptography. The final phase involved creating a new online web repository to report all the side effects found for the PPDM algorithms. A Web repository was created using full stack web development. AngularJS, Spring, Spring Boot and Hibernate frameworks were used to build the web application. The results of the study implied various PPDM algorithms and their side effects. Additionally, the relationship and impact that hiding failure, missing cost, and artificial cost have on each other was also understood. Interestingly, the side effects and their relationship with the type of data (sensitive or non-sensitive or new) was observed. As the web repository acts as a quick reference domain for PPDM algorithms. Developing, improving, inventing, and reporting PPDM algorithms is necessary. This study will influence researchers or organizations to report, use, reuse, or develop better PPDM algorithms

    Association rule hiding using integer linear programming

    Get PDF
    Privacy preserving data mining has become the focus of attention of government statistical agencies and database security research community who are concerned with preventing privacy disclosure during data mining. Repositories of large datasets include sensitive rules that need to be concealed from unauthorized access. Hence, association rule hiding emerged as one of the powerful techniques for hiding sensitive knowledge that exists in data before it is published. In this paper, we present a constraint-based optimization approach for hiding a set of sensitive association rules, using a well-structured integer linear program formulation. The proposed approach reduces the database sanitization problem to an instance of the integer linear programming problem. The solution of the integer linear program determines the transactions that need to be sanitized in order to conceal the sensitive rules while minimizing the impact of sanitization on the non-sensitive rules. We also present a heuristic sanitization algorithm that performs hiding by reducing the support or the confidence of the sensitive rules. The results of the experimental evaluation of the proposed approach on real-life datasets indicate the promising performance of the approach in terms of side effects on the original database

    A Survey on Particle Swarm Optimization for Association Rule Mining

    Get PDF
    Association rule mining (ARM) is one of the core techniques of data mining to discover potentially valuable association relationships from mixed datasets. In the current research, various heuristic algorithms have been introduced into ARM to address the high computation time of traditional ARM. Although a more detailed review of the heuristic algorithms based on ARM is available, this paper differs from the existing reviews in that we expected it to provide a more comprehensive and multi-faceted survey of emerging research, which could provide a reference for researchers in the field to help them understand the state-of-the-art PSO-based ARM algorithms. In this paper, we review the existing research results. Heuristic algorithms for ARM were divided into three main groups, including biologically inspired, physically inspired, and other algorithms. Additionally, different types of ARM and their evaluation metrics are described in this paper, and the current status of the improvement in PSO algorithms is discussed in stages, including swarm initialization, algorithm parameter optimization, optimal particle update, and velocity and position updates. Furthermore, we discuss the applications of PSO-based ARM algorithms and propose further research directions by exploring the existing problems.publishedVersio

    Design and Development of an Energy Efficient Multimedia Cloud Data Center with Minimal SLA Violation

    Get PDF
    Multimedia computing (MC) is rising as a nascent computing paradigm to process multimedia applications and provide efficient multimedia cloud services with optimal Quality of Service (QoS) to the multimedia cloud users. But, the growing popularity of MC is affecting the climate. Because multimedia cloud data centers consume an enormous amount of energy to provide services, it harms the environment due to carbon dioxide emissions. Virtual machine (VM) migration can effectively address this issue; it reduces the energy consumption of multimedia cloud data centers. Due to the reduction of Energy Consumption (EC), the Service Level Agreement violation (SLAV) may increase. An efficient VM selection plays a crucial role in maintaining the stability between EC and SLAV. This work highlights a novel VM selection policy based on identifying the Maximum value among the differences of the Sum of Squares Utilization Rate (MdSSUR) parameter to reduce the EC of multimedia cloud data centers with minimal SLAV. The proposed MdSSUR VM selection policy has been evaluated using real workload traces in CloudSim. The simulation result of the proposed MdSSUR VM selection policy demonstrates the rate of improvements of the EC, the number of VM migrations, and the SLAV by 28.37%, 89.47%, and 79.14%, respectively

    An evolutionary adaptive neuro-fuzzy inference system for estimating field penetration index of tunnel boring machine in rock mass

    Get PDF
    Field penetration index (FPI) is one of the representative key parameters to examine the tunnel boring machine (TBM) performance. Lack of accurate FPI prediction can be responsible for numerous disastrous incidents associated with rock mechanics and engineering. This study aims to predict TBM performance (i.e. FPI) by an efficient and improved adaptive neuro-fuzzy inference system (ANFIS) model. This was done using an evolutionary algorithm, i.e. artificial bee colony (ABC) algorithm mixed with the ANFIS model. The role of ABC algorithm in this system is to find the optimum membership functions (MFs) of ANFIS model to achieve a higher degree of accuracy. The procedure and modeling were conducted on a tunnelling database comprising of more than 150 data samples where brittleness index (BI), fracture spacing, α angle between the plane of weakness and the TBM driven direction, and field single cutter load were assigned as model inputs to approximate FPI values. According to the results obtained by performance indices, the proposed ANFIS_ABC model was able to receive the highest accuracy level in predicting FPI values compared with ANFIS model. In terms of coefficient of determination (R2), the values of 0.951 and 0.901 were obtained for training and testing stages of the proposed ANFIS_ABC model, respectively, which confirm its power and capability in solving TBM performance problem. The proposed model can be used in the other areas of rock mechanics and underground space technologies with similar conditions. © 2021 Institute of Rock and Soil Mechanics, Chinese Academy of Science

    Identification of top-K influential communities in big networks

    Get PDF

    Protection of data privacy based on artificial intelligence in Cyber-Physical Systems

    Full text link
    With the rapid evolution of cyber attack techniques, the security and privacy of Cyber-Physical Systems (CPSs) have become key challenges. CPS environments have several properties that make them unique in efforts to appropriately secure them when compared with the processes, techniques and processes that have evolved for traditional IT networks and platforms. CPS ecosystems are comprised of heterogeneous systems, each with long lifespans. They use multitudes of operating systems and communication protocols and are often designed without security as a consideration. From a privacy perspective, there are also additional challenges. It is hard to capture and filter the heterogeneous data sources of CPSs, especially power systems, as their data should include network traffic and the sensing data of sensors. Protecting such data during the stages of collection, analysis and publication still open the possibility of new cyber threats disrupting the operational loops of power systems. Moreover, while protecting the original data of CPSs, identifying cyberattacks requires intrusion detection that produces high false alarm rates. This thesis significantly contributes to the protection of heterogeneous data sources, along with the high performance of discovering cyber-attacks in CPSs, especially smart power networks (i.e., power systems and their networks). For achieving high data privacy, innovative privacy-preserving techniques based on Artificial Intelligence (AI) are proposed to protect the original and sensitive data generated by CPSs and their networks. For cyber-attack discovery, meanwhile applying privacy-preserving techniques, new anomaly detection algorithms are developed to ensure high performances in terms of data utility and accuracy detection. The first main contribution of this dissertation is the development of a privacy preservation intrusion detection methodology that uses the correlation coefficient, independent component analysis, and Expectation Maximisation (EM) clustering algorithms to select significant data portions and discover cyber attacks against power networks. Before and after applying this technique, machine learning algorithms are used to assess their capabilities to classify normal and suspicious vectors. The second core contribution of this work is the design of a new privacy-preserving anomaly detection technique protecting the confidential information of CPSs and discovering malicious observations. Firstly, a data pre-processing technique filters and transforms data into a new format that accomplishes the aim of preserving privacy. Secondly, an anomaly detection technique using a Gaussian mixture model which fits selected features, and a Kalman filter technique that accurately computes the posterior probabilities of legitimate and anomalous events are employed. The third significant contribution of this thesis is developing a novel privacy-preserving framework for achieving the privacy and security criteria of smart power networks. In the first module, a two-level privacy module is developed, including an enhanced proof of work technique-based blockchain for accomplishing data integrity and a variational autoencoder approach for changing the data to an encoded data format to prevent inference attacks. In the second module, a long short-term memory deep learning algorithm is employed in anomaly detection to train and validate the outputs from the two-level privacy modules

    A sanitization approach for hiding sensitive itemsets based on particle swarm optimization

    No full text
    Privacy-preserving data mining (PPDM) has become an important research field in recent years, as approaches for PPDM can discover important information in databases, while ensuring that sensitive information is not revealed. Several algorithms have been proposed to hide sensitive information in databases. They apply addition and deletion operations to perturb an original database and hide the sensitive information. Finding an appropriate set of transactions/itemsets to be perturbed for hiding sensitive information while preserving other important information is a NP-hard problem. In the past, genetic algorithm (GA)-based approaches were developed to hide sensitive itemsets in an original database through transaction deletion. In this paper, a particle swarm optimization (PSO)-based algorithm called PSO2DT is developed to hide sensitive itemsets while minimizing the side effects of the sanitization process. Each particle in the designed PSO2DT algorithm represents a set of transactions to be deleted. Particles are evaluated using a fitness function that is designed to minimize the side effects of sanitization. The proposed algorithm can also determine the maximum number of transactions to be deleted for efficiently hiding sensitive itemsets, unlike the state-of-the-art GA-based approaches. Besides, an important strength of the proposed approach is that few parameters need to be set, and it can still find better solutions to the sanitization problem than GA-based approaches. Furthermore, the pre-large concept is also adopted in the designed algorithm to speed up the evolution process. Substantial experiments on both real-world and synthetic datasets show that the proposed PSO2DT algorithm performs better than the Greedy algorithm and GA-based algorithms in terms of runtime, fail to be hidden (F-T-H), not to be hidden (N-T-H), and database similarity (DS).Web of Science5318
    corecore