6 research outputs found

    Privacy Preserving Utility Mining: A Survey

    In the big data era, collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analysis of these data with sensitive private information raises privacy concerns. To achieve a better trade-off between utility maximization and privacy preservation, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM. Comment: 2018 IEEE International Conference on Big Data, 10 pages

    Exploring the Existing and Unknown Side Effects of Privacy Preserving Data Mining Algorithms

    The data mining sanitization process involves converting the data by masking the sensitive data and then releasing it to public domain. During the sanitization process, side effects such as hiding failure, missing cost and artificial cost of the data were observed. Privacy Preserving Data Mining (PPDM) algorithms were developed for the sanitization process to overcome information loss and yet maintain data integrity. While these PPDM algorithms did provide benefits for privacy preservation, they also made sure to solve the side effects that occurred during the sanitization process. Many PPDM algorithms were developed to reduce these side effects. There are several PPDM algorithms created based on different PPDM techniques. However, previous studies have not explored or justified why non-traditional side effects were not given much importance. This study reported the findings of the side effects for the PPDM algorithms in a newly created web repository. The research methodology adopted for this study was Design Science Research (DSR). This research was conducted in four phases, which were as follows. The first phase addressed the characteristics, similarities, differences, and relationships of existing side effects. The next phase found the characteristics of non-traditional side effects. The third phase used the Privacy Preservation and Security Framework (PPSF) tool to test if non-traditional side effects occur in PPDM algorithms. This phase also attempted to find additional unknown side effects which have not been found in prior studies. PPDM algorithms considered were Greedy, POS2DT, SIF_IDF, cpGA2DT, pGA2DT, sGA2DT. PPDM techniques associated were anonymization, perturbation, randomization, condensation, heuristic, reconstruction, and cryptography. The final phase involved creating a new online web repository to report all the side effects found for the PPDM algorithms. A Web repository was created using full stack web development. 
AngularJS, Spring, Spring Boot and Hibernate frameworks were used to build the web application. The results of the study implied various PPDM algorithms and their side effects. Additionally, the relationship and impact that hiding failure, missing cost, and artificial cost have on each other were also understood. Interestingly, the side effects and their relationship with the type of data (sensitive, non-sensitive, or new) were observed. Because the web repository acts as a quick reference for PPDM algorithms, developing, improving, inventing, and reporting PPDM algorithms is necessary. This study will influence researchers or organizations to report, use, reuse, or develop better PPDM algorithms.
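
The three traditional side effects named in this abstract can be illustrated with set operations. The sketch below assumes the definitions commonly used in the PPDM literature (the abstract itself does not spell them out): hiding failure is the set of sensitive patterns still discoverable after sanitization, missing cost is the set of non-sensitive patterns lost, and artificial cost is the set of patterns that appear only after sanitization. The function name `side_effects` and the toy itemsets are hypothetical.

```python
def side_effects(original, sanitized, sensitive):
    """Compare patterns mined before and after sanitization.

    original:  patterns mined from the original database
    sanitized: patterns mined from the sanitized database
    sensitive: patterns that the sanitization was supposed to hide
    (Assumed definitions; see the lead-in paragraph.)
    """
    original, sanitized, sensitive = map(set, (original, sanitized, sensitive))
    return {
        # Sensitive patterns that can still be mined after sanitization.
        "hiding_failure": sensitive & sanitized,
        # Non-sensitive patterns that were lost as collateral damage.
        "missing_cost": (original - sensitive) - sanitized,
        # Patterns that did not exist before sanitization.
        "artificial_cost": sanitized - original,
    }


# Toy example: itemsets are frozensets of single-letter items.
before = {frozenset("a"), frozenset("b"), frozenset("ab"), frozenset("bc")}
sensitive = {frozenset("ab"), frozenset("bc")}
after = {frozenset("a"), frozenset("bc"), frozenset("ac")}

effects = side_effects(before, after, sensitive)
```

In this toy case the sensitive itemset {b, c} survives (hiding failure), the non-sensitive itemset {b} is lost (missing cost), and {a, c} is newly introduced (artificial cost), which mirrors the sensitive, non-sensitive, and new data types mentioned in the abstract.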

    Deep Learning for Link Prediction in Dynamic Networks using Weak Estimators

    Link prediction is the task of evaluating the probability that an edge exists in a network, and it has useful applications in many domains. Traditional approaches rely on measuring the similarity between two nodes in a static context. Recent research has focused on extending link prediction to a dynamic setting, predicting the creation and destruction of links in networks that evolve over time. Though this is a difficult task, deep learning techniques have been shown to notably improve the accuracy of predictions. To this end, we propose the novel application of weak estimators, in addition to traditional similarity metrics, to inexpensively build an effective feature vector for a deep neural network. Weak estimators have been used in a variety of machine learning algorithms to improve model accuracy, owing to their capacity to estimate changing probabilities in dynamic systems. Experiments indicate that our approach results in increased prediction accuracy on several real-world dynamic networks.
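
As a rough illustration of how a weak estimator can track a changing probability, the sketch below uses an exponentially weighted update for a binary event (here, whether a given link is present in each network snapshot). This update rule, the function name `weak_estimate`, and the parameters `lam` and `initial` are assumptions made for illustration; the abstract does not state which estimator or settings the authors use.

```python
def weak_estimate(observations, lam=0.9, initial=0.5):
    """Estimate the current probability of a binary event.

    Each new observation pulls the estimate toward 1 (event seen) or
    0 (event not seen). A smaller `lam` forgets old observations faster,
    which lets the estimate follow a probability that drifts over time.
    (Assumed update rule; not taken from the paper.)
    """
    p = initial
    for x in observations:
        p = lam * p + (1.0 - lam) * (1.0 if x else 0.0)
    return p


# A link that was present in 20 snapshots and then absent in 20 snapshots.
history = [1] * 20 + [0] * 20
recent_probability = weak_estimate(history, lam=0.8)
```

Because the estimate discounts older snapshots, `recent_probability` ends up close to 0 even though the link was present in half of the history, which is the kind of recency-sensitive signal that could be placed in a feature vector alongside static similarity metrics.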

    Design and Development of an Energy Efficient Multimedia Cloud Data Center with Minimal SLA Violation

    Multimedia computing (MC) is emerging as a computing paradigm to process multimedia applications and provide efficient multimedia cloud services with optimal Quality of Service (QoS) to multimedia cloud users. However, the growing popularity of MC is affecting the climate: multimedia cloud data centers consume an enormous amount of energy to provide services, which harms the environment through carbon dioxide emissions. Virtual machine (VM) migration can effectively address this issue by reducing the energy consumption of multimedia cloud data centers. However, as Energy Consumption (EC) is reduced, Service Level Agreement violation (SLAV) may increase. Efficient VM selection plays a crucial role in maintaining the balance between EC and SLAV. This work presents a novel VM selection policy based on identifying the Maximum value among the differences of the Sum of Squares Utilization Rate (MdSSUR) parameter to reduce the EC of multimedia cloud data centers with minimal SLAV. The proposed MdSSUR VM selection policy has been evaluated using real workload traces in CloudSim. The simulation results for the proposed MdSSUR VM selection policy show improvements in EC, the number of VM migrations, and SLAV of 28.37%, 89.47%, and 79.14%, respectively.

    Privacy by Design in Data Mining

    Privacy is an ever-growing concern in our society: the lack of reliable privacy safeguards in many current services and devices is the basis of a diffusion that is often more limited than expected. Moreover, people feel reluctant to provide true personal data unless it is absolutely necessary. Thus, privacy is becoming a fundamental aspect to take into account when one wants to use, publish and analyze data involving sensitive information. Many recent research works have focused on the study of privacy protection: some of these studies aim at individual privacy, i.e., the protection of sensitive individual data, while others aim at corporate privacy, i.e., the protection of strategic information at the organization level. Unfortunately, it is increasingly hard to transform the data in a way that protects sensitive information: we live in the era of big data, characterized by unprecedented opportunities to sense, store and analyze complex data which describes human activities in great detail and resolution. As a result, anonymization simply cannot be accomplished by de-identification. In the last few years, several techniques for creating anonymous or obfuscated versions of data sets have been proposed, which essentially aim to find an acceptable trade-off between data privacy on the one hand and data utility on the other. So far, the common result obtained is that no general method exists which is capable of both dealing with “generic personal data” and preserving “generic analytical results”. In this thesis we propose the design of technological frameworks to counter the threats of undesirable, unlawful effects of privacy violation, without obstructing the knowledge discovery opportunities of data mining technologies. Our main idea is to inscribe privacy protection into the knowledge discovery technology by design, so that the analysis incorporates the relevant privacy requirements from the start.
Therefore, we propose the privacy-by-design paradigm that sheds new light on the study of privacy protection: once specific assumptions are made about the sensitive data and the target mining queries that are to be answered with the data, it is conceivable to design a framework to: a) transform the source data into an anonymous version with a quantifiable privacy guarantee, and b) guarantee that the target mining queries can be answered correctly using the transformed data instead of the original data. This thesis investigates two new research issues which arise in modern Data Mining and Data Privacy: individual privacy protection in data publishing while preserving specific data mining analysis, and corporate privacy protection in data mining outsourcing.

    Protection of data privacy based on artificial intelligence in Cyber-Physical Systems

    With the rapid evolution of cyber attack techniques, the security and privacy of Cyber-Physical Systems (CPSs) have become key challenges. CPS environments have several properties that make securing them appropriately a unique problem when compared with the processes and techniques that have evolved for traditional IT networks and platforms. CPS ecosystems are composed of heterogeneous systems, each with long lifespans. They use multitudes of operating systems and communication protocols and are often designed without security as a consideration. From a privacy perspective, there are also additional challenges. It is hard to capture and filter the heterogeneous data sources of CPSs, especially power systems, as their data should include network traffic and the sensing data of sensors. Protecting such data during the stages of collection, analysis and publication still opens the possibility of new cyber threats disrupting the operational loops of power systems. Moreover, while protecting the original data of CPSs, identifying cyberattacks requires intrusion detection that produces high false alarm rates. This thesis significantly contributes to the protection of heterogeneous data sources, along with the high performance of discovering cyber-attacks in CPSs, especially smart power networks (i.e., power systems and their networks). For achieving high data privacy, innovative privacy-preserving techniques based on Artificial Intelligence (AI) are proposed to protect the original and sensitive data generated by CPSs and their networks. For cyber-attack discovery while applying privacy-preserving techniques, new anomaly detection algorithms are developed to ensure high performance in terms of data utility and detection accuracy.
The first main contribution of this dissertation is the development of a privacy preservation intrusion detection methodology that uses the correlation coefficient, independent component analysis, and Expectation Maximisation (EM) clustering algorithms to select significant data portions and discover cyber attacks against power networks. Before and after applying this technique, machine learning algorithms are used to assess their capabilities to classify normal and suspicious vectors. The second core contribution of this work is the design of a new privacy-preserving anomaly detection technique protecting the confidential information of CPSs and discovering malicious observations. Firstly, a data pre-processing technique filters and transforms data into a new format that accomplishes the aim of preserving privacy. Secondly, an anomaly detection technique using a Gaussian mixture model which fits selected features, and a Kalman filter technique that accurately computes the posterior probabilities of legitimate and anomalous events are employed. The third significant contribution of this thesis is developing a novel privacy-preserving framework for achieving the privacy and security criteria of smart power networks. In the first module, a two-level privacy module is developed, including an enhanced proof of work technique-based blockchain for accomplishing data integrity and a variational autoencoder approach for changing the data to an encoded data format to prevent inference attacks. In the second module, a long short-term memory deep learning algorithm is employed in anomaly detection to train and validate the outputs from the two-level privacy modules