10 research outputs found

    Preservation of confidential information privacy and association rule hiding for data mining: a bibliometric review

    In this era of technology, the data held by business organizations are growing at an accelerating rate. Mining hidden patterns from these huge databases would benefit many industries by improving their decision-making processes. Along with non-sensitive information, these databases also contain sensitive information about customers. During the mining process, sensitive information about a person can be leaked, resulting in misuse of the data and causing loss to the individual. Privacy-preserving data mining offers a solution to this problem, providing the benefits of mined data while maintaining the privacy of the sensitive information. Hence, there is growing interest in the scientific community in developing new approaches to hide the mined sensitive information. In this research, a bibliometric review covering the period 2010 to 2018 is carried out to analyze the growth of studies on confidential information privacy preservation through approaches that hide association rules in data.

    Practical Secure Aggregation in Federated Learning Using Additive Secret Sharing

    Federated learning is a machine learning technique where multiple clients with local data collaborate in training a machine learning model. In FedAvg, the main federated learning algorithm, clients train machine learning models locally and share the trained model with the server. While the sensitive data is never sent to the server, a malicious server can reconstruct the original training data by having access to the clients’ models in each training round. Secure aggregation techniques such as cryptography, trusted execution environments, or differential privacy are used to solve this problem. However, these techniques incur computation and communication overhead or affect the model’s accuracy. In this thesis, we consider a secure multi-party computation setup where clients use additive secret sharing to send their models to multiple servers. Our solution provides secure aggregation as long as there are at least two non-colluding servers. Moreover, we provide a mathematical proof that the securely aggregated model at the end of each training round is exactly equal to the one provided by FedAvg, without affecting accuracy and with efficient communication and computation. Experimental results show that our approach is 557% faster than SCOTCH, the state-of-the-art secure aggregation solution, while reducing the communication cost of clients by 25%. Additionally, the accuracy of the trained model is exactly that of FedAvg under balanced, unbalanced, IID, and non-IID data distributions, while being only 8% slower.
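
The additive secret sharing idea described in this abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the modulus `PRIME`, the fixed-point factor `SCALE`, the function names, and the use of an unweighted average (FedAvg normally weights clients by dataset size) are all assumptions made for illustration. Each client splits every encoded weight into random shares that sum to the weight modulo `PRIME`; each server only ever adds up the shares it receives, and the true sum appears only when the servers' totals are combined.

```python
import secrets

# Assumed parameters for illustration only (not taken from the thesis).
PRIME = 2**61 - 1   # modulus for share arithmetic
SCALE = 10**6       # fixed-point factor to encode float weights as integers


def encode(x):
    """Encode a float as a fixed-point integer modulo PRIME."""
    return int(round(x * SCALE)) % PRIME


def decode(v):
    """Map a residue back to a signed float."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(value, n_servers):
    """Split value into n_servers additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_average(client_models, n_servers=2):
    """Average client weight vectors; each server sees only random-looking shares."""
    n_params = len(client_models[0])
    server_sums = [[0] * n_params for _ in range(n_servers)]
    for model in client_models:
        for j, weight in enumerate(model):
            for s, sh in enumerate(share(encode(weight), n_servers)):
                server_sums[s][j] = (server_sums[s][j] + sh) % PRIME
    # Combining the per-server totals reveals only the aggregate.
    totals = [sum(col) % PRIME for col in zip(*server_sums)]
    return [decode(t) / len(client_models) for t in totals]
```

For example, `secure_average([[0.5, -1.0], [1.5, 2.0], [1.0, -4.0]])` returns the same averages as a plain mean, up to the fixed-point precision chosen above.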

    Privacy-preserving Data clustering in Cloud Computing based on Fully Homomorphic Encryption

    Cloud infrastructure with its massive storage and computing power is an ideal platform to perform large scale data analysis tasks to extract knowledge and support decision-making. However, there are critical data privacy and security issues associated with this platform, as the data is stored in a public infrastructure. Recently, fully homomorphic data encryption has been proposed as a solution due to its capabilities in performing computations over encrypted data. However, it is demonstrably slow for practical data mining applications. To address this and related concerns, we introduce a fully homomorphic and distributed data processing framework that utilizes MapReduce to perform distributed computations for data clustering tasks on a large number of cloud Virtual Machines (VMs). We illustrate how a variety of fully homomorphic-based computations can be carried out to accomplish data clustering tasks independently in the cloud and show that the distributed execution of data clustering tasks based on MapReduce can significantly reduce the execution time overhead caused by fully homomorphic computations. To evaluate our framework, we performed experiments using electricity consumption measurement data on the Google cloud platform with 100 VMs. We found the proposed distributed data processing framework to be highly efficient when compared to a centralized approach and as accurate as a plaintext implementation
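
To make the MapReduce structure mentioned in this abstract concrete, here is a plaintext sketch of one way a k-means iteration splits into a map step (assign each point to its nearest centroid) and a reduce step (average the points per cluster). It performs no encryption at all; the function names, the data layout, and the choice of k-means are assumptions for illustration, and the paper's actual homomorphic computations are not reproduced here.

```python
def map_assign(points, centroids):
    """Map step: emit (cluster_id, (point, 1)) for each point in one partition."""
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        yield dists.index(min(dists)), (p, 1)


def reduce_update(pairs, centroids):
    """Reduce step: sum points and counts per cluster, then average."""
    sums, counts = {}, {}
    for cid, (p, n) in pairs:
        acc = sums.setdefault(cid, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[cid] = counts.get(cid, 0) + n
    updated = []
    for cid, old in enumerate(centroids):
        if cid in sums:
            updated.append(tuple(s / counts[cid] for s in sums[cid]))
        else:
            updated.append(tuple(old))  # keep an empty cluster's centroid unchanged
    return updated


def kmeans(partitions, centroids, iterations=10):
    """Run k-means where each partition stands in for one worker's share of the data."""
    for _ in range(iterations):
        pairs = [kv for part in partitions for kv in map_assign(part, centroids)]
        centroids = reduce_update(pairs, centroids)
    return centroids
```

In a distributed setting each partition would be processed by a separate worker, which is where the parallel speed-up described in the abstract would come from.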

    Learning structure and schemas from heterogeneous domains in networked systems: a survey

    The rapidly growing number of available digital documents in various formats, and the possibility of accessing them through internet-based technologies in distributed environments, have made it necessary to develop solid methods to properly organize and structure documents in large digital libraries and repositories. Specifically, the extremely large size of document collections makes it impossible to organize such documents manually. Additionally, most of the documents exist in an unstructured form and do not follow any schemas. Therefore, research efforts in this direction are being dedicated to automatically inferring structure and schemas. This is essential in order to better organize huge collections as well as to effectively and efficiently retrieve documents in heterogeneous domains in networked systems. This paper presents a survey of the state-of-the-art methods for inferring structure from documents and schemas in networked environments. The survey is organized around the most important application domains, namely bio-informatics, sensor networks, social networks, P2P systems, automation and control, transportation, and privacy preservation, for which we analyze the recent developments in dealing with unstructured data.

    BronzeGate: Real-time Transactional Data Obfuscation for GoldenGate

    Data privacy laws have appeared recently, such as the HIPAA laws for protecting medical records and the PCI guidelines for protecting credit card information. Data privacy can be defined as protecting Personally Identifiable Information (PII) from unauthorized access. PII includes any piece of data that can be used alone, or in conjunction with additional information, to uniquely identify an individual. Examples of such information include national identification numbers, credit card numbers, as well as financial and medical records. Access control methods and data encryption provide a level of data protection from unauthorized access; however, this is not enough, as it does not prevent identity theft. It was reported that 70% of data privacy breaches are internal breaches that involve an employee of the enterprise who has access to some training or testing database replica, which contains all the PII. In addition to access control, we need techniques to obfuscate (i.e., mask or dim) the datasets used for training, testing and analysis purposes. A good data obfuscation technique would, among other features, preserve the data usability while protecting its privacy. This challenge is further complicated when real-time requirements are added. In this paper we present BronzeGate: Obfuscated GoldenGate, GoldenGate's real-time solution for transactional data privacy that maintains data usability. BronzeGate utilizes different obfuscation functions for different data types to securely obfuscate the data in real time while maintaining its statistical characteristics.

    Proxy-secure computation model: application to k-means clustering implementation, analysis and improvements

    Distributed privacy preserving data mining applications, where data is divided among several parties, require high amounts of network communication. In order to overcome this overhead, we propose a scheme that reduces remote computations in distributed data mining applications into local computations on a trusted hardware. Cell BE is used to realize the trusted hardware acting as a proxy for the parties. We design a secure two-party computation protocol that can be instrumental in realizing non-colluding parties in privacy-preserving data mining applications. Each party is represented with a signed and encrypted thread on a separate core of Cell BE running in an isolated mode, whereby its execution and data are secured by hardware means. Our implementations and experiments demonstrate that a significant speed up is gained through the new scheme. It is also possible to increase the number of non-colluding parties on Cell BE, which extends the proposed technique to implement most distributed privacy-preserving data mining protocols proposed in literature that require several non-colluding parties

    On improving the performance of optimistic distributed simulations

    This report investigates means of improving the performance of optimistic distributed simulations without affecting the simulation accuracy. We argue that existing clustering algorithms are not adequate for application in distributed simulations, and outline some characteristics of an ideal algorithm that could be applied in this field. This report is structured as follows. We start by introducing the area of distributed simulation. Following a comparison of the dominant protocols used in distributed simulation, we elaborate on the current approaches of improving the simulation performance, using computation efficient techniques, exploiting the hardware configuration of processors, optimizations that can be derived from the simulation scenario, etc. We introduce the core characteristics of clustering approaches and argue that these cannot be applied in real-life distributed simulation problems. We present a typical distributed simulation setting and elaborate on the reasons that existing clustering approaches are not expected to improve the performance of a distributed simulation. We introduce a prototype distributed simulation platform that has been developed in the scope of this research, focusing on the area of emergency response and specifically building evacuation. We continue by outlining our current work on this issue, and finally, we end this report by outlining next actions which could be made in this field

    Geo-tagging and privacy-preservation in mobile cloud computing

    With the emergence of cloud computing services and the explosive growth of mobile devices and applications, mobile computing and cloud computing technologies have been drawing significant attention. Mobile cloud computing, with the synergy between the cloud and mobile technologies, has brought new opportunities to develop novel and practical systems such as mobile multimedia systems and cloud systems that provide collaborative data-mining services for data from disparate owners (e.g., mobile users). However, it also creates new challenges (e.g., algorithms deployed on computationally weak mobile devices require higher efficiency) and introduces new problems such as the privacy concern that arises when private data is shared in the cloud for collaborative data-mining. The main objectives of this dissertation are: 1. to develop practical systems based on the unique features of mobile devices (i.e., all-in-one computing platform and sensors) and the powerful computing capability of the cloud; 2. to propose solutions protecting data privacy when data from disparate owners are shared in the cloud for collaborative data-mining. We first propose a mobile geo-tagging system. It is a novel, accurate and efficient image- and video-based remote target localization and tracking system using an Android smartphone. To cope with the smartphone's computational limitations, we design lightweight image/video processing algorithms to achieve a good balance between estimation accuracy and computational complexity. Our system is the first of its kind, and we provide first-hand real-world experimental results, which demonstrate that our system is feasible and practicable. 
To address the privacy concern when data from disparate owners are shared in the cloud for collaborative data-mining, we then propose a generic compressive sensing (CS) based secure multiparty computation (MPC) framework for privacy-preserving collaborative data-mining in which data mining is performed in the CS domain. We perform the CS transformation and reconstruction processes with MPC protocols. We modify the original orthogonal matching pursuit algorithm and develop new MPC protocols so that the CS reconstruction process can be implemented using MPC. Our analysis and experimental results show that our generic framework is capable of enabling privacy preserving collaborative data-mining. The proposed framework can be applied to many privacy preserving collaborative data-mining and signal processing applications in the cloud. We identify an application scenario that requires simultaneously performing secure watermark detection and privacy preserving multimedia data storage. We further propose a privacy preserving storage and secure watermark detection framework by adopting our generic framework to address such a requirement. In our secure watermark detection framework, the multimedia data and secret watermark pattern are presented to the cloud for secure watermark detection in a compressive sensing domain to protect the privacy. We also give mathematical and statistical analysis to derive the expected watermark detection performance in the compressive sensing domain, based on the target image, watermark pattern and the size of the compressive sensing matrix (but without the actual CS matrix), which means that the watermark detection performance in the CS domain can be estimated during the watermark embedding process. The correctness of the derived performance has been validated by our experiments. Our theoretical analysis and experimental results show that secure watermark detection in the compressive sensing domain is feasible. 
By taking advantage of our mobile geo-tagging system and compressive sensing based privacy preserving data-mining framework, we develop a mobile privacy preserving collaborative filtering system. In our system, mobile users can share their personal data with each other in the cloud and get daily activity recommendations based on the data-mining results generated by the cloud, without leaking the privacy and secrecy of the data to other parties. Experimental results demonstrate that the proposed system is effective in enabling efficient mobile privacy preserving collaborative filtering services. Includes bibliographical references (pages 126-133).

    Privacy-preserving data analytics in cloud computing

    The evolution of digital content and rapid expansion of data sources has raised the need for streamlined monitoring, collection, storage and analysis of massive, heterogeneous data to extract useful knowledge and support decision-making mechanisms. In this context, cloud computing offers extensive, cost-effective and on demand computing resources that improve the quality of services for users and also help service providers (enterprises, governments and individuals). Service providers can avoid the expense of acquiring and maintaining IT resources while migrating data and remotely managing processes including aggregation, monitoring and analysis in cloud servers. However, privacy and security concerns of cloud computing services, especially in storing sensitive data (e.g. personal, healthcare and financial) are major challenges to the adoption of these services. To overcome such barriers, several privacy-preserving techniques have been developed to protect outsourced data in the cloud. Cryptography is a well-known mechanism that can ensure data confidentiality in the cloud. Traditional cryptography techniques have the ability to protect the data through encryption in cloud servers and data owners can retrieve and decrypt data for their processing purposes. However, in this case, cloud users can use the cloud resources for data storage but they cannot take full advantage of cloud-based processing services. This raises the need to develop advanced cryptosystems that can protect data privacy, both while in storage and in processing in the cloud. Homomorphic Encryption (HE) has gained attention recently because it can preserve the privacy of data while it is stored and processed in the cloud servers and data owners can retrieve and decrypt their processed data to their own secure side. Therefore, HE offers an end-to-end security mechanism that is a preferable feature in cloud-based applications. 
In this thesis, we developed innovative privacy-preserving cloud-based models based on HE cryptosystems. This allowed us to build secure and advanced analytic models in various fields. We began by designing and implementing a secure analytic cloud-based model based on a lightweight HE cryptosystem. We used a private resident cloud entity, called the "privacy manager", as an intermediate communication server between data owners and public cloud servers. The privacy manager handles analytical tasks that cannot be accomplished by the lightweight HE cryptosystem. This model is convenient for several application domains that require real-time responses. Data owners delegate their processing tasks to the privacy manager, which then helps to automate analysis tasks without the need to interact with data owners. We then developed a comprehensive, secure analytical model based on Fully Homomorphic Encryption (FHE), which has more computational capability than the lightweight HE. Although FHE can automate analysis tasks and avoid the use of the privacy manager entity, it also leads to massive computational overhead. To overcome this issue, we took advantage of the massive cloud resources by designing a MapReduce model that massively parallelises HE analytical tasks. Our parallelisation approach significantly speeds up analysis computations based on FHE. We then considered distributed analytic models where the data is generated from distributed heterogeneous sources such as healthcare and industrial sensors that are attached to people or installed in a distributed manner. We developed a secure distributed analytic model by re-designing several analytic algorithms (centroid-based and distribution-based clustering) to adapt them into secure distributed models based on FHE. 
Our distributed analytic model not only serves distributed applications but also eliminates the FHE overhead obstacle by achieving high efficiency in FHE computations. Furthermore, the distributed approach is scalable across three factors: analysis accuracy, execution time and the amount of resources used. This scalability feature enables users to consider the requirements of their analysis tasks based on these factors (e.g. users may have limited resources or time constraints to accomplish their analysis tasks). Finally, we designed and implemented two privacy-preserving real-time cloud-based applications to demonstrate the capabilities of HE cryptosystems, in terms of both efficiency and computational capabilities, for applications that require timely and reliable delivery of services. First, we developed a secure cloud-based billing model for a sensor-enabled smart grid infrastructure by using lightweight HE. This model handled billing analysis tasks for individual users in a secure manner without the need to interact with any trusted parties. Second, we built a real-time secure health surveillance model for smarter health communities in the cloud. We developed a secure change detection model based on an exponential smoothing technique to predict future changes in health vital signs based on FHE. Moreover, we built an innovative technique to parallelise FHE computations which significantly reduces computational overhead.
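
The exponential smoothing technique named in this abstract can be sketched in plaintext as follows. This is only an illustration of the general idea, not the thesis's FHE-based model: the function name, the smoothing factor `alpha`, the fixed `threshold`, and the rule of flagging a reading that deviates from the running forecast are all assumptions. The forecast update uses only additions and multiplications by constants, which is the kind of arithmetic homomorphic schemes evaluate directly, whereas the comparison against the threshold would need separate handling in an encrypted setting.

```python
def detect_changes(readings, alpha=0.5, threshold=10.0):
    """Return indices where a reading deviates from the smoothed forecast.

    forecast_t = alpha * x_t + (1 - alpha) * forecast_{t-1}
    A reading is flagged when |x_t - forecast_{t-1}| exceeds threshold.
    """
    forecast = readings[0]  # initialise the forecast with the first reading
    flagged = []
    for t in range(1, len(readings)):
        x = readings[t]
        if abs(x - forecast) > threshold:
            flagged.append(t)
        forecast = alpha * x + (1 - alpha) * forecast
    return flagged
```

For a hypothetical heart-rate series such as `[70, 72, 71, 95, 96]`, the sudden jump at index 3 is flagged, and index 4 is flagged as well because the forecast has not yet caught up with the new level.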

    Detecting cyberstalking from social media platform(s) using data mining analytics

    Cybercrime is an increasing activity that leads to cyberstalking, which makes the use of data mining algorithms to detect or prevent cyberstalking on social media platforms imperative for this study. The aim of this study was to determine the prevalence of cyberstalking on social media platforms using Twitter. To achieve this objective, machine learning models that perform data mining alongside security metrics were used to detect cyberstalking on social media platforms. The derived security metrics were used to flag any suspicious cyberstalking content. Two datasets of detailed tweets were analysed using NVivo and R. The dominant occurrence of cyberstalking was assessed with the induction of fifteen unigrams identified from the preliminary dataset: “abuse”, “annoying”, “creep or creepy”, “fear”, “follow or followers”, “gender”, “harassment”, “messaging”, “relationships p/p”, “scared”, “stalker”, “technology”, “unwanted”, “victim”, and “violent”. Ordinal regression was used to analyse the use of the fifteen unigrams, which were categorised according to their degree of relationship or link to cyberstalking on Twitter. Moreover, two lightweight machine learning algorithms were used to model cyberstalking-indicative content. K Nearest Neighbour and K Means Clustering were both coded in the R language for the extraction, refinement, analysis and visualisation processes of this research. Results showed that emotional terms such as “bad”, “sad” and “hate” were attached to the unigrams linked to cyberstalking. Each emotional term was flagged in correspondence with one of the fifteen unigrams in tweets that correlate with cyberstalking-indicative content, suggesting that one accompanies the other. K Means Clustering results showed that the two terms “bad” and “sad” appeared in 100 percent of the clustering results, while the term “hate” appeared in only 60 percent of the results. 
Results also revealed that the accuracy of the KNN algorithm was up to 40% in predicting key terms-based cyberstalking content in a real Twitter dataset consisting of 1m data points. This study emphasises the continuous relationship between the fifteen unigrams, emotional terms, and tweets within numerous datasets portrayed in this research, and reveals a general picture that cyberstalking indicative content in fact happens on Twitter at a vast rate with the corresponding links or relationships within the detection of cyberstalking