
    Scalable TPTDS Data Anonymization over Cloud using MapReduce

    With the rapid advance of the big-data era, large amounts of data are collected, mined, and published; data publishing has become a routine activity, and cloud computing is the most suitable model for supporting big-data applications. Many cloud services require users to share microdata, such as electronic health records or financial transaction data, so that it can be analysed, but privacy threats remain one of the major obstacles to moving to the cloud. Data anonymization techniques are widely used to address these privacy concerns; anonymizing data sets through generalization to achieve k-anonymity is one such privacy-preserving technique. The scale of data in many cloud applications is now growing massively in line with the big-data trend, making it difficult for commonly used software tools to capture, handle, manage, and process such large-scale datasets, and existing anonymization approaches struggle because they do not scale. This paper presents a Two-Phase Top-Down Specialization (TPTDS) approach to anonymize large-scale datasets. The approach uses the MapReduce framework on the cloud, making it highly scalable and efficient. We also introduce a scheduling mechanism called Optimized Balanced Scheduling (OBS) to apply the anonymization: each dataset has its own sensitive field, each sensitive field is assigned a priority, and anonymization is then applied to that sensitive field according to the schedule. DOI: 10.17762/ijritcc2321-8169.15077
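    The abstract gives no implementation detail, but the core MapReduce round behind top-down specialization can be sketched as follows. This is a minimal single-process simulation with illustrative field names (age, zip), not the paper's code: the mapper emits one pair per record keyed by the generalized quasi-identifier, and the reducer counts group sizes to check k-anonymity.

    ```python
    from collections import defaultdict

    def generalize_age(age):
        """Generalize an exact age to a coarse interval (one taxonomy level)."""
        lo = (age // 10) * 10
        return f"{lo}-{lo + 9}"

    def map_phase(records):
        """Mapper: emit (generalized quasi-identifier tuple, 1) per record."""
        for rec in records:
            qid = (generalize_age(rec["age"]), rec["zip"][:3] + "**")
            yield qid, 1

    def reduce_phase(pairs):
        """Reducer: sum counts per quasi-identifier group."""
        counts = defaultdict(int)
        for qid, one in pairs:
            counts[qid] += one
        return counts

    def is_k_anonymous(records, k):
        counts = reduce_phase(map_phase(records))
        return all(c >= k for c in counts.values())

    records = [
        {"age": 34, "zip": "41001", "disease": "flu"},
        {"age": 36, "zip": "41002", "disease": "cold"},
        {"age": 52, "zip": "41003", "disease": "flu"},
    ]
    # False: the 52-year-old record forms a singleton group, so a further
    # generalization (specialization rollback) round would be needed.
    print(is_k_anonymous(records, 2))
    ```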

    BALANCED AWARE FIREFLY OPTIMIZATION BASED COST-EFFECTIVE PRIVACY PRESERVING APPROACH OF INTERMEDIATE DATA SETS OVER CLOUD COMPUTING

    Cloud computing is an emerging paradigm with remarkable momentum, but its distinctive characteristics intensify security and privacy challenges. Previous work addressed the privacy of intermediate data sets, concentrating on protecting privacy-sensitive information, but it suffers from high time and cost complexity and does not handle privacy-aware, efficient scheduling of intermediate data sets in the cloud. To overcome these problems, this work presents an enhanced balanced scheduling methodology that improves both cost complexity and privacy preservation. Balanced-aware FireFly Optimization (BFFO) is used for efficient privacy-aware data set scheduling: among a set of candidate solutions with similar execution times, it finds the one that performs best on balance. Consequently, the proposed system achieves better privacy preservation and lower scheduling cost than the previous method. Encryption is used to guarantee security, and end users decrypt the real information with improved privacy. Experimental results show that the presented method delivers better privacy, lower cost, lower time complexity, and more efficient storage with BFFO than the previous Cost-based Heuristic (C_HEU) algorithm.
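    The abstract does not specify BFFO's update rules, so the following is only a generic firefly-algorithm skeleton of the kind such schedulers build on. A candidate schedule is encoded as a real-valued vector, and the fitness function is a placeholder for the paper's cost/privacy balance objective; both are assumptions for illustration.

    ```python
    import math
    import random

    def fitness(x):
        # Placeholder objective standing in for scheduling cost: lower is better.
        return sum(v * v for v in x)

    def firefly_optimize(dim=4, n=15, iters=100, alpha=0.2, beta0=1.0, gamma=1.0):
        swarm = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
        for _ in range(iters):
            # Brightness is evaluated once per iteration for simplicity.
            scores = [fitness(x) for x in swarm]
            for i in range(n):
                for j in range(n):
                    if scores[j] < scores[i]:  # firefly j is "brighter" (better)
                        r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                        beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                        swarm[i] = [
                            a + beta * (b - a) + alpha * (random.random() - 0.5)
                            for a, b in zip(swarm[i], swarm[j])
                        ]
            alpha *= 0.97  # cool the random step over time
        return min(swarm, key=fitness)

    best = firefly_optimize()
    print(best, fitness(best))
    ```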

    PROVIDING A FLEXIBLE AND COSTLESS SOLUTION FOR TRANSITIONAL SETS

    Cloud computing provides massive computation power and storage capacity, enabling users to deploy computation- and data-intensive applications without infrastructure investment. While processing such applications, a large volume of intermediate data sets is generated and often stored to save the cost of recomputing them; such situations are very common because data users frequently reanalyse results, conduct new analyses on intermediate data sets, or share intermediate results with others for collaboration. However, preserving the privacy of intermediate data sets becomes a challenging problem, because adversaries may recover privacy-sensitive information by analysing multiple intermediate data sets together. Encrypting ALL data sets in the cloud is the approach widely adopted by existing methods, but we argue that encrypting all intermediate data sets is neither efficient nor cost-effective, since it is very time-consuming and costly for data-intensive applications to en/decrypt data sets frequently while performing operations on them. In this paper, we propose a novel upper-bound privacy leakage constraint-based approach to identify which intermediate data sets need to be encrypted and which do not, so that privacy-preserving cost can be saved while the privacy requirements of data holders are still satisfied. Finally, we design a practical heuristic algorithm to identify the data sets that should be encrypted. Evaluation results show that the privacy-preserving cost of intermediate data sets can be significantly reduced with our approach compared with existing ones in which all data sets are encrypted.
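    As a rough illustration of the idea (not the paper's heuristic), the sketch below greedily encrypts the data sets with the highest leakage per unit encryption cost until the residual leakage of the remaining plaintext data sets falls below an upper bound. Leakage is treated as additive for simplicity, whereas the paper's constraint concerns joint leakage across data sets; the names, numbers, and the epsilon value are all hypothetical.

    ```python
    def select_to_encrypt(datasets, epsilon):
        """datasets: list of (name, leakage, encryption_cost) tuples.
        Returns the names to encrypt so that the summed leakage of the
        remaining plaintext data sets is <= epsilon, preferring data sets
        with high leakage per unit cost."""
        ranked = sorted(datasets, key=lambda d: d[1] / d[2], reverse=True)
        residual = sum(d[1] for d in ranked)  # leakage if nothing is encrypted
        encrypted = set()
        for name, leakage, cost in ranked:
            if residual <= epsilon:
                break
            encrypted.add(name)
            residual -= leakage
        return encrypted

    intermediate = [("D1", 0.40, 2.0), ("D2", 0.35, 5.0), ("D3", 0.10, 1.0)]
    print(select_to_encrypt(intermediate, epsilon=0.5))  # {'D1'}
    ```

    Encrypting only D1 already brings the residual leakage (0.45) under the bound, so the cheaper plaintext storage of D2 and D3 is preserved; encrypting everything, as existing methods do, would pay the full en/decryption cost for no additional benefit under this constraint.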

    A Protection Layer over MapReduce Framework for Big Data Privacy

    In many organizations, big data analytics has become the standard way of extracting valuable insights from data. The MapReduce framework, which is generally used for this purpose, has been adopted by most organizations for its exceptional characteristics. However, because significant processing resources are readily available, dispersed privacy-sensitive details can be collected quickly, amplifying widespread privacy concerns. This article reviews existing research on the MapReduce framework's privacy issues and proposes an additional layer of privacy protection (MRPL) over the adopted framework. The data is split into chunks and processed in the cloud: Hadoop splits the file into smaller pieces, the task tracker allocates these pieces to several mappers, the data is organized into key-value pairs, and the intermediate data sets are generated. The efficiency of the suggested approach can then be meaningfully evaluated. Overall, the proposed method provides improved scalability; the paper's figures compare execution time against file size and against the number of partitions. Because a privacy-protection technique is used, loss of data content can be appropriately handled. MRPL has been demonstrated to outperform current methods in CPU optimization, memory usage, and reduced information loss. The research shows that the suggested strategy yields significant advantages for big data by enhancing privacy and protection; MRPL can considerably mitigate privacy issues in big data.
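    The abstract only outlines the split-then-map flow, so the sketch below simulates it in a single process: a file is cut into chunks, each chunk's records pass through a hypothetical masking step (standing in for the unspecified protection layer), and key-value pairs are emitted. The record format and the hashing-based mask are assumptions, not MRPL's actual mechanism.

    ```python
    import hashlib

    CHUNK_SIZE = 64  # bytes per split; real Hadoop splits are ~128 MB

    def split_into_chunks(data: bytes, size: int = CHUNK_SIZE):
        # Records may straddle chunk boundaries in general; Hadoop's record
        # reader handles that, while this sample fits in a single chunk.
        return [data[i:i + size] for i in range(0, len(data), size)]

    def mask_record(record: str) -> str:
        """Stand-in protection step: replace the sensitive field with a hash."""
        user, value = record.split(",", 1)
        return hashlib.sha256(user.encode()).hexdigest()[:8] + "," + value

    def mapper(chunk: bytes):
        """Emit (masked_key, value) pairs for each line in the chunk."""
        for line in chunk.decode().splitlines():
            if line:
                key, value = mask_record(line).split(",", 1)
                yield key, value

    data = b"alice,120\nbob,340\ncarol,55\n"
    for chunk in split_into_chunks(data):
        for kv in mapper(chunk):
            print(kv)
    ```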

    Toward efficient and secure public auditing for dynamic big data storage on cloud

    University of Technology Sydney, Faculty of Engineering and Information Technology. Cloud and big data are two of the most attractive ICT research topics to have emerged in recent years. Requirements for big data processing are now everywhere, and the pay-as-you-go model of cloud systems is especially cost-efficient for big data applications. However, concerns still hinder the proliferation of cloud, and data security/privacy is a top concern for data owners wishing to migrate their applications into the cloud environment. Compared to users of conventional systems, cloud users must surrender local control of their data to cloud servers. Another challenge for big data is data dynamism, which exists in most big data applications: due to frequent updates, efficiency becomes a major issue in data management. Since security always brings compromises in efficiency, it is difficult but nonetheless important to investigate how to efficiently address security challenges over dynamic cloud data. Data integrity is an essential aspect of data security. Beyond server-side integrity protection mechanisms, verification by a third-party auditor is equally important, because it enables users to verify the integrity of their data through the auditor at any user-chosen time; this type of verification is also called 'public auditing' of data. Existing public auditing schemes allow the integrity of a dataset stored in the cloud to be verified externally without retrieving the whole original dataset. In practice, however, many challenges hinder the application of such schemes. First, the server still has to aggregate a proof with the cloud controller from data blocks that are distributed and processed across cloud instances, which makes encryption and transfer of these data within the cloud time-consuming. Second, security flaws exist in current designs: the verification processes are insecure against various attacks, which raises concerns about deploying these schemes in practice. Third, when the dataset is large, auditing of dynamic data becomes costly in terms of communication and storage, especially for large numbers of small data updates and for updates on multi-replica cloud data storage. In this thesis, the research problem of dynamic public data auditing in the cloud is systematically investigated. After analysing the research problems, we address secure and efficient public auditing of dynamic big data in the cloud by developing, testing, and publishing a series of security schemes and algorithms. Specifically, our work focuses on the following aspects: cloud-internal authenticated key exchange, authorisation of the third-party auditor, fine-grained update support, index verification, and efficient multi-replica public auditing of dynamic data. To the best of our knowledge, this thesis presents the first series of work to systematically analyse and address this research problem. Experimental results and analyses show that the solutions presented in this thesis are suitable for auditing dynamic big data storage on the cloud, and represent significant improvements in cloud efficiency and security.
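    The thesis's schemes rely on homomorphic authenticators and support dynamic updates, which this listing does not detail. The following minimal Merkle-tree sketch (assuming a power-of-two number of blocks for brevity) only illustrates the core public-auditing idea: verifying a single block against a small stored root without retrieving the whole dataset.

    ```python
    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def build_tree(blocks):
        level = [h(b) for b in blocks]
        tree = [level]
        while len(level) > 1:
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            tree.append(level)
        return tree  # tree[-1][0] is the root the auditor keeps

    def auth_path(tree, index):
        """Collect the sibling hashes from leaf to root for one block."""
        path = []
        for level in tree[:-1]:
            sibling = index ^ 1
            path.append((level[sibling], sibling < index))
            index //= 2
        return path

    def verify(root, block, path):
        digest = h(block)
        for sib, sib_is_left in path:
            digest = h(sib + digest) if sib_is_left else h(digest + sib)
        return digest == root

    blocks = [b"blk0", b"blk1", b"blk2", b"blk3"]
    tree = build_tree(blocks)
    root = tree[-1][0]
    proof = auth_path(tree, 2)
    print(verify(root, b"blk2", proof))      # True
    print(verify(root, b"tampered", proof))  # False
    ```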

    Privacy and security in cyber-physical systems

    Data privacy has attracted increasing attention in the past decade due to emerging technologies that require our data to provide utility. Service providers (SPs) encourage users to share their personal data in return for a better user experience. However, users' raw data usually contains implicit sensitive information that can be inferred by a third party, which raises great concern about users' privacy. In this dissertation, we develop novel techniques to achieve a better privacy-utility trade-off (PUT) in various applications. We first consider smart meter (SM) privacy and employ physical resources to minimize the information leakage to the SP through SM readings. We measure privacy using information-theoretic metrics and find private data release policies (PDRPs) by formulating the problem as a Markov decision process (MDP). We also propose noise-injection techniques for time-series data privacy, characterizing optimal PDRPs that measure privacy via mutual information (MI) and utility loss via added distortion; reformulating the problem as an MDP, we solve it using deep reinforcement learning (DRL) on real location-trace data. We further consider a scenario in which an underlying "sensitive" variable is hidden while a "useful" variable is revealed for utility, by periodically selecting which sensors share their measurements with an SP; we formulate this as an optimal stopping problem and solve it using DRL. We then consider privacy-aware communication over a wiretap channel, maximizing the information delivered to the legitimate receiver while minimizing the leakage from the sensitive attribute to the eavesdropper; we propose using a variational autoencoder (VAE) and validate our approach with the colored and annotated MNIST dataset. Finally, we consider defenses against active adversaries in security-critical applications: we propose an adversarial example (AE) generation method exploiting the data distribution, perform adversarial training using the proposed AEs, and evaluate the performance against real-world adversarial attacks.
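    The dissertation learns its release policies with MDPs and DRL, which cannot be reproduced here; the toy sketch below only illustrates the underlying noise-injection primitive and the resulting privacy-utility tension, with mean squared distortion as the utility-loss measure. The readings and noise scales are made up for illustration.

    ```python
    import random

    def laplace_noise(scale: float) -> float:
        """Laplace(0, scale) sampled as the difference of two exponentials."""
        return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

    def release(readings, scale):
        """Perturb each reading before releasing it to the service provider."""
        return [x + laplace_noise(scale) for x in readings]

    readings = [1.2, 3.4, 2.8, 0.9, 4.1]  # e.g. smart-meter samples (kW)
    for scale in (0.1, 1.0):
        released = release(readings, scale)
        distortion = sum((a - b) ** 2 for a, b in zip(readings, released)) / len(readings)
        # Larger noise scale -> more privacy against inference, more distortion.
        print(f"scale={scale}: mean squared distortion = {distortion:.3f}")
    ```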

    Distributed Query Execution With Strong Privacy Guarantees

    As the Internet evolves, we find more applications that involve data originating from multiple sources and spanning machines located all over the world. Such wide distribution of sensitive data increases the risk of information leakage and may sometimes inhibit useful applications. For instance, even though banks could share data to detect systemic threats in the US financial network, they hesitate to do so because it can leak business secrets to their competitors. Encryption is an effective way to preserve data confidentiality, but it eliminates all processing capabilities. Some approaches enable processing on encrypted data, but they usually have security weaknesses, such as data leakage through side channels, or require expensive cryptographic computations. In this thesis, we present techniques that address the above limitations. First, we present an efficient symmetric homomorphic encryption scheme, which can aggregate encrypted data at an unprecedented scale. Second, we present a way to efficiently perform secure computations on distributed graphs. To accomplish this, we express large computations as a series of small, parallelizable vertex programs, whose state is safely transferred between vertices using a new cryptographic protocol. Finally, we propose using differential privacy to strengthen the security of trusted processors: noise is added to the side channels, so that no adversary can extract useful information about individual users. Our experimental results suggest that the presented techniques achieve order-of-magnitude performance improvements over previous approaches, in scenarios such as the business intelligence application of a large corporation and the detection of systemic threats in the US financial network.
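    The thesis's symmetric homomorphic scheme is not described in this listing; the sketch below instead uses textbook additive masking modulo M (in the style of Castelluccia et al.) just to demonstrate the property such aggregation relies on: the sum of ciphertexts decrypts to the sum of plaintexts, so an untrusted aggregator never sees individual values.

    ```python
    import secrets

    M = 2 ** 32  # modulus; must exceed any possible plaintext sum

    def encrypt(m: int, k: int) -> int:
        """Additively mask a value with a per-party one-time key."""
        return (m + k) % M

    def decrypt_sum(aggregate: int, keys) -> int:
        """Remove the summed keys to recover the sum of plaintexts."""
        return (aggregate - sum(keys)) % M

    values = [17, 250, 42]  # e.g. per-bank risk counters, kept private
    keys = [secrets.randbelow(M) for _ in values]
    ciphertexts = [encrypt(m, k) for m, k in zip(values, keys)]

    aggregate = sum(ciphertexts) % M  # computed by an untrusted aggregator
    print(decrypt_sum(aggregate, keys))  # 309 == sum(values)
    ```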