881 research outputs found

    When private set intersection meets big data : an efficient and scalable protocol

    Get PDF
    Large scale data processing brings new challenges to the design of privacy-preserving protocols: how to meet the increasing requirements of speed and throughput of modern applications, and how to scale up smoothly when data being protected is big. Efficiency and scalability become critical criteria for privacy preserving protocols in the age of Big Data. In this paper, we present a new Private Set Intersection (PSI) protocol that is extremely efficient and highly scalable compared with existing protocols. The protocol is based on a novel approach that we call oblivious Bloom intersection. It has linear complexity and relies mostly on efficient symmetric key operations. It has high scalability due to the fact that most operations can be parallelized easily. The protocol has two versions: a basic protocol and an enhanced protocol, the security of the two variants is analyzed and proved in the semi-honest model and the malicious model respectively. A prototype of the basic protocol has been built. We report the result of performance evaluation and compare it against the two previously fastest PSI protocols. Our protocol is orders of magnitude faster than these two protocols. To compute the intersection of two million-element sets, our protocol needs only 41 seconds (80-bit security) and 339 seconds (256-bit security) on moderate hardware in parallel mode

    Big Data Privacy Context: Literature Effects On Secure Informational Assets

    Get PDF
    This article's objective is the identification of research opportunities in the current big data privacy domain, evaluating literature effects on secure informational assets. Until now, no study has analyzed such relation. Its results can foster science, technologies and businesses. To achieve these objectives, a big data privacy Systematic Literature Review (SLR) is performed on the main scientific peer reviewed journals in Scopus database. Bibliometrics and text mining analysis complement the SLR. This study provides support to big data privacy researchers on: most and least researched themes, research novelty, most cited works and authors, themes evolution through time and many others. In addition, TOPSIS and VIKOR ranks were developed to evaluate literature effects versus informational assets indicators. Secure Internet Servers (SIS) was chosen as decision criteria. Results show that big data privacy literature is strongly focused on computational aspects. However, individuals, societies, organizations and governments face a technological change that has just started to be investigated, with growing concerns on law and regulation aspects. TOPSIS and VIKOR Ranks differed in several positions and the only consistent country between literature and SIS adoption is the United States. Countries in the lowest ranking positions represent future research opportunities.Comment: 21 pages, 9 figure

    Analysis of outsourcing data to the cloud using autonomous key generation

    Get PDF
    Cloud computing, a technology that enables users to store and manage their data at a low cost and high availability, has been emerging for the past few decades because of the many services it provides. One of the many services cloud computing provides to its users is data storage. The majority of the users of this service are still concerned to outsource their data due to the integrity and confidentiality issues, as well as performance and cost issues, that come along with it. These issues make it necessary to encrypt data prior to outsourcing it to the cloud. However, encrypting data prior to outsourcing makes searching the data obsolete, lowering the functionality of the cloud. Most existing cloud storage schemes often prioritize security over performance and functionality, or vice versa. In this thesis, the cloud storage service is explored, and the aspects of security, performance, and functionality are analyzed in order to investigate the trade-offs of the service. DSB-SEIS, a scheme with encryption intensity selection, an autonomous key generation algorithm that allows users to control the encryption intensity of their files, as well as other features is developed in order to find a balance between performance, security, and functionality. The features that DSB-SEIS contains are deduplication, assured deletion, and searchable encryption. The effect of encryption intensity selection on encryption, decryption, and key generation is explored, and the performance and security of DSB-SEIS are evaluated. The MapReduce framework is also used to investigate the DSB-SEIS algorithm performance with big data. Analysis demonstrates that the encryption intensity selection algorithm generates a manageable number of encryption keys based on the confidentiality of data while not adding significant overhead on encryption or decryption --Abstract, page iii

    Efficient Multi-way Theta-Join Processing Using MapReduce

    Full text link
    Multi-way Theta-join queries are powerful in describing complex relations and therefore widely employed in real practices. However, existing solutions from traditional distributed and parallel databases for multi-way Theta-join queries cannot be easily extended to fit a shared-nothing distributed computing paradigm, which is proven to be able to support OLAP applications over immense data volumes. In this work, we study the problem of efficient processing of multi-way Theta-join queries using MapReduce from a cost-effective perspective. Although there have been some works using the (key,value) pair-based programming model to support join operations, efficient processing of multi-way Theta-join queries has never been fully explored. The substantial challenge lies in, given a number of processing units (that can run Map or Reduce tasks), mapping a multi-way Theta-join query to a number of MapReduce jobs and having them executed in a well scheduled sequence, such that the total processing time span is minimized. Our solution mainly includes two parts: 1) cost metrics for both single MapReduce job and a number of MapReduce jobs executed in a certain order; 2) the efficient execution of a chain-typed Theta-join with only one MapReduce job. Comparing with the query evaluation strategy proposed in [23] and the widely adopted Pig Latin and Hive SQL solutions, our method achieves significant improvement of the join processing efficiency.Comment: VLDB201

    SensorCloud: Towards the Interdisciplinary Development of a Trustworthy Platform for Globally Interconnected Sensors and Actuators

    Get PDF
    Although Cloud Computing promises to lower IT costs and increase users' productivity in everyday life, the unattractive aspect of this new technology is that the user no longer owns all the devices which process personal data. To lower scepticism, the project SensorCloud investigates techniques to understand and compensate these adoption barriers in a scenario consisting of cloud applications that utilize sensors and actuators placed in private places. This work provides an interdisciplinary overview of the social and technical core research challenges for the trustworthy integration of sensor and actuator devices with the Cloud Computing paradigm. Most importantly, these challenges include i) ease of development, ii) security and privacy, and iii) social dimensions of a cloud-based system which integrates into private life. When these challenges are tackled in the development of future cloud systems, the attractiveness of new use cases in a sensor-enabled world will considerably be increased for users who currently do not trust the Cloud.Comment: 14 pages, 3 figures, published as technical report of the Department of Computer Science of RWTH Aachen Universit

    Misusability Measure Based Sanitization of Big Data for Privacy Preserving MapReduce Programming

    Get PDF
    Leakage and misuse of sensitive data is a challenging problem to enterprises. It has become more serious problem with the advent of cloud and big data. The rationale behind this is the increase in outsourcing of data to public cloud and publishing data for wider visibility. Therefore Privacy Preserving Data Publishing (PPDP), Privacy Preserving Data Mining (PPDM) and Privacy Preserving Distributed Data Mining (PPDM) are crucial in the contemporary era. PPDP and PPDM can protect privacy at data and process levels respectively. Therefore, with big data privacy to data became indispensable due to the fact that data is stored and processed in semi-trusted environment. In this paper we proposed a comprehensive methodology for effective sanitization of data based on misusability measure for preserving privacy to get rid of data leakage and misuse. We followed a hybrid approach that caters to the needs of privacy preserving MapReduce programming. We proposed an algorithm known as Misusability Measure-Based Privacy serving Algorithm (MMPP) which considers level of misusability prior to choosing and application of appropriate sanitization on big data. Our empirical study with Amazon EC2 and EMR revealed that the proposed methodology is useful in realizing privacy preserving Map Reduce programming
    • ā€¦
    corecore