690 research outputs found

    Security and Privacy Aspects in MapReduce on Clouds: A Survey

    Full text link
    MapReduce is a programming system for distributed processing large-scale data in an efficient and fault tolerant manner on a private, public, or hybrid cloud. MapReduce is extensively used daily around the world as an efficient distributed computation tool for a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern matching, and analysis of social networks. Security and privacy of data and MapReduce computations are essential concerns when a MapReduce computation is executed in public or hybrid clouds. In order to execute a MapReduce job in public and hybrid clouds, authentication of mappers-reducers, confidentiality of data-computations, integrity of data-computations, and correctness-freshness of the outputs are required. Satisfying these requirements shield the operation from several types of attacks on data and MapReduce computations. In this paper, we investigate and discuss security and privacy challenges and requirements, considering a variety of adversarial capabilities, and characteristics in the scope of MapReduce. We also provide a review of existing security and privacy protocols for MapReduce and discuss their overhead issues.Comment: Accepted in Elsevier Computer Science Revie

    Data protection by means of fragmentation in various different distributed storage systems - a survey

    Full text link
    This paper analyzes various distributed storage systems that use data fragmentation and dispersal as a way of protection.Existing solutions have been organized into two categories: bitwise and structurewise. Systems from the bitwise category are operating on unstructured data and in a uniform environment. Those having structured input data with predefined confidentiality level and disposing of a heterogeneous environment in terms of machine trustworthiness were classified as structurewise. Furthermore, we outline high-level requirements and desirable architecture traits of an eficient data fragmentation system, which will address performance (including latency), availability, resilience and scalability.Comment: arXiv admin note: text overlap with arXiv:1512.0295

    CloudMine: Multi-Party Privacy-Preserving Data Analytics Service

    Full text link
    An increasing number of businesses are replacing their data storage and computation infrastructure with cloud services. Likewise, there is an increased emphasis on performing analytics based on multiple datasets obtained from different data sources. While ensuring security of data and computation outsourced to a third party cloud is in itself challenging, supporting analytics using data distributed across multiple, independent clouds is even further from trivial. In this paper we present CloudMine, a cloud-based service which allows multiple data owners to perform privacy-preserved computation over the joint data using their clouds as delegates. CloudMine protects data privacy with respect to semi-honest data owners and semi-honest clouds. It furthermore ensures the privacy of the computation outputs from the curious clouds. It allows data owners to reliably detect if their cloud delegates have been lazy when carrying out the delegated computation. CloudMine can run as a centralized service on a single cloud, or as a distributed service over multiple, independent clouds. CloudMine supports a set of basic computations that can be used to construct a variety of highly complex, distributed privacy-preserving data analytics. We demonstrate how a simple instance of CloudMine (secure sum service) is used to implement three classical data mining tasks (classification, association rule mining and clustering) in a cloud environment. We experiment with a prototype of the service, the results of which suggest its practicality for supporting privacy-preserving data analytics as a (multi) cloud-based service

    Panda: Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data

    Full text link
    Despite extensive research on cryptography, secure and efficient query processing over outsourced data remains an open challenge. This paper continues along with the emerging trend in secure data processing that recognizes that the entire dataset may not be sensitive, and hence, non-sensitivity of data can be exploited to overcome limitations of existing encryption-based approaches. We, first, provide a new security definition, entitled partitioned data security for guaranteeing that the joint processing of non-sensitive data (in cleartext) and sensitive data (in encrypted form) does not lead to any leakage. Then, this paper proposes a new secure approach, entitled query binning (QB) that allows secure execution of queries over non-sensitive and sensitive parts of the data. QB maps a query to a set of queries over the sensitive and non-sensitive data in a way that no leakage will occur due to the joint processing over sensitive and non-sensitive data. In particular, we propose secure algorithms for selection, range, and join queries to be executed over encrypted sensitive and cleartext non-sensitive datasets. Interestingly, in addition to improving performance, we show that QB actually strengthens the security of the underlying cryptographic technique by preventing size, frequency-count, and workload-skew attacks.Comment: This version has been accepted in ACM Transactions on Management Information Systems. The final published version of this paper may differ from this accepted version. A preliminary version of this paper [arXiv:1812.09233] was accepted and presented in IEEE ICDE 201

    A Fast Fragmentation Algorithm For Data Protection In a Multi-Cloud Environment

    Full text link
    Data fragmentation and dispersal over multiple clouds is a way of data protection against honest-but-curious storage or service providers. In this paper, we introduce a novel algorithm for data fragmentation that is particularly well adapted to be used in a multi-cloud environment. An empirical security analysis was performed on data sets provided by a large enterprise and shows that the scheme achieves good data protection. A performance comparison with published related works demonstrates it can be more than twice faster than the fastest of the relevant fragmentation techniques, while producing reasonable storage overhead

    IBBE-SGX: Cryptographic Group Access Control using Trusted Execution Environments

    Full text link
    While many cloud storage systems allow users to protect their data by making use of encryption, only few support collaborative editing on that data. A major challenge for enabling such collaboration is the need to enforce cryptographic access control policies in a secure and efficient manner. In this paper, we introduce IBBE-SGX, a new cryptographic access control extension that is efficient both in terms of computation and storage even when processing large and dynamic workloads of membership operations, while at the same time offering zero knowledge guarantees. IBBE-SGX builds upon Identity-Based Broadcasting Encryption (IBBE). We address IBBE's impracticality for cloud deployments by exploiting Intel Software Guard Extensions (SGX) to derive cuts in the computational complexity. Moreover, we propose a group partitioning mechanism such that the computational cost of membership update is bound to a fixed constant partition size rather than the size of the whole group. We have implemented and evaluated our new access control extension. Results highlight that IBBE-SGX performs membership changes 1.2 orders of magnitude faster than the traditional approach of Hybrid Encryption (HE), producing group metadata that are 6 orders of magnitude smaller than HE, while at the same time offering zero knowledge guarantees

    Exploiting Data Sensitivity on Partitioned Data

    Full text link
    Several researchers have proposed solutions for secure data outsourcing on the public clouds based on encryption, secret-sharing, and trusted hardware. Existing approaches, however, exhibit many limitations including high computational complexity, imperfect security, and information leakage. This chapter describes an emerging trend in secure data processing that recognizes that an entire dataset may not be sensitive, and hence, non-sensitivity of data can be exploited to overcome some of the limitations of existing encryption-based approaches. In particular, data and computation can be partitioned into sensitive or non-sensitive datasets - sensitive data can either be encrypted prior to outsourcing or stored/processed locally on trusted servers. The non-sensitive dataset, on the other hand, can be outsourced and processed in the cleartext. While partitioned computing can bring new efficiencies since it does not incur (expensive) encrypted data processing costs on non-sensitive data, it can lead to information leakage. We study partitioned computing in two contexts - first, in the context of the hybrid cloud where local resources are integrated with public cloud resources to form an effective and secure storage and computational platform for enterprise data. In the hybrid cloud, sensitive data is stored on the private cloud to prevent leakage and a computation is partitioned between private and public clouds. Care must be taken that the public cloud cannot infer any information about sensitive data from inter-cloud data access during query processing. We then consider partitioned computing in a public cloud only setting, where sensitive data is encrypted before outsourcing. We formally define a partitioned security criterion that any approach to partitioned computing on public clouds must ensure in order to not introduce any new vulnerabilities to the existing secure solution.Comment: This chapter will appear in the book titled: From Database to Cyber Security: Essays Dedicated to Sushil Jajodia on the Occasion of His 70th Birthda

    Top-k Query Processing on Encrypted Databases with Strong Security Guarantees

    Full text link
    Privacy concerns in outsourced cloud databases have become more and more important recently and many efficient and scalable query processing methods over encrypted data have been proposed. However, there is very limited work on how to securely process top-k ranking queries over encrypted databases in the cloud. In this paper, we focus exactly on this problem: secure and efficient processing of top-k queries over outsourced databases. In particular, we propose the first efficient and provable secure top-k query processing construction that achieves adaptively CQA security. We develop an encrypted data structure called EHL and describe several secure sub-protocols under our security model to answer top-k queries. Furthermore, we optimize our query algorithms for both space and time efficiency. Finally, in the experiments, we empirically analyze our protocol using real world datasets and demonstrate that our construction is efficient and practical

    Secure outsourced calculations with homomorphic encryption

    Full text link
    With the rapid development of cloud computing, the privacy security incidents occur frequently, especially data security issues. Cloud users would like to upload their sensitive information to cloud service providers in encrypted form rather than the raw data, and to prevent the misuse of data. The main challenge is to securely process or analyze these encrypted data without disclosing any useful information, and to achieve the rights management efficiently. In this paper, we propose the encrypted data processing protocols for cloud computing by utilizing additively homomorphic encryption and proxy cryptography. For the traditional homomorphic encryption schemes with many limitations, which are not suitable for cloud computing applications. We simulate a cloud computing scenario with flexible access control and extend the original homomorphic cryptosystem to suit our scenario by supporting various arithmetical calculations. We also prove the correctness and security of our protocols, and analyze the advantages and performance by comparing with some latest works

    Scalable Privacy-Preserving Distributed Learning

    Full text link
    In this paper, we address the problem of privacy-preserving distributed learning and the evaluation of machine-learning models by analyzing it in the widespread MapReduce abstraction that we extend with privacy constraints. We design SPINDLE (Scalable Privacy-preservINg Distributed LEarning), the first distributed and privacy-preserving system that covers the complete ML workflow by enabling the execution of a cooperative gradient-descent and the evaluation of the obtained model and by preserving data and model confidentiality in a passive-adversary model with up to N-1 colluding parties. SPINDLE uses multiparty homomorphic encryption to execute parallel high-depth computations on encrypted data without significant overhead. We instantiate SPINDLE for the training and evaluation of generalized linear models on distributed datasets and show that it is able to accurately (on par with non-secure centrally-trained models) and efficiently (due to a multi-level parallelization of the computations) train models that require a high number of iterations on large input data with thousands of features, distributed among hundreds of data providers. For instance, it trains a logistic-regression model on a dataset of one million samples with 32 features distributed among 160 data providers in less than three minutes.Comment: Published at the 21st Privacy Enhancing Technologies Symposium (PETS 2021
    • …
    corecore