Security and Privacy Aspects in MapReduce on Clouds: A Survey
MapReduce is a programming system for the distributed processing of large-scale
data in an efficient and fault-tolerant manner on a private, public, or hybrid
cloud. MapReduce is extensively used daily around the world as an efficient
distributed computation tool for a large class of problems, e.g., search,
clustering, log analysis, different types of join operations, matrix
multiplication, pattern matching, and analysis of social networks. Security and
privacy of data and MapReduce computations are essential concerns when a
MapReduce computation is executed in public or hybrid clouds. Executing a
MapReduce job in public and hybrid clouds requires authentication of mappers
and reducers, confidentiality and integrity of data and computations, and
correctness and freshness of the outputs. Satisfying these requirements shields
the computation from several types of attacks on data and MapReduce
computations. In this paper, we investigate and discuss security and privacy
challenges and requirements, considering a variety of adversarial capabilities
and characteristics, in the scope of MapReduce. We
also provide a review of existing security and privacy protocols for MapReduce
and discuss their overhead issues.
Comment: Accepted in Elsevier Computer Science Review.
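To make the map/shuffle/reduce flow described above concrete, here is a minimal single-process word-count sketch in Python. It is illustrative only; a real deployment would run on a framework such as Hadoop, with the shuffle performed by the framework between distributed mappers and reducers:

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Group intermediate values by key, as the framework would do
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["secure cloud computing", "cloud storage"]
counts = reduce_phase(shuffle(map_phase(docs)))  # {"cloud": 2, ...}
```

The security questions surveyed above arise precisely because, in a public cloud, the mappers and reducers sketched here run on machines the data owner does not control.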
Data protection by means of fragmentation in various different distributed storage systems - a survey
This paper analyzes various distributed storage systems that use data
fragmentation and dispersal as a way of protection. Existing solutions have
been organized into two categories: bitwise and structurewise. Systems in the
bitwise category operate on unstructured data and in a uniform environment.
Those that take structured input data with predefined confidentiality levels
and have at their disposal an environment that is heterogeneous in terms of
machine trustworthiness were classified as structurewise. Furthermore, we
outline high-level requirements and desirable architectural traits of an
efficient data fragmentation system, addressing performance (including
latency), availability, resilience, and scalability.
Comment: arXiv admin note: text overlap with arXiv:1512.0295
CloudMine: Multi-Party Privacy-Preserving Data Analytics Service
An increasing number of businesses are replacing their data storage and
computation infrastructure with cloud services. Likewise, there is an increased
emphasis on performing analytics based on multiple datasets obtained from
different data sources. While ensuring security of data and computation
outsourced to a third-party cloud is in itself challenging, supporting
analytics over data distributed across multiple, independent clouds is harder
still. In this paper we present CloudMine, a cloud-based service which allows
multiple data owners to perform privacy-preserving computation over their
joint data, using their clouds as delegates. CloudMine protects data privacy
with respect to semi-honest data owners and semi-honest clouds. It furthermore
ensures the privacy of the computation outputs from the curious clouds. It
allows data owners to reliably detect if their cloud delegates have been lazy
when carrying out the delegated computation. CloudMine can run as a centralized
service on a single cloud, or as a distributed service over multiple,
independent clouds. CloudMine supports a set of basic computations that can be
used to construct a variety of highly complex, distributed privacy-preserving
data analytics. We demonstrate how a simple instance of CloudMine (secure sum
service) is used to implement three classical data mining tasks
(classification, association rule mining and clustering) in a cloud
environment. We experiment with a prototype of the service, the results of
which suggest its practicality for supporting privacy-preserving data analytics
as a (multi) cloud-based service.
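The secure-sum building block mentioned above can be sketched with additive secret sharing over a modulus: each owner splits its private value into random shares, one per cloud delegate, and the per-cloud partial sums reveal only the grand total. This is a generic illustration of the idea, not CloudMine's actual protocol:

```python
import secrets

M = 2**64  # modulus for the additive sharing

def share(value, n):
    # Split a private value into n random additive shares modulo M;
    # any n-1 of the shares are uniformly random and leak nothing.
    parts = [secrets.randbelow(M) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % M)
    return parts

owners = [10, 20, 12]   # each data owner's private value
n_clouds = 3
shares = [share(v, n_clouds) for v in owners]
# Cloud delegate c receives the c-th share from every owner and adds them.
cloud_sums = [sum(s[c] for s in shares) % M for c in range(n_clouds)]
# Combining the per-cloud partial sums yields only the total, 42.
total = sum(cloud_sums) % M
```

Classification, association rule mining, and clustering can all be driven by such sums (e.g., summing per-party counts or distance contributions), which is why a secure-sum primitive suffices for the three mining tasks the paper demonstrates.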
Panda: Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data
Despite extensive research on cryptography, secure and efficient query
processing over outsourced data remains an open challenge. This paper continues
the emerging trend in secure data processing, which recognizes that
the entire dataset may not be sensitive, and hence, non-sensitivity of data can
be exploited to overcome limitations of existing encryption-based approaches.
We first provide a new security definition, entitled partitioned data
security, guaranteeing that the joint processing of non-sensitive data (in
cleartext) and sensitive data (in encrypted form) does not lead to any leakage.
Then, this paper proposes a new secure approach, entitled query binning (QB)
that allows secure execution of queries over non-sensitive and sensitive parts
of the data. QB maps a query to a set of queries over the sensitive and
non-sensitive data in a way that no leakage will occur due to the joint
processing over sensitive and non-sensitive data. In particular, we propose
secure algorithms for selection, range, and join queries to be executed over
encrypted sensitive and cleartext non-sensitive datasets. Interestingly, in
addition to improving performance, we show that QB actually strengthens the
security of the underlying cryptographic technique by preventing size,
frequency-count, and workload-skew attacks.
Comment: This version has been accepted in ACM Transactions on Management
Information Systems. The final published version of this paper may differ from
this accepted version. A preliminary version of this paper [arXiv:1812.09233]
was accepted and presented in IEEE ICDE 201
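The binning idea can be caricatured as follows: a selection query for one keyword is rewritten to fetch an entire fixed-size bin of keywords, so the server cannot tell which member of the bin was actually of interest. This is a deliberately simplified sketch of the intuition, not the QB construction from the paper (which additionally coordinates bins across the sensitive and non-sensitive partitions):

```python
def make_bins(keywords, bin_size):
    # Partition the keyword domain into fixed-size bins.
    ordered = sorted(keywords)
    return [ordered[i:i + bin_size]
            for i in range(0, len(ordered), bin_size)]

def bin_for(keyword, bins):
    # Rewrite a single-keyword selection into a fetch of its whole bin,
    # hiding which keyword inside the bin was requested.
    for b in bins:
        if keyword in b:
            return b
    raise KeyError(keyword)

bins = make_bins(["ann", "bob", "carl", "dan", "eve", "fay"], 2)
request = bin_for("carl", bins)  # the server sees ["carl", "dan"]
```

Because every query in a bin produces an identical access pattern, size and frequency-count attacks that distinguish individual keywords are blunted, at the cost of retrieving extra records per query.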
A Fast Fragmentation Algorithm For Data Protection In a Multi-Cloud Environment
Data fragmentation and dispersal over multiple clouds is a way of protecting
data against honest-but-curious storage or service providers. In this
paper, we introduce a novel algorithm for data fragmentation that is
particularly well adapted to be used in a multi-cloud environment. An empirical
security analysis was performed on data sets provided by a large enterprise and
shows that the scheme achieves good data protection. A performance comparison
with published related works demonstrates that it can be more than twice as
fast as the fastest of the relevant fragmentation techniques, while producing
reasonable storage overhead.
IBBE-SGX: Cryptographic Group Access Control using Trusted Execution Environments
While many cloud storage systems allow users to protect their data by making
use of encryption, only few support collaborative editing on that data. A major
challenge for enabling such collaboration is the need to enforce cryptographic
access control policies in a secure and efficient manner. In this paper, we
introduce IBBE-SGX, a new cryptographic access control extension that is
efficient both in terms of computation and storage even when processing large
and dynamic workloads of membership operations, while at the same time offering
zero-knowledge guarantees. IBBE-SGX builds upon Identity-Based Broadcast
Encryption (IBBE). We address IBBE's impracticality for cloud deployments by
exploiting Intel Software Guard Extensions (SGX) to cut its computational
complexity. Moreover, we propose a group partitioning mechanism such that the
computational cost of a membership update is bounded by a fixed partition size
rather than by the size of the whole group. We have
implemented and evaluated our new access control extension. Results highlight
that IBBE-SGX performs membership changes 1.2 orders of magnitude faster than
the traditional approach of Hybrid Encryption (HE), producing group metadata
that are 6 orders of magnitude smaller than HE, while at the same time offering
zero-knowledge guarantees.
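The cost argument behind the partitioning mechanism can be illustrated with a toy model in which each fixed-size partition is protected under its own key, so a membership change touches a single partition. This sketches the idea only, not the IBBE-SGX implementation (where rekeying means recomputing IBBE material inside the enclave):

```python
PARTITION_SIZE = 4

def partition_group(members):
    # Split the group into fixed-size partitions, each under its own key.
    return [members[i:i + PARTITION_SIZE]
            for i in range(0, len(members), PARTITION_SIZE)]

def revoke(partitions, member):
    # Only the partition containing the revoked member is rekeyed, so the
    # update cost is bounded by PARTITION_SIZE, not by the group size.
    rekeyed = 0
    for part in partitions:
        if member in part:
            part.remove(member)
            rekeyed += 1  # stands in for generating a fresh partition key
    return rekeyed

partitions = partition_group(list(range(12)))  # 12 members, 3 partitions
touched = revoke(partitions, 5)                # rekeys exactly 1 partition
```

Choosing the partition size trades per-update cost (proportional to partition size) against metadata volume (number of partitions), which is the knob the paper's evaluation explores.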
Exploiting Data Sensitivity on Partitioned Data
Several researchers have proposed solutions for secure data outsourcing on
the public clouds based on encryption, secret-sharing, and trusted hardware.
Existing approaches, however, exhibit many limitations including high
computational complexity, imperfect security, and information leakage. This
chapter describes an emerging trend in secure data processing that recognizes
that an entire dataset may not be sensitive, and hence, non-sensitivity of data
can be exploited to overcome some of the limitations of existing
encryption-based approaches. In particular, data and computation can be
partitioned into sensitive or non-sensitive datasets - sensitive data can
either be encrypted prior to outsourcing or stored/processed locally on trusted
servers. The non-sensitive dataset, on the other hand, can be outsourced and
processed in the cleartext. While partitioned computing can bring new
efficiencies since it does not incur (expensive) encrypted data processing
costs on non-sensitive data, it can lead to information leakage. We study
partitioned computing in two contexts - first, in the context of the hybrid
cloud where local resources are integrated with public cloud resources to form
an effective and secure storage and computational platform for enterprise data.
In the hybrid cloud, sensitive data is stored on the private cloud to prevent
leakage and a computation is partitioned between private and public clouds.
Care must be taken that the public cloud cannot infer any information about
sensitive data from inter-cloud data access during query processing. We then
consider partitioned computing in a public cloud only setting, where sensitive
data is encrypted before outsourcing. We formally define a partitioned security
criterion that any approach to partitioned computing on public clouds must
ensure in order to not introduce any new vulnerabilities to the existing secure
solution.
Comment: This chapter will appear in the book titled: From Database to Cyber
Security: Essays Dedicated to Sushil Jajodia on the Occasion of His 70th
Birthday
Top-k Query Processing on Encrypted Databases with Strong Security Guarantees
Privacy concerns in outsourced cloud databases have become increasingly
important, and many efficient and scalable methods for query processing over
encrypted data have been proposed. However, there is very limited work on
how to securely process top-k ranking queries over encrypted databases in the
cloud. In this paper, we focus exactly on this problem: secure and efficient
processing of top-k queries over outsourced databases. In particular, we
propose the first efficient and provably secure top-k query processing
construction that achieves adaptive CQA security. We develop an encrypted
data structure called EHL and describe several secure sub-protocols under our
security model to answer top-k queries. Furthermore, we optimize our query
algorithms for both space and time efficiency. Finally, in the experiments, we
empirically analyze our protocol using real world datasets and demonstrate that
our construction is efficient and practical.
Secure outsourced calculations with homomorphic encryption
With the rapid development of cloud computing, privacy and security incidents,
especially data security issues, occur frequently. Cloud users would like to
upload their sensitive information to cloud service providers in encrypted form
rather than as raw data, to prevent its misuse. The main challenge is to
securely process or analyze the encrypted data without disclosing any useful
information, and to manage access rights efficiently. In this paper, we propose
encrypted data processing protocols for cloud computing that utilize additively
homomorphic encryption and proxy cryptography. Traditional homomorphic
encryption schemes have many limitations that make them unsuitable for cloud
computing applications. We simulate a cloud computing
scenario with flexible access control and extend the original homomorphic
cryptosystem to suit our scenario by supporting various arithmetical
calculations. We also prove the correctness and security of our protocols, and
analyze their advantages and performance by comparison with recent related
works.
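The additively homomorphic property such protocols rely on can be demonstrated with a toy Paillier cryptosystem: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The parameters below are tiny and insecure, chosen only for illustration (a real system needs large primes and a vetted library; `pow(x, -1, n)` requires Python 3.8+):

```python
import math
import secrets

# Toy Paillier key (insecure, illustrative primes).
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)  # valid simplification because g = n + 1

def encrypt(m):
    # c = g^m * r^n mod n^2 for a random r coprime to n.
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    # m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) / n.
    return (pow(c, lam, n2) - 1) // n * mu % n

a, b = encrypt(17), encrypt(25)
# Multiplying ciphertexts adds the underlying plaintexts: 17 + 25 = 42.
total = decrypt(a * b % n2)
```

This additive property is what lets an untrusted cloud aggregate encrypted values without ever decrypting them; proxy cryptography is then layered on top to manage who may decrypt the results.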
Scalable Privacy-Preserving Distributed Learning
In this paper, we address the problem of privacy-preserving distributed
learning and the evaluation of machine-learning models by analyzing it in the
widespread MapReduce abstraction that we extend with privacy constraints. We
design SPINDLE (Scalable Privacy-preservINg Distributed LEarning), the first
distributed and privacy-preserving system that covers the complete ML workflow
by enabling the execution of a cooperative gradient-descent and the evaluation
of the obtained model and by preserving data and model confidentiality in a
passive-adversary model with up to N-1 colluding parties. SPINDLE uses
multiparty homomorphic encryption to execute parallel high-depth computations
on encrypted data without significant overhead. We instantiate SPINDLE for the
training and evaluation of generalized linear models on distributed datasets
and show that it is able to accurately (on par with non-secure
centrally-trained models) and efficiently (due to a multi-level parallelization
of the computations) train models that require a high number of iterations on
large input data with thousands of features, distributed among hundreds of data
providers. For instance, it trains a logistic-regression model on a dataset of
one million samples with 32 features distributed among 160 data providers in
less than three minutes.
Comment: Published at the 21st Privacy Enhancing Technologies Symposium
(PETS 2021).
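The cooperative gradient-descent workflow can be sketched in the clear: each data provider computes a logistic-regression gradient on its local data, and only the aggregated gradient updates the shared model. In SPINDLE the aggregation and the model itself stay under multiparty homomorphic encryption; this plaintext sketch omits that layer to show only the workflow:

```python
import math

def local_gradient(w, X, y):
    # Logistic-regression gradient over one provider's local data.
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, xj in enumerate(xi):
            grad[j] += (p - yi) * xj
    return grad

def cooperative_step(w, providers, lr=0.1):
    # Each provider contributes a local gradient; only the summed
    # gradient (which SPINDLE keeps encrypted) touches the global model.
    total = [0.0] * len(w)
    for X, y in providers:
        g = local_gradient(w, X, y)
        total = [t + gi for t, gi in zip(total, g)]
    return [wj - lr * tj for wj, tj in zip(w, total)]

# Two providers holding linearly separable one-feature data.
providers = [([[1.0], [2.0]], [1, 1]), ([[-1.0], [-2.0]], [0, 0])]
w = [0.0]
for _ in range(50):
    w = cooperative_step(w, providers)
# w[0] grows positive, separating the two providers' classes.
```

Because each round needs only a sum of local gradients, the pattern maps naturally onto the MapReduce abstraction the paper extends: providers are mappers, and the (encrypted) aggregation is the reduce step.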