CloudMine: Multi-Party Privacy-Preserving Data Analytics Service
An increasing number of businesses are replacing their data storage and
computation infrastructure with cloud services. Likewise, there is an increased
emphasis on performing analytics based on multiple datasets obtained from
different data sources. While ensuring security of data and computation
outsourced to a third party cloud is in itself challenging, supporting
analytics using data distributed across multiple, independent clouds is even
further from trivial. In this paper we present CloudMine, a cloud-based service
which allows multiple data owners to perform privacy-preserving computation over
the joint data using their clouds as delegates. CloudMine protects data privacy
with respect to semi-honest data owners and semi-honest clouds. It furthermore
ensures the privacy of the computation outputs from the curious clouds. It
allows data owners to reliably detect if their cloud delegates have been lazy
when carrying out the delegated computation. CloudMine can run as a centralized
service on a single cloud, or as a distributed service over multiple,
independent clouds. CloudMine supports a set of basic computations that can be
used to construct a variety of highly complex, distributed privacy-preserving
data analytics. We demonstrate how a simple instance of CloudMine (secure sum
service) is used to implement three classical data mining tasks
(classification, association rule mining and clustering) in a cloud
environment. We experiment with a prototype of the service, the results of
which suggest its practicality for supporting privacy-preserving data analytics
as a (multi) cloud-based service.
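The abstract names a secure sum service as the simplest CloudMine instance but does not describe its protocol. As a rough illustration only, the sketch below uses additive secret sharing, a standard textbook technique for secure summation among semi-honest parties; it is not claimed to be CloudMine's actual construction. The names `MODULUS`, `make_shares`, and `secure_sum` are hypothetical, and inputs are assumed to be non-negative integers whose true sum is smaller than the modulus.

```python
import secrets

# Hypothetical public modulus; assumed to exceed any possible true sum.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split `value` into n_parties random shares that sum to it mod `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate a secure sum: each party only ever sees random-looking shares."""
    n = len(private_values)
    # all_shares[i][j] is the share that owner i sends to party j.
    all_shares = [make_shares(v, n, modulus) for v in private_values]
    # Party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Combining the published partial sums reveals the total, not the inputs.
    return sum(partial_sums) % modulus


if __name__ == "__main__":
    print(secure_sum([12, 30, 7]))
```

In this simulation all parties run in one process; in a deployment each share would travel over a separate channel, and a count-based mining task (for example, the support of an itemset) would call such a sum once per count.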
Privacy Preserving Association Rule Mining Revisited
Privacy-preserving data mining (PPDM) has been one of the most
interesting, yet challenging, research issues. In PPDM, we seek to
outsource our data for data mining tasks to a third party while maintaining its
privacy. In this paper, we revisit one of the recent PPDM schemes (i.e., FS)
which is designed for privacy-preserving association rule mining (PP-ARM). Our
analysis shows some limitations of the FS scheme in terms of the storage
required to guarantee a reasonable privacy standard, as well as its high
computation cost. In addition, we introduce a robust definition of
privacy that considers average-case privacy and motivates the study of a
weakness in the structure of FS (i.e., fake transactions filtering). In order
to overcome this limitation, we introduce a hybrid scheme that considers both
privacy and resource guidelines. Experimental results show the efficiency of
our proposed scheme over the previously introduced one and open directions for
further development.
Comment: 15 pages, to appear in proceedings of WISA 200
Parallel and Distributed Collaborative Filtering: A Survey
Collaborative filtering is amongst the most preferred techniques when
implementing recommender systems. Recently, great interest has turned towards
parallel and distributed implementations of collaborative filtering algorithms.
This work is a survey of the parallel and distributed collaborative filtering
implementations, aiming not only to provide a comprehensive presentation of the
field's development, but also to offer future research orientation by
highlighting the issues that need to be further developed.
Comment: 46 pages
A Survey on the Security of Pervasive Online Social Networks (POSNs)
Pervasive Online Social Networks (POSNs) are extensions of Online Social
Networks (OSNs) which facilitate connectivity irrespective of the domain and
properties of users. POSNs have emerged from the convergence of a
plethora of social networking platforms, motivated by bridging the gaps
between them. Over the last decade, OSNs have seen
tremendous advancement in terms of the number of users as well as
technology enablers. A single OSN is the property of an organization, which
ensures the smooth functioning of its services to provide a quality
experience to its users. However, with POSNs, multiple OSNs have coalesced
through communities, circles, or only properties, which makes
service provisioning tedious and arduous to sustain. Challenges
become especially severe when the focus is on the security of cross-platform
OSNs, which are an integral part of POSNs. Thus, it is of paramount importance to
highlight this requirement and understand the current situation while
discussing the available state of the art. With the modernization of OSNs and
convergence towards POSNs, it is essential to understand the impact and reach
of current solutions for enhancing the security of users as well as associated
services. This survey addresses this need and focuses on different sets
of studies presented over the last few years, surveying them for their
applicability to POSNs...
Comment: 39 Pages, 10 Figures
A Survey of Parallel Sequential Pattern Mining
With the growing popularity of shared resources, large volumes of complex
data of different types are collected automatically. Traditional data mining
algorithms generally have problems and challenges including huge memory cost,
low processing speed, and inadequate hard disk space. As a fundamental task of
data mining, sequential pattern mining (SPM) is used in a wide variety of
real-life applications. However, it is more complex and challenging than other
pattern mining tasks, e.g., frequent itemset mining and association rule
mining, and also suffers from the above challenges when handling
large-scale data. To solve these problems, mining sequential patterns in a
parallel or distributed computing environment has emerged as an important issue
with many applications. In this paper, we provide an in-depth survey of the
current status of parallel sequential pattern mining (PSPM),
including a detailed categorization of traditional serial SPM approaches and
state-of-the-art parallel SPM. We review the related work on parallel
sequential pattern mining in detail, including partition-based algorithms for
PSPM, Apriori-based PSPM, pattern-growth-based PSPM, and hybrid algorithms for
PSPM, and provide an in-depth description (i.e., characteristics, advantages,
disadvantages and summarization) of these parallel approaches. Some
advanced topics for PSPM, including parallel quantitative / weighted / utility
sequential pattern mining, PSPM from uncertain data and stream data, and hardware
acceleration for PSPM, are further reviewed in detail. In addition, we review
some well-known open-source software for PSPM. Finally, we summarize
some challenges and opportunities of PSPM in the big data era.
Comment: Accepted by ACM Trans. on Knowl. Discov. Data, 33 pages
Federated Machine Learning: Concept and Applications
Today's AI still faces two major challenges. One is that in most industries,
data exists in the form of isolated islands. The other is the strengthening of
data privacy and security. We propose a possible solution to these challenges:
secure federated learning. Beyond the federated learning framework first
proposed by Google in 2016, we introduce a comprehensive secure federated
learning framework, which includes horizontal federated learning, vertical
federated learning and federated transfer learning. We provide definitions,
architectures and applications for the federated learning framework, and
provide a comprehensive survey of existing works on this subject. In addition,
we propose building data networks among organizations based on federated
mechanisms as an effective solution to allow knowledge to be shared without
compromising user privacy.
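The abstract describes horizontal federated learning only at a conceptual level. As a hedged illustration of the general idea (participants keep raw data local and share only model parameters), the sketch below performs one aggregation step in the style of federated averaging, a widely known technique that is swapped in here and is not taken from this paper. The function name `federated_average` and the flat-list parameter layout are assumptions made for the example, and no secure aggregation or encryption is shown.

```python
def federated_average(client_weights, client_sizes):
    """Average locally trained parameter vectors, weighted by local data size.

    client_weights: one list of floats per client (same length for all clients).
    client_sizes:   number of local training examples held by each client.
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one size per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    # Only parameters and example counts reach the aggregator, never raw data.
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical clients holding 1 and 3 local examples respectively.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

A secure variant would additionally hide each client's individual update from the aggregator, which this sketch does not attempt.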
Privacy Preserving Utility Mining: A Survey
In the big data era, the collected data usually contains rich information and
hidden knowledge. Utility-oriented pattern mining and analytics have shown a
powerful ability to explore these ubiquitous data, which may be collected from
various fields and applications, such as market basket analysis, retail,
click-stream analysis, medical analysis, and bioinformatics. However, analysis
of these data with sensitive private information raises privacy concerns. To
achieve a better trade-off between utility maximization and privacy preservation,
Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent
years. In this paper, we provide a comprehensive overview of PPUM. We first
present the background of utility mining, privacy-preserving data mining and
PPUM, then introduce the related preliminaries and problem formulation of PPUM,
as well as some key evaluation criteria for PPUM. In particular, we present and
discuss the current state-of-the-art PPUM algorithms, as well as their
advantages and deficiencies in detail. Finally, we highlight and discuss some
technical challenges and open directions for future research on PPUM.
Comment: 2018 IEEE International Conference on Big Data, 10 pages
A Survey on Geographically Distributed Big-Data Processing using MapReduce
Hadoop and Spark are widely used distributed processing frameworks for
large-scale data processing in an efficient and fault-tolerant manner on
private or public clouds. These big-data processing systems are extensively
used by many industries, e.g., Google, Facebook, and Amazon, for solving a
large class of problems, e.g., search, clustering, log analysis, different
types of join operations, matrix multiplication, pattern matching, and social
network analysis. However, all these popular systems have a major drawback:
they are designed for locally distributed computations, which prevents them from
implementing geographically distributed data processing. The increasing amount of
geographically distributed massive data is pushing industries and academia to
rethink the current big-data processing systems. Novel frameworks, which
go beyond the state-of-the-art architectures and technologies of
current systems, are expected to process geographically distributed data at
their locations without moving entire raw datasets to a single location. In
this paper, we investigate and discuss challenges and requirements in designing
geographically distributed data processing frameworks and protocols. We
classify and study batch processing (MapReduce-based systems), stream
processing (Spark-based systems), and SQL-style processing geo-distributed
frameworks, models, and algorithms with their overhead issues.
Comment: IEEE Transactions on Big Data; Accepted June 2017. 20 pages
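The idea of processing geographically distributed data at its location, without moving entire raw datasets to a single site, can be sketched with a toy word count. This is a minimal illustration under stated assumptions, not any framework surveyed in the paper: the site names and the helpers `local_count` and `global_merge` are hypothetical, and all "sites" run inside one process.

```python
from collections import Counter


def local_count(records):
    """Runs at one site: map and combine over that site's raw records."""
    counts = Counter()
    for line in records:
        counts.update(line.lower().split())
    return counts


def global_merge(partials):
    """Runs at the coordinator: reduce the compact per-site summaries."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    # Hypothetical sites; raw lines stay put, only the Counters are shipped.
    sites = {
        "eu": ["a b a"],
        "us": ["b c"],
    }
    merged = global_merge(local_count(recs) for recs in sites.values())
    print(merged)
```

Only the per-site counters cross site boundaries here, so the data transferred grows with the number of distinct keys rather than with the raw data volume.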
Algorithm and approaches to handle large Data- A Survey
A data mining environment produces a large amount of data that needs to be
analyzed, and patterns have to be extracted from it to gain knowledge. In this
new era, with a boom of both structured and unstructured data in fields such as
genomics, meteorology, biology, environmental research and many others, it has
become difficult to process, manage and analyze patterns using traditional
databases and architectures. So, a proper architecture should be understood to
gain knowledge about Big Data. This paper presents a review of various
algorithms from 1994-2013 necessary for handling such large data sets. These
algorithms define various structures and methods implemented to handle Big
Data. The paper also lists various tools that were developed for
analyzing them.
Comment: 5 pages
Privacy in Social Media: Identification, Mitigation and Applications
The increasing popularity of social media has attracted a huge number of
people to participate in numerous activities on a daily basis. This results in
tremendous amounts of rich user-generated data. This data provides
opportunities for researchers and service providers to study and better
understand users' behaviors and further improve the quality of the personalized
services. Publishing user-generated data risks exposing individuals' privacy.
User privacy in social media is an emerging topic and has attracted increasing
attention in recent years. Existing works study privacy issues in social media
from two different points of view: identification of vulnerabilities and
mitigation of privacy risks. Recent research has shown the vulnerability of
user-generated data to two general types of attack: identity
disclosure and attribute disclosure. These privacy issues require social media
data publishers to protect users' privacy by sanitizing user-generated data
before publishing it. Consequently, various protection techniques have been
proposed to anonymize user-generated social media data. There is a vast
literature on privacy of users in social media from many perspectives. In this
survey, we review the key achievements of user privacy in social media. In
particular, we review and compare the state-of-the-art algorithms in terms of
the privacy leakage attacks and anonymization algorithms. We overview the
privacy risks from different aspects of social media and categorize the
relevant works into five groups: 1) graph data anonymization and
de-anonymization, 2) author identification, 3) profile attribute disclosure, 4)
user location and privacy, and 5) recommender systems and privacy issues. We
also discuss open problems and future research directions for user privacy
issues in social media.
Comment: This survey is currently under review