8,833 research outputs found

CloudMine: Multi-Party Privacy-Preserving Data Analytics Service

Authors: Anh Dinh Tien Tuan; Datta Anwitaman; Thanh Quach Vinh
Publication date: 01/10/2013

An increasing number of businesses are replacing their data storage and computation infrastructure with cloud services. Likewise, there is an increased emphasis on performing analytics based on multiple datasets obtained from different data sources. While ensuring security of data and computation outsourced to a third party cloud is in itself challenging, supporting analytics using data distributed across multiple, independent clouds is even further from trivial. In this paper we present CloudMine, a cloud-based service which allows multiple data owners to perform privacy-preserved computation over the joint data using their clouds as delegates. CloudMine protects data privacy with respect to semi-honest data owners and semi-honest clouds. It furthermore ensures the privacy of the computation outputs from the curious clouds. It allows data owners to reliably detect if their cloud delegates have been lazy when carrying out the delegated computation. CloudMine can run as a centralized service on a single cloud, or as a distributed service over multiple, independent clouds. CloudMine supports a set of basic computations that can be used to construct a variety of highly complex, distributed privacy-preserving data analytics. We demonstrate how a simple instance of CloudMine (secure sum service) is used to implement three classical data mining tasks (classification, association rule mining and clustering) in a cloud environment. We experiment with a prototype of the service, the results of which suggest its practicality for supporting privacy-preserving data analytics as a (multi) cloud-based service.

arXiv.org e-Print Archive

Privacy Preserving Association Rule Mining Revisited

Authors: Hong Dowon; Mohaisen Abedelaziz
Publication date: 23/08/2008

Privacy preserving data mining (PPDM) has been one of the most interesting, yet challenging, research issues. In PPDM, we seek to outsource our data for data mining tasks to a third party while maintaining its privacy. In this paper, we revise one of the recent PPDM schemes (i.e., FS) which is designed for privacy preserving association rule mining (PP-ARM). Our analysis shows some limitations of the FS scheme in terms of its storage requirements guaranteeing a reasonable privacy standard and the high computation as well. On the other hand, we introduce a robust definition of privacy that considers the average case privacy and motivates the study of a weakness in the structure of FS (i.e., fake transactions filtering). In order to overcome this limit, we introduce a hybrid scheme that considers both privacy and resources guidelines. Experimental results show the efficiency of our proposed scheme over the previously introduced one and open directions for further development. Comment: 15 pages, to appear in proceeding of WISA 200

arXiv.org e-Print Archive

Parallel and Distributed Collaborative Filtering: A Survey

Authors: Karydi Efthalia; Margaritis Konstantinos G.
Publication date: 09/09/2014

Collaborative filtering is amongst the most preferred techniques when implementing recommender systems. Recently, great interest has turned towards parallel and distributed implementations of collaborative filtering algorithms. This work is a survey of the parallel and distributed collaborative filtering implementations, aiming not only to provide a comprehensive presentation of the field's development, but also to offer future research orientation by highlighting the issues that need to be further developed. Comment: 46 pages

arXiv.org e-Print Archive

A Survey on the Security of Pervasive Online Social Networks (POSNs)

Authors: Choudhary Gaurav; Gupta Takshi; Sharma Vishal
Publication date: 19/06/2018

Pervasive Online Social Networks (POSNs) are the extensions of Online Social Networks (OSNs) which facilitate connectivity irrespective of the domain and properties of users. POSNs have been accumulated with the convergence of a plethora of social networking platforms with a motivation of bridging their gap. Over the last decade, OSNs have visually perceived an altogether tremendous amount of advancement in terms of the number of users as well as technology enablers. A single OSN is the property of an organization, which ascertains smooth functioning of its accommodations for providing a quality experience to their users. However, with POSNs, multiple OSNs have coalesced through communities, circles, or only properties, which make service-provisioning tedious and arduous to sustain. Especially, challenges become rigorous when the focus is on the security perspective of cross-platform OSNs, which are an integral part of POSNs. Thus, it is of utmost paramountcy to highlight such a requirement and understand the current situation while discussing the available state-of-the-art. With the modernization of OSNs and convergence towards POSNs, it is compulsory to understand the impact and reach of current solutions for enhancing the security of users as well as associated services. This survey understands this requisite and fixates on different sets of studies presented over the last few years and surveys them for their applicability to POSNs... Comment: 39 Pages, 10 Figures

arXiv.org e-Print Archive

A Survey of Parallel Sequential Pattern Mining

Authors: Chao Han-Chieh; Fournier-Viger Philippe; Gan Wensheng; Lin Jerry Chun-Wei; Yu Philip S.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 04/04/2019

With the growing popularity of shared resources, large volumes of complex data of different types are collected automatically. Traditional data mining algorithms generally have problems and challenges including huge memory cost, low processing speed, and inadequate hard disk space. As a fundamental task of data mining, sequential pattern mining (SPM) is used in a wide variety of real-life applications. However, it is more complex and challenging than other pattern mining tasks, i.e., frequent itemset mining and association rule mining, and also suffers from the above challenges when handling large-scale data. To solve these problems, mining sequential patterns in a parallel or distributed computing environment has emerged as an important issue with many applications. In this paper, an in-depth survey of the current status of parallel sequential pattern mining (PSPM) is provided, including a detailed categorization of traditional serial SPM approaches and state-of-the-art parallel SPM. We review the related work of parallel sequential pattern mining in detail, including partition-based algorithms for PSPM, Apriori-based PSPM, pattern growth based PSPM, and hybrid algorithms for PSPM, and provide a deep description (i.e., characteristics, advantages, disadvantages and summarization) of these parallel approaches of PSPM. Some advanced topics for PSPM, including parallel quantitative / weighted / utility sequential pattern mining, PSPM from uncertain data and stream data, and hardware acceleration for PSPM, are further reviewed in detail. Besides, we review and provide some well-known open-source software of PSPM. Finally, we summarize some challenges and opportunities of PSPM in the big data era. Comment: Accepted by ACM Trans. on Knowl. Discov. Data, 33 pages

arXiv.org e-Print Archive

Federated Machine Learning: Concept and Applications

Authors: Chen Tianjian; Liu Yang; Tong Yongxin; Yang Qiang
Publication date: 13/02/2019

Today's AI still faces two major challenges. One is that in most industries, data exists in the form of isolated islands. The other is the strengthening of data privacy and security. We propose a possible solution to these challenges: secure federated learning. Beyond the federated learning framework first proposed by Google in 2016, we introduce a comprehensive secure federated learning framework, which includes horizontal federated learning, vertical federated learning and federated transfer learning. We provide definitions, architectures and applications for the federated learning framework, and provide a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations based on federated mechanisms as an effective solution to allow knowledge to be shared without compromising user privacy.

arXiv.org e-Print Archive

Privacy Preserving Utility Mining: A Survey

Authors: Chao Han-Chieh; Gan Wensheng; Lin Jerry Chun-Wei; Wang Shyue-Liang; Yu Philip S.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 18/11/2018

In the big data era, the collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analysis of these data with sensitive private information raises privacy concerns. To achieve a better trade-off between utility maximizing and privacy preserving, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM. Comment: 2018 IEEE International Conference on Big Data, 10 pages

arXiv.org e-Print Archive

A Survey on Geographically Distributed Big-Data Processing using MapReduce

Authors: Dolev Shlomi; Florissi Patricia; Gudes Ehud; Sharma Shantanu; Singer Ido
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 06/07/2017

Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many industries, e.g., Google, Facebook, and Amazon, for solving a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern matching, and social network analysis. However, all these popular systems have a major drawback in terms of locally distributed computations, which prevents them from implementing geographically distributed data processing. The increasing amount of geographically distributed massive data is pushing industries and academia to rethink the current big-data processing systems. The novel frameworks, which will be beyond state-of-the-art architectures and technologies involved in the current system, are expected to process geographically distributed data at their locations without moving entire raw datasets to a single location. In this paper, we investigate and discuss challenges and requirements in designing geographically distributed data processing frameworks and protocols. We classify and study batch processing (MapReduce-based systems), stream processing (Spark-based systems), and SQL-style processing geo-distributed frameworks, models, and algorithms with their overhead issues. Comment: IEEE Transactions on Big Data; Accepted June 2017. 20 pages

arXiv.org e-Print Archive

Algorithm and approaches to handle large Data- A Survey

Authors: Kumar Manoj; Wang Shuliang; Yadav Chanchal
Publication date: 20/07/2013

Data mining environments produce a large amount of data that needs to be analyzed, and patterns have to be extracted from it to gain knowledge. In this new era, with the boom of both structured and unstructured data in the fields of genomics, meteorology, biology, environmental research and many others, it has become difficult to process, manage and analyze patterns using traditional databases and architectures. So, a proper architecture should be understood to gain knowledge about Big Data. This paper presents a review of various algorithms from 1994-2013 necessary for handling such large data sets. These algorithms define various structures and methods implemented to handle Big Data; the paper also lists various tools that were developed for analyzing them. Comment: 5 pages

arXiv.org e-Print Archive

Privacy in Social Media: Identification, Mitigation and Applications

Authors: Beigi Ghazaleh; Liu Huan
Publication date: 06/08/2018

The increasing popularity of social media has attracted a huge number of people to participate in numerous activities on a daily basis. This results in tremendous amounts of rich user-generated data. This data provides opportunities for researchers and service providers to study and better understand users' behaviors and further improve the quality of the personalized services. Publishing user-generated data risks exposing individuals' privacy. Users' privacy in social media is an emerging task and has attracted increasing attention in recent years. These works study privacy issues in social media from two different points of view: identification of vulnerabilities, and mitigation of privacy risks. Recent research has shown the vulnerability of user-generated data against the two general types of attacks, identity disclosure and attribute disclosure. These privacy issues mandate social media data publishers to protect users' privacy by sanitizing user-generated data before publishing it. Consequently, various protection techniques have been proposed to anonymize user-generated social media data. There is a vast literature on privacy of users in social media from many perspectives. In this survey, we review the key achievements of user privacy in social media. In particular, we review and compare the state-of-the-art algorithms in terms of the privacy leakage attacks and anonymization algorithms. We overview the privacy risks from different aspects of social media and categorize the relevant works into five groups: 1) graph data anonymization and de-anonymization, 2) author identification, 3) profile attribute disclosure, 4) user location and privacy, and 5) recommender systems and privacy issues. We also discuss open problems and future research directions for user privacy issues in social media. Comment: This survey is currently under review

arXiv.org e-Print Archive