Search CORE

117 research outputs found

Federated Analytics-Empowered Frequent Pattern Mining for Decentralized Web 3.0 Applications

Author: Han Zhu
Wang Dan
Wang Zibo
Zhu Yifei
Publication venue
Publication date: 15/02/2024
Field of study

The emerging Web 3.0 paradigm aims to decentralize existing web services, enabling desirable properties such as transparency, incentives, and privacy preservation. However, current Web 3.0 applications supported by blockchain infrastructure still cannot support complex data analytics tasks in a scalable and privacy-preserving way. This paper introduces the emerging federated analytics (FA) paradigm into the realm of Web 3.0 services, enabling data to stay local while still contributing to complex web analytics tasks in a privacy-preserving way. We propose FedWeb, a tailored FA design for important frequent pattern mining tasks in Web 3.0. FedWeb remarkably reduces the number of required participating data owners to support privacy-preserving Web 3.0 data analytics based on a novel distributed differential privacy technique. The correctness of mining results is guaranteed by a theoretically rigid candidate filtering scheme based on Hoeffding's inequality and Chebychev's inequality. Two response budget saving solutions are proposed to further reduce participating data owners. Experiments on three representative Web 3.0 scenarios show that FedWeb can improve data utility by ~25.3% and reduce the participating data owners by ~98.4%.Comment: Accepted by IEEE International Conference on Computer Communications (INFOCOM'24

arXiv.org e-Print Archive

Exploring the Existing and Unknown Side Effects of Privacy Preserving Data Mining Algorithms

Author: Sadashiva Reddy Hima Bindu
Publication venue: NSUWorks
Publication date: 01/01/2022
Field of study

The data mining sanitization process involves converting the data by masking the sensitive data and then releasing it to public domain. During the sanitization process, side effects such as hiding failure, missing cost and artificial cost of the data were observed. Privacy Preserving Data Mining (PPDM) algorithms were developed for the sanitization process to overcome information loss and yet maintain data integrity. While these PPDM algorithms did provide benefits for privacy preservation, they also made sure to solve the side effects that occurred during the sanitization process. Many PPDM algorithms were developed to reduce these side effects. There are several PPDM algorithms created based on different PPDM techniques. However, previous studies have not explored or justified why non-traditional side effects were not given much importance. This study reported the findings of the side effects for the PPDM algorithms in a newly created web repository. The research methodology adopted for this study was Design Science Research (DSR). This research was conducted in four phases, which were as follows. The first phase addressed the characteristics, similarities, differences, and relationships of existing side effects. The next phase found the characteristics of non-traditional side effects. The third phase used the Privacy Preservation and Security Framework (PPSF) tool to test if non-traditional side effects occur in PPDM algorithms. This phase also attempted to find additional unknown side effects which have not been found in prior studies. PPDM algorithms considered were Greedy, POS2DT, SIF_IDF, cpGA2DT, pGA2DT, sGA2DT. PPDM techniques associated were anonymization, perturbation, randomization, condensation, heuristic, reconstruction, and cryptography. The final phase involved creating a new online web repository to report all the side effects found for the PPDM algorithms. A Web repository was created using full stack web development. AngularJS, Spring, Spring Boot and Hibernate frameworks were used to build the web application. The results of the study implied various PPDM algorithms and their side effects. Additionally, the relationship and impact that hiding failure, missing cost, and artificial cost have on each other was also understood. Interestingly, the side effects and their relationship with the type of data (sensitive or non-sensitive or new) was observed. As the web repository acts as a quick reference domain for PPDM algorithms. Developing, improving, inventing, and reporting PPDM algorithms is necessary. This study will influence researchers or organizations to report, use, reuse, or develop better PPDM algorithms

NSU Works

Secure information sharing on Decentralized Social Networks.

Author: Albertini Davide Alberto
Publication venue: country:Italy
Publication date: 01/01/2017
Field of study

Decentralized Social Networks (DSNs) are web-based platforms built on distributed systems (federations) composed of multiple providers (pods) that run the same social networking service. DSNs have been presented as a valid alternative to Online Social Networks (OSNs), replacing the centralized paradigm of OSNs with a decentralized distribution of the features o\u21b5ered by the social networking platform. Similarly to commercial OSNs, DSNs o\u21b5er to their subscribed users a number of distinctive features, such as the possibility to share resources with other subscribed users or the possibility to establish virtual relationships with other DSN users. On the other hand, each DSN user takes part in the service, choosing to store personal data on his/her own trusted provider inside the federation or to deploy his/her own provider on a private machine. This, thus, gives each DSN user direct control of his/hers data and prevents the social network provider from performing data mining analysis over these information. Unfortunately, the deployment of a personal DSN pod is not as simple as it sounds. Indeed, each pod\u2019s owner has to maintain the security, integrity, and reliability of all the data stored in that provider. Furthermore, given the amount of data produced each day in a social network service, it is reasonable to assume that the majority of users cannot a\u21b5ord the upkeep of an hardware capable of handling such amount of information. As a result, it has been shown that most of DSN users prefer to subscribe to an existing provider despite setting up a new one, bringing to an indirect centralization of data that leads DSNs to su\u21b5er of the same issues as centralized social network services. In order to overcome this issue in this thesis we have investigated the possibility for DSN providers to lean on modern cloud-based storage services so as to o\u21b5er a cloudbased information sharing service. This has required to deal with many challenges. As such, we have investigated the definition of cryptographic protocols enabling DSN users to securely store their resources in the public cloud, along with the definition of communication protocols ensuring that decryption keys are distributed only to authorized users, that is users that satisfy at least one of the access control policies specified by data owner according to Relationship-based access control model (RelBAC) [20, 34]. In addition, it has emerged that even DSN users have the same difficulties as OSN users in defining RelBAC rules that properly express their attitude towards their own privacy. Indeed, it is nowadays well accepted that the definition of access control policies is an error-prone task. Then, since misconfigured RelBAC policies may lead to harmful data release and may expose the privacy of others as well, we believe that DSN users should be assisted in the RelBAC policy definition process. At this purpose, we have designed a RelBAC policy recommendation system such that it can learn from DSN users their own attitude towards privacy, and exploits all the learned data to assist DSN users in the definition of RelBAC policies by suggesting customized privacy rules. Nevertheless, despite the presence of the above mentioned policy recommender, it is reasonable to assume that misconfigured RelBAC rules may appear in the system. However, rather than considering all misconfigured policies as leading to potentially harmful situations, we have considered that they might even lead to an exacerbated data restriction that brings to a loss of utility to DSN users. As an example, assuming that a low resolution and an high resolution version of the same picture are uploaded in the network, we believe that the low-res version should be granted to all those users who are granted to access the hi-res version, even though, due to a misconfiurated system, no policy explicitly authorizes them on the low-res picture. As such, we have designed a technique capable of exploiting all the existing data dependencies (i.e., any correlation between data) as a mean for increasing the system utility, that is, the number of queries that can be safely answered. Then, we have defined a query rewriting technique capable of extending defined access control policy authorizations by exploiting data dependencies, in order to authorize unauthorized but inferable data. In this thesis we present a complete description of the above mentioned proposals, along with the experimental results of the tests that have been carried out so as to verify the feasibility of the presented techniques

InsubriaSPACE - Thesis PhD Repository

Archivio istituzionale della ricerca - Università dell'Insubria

Secure information sharing on Decentralized Social Networks.

Author: Albertini Davide Alberto
Publication venue: Italy
Publication date
Field of study

Decentralized Social Networks (DSNs) are web-based platforms built on distributed systems (federations) composed of multiple providers (pods) that run the same social networking service. DSNs have been presented as a valid alternative to Online Social Networks (OSNs), replacing the centralized paradigm of OSNs with a decentralized distribution of the features o↵ered by the social networking platform. Similarly to commercial OSNs, DSNs o↵er to their subscribed users a number of distinctive features, such as the possibility to share resources with other subscribed users or the possibility to establish virtual relationships with other DSN users. On the other hand, each DSN user takes part in the service, choosing to store personal data on his/her own trusted provider inside the federation or to deploy his/her own provider on a private machine. This, thus, gives each DSN user direct control of his/hers data and prevents the social network provider from performing data mining analysis over these information. Unfortunately, the deployment of a personal DSN pod is not as simple as it sounds. Indeed, each pod’s owner has to maintain the security, integrity, and reliability of all the data stored in that provider. Furthermore, given the amount of data produced each day in a social network service, it is reasonable to assume that the majority of users cannot a↵ord the upkeep of an hardware capable of handling such amount of information. As a result, it has been shown that most of DSN users prefer to subscribe to an existing provider despite setting up a new one, bringing to an indirect centralization of data that leads DSNs to su↵er of the same issues as centralized social network services. In order to overcome this issue in this thesis we have investigated the possibility for DSN providers to lean on modern cloud-based storage services so as to o↵er a cloudbased information sharing service. This has required to deal with many challenges. As such, we have investigated the definition of cryptographic protocols enabling DSN users to securely store their resources in the public cloud, along with the definition of communication protocols ensuring that decryption keys are distributed only to authorized users, that is users that satisfy at least one of the access control policies specified by data owner according to Relationship-based access control model (RelBAC) [20, 34]. In addition, it has emerged that even DSN users have the same difficulties as OSN users in defining RelBAC rules that properly express their attitude towards their own privacy. Indeed, it is nowadays well accepted that the definition of access control policies is an error-prone task. Then, since misconfigured RelBAC policies may lead to harmful data release and may expose the privacy of others as well, we believe that DSN users should be assisted in the RelBAC policy definition process. At this purpose, we have designed a RelBAC policy recommendation system such that it can learn from DSN users their own attitude towards privacy, and exploits all the learned data to assist DSN users in the definition of RelBAC policies by suggesting customized privacy rules. Nevertheless, despite the presence of the above mentioned policy recommender, it is reasonable to assume that misconfigured RelBAC rules may appear in the system. However, rather than considering all misconfigured policies as leading to potentially harmful situations, we have considered that they might even lead to an exacerbated data restriction that brings to a loss of utility to DSN users. As an example, assuming that a low resolution and an high resolution version of the same picture are uploaded in the network, we believe that the low-res version should be granted to all those users who are granted to access the hi-res version, even though, due to a misconfiurated system, no policy explicitly authorizes them on the low-res picture. As such, we have designed a technique capable of exploiting all the existing data dependencies (i.e., any correlation between data) as a mean for increasing the system utility, that is, the number of queries that can be safely answered. Then, we have defined a query rewriting technique capable of extending defined access control policy authorizations by exploiting data dependencies, in order to authorize unauthorized but inferable data. In this thesis we present a complete description of the above mentioned proposals, along with the experimental results of the tests that have been carried out so as to verify the feasibility of the presented techniques

InsubriaSPACE - Thesis PhD Repository

Distributed data mining and agents

Author: Chris Giannella
Clifton
Clifton
Evfimievski
Farkas
Forman
Han
Hand
Hastie
Hillol Kargupta
Jajodia
Johnson
Josenildo C. da Silva
Kargupta
Kargupta
Kargupta
Klusch
Matthias Klusch
Oliveira
Park
Pinkas
Provost
Ruchita Bhargava
Samatova
Saygin
Strehl
Thuraisingham
Vaidya
Verykios
Witten
Zaki
Zaki
Publication venue: 'Elsevier BV'
Publication date
Field of study

Crossref

On Privacy-Enhanced Distributed Analytics in Online Social Networks

Author: Wainakh Aidmar
Publication venue
Publication date: 01/01/2022
Field of study

More than half of the world's population benefits from online social network (OSN) services. A considerable part of these services is mainly based on applying analytics on user data to infer their preferences and enrich their experience accordingly. At the same time, user data is monetized by service providers to run their business models. Therefore, providers tend to extensively collect (personal) data about users. However, this data is oftentimes used for various purposes without informed consent of the users. Providers share this data in different forms with third parties (e.g., data brokers). Moreover, user sensitive data was repeatedly a subject of unauthorized access by malicious parties. These issues have demonstrated the insufficient commitment of providers to user privacy, and consequently, raised users' concerns. Despite the emergence of privacy regulations (e.g., GDPR and CCPA), recent studies showed that user personal data collection and sharing sensitive data are still continuously increasing. A number of privacy-friendly OSNs have been proposed to enhance user privacy by reducing the need for central service providers. However, this improvement in privacy protection usually comes at the cost of losing social connectivity and many analytics-based services of the wide-spread OSNs. This dissertation addresses this issue by first proposing an approach to privacy-friendly OSNs that maintains established social connections. Second, approaches that allow users to collaboratively apply distributed analytics while preserving their privacy are presented. Finally, the dissertation contributes to better assessment and mitigation of the risks associated with distributed analytics. These three research directions are treated through the following six contributions. Conceptualizing Hybrid Online Social Networks: We conceptualize a hybrid approach to privacy-friendly OSNs, HOSN. This approach combines the benefits of using COSNs and DOSN. Users can maintain their social experience in their preferred COSN while being provided with additional means to enhance their privacy. Users can seamlessly post public content or private content that is accessible only by authorized users (friends) beyond the reach of the service providers. Improving the Trustworthiness of HOSNs: We conceptualize software features to address users' privacy concerns in OSNs. We prototype these features in our HOSN}approach and evaluate their impact on the privacy concerns and the trustworthiness of the approach. Also, we analyze the relationships between four important aspects that influence users' behavior in OSNs: privacy concerns, trust beliefs, risk beliefs, and the willingness to use. Privacy-Enhanced Association Rule Mining: We present an approach to enable users to apply efficiently privacy-enhanced association rule mining on distributed data. This approach can be employed in DOSN and HOSN to generate recommendations. We leverage a privacy-enhanced distributed graph sampling method to reduce the data required for the mining and lower the communication and computational overhead. Then, we apply a distributed frequent itemset mining algorithm in a privacy-friendly manner. Privacy Enhancements on Federated Learning (FL): We identify several privacy-related issues in the emerging distributed machine learning technique, FL. These issues are mainly due to the centralized nature of this technique. We discuss tackling these issues by applying FL in a hierarchical architecture. The benefits of this approach include a reduction in the centralization of control and the ability to place defense and verification methods more flexibly and efficiently within the hierarchy. Systematic Analysis of Threats in Federated Learning: We conduct a critical study of the existing attacks in FL to better understand the actual risk of these attacks under real-world scenarios. First, we structure the literature in this field and show the research foci and gaps. Then, we highlight a number of issues in (1) the assumptions commonly made by researchers and (2) the evaluation practices. Finally, we discuss the implications of these issues on the applicability of the proposed attacks and recommend several remedies. Label Leakage from Gradients: We identify a risk of information leakage when sharing gradients in FL. We demonstrate the severity of this risk by proposing a novel attack that extracts the user annotations that describe the data (i.e., ground-truth labels) from gradients. We show the high effectiveness of the attack under different settings such as different datasets and model architectures. We also test several defense mechanisms to mitigate this attack and conclude the effective ones

TUbiblio

tuprints

Artificial intelligence of medical things for disease detection using ensemble deep learning and attention mechanism

Author: Belhadi Asma
Djenouri Youcef
Lin Jerry Chun-Wei
Srivastava Gautam
Yazidi Anis
Publication venue: 'Wiley'
Publication date: 01/01/2022
Field of study

In this paper, we present a novel paradigm for disease detection. We build an artificial intelligence based system where various biomedical data are retrieved from distributed and homogeneous sensors. We use different deep learning architectures (VGG16, RESNET, and DenseNet) with ensemble learning and attention mechanisms to study the interactions between different biomedical data to detect and diagnose diseases. We conduct extensive testing on biomedical data. The results show the benefits of using deep learning technologies in the field of artificial intelligence of medical things to diagnose diseases in the healthcare decision-making process. For example, the disease detection rate using the proposed methodology achieves 92%, which is greatly improved compared to the higher-level disease detection models.publishedVersio

SINTEF Open