
    Privacy preserving recommender systems

    Recommender systems help users find suitable and interesting products and content among the vast amount of information available on the Internet. Various types of recommender systems provide recommendation services to users, for example collaborative filtering (CF), content-based (CB), and context-aware recommendation. Although these systems are very useful for addressing the information overload problem by filtering interesting information, they raise serious privacy issues. To generate personalized recommendations, service providers need to acquire information about users' attributes, preferences, experiences, and demands, all of which is confidential. Usually, the more information is available to service providers, the more accurate the recommendations they can generate. However, service providers are not always trustworthy recipients of personal information, since they may threaten users' privacy by leaking it to other parties or by providing false recommendations. Therefore, user information must be protected before it is shared with any third-party service provider. Several techniques have been proposed to overcome the privacy issues of recommender systems; they can be categorized into decentralization-based, randomization-based, and secure-computation-based approaches. In the decentralization-based approach, central service providers are removed and control of the recommendation service is given to the participating users. The main issue with this approach is that, to generate recommendations, users depend on other users being online. If a user goes offline, her information cannot be used by the system.
Randomization-based techniques add noise to users' data so that others cannot learn the true information. However, adding noise degrades recommendation accuracy. In contrast, secure computation protects user information while still providing accurate recommendations. In this thesis we preserve user privacy by encrypting user information, specifically ratings and other related information, using homomorphic encryption, and we generate recommendations from the encrypted data. The main advantage of homomorphic-encryption-based techniques is that they are semantically secure: given a ciphertext, it is computationally hard to learn anything about the underlying plaintext. Using homomorphic encryption tools and techniques, we build privacy-preserving protocols for different types of recommendation approaches by analyzing their respective privacy requirements and challenges. More specifically, we focus on key recommendation techniques and divide them into centralized and partitioned-dataset-based techniques. Among the available techniques, we found that several existing and popular ones, such as user-based, item-based, and context-aware recommendation, can be grouped into the centralized approach. In partitioned-dataset-based recommendation, user information is partitioned across different organizations, which collaborate by pooling sufficient information to provide accurate recommendations without revealing their own confidential information. After categorizing the recommendation techniques, we analyze their privacy problems and requirements. Then, for each type of recommendation approach, we develop privacy-preserving protocols that take its specific privacy requirements and challenges into consideration.
We also investigate the problems and limitations of existing privacy-preserving recommendation schemes and find that current solutions suffer from high computation and communication overhead as well as inadequate protection of user privacy. In this thesis we identify these problems and address them with our proposed privacy-preserving protocols. In outline, the proposed protocols work as follows. Users encrypt their ratings using homomorphic encryption and send them to the service providers. We assume the service providers are semi-honest (honest but curious): they follow the protocol but at the same time try to infer new information from the available data. The service provider can perform homomorphic operations; it carries out certain computations over the encrypted data without learning any true information and returns the results to the querying users who ask for recommendations. The system models of our protocols differ across recommendation techniques because of their different privacy requirements. The proposed protocols are tested on various real-world datasets. Because the recommendation approaches target different application areas, the datasets also differ, covering movie ratings, social networks, check-in information for different locations, and quality of service of web services. For each proposed protocol we also present a privacy analysis describing how the system performs the computations without leaking users' private information. The experimental and privacy analyses show that the proposed protocols for the different recommendation techniques are both private and practical.
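The aggregation step described above can be illustrated with a toy Paillier cryptosystem, one well-known additively homomorphic scheme; the abstract does not say which homomorphic scheme the thesis uses. This sketch assumes the querying user generates the key pair, the other users encrypt their ratings under the public key, and the semi-honest server multiplies ciphertexts to obtain an encryption of the rating sum without seeing any individual rating. The names `ToyPaillier` and `aggregate`, and the tiny primes, are hypothetical and offer no real security.

```python
import math
import secrets


class ToyPaillier:
    """Textbook Paillier with g = n + 1. Tiny primes: illustrative only, not secure."""

    def __init__(self, p, q):
        self.n = p * q
        self.n2 = self.n * self.n
        self.g = self.n + 1
        # lambda = lcm(p - 1, q - 1); with g = n + 1, mu is simply lambda^-1 mod n.
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)

    def encrypt(self, m):
        # Fresh randomness r in Z_n^* makes encryption probabilistic.
        while True:
            r = secrets.randbelow(self.n - 1) + 1
            if math.gcd(r, self.n) == 1:
                break
        return pow(self.g, m, self.n2) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c):
        # L(x) = (x - 1) / n, then multiply by mu modulo n.
        x = pow(c, self.lam, self.n2)
        return (x - 1) // self.n * self.mu % self.n


def aggregate(ciphertexts, n2):
    """Server side (hypothetical helper): multiplying ciphertexts adds the plaintexts."""
    total = 1
    for c in ciphertexts:
        total = total * c % n2
    return total


# The querying user creates the keys; other users encrypt their ratings for one item.
keys = ToyPaillier(1009, 1013)
ratings = [4, 5, 3, 2]
encrypted = [keys.encrypt(r) for r in ratings]

# The server works only with ciphertexts and the public modulus n^2.
encrypted_sum = aggregate(encrypted, keys.n2)

# Only the key holder can decrypt the sum and derive the average rating.
rating_sum = keys.decrypt(encrypted_sum)
average = rating_sum / len(ratings)
```

Because decryption happens only at the querying user, the server in this sketch learns neither the individual ratings nor their sum, which matches the semi-honest model described above.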

    Privacy preserving association rule mining using attribute-identity mapping

    Association rule mining uncovers hidden yet important patterns in data. Discovering these patterns helps data owners make the right decisions to enhance efficiency, increase profit, and reduce loss. However, privacy is a concern, especially when the data owner is not the miner or when many parties are involved. This research studied privacy-preserving association rule mining (PPARM) of horizontally partitioned and outsourced data. Existing research in this area has concentrated mainly on privacy and paid very little attention to data quality. Yet the higher the data quality, the more accurate and reliable the association rules will be. Consequently, this research proposed Attribute-Identity Mapping (AIM) as a PPARM technique to address the data quality issue. Given a dataset, AIM identifies the set of attributes and the values of each attribute. It then assigns a ‘unique’ identity to each attribute and each of its values, and generates a sanitized dataset by replacing each attribute and its values with their corresponding identities. For privacy preservation, the sanitization is carried out by the data owners, who then send the sanitized data, made up only of identities, to the data miner. When any or all of the data owners need ARM results from the aggregate data, they send a query to the data miner. The query consists of the attributes (in the form of identities), the minSup and minConf thresholds, and the number of rules they want. Results obtained on the Census Income dataset show that the PPARM technique maintains 100% data quality without compromising privacy.
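A minimal sketch of the mapping and sanitization steps follows, together with the support and confidence computations a miner would run on the sanitized data. It assumes that all data owners share one identity map, that identities are sequential integers, and that a sanitized record is a set of (attribute identity, value identity) pairs; the records and function names are hypothetical. Because the mapping is one-to-one, support and confidence computed over identities equal those computed over the raw values, which is the intuition behind the data-quality claim.

```python
from itertools import count


def build_identity_map(records):
    """Assign a unique integer identity to every attribute and every (attribute, value)."""
    next_id = count(1)
    attr_ids, value_ids = {}, {}
    for record in records:
        for attr, value in record.items():
            if attr not in attr_ids:
                attr_ids[attr] = next(next_id)
            if (attr, value) not in value_ids:
                value_ids[(attr, value)] = next(next_id)
    return attr_ids, value_ids


def sanitize(records, attr_ids, value_ids):
    """Data-owner side: replace each attribute and its value with their identities."""
    return [
        frozenset((attr_ids[a], value_ids[(a, v)]) for a, v in record.items())
        for record in records
    ]


def support(transactions, itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """conf(A => C) = supp(A u C) / supp(A); assumes supp(A) > 0."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


# Hypothetical census-style records held by the data owners.
raw = [
    {"education": "Bachelors", "income": ">50K"},
    {"education": "Bachelors", "income": ">50K"},
    {"education": "HS-grad", "income": "<=50K"},
    {"education": "Bachelors", "income": "<=50K"},
]
attr_ids, value_ids = build_identity_map(raw)
sanitized = sanitize(raw, attr_ids, value_ids)

# The owner expresses the rule education=Bachelors => income=>50K in identities;
# the miner evaluates it on the sanitized data without seeing the raw values.
a = frozenset({(attr_ids["education"], value_ids[("education", "Bachelors")])})
c = frozenset({(attr_ids["income"], value_ids[("income", ">50K")])})
rule_support = support(sanitized, a | c)
rule_confidence = confidence(sanitized, a, c)
```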

    Functional encryption based approaches for practical privacy-preserving machine learning

    Machine learning (ML) is increasingly used in a wide variety of application domains. However, deploying ML solutions poses a significant challenge because of growing privacy concerns and the requirements imposed by privacy-related regulations. To tackle these concerns, significant recent research has focused on developing privacy-preserving ML (PPML) approaches that integrate into the ML pipeline existing anonymization mechanisms or emerging privacy protection approaches such as differential privacy, secure computation, and other architectural frameworks. While promising, existing secure-computation-based approaches have significant computational efficiency issues and hence are not practical. In this dissertation, we address several challenges related to PPML and propose practical secure-computation-based approaches to solve them. We consider both two-tier cloud-based and three-tier hybrid cloud-edge PPML architectures and address both emerging deep learning models and federated learning approaches. The proposed approaches make it possible to outsource data, or to update a locally trained model, in a privacy-preserving manner by computing over encrypted datasets or local models. Our secure computation solutions are based on functional encryption (FE) techniques. Evaluation shows that the proposed approaches are efficient, more practical than existing approaches, and provide strong privacy guarantees. We also address the trustworthiness of various entities within the proposed PPML infrastructures, including cloud service providers and a third-party authority (TPA), which plays a critical role in the proposed FE-based PPML solutions. To ensure that such entities can be trusted, we propose a transparency and accountability framework using blockchain. We show that the proposed transparency framework is effective and guarantees security properties.
Experimental evaluation shows that the proposed framework is efficient.

    Vertical Federated Learning

    Vertical Federated Learning (VFL) is a federated learning setting in which multiple parties holding different features about the same set of users jointly train machine learning models without exposing their raw data or model parameters. Motivated by the rapid growth of VFL research and real-world applications, we provide a comprehensive review of the concept and algorithms of VFL, as well as current advances and challenges in various aspects, including effectiveness, efficiency, and privacy. We provide an exhaustive categorization of VFL settings and privacy-preserving protocols and comprehensively analyze the privacy attacks and defense strategies for each protocol. We then propose a unified framework, termed VFLow, which considers the VFL problem under communication, computation, privacy, and effectiveness constraints. Finally, we review the most recent advances in industrial applications, highlighting open challenges and future directions for VFL.
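To make the vertical setting concrete, the sketch below trains a linear regression model on vertically partitioned data: party A holds some features, party B holds the remaining features and the labels, and each party keeps and updates only its own weights. It deliberately omits the privacy-preserving protocols that the survey categorizes; the residuals exchanged between the parties are sent in plaintext here, whereas real VFL protocols protect them (for example with homomorphic encryption or secret sharing). The party roles, data, and function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def column_gradient(residuals, rows, j):
    """Mean of residual * feature j: the gradient of 0.5 * MSE for weight j."""
    return sum(r * row[j] for r, row in zip(residuals, rows)) / len(rows)


def train_vertical(xa, xb, y, lr=0.1, epochs=2000):
    """xa, xb: feature rows of parties A and B for the same users; y: labels held by B."""
    wa = [0.0] * len(xa[0])
    wb = [0.0] * len(xb[0])
    for _ in range(epochs):
        # Each party computes a partial prediction from its own features only.
        partial_a = [dot(wa, row) for row in xa]
        partial_b = [dot(wb, row) for row in xb]
        # B combines the partial predictions with its labels into residuals.
        residuals = [pa + pb - t for pa, pb, t in zip(partial_a, partial_b, y)]
        # Residuals are shared with A (plaintext here, protected in real VFL),
        # and each party updates only its own weights.
        wa = [w - lr * column_gradient(residuals, xa, j) for j, w in enumerate(wa)]
        wb = [w - lr * column_gradient(residuals, xb, j) for j, w in enumerate(wb)]
    return wa, wb


# Hypothetical data for the same four users: A holds one feature,
# B holds another feature plus the labels, generated as y = 2 * a + 3 * b.
xa = [[1.0], [2.0], [3.0], [4.0]]
xb = [[1.0], [0.0], [1.0], [0.0]]
y = [5.0, 4.0, 9.0, 8.0]
wa, wb = train_vertical(xa, xb, y)
```

Because every update uses the same residuals a centralized learner would compute on the joined feature table, this scheme reaches the same weights as centralized gradient descent; the privacy question is entirely about what the exchanged partial predictions and residuals reveal.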

    Bench-Ranking: A Prescriptive Analysis Method for Querying Large Knowledge Graphs

    Leveraging relational Big Data (BD) processing frameworks to process large knowledge graphs creates great interest in optimizing query performance. Modern BD systems are, however, complex data systems whose configurations notably affect performance. Benchmarking different frameworks and configurations provides the community with best practices for better performance. However, most of these benchmarking efforts can be classified only as descriptive and diagnostic analytics. Moreover, there is no standard for comparing these benchmarks using quantitative ranking techniques.
In addition, designing mature pipelines for processing big graphs entails further design decisions that emerge from the non-native (relational) graph processing paradigm. Such design decisions, e.g., the choice of relational schema, partitioning technique, and storage format, cannot be made automatically. Thus, in this thesis, we discuss how our work fills this timely research gap. In particular, we first show the impact of the trade-offs among these design decisions on the replicability of BD systems' performance when querying large knowledge graphs. We then show the limitations of descriptive and diagnostic analyses of BD frameworks' performance for querying large graphs. Finally, we investigate how to enable prescriptive analytics via ranking functions and multi-dimensional optimization techniques (called “Bench-Ranking”). This approach abstracts away the complexity of descriptive performance analysis, guiding the practitioner directly to actionable, informed decisions.
https://www.ester.ee/record=b553332
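The sketch below illustrates the two ideas named above under simplifying assumptions: a rank-based score for each option of a design dimension (schema, partitioning, storage format), computed as its mean per-query rank, and a Pareto filter that keeps the configurations not dominated across all three dimension scores. The exact ranking functions and optimization techniques in the thesis may differ; the configuration labels, runtimes, and function names are hypothetical.

```python
from statistics import mean

# Hypothetical per-query runtimes (seconds) for (schema, partitioning, format) configurations.
RUNTIMES = {
    ("ST", "horizontal", "parquet"): [4.0, 6.0],
    ("VT", "subject", "parquet"): [3.0, 7.0],
    ("PT", "horizontal", "orc"): [5.0, 5.0],
    ("VT", "predicate", "csv"): [9.0, 8.0],
}


def option_rank_scores(runtimes, dim):
    """Mean per-query rank (1 = fastest) of each option in design dimension `dim`."""
    options = sorted({cfg[dim] for cfg in runtimes})
    n_queries = len(next(iter(runtimes.values())))
    # Average runtime of each option per query, over the configurations that use it.
    avg = {
        opt: [mean(t[q] for cfg, t in runtimes.items() if cfg[dim] == opt) for q in range(n_queries)]
        for opt in options
    }
    ranks = {opt: [] for opt in options}
    for q in range(n_queries):
        # Ties keep the alphabetical option order (sorted() is stable).
        for position, opt in enumerate(sorted(options, key=lambda o: avg[o][q]), start=1):
            ranks[opt].append(position)
    return {opt: mean(r) for opt, r in ranks.items()}


def dominates(a, b):
    """True if score vector a is no worse than b everywhere and better somewhere (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(runtimes):
    """Configurations whose per-dimension rank scores are not dominated by any other."""
    n_dims = len(next(iter(runtimes)))
    scores = [option_rank_scores(runtimes, d) for d in range(n_dims)]
    vectors = {cfg: tuple(scores[d][cfg[d]] for d in range(n_dims)) for cfg in runtimes}
    return [cfg for cfg, v in vectors.items() if not any(dominates(w, v) for w in vectors.values())]


front = pareto_front(RUNTIMES)
```

Ranking options per query rather than comparing raw runtimes keeps a single slow query from dominating the score, and the Pareto step avoids collapsing the three dimensions into one weighted number.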

    Cloud based privacy preserving data mining model using hybrid k-anonymity and partial homomorphic encryption

    The evolution of information and communication technologies has encouraged numerous organizations to outsource their business and data to the cloud to perform data mining and other data processing operations. Despite its great benefits, the cloud poses real problems for data security and privacy. Many studies have shown that attackers often obtain information from third-party services or third-party clouds. When data owners outsource their data to the cloud, especially under the SaaS model, it is difficult to preserve the confidentiality and integrity of the data. Privacy-Preserving Data Mining (PPDM) aims to carry out data mining operations while protecting the owner's data from violation. Current PPDM models have limitations: they suffer from data disclosure through identity and attribute disclosure, where revealed private information enables different types of attacks. In addition, existing solutions have poor data utility and high computational overhead. Therefore, this research aims to design and develop the Hybrid Anonymization Cryptography PPDM (HAC-PPDM) model, which improves the privacy-preserving level by reducing data disclosure before data are outsourced for mining on the cloud, while maintaining data utility. The HAC-PPDM model further aims to reduce computational overhead to improve efficiency. The Quasi-Identifiers Recognition (QIR) algorithm is designed on the basis of attribute classification and quasi-identifier dimension determination, to overcome the identity disclosure caused by linking quasi-identifiers and thereby reduce privacy leakage. An Enhanced Homomorphic Scheme is designed by hybridizing the Cloud-RSA encryption scheme, the Extended Euclidean (EE) algorithm, the Fast Modular Exponentiation (FME) algorithm, and the Chinese Remainder Theorem (CRT) to minimize computational time complexity while reducing attribute disclosure.
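The building blocks listed above can be sketched with textbook RSA: the Extended Euclidean algorithm derives the private exponent, square-and-multiply performs fast modular exponentiation, and the CRT splits decryption into two smaller exponentiations. Textbook RSA is multiplicatively homomorphic, which is the property a homomorphic scheme of this kind relies on. This is not the Enhanced Homomorphic Scheme or Cloud-RSA itself, whose details the abstract does not give; the tiny parameters (the common p = 61, q = 53, e = 17 example) are insecure, and the names are hypothetical.

```python
def extended_euclid(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_euclid(b, a % b)
    return g, y, x - (a // b) * y


def mod_inverse(a, m):
    g, x, _ = extended_euclid(a, m)
    if g != 1:
        raise ValueError("no modular inverse")
    return x % m


def fast_mod_exp(base, exp, mod):
    """Square-and-multiply modular exponentiation."""
    result, base = 1, base % mod
    while exp > 0:
        if exp & 1:
            result = result * base % mod
        base = base * base % mod
        exp >>= 1
    return result


class ToyRSA:
    """Textbook RSA with CRT decryption. Illustrative only: no padding, tiny primes."""

    def __init__(self, p, q, e):
        self.p, self.q, self.e = p, q, e
        self.n = p * q
        self.d = mod_inverse(e, (p - 1) * (q - 1))  # private exponent via EE
        # CRT parameters: reduced exponents and q^-1 mod p.
        self.dp, self.dq = self.d % (p - 1), self.d % (q - 1)
        self.q_inv = mod_inverse(q, p)

    def encrypt(self, m):
        return fast_mod_exp(m, self.e, self.n)

    def decrypt(self, c):
        # Two half-size exponentiations, recombined with Garner's formula.
        m1 = fast_mod_exp(c, self.dp, self.p)
        m2 = fast_mod_exp(c, self.dq, self.q)
        h = self.q_inv * (m1 - m2) % self.p
        return m2 + h * self.q


rsa = ToyRSA(61, 53, 17)
ciphertext = rsa.encrypt(65)
# Multiplying ciphertexts multiplies the plaintexts (mod n).
product_plain = rsa.decrypt(rsa.encrypt(7) * rsa.encrypt(11) % rsa.n)
```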
The proposed QIR algorithm, Enhanced Homomorphic Scheme, and k-anonymity privacy model are hybridized to achieve optimal data privacy preservation before the data are outsourced to the cloud, while maintaining a level of data utility that meets the needs of mining with good efficiency. Real-world datasets were used to evaluate the proposed algorithms and model. The experimental results show that the proposed QIR algorithm improved the data privacy-preserving percentage by 23% while maintaining the same or slightly better data utility. Meanwhile, the proposed Enhanced Homomorphic Scheme is more efficient than related work in terms of time complexity, expressed in Big O notation, and it reduced the computational time of encryption, decryption, and key generation. Finally, the proposed HAC-PPDM model successfully reduced data disclosure and improved the privacy-preserving level while preserving data utility, as it reduced information loss. In short, it improved privacy preservation and data mining (classification) accuracy by 7.59% and 0.11%, respectively.
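The k-anonymity component can be illustrated with a small generalize-and-check routine. The sketch assumes the quasi-identifiers have already been recognized (age and ZIP code here; the QIR algorithm's own selection logic is not shown), generalizes age to a decade and the ZIP code to a three-digit prefix, and verifies that every equivalence class over the quasi-identifiers contains at least k records. The records, generalization rules, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical records: (age, ZIP code, sensitive attribute).
RECORDS = [
    (34, "47677", "flu"),
    (38, "47602", "cold"),
    (36, "47678", "flu"),
    (52, "47905", "cancer"),
    (57, "47909", "flu"),
    (55, "47906", "cold"),
]
QUASI_IDENTIFIERS = (0, 1)  # assumed output of quasi-identifier recognition: age and ZIP


def generalize(record):
    """Coarsen the quasi-identifiers: age to a decade range, ZIP code to a 3-digit prefix."""
    age, zipcode, sensitive = record
    low = age // 10 * 10
    return (f"{low}-{low + 9}", zipcode[:3] + "**", sensitive)


def is_k_anonymous(records, k, qi=QUASI_IDENTIFIERS):
    """Every combination of quasi-identifier values must occur in at least k records."""
    groups = Counter(tuple(r[i] for i in qi) for r in records)
    return min(groups.values()) >= k


generalized = [generalize(r) for r in RECORDS]
```

Each coarsening step trades information loss for larger equivalence classes, which is the utility-versus-privacy balance the abstract reports on.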

    Secure Protocols for Privacy-preserving Data Outsourcing, Integration, and Auditing

    As the amount of data available from a wide range of domains has increased tremendously in recent years, the demand for data sharing and integration has also risen. The cloud computing paradigm provides great flexibility to data owners with respect to computation and storage capabilities, which makes it a suitable platform for them to share their data. Outsourcing person-specific data to the cloud, however, raises serious concerns about the confidentiality of the outsourced data, the privacy of the individuals referenced in the data, and the confidentiality of the queries processed over the data. Data integration is another form of data sharing, in which data owners jointly perform the integration process and the resulting dataset is shared among them. Integrating related data from different sources enables individuals, businesses, organizations, and government agencies to perform better data analysis, make better-informed decisions, and provide better services. Designing distributed, secure, and privacy-preserving protocols for integrating person-specific data, however, poses several challenges, including how to prevent each party from inferring sensitive information about individuals during the execution of the protocol, how to guarantee an effective level of privacy on the released data while maintaining utility for data mining, and how to support public auditing such that anyone at any time can verify that the integration was executed correctly and that no participant deviated from the protocol. In this thesis, we address these concerns by presenting secure protocols for privacy-preserving data outsourcing, integration, and auditing. First, we propose a secure cloud-based data outsourcing and query processing framework that simultaneously preserves the confidentiality of the data and the query requests, while providing differential privacy guarantees on the query results.
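As an illustration of a differential privacy guarantee on query results, the sketch below answers a counting query with the standard Laplace mechanism: a count has sensitivity 1, so adding Laplace noise with scale 1/ε gives ε-differential privacy. The abstract does not state which mechanism the framework uses, and the encryption of data and queries is omitted here; `laplace_noise` and `private_count` are hypothetical helpers.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Hypothetical helper: a counting query (sensitivity 1) answered with Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(0)
records = list(range(100))  # hypothetical dataset; the true count of r < 30 is 30
answers = [private_count(records, lambda r: r < 30, epsilon=0.5, rng=rng) for _ in range(10_000)]
average_answer = sum(answers) / len(answers)
```

Averaging many answers, as done here only to check that the noise is centred on the true count, would consume the privacy budget ε once per query in a real deployment.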
Second, we propose a publicly verifiable protocol for integrating person-specific data from multiple data owners, while providing differential privacy guarantees and maintaining an effective level of utility on the released data for the purpose of data mining. Next, we propose a privacy-preserving multi-party protocol for high-dimensional data mashup with guaranteed LKC-privacy on the output data. Finally, we apply the theory to the real-world problem of proving solvency in Bitcoin. More specifically, we propose a privacy-preserving and publicly verifiable cryptographic proof-of-solvency scheme for Bitcoin exchanges such that no information is revealed about the exchange's customer holdings, the value of the exchange's total holdings is kept secret, and multiple exchanges performing the same proof of solvency can contemporaneously prove they are not colluding.
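One common building block for proofs of solvency that hide customer holdings is the Pedersen commitment, which is additively homomorphic: the product of commitments to individual balances is a commitment to their sum. The sketch below shows only this aggregation property, not the full publicly verifiable scheme from the thesis, whose construction the abstract does not describe. The group is a tiny, insecure toy; in a real deployment H must be chosen so that nobody knows its discrete logarithm base G. The balances and names are hypothetical.

```python
import secrets

# Toy group: P = 2Q + 1 is a safe prime; G = 4 and H = 9 are squares, so both have order Q.
# Insecure illustration only: log_G(H) is trivially computable at this size.
P, Q = 2039, 1019
G, H = 4, 9


def commit(value, blinding):
    """Pedersen commitment G^value * H^blinding mod P (exponents taken mod Q)."""
    return pow(G, value % Q, P) * pow(H, blinding % Q, P) % P


balances = [120, 75, 305]  # hypothetical customer balances
blindings = [secrets.randbelow(Q) for _ in balances]
commitments = [commit(b, r) for b, r in zip(balances, blindings)]

# Anyone can multiply the published commitments to get a commitment to total liabilities.
aggregate = 1
for c in commitments:
    aggregate = aggregate * c % P

# In a real proof the total stays hidden behind further zero-knowledge proofs;
# here we recompute it only to check the homomorphic property.
total_commitment = commit(sum(balances), sum(blindings))
```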