130 research outputs found

    A Trust-based Recommender System over Arbitrarily Partitioned Data with Privacy

    Recommender systems are effective mechanisms for suggesting what to watch, read, or taste based on user ratings of products or services they have experienced. To achieve higher-quality recommendations, e-commerce parties may prefer to collaborate over partitioned data. Because of privacy concerns, they might hesitate to work in pairs, and some solutions can motivate them to collaborate. This study examines how to estimate trust-based predictions on arbitrarily partitioned data in which two parties hold ratings for similar sets of customers and items. A privacy-preserving scheme is proposed, and it is shown to offer trust-based predictions on partitioned data efficiently while preserving privacy

    Efficient distributed privacy preserving clustering

    With growing concerns about data privacy, researchers have focused their attention on developing new algorithms for privacy-preserving data mining. However, the methods proposed so far either handle large datasets very inefficiently or trade privacy against the accuracy of the data mining results. Secure multiparty computation helps researchers develop privacy-preserving data mining algorithms without having to trade the quality of the results against data privacy. It also provides formal guarantees about privacy. On the other hand, algorithms based on secure multiparty computation often rely on computationally expensive cryptographic operations, which makes them infeasible in real-world scenarios. In this thesis, we study the problem of privacy-preserving distributed clustering, propose an efficient and secure algorithm for it based on secret sharing, and compare it to the state of the art. Experiments show that our algorithm has a lower communication overhead and a much lower computation overhead than the state of the art
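
    The abstract does not describe the thesis's protocol, so the following is only a minimal sketch of the general idea behind secret-sharing-based aggregation. It assumes additive secret sharing over a prime modulus and honest parties; the names `share`, `reconstruct`, `secure_sum`, and the modulus `PRIME` are hypothetical and chosen for illustration. In a distributed clustering setting, a sum like this could be used to combine per-party cluster sums without any party revealing its own contribution.

```python
import secrets

# Assumed modulus for illustration; any sufficiently large prime would do.
PRIME = 2**61 - 1


def share(value, n_parties, modulus=PRIME):
    """Split `value` into n additive shares that sum to `value` mod `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def reconstruct(shares, modulus=PRIME):
    """Recombine additive shares into the original value."""
    return sum(shares) % modulus


def secure_sum(private_values, modulus=PRIME):
    """Sum the parties' private values using only shares (honest parties assumed)."""
    n = len(private_values)
    # Party i splits its value into n shares and sends share j to party j.
    all_shares = [share(v, n, modulus) for v in private_values]
    # Party j adds up the shares it received; a single partial sum reveals nothing
    # about any individual input.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Publishing the partial sums reveals only the total.
    return reconstruct(partial_sums, modulus)
```

    Each individual share is uniformly random, so only the combination of all shares carries information about the input.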

    Practical Privacy-Preserving K-means Clustering

    Clustering is a common technique for data analysis, which aims to partition data into similar groups. When the data comes from different sources, it is highly desirable to maintain the privacy of each database. In this work, we study a popular clustering algorithm (K-means) and adapt it to the privacy-preserving context. Specifically, to construct our privacy-preserving clustering algorithm, we first propose an efficient batched squared Euclidean distance computation protocol in the adaptive amortizing setting, where one needs to compute the distance from the same point to other points. This protocol can also serve as a key building block in many real-world applications such as biometric identification. Furthermore, we construct a customized garbled circuit for computing the minimum value among shared values. We implement and evaluate our protocols to demonstrate their practicality and show that they can handle much larger datasets, and run much faster, than previous work. The numerical results also show that the proposed protocol achieves almost the same accuracy as a plaintext K-means clustering algorithm
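
    To make the two building blocks concrete, here is a plaintext sketch of the arithmetic only: squared Euclidean distances from one point to a batch of points, followed by selecting the minimum. It is not the paper's protocol, which works on shared values and uses a garbled circuit for the minimum. The function names are hypothetical, and the reuse of the query point's norm across the batch is an assumption about what amortization might look like in cleartext.

```python
def batched_squared_distances(x, points):
    """Squared Euclidean distance from x to every point in `points`.

    Uses ||x - y||^2 = ||x||^2 - 2<x, y> + ||y||^2, so ||x||^2 is computed
    once and reused for the whole batch.
    """
    x_norm = sum(a * a for a in x)
    distances = []
    for y in points:
        y_norm = sum(b * b for b in y)
        dot = sum(a * b for a, b in zip(x, y))
        distances.append(x_norm - 2 * dot + y_norm)
    return distances


def nearest_index(distances):
    """Index of the smallest distance (the K-means assignment step)."""
    return min(range(len(distances)), key=distances.__getitem__)
```

    In K-means, `points` would be the current centroids and `nearest_index` would give the cluster assignment for `x`.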

    Privacy-Preserving Crowdsourcing-Based Recommender Systems for E-Commerce & Health Services

    Our society lives in an age where the eagerness for information has resulted in problems such as infobesity, especially after the arrival of Web 2.0. In this context, automatic systems such as recommenders are becoming more relevant, since they help to distinguish noise from useful information, for example in e-commerce. However, recommender systems such as Collaborative Filtering have several limitations, such as non-response and privacy. An important part of this thesis is devoted to the development of methodologies to cope with these limitations. In addition to the previously stated research topics, in this dissertation we also focus on the worldwide process of urbanisation that is taking place and the need for more sustainable and liveable cities. In this context, we propose smart health (s-health) solutions and efficient wireless channel characterisation methodologies, in order to provide sustainable healthcare in the context of smart cities

    Differentially-private Multiparty Clustering

    In an era marked by the widespread application of Machine Learning (ML) across diverse domains, the necessity of privacy-preserving techniques has become paramount. The Euclidean k-Means problem, a fundamental component of unsupervised learning, brings to light this privacy challenge, especially in federated contexts. Existing Federated approaches utilizing Secure Multiparty Computation (SMPC) or Homomorphic Encryption (HE) techniques, although promising, suffer from substantial overheads and do not offer output privacy. At the same time, differentially private k-Means algorithms fall short in federated settings. Recognizing the critical need for innovative solutions safeguarding privacy, this work pioneers integrating Differential Privacy (DP) into federated k-Means. The key contributions of this dissertation include the novel integration of DP in horizontally-federated k-Means, a lightweight aggregation protocol offering three orders of magnitude speedup over other multiparty approaches, the application of cluster-size constraints in DP k-Means to enhance state-of-the-art utility, and a meticulous examination of various aggregation methods in the protocol. Unlike traditional privacy-preserving approaches, our innovative design results in a faster, more private, and more accurate solution, significantly advancing the state-of-the-art in privacy-preserving machine learning

    Uncovering the Potential of Federated Learning: Addressing Algorithmic and Data-driven Challenges under Privacy Restrictions

    Federated learning is a groundbreaking distributed machine learning paradigm that allows for the collaborative training of models across various entities without directly sharing sensitive data, ensuring privacy and robustness. This Ph.D. dissertation delves into the intricacies of federated learning, investigating the algorithmic and data-driven challenges of deep learning models in the presence of additive noise in this framework. The main objective is to provide strategies to measure the generalization, stability, and privacy-preserving capabilities of these models and further improve them. To this end, five noise infusion mechanisms at varying noise levels within centralized and federated learning settings are explored. As model complexity is a key component of the generalization and stability of deep learning models during training and evaluation, a comparative analysis of three Convolutional Neural Network (CNN) architectures is provided. A key contribution of this study is introducing specific metrics for training with noise. Signal-to-Noise Ratio (SNR) is introduced as a quantitative measure of the trade-off between privacy and training accuracy of noise-infused models, aiming to find the noise level that yields optimal privacy and accuracy. Moreover, the Price of Stability and Price of Anarchy are defined in the context of privacy-preserving deep learning, contributing to the systematic investigation of the noise infusion mechanisms to enhance privacy without compromising performance. This research sheds light on the delicate balance between these critical factors, fostering a deeper understanding of the implications of noise-based regularization in machine learning. The present study also explores a real-world application of federated learning in weather prediction applications that suffer from the issue of imbalanced datasets. 
Utilizing data from multiple sources combined with advanced data augmentation techniques improves the accuracy and generalization of weather prediction models, even when dealing with imbalanced datasets. Overall, federated learning is pivotal in harnessing decentralized datasets for real-world applications while safeguarding privacy. By leveraging noise as a tool for regularization and privacy enhancement, this research study aims to contribute to the development of robust, privacy-aware algorithms, ensuring that AI-driven solutions prioritize both utility and privacy

    Identity, location and query privacy for smart devices

    In this thesis, we discuss three important aspects of users' privacy: location privacy, identity privacy, and query privacy. Information related to identity, location, and queries is very sensitive, as it can reveal users' behavior patterns, interests, preferences, and habits. We propose several techniques to better protect identity, location, and query privacy

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its wide public popularity establish a need for a comprehensive text on the subject. The book series entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to treating each section in depth, the two books present useful hints and strategies for solving the problems raised in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining

    Aggregating privatized medical data for secure querying applications

    This thesis analyses the challenges of aggregating sensitive data and of querying the aggregated data on a cloud server. It also delineates applications of aggregating sensitive medical data in several scenarios, and tests privatization techniques that help improve the strength of privacy and utility

    Data utility and privacy protection in data publishing

    Data about individuals is being increasingly collected and disseminated for purposes such as business analysis and medical research. This has raised some privacy concerns. In response, a number of techniques have been proposed which attempt to transform data prior to its release so that sensitive information about the individuals contained within it is protected. k-Anonymisation is one such technique that has attracted much recent attention from the database research community. k-Anonymisation works by transforming data in such a way that each record is made identical to at least k-1 other records with respect to those attributes that are likely to be used to identify individuals. This helps prevent sensitive information associated with individuals from being disclosed, as each individual is represented by at least k records in the dataset. Ideally, a k-anonymised dataset should maximise both data utility and privacy protection, i.e. it should allow intended data analytic tasks to be carried out without loss of accuracy while preventing sensitive information disclosure, but these two notions are conflicting and only a trade-off between them can be achieved in practice. The existing works, however, focus on how either utility or protection requirement may be satisfied, which often result in anonymised data with an unnecessarily and/or unacceptably low level of utility or protection. In this thesis, we study how to construct k-anonymous data that satisfies both data utility and privacy protection requirements. We propose new criteria to capture utility and protection requirements, and new algorithms that allow k-anonymisations with required utility/protection trade-off or guarantees to be generated. 
Our extensive experiments using both benchmarking and synthetic datasets show that our methods are efficient, can produce k-anonymised data with desired properties, and outperform the state of the art methods in retaining data utility and providing privacy protection
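
    The k-anonymity property stated in the abstract (each record matches at least k-1 others on the identifying attributes) can be checked with a few lines of code. The sketch below only verifies the property; it is not one of the thesis's anonymisation algorithms. The function name `is_k_anonymous`, the attribute names, and the toy records are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalised toy data (not taken from the thesis).
toy_records = [
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "022**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "022**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(toy_records, ["age", "zip"], 2))  # each group has 2 records
print(is_k_anonymous(toy_records, ["age", "zip"], 3))  # no group reaches 3 records
```

    Coarser generalisation (for example, wider age ranges) makes larger k easier to reach but reduces data utility, which is the trade-off the abstract describes.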