31 research outputs found

    Microaggregation Sorting Framework for K-Anonymity Statistical Disclosure Control in Cloud Computing

    Get PDF
    In cloud computing, there have led to an increase in the capability to store and record personal data ( microdata ) in the cloud. In most cases, data providers have no/little control that has led to concern that the personal data may be beached. Microaggregation techniques seek to protect microdata in such a way that data can be published and mined without providing any private information that can be linked to specific individuals. An optimal microaggregation method must minimize the information loss resulting from this replacement process. The challenge is how to minimize the information loss during the microaggregation process. This paper presents a sorting framework for Statistical Disclosure Control (SDC) to protect microdata in cloud computing. It consists of two stages. In the first stage, an algorithm sorts all records in a data set in a particular way to ensure that during microaggregation very dissimilar observations are never entered into the same cluster. In the second stage a microaggregation method is used to create k -anonymous clusters while minimizing the information loss. The performance of the proposed techniques is compared against the most recent microaggregation methods. Experimental results using benchmark datasets show that the proposed algorithms perform significantly better than existing associate techniques in the literature

    Microaggregation sorting framework for k-anonymity statistical disclosure control in cloud computing

    Get PDF
    In cloud computing, there have led to an increase in the capability to store and record personal data (microdata) in the cloud. In most cases, data providers have no/little control that has led to concern that the personal data may be beached. Microaggregation techniques seek to protect microdata in such a way that data can be published and mined without providing any private information that can be linked to specific individuals. An optimal microaggregation method must minimize the information loss resulting from this replacement process. The challenge is how to minimize the information loss during the microaggregation process. This paper presents a sorting framework for Statistical Disclosure Control (SDC) to protect microdata in cloud computing. It consists of two stages. In the first stage, an algorithm sorts all records in a data set in a particular way to ensure that during microaggregation very dissimilar observations are never entered into the same cluster. In the second stage a microaggregation method is used to create k-anonymous clusters while minimizing the information loss. The performance of the proposed techniques is compared against the most recent microaggregation methods. Experimental results using benchmark datasets show that the proposed algorithms perform significantly better than existing associate techniques in the literature

    Mathematically optimized, recursive prepartitioning strategies for k-anonymous microaggregation of large-scale datasets

    Get PDF
    © Elsevier. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/The technical contents of this work fall within the statistical disclosure control (SDC) field, which concerns the postprocessing of the demographic portion of the statistical results of surveys containing sensitive personal information, in order to effectively safeguard the anonymity of the participating respondents. A widely known technique to solve the problem of protecting the privacy of the respondents involved beyond the mere suppression of their identifiers is the k-anonymous microaggregation. Unfortunately, most microaggregation algorithms that produce competitively low levels of distortions exhibit a superlinear running time, typically scaling with the square of the number of records in the dataset. This work proposes and analyzes an optimized prepartitioning strategy to reduce significantly the running time for the k-anonymous microaggregation algorithm operating on large datasets, with mild loss in data utility with respect to that of MDAV, the underlying method. The optimization strategy is based on prepartitioning a dataset recursively until the desired k-anonymity parameter is achieved. Traditional microaggregation algorithms have quadratic computational complexity in the form T(n2). By using the proposed method and fixing the number of recurrent prepartitions we obtain subquadratic complexity in the form T(n3/2), T(n4/3), ..., depending on the number of prepartitions. Alternatively, fixing the ratio between the size of the microcell and the macrocell on each prepartition, quasilinear complexity in the form T(nlog¿n) is achieved. Our method is readily applicable to large-scale datasets with numerical demographic attributes.Peer ReviewedPostprint (author's final draft

    Optimal assignment problem on record linkage

    Get PDF
    We present an application of the Hungarian Method, an optimal assignment graph theory algorithm, to record linkage in order to improve the disclosure risk assessment. We should note that Hungarian Method has O(n^3) complexity; three different methods are presented to reduce its computational cost

    Revisiting distance-based record linkage for privacy-preserving release of statistical datasets

    Get PDF
    Statistical Disclosure Control (SDC, for short) studies the problem of privacy-preserving data publishing in cases where the data is expected to be used for statistical analysis. An original dataset T containing sensitive information is transformed into a sanitized version T' which is released to the public. Both utility and privacy aspects are very important in this setting. For utility, T' must allow data miners or statisticians to obtain similar results to those which would have been obtained from the original dataset T. For privacy, T' must significantly reduce the ability of an adversary to infer sensitive information on the data subjects in T. One of the main a-posteriori measures that the SDC community has considered up to now when analyzing the privacy offered by a given protection method is the Distance-Based Record Linkage (DBRL) risk measure. In this work, we argue that the classical DBRL risk measure is insufficient. For this reason, we introduce the novel Global Distance-Based Record Linkage (GDBRL) risk measure. We claim that this new measure must be evaluated alongside the classical DBRL measure in order to better assess the risk in publishing T' instead of T. After that, we describe how this new measure can be computed by the data owner and discuss the scalability of those computations. We conclude by extensive experimentation where we compare the risk assessments offered by our novel measure as well as by the classical one, using well-known SDC protection methods. Those experiments validate our hypothesis that the GDBRL risk measure issues, in many cases, higher risk assessments than the classical DBRL measure. In other words, relying solely on the classical DBRL measure for risk assessment might be misleading, as the true risk may be in fact higher. Hence, we strongly recommend that the SDC community considers the new GDBRL risk measure as an additional measure when analyzing the privacy offered by SDC protection algorithms.Postprint (author's final draft

    Cloud based privacy preserving data mining model using hybrid k-anonymity and partial homomorphic encryption

    Get PDF
    The evolution of information and communication technologies have encourage numerous organizations to outsource their business and data to cloud computing to perform data mining and other data processing operations. Despite the great benefits of the cloud, it has a real problem in the security and privacy of data. Many studies explained that attackers often reveal the information from third-party services or third-party clouds. When a data owners outsource their data to the cloud, especially the SaaS cloud model, it is difficult to preserve the confidentiality and integrity of the data. Privacy-Preserving Data Mining (PPDM) aims to accomplish data mining operations while protecting the owner's data from violation. The current models of PPDM have some limitations. That is, they suffer from data disclosure caused by identity and attributes disclosure where some private information is revealed which causes the success of different types of attacks. Besides, existing solutions have poor data utility and high computational performance overhead. Therefore, this research aims to design and develop Hybrid Anonymization Cryptography PPDM (HAC-PPDM) model to improve the privacy-preserving level by reducing data disclosure before outsourcing data for mining over the cloud while maintaining data utility. The proposed HAC-PPDM model is further aimed reducing the computational performance overhead to improve efficiency. The Quasi-Identifiers Recognition algorithm (QIR) is defined and designed depending on attributes classification and Quasi-Identifiers dimension determine to overcome the identity disclosure caused by Quasi-Identifiers linking to reduce privacy leakage. An Enhanced Homomorphic Scheme is designed based on hybridizing Cloud-RSA encryption scheme, Extended Euclidean algorithm (EE), Fast Modular Exponentiation algorithm (FME), and Chinese Remainder Theorem (CRT) to minimize the computational time complexity while reducing the attribute disclosure. The proposed QIR, Enhanced Homomorphic Scheme and k-anonymity privacy model have been hybridized to obtain optimal data privacy-preservation before outsourced it on the cloud while maintaining the utility of data that meets the needs of mining with good efficiency. Real-world datasets have been used to evaluate the proposed algorithms and model. The experimental results show that the proposed QIR algorithm improved the data privacy-preserving percentage by 23% while maintaining the same or slightly better data utility. Meanwhile, the proposed Enhanced Homomorphic Scheme is more efficient comparing to the related works in terms of time complexity as represented by Big O notation. Moreover, it reduced the computational time of the encryption, decryption, and key generation time. Finally, the proposed HAC-PPDM model successfully reduced the data disclosures and improved the privacy-preserving level while preserved the data utility as it reduced the information loss. In short, it achieved improvement of privacy preserving and data mining (classification) accuracy by 7.59 % and 0.11 % respectively

    Contributions to Context-Aware Smart Healthcare: A Security and Privacy Perspective

    Get PDF
    Les tecnologies de la informació i la comunicació han canviat les nostres vides de manera irreversible. La indústria sanitària, una de les indústries més grans i de major creixement, està dedicant molts esforços per adoptar les últimes tecnologies en la pràctica mèdica diària. Per tant, no és sorprenent que els paradigmes sanitaris estiguin en constant evolució cercant serveis més eficients, eficaços i sostenibles. En aquest context, el potencial de la computació ubiqua mitjançant telèfons intel·ligents, rellotges intel·ligents i altres dispositius IoT ha esdevingut fonamental per recopilar grans volums de dades, especialment relacionats amb l'estat de salut i la ubicació de les persones. Les millores en les capacitats de detecció juntament amb l'aparició de xarxes de telecomunicacions d'alta velocitat han facilitat la implementació d'entorns sensibles al context, com les cases i les ciutats intel·ligents, capaços d'adaptar-se a les necessitats dels ciutadans. La interacció entre la computació ubiqua i els entorns sensibles al context va obrir la porta al paradigma de la salut intel·ligent, centrat en la prestació de serveis de salut personalitzats i de valor afegit mitjançant l'explotació de grans quantitats de dades sanitàries, de mobilitat i contextuals. No obstant, la gestió de dades sanitàries, des de la seva recollida fins a la seva anàlisi, planteja una sèrie de problemes desafiants a causa del seu caràcter altament confidencial. Aquesta tesi té per objectiu abordar diversos reptes de seguretat i privadesa dins del paradigma de la salut intel·ligent. Els resultats d'aquesta tesi pretenen ajudar a la comunitat científica a millorar la seguretat dels entorns intel·ligents del futur, així com la privadesa dels ciutadans respecte a les seves dades personals i sanitàries.Las tecnologías de la información y la comunicación han cambiado nuestras vidas de forma irreversible. La industria sanitaria, una de las industrias más grandes y de mayor crecimiento, está dedicando muchos esfuerzos por adoptar las últimas tecnologías en la práctica médica diaria. Por tanto, no es sorprendente que los paradigmas sanitarios estén en constante evolución en busca de servicios más eficientes, eficaces y sostenibles. En este contexto, el potencial de la computación ubicua mediante teléfonos inteligentes, relojes inteligentes, dispositivos wearables y otros dispositivos IoT ha sido fundamental para recopilar grandes volúmenes de datos, especialmente relacionados con el estado de salud y la localización de las personas. Las mejoras en las capacidades de detección junto con la aparición de redes de telecomunicaciones de alta velocidad han facilitado la implementación de entornos sensibles al contexto, como las casas y las ciudades inteligentes, capaces de adaptarse a las necesidades de los ciudadanos. La interacción entre la computación ubicua y los entornos sensibles al contexto abrió la puerta al paradigma de la salud inteligente, centrado en la prestación de servicios de salud personalizados y de valor añadido mediante la explotación significativa de grandes cantidades de datos sanitarios, de movilidad y contextuales. No obstante, la gestión de datos sanitarios, desde su recogida hasta su análisis, plantea una serie de cuestiones desafiantes debido a su naturaleza altamente confidencial. Esta tesis tiene por objetivo abordar varios retos de seguridad y privacidad dentro del paradigma de la salud inteligente. Los resultados de esta tesis pretenden ayudar a la comunidad científica a mejorar la seguridad de los entornos inteligentes del futuro, así como la privacidad de los ciudadanos con respecto a sus datos personales y sanitarios.Information and communication technologies have irreversibly changed our lives. The healthcare industry, one of the world’s largest and fastest-growing industries, is dedicating many efforts in adopting the latest technologies into daily medical practice. It is not therefore surprising that healthcare paradigms are constantly evolving seeking for more efficient, effective and sustainable services. In this context, the potential of ubiquitous computing through smartphones, smartwatches, wearables and IoT devices has become fundamental to collect large volumes of data, including people's health status and people’s location. The enhanced sensing capabilities together with the emergence of high-speed telecommunication networks have facilitated the implementation of context-aware environments, such as smart homes and smart cities, able to adapt themselves to the citizens needs. The interplay between ubiquitous computing and context-aware environments opened the door to the so-called smart health paradigm, focused on the provision of added-value personalised health services by meaningfully exploiting vast amounts of health, mobility and contextual data. However, the management of health data, from their gathering to their analysis, arises a number of challenging issues due to their highly confidential nature. In particular, this dissertation addresses several security and privacy challenges within the smart health paradigm. The results of this dissertation are intended to help the research community to enhance the security of the intelligent environments of the future as well as the privacy of the citizens regarding their personal and health data

    Privacy-Preserving Crowdsourcing-Based Recommender Systems for E-Commerce & Health Services

    Get PDF
    En l’actualitat, els sistemes de recomanació han esdevingut un mecanisme fonamental per proporcionar als usuaris informació útil i filtrada, amb l’objectiu d’optimitzar la presa de decisions, com per exemple, en el camp del comerç electrònic. La quantitat de dades existent a Internet és tan extensa que els usuaris necessiten sistemes automàtics per ajudar-los a distingir entre informació valuosa i soroll. No obstant, sistemes de recomanació com el Filtratge Col·laboratiu tenen diverses limitacions, com ara la manca de resposta i la privadesa. Una part important d'aquesta tesi es dedica al desenvolupament de metodologies per fer front a aquestes limitacions. A més de les aportacions anteriors, en aquesta tesi també ens centrem en el procés d'urbanització que s'està produint a tot el món i en la necessitat de crear ciutats més sostenibles i habitables. En aquest context, ens proposem solucions de salut intel·ligent (s-health) i metodologies eficients de caracterització de canals sense fils, per tal de proporcionar assistència sanitària sostenible en el context de les ciutats intel·ligents.En la actualidad, los sistemas de recomendación se han convertido en una herramienta indispensable para proporcionar a los usuarios información útil y filtrada, con el objetivo de optimizar la toma de decisiones en una gran variedad de contextos. La cantidad de datos existente en Internet es tan extensa que los usuarios necesitan sistemas automáticos para ayudarles a distinguir entre información valiosa y ruido. Sin embargo, sistemas de recomendación como el Filtrado Colaborativo tienen varias limitaciones, tales como la falta de respuesta y la privacidad. Una parte importante de esta tesis se dedica al desarrollo de metodologías para hacer frente a esas limitaciones. Además de las aportaciones anteriores, en esta tesis también nos centramos en el proceso de urbanización que está teniendo lugar en todo el mundo y en la necesidad de crear ciudades más sostenibles y habitables. En este contexto, proponemos soluciones de salud inteligente (s-health) y metodologías eficientes de caracterización de canales inalámbricos, con el fin de proporcionar asistencia sanitaria sostenible en el contexto de las ciudades inteligentes.Our society lives an age where the eagerness for information has resulted in problems such as infobesity, especially after the arrival of Web 2.0. In this context, automatic systems such as recommenders are increasing their relevance, since they help to distinguish noise from useful information. However, recommender systems such as Collaborative Filtering have several limitations such as non-response and privacy. An important part of this thesis is devoted to the development of methodologies to cope with these limitations. In addition to the previously stated research topics, in this dissertation we also focus in the worldwide process of urbanisation that is taking place and the need for more sustainable and liveable cities. In this context, we focus on smart health solutions and efficient wireless channel characterisation methodologies, in order to provide sustainable healthcare in the context of smart cities

    Optimization of privacy-utility trade-offs under informational self-determination

    No full text
    The pervasiveness of Internet of Things results in vast volumes of personal data generated by smart devices of users (data producers) such as smart phones, wearables and other embedded sensors. It is a common requirement, especially for Big Data analytics systems, to transfer these large in scale and distributed data to centralized computational systems for analysis. Nevertheless, third parties that run and manage these systems (data consumers) do not always guarantee users’ privacy. Their primary interest is to improve utility that is usually a metric related to the performance, costs and the quality of service. There are several techniques that mask user-generated data to ensure privacy, e.g. differential privacy. Setting up a process for masking data, referred to in this paper as a ‘privacy setting’, decreases on the one hand the utility of data analytics, while, on the other hand, increases privacy. This paper studies parameterizations of privacy settings that regulate the trade-off between maximum utility, minimum privacy and minimum utility, maximum privacy, where utility refers to the accuracy in the estimations of aggregation functions. Privacy settings can be universally applied as system-wide parameterizations and policies (homogeneous data sharing). Nonetheless they can also be applied autonomously by each user or decided under the influence of (monetary) incentives (heterogeneous data sharing). This latter diversity in data sharing by informational self-determination plays a key role on the privacy-utility trajectories as shown in this paper both theoretically and empirically. A generic and novel computational framework is introduced for measuring privacy-utility trade-offs and their Pareto optimization. The framework computes a broad spectrum of such trade-offs that form privacy-utility trajectories under homogeneous and heterogeneous data sharing. The practical use of the framework is experimentally evaluated using real-world data from a Smart Grid pilot project in which energy consumers protect their privacy by regulating the quality of the shared power demand data, while utility companies make accurate estimations of the aggregate load in the network to manage the power grid. Over 20,000 differential privacy settings are applied to shape the computational trajectories that in turn provide a vast potential for data consumers and producers to participate in viable participatory data sharing systems

    A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions

    Full text link
    In recent decades, social network anonymization has become a crucial research field due to its pivotal role in preserving users' privacy. However, the high diversity of approaches introduced in relevant studies poses a challenge to gaining a profound understanding of the field. In response to this, the current study presents an exhaustive and well-structured bibliometric analysis of the social network anonymization field. To begin our research, related studies from the period of 2007-2022 were collected from the Scopus Database then pre-processed. Following this, the VOSviewer was used to visualize the network of authors' keywords. Subsequently, extensive statistical and network analyses were performed to identify the most prominent keywords and trending topics. Additionally, the application of co-word analysis through SciMAT and the Alluvial diagram allowed us to explore the themes of social network anonymization and scrutinize their evolution over time. These analyses culminated in an innovative taxonomy of the existing approaches and anticipation of potential trends in this domain. To the best of our knowledge, this is the first bibliometric analysis in the social network anonymization field, which offers a deeper understanding of the current state and an insightful roadmap for future research in this domain.Comment: 73 pages, 28 figure
    corecore