25 research outputs found

    F-Classify: Fuzzy Rule Based Classification Method for Privacy Preservation of Multiple Sensitive Attributes

    Get PDF
    With the advent of smart health, smart cities, and smart grids, the amount of data has grown swiftly. When the collected data is published for valuable information mining, privacy turns out to be a key matter due to the presence of sensitive information. Such sensitive information comprises either a single sensitive attribute (an individual has only one sensitive attribute) or multiple sensitive attributes (an individual can have multiple sensitive attributes). Anonymization of data sets with multiple sensitive attributes presents some unique problems due to the correlation among these attributes. Artificial intelligence techniques can help the data publishers in anonymizing such data. To the best of our knowledge, no fuzzy logic-based privacy model has been proposed until now for privacy preservation of multiple sensitive attributes. In this paper, we propose a novel privacy preserving model F-Classify that uses fuzzy logic for the classification of quasi-identifier and multiple sensitive attributes. Classes are defined based on defined rules, and every tuple is assigned to its class according to attribute value. The working of the F-Classify Algorithm is also verified using HLPN. A wide range of experiments on healthcare data sets acknowledged that F-Classify surpasses its counterparts in terms of privacy and utility. Being based on artificial intelligence, it has a lower execution time than other approaches

    Algorithms to anonymize structured medical and healthcare data:A systematic review

    Get PDF
    Introduction: With many anonymization algorithms developed for structured medical health data (SMHD) in the last decade, our systematic review provides a comprehensive bird’s eye view of algorithms for SMHD anonymization. Methods: This systematic review was conducted according to the recommendations in the Cochrane Handbook for Reviews of Interventions and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Eligible articles from the PubMed, ACM digital library, Medline, IEEE, Embase, Web of Science Collection, Scopus, ProQuest Dissertation, and Theses Global databases were identified through systematic searches. The following parameters were extracted from the eligible studies: author, year of publication, sample size, and relevant algorithms and/or software applied to anonymize SMHD, along with the summary of outcomes. Results: Among 1,804 initial hits, the present study considered 63 records including research articles, reviews, and books. Seventy five evaluated the anonymization of demographic data, 18 assessed diagnosis codes, and 3 assessed genomic data. One of the most common approaches was k-anonymity, which was utilized mainly for demographic data, often in combination with another algorithm; e.g., l-diversity. No approaches have yet been developed for protection against membership disclosure attacks on diagnosis codes. Conclusion: This study reviewed and categorized different anonymization approaches for MHD according to the anonymized data types (demographics, diagnosis codes, and genomic data). Further research is needed to develop more efficient algorithms for the anonymization of diagnosis codes and genomic data. The risk of reidentification can be minimized with adequate application of the addressed anonymization approaches. Systematic Review Registration: [http://www.crd.york.ac.uk/prospero], identifier [CRD42021228200].</p

    A Novel Privacy Disclosure Risk Measure and Optimizing Privacy Preserving Data Publishing Techniques

    Get PDF
    A tremendous amount of individual-level data is generated each day, with a wide variety of uses. This data often contains sensitive information about individuals, which can be disclosed by “adversaries”. Even when direct identifiers such as social security numbers are masked, an adversary may be able to recognize an individual\u27s identity for a data record by looking at the values of quasi-identifiers (QID), known as identity disclosure, or can uncover sensitive attributes (SA) about an individual through attribute disclosure. In data privacy field, multiple disclosure risk measures have been proposed. These share two drawbacks: they do not consider identity and attribute disclosure concurrently, and they make restrictive assumptions on an adversary\u27s knowledge and disclosure target by assuming certain attributes are QIDs and SAs with clear boundary in between. In this study, we present a Flexible Adversary Disclosure Risk (FADR) measure that addresses these limitations, by presenting a single combined metric of identity and attribute disclosure, and considering all scenarios for an adversary’s knowledge and disclosure targets while providing the flexibility to model a specific disclosure preference. In addition, we employ FADR measure to develop our novel “RU Generalization” algorithm that anonymizes a sensitive dataset to be able to publish the data for public access while preserving the privacy of individuals in the dataset. The challenge is to preserve privacy without incurring excessive information loss. Our RU Generalization algorithm is a greedy heuristic algorithm, which aims at minimizing the combination of both disclosure risk and information loss, to obtain an optimized anonymized dataset. We have conducted a set of experiments on a benchmark dataset from 1994 Census database, to evaluate both our FADR measure and RU Generalization algorithm. We have shown the robustness of our FADR measure and the effectiveness of our RU Generalization algorithm by comparing with the benchmark anonymization algorithm

    Privacy-Preserving Data Publication for Static and Streaming Data

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Aggregating privatized medical data for secure querying applications

    Full text link
    &nbsp;This thesis analyses and examines the challenges of aggregation of sensitive data and data querying on aggregated data at cloud server. This thesis also delineates applications of aggregation of sensitive medical data in several application scenarios, and tests privatization techniques to assist in improving the strength of privacy and utility

    Garantia de privacidade na exploração de bases de dados distribuídas

    Get PDF
    Anonymisation is currently one of the biggest challenges when sharing sensitive personal information. Its importance depends largely on the application domain, but when dealing with health information, this becomes a more serious issue. A simpler approach to avoid this disclosure is to ensure that all data that can be associated directly with an individual is removed from the original dataset. However, some studies have shown that simple anonymisation procedures can sometimes be reverted using specific patients’ characteristics, namely when the anonymisation is based on hidden key attributes. In this work, we propose a secure architecture to share information from distributed databases without compromising the subjects’ privacy. The work was initially focused on identifying techniques to link information between multiple data sources, in order to revert the anonymization procedures. In a second phase, we developed the methodology to perform queries over distributed databases was proposed. The architecture was validated using a standard data schema that is widely adopted in observational research studies.A garantia da anonimização de dados é atualmente um dos maiores desafios quando existe a necessidade de partilhar informações pessoais de carácter sensível. Apesar de ser um problema transversal a muitos domínios de aplicação, este torna-se mais crítico quando a anonimização envolve dados clinicos. Nestes casos, a abordagem mais comum para evitar a divulgação de dados, que possam ser associados diretamente a um indivíduo, consiste na remoção de atributos identificadores. No entanto, segundo a literatura, esta abordagem não oferece uma garantia total de anonimato, que pode ser quebrada através de ataques específicos que permitem a reidentificação dos sujeitos. Neste trabalho, é proposta uma arquitetura que permite partilhar dados armazenados em repositórios distribuídos, de forma segura e sem comprometer a privacidade. Numa primeira fase deste trabalho, foi feita uma análise de técnicas que permitam reverter os procedimentos de anonimização. Na fase seguinte, foi proposta uma metodologia que permite realizar pesquisas em bases de dados distribuídas, sem que o anonimato seja quebrado. Esta arquitetura foi validada sobre um esquema de base de dados relacional que é amplamente utilizado em estudos clínicos observacionais.Mestrado em Ciberseguranç

    Privacy Preserving Data-as-a-Service Mashups

    Get PDF
    Data-as-a-Service (DaaS) is a paradigm that provides data on demand to consumers across different cloud platforms over the Internet. Yet, a single DaaS provider may not be able to fulfill a data request. Consequently, the concept of DaaS mashup was introduced to enable DaaS providers to dynamically integrate their data on demand depending on consumers’ requests. Utilizing DaaS mashup, however, involves some challenges. Mashing up data from multiple sources to answer a consumer’s request might reveal sensitive information and thereby compromise the privacy of individuals. Moreover, data integration of arbitrary DaaS providers might not always be sufficient to answer incoming requests. In this thesis, we provide a cloud-based framework for privacy-preserving DaaS mashup that enables secure collaboration between DaaS providers for the purpose of generating an anonymous dataset to support data mining. We propose a greedy algorithm to determine a suitable group of DaaS providers whose data can satisfy a given request. Furthermore, our framework securely integrates the data from multiple DaaS providers while preserving the privacy of the resulting mashup data. Experiments on real-life data demonstrate that our DaaS mashup framework is scalable to large set of databases and it can efficiently and effectively satisfy the data privacy and data mining requirements specified by the DaaS providers and the data consumers
    corecore