
    Feature Based Data Anonymization for High Dimensional Data

    Information surges and advances in machine learning tools have enabled the collection and storage of large amounts of data, and these data are often high-dimensional. Individuals are deeply concerned about the consequences of sharing and publishing such data, since they may contain personal information and compromise privacy. Anonymization techniques have been widely used to protect sensitive information in published datasets. However, anonymizing high-dimensional data while balancing privacy and utility remains a challenge. In this paper we use feature selection with information gain and ranking to demonstrate that the challenge of high dimensionality can be addressed by concentrating anonymization on the less relevant features. We conduct experiments with real-life datasets and build classifiers on the anonymized datasets. Our results show that by combining feature selection with slicing, and by reducing the data distortion applied to highly relevant features, the utility of the anonymized dataset can be enhanced.
    Keywords: High Dimension, Privacy, Anonymization, Feature Selection, Classifier, Utility
    DOI: 10.7176/JIEA/9-2-03
    Publication date: April 30th 201
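The abstract does not spell out the ranking procedure. The following is a minimal sketch, assuming discrete (categorical) features, of how information gain IG(Y; X) = H(Y) - H(Y | X) can rank features so that the most relevant ones are distorted least. The function names and the toy data are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column X."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(rows, labels, feature_names):
    """Return (feature, IG) pairs sorted from most to least relevant."""
    scores = []
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        scores.append((name, information_gain(column, labels)))
    # sorted() is stable, so ties keep their original column order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: the class label depends only on "sex".
rows = [
    ("young", "M", "111"),
    ("young", "F", "111"),
    ("old", "M", "222"),
    ("old", "F", "222"),
]
labels = ["yes", "no", "yes", "no"]
ranking = rank_features(rows, labels, ["age", "sex", "zip"])
print(ranking)
```

In the spirit of the abstract, the top-ranked features would then receive the least distortion, while lower-ranked ones are generalized or suppressed more aggressively; the paper's exact cut-off between the two groups is not stated here.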

    Comparative Analysis of Privacy Preservation Mechanism: Assessing Trustworthy Cloud Services with a Hybrid Framework and Swarm Intelligence

    Cloud computing has emerged as a prominent field in modern computing, offering diverse services and resources. However, it also raises pressing concerns about data privacy and the trustworthiness of cloud service providers. Previous works have grappled with these challenges, but many fall short of providing comprehensive solutions. This research therefore proposes a novel framework for maintaining data privacy and fostering trust in cloud computing services. The primary objective is to develop a robust, integrated solution that safeguards sensitive data and enhances trust in cloud service providers. The proposed architecture comprises several key components: data collection and preprocessing with k-anonymity, trust generation using the Firefly Algorithm, Ant Colony Optimization for task scheduling and resource allocation, hybrid framework integration, and privacy-preserving computation. The scientific contribution of this work lies in integrating multiple optimization techniques, namely the Firefly Algorithm and Ant Colony Optimization, to select reliable cloud service providers while considering trust factors and task/resource allocation. Furthermore, the proposed framework ensures data privacy through k-anonymity compliance, dynamic resource allocation, and privacy-preserving computation techniques such as differential privacy and homomorphic encryption. The outcomes of this research provide a comprehensive solution to the complex challenges of data privacy and trust in cloud computing services. By combining these techniques into a hybrid framework, this work contributes to secure and effective cloud-based operations, addressing critical issues faced by organizations and individuals in an increasingly interconnected digital landscape.
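The abstract names differential privacy without saying which mechanism the framework uses. As an illustration only, the Laplace mechanism is one standard way to answer a numeric query (for example, a count over records held by a provider) with epsilon-differential privacy: add noise drawn from Laplace(0, sensitivity / epsilon), where sensitivity is the query's L1 sensitivity. The function names below are hypothetical; the noise is generated as the difference of two exponential draws, which has exactly the Laplace distribution and needs only the standard library.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale): the difference of two i.i.d. exponentials
    with mean `scale` follows exactly this distribution."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release a numeric query answer with epsilon-differential privacy.

    `sensitivity` is the query's L1 sensitivity: the most one individual's
    record can change the true answer (1 for a counting query).
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical use: a provider answers "how many records match?" (true count 100).
rng = random.Random(42)
noisy_count = laplace_mechanism(100, sensitivity=1, epsilon=0.5, rng=rng)
print(round(noisy_count, 2))
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy, which is the same privacy-utility trade-off discussed throughout this list.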

    Privacy-Preserving Reengineering of Model-View-Controller Application Architectures Using Linked Data

    When a legacy system’s software architecture cannot be redesigned, implementing additional privacy requirements is often complex, unreliable and costly to maintain. This paper presents a privacy-by-design approach to reengineer web applications so that they are linked-data-enabled and implement access control and privacy preservation properties. The method relies on knowledge of the application architecture, which for the Web of data is commonly based on the model-view-controller pattern. Whereas the wrapping techniques commonly used to link the data of web applications duplicate the security source code, the new approach allows controlled disclosure of an application’s data while preserving non-functional properties such as privacy preservation. The solution has been implemented and compared with existing linked data frameworks in terms of reliability, maintainability and complexity.
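The abstract does not describe the implementation. The following is a minimal, hypothetical sketch of the idea: instead of wrapping the application and duplicating its security code, the controller that already enforces access control decides which model properties are disclosed as linked data triples. The `Patient` model, the role-based `POLICY` and the `http://example.org/` namespace are all assumptions for illustration.

```python
from dataclasses import asdict, dataclass

BASE = "http://example.org/"  # hypothetical namespace, not from the paper


@dataclass
class Patient:
    """The 'model': a record the legacy web application already manages."""
    id: int
    name: str
    birth_year: int
    diagnosis: str


# Hypothetical role-based policy: the properties each role may see.
# In the approach described above, this would be the application's existing
# access-control logic rather than a copy of it.
POLICY = {
    "public": {"birth_year"},
    "physician": {"name", "birth_year", "diagnosis"},
}


def to_triples(record, allowed):
    """The 'view': render only the allowed properties as N-Triples lines.

    Literals are not escaped; a real exporter should use an RDF library.
    """
    subject = f"<{BASE}patient/{record.id}>"
    return [
        f'{subject} <{BASE}vocab#{prop}> "{value}" .'
        for prop, value in asdict(record).items()
        if prop in allowed
    ]


def controller(records, role):
    """The 'controller': applies one policy before any linked-data output."""
    allowed = POLICY.get(role, set())  # unknown roles see nothing
    return [triple for r in records for triple in to_triples(r, allowed)]


records = [Patient(1, "Alice", 1980, "asthma")]
for line in controller(records, "public"):
    print(line)
```

Because the policy lives in one place, the HTML views and the linked-data view cannot drift apart, which is the maintainability argument the abstract makes against wrapping.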

    A Survey and Experimental Study on Privacy-Preserving Trajectory Data Publishing

    Trajectory data have become ubiquitous and can benefit various real-world applications such as traffic management and location-based services. However, trajectories may disclose highly sensitive information about an individual, including mobility patterns, personal profiles and gazetteers, and social relationships, which makes privacy protection indispensable when releasing trajectory data. Ensuring privacy on trajectories demands more than hiding single locations: trajectories are intrinsically sparse and high-dimensional, and the multi-scale correlations within them must also be protected. To this end, extensive research has been conducted to design effective techniques for privacy-preserving trajectory data publishing. Protecting privacy also requires carefully balancing two metrics, privacy and utility: as much privacy as possible should be protected while the released trajectories remain useful for data analysis. In this survey, we provide a comprehensive study and a systematic summary of the protection models and the privacy and utility metrics for trajectories developed in the literature. We also conduct extensive experiments on two real-life public trajectory datasets to evaluate the performance of several representative privacy protection models, demonstrate the trade-off between privacy and utility, and guide the choice of the right privacy model for trajectory publishing given certain privacy and utility desiderata.
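The survey compares many protection models; the sketch below is not one of them specifically but a toy illustration of the privacy-utility trade-off it describes. It assumes planar coordinates, spatially generalizes every trajectory point to the centre of a grid cell (coarser cells hide more), and measures utility loss as the mean displacement of the released points. Both the grid scheme and the metric are assumptions chosen for brevity. As the abstract stresses, generalizing points independently does not by itself protect correlations across a trajectory.

```python
import math


def generalize(trajectory, cell_size):
    """Replace each (x, y) point by the centre of its grid cell."""
    return [
        ((math.floor(x / cell_size) + 0.5) * cell_size,
         (math.floor(y / cell_size) + 0.5) * cell_size)
        for x, y in trajectory
    ]


def mean_displacement(original, released):
    """Utility loss: average Euclidean distance between matching points."""
    return sum(math.dist(p, q) for p, q in zip(original, released)) / len(original)


# Hypothetical trajectory in planar coordinates (e.g. kilometres).
trajectory = [(0.2, 0.3), (1.7, 0.4), (2.1, 2.9)]
for cell in (1.0, 4.0):
    released = generalize(trajectory, cell)
    print(cell, released, round(mean_displacement(trajectory, released), 3))
```

With the larger cell all three points collapse into one location, so more individuals would share the same released path, but the utility loss grows accordingly.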

    Utility-Based Privacy Preserving Data Publishing

    Advances in data collection techniques and the need for automation have triggered the proliferation of huge amounts of data. This exponential increase in the collection of personal information has for some time represented a serious threat to privacy, and the problem is further fueled by advances in data storage, data mining, machine learning, social networking and cloud computing. Privacy is a fundamental right of every human being and needs to be preserved. As a counterbalance to these socio-technical transformations, most nations have both general policies on preserving privacy and specific legislation controlling access to and use of data. Privacy preserving data publishing is the ability to control the dissemination and use of one's personal information. Publishing (or sharing) original data in raw form results in identity disclosure through linkage attacks. To overcome linkage attacks, statistical disclosure control techniques are employed. One such approach is k-anonymity, which reduces the data across a set of key variables to a set of equivalence classes. In a k-anonymized dataset each record is indistinguishable from at least k-1 others, so an attacker cannot link the data records to population units with certainty, which reduces the probability of disclosure. Algorithms proposed to enforce k-anonymity include Samarati's algorithm and Sweeney's Datafly algorithm. Both adhere to full-domain generalization with global recoding. These methods involve a tradeoff between utility, computing time and information loss. A good privacy preserving technique should balance utility and privacy, giving good performance and an adequate level of uncertainty. In this thesis, we propose an improved greedy heuristic that maintains a balance between utility, privacy, computing time and information loss. Given a dataset and k, a k-anonymous version of the dataset can be constructed with the above-mentioned schemes. 
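The k-anonymity condition described above can be checked directly: group records by their key variables (quasi-identifiers) and require every equivalence class to hold at least k records. The sketch below assumes a table that has already been generalized; it does not implement Samarati's algorithm, Datafly or the thesis's greedy heuristic, and `achieved_k` only reports the largest k a given table satisfies, not the thesis's scheme for choosing the best k. Names and data are hypothetical.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Number of records in each equivalence class (same quasi-identifier values)."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class holds at least k records (records non-empty)."""
    return min(class_sizes(records, quasi_identifiers).values()) >= k


def achieved_k(records, quasi_identifiers):
    """Largest k for which this (already generalized) table is k-anonymous."""
    return min(class_sizes(records, quasi_identifiers).values())


# Hypothetical table after generalizing age ranges and ZIP-code suffixes.
table = [
    {"age": "20-29", "zip": "476**", "disease": "flu"},
    {"age": "20-29", "zip": "476**", "disease": "flu"},
    {"age": "20-29", "zip": "476**", "disease": "cancer"},
    {"age": "30-39", "zip": "479**", "disease": "flu"},
    {"age": "30-39", "zip": "479**", "disease": "cold"},
    {"age": "30-39", "zip": "479**", "disease": "cancer"},
]
QI = ("age", "zip")
print(achieved_k(table, QI), is_k_anonymous(table, QI, 3), is_k_anonymous(table, QI, 4))
```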
One of the challenges is to find the best value of k for a given dataset; in this thesis, a scheme is proposed to achieve the best value of k. The k-anonymity scheme suffers from the homogeneity attack, which led to the development of the l-diversity scheme. It requires each equivalence class to contain at least l well-represented values of the sensitive attribute. The l-diversity scheme in turn suffers from the background knowledge attack, and the t-closeness scheme was proposed to address this problem. The t-closeness principle states that the distance between the distribution of the sensitive attribute in an equivalence class and its distribution in the whole table should not exceed a threshold t. The drawback of this scheme is that the distance metric used in constructing a table that satisfies t-closeness does not follow the expected distance characteristics. In this thesis, we deploy an alternative distance metric, the Hellinger distance, for constructing a t-closeness table. The t-closeness scheme with this alternative metric performed better with respect to the discernibility metric and computing time. The k-anonymity, l-diversity and t-closeness schemes can be used to anonymize a dataset before publishing (releasing or sharing), generally in a static environment. Some data, however, need to be published in a dynamic environment; a social network is one such example. Anonymizing social networks poses great challenges, and the solutions suggested to date do not consider the utility of the data while anonymizing. In this thesis, we propose a novel scheme that anonymizes users according to their importance and takes utility into consideration. The importance of a node is determined by centrality and prestige measures, so that the utility and privacy of the users are balanced.
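As an illustration of the two refinements above, the sketch below checks distinct l-diversity (the simplest instantiation of "well-represented": at least l different sensitive values per class) and t-closeness with the Hellinger distance H(P, Q) = (1/√2) · sqrt(Σ_v (√P(v) - √Q(v))²), which lies in [0, 1]. It only verifies whether a given table satisfies the conditions; the thesis's construction algorithm is not described in the abstract. Function names, the parameter `ell` (standing for l) and the data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def distribution(values):
    """Empirical distribution of a list of sensitive values."""
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    support = set(p) | set(q)
    total = sum((math.sqrt(p.get(v, 0.0)) - math.sqrt(q.get(v, 0.0))) ** 2 for v in support)
    return math.sqrt(total) / math.sqrt(2)


def sensitive_by_class(records, quasi_identifiers, sensitive):
    """Group sensitive values by equivalence class."""
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in quasi_identifiers)].append(r[sensitive])
    return classes


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, ell):
    """Distinct l-diversity: at least `ell` different sensitive values per class."""
    classes = sensitive_by_class(records, quasi_identifiers, sensitive)
    return all(len(set(vals)) >= ell for vals in classes.values())


def is_t_close(records, quasi_identifiers, sensitive, t):
    """t-closeness using the Hellinger distance to the table-wide distribution."""
    overall = distribution([r[sensitive] for r in records])
    classes = sensitive_by_class(records, quasi_identifiers, sensitive)
    return all(hellinger(distribution(vals), overall) <= t for vals in classes.values())


# Hypothetical 3-anonymous table with "disease" as the sensitive attribute.
table = [
    {"age": "20-29", "zip": "476**", "disease": "flu"},
    {"age": "20-29", "zip": "476**", "disease": "flu"},
    {"age": "20-29", "zip": "476**", "disease": "cancer"},
    {"age": "30-39", "zip": "479**", "disease": "flu"},
    {"age": "30-39", "zip": "479**", "disease": "cold"},
    {"age": "30-39", "zip": "479**", "disease": "cancer"},
]
QI = ("age", "zip")
print(is_distinct_l_diverse(table, QI, "disease", 2), is_t_close(table, QI, "disease", 0.35))
```

Unlike some distribution distances, the Hellinger distance is a true metric on probability distributions and is cheap to compute, since it only needs the square roots of the class and table frequencies.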

    BETWEEN FOOTPRINTS: BALANCING ENVIRONMENTAL SUSTAINABILITY AND PRIVACY IN SMART TOURISM DESTINATIONS

    Data lies at the core of all smart tourism activities, as tourists engage in different and personalized tourism services before, during and after their travels or holidays. From these interactions, a digital data trail is seamlessly captured in a technology-embedded environment and then mined and harnessed in Smart Tourism Destinations (STD) to create enriched, high-value experiences, notably those related to eco-responsibility, while also granting destinations competitive advantages. At the same time, these technologies enable tourism destinations to optimize the use of natural resources and energy and to preserve natural spaces, in short, reducing the “ecological footprint” of tourism. However, this comes at a cost: an increased “data footprint”. The perceived enjoyment of experiences must therefore be considered within the legal framework of privacy and data protection, by exposing the inherent risks and analysing the answers provided by the GDPR, the General Data Protection Regulation of the European Union. Hence the purpose of this paper is: i. to identify the specific features of Smart Tourism Destinations; ii. to show how the principles of personal data protection, as set forth by the GDPR, apply within the STD realm; and iii. to derive the potential legal implications of this ecosystem. Our approach is based on a legal analysis grounded in scholarly research. We mainly highlight the underestimation of the legal implications of technology-enhanced tourism experiences, and the marginalization of both the informed involvement and the awareness of individuals in these processes. 
This study is novel in having undertaken an initial exploration of the legal implications of experiences taking place in STDs.