54 research outputs found

    z-anonymity: Zero-Delay Anonymization for Data Streams

    With the advent of big data and the birth of data markets that sell personal information, individuals' privacy is of utmost importance. The classical response is anonymization, i.e., sanitizing the information that can directly or indirectly allow users' re-identification. The most popular solution in the literature is k-anonymity. However, it is hard to achieve k-anonymity on a continuous stream of data, as well as when the number of dimensions becomes high. In this paper, we propose a novel anonymization property called z-anonymity. Unlike k-anonymity, it can be achieved with zero delay on data streams and is well suited for high-dimensional data. The idea at the base of z-anonymity is to release an attribute (an atomic piece of information) about a user only if at least z - 1 other users have presented the same attribute in a past time window. z-anonymity is weaker than k-anonymity, since it does not work on combinations of attributes but treats them individually. In this paper, we present a probabilistic framework to map z-anonymity into the k-anonymity property. Our results show that a proper choice of the z-anonymity parameters allows the data curator to likely obtain a k-anonymized dataset, with a precisely measurable probability. We also evaluate a real use case, in which we consider the website visits of a population of users, and show that z-anonymity can work in practice for obtaining k-anonymity too.
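The release rule described above can be sketched as a streaming filter. This is a minimal illustration, not the paper's implementation: it assumes each observation is a (timestamp, user, attribute) triple arriving in non-decreasing time order, and that the "past time window" is a fixed duration `delta_t`. The class `ZAnonymizer` and its parameters are hypothetical names.

```python
from collections import defaultdict, deque


class ZAnonymizer:
    """Hypothetical zero-delay z-anonymity filter (illustrative sketch only).

    An observation (timestamp, user, attribute) is released immediately only if
    at least z - 1 *other* users presented the same attribute within the last
    delta_t time units; otherwise it is suppressed.
    """

    def __init__(self, z: int, delta_t: float) -> None:
        self.z = z
        self.delta_t = delta_t
        # attribute -> deque of (timestamp, user), oldest first
        self.history = defaultdict(deque)

    def _expire(self, attribute: str, now: float) -> None:
        # Drop observations that fell out of the past time window.
        window = self.history[attribute]
        while window and now - window[0][0] > self.delta_t:
            window.popleft()

    def process(self, timestamp: float, user: str, attribute: str) -> bool:
        """Return True if the attribute can be released for this user now."""
        self._expire(attribute, timestamp)
        window = self.history[attribute]
        other_users = {u for _, u in window if u != user}
        # Record the observation whether or not it is released, so that it
        # counts towards future decisions for other users.
        window.append((timestamp, user))
        return len(other_users) >= self.z - 1


if __name__ == "__main__":
    anonymizer = ZAnonymizer(z=3, delta_t=60)
    stream = [(0, "alice", "site-a"), (10, "bob", "site-a"),
              (20, "carol", "site-a"), (200, "dave", "site-a")]
    print([anonymizer.process(*obs) for obs in stream])  # [False, False, True, False]
```

The decision uses only past observations, so nothing is buffered: this is what makes the property zero-delay, at the cost of judging each attribute in isolation rather than the combination of attributes a user exposes.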

    Cloud based privacy preserving data mining model using hybrid k-anonymity and partial homomorphic encryption

    The evolution of information and communication technologies has encouraged numerous organizations to outsource their business and data to cloud computing to perform data mining and other data processing operations. Despite the great benefits of the cloud, it poses a real problem for the security and privacy of data. Many studies have shown that attackers often reveal information from third-party services or third-party clouds. When data owners outsource their data to the cloud, especially under the SaaS cloud model, it is difficult to preserve the confidentiality and integrity of the data. Privacy-Preserving Data Mining (PPDM) aims to accomplish data mining operations while protecting the owner's data from violation. The current PPDM models have some limitations: they suffer from data disclosure caused by identity and attribute disclosure, where some private information is revealed, enabling different types of attacks to succeed. Besides, existing solutions have poor data utility and high computational overhead. Therefore, this research aims to design and develop a Hybrid Anonymization Cryptography PPDM (HAC-PPDM) model that improves the privacy-preserving level by reducing data disclosure before data are outsourced for mining over the cloud, while maintaining data utility. The proposed HAC-PPDM model further aims to reduce the computational overhead to improve efficiency. The Quasi-Identifiers Recognition (QIR) algorithm is defined and designed based on attribute classification and determination of the quasi-identifier dimension, to overcome the identity disclosure caused by quasi-identifier linking and thereby reduce privacy leakage. An Enhanced Homomorphic Scheme is designed by hybridizing the Cloud-RSA encryption scheme, the Extended Euclidean algorithm (EE), the Fast Modular Exponentiation algorithm (FME), and the Chinese Remainder Theorem (CRT) to minimize the computational time complexity while reducing attribute disclosure. 
The proposed QIR algorithm, the Enhanced Homomorphic Scheme and the k-anonymity privacy model have been hybridized to obtain optimal data privacy preservation before the data are outsourced to the cloud, while maintaining a data utility that meets the needs of mining with good efficiency. Real-world datasets have been used to evaluate the proposed algorithms and model. The experimental results show that the proposed QIR algorithm improved the data privacy-preserving percentage by 23% while maintaining the same or slightly better data utility. Meanwhile, the proposed Enhanced Homomorphic Scheme is more efficient than related works in terms of time complexity, as expressed in Big O notation. Moreover, it reduced the encryption, decryption, and key generation times. Finally, the proposed HAC-PPDM model successfully reduced data disclosure and improved the privacy-preserving level while preserving data utility, as it reduced information loss. In short, it improved privacy preservation and data mining (classification) accuracy by 7.59% and 0.11%, respectively.
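The building blocks named above (Extended Euclidean algorithm, fast modular exponentiation, CRT) are the standard ingredients of RSA key generation and decryption. The sketch below shows how they fit together in textbook RSA with CRT decryption and demonstrates RSA's multiplicative homomorphism. It is an assumption-laden illustration, not the Cloud-RSA or Enhanced Homomorphic Scheme from the abstract: toy primes, no padding, and all function names are hypothetical.

```python
from math import gcd
from typing import Tuple


def extended_euclid(a: int, b: int) -> Tuple[int, int, int]:
    """Extended Euclidean algorithm: return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x1, y1 = extended_euclid(b, a % b)
    return g, y1, x1 - (a // b) * y1


def mod_inverse(a: int, m: int) -> int:
    g, x, _ = extended_euclid(a, m)
    if g != 1:
        raise ValueError("value has no inverse modulo m")
    return x % m


def fast_mod_exp(base: int, exp: int, mod: int) -> int:
    """Square-and-multiply modular exponentiation."""
    result, base = 1, base % mod
    while exp > 0:
        if exp & 1:
            result = (result * base) % mod
        base = (base * base) % mod
        exp >>= 1
    return result


def keygen(p: int, q: int, e: int = 65537):
    """Textbook RSA key pair with CRT parameters (toy primes; no padding)."""
    n, phi = p * q, (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to phi(n)")
    d = mod_inverse(e, phi)
    # CRT precomputation: exponents reduced mod p-1 and q-1, and q^-1 mod p.
    private = (p, q, d % (p - 1), d % (q - 1), mod_inverse(q, p))
    return (n, e), private


def encrypt(public, m: int) -> int:
    n, e = public
    return fast_mod_exp(m, e, n)


def decrypt_crt(private, c: int) -> int:
    """Decrypt with two half-size exponentiations recombined by Garner's formula."""
    p, q, dp, dq, q_inv = private
    m_p = fast_mod_exp(c, dp, p)
    m_q = fast_mod_exp(c, dq, q)
    h = (q_inv * (m_p - m_q)) % p
    return m_q + h * q


if __name__ == "__main__":
    public, private = keygen(1009, 1013)
    n = public[0]
    c1, c2 = encrypt(public, 12), encrypt(public, 34)
    # Multiplicative homomorphism: Enc(a) * Enc(b) mod n decrypts to a * b mod n.
    print(decrypt_crt(private, (c1 * c2) % n))  # 408
```

CRT decryption replaces one exponentiation modulo n with two modulo the half-size primes, which is the usual source of the speed-up that such schemes exploit.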

    Privacy Preserving Attribute-Focused Anonymization Scheme for Healthcare Data Publishing

    Advancements in Industry 4.0 brought tremendous improvements to the healthcare sector, such as better quality of treatment, enhanced communication, remote monitoring, and reduced cost. Sharing healthcare data with healthcare providers is crucial for harnessing the benefits of such improvements. In general, healthcare data holds sensitive information about individuals; hence, sharing such data is challenging because of various security and privacy issues. According to privacy regulations and ethical requirements, it is essential to preserve patients' privacy before sharing data for medical research. State-of-the-art privacy-preserving studies either use cryptographic approaches to protect privacy or apply anonymization techniques regardless of the attribute type, which results in poor protection and data utility. In this paper, we propose an attribute-focused privacy-preserving data publishing scheme. The proposed scheme is two-fold, comprising a fixed-interval approach to protect numerical attributes and an improved l-diverse slicing approach to protect categorical and sensitive attributes. In the fixed-interval approach, the original values of the healthcare data are replaced with an equivalent computed value. The improved l-diverse slicing approach partitions the data both horizontally and vertically to avoid privacy leaks. Extensive experiments with real-world datasets are conducted to evaluate the performance of the proposed scheme. Classification models built on the anonymized dataset yield approximately 13% better accuracy than benchmarked algorithms. Experimental analyses show that the average information loss, measured by the normalized certainty penalty (NCP), is reduced by 12% compared to similar approaches. The attribute-focused scheme not only preserves data utility but also protects the data from membership, attribute, and identity disclosure.
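A fixed-interval replacement and its NCP cost can be sketched as follows. The abstract does not say how the "equivalent computed value" is obtained, so this sketch assumes each value is replaced by the mean of the original values falling in its fixed-width interval, and it scores a record with the common numeric NCP: the span of its group divided by the attribute's overall range. The function name and the sample ages are hypothetical.

```python
import math
from collections import defaultdict
from typing import List, Tuple


def fixed_interval_anonymize(values: List[float], width: float) -> Tuple[List[float], float]:
    """Replace each value by the mean of its fixed-width interval; return (values, average NCP).

    Hypothetical sketch: intervals are [i*width, (i+1)*width), and the NCP of a
    record is the span of its interval group divided by the attribute's range.
    """
    groups = defaultdict(list)
    for v in values:
        groups[math.floor(v / width)].append(v)
    domain = max(values) - min(values)
    anonymized, penalty = [], 0.0
    for v in values:
        group = groups[math.floor(v / width)]
        anonymized.append(sum(group) / len(group))
        if domain > 0:
            penalty += (max(group) - min(group)) / domain
    return anonymized, penalty / len(values)


if __name__ == "__main__":
    ages = [21, 24, 29, 35, 38]
    anonymized, ncp = fixed_interval_anonymize(ages, width=10)
    print(anonymized)
    print(round(ncp, 4))  # 0.3529
```

A wider interval hides more values behind one replacement but raises the NCP, which is the utility/privacy trade-off the reported 12% NCP reduction refers to.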

    Practical anonymization for data streams: z-anonymity and relation with k-anonymity

    With the advent of big data and the emergence of data markets, preserving individuals’ privacy has become of utmost importance. The classical response to this need is anonymization, i.e., sanitizing the information that, directly or indirectly, can allow users’ re-identification. Among the various approaches, k-anonymity provides simple and easy-to-understand protection. However, k-anonymity is challenging to achieve on a continuous stream of data and scales poorly when the number of attributes becomes high. In this paper, we study a novel anonymization property called z-anonymity, which we explicitly design to deal with data streams, i.e., where the decision to publish a given attribute (atomic information) is made in real time. The idea at the base of z-anonymity is to release such an attribute about a user only if at least z - 1 other users have exposed the same attribute in a past time window. Depending on the value of z, the output stream is k-anonymized with a certain probability. To this end, we present a probabilistic model to map z-anonymity into the k-anonymity property. The model is not only helpful in studying the z-anonymity property, but is also general enough to evaluate the probability of achieving k-anonymity in data streams, making it a generic contribution.
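To make the link between the two properties concrete, one can check empirically how much of a released stream ends up k-anonymous. This is a simplified stand-in for the paper's probabilistic model, under the assumption that a user is k-anonymous when the exact set of attributes released about them in the observation window is shared by at least k users. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple


def released_profiles(events: Iterable[Tuple[str, str]]) -> Dict[str, FrozenSet[str]]:
    """Collect, per user, the set of attributes released in the observation window."""
    profiles: Dict[str, set] = {}
    for user, attribute in events:
        profiles.setdefault(user, set()).add(attribute)
    return {user: frozenset(attrs) for user, attrs in profiles.items()}


def fraction_k_anonymous(profiles: Dict[str, FrozenSet[str]], k: int) -> float:
    """Share of users whose released attribute set is identical for at least k users."""
    if not profiles:
        return 1.0
    class_size = Counter(profiles.values())
    hidden = sum(1 for p in profiles.values() if class_size[p] >= k)
    return hidden / len(profiles)


def is_k_anonymous(profiles: Dict[str, FrozenSet[str]], k: int) -> bool:
    return fraction_k_anonymous(profiles, k) == 1.0


if __name__ == "__main__":
    events = [("u1", "news"), ("u1", "sport"), ("u2", "news"),
              ("u2", "sport"), ("u3", "news")]
    profiles = released_profiles(events)
    print(is_k_anonymous(profiles, 2))  # False: u3's profile {news} is unique
```

Feeding the output of a z-anonymity filter into `fraction_k_anonymous` gives an empirical estimate of the probability that the paper's model predicts analytically.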

    A Taxonomy for and Analysis of Anonymous Communications Networks

    Any entity operating in cyberspace is susceptible to debilitating attacks. With cyber attacks intended to gather intelligence and disrupt communications rapidly replacing the threat of conventional and nuclear attacks, a new age of warfare is at hand. In 2003, the United States acknowledged that the speed and anonymity of cyber attacks make distinguishing among the actions of terrorists, criminals, and nation states difficult. Even President Obama’s Cybersecurity Chief-elect recognizes the challenge of increasingly sophisticated cyber attacks. Now through April 2009, the White House is reviewing federal cyber initiatives to protect US citizen privacy rights. Indeed, the rising quantity and ubiquity of new surveillance technologies in cyberspace enable instant, undetectable, and unsolicited collection of information about entities. Hence, anonymity and privacy are becoming increasingly important issues. Anonymization enables entities to protect their data and systems from a diverse set of cyber attacks and preserves privacy. This research provides a systematic analysis of anonymity degradation, preservation and elimination in cyberspace to enhance the security of information assets. This includes discovering the identities and actions of potential adversaries and obfuscating one's own identities and actions from them. First, novel taxonomies are developed for classifying and comparing well-established anonymous networking protocols. These expand the classical definition of anonymity and capture the relationships within the peer-to-peer and mobile ad hoc anonymous protocol families. Second, a unique synthesis of state-of-the-art anonymity metrics is provided. This significantly aids an entity’s ability to reliably measure changing anonymity levels, thereby increasing its ability to defend against cyber attacks. Finally, a novel epistemic-based mathematical model is created to characterize how an adversary reasons with knowledge to degrade anonymity. 
This offers multiple anonymity property representations and well-defined logical proofs to ensure the accuracy and correctness of current and future anonymous network protocol design.
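As an example of the kind of metric such a synthesis covers, the widely cited entropy-based measures can be computed from the adversary's probability distribution over the anonymity set: the normalized "degree of anonymity" H(X)/log2(N) and the "effective anonymity set size" 2^H(X). The sketch below is one illustrative metric chosen here, not the thesis's own synthesis; the function names are hypothetical.

```python
import math
from typing import Sequence


def entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy (bits) of the adversary's distribution over the anonymity set."""
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


def degree_of_anonymity(probs: Sequence[float]) -> float:
    """Normalized entropy H(X) / log2(N): 1.0 = uniform suspicion, 0.0 = identified."""
    n = len(probs)
    if n <= 1:
        return 0.0
    return entropy_bits(probs) / math.log2(n)


def effective_anonymity_set_size(probs: Sequence[float]) -> float:
    """2 ** H(X): size of a uniform anonymity set with the same entropy."""
    return 2 ** entropy_bits(probs)


if __name__ == "__main__":
    before = [0.25, 0.25, 0.25, 0.25]  # adversary has no information
    after = [0.7, 0.1, 0.1, 0.1]       # an observation skews suspicion
    print(degree_of_anonymity(before))  # 1.0
    print(degree_of_anonymity(after), effective_anonymity_set_size(after))
```

Tracking these values before and after each adversary observation is one way to quantify the "changing anonymity levels" the abstract refers to.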

    On the cyber security issues of the internet infrastructure

    The Internet has received considerable attention from the research community. At first glance, network optimization and scalability issues dominate the efforts of researchers and vendors. Many results have been obtained in the last decades: the Internet’s architecture is optimized to be cheap, robust and ubiquitous. In contrast, the network has never been perfectly secure. Throughout its evolution, the security threats of the Internet have persisted as a transversal and endless topic. Nowadays, the Internet hosts a multitude of mission-critical activities, such as electronic voting systems and financial services. Governmental institutions, financial and business organizations depend on the performance and the security of the Internet. This role makes the Internet a critical infrastructure. At the same time, the Internet is a vector of malicious activities, like Denial of Service attacks; many reports of attacks can be found in both academic outcomes and daily news. To mitigate this wide range of issues, many research efforts have been carried out in the past decades; unfortunately, the complex architecture and the scale of the Internet make the evaluation and adoption of such proposals hard. To improve the security of the Internet, the research community can benefit from sharing real network data. Unfortunately, privacy and security concerns inhibit the release of these data: it suffices to imagine the large amount of private information (e.g., political preferences or religious beliefs) that can be obtained by reading the Internet packets exchanged between users and web services. This scenario motivates my research and represents the context of this dissertation, which contributes to the analysis of the security issues of the Internet infrastructure and describes relevant security proposals. 
In particular, the main outcomes described in this dissertation are:
• the definition of a secure routing protocol for the Internet able to provide cryptographic guarantees against false route announcements and invalid path attacks;
• the definition of a new obfuscation technique that allows the research community to publicly release real network flows with formal guarantees of security and privacy;
• the evidence of a new kind of leakage of sensitive information, obtained by attacking the models used by various machine learning algorithms.

    Secure and Efficient Comparisons between Untrusted Parties

    A vast number of online services are based on users contributing their personal information. Examples are manifold, including social networks, electronic commerce, sharing websites, lodging platforms, and genealogy. In all cases, user privacy depends on collective trust in all involved intermediaries, like service providers, operators, administrators or even help desk staff. A single adversarial party in the whole chain of trust voids user privacy. Moreover, the number of intermediaries is ever growing. Thus, user privacy must be preserved at all times and stages, independent of the intrinsic goals of any involved party. Furthermore, alongside these new services, traditional offline analytic systems are being replaced by online services run in large data centers. Centralized processing of electronic medical records, genomic data or other health-related information is anticipated due to advances in medical research, better analytic results based on large amounts of medical information, and lowered costs. In these scenarios, privacy is of utmost concern due to the large amount of personal information contained within the centralized data. We focus on the challenge of privacy-preserving processing of genomic data, specifically comparing genomic sequences. The problem that arises is how to efficiently compare the private sequences of two parties while preserving the confidentiality of the compared data. It follows that the privacy of the data owner must be preserved, which means that as little information as possible must be leaked to any party participating in the comparison. Leakage can happen at several points during a comparison. The secured inputs for the comparing party might leak some information about the original input, or the output might leak information about the inputs. In the latter case, the results of several comparisons can be combined to infer information about the confidential input of the party under observation. 
Genomic sequences serve as a use case, but the proposed solutions are more general and can be applied to the generic field of privacy-preserving sequence comparison. The solution should be efficient, such that performing a comparison yields runtimes linear in the length of the input sequences and thus acceptable costs for a typical use case. To tackle the problem of efficient, privacy-preserving sequence comparison, we propose a framework consisting of three main parts. a) The basic protocol presents an efficient sequence comparison algorithm, which transforms a sequence into a set representation, so that distance measures over input sequences can be approximated using distance measures over sets. The sets are then represented by an efficient data structure, the Bloom filter, which allows certain set operations to be evaluated without storing the actual elements of the possibly large set. This representation yields low distortion when comparing similar sequences. Operations on the set representation are carried out using efficient, partially homomorphic cryptographic systems to keep the inputs confidential. The output can be adjusted to return either the actual approximated distance or the result of an in-range check of the approximated distance. b) Building upon this efficient basic protocol, we introduce the first mechanism to reduce the success of inference attacks by detecting and rejecting similar queries in a privacy-preserving way. This is achieved by generating generalized commitments for inputs. The generalization is done by treating inputs as messages received from a noisy channel, to which error correction from coding theory is applied. In this way, similar inputs are defined as inputs whose generalized versions have a Hamming distance below a certain predefined threshold. We present a protocol to perform a zero-knowledge proof to assess whether the generalized input is indeed a generalization of the actual input. 
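The set representation of part a) can be sketched in plaintext: a sequence becomes its set of overlapping q-grams, the set is encoded as a Bloom filter, and similarity between sequences is approximated from the filters alone. This is a minimal illustration under stated assumptions: q = 3, a 256-bit filter with three salted SHA-256 hashes, and the Dice coefficient as the set measure are choices made here, not parameters from the thesis, and the partially homomorphic encryption layer is omitted entirely. Function names are hypothetical.

```python
import hashlib
from typing import List, Set


def qgrams(sequence: str, q: int = 3) -> Set[str]:
    """Set representation of a sequence: all overlapping substrings of length q."""
    return {sequence[i:i + q] for i in range(len(sequence) - q + 1)}


def bloom_filter(items: Set[str], m: int = 256, k: int = 3) -> List[int]:
    """Encode a set as an m-bit Bloom filter using k salted SHA-256 hashes."""
    bits = [0] * m
    for item in items:
        for j in range(k):
            digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def dice_similarity(a: List[int], b: List[int]) -> float:
    """Dice coefficient over set bits, approximating the overlap of the q-gram sets."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 1.0


def hamming_distance(a: List[int], b: List[int]) -> int:
    """Number of differing filter bits, a distance usable for threshold checks."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    ref = bloom_filter(qgrams("ACGTACGTTAGC"))
    near = bloom_filter(qgrams("ACGTACGTTAGG"))  # one substitution
    far = bloom_filter(qgrams("TTTTTTTTTTTT"))
    print(dice_similarity(ref, near), dice_similarity(ref, far))
```

Both measures only need bitwise AND, sums and comparisons over the filters, which is what makes them amenable to evaluation under additively homomorphic encryption; a single substitution changes only the few q-grams that overlap it, so similar sequences keep most of their bits in common.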
Furthermore, we generalize a very efficient inference attack on privacy-preserving sequence comparison protocols and use it to evaluate our inference-control mechanism. c) The third part of the framework lightens the computational load of the client taking part in the comparison protocol by presenting a compression mechanism for partially homomorphic cryptographic schemes. It reduces the transmission and storage overhead induced by semantically secure homomorphic encryption schemes, as well as encryption latency. The compression is achieved by constructing an asymmetric stream cipher such that the generated ciphertext can be converted into a ciphertext of an associated homomorphic encryption scheme without revealing any information about the plaintext. This is the first compression scheme available for partially homomorphic encryption schemes. Compression schemes for fully homomorphic encryption are several orders of magnitude slower at converting the transmission ciphertext into the homomorphically encrypted ciphertext. Indeed, our compression scheme achieves optimal conversion performance. It further allows keystreams to be generated offline and thus supports offloading to trusted devices. In this way, transmission, storage and power efficiency are improved. We give security proofs for all relevant parts of the proposed protocols and algorithms. A performance evaluation of the core components, including a theoretical analysis and practical experiments on the accuracy and efficiency of the approximations and probabilistic algorithms, demonstrates the practicability of our proposed solutions. Several variations and configurations for detecting similar inputs are studied in an in-depth discussion of the inference-control mechanism. A human mitochondrial genome database is used for the practical evaluation, to compare genomic sequences and detect similar inputs as described by the use case. 
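The conversion idea in part c) can be illustrated with a simplified "transciphering" sketch over Paillier, a standard additively homomorphic scheme. This is a stand-in chosen here to show the principle, not the thesis's asymmetric stream cipher: the client generates a keystream element k offline and hands the server Enc(k) in advance; online it sends only t = (m + k) mod n, an |n|-bit value instead of a 2|n|-bit Paillier ciphertext; the server turns t into Enc(m) homomorphically without learning m. Toy primes are used and all names are hypothetical.

```python
import math
import secrets
from typing import Tuple


def paillier_keygen(p: int, q: int):
    """Toy Paillier key pair with g = n + 1 (insecure parameter sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    n2 = n * n
    # With g = n + 1, L(g^lam mod n^2) = lam mod n, so mu = lam^-1 mod n.
    mu = pow((pow(n + 1, lam, n2) - 1) // n, -1, n)
    return n, (n, lam, mu)


def paillier_encrypt(n: int, m: int, r: int = 0) -> int:
    """Enc(m) = g^m * r^n mod n^2; r = 0 means 'draw fresh randomness'."""
    n2 = n * n
    if r == 0:
        while True:
            r = secrets.randbelow(n - 1) + 1
            if math.gcd(r, n) == 1:
                break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def paillier_decrypt(private, c: int) -> int:
    n, lam, mu = private
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def offline_keystream(n: int) -> Tuple[int, int]:
    """Client, offline: pick keystream element k and precompute Enc(k) for the server."""
    k = secrets.randbelow(n)
    return k, paillier_encrypt(n, k)


def compress(m: int, k: int, n: int) -> int:
    """Client, online: one-time pad in Z_n, half the size of a Paillier ciphertext."""
    return (m + k) % n


def expand(n: int, t: int, enc_k: int) -> int:
    """Server: Enc(t) * Enc(k)^-1 = Enc(t - k) = Enc(m), never seeing m."""
    n2 = n * n
    enc_t = paillier_encrypt(n, t, r=1)  # t is already public, so no randomness needed
    return (enc_t * pow(enc_k, -1, n2)) % n2


if __name__ == "__main__":
    n, private = paillier_keygen(293, 433)
    k, enc_k = offline_keystream(n)
    t = compress(42, k, n)
    print(paillier_decrypt(private, expand(n, t, enc_k)))  # 42
```

The online cost for the client is one modular addition, while the expensive encryption of k happens offline, which mirrors the offloading benefit described in the abstract.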
In summary, we show that it is indeed possible to construct an efficient and privacy-preserving (genomic) sequence comparison while controlling the amount of information that leaves the comparison. To the best of our knowledge, we also contribute to the field by proposing the first efficient privacy-preserving inference detection and control mechanism, as well as the first ciphertext compression system for partially homomorphic cryptographic systems.

    Privacy in RFID and Mobile Objects

    RFID systems allow the fast and automatic identification of RFID tags through a wireless communication channel. These tags are devices with some computing power and information storage capacity. For this reason, objects carrying an attached RFID tag allow the reading of a rich and varied amount of data that describe and characterize them, for example, a unique identification code, the name, the model or the expiration date. In addition, this information can be read without line of sight between the reader and the tag, which considerably speeds up inventory, identification, and automatic control processes. For RFID technology to become widespread successfully, several goals should be met: efficiency, security and privacy protection. However, designing secure, private, and scalable identification protocols is a difficult challenge, given the computational constraints of RFID tags and their wireless nature. That is why, in this thesis, we start from secure and private identification protocols and show how scalability can be achieved through a distributed and collaborative architecture. In this way, security and privacy are provided by the identification protocol itself, while scalability is achieved by means of novel collaborative methods that take into account the spatial and temporal position of RFID tags. Regardless of the advances in wireless identification protocols, there are attacks that can successfully defeat any of these protocols without needing to know or discover valid secret keys or to find vulnerabilities in their cryptographic implementations.
The idea of these attacks, known as relay attacks, is to create, unnoticed, a communication bridge between a legitimate tag and a legitimate reader. In this way, the adversary uses the rights of the legitimate tag to pass the authentication protocol used by the reader. Note that, given the wireless nature of RFID protocols, this type of attack represents a major threat to the security of RFID systems. In this thesis, we propose a new protocol that, in addition to authentication, checks the distance between the reader and the tag. Such protocols are known as distance-bounding protocols; they do not prevent this type of attack, but they can thwart it with high probability. Finally, we address the privacy problems associated with publishing information collected through RFID systems. In particular, we focus on mobility data, which can also be provided by other widely used systems such as the Global Positioning System (GPS) and the Global System for Mobile Communications. Our solution is based on the well-known notion of k-anonymity, achieved through permutations and microaggregation. To this end, we define a novel distance function between trajectories, with which we develop two different trajectory anonymization methods.
Radio Frequency Identification (RFID) is a technology aimed at efficiently identifying and tracking goods and assets. Such identification may be performed without requiring line-of-sight alignment or physical contact between the RFID tag and the RFID reader, whilst tracking is naturally achieved due to the short interrogation field of RFID readers. That is why the reduction in the price of RFID tags has been accompanied by increasing attention paid to this technology. However, since tags are resource-constrained devices sending identification data wirelessly, designing secure and private RFID identification protocols is a challenging task. This scenario is even more complex when those protocols must also be scalable. Assuming the existence of a lightweight, secure, private and scalable RFID identification protocol, there are other concerns surrounding RFID technology. Some of them arise from the technology itself, such as distance checking, but others are related to the potential of RFID systems to gather huge amounts of tracking data. 
Publishing and mining such moving-object data is essential to improve the efficiency of supervisory control, asset management and localisation, transportation, etc. However, obvious privacy threats arise if an individual can be linked with some of those published trajectories. The present dissertation contributes to the design of algorithms and protocols aimed at dealing with the issues explained above. First, we propose a set of protocols and heuristics based on a distributed architecture that improve the efficiency of the identification process without compromising privacy or security. Moreover, we present a novel graph-based distance-bounding protocol with extremely low resource consumption. Finally, we present two trajectory anonymisation methods aimed at preserving individuals' privacy when their trajectories are released.
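The distance-bounding principle described above can be sketched with the classic Hancke-Kuhn style rapid bit exchange, used here as a well-known stand-in rather than the thesis's graph-based protocol. Reader and tag derive two response registers from a shared key and exchanged nonces; in each timed round the reader sends a random challenge bit and accepts only a correct response that arrives within a round-trip bound. Timing is simulated through a caller-supplied function, HMAC-SHA256 is an assumed choice of PRF, and all names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Callable, List, Tuple


def response_registers(key: bytes, nonce_v: bytes, nonce_p: bytes,
                       n: int) -> Tuple[List[int], List[int]]:
    """Derive two n-bit registers R0, R1 from HMAC-SHA256(key, nonce_v || nonce_p)."""
    if not 0 < n <= 128:
        raise ValueError("one SHA-256 output provides at most 2 * 128 bits")
    stream = hmac.new(key, nonce_v + nonce_p, hashlib.sha256).digest()
    bits = [(stream[i // 8] >> (i % 8)) & 1 for i in range(2 * n)]
    return bits[:n], bits[n:]


def distance_bounding(verifier_key: bytes, prover_key: bytes, n: int,
                      round_trip_ns: Callable[[int], float], max_rtt_ns: float) -> bool:
    """Rapid bit exchange: every round must be answered correctly and fast enough."""
    nonce_v, nonce_p = secrets.token_bytes(8), secrets.token_bytes(8)  # slow phase
    expected = response_registers(verifier_key, nonce_v, nonce_p, n)
    prover = response_registers(prover_key, nonce_v, nonce_p, n)
    for i in range(n):  # rapid phase: one random challenge bit per round
        challenge = secrets.randbelow(2)
        answer = prover[challenge][i]
        if answer != expected[challenge][i] or round_trip_ns(i) > max_rtt_ns:
            return False
    return True


if __name__ == "__main__":
    key = secrets.token_bytes(16)
    # A tag about 1 m away: roughly 6.7 ns round trip at the speed of light.
    print(distance_bounding(key, key, 64, lambda i: 7.0, max_rtt_ns=10.0))  # True
    # A relay adds forwarding delay to every round and is rejected.
    print(distance_bounding(key, key, 64, lambda i: 57.0, max_rtt_ns=10.0))  # False
```

The relay cannot shorten the propagation time it adds, so even a correct answer forwarded from the legitimate tag fails the timing check; this is why such protocols frustrate relay attacks with high probability rather than preventing them outright.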

    Protection of data privacy based on artificial intelligence in Cyber-Physical Systems

    With the rapid evolution of cyber attack techniques, the security and privacy of Cyber-Physical Systems (CPSs) have become key challenges. CPS environments have several properties that make them uniquely difficult to secure compared with the techniques and processes that have evolved for traditional IT networks and platforms. CPS ecosystems are composed of heterogeneous systems, each with a long lifespan. They use a multitude of operating systems and communication protocols and are often designed without security in mind. From a privacy perspective, there are additional challenges. It is hard to capture and filter the heterogeneous data sources of CPSs, especially power systems, as their data include both network traffic and sensor readings. Protecting such data during collection, analysis and publication still leaves open the possibility of new cyber threats disrupting the operational loops of power systems. Moreover, while the original data of CPSs are being protected, identifying cyberattacks requires intrusion detection, which tends to produce high false alarm rates. This thesis contributes significantly to protecting heterogeneous data sources while achieving high performance in discovering cyber attacks in CPSs, especially smart power networks (i.e., power systems and their networks). To achieve high data privacy, innovative privacy-preserving techniques based on Artificial Intelligence (AI) are proposed to protect the original and sensitive data generated by CPSs and their networks. For cyber-attack discovery under these privacy-preserving techniques, new anomaly detection algorithms are developed to ensure high performance in terms of data utility and detection accuracy. 
The first main contribution of this dissertation is the development of a privacy-preserving intrusion detection methodology that uses the correlation coefficient, independent component analysis, and Expectation Maximisation (EM) clustering algorithms to select significant data portions and discover cyber attacks against power networks. Before and after applying this technique, machine learning algorithms are used to assess their capability to classify normal and suspicious vectors. The second core contribution of this work is the design of a new privacy-preserving anomaly detection technique that protects the confidential information of CPSs and discovers malicious observations. First, a data pre-processing technique filters and transforms data into a new format that preserves privacy. Second, an anomaly detection technique is employed that combines a Gaussian mixture model fitted to selected features with a Kalman filter that accurately computes the posterior probabilities of legitimate and anomalous events. The third significant contribution of this thesis is a novel privacy-preserving framework for meeting the privacy and security criteria of smart power networks. In the first module, a two-level privacy module is developed, comprising an enhanced proof-of-work-based blockchain for data integrity and a variational autoencoder that transforms the data into an encoded format to prevent inference attacks. In the second module, a long short-term memory (LSTM) deep learning algorithm is employed for anomaly detection, trained and validated on the outputs of the two-level privacy module.
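The integrity half of the two-level privacy module relies on a proof-of-work blockchain. The sketch below shows only the generic mechanism: hash-linked blocks, each mined until its SHA-256 digest has a given number of leading hex zeros. It is not the thesis's enhanced proof-of-work variant; the difficulty, the JSON block encoding, the names, and the sample meter readings are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS_PREV = "0" * 64


@dataclass
class Block:
    index: int
    data: str
    prev_hash: str
    nonce: int = 0

    def digest(self) -> str:
        payload = json.dumps([self.index, self.data, self.prev_hash, self.nonce])
        return hashlib.sha256(payload.encode()).hexdigest()


def mine(index: int, data: str, prev_hash: str, difficulty: int) -> Block:
    """Proof of work: increment the nonce until the digest has `difficulty` leading hex zeros."""
    block = Block(index, data, prev_hash)
    while not block.digest().startswith("0" * difficulty):
        block.nonce += 1
    return block


def build_chain(records: List[str], difficulty: int = 3) -> List[Block]:
    chain, prev = [], GENESIS_PREV
    for i, record in enumerate(records):
        block = mine(i, record, prev, difficulty)
        chain.append(block)
        prev = block.digest()
    return chain


def verify_chain(chain: List[Block], difficulty: int = 3) -> bool:
    """Recompute digests: an edited record breaks its own PoW or the next block's link."""
    prev = GENESIS_PREV
    for block in chain:
        digest = block.digest()
        if block.prev_hash != prev or not digest.startswith("0" * difficulty):
            return False
        prev = digest
    return True


if __name__ == "__main__":
    chain = build_chain(["meter-17: 230.4 V", "meter-17: 229.9 V", "meter-17: 231.2 V"])
    print(verify_chain(chain))  # True
    chain[0].data = "meter-17: 0.0 V"  # tamper with a stored reading
    print(verify_chain(chain))  # False
```

Re-mining a tampered block would also require re-mining every later block, which is what makes the stored sensor history costly to falsify.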