136 research outputs found

    Privacy preservation in social media environments using big data

    With the pervasive use of mobile devices, social media, home assistants, and smart devices, the idea of individual privacy is fading. More than ever, the public is giving up personal information in exchange for what are now considered everyday conveniences, ignoring the consequences. Even seemingly harmless information is making headlines for its unauthorized use (18). Among this data is user trajectory data, which can be described as a user's location information over a period of time (6). This data is generated whenever users access their devices to record their location, query the location of a point of interest, request directions to a location, request services to come to their location, and in many other applications. A malicious adversary could use this data to track a user's movements, location, and daily patterns, and to learn personal details about the user. While the safest course of action would be to hide this information entirely, the data has many beneficial uses as well: emergency vehicles could be routed more efficiently based on trajectory patterns, businesses could make more intelligent marketing or building decisions, and users themselves could benefit from additional conveniences. Publishing this data while preserving user privacy poses several challenges. For example, while location data has high utility, users expect their data to remain private, and in real-world applications users generate many terabytes of data every day. To process this volume of data and anonymize it so that individual user identities are hidden, this thesis presents an efficient algorithm that reduces the anonymization time from days, as seen in (20), to a matter of minutes or hours. We cannot focus on location data alone, however. Social media has a great many uses, one of which is the sharing of images, and privacy protection must extend to this data as well. 
    This thesis therefore also addresses the issue of image privacy, as images can often be even more sensitive than location --Abstract, page iv
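As a minimal sketch of the kind of spatial generalization that trajectory anonymization builds on, the snippet below coarsens GPS points onto a grid so that users sharing a cell become indistinguishable at that time step. The function name, cell size, and coordinates are illustrative assumptions, not the thesis algorithm:

```python
import math

def generalize_trajectory(points, cell=0.01):
    """Hypothetical sketch: snap each (lat, lon, t) point to the center of
    its grid cell, a common generalization step before publishing."""
    out = []
    for lat, lon, t in points:
        # math.floor (not int) so negative coordinates bucket consistently
        glat = (math.floor(lat / cell) + 0.5) * cell
        glon = (math.floor(lon / cell) + 0.5) * cell
        out.append((round(glat, 6), round(glon, 6), t))
    return out

traj = [(38.9512, -92.3281, 0), (38.9523, -92.3297, 1)]
print(generalize_trajectory(traj))  # both points land in the same cell
```

Real trajectory anonymizers add far more (k-anonymous grouping across users, utility metrics); this only illustrates the generalization primitive.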

    Data Anonymization for Privacy Preservation in Big Data

    Cloud computing provides scalable IT infrastructure to support the processing of a variety of big data applications in sectors such as healthcare and business. The data sets in such applications, notably electronic health records, generally contain privacy-sensitive data. The most popular technique for data privacy preservation is anonymizing the data through generalization. This work examines proximity privacy breaches in big data anonymization and seeks a scalable solution to the problem. A two-phase scalable clustering approach is proposed, combining a clustering algorithm with a k-anonymity scheme based on generalization and suppression. The algorithms are designed with MapReduce to achieve high scalability through data-parallel execution in the cloud. Extensive experiments on real data sets confirm that the method significantly improves defense against proximity privacy breaches, as well as the scalability and efficiency of anonymization, over existing methods. Anonymizing data sets through generalization to satisfy privacy models such as k-anonymity is a widely used class of privacy-preserving methods. However, the scale of data in many cloud applications now grows extremely quickly in line with big data trends, making it a challenge for commonly used tools to capture, manage, and process large-scale data within an acceptable time frame. It is therefore difficult for existing anonymization approaches to achieve privacy preservation for big data due to scalability issues
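The generalization-and-suppression scheme this abstract describes can be sketched in a few lines: generalize quasi-identifiers into coarse buckets, then suppress any record whose group is smaller than k. The quasi-identifiers, bucket widths, and k below are illustrative assumptions, not the paper's actual scheme:

```python
from collections import Counter

def k_anonymize(records, k=2):
    """Hedged sketch of k-anonymity: generalize age to 10-year bands and
    ZIP codes to 3-digit prefixes, then suppress undersized groups."""
    def qid(r):
        return (r["age"] // 10 * 10, r["zip"][:3])
    counts = Counter(qid(r) for r in records)
    out = []
    for r in records:
        if counts[qid(r)] >= k:
            out.append({**r, "age": qid(r)[0], "zip": qid(r)[1] + "**"})
        # records in groups smaller than k are suppressed entirely
    return out

data = [
    {"age": 34, "zip": "65401", "dx": "flu"},
    {"age": 37, "zip": "65409", "dx": "cold"},
    {"age": 62, "zip": "10001", "dx": "flu"},
]
print(k_anonymize(data, k=2))  # third record is suppressed
```

The paper's contribution is running this kind of logic as data-parallel MapReduce phases; the sketch shows only the single-node core.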

    Protecting sensitive data using differential privacy and role-based access control

    In today's world, where most aspects of modern life are handled and managed by computer systems, privacy has increasingly become a major concern. In addition, data has been generated and processed massively, especially over the last two years. The rate at which data is generated, together with the need to store and analyze it efficiently, leads people and organizations to outsource their massive amounts of data (namely big data) to cloud environments supported by cloud service providers (CSPs). Such environments are well suited to storing and analyzing big data since they mainly rely on the Hadoop MapReduce framework, which is designed to handle big data efficiently in parallel. Although outsourcing big data to the cloud facilitates data processing and reduces the maintenance cost of local data storage, it raises new problems concerning privacy protection: how can one perform computations on sensitive big data while still preserving privacy? Building secure systems for handling and processing such private massive data is therefore crucial, and we need mechanisms that protect private data even when the running computation is untrusted. Several research efforts have focused on finding solutions to the privacy and security issues of data analytics in cloud environments. In this dissertation, we study existing work on protecting the privacy of any individual in a data set, specifically the notion of privacy known as differential privacy. Differential privacy has been proposed to better protect the privacy of data mining over sensitive data, ensuring that the released aggregate result reveals almost nothing about whether or not any given individual contributed to the data set. 
    Finally, we propose an idea for combining differential privacy with another available privacy-preserving method
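The differential privacy guarantee this abstract refers to is commonly achieved with the Laplace mechanism: adding Laplace(sensitivity/epsilon) noise to an aggregate before release. A minimal sketch, with illustrative function name and parameters:

```python
import random

def laplace_count(true_count, epsilon=1.0, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity/epsilon,
    so the output reveals almost nothing about any single contributor."""
    b = sensitivity / epsilon
    # A Laplace(0, b) draw is the difference of two Exponential(1/b) draws
    noise = random.expovariate(1.0 / b) - random.expovariate(1.0 / b)
    return true_count + noise

random.seed(0)
print(laplace_count(1000, epsilon=0.5))  # noisy count near 1000
```

Smaller epsilon means larger noise and a stronger privacy guarantee; for a counting query the sensitivity is 1, since one person changes the count by at most one.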

    Optimal assignment problem on record linkage

    We present an application of the Hungarian Method, an optimal assignment algorithm from graph theory, to record linkage in order to improve disclosure risk assessment. Since the Hungarian Method has O(n^3) complexity, three different methods are presented to reduce its computational cost
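The optimal assignment problem behind this abstract is: match each record in file A to a distinct record in file B so the total pairwise distance is minimized. As a hedged sketch, the brute-force O(n!) search below shows the objective on a toy cost matrix; the Hungarian Method solves the same problem in O(n^3) (e.g., `scipy.optimize.linear_sum_assignment` in practice):

```python
from itertools import permutations

def optimal_assignment(cost):
    """Brute-force optimal assignment on a square cost matrix:
    find the permutation minimizing the summed matching cost."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost

# Toy distance matrix: rows are records of file A, columns of file B
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
print(optimal_assignment(cost))  # → ((1, 0, 2), 5)
```

In a disclosure risk setting, a linked pair in the optimal matching suggests a protected record may be re-identifiable from the published one.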

    A Classification of non-Cryptographic Anonymization Techniques Ensuring Privacy in Big Data

    Recently, big data processing has become crucial to most enterprise and government applications due to the fast growth of collected data. However, this data often includes private personal information, which raises new security and privacy concerns. Moreover, it is widely agreed that the sheer scale of big data renders many privacy-preserving techniques ineffective. To ensure privacy in big data, anonymization is therefore suggested as one of the most efficient approaches. In this paper, we provide a new, detailed classification of the most widely used non-cryptographic anonymization techniques for big data, covering generalization and randomization approaches. The paper also evaluates the presented techniques against integrity, confidentiality, and credibility criteria. In addition, three relevant anonymization techniques, k-anonymity, l-diversity, and t-closeness, are tested on an extract of a huge real data set
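Two of the models this survey tests can be checked mechanically on a released table: k-anonymity requires every quasi-identifier group to contain at least k rows, and l-diversity additionally requires at least l distinct sensitive values per group. A hedged sketch, with illustrative column names and data:

```python
from collections import defaultdict

def check_k_l(records, qid_keys, sensitive_key, k, l):
    """Group rows by their quasi-identifier tuple, then verify group
    size (k-anonymity) and sensitive-value diversity (l-diversity)."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in qid_keys)].append(r[sensitive_key])
    k_ok = all(len(vals) >= k for vals in groups.values())
    l_ok = all(len(set(vals)) >= l for vals in groups.values())
    return k_ok, l_ok

table = [
    {"age": 30, "zip": "654**", "dx": "flu"},
    {"age": 30, "zip": "654**", "dx": "cold"},
    {"age": 60, "zip": "100**", "dx": "flu"},
    {"age": 60, "zip": "100**", "dx": "flu"},
]
print(check_k_l(table, ("age", "zip"), "dx", k=2, l=2))  # → (True, False)
```

The example is 2-anonymous but not 2-diverse: the second group has two rows but only one diagnosis, so an attacker who locates someone in it learns their condition. t-closeness tightens this further by bounding the distance between each group's sensitive-value distribution and the table-wide one.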

    Protection of big data privacy

    In recent years, big data have become a hot research topic. The increasing amount of big data also increases the chance of breaching the privacy of individuals. Since big data require high computational power and large storage, distributed systems are used. As multiple parties are involved in these systems, the risk of privacy violation is increased. There have been a number of privacy-preserving mechanisms developed for privacy protection at different stages (e.g., data generation, data storage, and data processing) of a big data life cycle. The goal of this paper is to provide a comprehensive overview of the privacy preservation mechanisms in big data and present the challenges for existing mechanisms. In particular, in this paper, we illustrate the infrastructure of big data and the state-of-the-art privacy-preserving mechanisms in each stage of the big data life cycle. Furthermore, we discuss the challenges and future research directions related to privacy preservation in big data

    Privacy and trustworthiness management in moving object environments

    The use of location-based services (LBS) (e.g., Intel's Thing Finder) is expanding. Besides traditional centralized location-based services, distributed ones are also emerging due to the development of Vehicular Ad-hoc Networks (VANETs), dynamic networks that allow vehicles to communicate with one another. Because LBS inherently need to track users' locations, they have raised increasing concerns about users' location privacy. Although much research has been carried out to let users submit their locations anonymously, the collected anonymous location data may still be mapped to individuals when the adversary has relevant background knowledge. To improve location privacy, this dissertation addresses the problem of anonymizing collected location data sets so that they can be published for public use without violating any privacy concerns. Specifically, a privacy-preserving trajectory publishing algorithm is proposed that preserves a high data utility rate. Moreover, the scalability issue that arises as location data sets grow enormously, through continuous data collection and the increase in LBS users, is tackled by developing a distributed version of the trajectory publishing algorithm that leverages the MapReduce technique. As a consequence of users being anonymous, it becomes more challenging to evaluate the trustworthiness of messages disseminated by anonymous users. Existing research efforts mainly focus on privacy-preserving authentication of users, which helps trace malicious vehicles only after the damage is done. This is still not sufficient to prevent malicious behavior in cases where attackers do not care whether they are caught later on, so it would be more effective to also evaluate the content of the message. 
    In this dissertation, a novel information-oriented trustworthiness evaluation is presented that enables each individual user to evaluate message content and make informed decisions --Abstract, page iii

    Integration of Differential Privacy Mechanism to Map-Reduce Platform for Preserving Privacy in Cloud Environments

    Cloud computing can be described as using the capabilities of hardware and software resources based on the Internet; it is the trend of the past decade, growing at a fast pace in today's digital world, and it has changed the world around us. Using the cloud has become the norm, and people are moving their data to the cloud as data grows larger and needs to be accessed from many devices. Tons of data are created every day, and organizations from scientific institutes to industrial companies aim to analyze the data and extract the patterns within it to improve their services or for other purposes. In the process, the information of millions of people is used by data analytics companies, and there is an increasing need to guarantee the protection of their data. From social engineering techniques to malicious technical attacks, data is always at risk of leakage, and we should propose solutions to keep individuals' data protected. In this thesis, we present "Parmanix", a privacy-preserving platform for data analytics. It is based on the MapReduce system and provides privacy guarantees for sensitive data in distributed computations. On this platform, data providers define the security policy for their data, and a computation provider can write untrusted Mapper code and use one of the trusted Reducers already defined within Parmanix. The system guarantees, with an acceptable amount of overhead, that no individual's data leaks through the platform's computations
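The untrusted-mapper/trusted-reducer split described above can be sketched as follows. This is not Parmanix itself: the function names, record schema, and the choice of a Laplace-noised count as the trusted reducer are illustrative assumptions:

```python
import random
from collections import defaultdict

def untrusted_mapper(record):
    """Stand-in for arbitrary computation-provider code: emits
    key/value pairs but never releases output directly."""
    yield (record["diagnosis"], 1)

def trusted_dp_count_reducer(pairs, epsilon=1.0):
    """Platform-trusted reducer: aggregates mapper output and releases
    only Laplace-noised counts (sensitivity 1 for a counting query)."""
    counts = defaultdict(int)
    for key, v in pairs:
        counts[key] += v
    b = 1.0 / epsilon
    return {k: c + random.expovariate(1.0 / b) - random.expovariate(1.0 / b)
            for k, c in counts.items()}

records = [{"diagnosis": d} for d in ["flu", "flu", "cold"]]
pairs = [p for r in records for p in untrusted_mapper(r)]
print(trusted_dp_count_reducer(pairs, epsilon=2.0))
```

The design point is that privacy enforcement lives entirely in the reducer, so the mapper can be arbitrary untrusted code without compromising the guarantee on released output.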