
    PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn

    Full text link
    Preserving the privacy of users is a key requirement of web-scale analytics and reporting applications, and has witnessed a renewed focus in light of recent data breaches and new regulations such as GDPR. We focus on the problem of computing robust, reliable analytics in a privacy-preserving manner, while satisfying product requirements. We present PriPeARL, a framework for privacy-preserving analytics and reporting, inspired by differential privacy. We describe the overall design and architecture, and the key modeling components, focusing on the unique challenges associated with privacy, coverage, utility, and consistency. We perform an experimental study in the context of ads analytics and reporting at LinkedIn, thereby demonstrating the tradeoffs between privacy and utility needs, and the applicability of privacy-preserving mechanisms to real-world data. We also highlight the lessons learned from the production deployment of our system at LinkedIn.
    Comment: ACM International Conference on Information and Knowledge Management (CIKM 2018).
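
    The abstract says PriPeARL is "inspired by differential privacy" without detailing its mechanism. A minimal sketch of the standard Laplace mechanism for a count query (a generic illustration, not PriPeARL's production pipeline; the function names are hypothetical) might look like:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise via the inverse-CDF method."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(true_count: int, epsilon: float) -> int:
    """Report a count under epsilon-differential privacy.

    A count query has sensitivity 1, so the noise scale is 1/epsilon.
    Clamping at zero and rounding keeps reported analytics plausible,
    at a small cost in statistical bias near zero.
    """
    noisy = true_count + laplace_noise(1.0 / epsilon)
    return max(0, round(noisy))
```

Smaller epsilon means larger noise and stronger privacy; the paper's tradeoff study is precisely about choosing epsilon so the reported analytics remain useful.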

    The Privacy Preservation of Data Cubes

    Get PDF
    Ph.D. thesis (Doctor of Philosophy)

    Secured Data Masking Framework and Technique for Preserving Privacy in a Business Intelligence Analytics Platform

    Get PDF
    The main concept behind business intelligence (BI) is using data integrated across different business systems within an enterprise to make strategic decisions. It is difficult to map internal and external BI users to subsets of the enterprise's data warehouse (DW), so protecting the privacy of this data while maintaining its utility is a challenging task. Today, such DW systems constitute one of the most serious privacy-breach threats an enterprise faces when many internal users of different security levels have access to BI components. This thesis proposes a data masking framework (iMaskU: Identify, Map, Apply, Sign, Keep testing, Utilize) for a BI platform that protects data at rest, preserves the data format, and maintains data utility at the on-the-fly querying level. A new reversible data masking technique (COntent BAsed Data masking - COBAD) is developed as an implementation of iMaskU. The masking algorithm in COBAD is based on the statistical content of the extracted dataset, so that the masked data cannot be linked to specific individuals or re-identified by any means. The re-identification risk of the COBAD technique was evaluated on a supercomputer under three attack scenarios: a) a brute-force attack needs, on average, 55 years to crack the key of each record; b) a dictionary attack needs 231 days to crack the same key for the entire extracted dataset (containing 50,000 records); c) under a data linkage attack, the re-identification risk is very low when the common linked attributes are used. The performance of the COBAD masking technique was also validated: using a 1 GB database schema in the TPC-H decision support benchmark, the execution times of the selected TPC-H queries show that COBAD is much faster than AES-128 and 3DES encryption.
Theoretical and experimental results show that the proposed solution provides a reasonable trade-off between data security and the utility of re-identified data.
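
    COBAD's actual algorithm is driven by the statistical content of the dataset and is not specified in the abstract. As a generic illustration of the two properties it claims, reversibility and format preservation, here is a toy keyed digit-shift masker (entirely hypothetical, not the COBAD algorithm):

```python
def mask_digits(value: str, key: int) -> str:
    """Reversibly mask the digits of a string while preserving its format.

    Each digit is shifted by a key- and position-derived offset (mod 10);
    non-digit characters (dashes, spaces) pass through unchanged, so the
    masked value has the same length and layout as the original.
    """
    out = []
    for i, ch in enumerate(value):
        if ch.isdigit():
            offset = (key + i) % 10
            out.append(str((int(ch) + offset) % 10))
        else:
            out.append(ch)
    return "".join(out)

def unmask_digits(value: str, key: int) -> str:
    """Invert mask_digits given the same key."""
    out = []
    for i, ch in enumerate(value):
        if ch.isdigit():
            offset = (key + i) % 10
            out.append(str((int(ch) - offset) % 10))
        else:
            out.append(ch)
    return "".join(out)
```

A per-position shift like this is trivially breakable; the thesis's point is that a production technique must also survive brute-force, dictionary, and linkage attacks, which is what the supercomputer evaluation measured.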

    Privacy-preserving data mining

    Get PDF
    In privacy-preserving data mining research, we address issues related to extracting knowledge from large amounts of data without violating the privacy of the data owners. In this study, we first introduce an integrated baseline architecture, design principles, and implementation techniques for privacy-preserving data mining systems. We then discuss the key components of such systems, which comprise three protocols: data collection, inference control, and information sharing. We present and compare strategies for realizing these protocols. Theoretical analysis and experimental evaluation show that our protocols can generate accurate data mining models while protecting the privacy of the data being mined.

    Implications and challenges to using data mining in educational research in the Canadian context

    Get PDF
    Canadian institutions of higher education are major players on the international arena, educating future generations and producing leaders around the world in various fields. In the last decade, Canadian universities have seen an influx of incoming international students, who contribute over $3.5 billion to the Canadian economy (Madgett & Bélanger, 2008, p. 195). Research in Canadian post-secondary institutions is booming, especially in education (SSHRC, 2011): for the academic year 2010-2011, of the 12 subject areas, total SSHRC funding for projects in education ranked fourth, exceeding $27 million. All of these factors place Canadian higher education in a leading and strategic position in several educational research fields. One can imagine the wealth of knowledge about trends in higher education that could be revealed if the large amount of data generated by Canadian universities were systematically analyzed and handled using techniques such as data mining. However, not much can be achieved from this unharnessed knowledge, accumulated on a daily basis, because the advancement of data mining research, which would provide the ultimate tool to learn about trends and changes in Canadian institutions, is often held back by inadequate data warehousing, as well as by privacy, confidentiality, and copyright regulations. In this paper, we engage in a critical discussion and analysis of the interface between data mining research in higher education and the legal implications of such a tool.

    Marginal Release Under Local Differential Privacy

    Full text link
    Many analysis and machine learning tasks require the availability of marginal statistics on multidimensional datasets while providing strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. We prove the first tight theoretical bounds on the accuracy of marginals compiled under each approach, perform empirical evaluation to confirm these bounds, and evaluate them for tasks such as modeling and correlation testing. Our results show that releasing information based on (local) Fourier transformations of the input is preferable to alternatives based directly on (local) marginals.
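
    The building block underlying local differential privacy is the randomized-response primitive, in which each data subject perturbs their own bit before it leaves their device; the paper's Fourier-based algorithms refine this idea for multi-attribute marginals. A minimal sketch of the basic primitive and its debiased aggregate estimator (a standard textbook mechanism, not the paper's algorithm):

```python
import math
import random

def randomized_response(bit: int, epsilon: float) -> int:
    """Report a private bit under epsilon-local differential privacy.

    The true bit is reported with probability e^eps / (1 + e^eps),
    and flipped otherwise, so the curator never sees the raw value.
    """
    p_truth = math.exp(epsilon) / (1 + math.exp(epsilon))
    return bit if random.random() < p_truth else 1 - bit

def estimate_mean(reports: list, epsilon: float) -> float:
    """Unbiased estimate of the true fraction of 1s from noisy reports.

    If pi is the true fraction, the expected observed fraction is
    (1 - p) + pi * (2p - 1), which we invert to recover pi.
    """
    p = math.exp(epsilon) / (1 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)
```

Accuracy degrades as epsilon shrinks or as the marginal table grows, which is why the paper's comparison of Fourier-domain versus direct marginal release matters in practice.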

    Implanting Life-Cycle Privacy Policies in a Context Database

    Get PDF
    Ambient intelligence (AmI) environments continuously monitor surrounding individuals' context (e.g., location, activity) to make existing applications smarter, i.e., able to make decisions without requiring user interaction. Such AmI smartness is tightly coupled to the quantity and quality of the available (past and present) context. However, context is often linked to an individual (e.g., the location of a given person) and as such falls under privacy directives. The goal of this paper is to enable the difficult wedding of privacy (automatically fulfilling users' privacy wishes) and smartness in the AmI. Interestingly, privacy requirements in the AmI differ from those in traditional environments, where systems usually manage durable data (e.g., medical or banking information) collected and updated trustfully by the donor herself, her doctor, or an employee of her bank; there, proper information disclosure to third parties constitutes the major privacy concern.