8 research outputs found

    A Fast Minimal Infrequent Itemset Mining Algorithm

    A novel, fast algorithm for finding quasi-identifiers in large datasets is presented. Performance measurements on a broad range of datasets demonstrate substantial reductions in run time relative to the state of the art and show that the algorithm scales to realistically sized datasets of up to several million records.

    Infrequent Weighted Itemset Mining Using Frequent Pattern Growth

    Frequent weighted itemsets represent correlations that hold frequently in data in which items may be weighted differently. However, in some contexts, e.g., when the need is to minimize a certain cost function, discovering rare data correlations is more interesting than mining frequent ones. This paper tackles the issue of discovering rare and weighted itemsets, i.e., the infrequent weighted itemset (IWI) mining problem. Two novel quality measures are proposed to drive the IWI mining process. Furthermore, two algorithms that perform IWI and Minimal IWI mining efficiently, driven by the proposed measures, are presented. Experimental results show the efficiency and effectiveness of the proposed approach.

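    The Current State of Official Statistics Microdata Provision in Other Countries and Challenges for Japan (諸外国における政府統計ミクロデータの提供の現状とわが国の課題)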

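    Official statistical data are provided in a variety of forms that take both confidentiality and user needs into account. Official statistics are available as statistical tables and as microdata. For microdata in particular, provision has advanced in several forms: (1) anonymized microdata, (2) individual record data, (3) custom-made tabulations, and (4) on-demand services. In other countries as well, diverse channels exist for providing official statistics microdata, reflecting both data confidentiality and data utility, but the specifics of provision differ from country to country. Meanwhile, in Japan, the use of individual record data from official statistics at on-site facilities and through remote access has been under discussion in recent years, and concrete guidelines are needed on how tabulations and regression results should be checked so that users obtain "safe analytical results" after working with individual record data. This paper clarifies the state of official statistics microdata provision in other countries and then explores future directions for the provision of official statistical data in Japan.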

    Dictionary of privacy, data protection and information security

    The Dictionary of Privacy, Data Protection and Information Security explains the complex technical terms, legal concepts, privacy management techniques, conceptual matters and vocabulary that inform public debate about privacy. The revolutionary and pervasive influence of digital technology affects numerous disciplines and sectors of society, and concerns about its potential threats to privacy are growing. With over a thousand terms meticulously set out, described and cross-referenced, this Dictionary enables productive discussion by covering the full range of fields accessibly and comprehensively. In the ever-evolving debate surrounding privacy, this Dictionary takes a longer view, transcending the details of today's problems, technology, and the law to examine the wider principles that underlie privacy discourse. Interdisciplinary in scope, this Dictionary is invaluable to students, scholars and researchers in law, technology and computing, cybersecurity, sociology, public policy and administration, and regulation. It is also a vital reference for diverse practitioners including data scientists, lawyers, policymakers and regulators.

    Generating Multiply Imputed Synthetic Datasets: Theory and Implementation (Erzeugung Mehrfach Imputierter Synthetischer Datensätze: Theorie und Implementierung)

    The book describes different approaches to generating multiply imputed synthetic datasets. Such datasets can be made available to interested researchers without violating data confidentiality. Each chapter is dedicated to one approach, first describing the general concept and then presenting a detailed application to a real dataset that provides useful guidelines on how to implement the theory in practice.