Semi-supervised model-based clustering with controlled clusters leakage
In this paper, we focus on finding clusters in partially categorized data sets. We propose a semi-supervised version of the Gaussian mixture model, called C3L, which retrieves natural subgroups of given categories. In contrast to other semi-supervised models, C3L is parametrized by a user-defined leakage level, which controls the maximal inconsistency between the initial categorization and the resulting clustering. Our method can be implemented as a module in practical expert systems to detect clusters that combine expert knowledge with the true distribution of the data. Moreover, it can be used to improve the results of less flexible clustering techniques, such as projection pursuit clustering. The paper presents an extensive theoretical analysis of the model and a fast algorithm for its efficient optimization. Experimental results show that C3L finds a high-quality clustering model, which can be applied to discovering meaningful groups in partially classified data.
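A minimal sketch of the general idea (not the authors' C3L optimization, which is not reproduced here) is a Gaussian mixture seeded by the partial categorization, with the disagreement between initial categories and resulting clusters measured afterwards. The helper names seeded_gmm and leakage below are hypothetical, for illustration only.

    # Hedged sketch: a partial-label-seeded GMM, illustrating the general
    # idea only; C3L's leakage-constrained optimization is not reproduced.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def seeded_gmm(X, labels, n_components):
        """Fit a GMM whose components are initialized at the means of the
        labeled points; labels == -1 marks uncategorized samples."""
        means = np.array([X[labels == k].mean(axis=0)
                          for k in range(n_components)])
        gmm = GaussianMixture(n_components=n_components,
                              means_init=means, random_state=0)
        return gmm.fit(X)  # EM runs over all points, labeled and unlabeled

    def leakage(labels, assignments):
        """Fraction of labeled points whose cluster disagrees with their
        initial category -- a crude analogue of C3L's leakage level."""
        mask = labels != -1
        return float(np.mean(assignments[mask] != labels[mask]))

    # Toy usage: two Gaussian blobs, 10% of the points pre-categorized.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
    y = np.full(400, -1)
    y[:20], y[200:220] = 0, 1
    model = seeded_gmm(X, y, 2)
    print("leakage:", leakage(y, model.predict(X)))

In this simplification the leakage is only measured after the fit; C3L, by contrast, constrains it during optimization.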
Context-Aware Generative Adversarial Privacy
Preserving the utility of published datasets while simultaneously providing
provable privacy guarantees is a well-known challenge. On the one hand,
context-free privacy solutions, such as differential privacy, provide strong
privacy guarantees, but often lead to a significant reduction in utility. On
the other hand, context-aware privacy solutions, such as information theoretic
privacy, achieve an improved privacy-utility tradeoff, but assume that the data
holder has access to dataset statistics. We circumvent these limitations by
introducing a novel context-aware privacy framework called generative
adversarial privacy (GAP). GAP leverages recent advancements in generative
adversarial networks (GANs) to allow the data holder to learn privatization
schemes from the dataset itself. Under GAP, learning the privacy mechanism is
formulated as a constrained minimax game between two players: a privatizer that
sanitizes the dataset in a way that limits the risk of inference attacks on the
individuals' private variables, and an adversary that tries to infer the
private variables from the sanitized dataset. To evaluate GAP's performance, we
investigate two simple (yet canonical) statistical dataset models: (a) the
binary data model, and (b) the binary Gaussian mixture model. For both models,
we derive game-theoretically optimal minimax privacy mechanisms, and show that
the privacy mechanisms learned from data (in a generative adversarial fashion)
match the theoretically optimal ones. This demonstrates that our framework can
be easily applied in practice, even in the absence of dataset statistics.

Comment: Improved version of a paper accepted by Entropy Journal, Special Issue on Information Theory in Machine Learning and Data Science.
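As a concrete illustration of the constrained minimax game described above, the following PyTorch sketch alternates updates between a privatizer and an adversary on the binary Gaussian mixture model. The architectures, the penalty weight lam, the squared-error distortion, and the training schedule are assumptions for illustration, not the paper's exact formulation.

    # Hedged sketch of a GAP-style minimax game; details are illustrative.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    privatizer = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
    adversary = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
    opt_p = torch.optim.Adam(privatizer.parameters(), lr=1e-3)
    opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()
    lam = 5.0  # distortion penalty weight (assumption, not from the paper)

    # Toy binary Gaussian mixture: private bit S, data X ~ N(2S - 1, I).
    s = torch.randint(0, 2, (512, 1)).float()
    x = torch.randn(512, 2) + (2 * s - 1)

    for step in range(2000):
        x_hat = privatizer(x)  # sanitized release
        # Adversary step: learn to infer the private bit from x_hat.
        a_loss = bce(adversary(x_hat.detach()), s)
        opt_a.zero_grad()
        a_loss.backward()
        opt_a.step()
        # Privatizer step: defeat the adversary while keeping distortion small.
        p_loss = -bce(adversary(x_hat), s) + lam * (x_hat - x).pow(2).mean()
        opt_p.zero_grad()
        p_loss.backward()
        opt_p.step()

The soft penalty lam stands in for the paper's hard distortion constraint; a Lagrangian or projection step would be closer to the constrained game.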
Privacy-Preserving Adversarial Networks
We propose a data-driven framework for optimizing privacy-preserving data
release mechanisms to attain the information-theoretically optimal tradeoff
between minimizing distortion of useful data and concealing specific sensitive
information. Our approach employs adversarially trained neural networks to implement randomized mechanisms and to perform a variational approximation of mutual information privacy. We validate our Privacy-Preserving Adversarial
Networks (PPAN) framework via proof-of-concept experiments on discrete and
continuous synthetic data, as well as the MNIST handwritten digits dataset. For
synthetic data, our model-agnostic PPAN approach achieves tradeoff points very close to the optimal tradeoffs analytically derived from model knowledge. In experiments with the MNIST data, we visually demonstrate a learned tradeoff between minimizing pixel-level distortion and concealing the written digit.

Comment: 16 pages.
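The variational approximation mentioned above can be made concrete with a standard bound (a sketch of the general principle; the paper's exact objective may differ). For any variational posterior q, the adversary's cross-entropy upper-bounds the conditional entropy of the sensitive variable S given the release, yielding a lower bound on the mutual-information leakage:

    \[
    I(S;\hat{X}) = H(S) - H(S \mid \hat{X})
      \;\ge\; H(S) - \mathbb{E}\bigl[-\log q(S \mid \hat{X})\bigr],
    \qquad \text{subject to } \mathbb{E}\bigl[d(X,\hat{X})\bigr] \le D.
    \]

Equality holds when q matches the true posterior, so training an expressive adversary to minimize its cross-entropy tightens the estimate; the release mechanism then minimizes this variational leakage estimate under the distortion budget D.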
Innovations orthogonalization: a solution to the major pitfalls of EEG/MEG "leakage correction"
The problem of interest here is the study of brain functional and effective connectivity based on non-invasive EEG/MEG inverse-solution time series. These signals generally have low spatial resolution, so that the estimated signal at any one site is an instantaneous linear mixture of the true, unobserved signals across all cortical sites. False connectivity can result from the analysis of these low-resolution signals. Recent "unmixing" efforts have been developed under the name of "leakage correction". One recent noteworthy approach is that of Colclough et al (2015 NeuroImage, 117:439-448), which forces the inverse-solution signals to have zero cross-correlation at lag zero. The first goal here is to show that Colclough's method produces false human connectomes under very broad conditions. The second major goal is to develop a new solution that appropriately "unmixes" the inverse-solution signals, based on innovations orthogonalization. The new method first fits a multivariate autoregression to the inverse-solution signals, giving the mixed innovations. Second, the mixed innovations are orthogonalized. Third, the mixed and orthogonalized innovations together allow estimation of the "unmixing" matrix, which is finally used to "unmix" the inverse-solution signals. It is shown that, under very broad conditions, the new method produces proper human connectomes, even when the signals are not generated by an autoregressive model.

Comment: preprint, technical report, under license "Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)", https://creativecommons.org/licenses/by-nc-nd/4.0