22 research outputs found

    DPWeka: Achieving Differential Privacy in WEKA

    Organizations belonging to the government, commercial, and non-profit sectors collect and store large amounts of sensitive data, which include medical, financial, and personal information. They use data mining methods to formulate business strategies that yield high long-term and short-term financial benefits. While analyzing such data, the private information of the individuals present in the data must be protected for moral and legal reasons. Current practices such as redacting sensitive attributes, releasing only the aggregate values, and query auditing do not provide sufficient protection against an adversary armed with auxiliary information. Differential privacy, a privacy protection framework, provides mathematical guarantees against adversarial attacks even in the presence of such additional background information. Existing platforms for differential privacy employ specific mechanisms for limited applications of data mining. Additionally, widely used data mining tools do not contain differentially private data mining algorithms. As a result, awareness of differentially private methods for analyzing sensitive data is currently limited outside the research community. This thesis examines various mechanisms to realize differential privacy in practice and investigates methods to integrate them with a popular machine learning toolkit, WEKA. We present DPWeka, a package that provides differential privacy capabilities to WEKA, for practical data mining. DPWeka includes a suite of differential privacy preserving algorithms which support a variety of data mining tasks including attribute selection and regression analysis. It has provisions for users to control privacy and model parameters, such as privacy mechanism, privacy budget, and other algorithm specific variables. We evaluate private algorithms on real-world datasets, such as genetic data and census data, to demonstrate the practical applicability of DPWeka.
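
    To make the terms "privacy mechanism" and "privacy budget" in the abstract above concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a counting query. It is not DPWeka code and does not use the WEKA API; the function names `laplace_noise` and `private_count` and the toy data are hypothetical, chosen only for illustration. The sketch assumes that the query is a simple count, whose sensitivity is 1.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    # If X and Y are independent Exp(rate = 1/scale), X - Y ~ Laplace(0, scale).
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count under epsilon-differential privacy (illustrative)."""
    if epsilon <= 0:
        raise ValueError("epsilon (the privacy budget) must be positive")
    true_count = sum(1 for record in records if predicate(record))
    # Assumption: adding or removing one record changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 41, 58, 62, 67, 70]  # hypothetical toy data
    print(private_count(ages, lambda age: age >= 60, epsilon=0.5, rng=rng))
```

    A smaller `epsilon` means a larger noise scale and therefore stronger privacy at the cost of accuracy, which is the trade-off a user-controlled privacy budget exposes.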

    Preface

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
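
    As a rough illustration of the CAR structure mentioned above, the sketch below builds the precision matrix of a proper CAR effect under one common parameterization, Q = τ(D − ρW), where W is a symmetric 0/1 adjacency matrix and D is the diagonal matrix of neighbor counts. The abstract does not state which parameterization the authors use, so this form, the function name `car_precision`, and the three-site example are assumptions made for illustration; the DAGAR construction is not sketched here.

```python
def car_precision(adjacency, rho: float, tau: float = 1.0):
    """Return Q = tau * (D - rho * W) for a symmetric 0/1 adjacency matrix W."""
    n = len(adjacency)
    precision = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(adjacency[i])  # D[i][i]: number of neighbours of site i
        for j in range(n):
            if i == j:
                precision[i][j] = tau * degree
            elif adjacency[i][j]:
                precision[i][j] = -tau * rho
    return precision


if __name__ == "__main__":
    # Hypothetical map: three sites in a row, site 1 borders sites 0 and 2.
    W = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
    for row in car_precision(W, rho=0.5, tau=2.0):
        print(row)
```

    In this parameterization ρ enters the precision matrix rather than the covariance, so it does not translate directly into a correlation between neighboring sites; that is the interpretability gap the abstract says DAGAR addresses.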

    A Statistical Approach to the Alignment of fMRI Data

    Multi-subject functional magnetic resonance imaging (fMRI) studies are critical. Anatomical and functional structure varies across subjects, so image alignment is necessary. We define a probabilistic model to describe functional alignment. By imposing a prior distribution, such as the matrix von Mises-Fisher distribution, on the orthogonal transformation parameter, we embed anatomical information in the estimation of the parameters, i.e., we penalize combinations of spatially distant voxels. Real applications show an improvement in the classification and interpretability of the results compared to various functional alignment methods.

    Infringement of Individual Privacy via Mining Differentially Private GWAS Statistics

    Individual privacy in the genomic era is a growing concern as more individuals get their genomes sequenced or genotyped. Genetic privacy can be infringed even without raw genotypes or sequencing data. Studies have reported that summary statistics from Genome Wide Association Studies (GWAS) can be exploited to threaten individual privacy. In this study, we show that even with differentially private GWAS statistics, there is still a risk of leaking individual privacy. Specifically, we constructed a Bayesian network by mining public GWAS statistics and evaluated two attacks, namely a trait inference attack and an identity inference attack, that infringe the privacy not only of GWAS participants but also of regular individuals. We used both simulated data and real human genetic data from the 1000 Genomes Project to evaluate our methods. Our results demonstrated that unexpected privacy breaches could occur and that attackers can derive identity information and private information by utilizing these algorithms. Hence, more methodological research is needed to understand the infringement and protection of genetic privacy.

    RESPECTING THE ETHICAL TENSION BETWEEN SURVEILLANCE AND PRIVACY IN PROMOTING PUBLIC HEALTH AND DISEASE MANAGEMENT

    The need to undertake surveillance and to protect privacy is well established. However, the continually changing circumstances and fast-paced development of healthcare today create a continuing need to respect the ethical tension between surveillance and privacy. Hence, this dissertation aims to respect the ethical tension between surveillance and privacy in promoting public health and disease management. This dissertation investigates the ethics of conducting public health surveillance, including the challenges associated with obtaining consent and protecting data from unauthorized access. The dissertation will focus on the ethical consequences of big data, including issues associated with obtaining informed consent, data ownership, and privacy. As the dissertation concludes, it will provide an ethical justification for observing privacy in public health surveillance. The analysis is pursued in the dissertation in the following manner. After a brief introduction in Chapter 1, the analysis begins in Chapter 2 by explaining the importance of consent with regard to protecting privacy, including confidentiality in clinical ethics. Chapter 3 moves the discussion to the realm of public health ethics, discussing two examples of population health matters to illustrate the dissertation’s focus. Chapter 4 focuses on the complex issue of disease management, for which the ethical tension between surveillance and privacy is pivotal. Chapter 5 then discusses the critical need for respecting this ethical tension in research protocols from a global perspective. Chapter 6 moves the discussion to the fast-developing debate on data analysis in healthcare, for which respecting the ethical tension between surveillance and privacy will be pivotal for continuing success in this new arena. Finally, Chapter 7 provides a brief conclusion to the dissertation.

    Healthy Living: The European Congress of Epidemiology, 2015

    Novel statistical and bioinformatic tools for identifying predictive metabolic biomarkers in molecular epidemiology studies

    A top-down systems biology approach investigating metabolic responses to external stimuli or physiological processes requires multivariate statistical tools to identify metabolites associated with the global biochemical changes in a supra-organism. In this thesis I describe several tools I have developed to improve or supplement currently used methods in molecular epidemiology studies. First, I describe the MetaboNetworks toolbox, which is able to create custom, multi-compartmental metabolic reaction networks for a supra-organism, combining both mammalian and microbial reactions. These networks are essentially a summary of the supra-organism's homeostatic signature. Second, I describe a novel statistical spectroscopy approach called STORM, which aids in the elucidation of unknown biomarker signals in 1H NMR spectra. Third, I describe the Metabolome-Wide Association Study on obesity in U.S. and U.K. populations. Many novel metabolic associations with obesity are described in a systems framework, including metabolites associated with energy, skeletal muscle, lipid, amino acid and gut microbial metabolism. Last, I describe a new multivariate approach to adjust for confounders, CA-OPLS. Correcting for confounders is an essential aspect of molecular epidemiology studies, as metabolites can be related to a variety of factors such as lifestyle, diet and environmental exposures which may or may not be causally related to disease risk. In developing CA-OPLS, another aim was to simultaneously eliminate or minimize the effects of different types of sampling bias, which are often not taken into account when modelling metabonomics data with current methods.