
    A new PCA-based utility measure for synthetic data evaluation

    Data synthesis is a privacy-enhancing technology that aims to produce realistic and timely data when real data are hard to obtain. The utility of synthetic data generators (SDGs) has been investigated through different utility metrics. These metrics have been found to generate conflicting conclusions, making direct comparison of SDGs surprisingly difficult. Moreover, prior research found no correlation between popular metrics, concluding that they capture different dimensions of utility. This paper aggregates four popular utility metrics (representing different utility dimensions) into a single measure using principal component analysis (PCA) and checks whether the new measure can identify synthetic data that perform well in real-life settings. The new measure is used to compare four well-recognized SDGs.

    Comment: 20 pages, 5 figures, 8 tables, 1 appendix
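    The aggregation idea can be sketched as follows: standardize the per-generator scores on each metric, then project onto the first principal component to obtain a single composite utility score. The metric values below are hypothetical, not taken from the paper.

```python
import numpy as np

# Hypothetical utility scores for four SDGs on four metrics
# (rows = generators, columns = metrics; higher = better).
scores = np.array([
    [0.82, 0.61, 0.74, 0.55],
    [0.67, 0.80, 0.59, 0.71],
    [0.90, 0.52, 0.88, 0.40],
    [0.58, 0.77, 0.63, 0.69],
])

# Standardize each metric (zero mean, unit variance) so that no
# single metric dominates the principal components.
z = (scores - scores.mean(axis=0)) / scores.std(axis=0)

# First principal component via SVD of the standardized matrix.
_, _, vt = np.linalg.svd(z, full_matrices=False)
pc1 = vt[0]

# Composite utility = projection of each generator onto the first
# component; generators are then ranked by this single score.
composite = z @ pc1
ranking = np.argsort(-composite)
print(ranking)
```

    Note that the sign of a principal component is arbitrary, so in practice the composite axis would be oriented (e.g. so that it correlates positively with the raw metrics) before ranking.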

    A Protocol for the Secure Linking of Registries for HPV Surveillance

    In order to monitor the effectiveness of HPV vaccination in Canada, the linkage of multiple data registries may be required. These registries may not always be managed by the same organization and, furthermore, privacy legislation or practices may restrict the data linkages that can actually be performed among registries. The objective of this study was to develop a secure protocol for linking data from different registries and to allow on-going monitoring of HPV vaccine effectiveness.

    A secure linking protocol, using commutative hash functions and secure multi-party computation techniques, was developed. This protocol allows the exact matching of records among registries and the computation of statistics on the linked data while meeting five practical requirements to ensure patient confidentiality and privacy. The statistics considered were: the odds ratio and its confidence interval, the chi-square test, and the relative risk and its confidence interval. Additional statistics on contingency tables, such as other measures of association, can be added using the same principles. The computation time performance of this protocol was evaluated.

    The protocol has acceptable computation time and scales linearly with the size of the data set and the size of the contingency table. The worst-case computation time for up to 100,000 patients returned by each query and a 16-cell contingency table is less than 4 hours for the basic statistics, and the best case is under 3 hours.

    A computationally practical protocol for the secure linking of data from multiple registries has been demonstrated in the context of assessing the impact of an HPV vaccine initiative. The basic protocol can be generalized to the surveillance of other conditions, diseases, or vaccination programs.
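    The core primitive behind such protocols is a commutative masking step: if each registry applies its own secret transformation to a hashed identifier, and the transformations commute, the two doubly-masked values match exactly when the underlying identifiers match, without either party revealing its records. A minimal sketch, using modular exponentiation as the commutative operation (the parameters and identifier format below are illustrative, not the paper's actual construction):

```python
import hashlib
import secrets

# Toy commutative "hashing" step: each registry raises a hashed
# identifier to its own secret exponent modulo a shared prime P.
# Because (h^a)^b == (h^b)^a (mod P), two registries can compare
# doubly-masked values without exchanging raw identifiers.
P = 2**127 - 1  # a Mersenne prime; real deployments choose parameters carefully

def hash_to_group(identifier: str) -> int:
    """Map an identifier deterministically into the group [0, P)."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % P

def mask(value: int, secret: int) -> int:
    """Apply one registry's secret exponent."""
    return pow(value, secret, P)

a = secrets.randbelow(P - 2) + 1  # registry A's secret exponent
b = secrets.randbelow(P - 2) + 1  # registry B's secret exponent

h = hash_to_group("patient-1234")
# A masks first, then B -- or the other way around: the result is the same,
# so matching can proceed on the doubly-masked values alone.
assert mask(mask(h, a), b) == mask(mask(h, b), a)
```

    Once matching records are identified this way, aggregate statistics such as the odds ratio and chi-square test can be computed over the linked contingency table using secure multi-party computation, so that neither registry sees the other's individual-level data.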

    Estimating the re-identification risk of clinical data sets

    Background: De-identification is a common way to protect patient privacy when disclosing clinical data for secondary purposes, such as research. One type of attack that de-identification protects against is linking the disclosed patient data with public and semi-public registries. Uniqueness is a commonly used measure of re-identification risk under this attack. If uniqueness can be measured accurately, then the risk from this kind of attack can be managed. In practice, it is often not possible to measure uniqueness directly, so it must be estimated.

    Methods: We evaluated the accuracy of uniqueness estimators on clinically relevant data sets. Four candidate estimators were identified, either because they had been evaluated in the past and found to have good accuracy, or because they were new and had not been evaluated comparatively before: the Zayatz estimator, the slide negative binomial estimator, Pitman's estimator, and mu-argus. A Monte Carlo simulation was performed to evaluate the uniqueness estimators on six clinically relevant data sets. We varied the sampling fraction and the uniqueness in the population (the value being estimated). The median relative error and inter-quartile range of the uniqueness estimates were measured across 1,000 runs.

    Results: No single estimator performed well across all of the conditions. We developed a decision rule that selects among the Pitman, slide negative binomial, and Zayatz estimators depending on the sampling fraction and the difference between estimates. This decision rule had the most consistently low median relative error across multiple conditions and data sets.

    Conclusion: This study identified an accurate decision rule that can be used by health privacy researchers and disclosure control professionals to estimate uniqueness in clinical data sets. The decision rule provides a reliable way to measure re-identification risk.
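    The quantity being estimated can be illustrated concretely. Uniqueness counts the records that are the only ones in the data with a given combination of quasi-identifiers; the toy records below (age band, sex, postcode prefix) are made up for illustration, and the paper's estimators go further by inferring *population* uniqueness from such sample statistics:

```python
from collections import Counter

# Toy records keyed on quasi-identifiers: (age band, sex, postcode prefix).
records = [
    ("30-39", "F", "K1A"), ("30-39", "F", "K1A"),
    ("40-49", "M", "K2B"), ("50-59", "F", "K1A"),
    ("40-49", "M", "K2B"), ("60-69", "M", "K3C"),
]

# Count how many records share each quasi-identifier combination;
# a record is "unique" if its combination occurs exactly once.
counts = Counter(records)
sample_uniques = sum(1 for c in counts.values() if c == 1)
uniqueness = sample_uniques / len(records)
print(sample_uniques, round(uniqueness, 3))
```

    When the disclosed data are only a sample of the population, sample uniqueness computed this way is not the re-identification risk itself, which is why estimators such as Pitman's or Zayatz's are needed to project it to the population.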

    Survival analysis of infected mice reveals pathogenic variations in the genome of avian H1N1 viruses

    Most influenza pandemics have been caused by H1N1 viruses of purely or partially avian origin. Here, using a Cox proportional hazards model, we attempt to identify the genetic variations across the whole genome of wild-type North American avian H1N1 influenza A viruses that are associated with virulence in mice, modelling residue variations, the host origin of the virus (Anseriformes, ducks, or Charadriiformes, shorebirds), and host-residue interactions. In addition, through structural modeling, we predicted that several polymorphic sites associated with pathogenicity are located in structurally important regions, especially in the polymerase complex and NS genes. Our study introduces a new approach to identifying pathogenic variations in wild-type viruses circulating in natural reservoirs and, ultimately, to understanding their infectious risk to humans as part of risk-assessment efforts anticipating the emergence of future pandemic strains.
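    The Cox model at the heart of this analysis maximizes a partial likelihood over covariate effects on the hazard. A minimal sketch for a single binary covariate (e.g. host origin), with made-up survival times, no censoring, and no tied event times; a real analysis would use Newton-Raphson via a package such as lifelines or R's survival:

```python
import math

# Made-up survival times (days) and a binary covariate x_i
# (e.g. 0 = duck-origin virus, 1 = shorebird-origin virus).
times = [2.0, 3.5, 5.0, 7.0, 9.0, 11.0]
group = [1,   1,   0,   1,   0,   0]

def neg_log_partial_likelihood(beta):
    """Cox negative log partial likelihood (no ties, all events observed)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    ll = 0.0
    for k, i in enumerate(order):
        risk_set = order[k:]  # subjects still at risk at the i-th event time
        denom = sum(math.exp(beta * group[j]) for j in risk_set)
        ll += beta * group[i] - math.log(denom)
    return -ll

# Crude grid search over beta for illustration only.
betas = [b / 100 for b in range(-300, 301)]
beta_hat = min(betas, key=neg_log_partial_likelihood)
hazard_ratio = math.exp(beta_hat)  # hazard ratio for group 1 vs. group 0
print(round(beta_hat, 2), round(hazard_ratio, 2))
```

    In the toy data, group-1 subjects fail earlier, so the fitted coefficient is positive and the hazard ratio exceeds one; the paper extends this idea by including residue variations and host-residue interaction terms as covariates.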