A Protocol for the Secure Linking of Registries for HPV Surveillance
Monitoring the effectiveness of HPV vaccination in Canada may require linking multiple data registries. These registries may not always be managed by the same organization and, furthermore, privacy legislation or practices may restrict the record linkages that can actually be performed among registries. The objective of this study was to develop a secure protocol for linking data from different registries and to allow ongoing monitoring of HPV vaccine effectiveness. A secure linking protocol, using commutative hash functions and secure multi-party computation techniques, was developed. This protocol allows for the exact matching of records among registries and the computation of statistics on the linked data while meeting five practical requirements to ensure patient confidentiality and privacy. The statistics considered were: odds ratio and its confidence interval, chi-square test, and relative risk and its confidence interval. Additional statistics on contingency tables, such as other measures of association, can be added using the same principles presented. The computation time performance of this protocol was evaluated. The protocol has acceptable computation time and scales linearly with the size of the data set and the size of the contingency table. The worst-case computation time for up to 100,000 patients returned by each query and a 16-cell contingency table is less than 4 hours for basic statistics, and the best case is under 3 hours. A computationally practical protocol for the secure linking of data from multiple registries has been demonstrated in the context of HPV vaccine initiative impact assessment. The basic protocol can be generalized to the surveillance of other conditions, diseases, or vaccination programs.
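The statistics named in this abstract can be illustrated on an ordinary, unprotected 2x2 contingency table. The sketch below is not the secure protocol: it computes the values in the clear, using standard textbook formulas (a log-scale Wald interval for the odds ratio and relative risk, and the Pearson chi-square statistic without continuity correction). The function name, the cell layout, and the example counts are assumptions made for illustration and do not come from the paper.

```python
import math


def two_by_two_stats(a, b, c, d, z=1.96):
    """Plain (non-secure) statistics for a 2x2 table.

    Assumed cell layout (hypothetical, for illustration):
        a = exposed with outcome,    b = exposed without outcome
        c = unexposed with outcome,  d = unexposed without outcome
    All four cells must be non-zero. z=1.96 gives an approximate 95% interval.
    """
    # Odds ratio with a Wald interval on the log scale.
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    or_ci = (
        math.exp(math.log(odds_ratio) - z * se_log_or),
        math.exp(math.log(odds_ratio) + z * se_log_or),
    )

    # Relative risk with a Wald interval on the log scale.
    relative_risk = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    rr_ci = (
        math.exp(math.log(relative_risk) - z * se_log_rr),
        math.exp(math.log(relative_risk) + z * se_log_rr),
    )

    # Pearson chi-square statistic (1 degree of freedom, no correction).
    n = a + b + c + d
    chi_square = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )

    return {
        "odds_ratio": odds_ratio,
        "odds_ratio_ci": or_ci,
        "relative_risk": relative_risk,
        "relative_risk_ci": rr_ci,
        "chi_square": chi_square,
    }


# Made-up counts, not data from the study.
result = two_by_two_stats(10, 20, 5, 40)
```

In the protocol described above, the cell counts would not be visible to any single party; this sketch only shows which quantities are derived from the table.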
Multifunctional Adaptive NS1 Mutations Are Selected upon Human Influenza Virus Evolution in the Mouse
The role of the NS1 protein in modulating influenza A virulence and host range was assessed by adapting A/Hong Kong/1/1968 (H3N2) (HK-wt) to increased virulence in the mouse. Sequencing the NS genome segment of mouse-adapted variants revealed 11 mutations in the NS1 gene and 4 in the overlapping NEP gene. Using the HK-wt virus and reverse genetics to incorporate mutant NS gene segments, we demonstrated that all NS1 mutations were adaptive and enhanced virus replication (up to 100-fold) in mouse cells and/or lungs. All but one NS1 mutant was associated with increased virulence measured by survival and weight loss in the mouse. Ten of 12 NS1 mutants significantly enhanced IFN-β antagonism to reduce the level of IFN-β production relative to HK-wt in infected mouse lungs at 1 day post infection, where 9 mutants induced viral yields in the lung that were equivalent to or significantly greater than HK-wt (up to a 16-fold increase). Eight of 12 NS1 mutants had reduced or lost the ability to bind the 30 kDa cleavage and polyadenylation specificity factor (CPSF30), thus demonstrating a lack of correlation with reduced IFN-β production. Mutant NS1 genes resulted in increased viral mRNA transcription (10 of 12 mutants) and protein production (6 of 12 mutants) in mouse cells. Increased transcription activity was demonstrated in the influenza mini-genome assay for 7 of 11 NS1 mutants. Although we have shown gain-of-function properties for all mutant NS genes, the contribution of the NEP mutations to phenotypic changes remains to be assessed. This study demonstrates that NS1 is a multifunctional virulence factor subject to adaptive evolution.
De-identifying a public use microdata file from the Canadian national discharge abstract database
Background: The Canadian Institute for Health Information (CIHI) collects hospital discharge abstract data (DAD) from Canadian provinces and territories. There are many demands for the disclosure of this data for research and analysis to inform policy making. To expedite the disclosure of data for some of these purposes, the construction of a DAD public use microdata file (PUMF) was considered. Such purposes include: confirming some published results, providing broader feedback to CIHI to improve data quality, training students and fellows, providing an easily accessible data set for researchers to prepare for analyses on the full DAD data set, and serving as a large health data set for computer scientists and statisticians to evaluate analysis and data mining techniques. The objective of this study was to measure the probability of re-identification for records in a PUMF, and to de-identify a national DAD PUMF consisting of 10% of records. Methods: Plausible attacks on a PUMF were evaluated. Based on these attacks, the 2008-2009 national DAD was de-identified. A new algorithm was developed to minimize the amount of suppression while maximizing the precision of the data. The acceptable threshold for the probability of correct re-identification of a record was set at between 0.04 and 0.05. Information loss was measured in terms of the extent of suppression and entropy. Results: Two different PUMFs were produced, one with geographic information, and one with no geographic information but more clinical information. At a threshold of 0.05, the maximum proportion of records with the diagnosis code suppressed was 20%, but these suppressions represented only 8-9% of all values in the DAD. Our suppression algorithm has less information loss than a more traditional approach to suppression. Smaller regions, patients with longer stays, and age groups that are infrequently admitted to hospitals tend to be the ones with the highest rates of suppression. Conclusions: The strategies we used to maximize data utility and minimize information loss can result in a PUMF that would be useful for the specific purposes noted earlier. However, to create a more detailed file with less information loss suitable for more complex health services research, the risk would need to be mitigated by requiring the data recipient to commit to a data sharing agreement.
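One way to read the 0.05 threshold in this abstract is as a minimum group size. The sketch below assumes, as a simplification not stated in the abstract, that a record's re-identification probability is one over the number of records sharing its quasi-identifier values. Under that assumption a threshold of 0.05 corresponds to groups of at least 20 records. It only flags records that exceed the threshold; it is not the suppression algorithm developed in the study. The function name, field names, and records are hypothetical.

```python
from collections import Counter


def flag_high_risk(records, quasi_identifiers, threshold=0.05):
    """Return one boolean per record: True if its assumed risk exceeds threshold.

    Assumption (not from the paper): risk = 1 / size of the group of records
    that share the same values on the quasi-identifiers.
    """
    # Build the quasi-identifier key for every record.
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    # Count how many records share each key (the equivalence class size).
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[key] > threshold for key in keys]


# Made-up records: 20 share one combination, 1 is unique.
records = [{"region": "A", "age_group": "30-39"} for _ in range(20)]
records.append({"region": "B", "age_group": "30-39"})

flags = flag_high_risk(records, ["region", "age_group"])
```

Here the 20 records in region A sit exactly at the threshold (1/20 = 0.05) and are not flagged, while the single record in region B is flagged as a candidate for suppression or generalization.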
Antiretroviral APOBEC3 cytidine deaminases alter HIV-1 provirus integration site profiles
Antiretroviral APOBEC3 may contribute to HIV-1 latency. In this study, Ajoge and Renner et al. identify a previously undescribed function of human APOBEC3 proteins in redirecting integrations of HIV-1 DNA into more transcriptionally inactive regions of the genome.
Estimating the re-identification risk of clinical data sets
Background: De-identification is a common way to protect patient privacy when disclosing clinical data for secondary purposes, such as research. One type of attack that de-identification protects against is linking the disclosed patient data with public and semi-public registries. Uniqueness is a commonly used measure of re-identification risk under this attack. If uniqueness can be measured accurately then the risk from this kind of attack can be managed. In practice, it is often not possible to measure uniqueness directly, therefore it must be estimated. Methods: We evaluated the accuracy of uniqueness estimators on clinically relevant data sets. Four candidate estimators were identified because they were evaluated in the past and found to have good accuracy or because they were new and not evaluated comparatively before: the Zayatz estimator, slide negative binomial estimator, Pitman’s estimator, and mu-argus. A Monte Carlo simulation was performed to evaluate the uniqueness estimators on six clinically relevant data sets. We varied the sampling fraction and the uniqueness in the population (the value being estimated). The median relative error and inter-quartile range of the uniqueness estimates were measured across 1000 runs. Results: There was no single estimator that performed well across all of the conditions. We developed a decision rule which selected between the Pitman, slide negative binomial and Zayatz estimators depending on the sampling fraction and the difference between estimates. This decision rule had the best consistent median relative error across multiple conditions and data sets. Conclusion: This study identified an accurate decision rule that can be used by health privacy researchers and disclosure control professionals to estimate uniqueness in clinical data sets. The decision rule provides a reliable way to measure re-identification risk.
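The evaluation design in this abstract (repeated sampling, then median relative error and inter-quartile range of the estimates) can be sketched as a small harness. The code below is a minimal illustration under stated assumptions: it does not implement the Zayatz, slide negative binomial, Pitman, or mu-argus estimators, and uses a deliberately naive placeholder that treats sample uniqueness as the population value. The function names and the toy population are hypothetical, and relative error is assumed to mean (estimate minus true value) divided by true value.

```python
import random
import statistics
from collections import Counter


def uniqueness(records):
    """Proportion of records whose quasi-identifier combination occurs once."""
    counts = Counter(records)
    return sum(1 for record in records if counts[record] == 1) / len(records)


def naive_estimator(sample):
    """Placeholder only: reports sample uniqueness as the population estimate."""
    return uniqueness(sample)


def evaluate(population, estimator, sampling_fraction, runs=1000, seed=0):
    """Return (median relative error, inter-quartile range) over repeated samples.

    Assumes the population uniqueness is greater than zero and runs >= 2.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    true_value = uniqueness(population)
    sample_size = max(1, round(sampling_fraction * len(population)))
    errors = []
    for _ in range(runs):
        sample = rng.sample(population, sample_size)  # without replacement
        errors.append((estimator(sample) - true_value) / true_value)
    q1, _median, q3 = statistics.quantiles(errors, n=4)
    return statistics.median(errors), q3 - q1


# Toy population of quasi-identifier tuples (made up): two of four are unique.
population = [("a",), ("a",), ("b",), ("c",)]
median_error, iqr = evaluate(population, naive_estimator, 1.0, runs=10)
```

With a sampling fraction of 1.0 the sample is the whole population, so the placeholder has zero error. At smaller fractions a naive estimator of this kind tends to overstate uniqueness, which is the gap the estimators compared in the study are meant to close.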