6,700 research outputs found
Anonymizing datasets with demographics and diagnosis codes in the presence of utility constraints
Publishing data about patients that contain both demographics and diagnosis codes is essential to perform large-scale, low-cost medical studies. However, preserving the privacy and utility of such data is challenging, because it requires: (i) guarding against identity disclosure (re-identification) attacks based on both demographics and diagnosis codes, (ii) ensuring that the anonymized data remain useful in intended analysis tasks, and (iii) minimizing the information loss incurred by anonymization, to preserve the utility of general analysis tasks that are difficult to determine before data publishing. Existing anonymization approaches are not suitable for this setting, because they cannot satisfy all three requirements. Therefore, in this work, we propose a new approach to deal with this problem. We enforce requirement (i) by applying (k, k^m)-anonymity, a privacy principle that prevents re-identification by attackers who know the demographics of a patient and up to m of their diagnosis codes, where k and m are tunable parameters. To capture requirement (ii), we propose the concept of a utility constraint for both demographics and diagnosis codes. Utility constraints limit the amount of generalization and are specified by data owners (e.g., the healthcare institution that performs anonymization). We also capture requirement (iii) by employing well-established information loss measures for demographics and for diagnosis codes. To realize our approach, we develop an algorithm that enforces (k, k^m)-anonymity on a dataset containing both demographics and diagnosis codes, in a way that satisfies the specified utility constraints and with minimal information loss, according to the measures. Our experiments with a large dataset containing more than 200,000 electronic health records show the effectiveness and efficiency of our algorithm.
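The privacy principle described in this abstract can be illustrated with a small checker. This is a minimal sketch, not the authors' algorithm: it only tests whether a dataset already satisfies the property, under the assumed reading that every combination of a record's demographics and at most m of its diagnosis codes must be shared by at least k records. The function name `is_k_km_anonymous` and the sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_k_km_anonymous(records, k, m):
    """Check the (k, k^m) property on a list of (demographics, codes) pairs.

    demographics: a hashable tuple, e.g. ("30-40", "F")  (illustrative only)
    codes:        an iterable of diagnosis codes
    Assumption: an attacker knows the demographics and up to m codes.
    """
    support = Counter()
    for demographics, codes in records:
        ordered = sorted(set(codes))  # canonical order for subset keys
        # Size 0 covers an attacker who knows only the demographics.
        for size in range(0, m + 1):
            for subset in combinations(ordered, size):
                support[(demographics, subset)] += 1
    # Every knowledge pattern that occurs must match at least k records.
    return all(count >= k for count in support.values())


# Hypothetical toy data: three patients with identical demographics.
toy = [
    (("30-40", "F"), {"250", "401"}),
    (("30-40", "F"), {"250", "401"}),
    (("30-40", "F"), {"250"}),
]
print(is_k_km_anonymous(toy, k=2, m=2))
print(is_k_km_anonymous(toy, k=3, m=1))
```

The counter stores, for each demographics tuple and code subset, how many records contain that subset, so a single pass over the data is enough. The number of subsets grows quickly with m, which is one reason practical algorithms generalize codes rather than enumerate like this.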
Race and “Hotspots” of Preventable Hospitalizations
Abstract
Preventable hospitalizations (PHs) are those for ambulatory care-sensitive conditions that indicate insufficiencies in local primary healthcare. PH rates tend to be higher among African Americans, in urban centers, in rural areas, and in areas with more African American residents. The objective of this study is to determine geographic clusters of high PH rates (“spatial clusters”) by race. Data from Maryland hospitals were utilized to determine the rates of PHs in zip code tabulation areas (ZCTAs) by race in 2010. Geographic clusters of ZCTAs with higher-than-expected PH rates were identified using Scan Statistic and Anselin’s Local Moran’s I. Ten PH spatial clusters were observed among the total population, with an average PH rate of 3,046.6 per 100,000 population. Among whites, the average PH rate was 3,339.9 per 100,000 in 11 PH spatial clusters. Only five PH spatial clusters were observed among African Americans, with a higher average PH rate (3,710.8 per 100,000). The locations and other characteristics of PH spatial clusters differed by race. These results can be used to target resources to areas with high PH rates. Because PH spatial clusters are observed in differing locations for African Americans, approaches that include cultural tailoring may need to be specifically targeted.
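For readers unfamiliar with Local Moran's I, the statistic named in this abstract, a minimal sketch follows. It assumes the common form I_i = (z_i / m2) * sum_j w_ij z_j, with z_i the deviation from the mean and m2 the mean squared deviation, and it assumes row-standardized weights. The abstract does not state the weighting scheme or the significance testing used in the study, so the function `local_morans_i` and the four-area example are purely illustrative.

```python
def local_morans_i(values, weights):
    """Return Local Moran's I for each area.

    values:  one rate per area (e.g. PH rate per 100,000; illustrative)
    weights: square matrix, weights[i][j] is the spatial weight of j for i
             (assumed row-standardized; the diagonal is ignored)
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # mean squared deviation
    if m2 == 0:
        raise ValueError("all values are identical; the statistic is undefined")
    result = []
    for i in range(n):
        spatial_lag = sum(weights[i][j] * z[j] for j in range(n) if j != i)
        result.append(z[i] / m2 * spatial_lag)
    return result


# Hypothetical chain of four areas: 0-1-2-3, neighbors share a border.
rates = [10.0, 10.0, 0.0, 0.0]
w = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
print(local_morans_i(rates, w))
```

A positive value marks an area whose rate resembles its neighbors' (a high-high or low-low pocket), while values near zero or below indicate no local similarity or a spatial outlier. Identifying a cluster in practice also requires a significance test, typically by permutation, which this sketch omits.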
Understanding Disease Heterogeneity and Patient Characteristics in Patients with Amyotrophic Lateral Sclerosis (ALS)
Background: Amyotrophic lateral sclerosis (ALS) is a fatal neurologic disease that is projected to double in worldwide incidence in the next 20 years. The heterogeneous nature of the disease and relatively limited research data, compared to non-rare diseases, have made it difficult for clinician researchers to alter the course of the disease within the short life expectancy after symptom onset. Method: This was a mixed-method retrospective review and live sampling study using three distinct data sources. Retrospective data were abstracted from the electronic medical record systems for a select group of ALS patients seen at the University of California, Irvine Neuromuscular Center (UCI NMC). Additional retrospective datasets curated by the Pooled Resources Open-Access Clinical Trials (PRO-ACT) database were also analyzed. Observational data were collected using a 9-item survey developed on Google Forms and disseminated through the ALS Association Golden West Chapter. The items measured symptom onset, diagnostic journey, and patient demographics. Results: The analyses confirmed current reports of higher disease incidence in Caucasian populations, which usually comprised at least 60% of each dataset. The gender prevalence towards males was only observed in the PRO-ACT dataset. There was also a difference in mean age between PRO-ACT (56 years), UCI (61 years), and Online Questionnaire respondents (66 years). Discussion: Ultimately, retrospective data analyses were limited by substantial missing-not-at-random data. Large data repositories can bridge the gap between non-rare and rare disease research, but only with robust and methodologic data collection across all participating sites.
Algorithms to anonymize structured medical and healthcare data: A systematic review
Introduction: With many anonymization algorithms developed for structured medical health data (SMHD) in the last decade, our systematic review provides a comprehensive bird’s eye view of algorithms for SMHD anonymization. Methods: This systematic review was conducted according to the recommendations in the Cochrane Handbook for Reviews of Interventions and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Eligible articles from the PubMed, ACM digital library, Medline, IEEE, Embase, Web of Science Collection, Scopus, ProQuest Dissertation, and Theses Global databases were identified through systematic searches. The following parameters were extracted from the eligible studies: author, year of publication, sample size, and relevant algorithms and/or software applied to anonymize SMHD, along with the summary of outcomes. Results: Among 1,804 initial hits, the present study considered 63 records including research articles, reviews, and books. Seventy-five evaluated the anonymization of demographic data, 18 assessed diagnosis codes, and 3 assessed genomic data. One of the most common approaches was k-anonymity, which was utilized mainly for demographic data, often in combination with another algorithm, e.g., l-diversity. No approaches have yet been developed for protection against membership disclosure attacks on diagnosis codes. Conclusion: This study reviewed and categorized different anonymization approaches for MHD according to the anonymized data types (demographics, diagnosis codes, and genomic data). Further research is needed to develop more efficient algorithms for the anonymization of diagnosis codes and genomic data. The risk of reidentification can be minimized with adequate application of the addressed anonymization approaches. Systematic Review Registration: [http://www.crd.york.ac.uk/prospero], identifier [CRD42021228200].
Temporal Subtyping of Alzheimer's Disease Using Medical Conditions Preceding Alzheimer's Disease Onset in Electronic Health Records
Subtyping of Alzheimer's disease (AD) can facilitate diagnosis, treatment,
prognosis and disease management. It can also support the testing of new
prevention and treatment strategies through clinical trials. In this study, we
employed spectral clustering to cluster 29,922 AD patients in the OneFlorida
Data Trust using their longitudinal EHR data of diagnosis and conditions into
four subtypes. These subtypes exhibit different patterns of progression of
other conditions prior to the first AD diagnosis. In addition, according to the
results of various statistical tests, these subtypes are also significantly
different with respect to demographics, mortality, and prescription medications
after the AD diagnosis. This study could potentially facilitate early detection
and personalized treatment of AD as well as data-driven generalizability
assessment of clinical trials for AD.
Discovering Patient Phenotypes Using Generalized Low Rank Models
The practice of medicine is predicated on discovering commonalities or distinguishing characteristics among patients
to inform corresponding treatment. Given a patient grouping (hereafter referred to as a phenotype), clinicians can
implement a treatment pathway accounting for the underlying cause of disease in that phenotype. Traditionally,
phenotypes have been discovered by intuition, experience in practice, and advancements in basic science, but these
approaches are often heuristic, labor intensive, and can take decades to produce actionable knowledge. Although our
understanding of disease has progressed substantially in the past century, there are still important domains in which
our phenotypes are murky, such as in behavioral health or in hospital settings. To accelerate phenotype discovery,
researchers have used machine learning to find patterns in electronic health records, but have often been thwarted by
missing data, sparsity, and data heterogeneity. In this study, we use a flexible framework called Generalized Low
Rank Modeling (GLRM) to overcome these barriers and discover phenotypes in two sources of patient data. First, we
analyze data from the 2010 Healthcare Cost and Utilization Project National Inpatient Sample (NIS), which contains
upwards of 8 million hospitalization records consisting of administrative codes and demographic information. Second,
we analyze a small (N=1746), local dataset documenting the clinical progression of autism spectrum disorder patients using granular features from the electronic health record, including text from physician notes. We demonstrate that
low rank modeling successfully captures known and putative phenotypes in these vastly different datasets.
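The idea of fitting a low rank model in the presence of missing data, which this abstract highlights, can be sketched in its simplest special case. The example below assumes quadratic loss, rank one, and no regularization, and fits only the observed entries by alternating least squares. It is not the GLRM framework itself, which supports other losses and mixed data types; `fit_rank_one` and the toy matrix are hypothetical.

```python
def fit_rank_one(matrix, iterations=200):
    """Fit matrix[i][j] ~ a[i] * b[j] using observed entries only.

    matrix: list of equal-length rows; None marks a missing entry.
    Assumptions: quadratic loss, rank 1, no regularization.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    a = [1.0] * n_rows
    b = [1.0] * n_cols
    for _ in range(iterations):
        # Update each row factor with the column factors held fixed.
        for i in range(n_rows):
            seen = [j for j in range(n_cols) if matrix[i][j] is not None]
            den = sum(b[j] ** 2 for j in seen)
            if den > 0:
                a[i] = sum(matrix[i][j] * b[j] for j in seen) / den
        # Update each column factor with the row factors held fixed.
        for j in range(n_cols):
            seen = [i for i in range(n_rows) if matrix[i][j] is not None]
            den = sum(a[i] ** 2 for i in seen)
            if den > 0:
                b[j] = sum(matrix[i][j] * a[i] for i in seen) / den
    return a, b


# Hypothetical rank-one table with one unobserved cell (true value 9).
observed = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, None],
]
a, b = fit_rank_one(observed)
print(round(a[2] * b[2], 3))  # estimate for the missing cell
```

Because the loss is summed over observed cells only, missing values never need to be imputed before fitting; the fitted factors supply the estimate afterwards. Each row of `a` can be read as a patient's coordinate on a single latent pattern, which is the sense in which low rank factors can suggest phenotypes.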