
    The Costs of Anonymization: Case Study Using Clinical Data

    Background: Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns, such as reidentification risk, can be addressed by applying anonymization algorithms, which alter the data so that they can no longer reasonably be related to a person. Yet such alterations can affect the statistical properties of the data set, so the privacy-utility trade-off must be considered. This trade-off has been studied in theory, but evidence based on real-world individual-level clinical data is rare, and anonymization has not been broadly adopted in clinical practice. Objective: This study aims to contribute to a better understanding of real-world anonymization by comprehensively evaluating the privacy-utility trade-off of differently anonymized data, using data and scientific results from the German Chronic Kidney Disease (GCKD) study. Methods: The GCKD data set extracted for this study consists of 5217 records and 70 variables. A 2-step procedure was followed to determine which variables constituted reidentification risks. To capture a large portion of the risk-utility space, we chose risk thresholds ranging from 0.02 to 1. The data were then transformed via generalization and suppression, and the anonymization process was varied using a generic and a use case–specific configuration. To assess the utility of the anonymized GCKD data, general-purpose metrics (ie, data granularity and entropy) and use case–specific metrics (ie, reproducibility) were applied. Reproducibility was assessed by measuring the overlap of the 95% CIs of anonymized and original results relative to their lengths. Results: Reproducibility measured by 95% CI overlap was higher than the utility indicated by general-purpose metrics. For example, granularity varied between 68.2% and 87.6%, and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. Nonoverlapping 95% CIs were detected for 6 estimates across all analyses, but the overwhelming majority of estimates exhibited an overlap above 50%. At the same level of privacy, the use case–specific configuration outperformed the generic one in terms of actual utility (ie, reproducibility). Conclusions: Our results illustrate the challenges anonymization faces when it must support multiple likely and possibly competing uses, whereas use case–specific anonymization can provide greater utility. This aspect should be taken into account when evaluating the costs associated with anonymized data while attempting to maintain sufficiently high levels of privacy.
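The abstract names three technical ingredients: generalization, suppression against a risk threshold, and a 95% CI overlap measure. The sketch below illustrates each under stated assumptions; it is not the pipeline used in the study. It assumes a prosecutor-style risk model in which a record's reidentification risk is 1 divided by the size of its equivalence class (the records sharing its quasi-identifier values), so a threshold of 0.02 requires classes of at least 50 records and a threshold of 1 suppresses nothing. For the overlap it uses one common definition: the average of the fractions of each interval covered by the other, set to 0 for disjoint intervals. All function names and the example records are hypothetical.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Generalization (hypothetical helper): replace an exact age with a band."""
    start = (age // width) * width
    return f"{start}-{start + width - 1}"


def suppress_high_risk(records, quasi_identifiers, threshold):
    """Suppression (hypothetical helper): drop records whose risk exceeds threshold.

    Assumed risk model: 1 / (number of records sharing the same
    quasi-identifier values), i.e. a prosecutor-style equivalence-class risk.
    """
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    class_sizes = Counter(key(r) for r in records)
    return [r for r in records if 1 / class_sizes[key(r)] <= threshold]


def ci_overlap(orig, anon):
    """Mean share of each CI covered by the other; 0 when they are disjoint."""
    (lo_o, hi_o), (lo_a, hi_a) = orig, anon
    overlap = max(0.0, min(hi_o, hi_a) - max(lo_o, lo_a))
    return 0.5 * (overlap / (hi_o - lo_o) + overlap / (hi_a - lo_a))


raw = [{"age": 41, "sex": "F"}, {"age": 47, "sex": "F"}, {"age": 58, "sex": "M"}]
generalized = [{**r, "age": generalize_age(r["age"])} for r in raw]
kept = suppress_high_risk(generalized, ["age", "sex"], threshold=0.5)
print(len(kept))                             # 2: the unique male record is suppressed
print(ci_overlap((1.0, 2.0), (1.5, 2.5)))    # 0.5
```

Real anonymization tools search over many generalization hierarchies and suppression limits at once; the point here is only how a risk threshold translates into a minimum class size and how a CI overlap score is formed.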

    Bucket Fuser: Statistical Signal Extraction for 1D ¹H NMR Metabolomic Data

    Untargeted metabolomics is a promising tool for identifying novel disease biomarkers and unraveling underlying pathomechanisms. Nuclear magnetic resonance (NMR) spectroscopy is particularly suited for large-scale untargeted metabolomics studies due to its high reproducibility and cost effectiveness. Here, one-dimensional (1D) ¹H NMR experiments offer good sensitivity at reasonable measurement times. Their subsequent data analysis requires sophisticated preprocessing steps, including the extraction of NMR features corresponding to specific metabolites. We developed a novel 1D NMR feature extraction procedure, called Bucket Fuser (BF), which is based on a regularized regression framework with fused group LASSO terms. The performance of the BF procedure was demonstrated on three independent NMR datasets and benchmarked against existing state-of-the-art NMR feature extraction methods. BF dynamically constructs NMR metabolite features whose widths can be adjusted via a regularization parameter. BF consistently improved metabolite signal extraction, as demonstrated by our correlation analyses with absolutely quantified metabolites. It also yielded a higher proportion of statistically significant metabolite features in our differential metabolite analyses. The BF algorithm is computationally efficient and can handle small sample sizes. In summary, the Bucket Fuser algorithm, which is available as supplementary Python code, enables fast and dynamic extraction of 1D NMR signals for improved detection of metabolic biomarkers.
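The abstract describes BF only as a regularized regression with fused group LASSO terms. The following is not the published BF code; it is a minimal pure-Python illustration of the generic idea, assuming a group fused LASSO penalty that sums the Euclidean norms of the differences between the fitted intensity vectors (one entry per sample) at neighboring spectral points. Neighboring points whose fitted vectors are fused (difference zero) form one bucket, which is why a larger regularization parameter tends to produce wider buckets. The helper names, the column-wise data layout, and summing raw intensities within a bucket are assumptions, and the optimization step (finding the fitted values that minimize the objective) is not shown.

```python
import math


def fused_group_objective(Y, B, lam):
    """0.5 * squared error + lam * sum of L2 norms of adjacent column differences.

    Hypothetical formulation. Y[j] and B[j] hold, for spectral point j, the
    observed and fitted intensities of all samples (one value per sample).
    """
    fit = 0.5 * sum((y - b) ** 2 for yc, bc in zip(Y, B) for y, b in zip(yc, bc))
    penalty = lam * sum(math.dist(B[j], B[j - 1]) for j in range(1, len(B)))
    return fit + penalty


def buckets(B, tol=1e-8):
    """Group consecutive spectral points whose fitted vectors are fused."""
    groups = [[0]]
    for j in range(1, len(B)):
        if math.dist(B[j], B[j - 1]) > tol:
            groups.append([j])      # a jump between neighbors starts a new bucket
        else:
            groups[-1].append(j)    # fused with the previous point
    return groups


def bucket_features(Y, groups):
    """Sum raw intensities within each bucket: one feature per bucket per sample."""
    n_samples = len(Y[0])
    return [[sum(Y[j][i] for j in g) for g in groups] for i in range(n_samples)]


# Five spectral points, two samples; points 0-1 and 2-3 have identical fits.
B = [[1, 3], [1, 3], [2, 4], [2, 4], [2, 5]]
groups = buckets(B)
print(groups)                      # [[0, 1], [2, 3], [4]]
print(bucket_features(B, groups))  # [[2, 4, 2], [6, 8, 5]]
```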

    Genetic studies of paired metabolomes reveal enzymatic and transport processes at the interface of plasma and urine.

    The kidneys operate at the interface of plasma and urine by clearing molecular waste products while retaining valuable solutes. Genetic studies of paired plasma and urine metabolomes may identify the underlying processes. We conducted genome-wide studies of 1,916 plasma and urine metabolites and detected 1,299 significant associations. Associations for 40% of the implicated metabolites would have been missed by studying plasma alone. We detected urine-specific findings that provide information about metabolite reabsorption in the kidney, such as aquaporin (AQP)-7-mediated glycerol transport, as well as different metabolomic footprints of kidney-expressed proteins in plasma and urine that are consistent with their localization and function, including the transporters NaDC3 (SLC13A3) and ASBT (SLC10A2). Shared genetic determinants of 7,073 metabolite-disease combinations provide a resource for better understanding metabolic diseases and revealed connections of dipeptidase 1 with circulating digestive enzymes and with hypertension. Extending genetic studies of the metabolome beyond plasma yields unique insights into processes at the interface of body compartments.
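Each genome-wide study above consists of many single-variant association tests. A minimal sketch, assuming an additive genetic model in which a metabolite level is regressed on allele dosage (0, 1 or 2 copies) by ordinary least squares, with a normal-approximation p-value. Covariate adjustment, which real analyses typically include, is omitted, and the significance threshold (5e-8 divided by the 1,916 metabolites) is an illustrative Bonferroni-style assumption, not the study's stated criterion. The function name and the simulated data are hypothetical.

```python
import math
import random

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance level
N_METABOLITES = 1916      # metabolites tested, from the abstract
# Assumed Bonferroni-style correction across metabolites (illustrative only).
THRESHOLD = GENOME_WIDE_ALPHA / N_METABOLITES


def single_variant_test(dosage, level):
    """OLS of a metabolite level on allele dosage; returns (beta, se, p).

    Hypothetical helper: no covariates, two-sided normal-approximation p-value.
    """
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(level) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, level))
    beta = sxy / sxx                      # per-allele effect on the metabolite
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosage, level))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, N(0, 1) reference
    return beta, se, p


# Simulated example: allele frequency 0.3, effect of 0.2 per allele.
rng = random.Random(1)
dosage = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(20000)]
level = [0.2 * d + rng.gauss(0, 1) for d in dosage]
beta, se, p = single_variant_test(dosage, level)
print(f"beta={beta:.3f}, se={se:.3f}, significant={p < THRESHOLD}")
```

Running the same test separately on plasma and urine levels of a metabolite is what allows associations to be found in one compartment but not the other.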

    Blood pressure control in chronic kidney disease: A cross-sectional analysis from the German Chronic Kidney Disease (GCKD) study

    We assessed the prevalence, awareness, treatment and control of hypertension in patients with moderate chronic kidney disease (CKD) under nephrological care in Germany. In the German Chronic Kidney Disease (GCKD) study, 5217 patients under nephrology specialist care were enrolled from 2010 to 2012 in a prospective observational cohort study. Inclusion criteria were an estimated glomerular filtration rate (eGFR) of 30–60 mL/min/1.73 m² or overt proteinuria in the presence of an eGFR >60 mL/min/1.73 m². Office blood pressure was measured by trained study personnel in a standardized way, and hypertension awareness and medication use were assessed in standardized interviews. Blood pressure was considered controlled if systolic […] 90%. However, only 2456 (49.3%) of the hypertensive patients had controlled blood pressure. About half (51.0%) of the patients with uncontrolled blood pressure met the criteria for resistant hypertension. Factors associated with better odds of controlled blood pressure in multivariate analyses included younger age, female sex, higher income, low or absent proteinuria, and use of certain classes of antihypertensive medication. We conclude that blood pressure control in CKD patients remains challenging even under nephrology specialist care, despite high rates of awareness and medication use.
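The abstract gives the number of patients with controlled blood pressure (2456) and its share (49.3%), but not, in this text, the number of hypertensive patients. Because the percentage is rounded, the denominator can only be bounded. A minimal sketch, assuming half-up rounding to one decimal place and using exact fractions so that boundary cases are not distorted by floating point; the function name is hypothetical.

```python
from fractions import Fraction


def consistent_denominators(count, pct_tenths, max_n):
    """Return every n <= max_n for which 100 * count / n rounds to pct_tenths / 10.

    Assumes half-up rounding to one decimal place.
    """
    low = Fraction(2 * pct_tenths - 1, 2000)   # e.g. 0.4925 for 49.3%
    high = Fraction(2 * pct_tenths + 1, 2000)  # e.g. 0.4935 for 49.3%
    return [n for n in range(count, max_n + 1) if low <= Fraction(count, n) < high]


# 2456 controlled = 49.3% of hypertensive patients; the hypertensive group
# cannot exceed the 5217 enrolled patients.
candidates = consistent_denominators(2456, 493, 5217)
print(candidates[0], candidates[-1])  # 4977 4986
```

Under that rounding assumption, the hypertensive group comprised between 4977 and 4986 patients.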

    PCSK9 and Cardiovascular Disease in Individuals with Moderately Decreased Kidney Function

    Background and objectives: Proprotein convertase subtilisin/kexin type 9 (PCSK9) is a key regulator of lipid homeostasis. Studies investigating the association between PCSK9 and cardiovascular disease in large cohorts of patients with CKD are limited. Design, setting, participants, & measurements: The association of PCSK9 concentrations with prevalent and incident cardiovascular disease was investigated in 5138 White participants of the German Chronic Kidney Disease study with a median follow-up of 6.5 years. Inclusion criteria were an eGFR of 30–60 ml/min per 1.73 m², or >60 ml/min per 1.73 m² in the presence of overt proteinuria (urine albumin-creatinine ratio >300 mg/g or equivalent). Prevalent cardiovascular disease was defined as a history of nonfatal myocardial infarction, coronary artery bypass grafting, percutaneous transluminal coronary angioplasty, carotid artery interventions, or stroke. Incident major adverse cardiovascular events included death from cardiovascular causes, acute nonfatal myocardial infarction, and nonfatal stroke. Results: The median PCSK9 concentration in the cohort was 285 ng/ml (interquartile range, 231-346 ng/ml). There was no association between PCSK9 concentrations and baseline eGFR or albuminuria. With each 100-ng/ml increment of PCSK9, the odds of prevalent cardiovascular disease (n=1284) were 1.22-fold higher (95% confidence interval, 1.12 to 1.34; P […] 1.75; P=0.01). In addition, PCSK9 improved classification accuracy for both prevalent cardiovascular disease (net reclassification index, 0.27; 95% confidence interval, 0.20 to 0.33) and incident major adverse cardiovascular events during follow-up (net reclassification index, 0.10; 95% confidence interval, 0.01 to 0.21) when added to an extended adjustment model. Conclusions: Our findings show no relation of PCSK9 with baseline eGFR or albuminuria, but a significant association between higher PCSK9 concentrations and risk of cardiovascular disease, independent of traditional risk factors including LDL cholesterol level.
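The PCSK9 odds ratio is reported per 100-ng/ml increment, and it can be re-expressed for other increments, for example the width of the interquartile range (346 - 231 = 115 ng/ml). A minimal sketch, assuming the log-odds are linear in PCSK9 (as when PCSK9 enters a logistic model as a continuous term): the log odds ratio scales with the increment, so OR_new = OR ** (new_unit / unit), and the same transformation applies to each confidence limit. The helper name is hypothetical, and because the inputs are rounded published values the result is approximate.

```python
import math


def rescale_odds_ratio(or_per_unit, unit, new_unit):
    """Re-express an odds ratio per `unit` as one per `new_unit` of exposure.

    Assumes log-odds linear in the exposure, so log(OR) scales with the increment.
    """
    return math.exp(math.log(or_per_unit) * new_unit / unit)


iqr = 346 - 231  # ng/ml, interquartile range of PCSK9 from the abstract
point = rescale_odds_ratio(1.22, 100, iqr)
lower = rescale_odds_ratio(1.12, 100, iqr)
upper = rescale_odds_ratio(1.34, 100, iqr)
print(f"OR per {iqr} ng/ml: {point:.2f} ({lower:.2f} to {upper:.2f})")
# OR per 115 ng/ml: 1.26 (1.14 to 1.40)
```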