
    FACT: Federated Adversarial Cross Training

    Federated Learning (FL) facilitates distributed model development that aggregates multiple confidential data sources. The information transfer among clients can be compromised by distributional differences, i.e., by non-i.i.d. data. A particularly challenging scenario is the adaptation of a federated model to a target client without access to annotated data. We propose Federated Adversarial Cross Training (FACT), which uses the implicit domain differences between source clients to identify domain shifts in the target domain. In each round of FL, FACT cross-initializes a pair of source clients to generate domain-specialized representations, which are then used as a direct adversary to learn a domain-invariant data representation. We empirically show that FACT outperforms state-of-the-art federated, non-federated, and source-free domain adaptation models on three popular multi-source-single-target benchmarks, and state-of-the-art Unsupervised Domain Adaptation (UDA) models on single-source-single-target experiments. We further study FACT's behavior with respect to communication restrictions and the number of participating clients.
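    The adversarial ingredient can be illustrated with a small sketch: the disagreement between two source-specialized classifiers on target samples acts as a measure of domain shift that a shared feature extractor would be trained to minimize. The snippet below is a generic MCD-style discrepancy in NumPy, not the authors' implementation; all names and the L1 distance choice are illustrative assumptions.

    ```python
    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def discrepancy(logits_a, logits_b):
        # Mean L1 distance between the class distributions predicted by two
        # cross-initialized source models on the same target samples; a
        # feature extractor trained to shrink this quantity is pushed toward
        # a domain-invariant representation.
        return np.abs(softmax(logits_a) - softmax(logits_b)).mean()

    target_logits_a = np.array([[2.0, 0.1], [0.3, 1.5]])
    target_logits_b = np.array([[1.8, 0.2], [1.4, 0.2]])
    print(discrepancy(target_logits_a, target_logits_b))
    ```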

    Chronic Kidney Disease Cohort Studies: A Guide to Metabolome Analyses

    Kidney diseases still pose one of the biggest challenges for global health, and their heterogeneity and often high comorbidity load seriously hinder the unraveling of their underlying pathomechanisms and the delivery of optimal patient care. Metabolomics, the quantitative study of small organic compounds, called metabolites, in a biological specimen, is becoming increasingly important in nephrology research. Conducting a metabolomics study in human kidney disease cohorts, however, requires thorough knowledge of the key workflow steps: study planning, sample collection, metabolomics data acquisition and preprocessing, statistical/bioinformatics data analysis, and results interpretation within a biomedical context. This review provides a guide for future metabolomics studies in human kidney disease cohorts. We offer an overview of important a priori considerations for metabolomics cohort studies, available analytical as well as statistical/bioinformatics data analysis techniques, and the subsequent interpretation of metabolic findings. We further point out potential research questions for metabolomics studies in the context of kidney diseases and summarize the main results and data availability of important studies already conducted in this field.

    An R-Package for the Deconvolution and Integration of 1D NMR Data: MetaboDecon1D

    NMR spectroscopy is a widely used method for the detection and quantification of metabolites in complex biological fluids. However, the large number of metabolites present in a biological sample such as urine or plasma leads to considerable signal overlap in one-dimensional NMR spectra, which in turn hampers both signal identification and quantification. We have therefore developed an easy-to-use R-package that allows the fully automated deconvolution of overlapping signals into their underlying Lorentzian line shapes. We show that precise integral values are computed, which are required to obtain both relative and absolute quantitative information. The algorithm is independent of any knowledge about the corresponding metabolites, which also allows the quantitative description of features of yet unknown identity.
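    The core idea — fitting overlapping Lorentzians and integrating them analytically — can be sketched in a few lines. This is a toy illustration with SciPy's generic least-squares fitter, not the MetaboDecon1D algorithm itself; the two-peak spectrum, starting values, and noise level are made-up assumptions.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def lorentzian(x, x0, gamma, a):
        # Lorentzian line shape: position x0, half-width gamma, amplitude a
        return a * gamma**2 / ((x - x0)**2 + gamma**2)

    def two_peaks(x, x1, g1, a1, x2, g2, a2):
        return lorentzian(x, x1, g1, a1) + lorentzian(x, x2, g2, a2)

    # Synthetic spectrum with two overlapping signals plus noise
    ppm = np.linspace(0.0, 10.0, 2000)
    rng = np.random.default_rng(0)
    spectrum = two_peaks(ppm, 4.0, 0.15, 1.0, 4.4, 0.20, 0.7)
    spectrum += rng.normal(0.0, 0.005, ppm.size)

    # Deconvolve: fit both Lorentzians jointly from rough starting values
    p0 = [3.9, 0.1, 0.8, 4.5, 0.1, 0.5]
    popt, _ = curve_fit(two_peaks, ppm, spectrum, p0=p0)

    # The integral of a*gamma^2 / ((x-x0)^2 + gamma^2) over the real line
    # is pi * a * gamma, which yields the signal areas for quantification.
    areas = [np.pi * popt[2] * abs(popt[1]), np.pi * popt[5] * abs(popt[4])]
    ```

    The analytic integral is the reason a Lorentzian fit gives quantitative information even when two signals overlap so strongly that naive peak-area summation would fail.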

    Bucket Fuser: Statistical Signal Extraction for 1D 1H NMR Metabolomic Data

    Untargeted metabolomics is a promising tool for identifying novel disease biomarkers and unraveling underlying pathomechanisms. Nuclear magnetic resonance (NMR) spectroscopy is particularly suited for large-scale untargeted metabolomics studies due to its high reproducibility and cost-effectiveness. Here, one-dimensional (1D) 1H NMR experiments offer good sensitivity at reasonable measurement times. Their subsequent data analysis requires sophisticated data preprocessing steps, including the extraction of NMR features corresponding to specific metabolites. We developed a novel 1D NMR feature extraction procedure, called Bucket Fuser (BF), which is based on a regularized regression framework with fused group LASSO terms. The performance of the BF procedure was demonstrated using three independent NMR datasets and was benchmarked against existing state-of-the-art NMR feature extraction methods. BF dynamically constructs NMR metabolite features, the widths of which can be adjusted via a regularization parameter. BF consistently improved metabolite signal extraction, as demonstrated by our correlation analyses with absolutely quantified metabolites. It also yielded a higher proportion of statistically significant metabolite features in our differential metabolite analyses. The BF algorithm is computationally efficient and can deal with small sample sizes. In summary, the Bucket Fuser algorithm, which is available as supplementary Python code, facilitates the fast and dynamic extraction of 1D NMR signals for the improved detection of metabolic biomarkers.
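    The fusion idea behind BF — a penalty on differences between neighbouring spectral positions, so that adjacent ppm positions merge into one metabolite "bucket" — can be sketched with a fused-lasso signal approximation. The solver below is a simple iteratively reweighted least-squares approximation chosen for brevity, not the authors' regularized regression implementation; the step-signal demo and the value of `lam` are illustrative assumptions.

    ```python
    import numpy as np

    def fuse_1d(y, lam, iters=100, eps=1e-8):
        # Minimize 0.5*||y - beta||^2 + lam * sum(|beta[j+1] - beta[j]|)
        # by iteratively reweighted least squares: each iteration solves the
        # tridiagonal system (I + D^T W D) beta = y, where D takes first
        # differences and W reweights them by 1/|current difference|.
        n = len(y)
        beta = y.astype(float).copy()
        idx = np.arange(n)
        for _ in range(iters):
            w = lam / (np.abs(np.diff(beta)) + eps)
            A = np.zeros((n, n))
            A[idx, idx] = 1.0
            A[idx[:-1], idx[:-1]] += w
            A[idx[1:], idx[1:]] += w
            A[idx[:-1], idx[1:]] = -w
            A[idx[1:], idx[:-1]] = -w
            beta = np.linalg.solve(A, y)
        return beta

    # Noisy step signal: the fusion penalty recovers two flat "buckets"
    rng = np.random.default_rng(0)
    y = np.concatenate([np.zeros(20), np.ones(20)]) + rng.normal(0, 0.1, 40)
    beta = fuse_1d(y, lam=0.5)
    ```

    Larger `lam` fuses more aggressively, i.e. produces wider buckets — which mirrors the role of the width-controlling regularization parameter described in the abstract.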

    Platform independent protein-based cell-of-origin subtyping of diffuse large B-cell lymphoma in formalin-fixed paraffin-embedded tissue

    Diffuse large B-cell lymphoma (DLBCL) is commonly classified by gene expression profiling according to its cell of origin (COO) into activated B-cell (ABC)-like and germinal center B-cell (GCB)-like subgroups. Here we report the application of label-free nano-liquid chromatography - Sequential Window Acquisition of all THeoretical fragment-ion spectra - mass spectrometry (nanoLC-SWATH-MS) to the COO classification of DLBCL in formalin-fixed paraffin-embedded (FFPE) tissue. To generate a protein signature capable of predicting Affymetrix-based GCB scores, the summed log2-transformed fragment ion intensities of 780 proteins quantified in a training set of 42 DLBCL cases were used as independent variables in a penalized zero-sum elastic net regression model with variable selection. The eight-protein signature obtained showed an excellent correlation (r = 0.873) between predicted and true GCB scores and yielded only 9 (21.4%) minor discrepancies between the three classifications: ABC, GCB, and unclassified. The robustness of the model was successfully validated in two independent cohorts of 42 and 31 DLBCL cases, the latter cohort comprising only patients aged >75 years, with Pearson correlation coefficients of 0.846 and 0.815, respectively, between predicted and NanoString nCounter-based GCB scores. We further show that the eight-protein signature is directly transferable to both a triple quadrupole and a Q Exactive quadrupole-Orbitrap mass spectrometer, thus obviating the need for proprietary instrumentation and reagents. This method may therefore be used for robust and competitive classification of DLBCLs on the protein level.

    Automated macrophage counting in DLBCL tissue samples: a ROF filter based approach

    Background: For analysis of the tumor microenvironment in diffuse large B-cell lymphoma (DLBCL) tissue samples, it is desirable to obtain information about counts and distribution of different macrophage subtypes. Until now, macrophage counts have mostly been inferred from gene expression analysis of whole tissue sections, providing only indirect information. Direct analysis of immunohistochemically (IHC) fluorescence-stained tissue samples is confronted with several difficulties, e.g. high variability of shape and size of target macrophages and strongly inhomogeneous intensity of staining. Consequently, application of commercial software is largely restricted to very rough analysis modes, and most macrophage counts are still obtained by manual counting in microarrays or high-power fields, thus failing to represent the heterogeneity of the tumor microenvironment adequately.

    Methods: We describe a Rudin-Osher-Fatemi (ROF) filter based segmentation approach for whole tissue samples, combining floating intensity thresholding and rule-based feature detection. The method is validated against manual counts and compared with two commercial software kits (Tissue Studio 64, Definiens AG, and Halo, Indica Labs) and a straightforward machine-learning approach in a set of 50 test images. Further, the novel method and both commercial packages are applied to a set of 44 whole tissue sections. Outputs are compared with gene expression data available for the same tissue samples. Finally, the ROF-based method is applied to 44 expert-specified tumor subregions for testing selection and subsampling strategies.

    Results: Among all tested methods, the novel approach is best correlated with manual counts (0.9297). Automated detection of evaluation subregions proved to be fully reliable. Comparison with gene expression data obtained for the same tissue samples reveals only moderate to low correlation levels. Subsampling within tumor subregions is possible with results almost identical to full sampling. Mean macrophage size in tumor subregions is 152.5 ± 111.3 μm².

    Conclusions: The ROF-based approach is successfully applied to the detection of IHC-stained macrophages in DLBCL tissue samples. The method competes well with existing commercial software kits. In contrast to them, it is fully automated, externally repeatable, independent of training data, and completely documented. Comparison with gene expression data indicates that image morphometry constitutes an independent source of information about antibody-polarized macrophage occurrence and distribution.
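    The floating intensity thresholding step mentioned in the Methods can be sketched as a locally adaptive threshold: a pixel counts as foreground when it exceeds its neighbourhood mean by some margin, which copes with inhomogeneous staining better than a single global cutoff. This is a minimal SciPy sketch, not the published pipeline; the window size, offset, and synthetic test image are made-up parameters, and the ROF denoising and rule-based feature detection stages are omitted.

    ```python
    import numpy as np
    from scipy.ndimage import uniform_filter, label

    def count_stained_objects(img, win=15, offset=0.1):
        # Floating threshold: foreground where a pixel exceeds the mean of
        # its win x win neighbourhood by `offset`; connected components of
        # the resulting mask are counted as candidate objects.
        local_mean = uniform_filter(img.astype(float), size=win)
        mask = img > local_mean + offset
        _, n_objects = label(mask)
        return n_objects

    # Synthetic image: three bright "cells" on an uneven background gradient
    img = np.tile(np.linspace(0.0, 0.3, 80), (80, 1))
    yy, xx = np.mgrid[0:80, 0:80]
    for cy, cx in [(20, 20), (20, 60), (60, 40)]:
        img[(yy - cy) ** 2 + (xx - cx) ** 2 <= 16] += 0.5
    ```

    A global threshold would either miss dim cells or flood the bright end of the gradient; the local mean tracks the background, so only the cells stand out.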

    Reference point insensitive molecular data analysis

    Motivation: In biomedicine, every molecular measurement is relative to a reference point, like a fixed aliquot of RNA extracted from a tissue, a defined number of blood cells, or a defined volume of biofluid. Reference points are often chosen for practical reasons. For example, we might want to assess the metabolome of a diseased organ but can only measure metabolites in blood or urine. In this case, the observable data only indirectly reflect the disease state. The statistical implications of these discrepancies in reference points have not yet been discussed. Results: Here, we show that reference point discrepancies compromise the performance of regression models like the LASSO. As an alternative, we suggest zero-sum regression for a reference point insensitive analysis. We show that zero-sum regression is superior to the LASSO in case of a poor choice of reference point, both in simulations and in an application that integrates intestinal microbiome analysis with metabolomics. Moreover, we describe a novel coordinate-descent-based algorithm to fit zero-sum elastic nets.
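    The reference-point insensitivity has a one-line algebraic explanation: if the coefficients sum to zero, then adding any sample-specific constant to all log-transformed features leaves the linear predictor unchanged, because the shift multiplies the coefficient sum. The NumPy check below illustrates exactly this property; the data are random and the centering trick is only used to construct a zero-sum vector for the demonstration, it is not the fitting algorithm.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(5, 8))        # log-transformed features, 5 samples
    beta = rng.normal(size=8)
    beta -= beta.mean()                # enforce the zero-sum constraint
    shifts = rng.normal(size=(5, 1))   # arbitrary per-sample reference offsets

    # (X + s * 1^T) @ beta = X @ beta + s * sum(beta) = X @ beta
    pred = X @ beta
    pred_shifted = (X + shifts) @ beta
    print(np.allclose(pred, pred_shifted))  # True: the reference point cancels
    ```

    An ordinary LASSO fit has no such constraint, so its predictions change with the (possibly arbitrary) choice of reference point — which is the failure mode the abstract describes.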

    BITES: Balanced Individual Treatment Effect for Survival data

    Estimating the effects of interventions on patient outcome is one of the key aspects of personalized medicine. Their inference is often challenged by the fact that the training data comprise only the outcome for the administered treatment, and not for alternative treatments (the so-called counterfactual outcomes). Several methods have been suggested for this scenario based on observational data, i.e. data where the intervention was not applied randomly, for both continuous and binary outcome variables. However, patient outcome is often recorded as time-to-event data, comprising right-censored event times if an event does not occur within the observation period. Despite its enormous importance, time-to-event data is rarely used for treatment optimization. We suggest an approach named BITES (Balanced Individual Treatment Effect for Survival data), which combines a treatment-specific semi-parametric Cox loss with a treatment-balanced deep neural network; i.e. we regularize differences between treated and non-treated patients using Integral Probability Metrics (IPM). We show in simulation studies that this approach outperforms the state of the art. Further, we demonstrate in an application to a cohort of breast cancer patients that hormone treatment can be optimized based on six routine parameters. We successfully validated this finding in an independent cohort. BITES is provided as an easy-to-use Python implementation.
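    One ingredient of such a model, the Cox loss for right-censored data, can be sketched compactly: the negative partial log-likelihood compares each event's risk score against the log-sum of scores of everyone still at risk. The NumPy version below is a hedged illustration only — it uses a simple Breslow-style handling of ties and omits the treatment-specific heads, the neural network, and the IPM balancing term that make up BITES.

    ```python
    import numpy as np

    def cox_neg_log_partial_likelihood(risk, time, event):
        # risk:  predicted log hazard ratio per patient (model output)
        # time:  observed or right-censored event time
        # event: 1 if the event was observed, 0 if censored
        order = np.argsort(-time)            # sort by descending time
        risk, event = risk[order], event[order]
        # prefix log-sum-exp over descending times gives, at position i,
        # the log of the risk-set denominator for patient i
        log_risk_set = np.logaddexp.accumulate(risk)
        return -np.sum((risk - log_risk_set)[event == 1])

    # Tiny example: three patients, the last one censored
    time = np.array([5.0, 3.0, 1.0])
    event = np.array([1, 1, 0])
    risk = np.array([0.0, 0.0, 0.0])
    print(cox_neg_log_partial_likelihood(risk, time, event))  # log(2)
    ```

    With all risk scores equal, the first event contributes log(1) = 0 (risk set of size one) and the second contributes -log(2), so the loss is log(2); the censored patient enters only through the risk sets, never as an event term.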

    A multi-source data integration approach reveals novel associations between metabolites and renal outcomes in the German Chronic Kidney Disease study

    Omics data facilitate the gain of novel insights into the pathophysiology of diseases and, consequently, their diagnosis, treatment, and prevention. To this end, omics data are integrated with other data types, e.g., clinical, phenotypic, and demographic parameters of categorical or continuous nature. We exemplify this data integration issue for a chronic kidney disease (CKD) study, comprising complex clinical, demographic, and one-dimensional 1H nuclear magnetic resonance metabolic variables. Routine analysis screens for associations of single metabolic features with clinical parameters while accounting for confounders typically chosen by expert knowledge. This knowledge can be incomplete or unavailable. We introduce a framework for data integration that intrinsically adjusts for confounding variables. We give its mathematical and algorithmic foundation, provide a state-of-the-art implementation, and evaluate its performance by sanity checks and predictive performance assessment on independent test data. In particular, we show that discovered associations remain significant after variable adjustment based on expert knowledge. In contrast, we illustrate that associations discovered in routine univariate screening approaches can be biased by incorrect or incomplete expert knowledge. Our data integration approach reveals important associations between CKD comorbidities and metabolites, including novel associations of the plasma metabolite trimethylamine-N-oxide with cardiac arrhythmia and infarction in CKD stage 3 patients.