77 research outputs found

    Federated learning enables big data for rare cancer boundary detection.

    Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability remains a concern. This is currently addressed by sharing multi-site data, but such centralization is challenging or infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML, by only sharing numerical model updates. Here we present the largest FL study to date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6,314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further analyses for glioblastoma by releasing our consensus model, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
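The abstract above describes FL as sharing only numerical model updates rather than patient data. As a rough illustration, the sketch below shows sample-size-weighted federated averaging, one common aggregation rule. The abstract does not say which aggregation rule the study used, so this choice is an assumption. The parameter names, site count, and values are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def federated_average(site_params: List[Params], site_sizes: List[int]) -> Params:
    """Average per-site model parameters, weighting each site by its sample count.

    Assumption: every site reports the same parameter names and shapes.
    Only these numbers leave a site; the underlying scans never do.
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name, first_values in site_params[0].items():
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(site_params, site_sizes)) / total
            for i in range(len(first_values))
        ]
    return averaged


# Hypothetical toy round with three sites (names and numbers are illustrative).
sites = [
    {"conv.weight": [1.0, 2.0], "conv.bias": [0.0]},
    {"conv.weight": [3.0, 4.0], "conv.bias": [1.0]},
    {"conv.weight": [5.0, 6.0], "conv.bias": [2.0]},
]
sizes = [100, 100, 200]
consensus = federated_average(sites, sizes)
print(consensus)
```

The third site contributes half of all samples, so its parameters carry half of the weight in the consensus.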

    Author Correction: Federated learning enables big data for rare cancer boundary detection.

    DOI: 10.1038/s41467-023-36188-7. Nature Communications 14.

    A hierarchical spectral clustering and non-linear dimensionality reduction scheme for detection of prostate cancer from magnetic resonance spectroscopy

    Magnetic Resonance Spectroscopy (MRS) is a unique non-invasive method which has recently been shown to have great potential in screening for prostate cancer (CaP). MRS provides functional information regarding the concentrations of different biochemicals present in the prostate at single or multiple locations within a rectangular grid of spectra superposed on the structural T2-weighted Magnetic Resonance Imaging (MRI). Changes in the relative concentrations of specific metabolites, including choline, creatine and citrate, compared to "normal" levels are highly indicative of the presence of CaP. Most previous attempts at developing computerized schemes for automated prostate cancer detection using MRS have centered on peak area quantification algorithms. These methods seek to obtain the area under the peaks corresponding to choline, creatine and citrate, which is then used to compute the relative concentrations of these metabolites. However, even manual identification of metabolite peaks on the MR spectra, let alone identification via automated algorithms, is a challenging problem on account of low SNR, baseline irregularity, peak overlap, and peak distortion. In this thesis work a novel computer aided detection (CAD) scheme for prostate MRS is presented that integrates non-linear dimensionality reduction (NLDR) with an unsupervised hierarchical clustering algorithm to automatically identify cancerous spectra. The methodology comprises two specific aims. Aim 1 is to automatically localize the prostate region; Aim 2 is automated cancer detection within the prostate region obtained in Aim 1. In Aim 1, a hierarchical spectral clustering algorithm is used to distinguish between informative and non-informative spectra in order to localize the region of interest (ROI) corresponding to the prostate. Once the prostate ROI is localized, in Aim 2, an NLDR scheme in conjunction with a replicated k-means clustering algorithm is used to automatically discriminate between 3 classes of spectra (normal, CaP, and intermediate tissue classes). Results of qualitative and quantitative evaluation of the methodology over 18 in vivo prostate T2-w and MRS studies acquired at 1.5 Tesla (T) from the multi-site, multi-institutional ACRIN trial, for which corresponding histological ground truth of the spatial extent of CaP is available, reveal that the CAD scheme has a high detection sensitivity (89.60) and specificity (78.98). Results further suggest that the CAD scheme has a higher detection accuracy than such commonly used MRS analysis schemes as z-score and PCA. M.S. Includes bibliographical references (p. 47-49). By Pallavi Tiwar
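The abstract above mentions a replicated k-means step that separates spectra into 3 classes after dimensionality reduction. The following is a minimal sketch of one plausible reading, assuming "replicated" means running k-means from several random initializations and keeping the run with the lowest within-cluster sum of squares. The thesis may define it differently. The 2-D points are hypothetical stand-ins for embedded spectra, not real data.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, rng: random.Random,
           iters: int = 50) -> Tuple[List[int], float]:
    """One k-means run; returns (labels, within-cluster sum of squares)."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for it in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if it > 0 and new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def replicated_kmeans(points: List[Point], k: int, n_runs: int = 25,
                      seed: int = 0) -> Tuple[List[int], float]:
    """Repeat k-means from different random starts and keep the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_runs):
        labels, inertia = kmeans(points, k, rng)
        if best is None or inertia < best[1]:
            best = (labels, inertia)
    return best


# Hypothetical low-dimensional embeddings forming three well-separated groups.
embedded = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
            (10.0, 0.0), (10.1, 0.0), (10.0, 0.1)]
labels, inertia = replicated_kmeans(embedded, k=3)
print(labels, inertia)
```

Repeating the run reduces the sensitivity of k-means to its random starting centers, at the cost of proportionally more computation.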

    Quantitative integration of imaging and non-imaging data: application to integrating multi-parametric MRI for prostate cancer diagnosis, grading and treatment evaluation

    The problem of data integration involving imaging and non-imaging modalities is largely unexplored in the biomedical field, mainly due to the challenges in quantitatively combining such heterogeneous modalities existing in different dimensions and scales. Although several methods have been proposed in the literature for quantitative integration of multi-protocol imaging, there has been a paucity of similar biomedical tools for quantitative integration of imaging and non-imaging data. In this work, we present novel data integration schemes to overcome the aforementioned challenges limiting the integration of imaging and non-imaging modalities, and hence improve disease characterization. Our novel data integration methods are applied to integration of multi-parametric Magnetic Resonance (MR) imaging (MP-MRI), namely structural MR imaging with metabolic spectroscopic information (non-imaging), for improved prostate cancer (CaP) diagnosis, grading, and treatment evaluation post-radiation therapy (RT). To this end, we have developed novel data integration schemes such as Multimodal Wavelet Embedding Representation for data Combination (MaWERiC) and Semi-Supervised Multi-Kernel (SeSMiK) Graph Embedding, which first uniformly represent individual data modalities in a common framework using dimensionality reduction and kernel embedding techniques, followed by a seamless integration of imaging and non-imaging data in the common framework. The integrated quantitative signatures thus obtained are shown to be significantly more diagnostically informative than any single modality. Similar improvement in results was observed using integrated MP-MRI signatures for evaluating radiation therapy related changes in CaP patients, with an aim to identify (a) pre-RT disease extent along with extra capsule spread (if any) and (b) residual disease on post-RT MP-MRI. Ph.D. Includes bibliographical references. Includes vita. By Pallavi Tiwar
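The abstract above describes representing each modality in a common framework through kernel embedding before integrating them. The sketch below illustrates only the generic idea: one kernel matrix per modality over the same samples, then a weighted sum. It is not the SeSMiK or MaWERiC formulation, which the abstract does not spell out. The RBF kernel, the weights, the gamma values, and the feature vectors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rbf_kernel(samples: List[Sequence[float]], gamma: float) -> Matrix:
    """Gram matrix with entries exp(-gamma * squared distance)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
         for y in samples]
        for x in samples
    ]


def combine_kernels(kernels: List[Matrix], weights: List[float]) -> Matrix:
    """Convex combination of same-sized kernel matrices."""
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical features for the same three locations in two modalities.
imaging_features = [(0.2, 0.4), (0.3, 0.5), (0.9, 0.1)]          # e.g. T2-w intensities
spectral_features = [(1.0, 0.2, 0.1), (0.9, 0.3, 0.1), (0.2, 0.8, 0.7)]  # e.g. MRS values

k_imaging = rbf_kernel(imaging_features, gamma=1.0)
k_spectral = rbf_kernel(spectral_features, gamma=0.5)
combined = combine_kernels([k_imaging, k_spectral], weights=[0.5, 0.5])
print(combined)
```

Because both kernels are indexed by the same samples, the combined matrix places modalities of different dimensionality on one similarity scale, which a later embedding or classifier can then use.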