Federated and distributed learning applications for electronic health records and structured medical data: A scoping review
Federated learning (FL) has gained popularity in clinical research in recent
years to facilitate privacy-preserving collaboration. Structured data, one of
the most prevalent forms of clinical data, has experienced significant growth
in volume concurrently, notably with the widespread adoption of electronic
health records in clinical practice. This review examines FL applications on
structured medical data, identifies contemporary limitations and discusses
potential innovations. We searched five databases, SCOPUS, MEDLINE, Web of
Science, Embase, and CINAHL, to identify articles that applied FL to structured
medical data and reported results following the PRISMA guidelines. Each
selected publication was evaluated from three primary perspectives, including
data quality, modeling strategies, and FL frameworks. Out of the 1160 papers
screened, 34 met the inclusion criteria, with each article consisting of one or
more studies that used FL to handle structured clinical/medical data. Of these,
24 utilized data acquired from electronic health records, with clinical
predictions and association studies being the most common clinical research
tasks that FL was applied to. Only one article exclusively explored the
vertical FL setting, while the remaining 33 explored the horizontal FL setting,
with only 14 discussing comparisons between single-site (local) and FL (global)
analysis. The existing FL applications on structured medical data lack
sufficient evaluations of clinically meaningful benefits, particularly when
compared to single-site analyses. Therefore, it is crucial for future FL
applications to prioritize clinical motivations and develop designs and
methodologies that can effectively support and aid clinical practice and
research.
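The horizontal and vertical settings counted in the review above differ in how a structured table is partitioned across sites. The following is a minimal sketch of that distinction only. The toy records, the field names, and the helper functions `horizontal_split` and `vertical_split` are all hypothetical and do not come from any of the reviewed articles.

```python
# Toy structured dataset: one dict per patient (all values are made up).
records = [
    {"patient_id": 1, "age": 64, "lactate": 2.1, "died": 0},
    {"patient_id": 2, "age": 71, "lactate": 4.8, "died": 1},
    {"patient_id": 3, "age": 55, "lactate": 1.3, "died": 0},
    {"patient_id": 4, "age": 80, "lactate": 5.2, "died": 1},
]


def horizontal_split(rows, n_sites):
    """Horizontal FL: every site holds ALL columns for a disjoint set of patients."""
    return [rows[i::n_sites] for i in range(n_sites)]


def vertical_split(rows, feature_groups, key="patient_id"):
    """Vertical FL: every site holds SOME columns for the same patients,
    linked through a shared key."""
    return [
        [{key: row[key], **{name: row[name] for name in group}} for row in rows]
        for group in feature_groups
    ]


# Two hospitals with the same schema but different patients.
horizontal_sites = horizontal_split(records, n_sites=2)

# Two institutions with the same patients but different measurements.
vertical_sites = vertical_split(records, [["age"], ["lactate", "died"]])
```

In the horizontal case each site can fit the same model locally, because it sees every feature. In the vertical case no site sees a full row, so the sites must first align records on the shared key.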
Analyzing Patient Trajectories With Artificial Intelligence
In digital medicine, patient data typically record health events over time (eg, through electronic health records, wearables, or other sensing technologies) and thus form unique patient trajectories. Patient trajectories are highly predictive of the future course of diseases and therefore facilitate effective care. However, digital medicine often uses only limited patient data, consisting of health events from only a single or small number of time points while ignoring additional information encoded in patient trajectories. To analyze such rich longitudinal data, new artificial intelligence (AI) solutions are needed. In this paper, we provide an overview of the recent efforts to develop trajectory-aware AI solutions and provide suggestions for future directions. Specifically, we examine the implications for developing disease models from patient trajectories along the typical workflow in AI: problem definition, data processing, modeling, evaluation, and interpretation. We conclude with a discussion of how such AI solutions will allow the field to build robust models for personalized risk scoring, subtyping, and disease pathway discovery.
Privacy-preserving patient clustering for personalized federated learning
Federated Learning (FL) is a machine learning framework that enables multiple
organizations to train a model without sharing their data with a central
server. However, its performance degrades significantly if the data
is not independent and identically distributed (non-IID). This is a problem in
medical settings, where variations in the patient population contribute
significantly to distribution differences across hospitals. Personalized FL
addresses this issue by accounting for site-specific distribution differences.
Clustered FL, a Personalized FL variant, was used to address this problem by
clustering patients into groups across hospitals and training separate models
on each group. However, privacy remains a challenge because the
clustering process requires exchanging patient-level information. This was
previously solved by forming clusters using aggregated data, which led to
inaccurate groups and performance degradation. In this study, we propose
Privacy-preserving Community-Based Federated machine Learning (PCBFL), a novel
Clustered FL framework that can cluster patients using patient-level data while
protecting privacy. PCBFL uses Secure Multiparty Computation, a cryptographic
technique, to securely calculate patient-level similarity scores across
hospitals. We then evaluate PCBFL by training a federated mortality prediction
model using 20 sites from the eICU dataset. We compare the performance gain
from PCBFL against traditional and existing Clustered FL frameworks. Our
results show that PCBFL successfully forms clinically meaningful cohorts of
low, medium, and high-risk patients. PCBFL outperforms traditional and existing
Clustered FL frameworks with an average AUC improvement of 4.3% and AUPRC
improvement of 7.8%.
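To illustrate how a cross-hospital similarity score can be computed without either hospital revealing its patient vector, here is a minimal single-process sketch of a two-party dot product using additive secret sharing and Beaver multiplication triples. It is not the PCBFL protocol, which the abstract does not spell out. The modulus `P`, the trusted-dealer `beaver_triple` helper, and the restriction to small non-negative integer features are all assumptions made for illustration.

```python
import secrets

P = 2**61 - 1  # prime modulus for share arithmetic (an assumed choice)


def share(value):
    """Split an integer into two additive shares modulo P."""
    s0 = secrets.randbelow(P)
    return s0, (value - s0) % P


def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % P


def beaver_triple():
    """Hypothetical trusted dealer: random a, b and c = a*b, each secret-shared."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a), share(b), share(a * b % P)


def secure_multiply(x_sh, y_sh):
    """Multiply two secret-shared values; only the masked d and e are opened."""
    a_sh, b_sh, c_sh = beaver_triple()
    d = reconstruct((x_sh[0] - a_sh[0]) % P, (x_sh[1] - a_sh[1]) % P)  # x - a
    e = reconstruct((y_sh[0] - b_sh[0]) % P, (y_sh[1] - b_sh[1]) % P)  # y - b
    # x*y = c + d*b + e*a + d*e; the public d*e term is added by party 0 only.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % P
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % P
    return z0, z1


def secure_dot(x, y):
    """Dot product of hospital A's vector x and hospital B's vector y.
    Both parties are simulated in one process here."""
    acc0 = acc1 = 0
    for xi, yi in zip(x, y):
        z0, z1 = secure_multiply(share(xi), share(yi))
        acc0, acc1 = (acc0 + z0) % P, (acc1 + z1) % P
    return reconstruct(acc0, acc1)


similarity = secure_dot([1, 2, 3], [4, 5, 6])
```

In a real deployment each share would live on a different machine and the triples would come from an offline phase or a dedicated protocol. The sketch only shows why the opened values `d` and `e` reveal nothing about the inputs: each is masked by a uniformly random value.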
Federated Learning for Mortality Prediction in Intensive Care Units
Federated learning is a method to train a machine learning model on multiple remote datasets without the need to gather the data from the remote sites to a central location. In healthcare, gathering the data from different hospitals into a central location can be a difficult and time-consuming task, due to privacy concerns and regulations regarding the use of sensitive data, making federated learning an attractive alternative to more traditional methods.
This thesis adapted an existing federated gradient boosting model, developed a new federated random forest model, and applied both to mortality prediction in intensive care units. The results were then compared with those of the models' centralized counterparts.
The results showed that while the federated models did not perform as well as the centralized models on a similarly sized dataset, the federated random forest model can achieve superior performance when trained on multiple hospitals' data compared to centralized models trained on a single hospital. In scenarios where the centralized models had data from multiple hospitals, the federated models could not perform as well as the centralized models. It was also found that the performance of the centralized models could not be improved with further federated training. In addition to practical advantages, such as the possibility of parallel or asynchronous training without modifications to the algorithm, the federated random forest performed better than federated gradient boosting in all scenarios. The performance of the federated random forest was also found to be more consistent across scenarios than that of federated gradient boosting, which was highly dependent on factors such as the order in which the hospitals were traversed.
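The order-independence noted above for the forest-style model can be illustrated with a toy ensemble in which each hospital fits a model on its own data and only the fitted models are pooled. This is a simplified stand-in, not the thesis's federated random forest: the single-feature decision stumps, the made-up site data, and the helpers `fit_stump` and `predict_ensemble` are all hypothetical.

```python
def fit_stump(xs, ys):
    """Fit a one-feature threshold classifier locally: predict 1 when x >= t.
    Returns the threshold with the fewest training errors."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((x >= t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def predict_ensemble(thresholds, x):
    """Majority vote over the pooled stumps (one per hospital)."""
    votes = sum(x >= t for t in thresholds)
    return int(2 * votes > len(thresholds))


# Made-up local datasets: (feature values, binary outcomes) per hospital.
sites = {
    "A": ([1, 2, 8, 9], [0, 0, 1, 1]),
    "B": ([2, 3, 7, 9], [0, 0, 1, 1]),
    "C": ([1, 4, 6, 8], [0, 0, 1, 1]),
}

# Each hospital trains independently, so this step could run in parallel;
# only the fitted thresholds leave the site, never the patient rows.
global_model = [fit_stump(xs, ys) for xs, ys in sites.values()]

high_risk = predict_ensemble(global_model, 7.5)
low_risk = predict_ensemble(global_model, 3)
```

Because the pooled model is a vote over independently trained members, visiting the hospitals in a different order yields the same predictions. A sequential boosting scheme, in which each hospital's model corrects the previous one, would not have this property.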