784 research outputs found

    Analyzing Patient Trajectories With Artificial Intelligence

    Full text link
    In digital medicine, patient data typically record health events over time (eg, through electronic health records, wearables, or other sensing technologies) and thus form unique patient trajectories. Patient trajectories are highly predictive of the future course of diseases and therefore facilitate effective care. However, digital medicine often uses only limited patient data, consisting of health events from only a single or small number of time points while ignoring additional information encoded in patient trajectories. To analyze such rich longitudinal data, new artificial intelligence (AI) solutions are needed. In this paper, we provide an overview of the recent efforts to develop trajectory-aware AI solutions and provide suggestions for future directions. Specifically, we examine the implications for developing disease models from patient trajectories along the typical workflow in AI: problem definition, data processing, modeling, evaluation, and interpretation. We conclude with a discussion of how such AI solutions will allow the field to build robust models for personalized risk scoring, subtyping, and disease pathway discovery

    Federated Learning on Heterogenous Data using Chest CT

    Full text link
    Large data have accelerated advances in AI. While it is well known that population differences from genetics, sex, race, diet, and various environmental factors contribute significantly to disease, AI studies in medicine have largely focused on locoregional patient cohorts with less diverse data sources. Such limitation stems from barriers to large-scale data share in medicine and ethical concerns over data privacy. Federated learning (FL) is one potential pathway for AI development that enables learning across hospitals without data share. In this study, we show the results of various FL strategies on one of the largest and most diverse COVID-19 chest CT datasets: 21 participating hospitals across five continents that comprise >10,000 patients with >1 million images. We present three techniques: Fed Averaging (FedAvg), Incremental Institutional Learning (IIL), and Cyclical Incremental Institutional Learning (CIIL). We also propose an FL strategy that leverages synthetically generated data to overcome class imbalances and data size disparities across centers. We show that FL can achieve comparable performance to Centralized Data Sharing (CDS) while maintaining high performance across sites with small, underrepresented data. We investigate the strengths and weaknesses for all technical approaches on this heterogeneous dataset including the robustness to non-Independent and identically distributed (non-IID) diversity of data. We also describe the sources of data heterogeneity such as age, sex, and site locations in the context of FL and show how even among the correctly labeled populations, disparities can arise due to these biases

    An Experimental Study of Class Imbalance in Federated Learning

    Full text link
    Federated learning is a distributed machine learning paradigm that trains a global model for prediction based on a number of local models at clients while local data privacy is preserved. Class imbalance is believed to be one of the factors that degrades the global model performance. However, there has been very little research on if and how class imbalance can affect the global performance. class imbalance in federated learning is much more complex than that in traditional non-distributed machine learning, due to different class imbalance situations at local clients. Class imbalance needs to be re-defined in distributed learning environments. In this paper, first, we propose two new metrics to define class imbalance -- the global class imbalance degree (MID) and the local difference of class imbalance among clients (WCS). Then, we conduct extensive experiments to analyze the impact of class imbalance on the global performance in various scenarios based on our definition. Our results show that a higher MID and a larger WCS degrade more the performance of the global model. Besides, WCS is shown to slow down the convergence of the global model by misdirecting the optimization

    The Use and Misuse of Biomedical Data: Is Bigger Really Better?”

    Get PDF
    Very large biomedical research databases, containing electronic health records (HER) and genomic data from millions of patients, have been heralded recently for their potential to accelerate scientific discovery and produce dramatic improvements in medical treatments. Research enabled by these databases may also lead to profound changes in law, regulation, social policy, and even litigation strategies. Yet, is “big data” necessarily better data? This paper makes an original contribution to the legal literature by focusing on what can go wrong in the process of biomedical database research and what precautions are necessary to avoid critical mistakes. We address three main reasons for a cautious approach to such research and to relying on its outcomes for purposes of public policy or litigation. First, the data contained in databases is surprisingly likely to be incorrect or incomplete. Second, systematic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of biomedical database research, especially in answering causal questions. Third, data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policy makers. In short, this paper sheds much-needed light on the problems of credulous and uninformed uses of biomedical databases. An understanding of the pitfalls of big data analysis is of critical importance to anyone who will rely on or dispute its outcomes, including lawyers, policy makers, and the public at large. The article also recommends technical, methodological, and educational interventions to combat the dangers of database errors and abuses
    corecore