7 research outputs found

    Generation of realistic synthetic data using Multimodal Neural Ordinary Differential Equations

    No full text
    Abstract: Individual organizations, such as hospitals, pharmaceutical companies, and health insurance providers, are currently limited in their ability to collect data that are fully representative of a disease population. This can, in turn, negatively impact the generalization ability of statistical models and scientific insights. However, sharing data across different organizations is highly restricted by legal regulations. While federated data access concepts exist, they are technically and organizationally difficult to realize. An alternative approach would be to exchange synthetic patient data instead. In this work, we introduce Multimodal Neural Ordinary Differential Equations (MultiNODEs), a hybrid, multimodal AI approach that generates highly realistic synthetic patient trajectories on a continuous time scale, enabling smooth interpolation and extrapolation of clinical studies. Our proposed method can integrate both static and longitudinal data and implicitly handles missing values. We demonstrate the capabilities of MultiNODEs by applying them to real patient-level data from two independent clinical studies and to simulated epidemiological data of an infectious disease.
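The continuous-time generation described above rests on the Neural ODE idea: a network parameterizes the derivative of a latent state, and integrating it yields a trajectory that can be queried at any time point, which is what makes smooth interpolation and extrapolation possible. A minimal illustrative sketch, with a fixed linear map standing in for the learned network (all values are made up, and this is not the MultiNODEs implementation):

```python
# Toy Neural-ODE-style trajectory: dx/dt = f(x), integrated with Euler steps.
# A fixed 2x2 matrix plays the role of the learned derivative network.
W = [[-0.1, 1.0],
     [-1.0, -0.1]]  # damped oscillation

def f(x):
    """The 'network': returns dx/dt = W @ x."""
    return [sum(W[i][j] * x[j] for j in range(2)) for i in range(2)]

def trajectory(x0, t_end, dt=0.01):
    """Euler integration from t=0 to t_end; because the dynamics are
    defined in continuous time, any intermediate time can be queried,
    enabling interpolation/extrapolation of a (toy) trajectory."""
    xs, x, t = [list(x0)], list(x0), 0.0
    while t < t_end:
        dx = f(x)
        x = [x[i] + dt * dx[i] for i in range(2)]
        xs.append(list(x))
        t += dt
    return xs

states = trajectory([1.0, 0.0], t_end=5.0)
```

With the damping term on the diagonal, the state spirals inward, so late states have smaller magnitude than the initial condition.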

    Forecast Alzheimer's disease progression to better select patients for clinical trials

    No full text
    Objectives: Subject recruitment is a burden that hampers clinical trials, especially in neurodegenerative diseases, where the worsening of abilities is subtle, long-term, and heterogeneous. Targeting the right patients during trial screening is a way to reduce the needed sample size or, conversely, to improve the proven effect size. Methods: From Alzheimer’s disease (AD) observational cohorts, we selected longitudinal data that matched AD trials (inclusion and exclusion criteria, trial duration, and primary endpoint). We modeled EMERGE, a phase 3 trial in pre-clinical AD, and a mild AD trial, using four research cohorts (ADNI, Memento, PharmaCog, AIBL). For each patient, we simulated a treated counterpart by applying an individual treatment effect: a linear improvement of the outcome for effective decliners, calibrated on our data to match the expected trial effect size. Next, we built a multimodal AD course map that captured long-term disease progression in a mixed-effects fashion [1] with Leaspy. We used it to forecast never-seen individuals’ outcomes from their screening biomarkers. Based on these individual screening predictions, we selected clinically relevant sub-groups [2]. Finally, we compared the effective sample size that would have been needed for the trial with and without our selections, evaluating the dispersion of this metric with a bootstrap procedure. Results: In all investigated setups and cohorts, selection decreased the needed sample size. For the EMERGE trial, selecting patients with a predicted CDR-SoB change between 0.5 and 1.5 points per year reduced the needed sample size by 38.2 ± 3.3%. For the mild AD trial, selecting patients with a predicted MMSE change between 1 and 2 points per year reduced the needed sample size by 38.9 ± 2.2%. Conclusions: We built a modelling framework for forecasting individual outcomes from multimodal screening assessments. Used as an extra inclusion criterion in clinical trials, these forecasts allow better control of the trial population and thus reduce the needed sample size for a given treatment effect.
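The mechanism behind such sample-size reductions can be illustrated with the classical two-sample formula n = 2σ²(z₁₋α/₂ + z₁₋β)²/δ² per arm: restricting enrolment to predicted decliners increases the treatment effect δ the trial can reveal (there is decline to slow), so fewer subjects are needed. A hedged sketch with invented numbers (this is not the authors' computation, which used simulated treated counterparts and bootstrap):

```python
import math

def z(p):
    """Inverse standard-normal CDF via bisection on erf."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def n_per_arm(delta, sigma, alpha=0.05, power=0.8):
    """Required subjects per arm for a two-sample comparison of means."""
    return math.ceil(2 * sigma**2 * (z(1 - alpha / 2) + z(power))**2 / delta**2)

# Illustrative: selecting predicted decliners raises the detectable
# effect delta (points/year slowed) at the same outcome dispersion sigma.
n_all      = n_per_arm(delta=0.5, sigma=2.0)  # whole screened population
n_selected = n_per_arm(delta=0.8, sigma=2.0)  # predicted decliners only
```

Under these made-up inputs, the enriched trial needs far fewer subjects per arm, mirroring the direction (not the exact magnitude) of the reported ~38% reductions.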

    Forecasting individual progression trajectories in Alzheimer’s disease

    No full text
    The anticipation of progression of Alzheimer’s disease (AD) is crucial for evaluations of secondary prevention measures thought to modify the disease trajectory. However, it is difficult to forecast the natural progression of AD, notably because several functions decline at different ages and different rates in different patients. We evaluate here AD Course Map, a statistical model predicting the progression of neuropsychological assessments and imaging biomarkers for a patient from current medical and radiological data at early disease stages. We tested the method on more than 96,000 cases, with a pool of more than 4,600 patients from four continents. We measured the accuracy of the method for selecting participants displaying a progression of clinical endpoints during a hypothetical trial. We show that enriching the population with the predicted progressors decreases the required sample size by 38% to 50%, depending on trial duration, outcome, and targeted disease stage, from asymptomatic individuals at risk of AD to subjects with early and mild AD. We show that the method introduces no biases regarding sex or geographic locations and is robust to missing data. It performs best at the earliest stages of disease and is therefore highly suitable for use in prevention trials.
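The "course map" forecasting above builds on nonlinear mixed-effects progression models, in which each individual's trajectory is a time-shifted, time-accelerated version of a population curve. An illustrative sketch in that spirit (a logistic population curve with hypothetical random effects tau and xi; the parameter values are made up and this is not the AD Course Map implementation):

```python
import math

def population_curve(t, t0=70.0, rate=0.3):
    """Normalized outcome (0 = healthy, 1 = max severity) vs. age;
    the logistic shape and parameters are purely illustrative."""
    return 1.0 / (1.0 + math.exp(-rate * (t - t0)))

def individual_curve(t, tau, xi, t0=70.0, rate=0.3):
    """Individual trajectory: the population time axis is shifted by
    tau (disease onset) and accelerated by exp(xi) (progression pace)."""
    t_reparam = math.exp(xi) * (t - t0 - tau) + t0
    return population_curve(t_reparam, t0, rate)

# A fast progressor (xi > 0) reaches a given severity earlier than
# an average individual (tau = 0, xi = 0) at the same age.
y_pop  = individual_curve(75.0, tau=0.0, xi=0.0)
y_fast = individual_curve(75.0, tau=0.0, xi=0.5)
```

Estimating (tau, xi) for a new individual from screening data, then evaluating the curve forward in time, is the sense in which such models "forecast" individual trajectories.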

    STonKGs: A Sophisticated Transformer Trained on Biomedical Text and Knowledge Graphs

    Get PDF
    The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications using either text through natural language processing (NLP) or structured data through knowledge graph embedding models (KGEMs). However, representations based on a single modality are inherently limited. To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs. This multimodal Transformer uses combined input sequences of structured information from KGs and unstructured text data from biomedical literature to learn joint representations. First, we pre-trained STonKGs on a knowledge base assembled by the Integrated Network and Dynamical Reasoning Assembler (INDRA), consisting of millions of text-triple pairs extracted from biomedical literature by multiple NLP systems. Then, we benchmarked STonKGs against two baseline models trained on either one of the modalities (i.e., text or KG) across eight different classification tasks, each corresponding to a different biological application. Our results demonstrate that STonKGs outperforms both baselines, especially on the tasks that are more challenging with respect to the number of classes, improving upon the F1-score of the best baseline by up to 0.083. Additionally, both our pre-trained model and the model architecture can be adapted to various other transfer learning applications. Finally, the source code and pre-trained STonKGs models are available at https://github.com/stonkgs/stonkgs and https://huggingface.co/stonkgs/stonkgs-150k.
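A minimal sketch of the combined-input idea described above: unstructured text tokens and a structured KG triple packed into one sequence with separator tokens, so a single Transformer can attend across both modalities. The special tokens, example triple, and whitespace tokenization below are illustrative, not the actual STonKGs tokenizer:

```python
def pack_text_triple(text, triple):
    """Combine text tokens and a KG triple (head, relation, tail)
    into one token sequence with BERT-style separators (illustrative)."""
    head, rel, tail = triple
    return (["[CLS]"] + text.lower().split()
            + ["[SEP]"] + [head, rel, tail] + ["[SEP]"])

# Hypothetical text-triple pair in the spirit of INDRA extractions.
seq = pack_text_triple(
    "TP53 activates apoptosis in damaged cells",
    ("TP53", "Activation", "apoptosis"),
)
```

In an actual multimodal Transformer, each segment would additionally get its own token-type embeddings so the model can distinguish text positions from triple positions.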

    Differences in cohort study data affect external validation of artificial intelligence models for predictive diagnostics of dementia - lessons for translation into clinical practice

    Get PDF
    Artificial intelligence (AI) approaches pose a great opportunity for individualized, pre-symptomatic disease diagnosis, which plays a key role in the context of personalized, predictive, and finally preventive medicine (PPPM). However, to translate PPPM into clinical practice, it is of utmost importance that AI-based models are carefully validated. The validation process comprises several steps, one of which is testing the model on patient-level data from an independent clinical cohort study. However, recruitment criteria can bias statistical analysis of cohort study data and impede model application beyond the training data. To evaluate whether and how data from independent clinical cohort studies differ from each other, this study systematically compares the datasets collected from two major dementia cohorts, namely the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and AddNeuroMed. The comparison was conducted at the individual feature level and revealed significant differences between the cohorts. Such systematic deviations can potentially hamper the generalizability of results based on a single cohort dataset. Despite the identified differences, validation of a previously published, ADNI-trained model for prediction of personalized dementia risk scores on 244 AddNeuroMed subjects was successful: external validation achieved a high prediction performance of above 80% area under the receiver operating characteristic curve (AUROC) up to 6 years before dementia diagnosis. Propensity score matching identified a subset of patients from AddNeuroMed that showed significantly smaller demographic differences to ADNI. For these patients, an even higher prediction performance was achieved, which demonstrates the influence that systematic differences between cohorts can have on validation results.
    In conclusion, this study exposes challenges in the external validation of AI models on cohort study data and is one of the rare cases in the neurology field in which such external validation was performed. The presented model represents a proof of concept that reliable models for personalized predictive diagnostics are feasible, which, in turn, could lead to adequate disease prevention and thereby enable the PPPM paradigm in the dementia field.
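The AUROC used as the validation metric above has a simple rank interpretation: it is the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. A self-contained sketch on toy data (the scores and labels are invented for illustration, not the study's data):

```python
def auroc(scores, labels):
    """Rank-based AUROC: fraction of (case, control) pairs in which
    the case outscores the control; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy risk scores: higher should indicate future dementia diagnosis.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,   0]
auc = auroc(scores, labels)
```

Here 8 of the 9 case-control pairs are correctly ordered, so the toy AUROC is 8/9 ≈ 0.89; a value above 0.80, as in the study, means most such pairs are ranked correctly.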