A review of mechanistic learning in mathematical oncology
Mechanistic learning, the synergistic combination of knowledge-driven and
data-driven modeling, is an emerging field. Its use is growing in particular in
mathematical oncology, the application of mathematical modeling to cancer
biology and oncology. This review aims to
capture the current state of the field and provide a perspective on how
mechanistic learning may further progress in mathematical oncology. We
highlight the synergistic potential of knowledge-driven mechanistic
mathematical modeling and data-driven modeling, such as machine and deep
learning. We point out similarities and differences regarding model complexity,
data requirements, outputs generated, and interpretability of the algorithms
and their results. Then, organizing combinations of knowledge- and data-driven
modeling into four categories (sequential, parallel, intrinsic, and extrinsic
mechanistic learning), we summarize a variety of approaches at the interface
between purely data- and knowledge-driven models. Using examples predominantly
from oncology, we discuss a range of techniques including physics-informed
neural networks, surrogate model learning, and digital twins. We see that
mechanistic learning, with its intentional leveraging of the strengths of both
knowledge- and data-driven modeling, can greatly impact the complex problems of
oncology. Given the increasing ubiquity and impact of machine learning, it is
critical to incorporate it into the study of mathematical oncology, with
mechanistic learning providing a path to that end. As the field of mechanistic
learning advances, we aim for this review and proposed categorization framework
to foster additional collaboration between the data- and knowledge-driven
modeling fields. Further collaboration will help address difficult issues in
oncology such as limited data availability, requirements of model transparency,
and complex input data.
Data-driven prediction of spinal cord injury recovery: An exploration of current status and future perspectives
Spinal Cord Injury (SCI) presents a significant challenge in rehabilitation medicine, with recovery outcomes varying widely among individuals. Machine learning (ML) is a promising approach to enhance the prediction of recovery trajectories, but its integration into clinical practice requires a thorough understanding of its efficacy and applicability. We systematically reviewed the current literature on data-driven models of SCI recovery prediction. The included studies were evaluated based on a range of criteria assessing the approach, implementation, input data preferences, and the clinical outcomes they aimed to forecast. We observe a tendency to utilize routinely acquired data, such as International Standards for Neurological Classification of SCI (ISNCSCI) scores, imaging, and demographics, for the prediction of functional outcomes derived from the Spinal Cord Independence Measure (SCIM) III and Functional Independence Measure (FIM) scores, with a focus on motor ability. Although there has been increasing interest in data-driven studies over time, traditional machine learning architectures, such as linear regression and tree-based approaches, have remained the overwhelmingly popular choices for implementation. This leaves ample opportunity for exploring architectures that address the challenges of predicting SCI recovery, including techniques for learning from limited longitudinal data, improving generalizability, and enhancing reproducibility. We conclude with a perspective highlighting possible future directions for data-driven SCI recovery prediction and drawing parallels to other application fields in terms of diverse data types (imaging, tabular, sequential, multimodal), data challenges (limited, missing, longitudinal data), and algorithmic needs (causal inference, robustness).
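As a concrete illustration of the pattern this review describes, here is a minimal sketch of a tree-based regressor predicting a functional outcome score from routinely acquired admission data. The features, the synthetic data, and the SCIM-like target are hypothetical placeholders, not drawn from any of the reviewed studies.

    # Minimal sketch: tree-based prediction of a functional outcome score.
    # All features and data below are synthetic placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n = 300
    X = np.column_stack([
        rng.integers(0, 100, n),   # e.g. ISNCSCI total motor score at admission
        rng.integers(18, 80, n),   # age at injury
        rng.integers(0, 5, n),     # e.g. coded neurological level of injury
    ])
    y = 0.8 * X[:, 0] - 0.1 * X[:, 1] + rng.normal(0, 5, n)  # synthetic SCIM-like outcome

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"cross-validated R^2: {scores.mean():.2f} +/- {scores.std():.2f}")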
reComBat: Batch effect removal in large-scale, multi-source omics data integration
With the steadily increasing abundance of omics data produced all over the world, sometimes decades apart and under vastly different experimental conditions, residing in public databases, a crucial step in many data-driven bioinformatics applications is that of data integration. The challenge of batch effect removal for entire databases lies in the large number and coincidence of both batches and desired biological variation, resulting in design matrix singularity. This problem currently cannot be solved by any common batch correction algorithm. In this study, we present reComBat, a regularised version of the empirical Bayes method to overcome this limitation. We demonstrate our approach for the harmonisation of public gene expression data of the human opportunistic pathogen Pseudomonas aeruginosa and study several metrics to empirically demonstrate that batch effects are successfully mitigated while biologically meaningful gene expression variation is retained. reComBat fills the gap in batch correction approaches applicable to large-scale, public omics databases and opens up new avenues for data-driven analysis of complex biological processes beyond the scope of a single study.
reComBat: batch-effect removal in large-scale multi-source gene-expression data integration
With the steadily increasing abundance of omics data produced all over the world under vastly different experimental conditions residing in public databases, a crucial step in many data-driven bioinformatics applications is that of data integration. The challenge of batch-effect removal for entire databases lies in the large number of batches and biological variation, which can result in design matrix singularity. This problem cannot currently be solved satisfactorily by any common batch-correction algorithm. We present reComBat, a regularized version of the empirical Bayes method to overcome this limitation, and benchmark it against popular approaches for the harmonization of public gene-expression data (both microarray and bulk RNA-seq) of the human opportunistic pathogen Pseudomonas aeruginosa. Batch effects are successfully mitigated while biologically meaningful gene-expression variation is retained. reComBat fills the gap in batch-correction approaches applicable to large-scale, public omics databases and opens up new avenues for data-driven analysis of complex biological processes beyond the scope of a single study. The code is available at https://github.com/BorgwardtLab/reComBat; all data and evaluation code can be found at https://github.com/BorgwardtLab/batchCorrectionPublicData. Supplementary data are available at Bioinformatics Advances online.
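To illustrate why regularisation is the key ingredient, the toy sketch below fits per-gene linear models in which a batch indicator nearly coincides with the biological covariate, so the ordinary least-squares design is (near-)singular; an L2 (ridge) penalty keeps the fit well-posed. reComBat itself uses a regularised empirical Bayes formulation with shrinkage across genes; this simplified ridge version, with made-up data, only demonstrates the singularity problem and its regularised resolution.

    # Toy demonstration: batch correction via a regularised linear model.
    # Data, sizes, and effect magnitudes are made up for illustration.
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    n_samples, n_genes = 40, 5
    batch = np.repeat([0, 1], n_samples // 2)
    condition = batch.copy()
    condition[:3] = 1 - condition[:3]  # condition nearly coincides with batch:
                                       # OLS design is near-singular (singular if identical)
    X = np.column_stack([batch, condition]).astype(float)
    Y = rng.normal(size=(n_samples, n_genes)) + 2.0 * batch[:, None]  # additive batch shift

    corrected = np.empty_like(Y)
    for g in range(n_genes):
        fit = Ridge(alpha=1.0).fit(X, Y[:, g])  # L2 penalty keeps the fit well-posed
        # With confounding, part of the shift is attributed to condition; the
        # ridge fit is stable either way, and only the batch component is removed.
        corrected[:, g] = Y[:, g] - fit.coef_[0] * batch

    print("batch-wise means before:", Y[batch == 0].mean(), Y[batch == 1].mean())
    print("batch-wise means after: ", corrected[batch == 0].mean(), corrected[batch == 1].mean())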
Machine learning to predict poor school performance in paediatric survivors of intensive care: a population-based cohort study
Purpose: Whilst survival in paediatric critical care has improved, clinicians lack tools capable of predicting long-term outcomes. We developed a machine learning model to predict poor school outcomes in children surviving intensive care unit (ICU) admission. Methods: Population-based study of children < 16 years requiring ICU admission in Queensland, Australia, between 1997 and 2019. Failure to meet the National Minimum Standard (NMS) in the National Assessment Program-Literacy and Numeracy (NAPLAN) assessment during primary and secondary school was the primary outcome. Routine ICU information was used to train machine learning classifiers. Models were trained, validated and tested using stratified nested cross-validation.
Results: 13,957 childhood ICU survivors with 37,200 corresponding NAPLAN tests after a median follow-up duration of 6 years were included. 14.7%, 17%, 15.6%, and 16.6% failed to meet the NMS in school grades 3, 5, 7, and 9, respectively. The model demonstrated an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.8 (standard deviation [SD] 0.01), with 51% specificity to reach 85% sensitivity [relative Area Under the Precision Recall Curve (rel-AUPRC) 3.42, SD 0.06]. Socio-economic status, illness severity, and neurological, congenital, and genetic disorders contributed most to the predictions. In children with no comorbidities admitted between 2009 and 2019, the model achieved an AUROC of 0.77 (SD 0.03) and a rel-AUPRC of 3.31 (SD 0.42).
Conclusion: A machine learning model using data available at the time of ICU discharge predicted failure to meet minimum educational requirements at school age. Implementation of this prediction tool could assist in prioritizing patients for follow-up and targeting of rehabilitative measures. Keywords: Child, Intensive care, Machine learning, Neurodevelopment, School
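The stratified nested cross-validation mentioned in the Methods can be sketched as follows: an inner loop tunes hyperparameters while an outer loop provides the unbiased performance estimate. The classifier, parameter grid, and synthetic class-imbalanced data are illustrative assumptions, not the study's actual pipeline or cohort.

    # Sketch of stratified nested cross-validation on synthetic, imbalanced data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=500, weights=[0.85, 0.15], random_state=0)

    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # hyperparameter tuning
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # performance estimation

    tuned = GridSearchCV(
        GradientBoostingClassifier(random_state=0),
        param_grid={"n_estimators": [100, 300], "max_depth": [2, 3]},
        scoring="roc_auc",
        cv=inner,
    )
    scores = cross_val_score(tuned, X, y, scoring="roc_auc", cv=outer)
    print(f"nested-CV AUROC: {scores.mean():.2f} +/- {scores.std():.2f}")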
The Fraction Size Sensitivity of Late Genitourinary Toxicity: Analysis of Alpha/Beta (α/β) Ratios in the CHHiP Trial
PURPOSE: Moderately hypofractionated external beam intensity-modulated radiotherapy (IMRT) for prostate cancer is now standard of care. Normal tissue toxicity responses to fraction size alteration are non-linear: the linear-quadratic model is a widely used framework accounting for this, through the α/β ratio. Few α/β ratio estimates exist for human late genitourinary endpoints; here we provide estimates derived from a hypofractionation trial.
METHODS AND MATERIALS: The CHHiP trial randomised 3216 men with localised prostate cancer 1:1:1 between conventionally fractionated IMRT (74Gy in 37 fractions (Fr)) and two moderately hypofractionated regimens (60Gy/20Fr and 57Gy/19Fr). Radiotherapy plans and suitable follow-up assessments were available for 2206 men. Three prospectively assessed clinician-reported toxicity scales were amalgamated for common genitourinary endpoints: Dysuria, Haematuria, Incontinence, Reduced flow/Stricture, and Urine Frequency. Per endpoint, only patients with zero baseline toxicity were included. Three models for endpoint grade ≥1 (G1+) and G2+ toxicity were fitted: Lyman-Kutcher-Burman (LKB) without equivalent dose in 2Gy/Fr (EQD2) correction [LKB-NoEQD2]; LKB with EQD2 correction [LKB-EQD2]; and LKB-EQD2 with dose-modifying factor (DMF) inclusion [LKB-EQD2-DMF]. DMFs were: age, diabetes, hypertension, pelvic surgery, prior transurethral resection of the prostate (TURP), overall treatment time, and acute genitourinary toxicity (G2+). Bootstrapping generated 95% confidence intervals and unbiased performance estimates. Models were compared by likelihood ratio test.
RESULTS: The LKB-EQD2 model significantly improved performance over LKB-NoEQD2 for just three endpoints: Dysuria G1+ (α/β = 2.0 Gy, 95% CI 1.2-3.2 Gy), Haematuria G1+ (α/β = 0.9 Gy, 95% CI 0.1-2.2 Gy), and Haematuria G2+ (α/β = 0.6 Gy, 95% CI 0.1-1.7 Gy). For these three endpoints, further incorporation of two DMFs improved on LKB-EQD2: acute genitourinary toxicity and prior TURP (Haematuria G1+ only), but α/β ratio estimates remained stable.
CONCLUSIONS: Inclusion of the EQD2 correction significantly improved model fitting for Dysuria and Haematuria endpoints, where fitted α/β ratio estimates were low: 0.6-2 Gy. This suggests the therapeutic gain for clinician-reported genitourinary toxicity through hypofractionation might be lower than expected under typical late α/β ratio assumptions of 3-5 Gy.
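To make the EQD2 correction concrete, the sketch below applies the standard linear-quadratic conversion, EQD2 = D(d + α/β)/(2 + α/β) for total dose D and dose per fraction d, to the three CHHiP regimens. The α/β values scanned are illustrative round numbers spanning the fitted estimates and the typical 3-5 Gy assumption, not the trial's reported confidence intervals.

    # EQD2 under the linear-quadratic model: EQD2 = D * (d + a/b) / (2 + a/b).
    # Regimens are the three CHHiP arms; alpha/beta values are illustrative.
    def eqd2(total_dose_gy, dose_per_fraction_gy, alpha_beta_gy):
        return total_dose_gy * (dose_per_fraction_gy + alpha_beta_gy) / (2.0 + alpha_beta_gy)

    regimens = {"74Gy/37Fr": (74.0, 2.0), "60Gy/20Fr": (60.0, 3.0), "57Gy/19Fr": (57.0, 3.0)}
    for a_b in (0.6, 2.0, 3.0, 5.0):
        for name, (D, d) in regimens.items():
            print(f"a/b = {a_b} Gy  {name}: EQD2 = {eqd2(D, d, a_b):.1f} Gy")
    # A lower alpha/beta inflates the EQD2 of the 3 Gy/Fr arms (e.g. 60Gy/20Fr is
    # 75.0 Gy at a/b = 2 but 72.0 Gy at a/b = 3), i.e. less normal-tissue sparing
    # from hypofractionation than assumed under a/b of 3-5 Gy.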
A review of mechanistic learning in mathematical oncology
Mechanistic learning refers to the synergistic combination of mechanistic mathematical modeling and data-driven machine or deep learning. This emerging field finds increasing applications in (mathematical) oncology. This review aims to capture the current state of the field and to provide a perspective on how mechanistic learning may progress in the oncology domain. We highlight the synergistic potential of mechanistic learning and point out similarities and differences between purely data-driven and mechanistic approaches concerning model complexity, data requirements, outputs generated, and interpretability of the algorithms and their results. Four categories of mechanistic learning (sequential, parallel, extrinsic, intrinsic) are presented with specific examples. We discuss a range of techniques including physics-informed neural networks, surrogate model learning, and digital twins. Example applications address complex problems predominantly from the domain of oncology research, such as longitudinal tumor response predictions or time-to-event modeling. As the field of mechanistic learning advances, we aim for this review and proposed categorization framework to foster additional collaboration between the data- and knowledge-driven modeling fields. Further collaboration will help address difficult issues in oncology such as limited data availability, requirements of model transparency, and complex input data, which are embraced in a mechanistic learning framework.
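As one concrete instance of the techniques this review discusses, here is a minimal physics-informed neural network sketch for a longitudinal tumor response problem: a network V(t) is trained to match sparse volume measurements while also satisfying a logistic growth ODE, dV/dt = r·V·(1 - V/K). The growth law, parameter values, and synthetic observations are illustrative assumptions, not taken from the review.

    # Minimal PINN sketch: fit sparse tumor-volume data while penalising the
    # residual of an assumed logistic growth ODE, dV/dt = r*V*(1 - V/K).
    import torch

    torch.manual_seed(0)
    r, K = 0.3, 1.0  # assumed growth rate and carrying capacity (illustrative)

    # Sparse synthetic "measurements" from the exact logistic solution, V0 = 0.1.
    t_data = torch.tensor([[0.0], [2.0], [5.0], [8.0]])
    v_data = K / (1 + (K - 0.1) / 0.1 * torch.exp(-r * t_data))

    net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                              torch.nn.Linear(32, 32), torch.nn.Tanh(),
                              torch.nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    t_phys = torch.linspace(0, 10, 50).reshape(-1, 1).requires_grad_(True)

    for step in range(5000):
        opt.zero_grad()
        data_loss = torch.mean((net(t_data) - v_data) ** 2)        # match observations
        v = net(t_phys)
        dv_dt = torch.autograd.grad(v, t_phys, torch.ones_like(v), create_graph=True)[0]
        phys_loss = torch.mean((dv_dt - r * v * (1 - v / K)) ** 2)  # enforce the ODE
        (data_loss + phys_loss).backward()
        opt.step()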
A Century of Fractionated Radiotherapy: How Mathematical Oncology Can Break the Rules
Radiotherapy is involved in 50% of all cancer treatments and 40% of cancer cures. Most of these treatments are delivered in fractions of equal doses of radiation (Fractional Equivalent Dosing, FED) over days to weeks. This treatment paradigm has remained unchanged over the past century and does not account for the development of radioresistance during treatment. Even if under-optimized, deviating from a century of successful therapy delivered in FED can be difficult. One way of exploring the infinite space of fraction sizes and schedules to identify optimal fractionation regimens is through mathematical oncology simulations that allow for in silico evaluation. This review article explores the evidence that current fractionation promotes the development of radioresistance, summarizes mathematical solutions to account for radioresistance in both the curative and non-curative settings, and reviews current clinical data investigating non-FED fractionated radiotherapy.
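A toy version of the kind of in silico evaluation described here: simulate a tumor containing a small radioresistant subpopulation, apply linear-quadratic survival at each fraction with regrowth between fractions, and compare an equal-dose (FED) schedule against a non-equal schedule of the same total dose. All parameter values are illustrative assumptions, not drawn from the review.

    # Toy in-silico comparison of equal vs. non-equal fraction sizes.
    # Parameters (radiosensitivities, growth rate, cell numbers) are made up.
    import math

    def surviving_cells(doses, alpha_s=0.3, alpha_r=0.15, beta=0.03, growth=0.05):
        sensitive, resistant = 0.99e9, 0.01e9  # initial subpopulations
        for d in doses:
            sensitive *= math.exp(-(alpha_s * d + beta * d * d))  # LQ kill, sensitive
            resistant *= math.exp(-(alpha_r * d + beta * d * d))  # LQ kill, resistant
            sensitive *= math.exp(growth)  # regrowth before the next fraction
            resistant *= math.exp(growth)
        return sensitive + resistant

    fed = [2.0] * 30                                    # 60 Gy in 30 equal fractions
    escalating = [1.0] * 10 + [2.0] * 10 + [3.0] * 10   # same 60 Gy, unequal fractions
    print(f"FED schedule:        {surviving_cells(fed):.3e} cells remain")
    print(f"escalating schedule: {surviving_cells(escalating):.3e} cells remain")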
Studying missingness in spinal cord injury data: challenges and impact of data imputation
BACKGROUND
In recent decades, medical research fields studying rare conditions such as spinal cord injury (SCI) have made extensive efforts to collect large-scale data. However, most analysis methods rely on complete data. This is particularly troublesome when studying clinical data, as they are prone to missingness. Often, researchers mitigate this problem by removing patients with missing data from the analyses. Less commonly, imputation methods to infer likely values are applied.
OBJECTIVE
Our objective was to study how the handling of missing data influences the results reported, taking the example of SCI registries. We aimed to raise awareness of the effects of missing data and to provide guidelines for future research projects, in SCI research and beyond.
METHODS
Using the Sygen clinical trial data (n = 797), we analyzed the impact of the type of variable in which data are missing, the pattern according to which data are missing, and the imputation strategy (e.g., mean imputation, last observation carried forward, multiple imputation).
RESULTS
Our simulations show that mean imputation may lead to results strongly deviating from the underlying expected results. For repeated measures missing at late stages (≥ 6 months after injury in this simulation study), carrying the last observation forward seems the preferable imputation option. This simulation study shows that a one-size-fits-all imputation strategy falls short in SCI data sets.
CONCLUSIONS
Data-tailored imputation strategies are required (e.g., characterisation of the missingness pattern, last observation carried forward for repeated measures evolving to a plateau over time). Therefore, systematically reporting the extent and kind of missing data, as well as the decisions made in handling it, will be essential to improve the interpretation, transparency, and reproducibility of the research presented.
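For illustration, the sketch below contrasts the three imputation strategies named in the Methods on a tiny synthetic table of repeated motor scores; the data and column names are hypothetical, not the Sygen trial. Note that a single run of a chained-equations imputer yields one model-based imputation; proper multiple imputation repeats the draw and pools the downstream analyses.

    # Contrasting imputation strategies on synthetic repeated measures.
    # Columns are follow-up time points; NaN marks a missed assessment.
    import numpy as np
    import pandas as pd
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    scores = pd.DataFrame({"week4":  [20.0, 35.0, 28.0, np.nan],
                           "week26": [24.0, np.nan, 33.0, 41.0],
                           "week52": [25.0, np.nan, np.nan, 43.0]})

    mean_imputed = scores.fillna(scores.mean())  # flattens within-patient trajectories
    locf = scores.ffill(axis=1)                  # last observation carried forward in time
    chained = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(scores),
                           columns=scores.columns)  # one model-based imputation

    print(mean_imputed, locf, chained, sep="\n\n")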