609 research outputs found

    Using machine learning to predict individual severity estimates of alcohol withdrawal syndrome in patients with alcohol dependence

    Despite its high prevalence in diverse clinical settings, treatment of alcohol withdrawal syndrome (AWS) is mainly based on subjective clinical opinion. Without reliable predictors of potentially harmful AWS outcomes at the individual patient level, decisions like provision of pharmacotherapy rely on resource-intensive in-patient monitoring. By contrast, an accurate risk prognosis would enable timely preemptive treatment, open up possibilities for safe out-patient care and lead to a more efficient use of health care resources. The aim of this project was to develop such tools using clinical and patient-reported information easily attainable at the patient’s admission. To this end, a machine learning framework incorporating nested cross-validation, ensemble learning, and external validation was developed to retrieve accurate, generalizable prediction models for three meaningful AWS outcomes: (1) Separating mild and more severe AWS as defined by the established AWS scale, and directly identifying patients at risk of (2) delirium tremens as well as (3) withdrawal seizures. Based on 121 sociodemographic, clinical and laboratory-based variables that were retrieved retrospectively from the patients’ charts, this classification paradigm was used to build predictive models in two cohorts of AWS patients at major detoxification wards in Munich (Ludwig-Maximilian-Universität München, n=389; Technische Universität München, n=805). Moderate to severe AWS cases were predicted with significant balanced accuracy (BAC) in both cohorts (LMU, BAC = 69.4%; TU, BAC = 55.9%). A post-hoc association between the models’ poor outcome predictions and higher clomethiazole doses further added to their clinical validity. While delirium tremens cases were accurately identified in the TU cohort (BAC = 75%), the framework yielded no significant model for withdrawal seizures. 
Variable importance analyses revealed that predictive patterns varied considerably between the two treatment sites and across withdrawal outcomes. Besides several previously described variables (most notably, low platelet count and structural brain lesions), several new predictors were identified (history of blood pressure abnormalities, positive urine-based benzodiazepine screening and years of schooling), emphasizing the utility of data-driven, hypothesis-free prediction approaches. Due to limitations of the datasets as well as site-specific patient characteristics, the models did not generalize across treatment sites, highlighting the need to conduct strict validation procedures before implementing prediction tools in clinical care. In conclusion, this dissertation provides evidence on the utility of machine learning methods to enable personalized risk predictions for AWS severity. More specifically, nested cross-validation and ensemble learning could be used to ensure generalizable, clinically applicable predictions in future prospective research based on multi-center collaboration.
The predictive assessment of the severity of withdrawal symptoms in patients with alcohol dependence still rests on subjective clinical judgment, despite decades of scientific effort. Detoxification treatment is therefore carried out predominantly in in-patient settings worldwide to ensure close clinical monitoring. Because more than 90% of withdrawal syndromes run their course with only mild autonomic symptoms, this practice ties up valuable resources. Data-driven prediction tools could make an important contribution toward an individualized, accurate and reliable assessment of the clinical course. This would enable safe out-patient treatment concepts, prophylactic pharmacological treatment of at-risk patients, and innovative treatment research based on stratified risk groups. 
The aim of this work was to develop such predictive tools for patients with alcohol withdrawal syndrome (AWS). To this end, an innovative machine learning paradigm using strict validation methods (nested cross-validation and out-of-sample external validation) was applied to develop generalizable, accurate prediction models for three meaningful clinical endpoints of AWS: (1) the classification of mild versus moderate-to-severe AWS courses, as defined by an established clinical scale (the AWS scale), and the direct identification of the complications (2) delirium tremens (DT) and (3) cerebral withdrawal seizures (WS). Using 121 retrospectively collected clinical, laboratory-based and sociodemographic variables, this paradigm was applied to 1194 patients with alcohol dependence at two large detoxification wards in Munich (Ludwig-Maximilian-Universität München, n=389; Technische Universität München, n=805). Moderate-to-severe AWS courses were predicted at both treatment centers with significant balanced accuracy (BAC) (LMU, BAC = 69.4%; TU, BAC = 55.9%). In a post-hoc analysis, the prediction of moderate-to-severe courses was also associated with higher cumulative clomethiazole doses, which supports the clinical validity of the models. While DT was identified with high accuracy in the TU cohort (BAC = 75%), the prediction of withdrawal seizures was not successful. An exploratory analysis showed that the predictive importance of individual variables varied markedly both between the treatment centers and between the individual endpoints. 
Alongside several predictively valuable variables already described in earlier studies (in particular, a lower average blood platelet count and structural cerebral lesions), several new predictors were identified (a history of blood pressure abnormalities; positive urine screening for benzodiazepines; number of years of schooling). These results underline the value of data-driven, hypothesis-free prediction approaches. Owing to limitations of the retrospective dataset, such as the lack of cross-center availability of some variables, and to clinical particularities of the two cohorts, the models could not be validated at the respective other treatment center. This result underlines the need to test the generalizability of prediction results adequately before tools based on them are recommended for clinical practice. Such methods were used for the first time in an AWS research project in this work. In summary, the results of this dissertation show for the first time a benefit of machine learning approaches for individualized risk prediction of severe AWS courses. The cross-validated machine learning paradigm used here would be a possible analytical approach for achieving reliable prediction results with high potential for clinical application in future prospective multi-center studies.
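
The abstract above reports model quality as balanced accuracy (BAC). As a minimal sketch of that metric, assuming binary labels where 1 marks a moderate-to-severe course, BAC is the mean of sensitivity and specificity. The function name and the toy labels below are illustrative and are not taken from the thesis.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Invented toy data: 2 severe cases among 10 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
bac = balanced_accuracy(y_true, y_pred)

# A classifier that always predicts "mild" scores 0.5, however rare severe cases are.
always_mild = balanced_accuracy(y_true, [0] * len(y_true))
```

Because most withdrawal courses are mild, plain accuracy would reward a model that never predicts a severe course; BAC does not, which is one plausible reason to report it for imbalanced outcomes.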

    Clinical prediction modelling in oral health: A review of study quality and empirical examples of model development

    Background: Substantial efforts have been made to improve the reproducibility and reliability of scientific findings in health research. These efforts include the development of guidelines for the design, conduct and reporting of preclinical studies (ARRIVE), clinical trials (ROBINS-I, CONSORT), observational studies (STROBE), and systematic reviews and meta-analyses (PRISMA). In recent years, the use of prediction modelling has increased in the health sciences. Clinical prediction models use information at the individual patient level to estimate the probability of one or more health outcomes. Such models offer the potential to assist in clinical decision-making and to improve medical care. Guidelines such as PROBAST (Prediction model Risk Of Bias Assessment Tool) have recently been published to further inform the conduct of prediction modelling studies. Related guidelines for the reporting of these studies, such as the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) instrument, have also been developed. Since the early 2000s, oral health prediction models have been used to predict the risk of various types of oral conditions, including dental caries, periodontal diseases and oral cancers. However, there is a lack of information on the methodological quality and reporting transparency of the published oral health prediction modelling studies. Because the quality and reliability of these studies are unknown, it remains unclear to what extent their findings can be generalised and their derived models replicated. Moreover, there remains a need to demonstrate the conduct of prediction modelling studies in the oral health field following the contemporary guidelines. This doctoral project addresses these issues using two systematic reviews and two empirical analyses. 
This thesis is the first comprehensive and systematic project reviewing the study quality and demonstrating the use of registry data and longitudinal cohorts to develop clinical prediction models in oral health. Aims:
• To identify and examine the quality of existing prediction modelling studies in the major fields of oral health.
• To demonstrate the conduct and reporting of a prediction modelling study following current guidelines, incorporating machine learning algorithms and accounting for multiple sources of biases.
Methods: As one of the most prevalent oral conditions, chronic periodontitis was chosen as the exemplar pathology for the first part of this thesis. A systematic review was conducted to investigate the existing prediction models for the incidence and progression of this condition. Based upon this initial overview, a more comprehensive critical review was conducted to assess the methodological quality and completeness of reporting for prediction modelling studies in the field of oral health. The risk of bias in the existing literature was assessed using the PROBAST criteria, and the quality of study reporting was measured in accordance with the TRIPOD guidelines. Following these two reviews, this research project demonstrated the conduct and reporting of a clinical prediction modelling study using two empirical examples. Two types of analyses that are commonly used for two different types of outcome data were adopted: survival analysis for censored outcomes and logistic regression analysis for binary outcomes. 
Models were developed to 1) predict the three- and five-year disease-specific survival of patients with oral and pharyngeal cancers, based on 21,154 cases collected by a large cancer registry program in the US, the Surveillance, Epidemiology and End Results (SEER) program, and 2) to predict the occurrence of acute and persistent pain following root canal treatment, based on the electronic dental records of 708 adult patients collected by the National Practice-Based Research Network. In these two case studies, all prediction models were developed in five steps: (i) framing the research question; (ii) data acquisition and pre-processing; (iii) model generation; (iv) model validation and performance evaluation; and (v) model presentation and reporting. In accordance with the PROBAST recommendations, the risk of bias during the modelling process was reduced in the following aspects: • In the first case study, three types of biases were taken into account: (i) bias due to missing data was reduced by adopting compatible methods to conduct imputation; (ii) bias due to unmeasured predictors was tested by sensitivity analysis; and (iii) bias due to the initial choice of modelling approach was addressed by comparing tree-based machine learning algorithms (survival tree, random survival forest and conditional inference forest) with the traditional statistical model (Cox regression). 
• In the second case study, the following strategies were employed: (i) missing data were addressed by multiple imputation with missing indicator methods; (ii) a multilevel logistic regression approach was adopted for model development in order to fit the hierarchical structure of the data; (iii) model complexity was reduced using the Least Absolute Shrinkage and Selection Operator (LASSO) for predictor selection; (iv) the models’ predictive performance was evaluated comprehensively by using the Area Under the Precision Recall Curve (AUPRC) in addition to the Area Under the Receiver Operating Characteristic curve (AUROC); and (v) finally, and most importantly, given the existing criticism in the research community concerning the gender-based and racial bias in risk prediction models, we compared the predictive performance of models built with different sets of predictors (including a clinical set, a sociodemographic set and a combination of both, the ‘general’ set). Results: The first and second review studies indicated that, in the field of oral health, the popularity of multivariable prediction models has increased in recent years. Bias and variance are two components of the uncertainty (e.g., the mean squared error) in model estimation. However, the majority of the existing studies did not account for various sources of bias, such as measurement error and inappropriate handling of missing data. Moreover, non-transparent reporting and lack of reproducibility of the models were also identified in the existing oral health prediction modelling studies. These findings provided motivation to conduct two case studies aimed at demonstrating adherence to the contemporary guidelines and to best practice. In the third study, comparable predictive capabilities between Cox regression and the non-parametric tree-based machine learning algorithms were observed for predicting the survival of patients with oral and pharyngeal cancers. 
For example, the C-indices for a Cox model and a random survival forest in predicting three-year survival were 0.82 and 0.84, respectively. A novelty of this study was the development of an online calculator designed to provide an open and transparent estimation of patients’ survival probability for up to five years after diagnosis. This calculator has clinical translational potential and could aid in patient stratification and treatment planning, at least in the context of ongoing research. In addition, the transparent reporting of this study was achieved by following the TRIPOD checklist and sharing all data and code. In the fourth study, LASSO regression suggested that pre-treatment clinical factors were important in the development of one-week and six-month postoperative pain following root canal treatment. Among all the developed multilevel logistic models, models with a clinical set of predictors yielded similar predictive performance to models with a general set of predictors, while the models with sociodemographic predictors showed the weakest predictive ability. For example, for predicting one-week postoperative pain, the AUROC values for models with clinical, sociodemographic and general predictors were 0.82, 0.68 and 0.84, respectively, and the AUPRC values were 0.66, 0.40 and 0.72, respectively. Conclusion: The significance of this research project is twofold. First, prediction models have been developed for potential clinical use in the context of various oral conditions. Second, this research represents the first attempt to standardise the conduct of this type of study in oral health research. This thesis presents three conclusions: 1) Adherence to contemporary best practice guidelines such as PROBAST and TRIPOD is limited in the field of oral health research. In response, this PhD project disseminates these guidelines and leverages their advantages to develop effective prediction models for use in dentistry and oral health. 
2) Use of appropriate procedures, accounting for and adapting to multiple sources of bias in model development, produces predictive tools of increased reliability and accuracy that hold the potential to be implemented in clinical practice. Therefore, for future prediction modelling research, it is important that data analysts work towards eliminating bias, regardless of the areas in which the models are employed. 3) Machine learning algorithms provide alternatives to traditional statistical models for clinical prediction purposes. Additionally, in the presence of clinical factors, sociodemographic characteristics contribute less to the improvement of models’ predictive performance or to providing cogent explanations of the variance in the models, regardless of the modelling approach. Therefore, it is timely to reconsider the use of sociodemographic characteristics in clinical prediction modelling research. It is suggested that this is a proportionate and evidence-based strategy aimed at reducing biases in healthcare risk prediction that may be derived from gender and racial characteristics inherent in sociodemographic data sets. Thesis (Ph.D.) -- University of Adelaide, School of Public Health, 202
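
The survival models in this abstract are compared by their C-index. The sketch below shows one common way to compute Harrell's concordance index on right-censored data, using only the standard library. It is an illustration under simplifying assumptions (pairs with tied survival times are skipped, ties in risk score count as half concordant), not the thesis's own code, and the toy values are invented.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable when subject i has the shorter time and an
    observed event (events[i] == 1). A higher risk score is expected to mean
    shorter survival.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: follow-up in months, 1 = death observed, 0 = censored.
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
risk_scores = [0.9, 0.4, 0.1, 0.6]
c_index = concordance_index(times, events, risk_scores)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for the reported values of 0.82 and 0.84.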

    Environmental and Statistical Performance Mapping Model for Underwater Acoustic Detection Systems

    This manuscript describes a methodology to combine environmental models, acoustic signal predictions, statistical detection models and operations research to form a framework for calculating and communicating performance. This methodology has been applied to undersea target detection systems and has come to be known as Performance Surface modeling. The term Performance Surface refers to a geo-spatial representation of the predicted performance of one or more sensors constrained by all-source forecasts for a geophysical area of operations. Recent improvements in ocean, atmospheric and underwater acoustic models, along with advances in parallel computing provide an opportunity to forecast the effects of a complex and dynamic acoustic environment on undersea target detection system performance. This manuscript describes a new process that calculates performance in a straight-forward sonar-equation manner utilizing spatially complex and temporally dynamic environmental models. This performance model is constructed by joining environmental acoustic signal predictions with a detection model to form a probabilistic prediction which is then combined with probabilities of target location to produce conditional, joint and marginal probabilities. These joint and marginal probabilities become the scalar estimates of system performance. This manuscript contains two invited articles recently accepted for publication. The first article describes the Performance Surface model development with sections on current applications and future extensions to a more stochastic model. The second article is written from the operational perspective of a Naval commanding officer with co-authors from the active force. Performance Surface tools have been demonstrated at the Naval Oceanographic Office (NAVOCEANO) and the Naval Oceanographic Anti-Submarine Warfare (ASW) Center (NOAC) in support of recent naval exercises. 
The model has also recently served as a major representation of the performance layer of the Naval Meteorological and Oceanographic Command (NAVMETOCCOM) in its Battlespace on Demand strategy for supporting the Fleet with oceanographic products.
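
The abstract describes combining a conditional detection probability with a probability of target location to obtain joint and marginal probabilities. A minimal sketch of that arithmetic over a gridded area follows; the cell layout, function name and numbers are hypothetical, and the actual Performance Surface model is considerably richer than this.

```python
def detection_probabilities(p_target, p_detect_given_target):
    """Combine a target-location prior with per-cell detection probabilities.

    p_target[k]              : P(target is in cell k), summing to 1 over cells
    p_detect_given_target[k] : P(detection | target is in cell k)
    Returns the per-cell joint probabilities and the marginal P(detection).
    """
    if len(p_target) != len(p_detect_given_target):
        raise ValueError("inputs must have one entry per cell")
    if abs(sum(p_target) - 1.0) > 1e-9:
        raise ValueError("target-location probabilities must sum to 1")
    joint = [pt * pd for pt, pd in zip(p_target, p_detect_given_target)]
    marginal = sum(joint)
    return joint, marginal


# Hypothetical three-cell area of operations.
p_target = [0.5, 0.3, 0.2]
p_detect_given_target = [0.8, 0.5, 0.1]
joint, marginal = detection_probabilities(p_target, p_detect_given_target)
```

The marginal collapses the whole map into one scalar, which matches the abstract's description of joint and marginal probabilities as scalar estimates of system performance.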

    Prediction of Medical Outcomes with Modern Modelling Techniques

    The aim of this research is to investigate under which circumstances and conditions relatively modern modelling techniques, such as support vector machines, neural networks and random forests, could have advantages in medical-scientific research and in medical practice compared with more traditional modelling techniques, such as linear regression, logistic regression and Cox regression.

    Machine Learning Applied to Clinical Laboratory Data in Spain for COVID-19 Outcome Prediction: Model Development and Validation

    Article published in the Journal of Medical Internet Research, indexed in JCR (2019) in Q1 of "Health Care Sciences & Services" (5/102) and "Medical Informatics" (2/27); Impact Factor 5.03. Background: The COVID-19 pandemic is probably the greatest health catastrophe of the modern era. Spain’s health care system has been exposed to uncontrollable numbers of patients over a short period, causing the system to collapse. Given that diagnosis is not immediate, and there is no effective treatment for COVID-19, other tools have had to be developed to identify patients at risk of severe disease complications and thus optimize material and human resources in health care. There are no tools to identify patients who have a worse prognosis than others. Objective: This study aimed to process a sample of electronic health records of patients with COVID-19 in order to develop a machine learning model to predict the severity of infection and mortality from clinical laboratory parameters. Early patient classification can help optimize material and human resources, and analysis of the most important features of the model could provide more detailed insights into the disease. Methods: After an initial performance evaluation based on a comparison with several other well-known methods, the extreme gradient boosting algorithm was selected as the predictive method for this study. In addition, Shapley Additive Explanations was used to analyze the importance of the features of the resulting model. Results: After data preprocessing, 1823 confirmed patients with COVID-19 and 32 predictor features were selected. 
On bootstrap validation, the extreme gradient boosting classifier yielded a value of 0.97 (95% CI 0.96-0.98) for the area under the receiver operating characteristic curve, 0.86 (95% CI 0.80-0.91) for the area under the precision-recall curve, 0.94 (95% CI 0.92-0.95) for accuracy, 0.77 (95% CI 0.72-0.83) for the F-score, 0.93 (95% CI 0.89-0.98) for sensitivity, and 0.91 (95% CI 0.86-0.96) for specificity. The 4 most relevant features for model prediction were lactate dehydrogenase activity, C-reactive protein levels, neutrophil counts, and urea levels. Conclusions: Our predictive model yielded excellent results in differentiating patients who died of COVID-19, based primarily on laboratory parameter values. Analysis of the resulting model identified a set of features with the most significant impact on the prediction, thus relating them to a higher risk of mortality.
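
The results above are reported with 95% confidence intervals from bootstrap validation. The abstract does not say which bootstrap scheme was used, so the following is only a sketch of one simple variant, a percentile bootstrap over held-out predictions. The helper names, resample count and toy labels are assumptions made for illustration.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric computed on paired labels."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample patients with replacement, keeping label/prediction pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented toy data: 10 deaths among 40 patients, 35 predictions correct.
y_true = [1] * 10 + [0] * 30
y_pred = [1] * 8 + [0] * 2 + [0] * 27 + [1] * 3
point_estimate = accuracy(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, accuracy)
```

Any metric with the same two-argument signature can be passed as `metric`, so the same routine could in principle produce intervals for sensitivity or specificity.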

    Integration of Prior Biological Knowledge into Support Vector Machines

    One goal of clinical cancer research is to find new prognostic gene signatures that can predict the clinical course of the disease. Classification methods are often used in bioinformatics to identify new gene signatures or biomarkers. However, the commonly used methods rely exclusively on gene expression data and treat genes as independent. Several recently published studies have nevertheless shown that classification quality can be improved by incorporating network knowledge into the classification process. In addition to improved classification results, it was also shown that the selected genes are easier to interpret and that gene selection becomes more stable. For these reasons, this thesis deals with methods that improve prediction accuracy by taking network knowledge into account for classification in addition to gene expression data. The thesis gives an overview of existing methods that are able to incorporate network knowledge into classification, as well as of databases that store such knowledge. It also describes the development of a new network-based classification method that is able to take the connectivity of genes into account. The Support Vector Machine (SVM) was chosen as the basis of the new algorithm. Normally, the SVM is not able to perform gene selection, i.e. it always uses all genes to predict a given endpoint. The SVM can, however, be combined with the Recursive Feature Elimination (RFE) algorithm to enable gene selection. RFE selects genes according to their influence on the hyperplane found by the SVM. The ranking criterion of RFE was altered using a modified version of Google's PageRank algorithm. 
The modified version of PageRank is called GeneRank and computes a weight for each gene based on a graph built from a protein-protein interaction database. This weight was combined with the ranking criterion of RFE to integrate the network knowledge into the ranking of the genes and thus into the classification. Because of this reweighting, the newly developed algorithm was named Reweighted Recursive Feature Elimination (RRFE). RRFE follows the assumption that genes showing only a small change in expression should have the chance to exert greater influence on the classification if they are highly connected. This assumption was implemented through the combination of GeneRank and RFE. RRFE thereby helps to better understand the underlying biological process. In addition, RRFE helps to reduce the share of unused information in the data and to identify functionally important genes. RRFE was tested on one integrated and four independent breast cancer datasets. Together, the datasets comprise almost 800 patients. RRFE was used to predict ERBB2 status as well as the risk of breast cancer relapse. The analyses showed improved interpretability and stability of the selected genes. Furthermore, classification accuracy was improved compared with both standard and network-based classifiers. Besides the theoretical foundations of RRFE, the thesis also presents a new R package containing implementations of RRFE and other network-based classification methods. The aim was to simplify the use of RRFE and other methods in order to give developers the opportunity to compare the performance of their newly developed algorithms with existing methods. 
The software package includes the functions needed for comparing classification methods, creating plots, and identifying the genes that contributed substantially to the classification.
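
The idea of reweighting an RFE ranking with a network-derived gene weight can be sketched as follows. This is an illustration only: the iteration is a PageRank-style update in the spirit of GeneRank, and multiplying the squared SVM weight by the rank is an assumed combination rule, since the abstract does not give the exact formula. The package described above is written in R; Python is used here purely for illustration, and all names and values are hypothetical.

```python
def generank(adjacency, expression_change, d=0.5, n_iter=100):
    """PageRank-style gene weights on an undirected interaction graph.

    adjacency[i][j] is 1 if genes i and j interact, else 0.
    expression_change holds non-negative per-gene expression changes.
    d balances network influence (d -> 1) against expression alone (d = 0).
    """
    n = len(expression_change)
    degree = [sum(row) for row in adjacency]
    total = sum(expression_change)
    ex = [e / total for e in expression_change]  # normalise to sum to 1
    r = ex[:]
    for _ in range(n_iter):
        r = [
            (1 - d) * ex[j]
            + d * sum(adjacency[i][j] * r[i] / degree[i]
                      for i in range(n) if degree[i] > 0)
            for j in range(n)
        ]
    return r


def reweighted_criterion(svm_weights, ranks):
    """Assumed combination: squared SVM weight scaled by the network rank."""
    return [w * w * r for w, r in zip(svm_weights, ranks)]


# Toy star network: gene 0 is a hub linked to genes 1-3 but changes little.
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
expression_change = [0.1, 1.0, 1.0, 1.0]
svm_weights = [0.25, 0.3, 0.3, 0.3]  # invented hyperplane weights

ranks = generank(adjacency, expression_change)
plain = [w * w for w in svm_weights]            # standard RFE criterion
reweighted = reweighted_criterion(svm_weights, ranks)
```

In this toy case the hub gene has the smallest plain criterion and would be eliminated first, yet it ranks above the other genes after reweighting, which mirrors the stated assumption that highly connected genes with small expression changes should be able to gain influence.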

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    Preoperative 18F-FDG PET/CT and CT Radiomics for Identifying Aggressive Histopathological Subtypes in Early Stage Lung Adenocarcinoma

    Lung adenocarcinoma (ADC) is the most common non-small cell lung cancer. Surgical resection is the primary treatment for early-stage lung ADC, while lung-sparing surgery is an alternative for non-aggressive cases. Identifying histopathologic subtypes before surgery helps determine the optimal surgical approach. Predominantly solid or micropapillary (MIP) subtypes are aggressive and associated with a higher likelihood of recurrence and metastasis and lower survival rates. This study aims to non-invasively identify these aggressive subtypes using preoperative 18F-FDG PET/CT and diagnostic CT radiomics analysis. We retrospectively studied 119 patients with stage I lung ADC and tumors ≤ 2 cm, of whom 23 had aggressive subtypes (18 solid and 5 MIP). Out of 214 radiomic features from the PET/CT and CT scans and 14 clinical parameters, 78 significant features (3 CT and 75 PET features) were identified through univariate analysis and hierarchical clustering with minimized feature collinearity. Predictive models were built by combining a Support Vector Machine classifier with the Least Absolute Shrinkage and Selection Operator. The model was evaluated with ten iterations of 10-fold cross-validation (10 × 10-fold CV). A pair consisting of a texture feature (PET GLCM Correlation) and a shape feature (CT Sphericity) emerged as the best predictor. The radiomics model significantly outperformed the conventional predictor SUVmax (accuracy: 83.5% vs. 74.7%, p = 9e-9) and identified aggressive subtypes by evaluating FDG uptake in the tumor and tumor shape. It also demonstrated a high negative predictive value of 95.6% compared to SUVmax (88.2%, p = 2e-10). The proposed radiomics approach could reduce unnecessary extensive surgeries for non-aggressive subtype patients, improving surgical decision-making for early-stage lung ADC patients.
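
The evaluation scheme named in this abstract, ten repetitions of 10-fold cross-validation, can be sketched as an index generator. The stratification by class is an assumption added here so that every test fold contains some of the 23 aggressive cases; the abstract does not say whether the folds were stratified. The function name and seed are illustrative.

```python
import random


def repeated_stratified_kfold(labels, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated stratified k-fold CV."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for rep in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        position = 0  # running counter keeps fold sizes within one of each other
        for cls in sorted(by_class):
            members = by_class[cls][:]
            rng.shuffle(members)  # fresh shuffle for every repeat
            for i in members:
                folds[position % n_splits].append(i)
                position += 1
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(i for f in range(n_splits) if f != k for i in folds[f])
            yield rep, k, train_idx, test_idx


# Class counts taken from the abstract: 23 aggressive (1) among 119 patients.
labels = [1] * 23 + [0] * 96
splits = list(repeated_stratified_kfold(labels))
```

A model would be fitted on `train_idx` and scored on `test_idx` for each of the 100 splits, and the scores averaged; repeating the 10-fold procedure with fresh shuffles reduces the dependence of the estimate on a single partition.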

    Tumor heterogeneity in PET-CT images

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Ciencias Físicas, Departamento de Estructura de la Materia, Física Térmica y Electrónica, defended on 28/01/2021.
    Cancer is a leading cause of morbidity and mortality [1]. The most frequent cancers worldwide are non–small cell lung carcinoma (NSCLC) and breast cancer [2], and their management is a challenging task [3]. Tumor diagnosis is usually made through biopsy [4]. However, medical imaging also plays an important role in diagnosis, staging, response to treatment, and recurrence assessment [5]. Tumor heterogeneity is recognized to be involved in cancer treatment failure, with worse clinical outcomes for highly heterogeneous tumors [6,7]. It leads to the existence of tumor sub-regions with different biological behavior (some more aggressive and treatment-resistant than others) [8-10], which are characterized by different patterns of vascularization, vessel permeability, metabolism, cell proliferation, cell death, and other features that can be measured by modern medical imaging techniques, including positron emission tomography/computed tomography (PET/CT) [10-12]. Thus, the assessment of tumor heterogeneity through medical images could allow the prediction of therapy response and long-term outcomes of patients with cancer [13]. PET/CT has become essential in oncology [14,15] and is usually evaluated through semiquantitative metabolic parameters, such as the maximum/mean standardized uptake value (SUVmax, SUVmean) or the metabolic tumor volume (MTV), which are valuable as prognostic image-based biomarkers in several tumors [16,17] but do not assess tumor heterogeneity. Likewise, fluorodeoxyglucose (18F-FDG) PET/CT is important for differentiating malignant from benign solitary pulmonary nodules (SPN), thereby reducing the number of patients who undergo unnecessary surgical biopsies.
    Several publications have shown that some quantitative features extracted from medical images are suitable for diagnosis, tumor staging, prognosis of treatment response, and prediction of the long-term evolution of cancer patients [18-20]. The process of extracting image features and relating them to clinical or biological variables is called “Radiomics” [9,20-24]. Radiomic parameters such as textural features have been related directly to tumor heterogeneity [25]. This thesis investigated the relationship between tumor heterogeneity, assessed by 18F-FDG-PET/CT texture analysis, and metabolic parameters and pathologic staging in patients with NSCLC, and explored the diagnostic performance of different metabolic, morphologic, and clinical criteria for classifying solitary pulmonary nodules (SPN) as malignant or benign. Furthermore, 18F-FDG-PET/CT radiomic features of patients with recurrent/metastatic breast cancer were used to construct predictive models of response to chemotherapy, based on an optimal combination of several feature selection and machine learning (ML) methods...
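The semiquantitative PET parameters named in the abstract above (SUVmax, SUVmean, and MTV) can be illustrated with a short NumPy sketch. It assumes the input is already an array of SUV values covering the tumor region and that the metabolic volume is delineated with a fixed fraction of SUVmax; the 40% default, the function name `suv_metrics`, and the voxel volume are illustrative assumptions, since the abstract does not state how the thesis segmented tumors.

```python
import numpy as np


def suv_metrics(suv, voxel_volume_ml, threshold_fraction=0.4):
    """Return SUVmax, SUVmean and MTV for a region of SUV values.

    Assumption: the metabolic tumor volume is the set of voxels whose SUV
    is at least `threshold_fraction` of SUVmax (a fixed relative threshold).
    """
    suv = np.asarray(suv, dtype=float)
    suv_max = float(suv.max())
    # Voxels counted as metabolically active under the relative threshold.
    mask = suv >= threshold_fraction * suv_max
    suv_mean = float(suv[mask].mean())
    mtv_ml = float(mask.sum() * voxel_volume_ml)
    return {"SUVmax": suv_max, "SUVmean": suv_mean, "MTV_ml": mtv_ml}


if __name__ == "__main__":
    # Toy 2x2 region with 0.5 ml voxels: the threshold is 0.4 * 10 = 4,
    # so only the voxels with SUV 5 and 10 are included.
    region = np.array([[1.0, 2.0], [5.0, 10.0]])
    print(suv_metrics(region, voxel_volume_ml=0.5))
```

All three values summarize uptake intensity or extent without describing how uptake is spatially arranged, which is why the abstract notes that they do not assess tumor heterogeneity.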