463 research outputs found

    The Use of Data Balancing Algorithms to Correct for the Under-Representation of Female Patients in a Cardiovascular Dataset

    Given that women are under-represented in medical datasets, and that machine learning classification algorithms are known to exhibit bias towards the majority class, the growing application of machine learning in the medical field risks resulting in worse medical outcomes for female patients. The Heart Failure Prediction (HFP) dataset is a historical dataset used for training models for the prediction of heart disease. This dataset contains significantly fewer female patients than male patients, and as such it is expected that models trained on this data will inherit a gender bias that favours male patients. This dissertation evaluates different data re-sampling techniques (SMOTE, SMOTE-NC, SVM-SMOTE, Borderline-SMOTE, ADASYN, ROS, Near Miss, and RUS) for their ability to correct for the under-representation of female patients, and the use of the resulting synthetically balanced datasets to reduce the bias observed in classification models trained with this data.
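
    As a rough illustration of how such re-sampling could be applied, the sketch below balances a heart-disease table on the sex attribute (rather than the usual class label) using the imbalanced-learn implementations of several of the listed techniques. The file name and the `Sex` column are assumptions, and the numeric samplers assume the categorical features have already been encoded (SMOTE-NC, which handles categoricals directly, would additionally need their column indices).

```python
# Minimal sketch: balancing a dataset on the protected attribute (sex) rather than
# the class label, using imbalanced-learn re-samplers. File and column names are assumed.
import pandas as pd
from imblearn.over_sampling import (SMOTE, BorderlineSMOTE, SVMSMOTE, ADASYN,
                                    RandomOverSampler)
from imblearn.under_sampling import NearMiss, RandomUnderSampler

df = pd.read_csv("heart.csv")                 # hypothetical HFP extract, already encoded
X = df.drop(columns=["Sex"])                  # features (including the heart-disease label)
sex = df["Sex"]                               # under-represented group: female patients

samplers = {
    "SMOTE": SMOTE(random_state=42),
    "Borderline-SMOTE": BorderlineSMOTE(random_state=42),
    "SVM-SMOTE": SVMSMOTE(random_state=42),
    "ADASYN": ADASYN(random_state=42),
    "ROS": RandomOverSampler(random_state=42),
    "Near Miss": NearMiss(),
    "RUS": RandomUnderSampler(random_state=42),
}

balanced = {}
for name, sampler in samplers.items():
    # Each sampler synthesises (or removes) rows until both sexes are equally represented.
    X_res, sex_res = sampler.fit_resample(X, sex)
    balanced[name] = pd.concat([X_res, sex_res.rename("Sex")], axis=1)
    print(name, sex_res.value_counts().to_dict())
```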

    Machine Learning for the Early Detection of Acute Episodes in Intensive Care Units

    In Intensive Care Units (ICUs), mere seconds might define whether a patient lives or dies. Predictive models capable of detecting acute events in advance may allow for anticipated interventions, which could mitigate the consequences of those events and promote a greater number of lives saved. Several predictive models developed for this purpose have failed to meet the high requirements of ICUs. This might be due to the complexity of anomaly prediction tasks and the inefficient utilization of ICU data. Moreover, some essential intensive care demands, such as continuous monitoring, are often not considered when developing these solutions, making them unfit for real contexts. This work approaches two topics within this problem: the relevance of the ICU data used to predict acute episodes and the benefits of applying Layered Learning (LL) techniques to counter the complexity of these tasks. The first topic was undertaken through a study on the relevance of information retrieved from physiological signals and clinical data for the early detection of Acute Hypotensive Episodes (AHE) in ICUs. Then, the potentialities of LL were assessed through an in-depth analysis of the applicability of a recently proposed approach to the same topic. Furthermore, different optimization strategies enabled by LL configurations were proposed, including a new approach aimed at false alarm reduction. The results regarding data relevance might contribute to a paradigm shift in the information retrieved for AHE prediction. It was found that most of the information commonly used in the literature might be wrongly perceived as valuable, since only three features related to blood pressure measures presented actual distinctive traits. On another note, the different LL-based strategies developed confirm the versatile possibilities offered by this paradigm. Although these methodologies did not promote significant performance improvements in this specific context, they can be further explored and adapted to other domains.
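
    A minimal sketch of the layered-learning idea referred to above is given below: a first-layer model learns an intermediate sub-task and its prediction is appended as a feature for the final alarm model. The synthetic data, the blood-pressure trend used as the intermediate target, and the random-forest models are all illustrative assumptions, not the dissertation's actual pipeline.

```python
# Illustrative two-layer (layered-learning) configuration: the first layer learns an
# intermediate sub-task and its prediction becomes an extra feature for the final task.
# Data and targets are synthetic placeholders, not the dissertation's pipeline.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))                     # e.g. windowed blood-pressure features
bp_trend = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=2000)  # intermediate target
ahe = (bp_trend < -0.8).astype(int)                 # final target: acute hypotensive episode

X_tr, X_te, t_tr, t_te, y_tr, y_te = train_test_split(X, bp_trend, ahe, random_state=0)

layer1 = RandomForestRegressor(random_state=0).fit(X_tr, t_tr)        # sub-task model
X_tr2 = np.column_stack([X_tr, layer1.predict(X_tr)])                 # augment features
X_te2 = np.column_stack([X_te, layer1.predict(X_te)])

layer2 = RandomForestClassifier(random_state=0).fit(X_tr2, y_tr)      # alarm model
print("AHE alarm accuracy:", layer2.score(X_te2, y_te))
```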

    Prediction of Sudden Cardiac Death Using Ensemble Classifiers

    Sudden Cardiac Death (SCD) is a medical problem that is responsible for over 300,000 deaths per year in the United States and millions worldwide. SCD is defined as death occurring within one hour of the onset of acute symptoms, an unwitnessed death in the absence of pre-existing progressive circulatory failure or other causes of death, or death during attempted resuscitation. Sudden death due to cardiac reasons is a leading cause of death among Congestive Heart Failure (CHF) patients. The use of Electronic Medical Records (EMR) systems has made a wealth of medical data available for research and analysis. Supervised machine learning methods have been successfully used for medical diagnosis. Ensemble classifiers are known to achieve better prediction accuracy than their constituent base classifiers. In an effort to understand the factors contributing to SCD, data on 2,521 patients were collected for the Sudden Cardiac Death in Heart Failure Trial (SCD-HeFT). The data included 96 features that were gathered over a period of 5 years. The goal of this dissertation was to develop a model that could accurately predict SCD based on the available features. The prediction model used the Cox proportional hazards model as a score and then used the ExtraTreesClassifier algorithm as a boosting mechanism to create the ensemble. We tested the system at prediction points of 180 days and 365 days. Our best results were at 180 days, with an accuracy of 0.9624, specificity of 0.9915, and F1 score of 0.9607.
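
    The sketch below illustrates the general recipe described above, assuming the lifelines and scikit-learn libraries: a Cox proportional hazards model produces a risk score that is added as a feature for an ExtraTreesClassifier. The file name and column names (follow-up time, event indicator, and 180-day SCD label) are hypothetical placeholders.

```python
# Sketch: a Cox proportional-hazards risk score used as an extra feature for an
# ExtraTreesClassifier ensemble. Column names and the data file are assumed.
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("scd_heft.csv")                    # hypothetical extract of trial data
train, test = train_test_split(df, random_state=0)

covariates = [c for c in df.columns if c not in ("time_days", "scd_event", "scd_180d")]

cph = CoxPHFitter()
cph.fit(train[covariates + ["time_days", "scd_event"]],
        duration_col="time_days", event_col="scd_event")

def add_risk(frame):
    # Append the Cox partial-hazard score as an additional feature column.
    out = frame.copy()
    out["cox_risk"] = cph.predict_partial_hazard(frame[covariates]).values
    return out

clf = ExtraTreesClassifier(n_estimators=500, random_state=0)
clf.fit(add_risk(train)[covariates + ["cox_risk"]], train["scd_180d"])
print("180-day accuracy:",
      clf.score(add_risk(test)[covariates + ["cox_risk"]], test["scd_180d"]))
```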

    Bayesian Approach For Early Stage Event Prediction In Survival Data

    Predicting event occurrence at an early stage in longitudinal studies is an important and challenging problem with high practical value. As opposed to standard classification and regression problems, where a domain expert can provide the labels for the data in a reasonably short period of time, training data in such longitudinal studies can be obtained only by waiting for the occurrence of a sufficient number of events. Survival analysis, on the other hand, aims at finding the underlying distribution for data that measure the length of time until the occurrence of an event. However, it cannot answer the open question of how to forecast whether a subject will experience the event by the end of the study, given only the event occurrence information available at an early stage of the survival data. This problem exhibits two major challenges: 1) the absence of complete information about event occurrence (censoring) and 2) the availability of only a partial set of events that occurred during the initial phase of the study. Thus, the main objective of this work is to predict which subjects in the study will experience the event in the future, based on the few events observed during the initial stages of a longitudinal study. In this thesis, we propose a novel approach to address the first challenge by introducing a new method for handling censored data using the Kaplan-Meier estimator. The second challenge is tackled by effectively integrating Bayesian methods with an Accelerated Failure Time (AFT) model, adapting the prior probability of event occurrence for future time points. In other words, we propose a novel Early Stage Prediction (ESP) framework for building event prediction models that are trained at early stages of longitudinal studies. More specifically, we extended the Naive Bayes, Tree-Augmented Naive Bayes (TAN) and Bayesian Network methods based on the proposed framework, and developed three algorithms, namely ESP-NB, ESP-TAN and ESP-BN, to effectively predict event occurrence using training data obtained at an early stage of the study. The proposed framework is evaluated using a wide range of synthetic and real-world benchmark datasets. Our extensive set of experiments shows that the proposed ESP framework is able to predict future event occurrences more accurately using only a limited amount of training data, compared to the alternative prediction methods.
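
    The toy sketch below illustrates the two ingredients named above on synthetic data: a Kaplan-Meier estimate summarises event probability under censoring, and that estimate is plugged in as the class prior of a Naive Bayes classifier trained on early-stage labels. The extrapolation of the prior to the end of study, which the thesis derives from an AFT model, is replaced here by a crude, clearly labelled stand-in, so this is only an illustration of the idea rather than the ESP-NB algorithm itself.

```python
# Toy illustration: Kaplan-Meier handling of censoring plus a prior-adjusted Naive Bayes
# trained on early-stage labels. This is NOT the ESP-NB algorithm, only the general idea.
import numpy as np
from lifelines import KaplanMeierFitter
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
n, t_early = 500, 2.0                                # study observed only up to t_early so far
X = rng.normal(size=(n, 5))
true_time = np.exp(1.0 + X[:, 0] + rng.normal(scale=0.5, size=n))   # synthetic event times
observed = np.minimum(true_time, t_early)            # follow-up truncated at t_early
event_early = (true_time <= t_early).astype(int)     # only early events are labelled

kmf = KaplanMeierFitter().fit(observed, event_observed=event_early)
p_event_early = 1.0 - float(kmf.predict(t_early))    # KM estimate of P(event by t_early)

# The thesis adapts the prior to future time points with an AFT model; here we simply
# inflate the early-stage estimate as a labelled stand-in assumption.
prior_event = min(0.99, 2.0 * p_event_early)

esp_nb = GaussianNB(priors=[1 - prior_event, prior_event])
esp_nb.fit(X, event_early)
print("Adjusted event prior:", round(prior_event, 3))
```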

    Image Quality Assessment for Population Cardiac MRI: From Detection to Synthesis

    Cardiac magnetic resonance (CMR) images play a growing role in the diagnostic imaging of cardiovascular diseases. Left ventricular (LV) cardiac anatomy and function are widely used for diagnosis and monitoring disease progression in cardiology and to assess the patient's response to cardiac surgery and interventional procedures. For population imaging studies, CMR is arguably the most comprehensive modality for non-invasive and non-ionising imaging of the heart and great vessels and is, hence, most suited for population imaging cohorts. Due to insufficient radiographer experience in planning a scan, natural cardiac muscle contraction, breathing motion, and imperfect triggering, CMR can display incomplete LV coverage, which hampers quantitative LV characterization and diagnostic accuracy. To tackle this limitation and enhance the accuracy and robustness of automated cardiac volume and functional assessment, this thesis focuses on the development and application of state-of-the-art deep learning (DL) techniques in cardiac imaging. Specifically, we propose new image feature representation types that are learnt with DL models and aimed at highlighting CMR image quality across datasets. These representations are also intended to estimate CMR image quality for better interpretation and analysis. Moreover, we investigate how quantitative analysis can benefit when these learnt image representations are used in image synthesis. Specifically, a 3D Fisher discriminative representation is introduced to identify CMR image quality in the UK Biobank cardiac data. Additionally, a novel adversarial learning (AL) framework is introduced for cross-dataset CMR image quality assessment, and we show that the common representations learnt by AL can be useful and informative for cross-dataset CMR image analysis. Moreover, we utilize the dataset-invariant (DI) representations for CMR volume interpolation by introducing a novel generative adversarial network (GAN) based image synthesis framework, which enhances CMR image quality across datasets.
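
    A minimal PyTorch sketch of the adversarial-learning ingredient is shown below: a shared encoder feeds both an image-quality head and a dataset discriminator whose gradients are reversed, pushing the learnt representation to be dataset-invariant. The network sizes, toy batch, and two-dataset setup are assumptions for illustration, not the thesis's architecture.

```python
# Minimal adversarial (dataset-invariant) representation learning sketch in PyTorch:
# the encoder is trained so the quality head succeeds while the dataset head fails.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad                                 # reverse gradients for the encoder

encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU())
quality_head = nn.Linear(128, 2)                     # e.g. full vs. incomplete LV coverage
dataset_head = nn.Linear(128, 2)                     # which cohort the slice came from

params = (list(encoder.parameters()) + list(quality_head.parameters())
          + list(dataset_head.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)
ce = nn.CrossEntropyLoss()

x = torch.randn(16, 1, 64, 64)                       # toy batch standing in for CMR slices
y_quality = torch.randint(0, 2, (16,))
y_dataset = torch.randint(0, 2, (16,))

z = encoder(x)
loss = ce(quality_head(z), y_quality) + ce(dataset_head(GradReverse.apply(z)), y_dataset)
opt.zero_grad()
loss.backward()
opt.step()
```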

    A prediction tool for dolutegravir associated hyperglycaemia among HIV patients in Uganda

    A Project Report Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Embedded and Mobile Systems of the Nelson Mandela African Institution of Science and Technology. The initiation of Dolutegravir-based antiretroviral therapy has provided a potent treatment option for persons living with human immunodeficiency virus (PLHIV). However, clinical research has shown overwhelming evidence that use of Dolutegravir (DTG) results in consequential hyperglycaemia. The incidence and prevalence rates of Dolutegravir-associated hyperglycaemia among PLHIV are unknown. Therefore, identification of patients susceptible to Dolutegravir-associated hyperglycaemia is critical to lessening the morbidity and mortality associated with uncontrolled high blood glucose levels among patients on antiretroviral therapy (ART) and care. The procedures currently practiced in screening for hyperglycaemia among PLHIV being switched to DTG, together with the literature on DTG-associated hyperglycaemia, were appraised. Various machine learning classification algorithms were employed to identify the most appropriate model. The purpose of the study was to develop an efficient DTG-associated hyperglycaemia screening tool for treatment-experienced PLHIV being switched to DTG in Uganda. The study found the Extreme Gradient Boosting classification model to be the best, with the following performance and evaluation metrics, inter alia: accuracy of 0.99, probability of correctly classifying positives (recall) of 0.87, precision for predicting positives of 0.67, area under the receiver operating characteristic curve of 0.90, area under the precision-recall curve of 0.86, F1 score of 0.76, and Cohen's kappa of 0.72. Therefore, the study recommends the adoption and use of the DTG-associated hyperglycaemia screening tool when switching treatment-experienced HIV patients to a DTG-based regimen, as machine learning tools are increasingly used worldwide to support medical care. The researcher also suggests stratifying the data by gender and building separate DTG-associated prediction models for women and men, because men are not affected in any way by pregnancy- or menopause-related variables as women are. The research also proposes lessening bureaucratic tendencies and the high charges levied on research protocols and data acquisition as a way of promoting innovative studies such as this one.
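
    A hedged sketch of this kind of pipeline is shown below, using the xgboost and scikit-learn packages to train an Extreme Gradient Boosting classifier and report the same family of metrics. The cohort file, the outcome column, and the hyper-parameters are placeholders rather than the study's actual variables.

```python
# Sketch of an XGBoost screening model with the metric family reported above.
# File, columns, and hyper-parameters are placeholders.
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, recall_score, precision_score, f1_score,
                             roc_auc_score, average_precision_score, cohen_kappa_score)

df = pd.read_csv("dtg_cohort.csv")                  # hypothetical cohort extract
X, y = df.drop(columns=["hyperglycaemia"]), df["hyperglycaemia"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss")
model.fit(X_tr, y_tr)

pred, proba = model.predict(X_te), model.predict_proba(X_te)[:, 1]
print("accuracy :", accuracy_score(y_te, pred))
print("recall   :", recall_score(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("ROC AUC  :", roc_auc_score(y_te, proba))
print("PR AUC   :", average_precision_score(y_te, proba))
print("F1       :", f1_score(y_te, pred))
print("kappa    :", cohen_kappa_score(y_te, pred))
```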

    Predictive analytics framework for electronic health records with machine learning advancements: optimising hospital resources utilisation with predictive and epidemiological models

    The primary aim of this thesis was to investigate the feasibility and robustness of predictive machine-learning models in the context of improving hospital resource utilisation with data-driven approaches and predicting hospitalisation with hospital quality assessment metrics such as length of stay. The length of stay predictions include validating the proposed methodological predictive framework on each hospital's electronic health records data source. In this thesis, we relied on electronic health records (EHRs) to drive a data-driven predictive inpatient length of stay (LOS) research framework that suits the most demanding hospital facilities in the context of hospital resource utilisation. The thesis focused on the viability of the methodological predictive length of stay approaches in dynamic and demanding healthcare facilities and hospital settings such as intensive care units and emergency departments. While hospital length of stay predictions assess (internal) inpatient outcomes from admission to discharge, the thesis also considered (external) factors outside hospital control, such as forecasting future hospitalisations from the spread of infectious communicable diseases during pandemics. The internal and external splits are the thesis's main contributions. Therefore, the thesis evaluated public health measures during events of uncertainty (e.g. pandemics) and measured the effect of non-pharmaceutical interventions during outbreaks on future hospitalised cases. To the best of our knowledge, this approach is the first contribution in the literature to examine the effect of epidemiological curves, using simulation models to project future hospitalisations and their strong potential to impact hospital bed availability and to stress hospital workflows and workers. The main research commonality between chapters is the usefulness of ensemble learning models in the context of LOS prediction for hospital resource utilisation. Ensemble learning models achieve better predictive performance by combining several base models to produce an optimal predictive model. These predictive models explored internal LOS for various chronic and acute conditions using data-driven approaches to determine the most accurate and powerful predicted outcomes, which ultimately helps hospital professionals working in hospital settings achieve the desired outcomes.
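
    As a small illustration of the ensemble idea for LOS prediction, the sketch below stacks two base regressors into a single model with scikit-learn and scores it by cross-validated mean absolute error. The EHR file and the `los_days` target column are assumed placeholders, and stacking is just one of several possible ensemble configurations.

```python
# Minimal ensemble sketch for length-of-stay prediction: two base regressors combined
# by stacking, evaluated with cross-validated mean absolute error. Data is a placeholder.
import pandas as pd
from sklearn.ensemble import (RandomForestRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

df = pd.read_csv("ehr_admissions.csv")              # hypothetical EHR extract
X, y = df.drop(columns=["los_days"]), df["los_days"]

ensemble = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=0)),
                ("gbm", GradientBoostingRegressor(random_state=0))],
    final_estimator=Ridge())

scores = cross_val_score(ensemble, X, y, cv=5, scoring="neg_mean_absolute_error")
print("MAE per fold (days):", -scores)
```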

    Multidimensional embedded MEMS motion detectors for wearable mechanocardiography and 4D medical imaging

    Background: Cardiovascular diseases are the number one cause of death. Of these deaths, almost 80% are due to coronary artery disease (CAD) and cerebrovascular disease. Multidimensional microelectromechanical systems (MEMS) sensors allow measurement of the mechanical movement of the heart muscle, offering an entirely new and innovative solution for evaluating cardiac rhythm and function. Recent advances in miniaturized motion sensors present an exciting opportunity to study novel device-driven and functional motion detection systems in the areas of both cardiac monitoring and biomedical imaging, for example, in computed tomography (CT) and positron emission tomography (PET). Methods: This Ph.D. work describes a new cardiac motion detection paradigm and measurement technology based on multimodal measuring tools (tracking the heart's kinetic activity using micro-sized MEMS sensors) and novel computational approaches (deploying signal processing and machine learning techniques) for detecting cardiac pathological disorders. In particular, this study focuses on the capability of joint gyrocardiography (GCG) and seismocardiography (SCG) techniques, which together constitute the mechanocardiography (MCG) concept representing the mechanical characteristics of cardiac precordial surface vibrations. Results: Experimental analyses showed that integrating multisource sensory data resulted in precise estimation of heart rate with an accuracy of 99% (healthy, n=29), detection of heart arrhythmia (n=435) with an accuracy of 95-97%, and indication of ischemic disease with approximately 75% accuracy (n=22), as well as significantly improved quality of four-dimensional (4D) cardiac PET images by eliminating motion-related inaccuracies using a MEMS dual-gating approach. Tissue Doppler imaging (TDI) analysis of GCG (healthy, n=9) showed promising results for measuring cardiac timing intervals and myocardial deformation changes. Conclusion: The findings of this study demonstrate the clinical potential of MEMS motion sensors in cardiology, which may facilitate the timely diagnosis of cardiac abnormalities. Multidimensional MCG can effectively contribute to detecting atrial fibrillation (AFib), myocardial infarction (MI), and CAD. Additionally, MEMS motion sensing improves the reliability and quality of cardiac PET imaging.
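
    The sketch below illustrates the kind of signal-processing step that underlies MCG heart-rate estimation: band-pass filtering a single SCG/GCG channel and counting heartbeats via peak detection with SciPy. The synthetic signal, sampling rate, filter band, and peak-detection settings are assumptions, not the thesis's processing chain.

```python
# Sketch: heart-rate estimation from a single mechanocardiography channel via
# band-pass filtering and peak detection. Signal and parameters are synthetic assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

fs = 200.0                                           # assumed sampling rate (Hz)
t = np.arange(0, 30, 1 / fs)
scg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(t.size)   # toy 72-bpm signal

b, a = butter(4, [0.5, 20], btype="bandpass", fs=fs)                 # keep cardiac band
filtered = filtfilt(b, a, scg)

peaks, _ = find_peaks(filtered, distance=0.4 * fs, prominence=0.5)   # one peak per beat
heart_rate = 60.0 * (len(peaks) - 1) / ((peaks[-1] - peaks[0]) / fs)
print("Estimated heart rate (bpm):", round(heart_rate, 1))
```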

    A new methodology for modelling urban soundscapes: a psychometric revisitation of the current standard and a Bayesian approach for individual response prediction

    Measuring how the urban sound environment is perceived by public space users, usually referred to as the urban soundscape, is a research field of particular interest to a broad and multidisciplinary scientific community as well as to private and public agencies. A tool to quantify soundscapes would provide much support to urban planning and design, and thereby to public healthcare. The soundscape literature still does not show a unique strategy for addressing this topic. Soundscape definition, data collection, and analysis tools have recently been standardised and published in three respective ISO (International Organisation for Standardization) items. In particular, the third item of the ISO series defines the calculation of the soundscape experience of public space users by means of multiple Likert scales. In this thesis, with regard to the third item of the soundscape ISO series, the standard method for soundscape data analysis is questioned and a correction paradigm is proposed. The thesis questions the assumption of a point-wise superimposition match across the Likert scales used during the soundscape assessment task. To do so, it presents a new method which introduces correction values, or a metric, for adjusting the scales in accordance with the common scaling behaviours found across the investigated locations. To validate the results, the outcome of the new metric is used as the target for predicting the individual soundscape experience of the participants. Compared to the current ISO output, the new correction values achieve better predictability in both linear and non-linear modelling, increasing the accuracy of prediction of individual responses up to 52.6% (8.3% higher than the accuracy obtained with the standard method). Finally, the new metric is used to validate the collection of data samples across several locations on individual questionnaire responses. Models are trained, in an iterative way, on all the locations except the one used for validation. This procedure provides a strong validating framework for predicting individual subject assessments from locations totally unseen during model training. The results show that the combination of the new metric with the proposed modelling structure achieves good performance on individual responses across the dataset, with an average accuracy above 54%. A new index for measuring the soundscape is finally introduced, based on the percentage of people agreeing on soundscape pleasantness calculated from the newly proposed metric, achieving an r-squared value of 0.87. The framework introduced is limited by cultural and linguistic factors. Indeed, different corrected metric spaces are expected when data is collected from different countries or urban contexts. The values found in this thesis are therefore expected to be valid in large British cities and possibly in international hub and capital cities. In these scenarios the corrected metric would provide a more realistic and direction-invariant representation of how the urban soundscape is perceived compared to the current ISO tool, showing that some components in the circumplex model are perceived as softer or stronger depending on the dimension. Future research will need to better understand the limitations of this new framework and to extend and compare it across different urban, cultural, and linguistic contexts.
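
    The leave-one-location-out validation scheme described above can be expressed compactly with scikit-learn's grouped cross-validation, as in the sketch below. The data file, the feature and response columns, and the random-forest stand-in for the thesis's Bayesian model are all illustrative assumptions.

```python
# Sketch of leave-one-location-out validation: each fold holds out one survey location
# entirely and trains on all the others. Data columns and the model are placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

df = pd.read_csv("soundscape_responses.csv")        # hypothetical per-participant records
X = df.drop(columns=["pleasant", "location"])       # acoustic + questionnaire features
y = df["pleasant"]                                  # individual response (corrected metric)
groups = df["location"]

scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
print("Accuracy per held-out location:", scores.round(3))
```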