An in-depth performance analysis of the oversampling techniques for high-class imbalanced dataset
Class imbalance occurs when the majority and minority classes in a dataset are not equally represented. The degree of imbalance may range from mild to severe. High class imbalance can distort overall classification accuracy, since the model is most likely to assign most of the data to the majority class. Such a model gives biased results, and its predictive performance on the minority class often has little impact on the overall score. Oversampling is one way to deal with high class imbalance, but only a few oversampling techniques are commonly used to address it. This study presents an in-depth performance analysis of oversampling techniques for the high-class-imbalance problem. Oversampling balances the amount of data in each class so that model evaluation is not biased toward the majority class. We compared the performance of the Random Oversampling (ROS), ADASYN, SMOTE, and Borderline-SMOTE techniques. Each oversampling technique was combined with machine learning methods such as Random Forest, Logistic Regression, and k-Nearest Neighbor (KNN). The test results show that Random Forest with Borderline-SMOTE gives the best result among all oversampling techniques, with an accuracy of 0.9997, precision of 0.9474, recall of 0.8571, F1-score of 0.9000, ROC-AUC of 0.9388, and PR-AUC of 0.8581.
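To make the idea of oversampling concrete, the following is a minimal sketch in plain Python, not the study's implementation. It shows random oversampling (duplicating minority rows) and a simplified SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours. The function names, the toy data, and the parameters `k` and `seed` are assumptions made for illustration; ADASYN and Borderline-SMOTE add further selection rules that are not shown here.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE-style sketch: place each synthetic point on the segment
    between a random minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority rows, sorted by squared Euclidean distance to the base row.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: six majority rows (label 0) and three minority rows (label 1).
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1),
     (2.0, 2.0), (2.2, 2.1), (2.1, 2.4)]
y = [0] * 6 + [1] * 3

X_bal, y_bal = random_oversample(X, y)
minority = [x for x, lab in zip(X, y) if lab == 1]
synthetic = smote_like(minority, n_new=3)
print(Counter(y_bal))
print(synthetic)
```

Random oversampling only repeats existing rows, whereas the interpolation variant creates new points that lie between existing minority rows.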
Predictive modelling of hospital readmissions in diabetic patients clusters
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
Diabetes is a global public health problem with increasing incidence over the past 10 years. This disease's social and economic impacts are widely assessed worldwide, showing a direct and gradual decrease in the individual's ability to work, a gradual loss in the scale of quality of life and a burden on personal finances.
The recurrence of hospitalisation is one of the most significant indexes in measuring the quality of care and the opportunity to optimise resources. Numerous techniques, such as LACE and HOSPITAL, identify patients who will need to be readmitted.
The purpose of this study was to use a dataset on the risk of hospital readmission in patients with diabetes, first to cluster patients into subgroups by similarity and then to build a predictive analysis with the main algorithms in order to identify the best-performing methodology.
Numerous approaches were used to prepare the dataset for these two steps. The first phase yielded two clusterings, one based on the total number of hospital recurrences and the other on total administrative costs, with K=3. In the second phase, the best algorithm found was Neural Network 3, with a ROC of 0.68 and a misclassification rate of 0.37.
When the same algorithm was applied within the clusters, there were no gains in the confidence of the indexes, suggesting that dividing the population into subpopulations brings no substantial gains, since the disease has the same behaviour and needs throughout its development.
Machine learning risk prediction model for acute coronary syndrome and death from use of non-steroidal anti-inflammatory drugs in administrative data
Our aim was to investigate the usefulness of machine learning approaches on linked administrative health data at the population level in predicting older patients’ one-year risk of acute coronary syndrome (ACS) and death following the use of non-steroidal anti-inflammatory drugs (NSAIDs). Patients from a Western Australian cardiovascular population who were supplied with NSAIDs between 1 Jan 2003 and 31 Dec 2004 were identified from Pharmaceutical Benefits Scheme data. Comorbidities from linked hospital admissions data and medication history were inputs. Admissions for acute coronary syndrome or death within one year from the first supply date were outputs. Machine learning classification methods were used to build models to predict ACS and death. Model performance was measured by the area under the receiver operating characteristic curve (AUC-ROC), sensitivity and specificity. There were 68,889 patients in the NSAIDs cohort, with a mean age of 76 years; 54% were female. 1882 patients were admitted for acute coronary syndrome and 5405 patients died within one year after their first supply of NSAIDs. The multi-layer neural network, gradient boosting machine and support vector machine were applied to build various classification models. The gradient boosting machine achieved the best performance, with an average AUC-ROC of 0.72 for predicting ACS and 0.84 for predicting death. Machine learning models applied to linked administrative data can potentially improve adverse outcome risk prediction. Further investigation of additional data and approaches is required to improve the performance of adverse outcome risk prediction.
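The three performance measures named in the abstract above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions. AUC-ROC is computed here through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc_roc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = event within one year, 0 = no event.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.1, 0.8, 0.4, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(sens, spec, auc)
```

Unlike sensitivity and specificity, AUC-ROC does not depend on the chosen threshold, which is one reason it is often reported alongside them when outcomes are rare.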
Aplicação do escore LACE para predição de readmissões hospitalares: Uma revisão / Using the LACE index for predicting hospital readmissions: A review
Unplanned hospital readmission is a common event with a significant financial impact on healthcare organizations and systems. It can be related to numerous causes, such as incomplete treatment, medication errors, and socioeconomic problems, among others. It is therefore important to identify the patients at greatest risk. The aim of this review is to examine how the LACE score has been used to assess readmission risk in different contexts and how its performance varies. The Bireme and PubMed databases were searched, including all articles that cited the use of LACE in hospital readmission and excluding duplicate articles, systematic reviews, and systematic mappings. We conclude that the accuracy of the LACE score varied across the reports included in this review and that, despite its potential as a tool for screening patients at risk, it needs validation in the target population before its adoption in clinical practice.
Analysis for warning factors of type 2 diabetes mellitus complications with Markov blanket based on a Bayesian network model
Background and objective
Type 2 diabetes mellitus (T2DM) complications seriously affect quality of life and cannot be cured completely. Action should be taken for prevention and self-management. Analysis of warning factors is beneficial for patients and has been the focus of some previous studies. These studies generally used professional medical test factors or the complete set of factors for prediction and prevention, which is inconvenient and impractical for patient self-management. With this in mind, this study built a Bayesian network (BN) model, from the perspective of diabetic patients’ self-management and prevention, to predict six complications of T2DM using selected warning factors that patients can obtain from a medical examination. Furthermore, the model was analyzed to explore the relationships between physiological variables and T2DM complications, as well as among the complications themselves. The model aims to help patients with T2DM self-manage and prevent complications.
Methods
The dataset was collected from a well-known data center called the National Health Clinical Center between 1st January 2009 and 31st December 2009. After the data were preprocessed and imputed, a BN model incorporating expert knowledge was built with the Bootstrap and Tabu search algorithms. The Markov Blanket (MB) was used to select the warning factors and predict T2DM complications. Moreover, a Bayesian network without prior information (BN-wopi) model, with both structure and parameters learned using 10-fold cross-validation, was added for a fair comparison with the other classifiers, which were also learned using 10-fold cross-validation. The warning factors were selected according to the structure learned in each fold and were used for prediction. Finally, the performance of the two BN models using warning factors was compared with that of a Naïve Bayes model, a Random Forest model, and a C5.0 Decision Tree model, which used all features for prediction. In addition, the validation metrics of the proposed model were compared with those of existing studies that used other variables from clinical or biomedical data to predict T2DM complications.
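As an illustration of the Markov blanket used for factor selection above: in a Bayesian network, the Markov blanket of a node consists of its parents, its children, and the other parents of its children. The sketch below computes it for a small directed acyclic graph. The graph structure and variable names are hypothetical and are not taken from the study's learned network.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every node to the set of its parent nodes.
    """
    children = {child for child, ps in parents.items() if node in ps}
    # Co-parents ("spouses"): other parents of the node's children.
    spouses = {p for child in children for p in parents[child]} - {node}
    return set(parents[node]) | children | spouses


# Hypothetical network, for illustration only.
parents = {
    "age": set(),
    "bmi": set(),
    "hypertension": {"age", "bmi"},
    "hba1c": {"bmi"},
    "nephropathy": {"hypertension", "hba1c"},
    "diabetic_foot": {"hba1c"},
}

mb_hba1c = markov_blanket(parents, "hba1c")
mb_nephropathy = markov_blanket(parents, "nephropathy")
print(sorted(mb_hba1c))
print(sorted(mb_nephropathy))
```

Given the variables in its Markov blanket, a node is conditionally independent of every other variable in the network, which is why the blanket of a complication node is a natural candidate set of predictive factors.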
Results
Experimental results indicated that the BN models using warning factors performed statistically better than their counterparts using all other variables in predicting T2DM complications. In addition, compared with other experiments, the proposed BN model was effective and significant in predicting diabetic nephropathy (DN) (AUC: 0.831), diabetic foot (DF) (AUC: 0.905), diabetic macrovascular complications (DMV) (AUC: 0.753) and diabetic ketoacidosis (DK) (AUC: 0.877) with the selected warning factors.
Conclusions
The warning factors of DN, DF, DMV, and DK selected by MB in this research might help predict certain T2DM complications effectively, and the proposed BN model might be used as a general tool for prevention, monitoring, and self-management.
An improved support vector machine-based diabetic readmission prediction
Cui S, Wang D, Wang Y, Yu P-W, Jin Y. An improved support vector machine-based diabetic readmission prediction. Computer Methods and Programs in Biomedicine. 2018;166:123-135.
Background and objective
In healthcare systems, the cost of unplanned readmission accounts for a large proportion of total hospital payment. The hospital-specific readmission rate has become a critical issue around the world. Quantification and early identification of unplanned readmission risks will improve the quality of care during hospitalization and reduce the occurrence of readmission. In clinical practice, medical workers generally use the LACE score method to evaluate patient readmission risks, but this method usually performs poorly. With this in mind, this study presents a novel method combining a support vector machine and a genetic algorithm to build the risk prediction model, which simultaneously involves feature selection and the processing of imbalanced data. This model aims to provide decision support for clinicians during the discharge management of patients with diabetes.
Method
The experiments were conducted on a set of 8756 medical records with 50 different features related to diabetic readmission. After preprocessing the data, an effective SMOTE-based method was proposed to solve the imbalanced-data problem. Further, in order to improve prediction performance, a hybrid feature selection mechanism was devised to select the important features. Subsequently, an improved support vector machine-based (SVM-based) method was developed, and the genetic algorithm was used to tune the sensitive parameter of the algorithm. Finally, five-fold cross-validation was applied to compare the performance of the proposed method with that of other methods (LACE score, logistic regression, naïve Bayes, decision tree and feed-forward neural networks).
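The five-fold cross-validation step above can be sketched as an index generator: the records are shuffled once, split into five folds, and each fold serves once as the test set while the remaining four form the training set. This is a minimal sketch rather than the authors' procedure; the function name and the `seed` parameter are assumptions, and only the record count of 8756 comes from the abstract.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_indices(8756, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Every record appears in exactly one test fold, so each method being compared is evaluated once on every record.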
Results
Experimental results indicate that the proposed SVM-based method achieves an accuracy of 81.02%, a sensitivity of 82.89%, a specificity of 79.23%, and outperforms other popular algorithms in identifying diabetic patients who may be readmitted.
Conclusions
Our research can improve the performance of clinical decision support systems for diabetic readmission, by which the possibility of readmission as well as the waste of medical resources can be reduced.
Koneoppiminen päätöksenteon tukijana diabetes mellituksen hoidossa / Machine learning as a decision-making aid in the treatment of diabetes mellitus
Machine learning is a subfield of artificial intelligence that can be used widely in healthcare for a variety of purposes. In diabetes care, the adoption of machine learning technologies can mean a considerable improvement in quality and more cost-effective care. The aim of the study is to produce usable and guiding information on the possible applications of machine learning, in order to create a design model for a working clinical non-knowledge-based decision support system for healthcare organizations and to promote the clinical decision-making of healthcare professionals.
The theoretical framework of the study consists of machine learning in healthcare and clinical decision-making. The study covers the applicability of machine learning to diabetes care, the application of machine learning to predicting diabetes treatment outcomes, and machine learning as a diagnostic tool for diabetes. The research method is a qualitative, integrative literature review. The material was collected from several databases and consists mainly of scientific review, research, and conference articles. The material was analyzed with a qualitative analysis aimed at understanding, carried out with an inductive approach as data-driven content analysis.
The result obtained from the synthesis of the integrative literature review answers the research questions posed and defines the requirements of a working non-knowledge-based clinical decision support system (CDSS) for the actual design and technical implementation of the system. The results show that, among machine learning techniques, deep learning, unsupervised learning, supervised learning, federated machine learning (yhteen liittynyt koneoppiminen), and the extreme learning machine are the machine learning algorithms that should be integrated into a non-knowledge-based clinical decision support system to support the actual clinical decision-making process in diabetes care. The results of the study are described in more detail in the discussion chapter, and an effort has also been made to bring out the limitations.