7 research outputs found

    An in-depth performance analysis of the oversampling techniques for high-class imbalanced dataset

    Class imbalance occurs when the majority and minority classes are not equally represented. The degree of imbalance may vary from mild to severe. High class imbalance can distort overall classification accuracy, since the model is likely to assign most of the data to the majority class. Such a model gives biased results, and its predictions for the minority class often have little influence on the reported performance. Oversampling is one way to deal with high class imbalance, but only a few oversampling techniques are commonly used to address it. This study presents an in-depth performance analysis of oversampling techniques for the high-class-imbalance problem. Oversampling balances the data in each class so that modeling yields unbiased evaluation results. We compared the performance of Random Oversampling (ROS), ADASYN, SMOTE, and Borderline-SMOTE. Each oversampling technique was combined with machine learning methods such as Random Forest, Logistic Regression, and k-Nearest Neighbor (KNN). The test results show that Random Forest with Borderline-SMOTE performs best among the oversampling techniques, with an accuracy of 0.9997, precision of 0.9474, recall of 0.8571, F1-score of 0.9000, ROC-AUC of 0.9388, and PR-AUC of 0.8581.
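
As a rough illustration of the simplest technique named in this abstract, Random Oversampling (ROS), the sketch below duplicates randomly chosen minority-class rows until every class is as large as the largest one. It is not the study's code: the function name `random_oversample`, the toy data, and the fixed seed are all assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all
    classes have as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Sample with replacement from the existing rows of this class.
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


# Toy data (made up for illustration): 8 majority rows, 2 minority rows.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In practice, oversampling is usually applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.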

    Predictive modelling of hospital readmissions in diabetic patients clusters

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. Diabetes is a global public health problem with increasing incidence over the past 10 years. The disease's social and economic impacts are widely assessed worldwide, showing a direct and gradual decrease in the individual's ability to work, a gradual loss in quality of life, and a burden on personal finances. Recurrence of hospitalisation is one of the most significant indexes for measuring the quality of care and the opportunity to optimise resources. Several scoring tools, such as LACE and HOSPITAL, aim to identify patients who will need to be readmitted. The purpose of this study was to take a dataset on the risk of hospital readmission in patients with diabetes, first cluster the patients into subgroups by similarity, and then run a predictive analysis with the main algorithms to identify the best-performing methodology. Numerous approaches were used to prepare the dataset for these two interventions. The first phase yielded two clusterings, one based on the total number of hospital recurrences and the other on total administrative costs, with K=3. In the second phase, the best algorithm found was Neural Network 3, with a ROC of 0.68 and a misclassification rate of 0.37. When the same algorithm was applied within the clusters, there were no gains in the confidence of the indexes, suggesting that dividing the population into subpopulations brings no substantial gains, since the disease has the same behaviour and needs throughout its development.

    Machine learning risk prediction model for acute coronary syndrome and death from use of non-steroidal anti-inflammatory drugs in administrative data

    Our aim was to investigate the usefulness of machine learning approaches on linked administrative health data at the population level in predicting older patients’ one-year risk of acute coronary syndrome (ACS) and death following the use of non-steroidal anti-inflammatory drugs (NSAIDs). Patients from a Western Australian cardiovascular population who were supplied with NSAIDs between 1 Jan 2003 and 31 Dec 2004 were identified from Pharmaceutical Benefits Scheme data. Comorbidities from linked hospital admissions data and medication history were inputs. Admissions for acute coronary syndrome or death within one year from the first supply date were outputs. Machine learning classification methods were used to build models to predict ACS and death. Model performance was measured by the area under the receiver operating characteristic curve (AUC-ROC), sensitivity and specificity. There were 68,889 patients in the NSAIDs cohort, with a mean age of 76 years; 54% were female. 1882 patients were admitted for acute coronary syndrome and 5405 patients died within one year after their first supply of NSAIDs. The multi-layer neural network, gradient boosting machine and support vector machine were applied to build various classification models. The gradient boosting machine achieved the best performance, with an average AUC-ROC of 0.72 predicting ACS and 0.84 predicting death. Machine learning models applied to linked administrative data can potentially improve adverse outcome risk prediction. Further investigation of additional data and approaches is required to improve the performance of adverse outcome risk prediction.
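
The three metrics named in this abstract (AUC-ROC, sensitivity, specificity) can be computed from first principles. The sketch below is only an illustration on made-up labels and scores, not the study's data or code. The helper names and the 0.5 decision threshold are assumptions. AUC-ROC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random positive outscores a
    random negative (ties count 0.5). Quadratic time, fine for a demo."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 3 positives, 5 negatives, with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(round(sens, 3), round(spec, 3), round(auc, 3))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC-ROC summarises ranking quality across all thresholds, which is one reason studies such as this one report both.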

    Aplicação do escore LACE para predição de readmissões hospitalares: Uma revisão / Using the LACE index for predicting hospital readmissions: A review

    Unplanned hospital readmission is a common event with a significant financial impact on healthcare organizations and systems. It may be related to numerous causes, such as incomplete treatment, medication errors, and socioeconomic problems, among others. It is therefore important to identify the patients at highest risk. The aim of this review is to examine how the LACE score has been used to assess readmission risk in different contexts and how its performance varies. The Bireme and PubMed databases were searched, including all articles that cited the use of LACE in hospital readmission and excluding duplicate articles, systematic reviews, and systematic mappings. We conclude that the LACE score showed varying accuracy across the reports included in this review and that, despite its potential as a tool for screening at-risk patients, it requires validation in the target population before its adoption in clinical practice.
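
For readers unfamiliar with the index, the sketch below shows how a LACE score is typically tallied. The point scheme is not given in this abstract. The code assumes the commonly published version of the index (Length of stay, Acuity of admission, Charlson Comorbidity index, Emergency department visits in the previous six months, total range 0 to 19), and the function and parameter names are hypothetical.

```python
def lace_score(length_of_stay_days, acute_admission, charlson_index, ed_visits_6m):
    """Tally a LACE score (0-19) under the assumed, commonly published
    point scheme. Inputs are whole days, a boolean, and integer counts."""
    # L: length of stay in days
    if length_of_stay_days < 1:
        l_points = 0
    elif length_of_stay_days <= 3:
        l_points = int(length_of_stay_days)
    elif length_of_stay_days <= 6:
        l_points = 4
    elif length_of_stay_days <= 13:
        l_points = 5
    else:
        l_points = 7
    # A: acute (emergent) admission
    a_points = 3 if acute_admission else 0
    # C: Charlson comorbidity index, where 4 or more scores 5 points
    c_points = charlson_index if charlson_index <= 3 else 5
    # E: emergency department visits in the previous 6 months, capped at 4
    e_points = min(ed_visits_6m, 4)
    return l_points + a_points + c_points + e_points


# Hypothetical patient: 5-day stay, emergency admission, Charlson index 2, one ED visit.
print(lace_score(5, True, 2, 1))
```

Because the review finds that LACE accuracy varies between settings, any cut-off applied to this score would need to be validated locally rather than taken from the sketch.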

    An improved support vector machine-based diabetic readmission prediction

    Cui S, Wang D, Wang Y, Yu P-W, Jin Y. An improved support vector machine-based diabetic readmission prediction. Computer Methods and Programs in Biomedicine. 2018;166:123-135. Background and objective: In healthcare systems, the cost of unplanned readmission accounts for a large proportion of total hospital payment. Hospital-specific readmission rates have become a critical issue around the world. Quantification and early identification of unplanned readmission risks will improve the quality of care during hospitalization and reduce the occurrence of readmission. In clinical practice, medical workers generally use the LACE score method to evaluate patient readmission risks, but this method usually performs poorly. With this in mind, this study presents a novel method combining a support vector machine and a genetic algorithm to build the risk prediction model, which simultaneously involves feature selection and the processing of imbalanced data. This model aims to provide decision support for clinicians during the discharge management of patients with diabetes. Method: The experiments were conducted on a set of 8756 medical records with 50 different features about diabetic readmission. After preprocessing the data, an effective SMOTE-based method was proposed to solve the imbalanced data problem. Further, in order to improve prediction performance, a hybrid feature selection mechanism was devised to select the important features. Subsequently, an improved support vector machine-based (SVM-based) method was developed, and the genetic algorithm was used to tune the sensitive parameter of the algorithm. Finally, five-fold cross-validation was applied to compare the performance of the proposed method with other methods (LACE score, logistic regression, naïve Bayes, decision tree and feed-forward neural networks). Results: Experimental results indicate that the proposed SVM-based method achieves an accuracy of 81.02%, a sensitivity of 82.89%, and a specificity of 79.23%, and outperforms other popular algorithms in identifying diabetic patients who may be readmitted. Conclusions: Our research can improve the performance of clinical decision support systems for diabetic readmission, by which the readmission possibility as well as the waste of medical resources can be reduced.
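
The five-fold cross-validation mentioned in the Method section can be sketched as a plain index splitter. This is a generic illustration, not the authors' procedure: the function name, the shuffling seed, and the absence of stratification are assumptions. Only the record count of 8756 comes from the abstract.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.
    Indices are shuffled once, then dealt round-robin into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Training set is every fold except the held-out one.
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


# 8756 is the number of medical records reported in the abstract.
splits = list(k_fold_indices(8756, k=5))
for train, test in splits:
    print(len(train), len(test))
```

When a resampling step such as SMOTE is combined with cross-validation, it is generally applied inside each loop iteration to the training indices only, so that synthetic samples never appear in the held-out fold.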

    Koneoppiminen päätöksenteon tukijana diabetes mellituksen hoidossa / Machine learning as decision support in the treatment of diabetes mellitus

    Machine learning is a branch of artificial intelligence that can be used widely in healthcare for a variety of purposes. In diabetes care, adopting machine learning technologies can bring a considerable improvement in quality and more cost-effective treatment. The aim of the study is to produce usable, guiding knowledge about the possible applications of machine learning, in order to create a design model for a working non-knowledge-based clinical decision support system for healthcare organizations that supports the clinical decision-making of healthcare professionals. The theoretical framework of the study consists of machine learning in healthcare and clinical decision-making. The study covers the applicability of machine learning to diabetes care, the application of machine learning to predicting diabetes treatment outcomes, and machine learning as a diagnostic tool for diabetes. The research method is a qualitative, integrative literature review. The material was collected from several databases and consists mainly of scientific review, research, and conference articles. The material was analysed with a qualitative analysis aimed at understanding, carried out inductively as data-driven content analysis. The result obtained from the synthesis of the integrative literature review answers the research questions posed and defines the requirements of a working non-knowledge-based clinical decision support system (CDSS) for the actual design and technical implementation of the system. The results show that, among machine learning techniques, deep learning, unsupervised learning, supervised learning, federated learning, and the extreme learning machine are the ones that should be integrated into a non-knowledge-based clinical decision support system to support the actual clinical decision-making process in diabetes care. The results are described in more detail in the discussion section, and the limitations of the study are also presented.