
    PREDICTION OF SURVIVAL OF HEART FAILURE PATIENTS USING RANDOM FOREST

    Because the heart is central to human survival, it must be protected and its deterioration monitored. Heart failure is the final stage of all heart disease. Medical records capture symptoms, physical characteristics, and clinical laboratory test values that can support biostatistical analyses and reveal patterns and correlations that physicians may not detect, so technological assistance is needed to predict the survival of heart failure patients. This study applies data mining to available historical data, namely the Heart Failure Clinical Records dataset of 299 instances with 13 features, using the Random Forest, Decision Tree, KNN, Support Vector Machine, Artificial Neural Network, and Naïve Bayes algorithms combined with resample and SMOTE sampling techniques. Random forest achieved the highest accuracy with both techniques, outperforming the other algorithms: 94.31% with resampling and 85.82% with SMOTE.
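
    The resample-plus-classifier setup above is straightforward to reproduce. Below is a minimal sketch pairing imbalanced-learn's SMOTE with a scikit-learn random forest; the file path and the DEATH_EVENT label are assumptions based on the public UCI version of the Heart Failure Clinical Records dataset, not the authors' exact pipeline.

        # Sketch: SMOTE oversampling before a random-forest fit (assumed columns).
        import pandas as pd
        from imblearn.over_sampling import SMOTE
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import accuracy_score

        df = pd.read_csv("heart_failure_clinical_records_dataset.csv")  # assumed path
        X, y = df.drop(columns=["DEATH_EVENT"]), df["DEATH_EVENT"]
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=42)

        # Balance only the training split so the test set stays untouched.
        X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
        clf = RandomForestClassifier(n_estimators=100, random_state=42)
        clf.fit(X_res, y_res)
        print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))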

    Developing Prediction Models for Kidney Stone Disease

    Kidney stone disease has become more prevalent through the years, leading to high treatment cost and associated health risks. In this study, we explore a large medical database and machine learning methods to extract features and construct models for diagnosing kidney stone disease. Data of 46,250 patients and 58,976 hospital admissions were extracted and analyzed, including patients’ demographic information, diagnoses, vital signs, and laboratory measurements of the blood and urine. We compared the kidney stone (KDS) patients to patients with abdominal and back pain (ABP), patients diagnosed with nephritis, nephrosis, renal sclerosis, chronic kidney disease, or acute and unspecified renal failure (NCA), patients diagnosed with urinary tract infections and other diseases of the kidneys and ureters (OKU), and patients with other conditions (OTH). We built logistic regression models and random forest models to determine the best prediction outcome. For the KDS vs. ABP group, a logistic regression model using five variables from the patients’ first lab results, namely age, mean respiratory rate, blood chloride, blood creatinine, and blood CO2 levels, gave the best prediction accuracy of 0.699. This model maximized sensitivity with a value of 0.726. For KDS vs. NCA we found that a logistic regression using the Elixhauser score and blood urea nitrogen (BUN) values from the first lab results for patients with first admittance produced the best outcome, with an accuracy of 0.883 and maximized specificity of 0.898. For KDS vs. OKU a logistic regression using the estimated glomerular filtration rate (EGFR) calculated from the average lab values gave the best outcome, with an accuracy of 0.852 and maximized specificity of 0.922. Finally, a logistic regression using age, EGFR, BUN, blood creatinine, and blood CO2 gave the best outcome for KDS vs. OTH, with an accuracy of 0.894 and maximized specificity of 0.903. This research gives the medical field models that could potentially be used on kidney stone patients, and provides a stepping stone for researchers who want to build kidney stone models for a different population of patients.
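
    For illustration, a minimal sketch of the best-performing KDS-vs-ABP model might look as follows. Only the choice of the five predictors comes from the study; the column names, file path, and evaluation scheme are hypothetical.

        # Sketch of the KDS-vs-ABP logistic model; names are hypothetical.
        import pandas as pd
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        FEATURES = ["age", "mean_resp_rate", "blood_chloride",
                    "blood_creatinine", "blood_co2"]   # hypothetical column names

        df = pd.read_csv("kds_vs_abp_first_labs.csv")  # hypothetical extract
        X, y = df[FEATURES], df["is_kds"]              # 1 = kidney stone, 0 = abdominal/back pain

        model = LogisticRegression(max_iter=1000)
        print("mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean())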

    Feature engineering in biomedical data processing- a case study

    Today, the most important input to artificial intelligence studies in healthcare is medical data. Collecting medical data from domain experts and physicians and training machine learning algorithms on it is laborious work, and processing these data with the right algorithms and parameters determines the success of a study. For these reasons, this work presents a pilot study of feature engineering on a biomedical dataset, intended to guide academics who want to process health data. To that end, a sample heart-failure dataset from an international database was used. In line with the aim of this thesis, experimental studies were carried out by building different models for artificial intelligence methods and parameter optimization. Using predictive learning models on the dataset, the study reports which artificial intelligence algorithms reach the most accurate results with which parameter sets. Based on the results, recommendations are made to academics who want to build decision-support systems, by comparing the positive and negative performance changes that feature engineering produces on the dataset. This study is believed to form a basis for future work and may also serve as a model for health data in other fields.
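
    As a concrete illustration of the parameter-optimization step described above, the sketch below runs a small grid search over two model families. The grids, the models, and the synthetic stand-in data are assumptions for illustration, not the thesis's actual configuration.

        # Sketch: grid search over two model families on stand-in data.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import GridSearchCV
        from sklearn.svm import SVC

        # Synthetic stand-in for a prepared biomedical feature matrix.
        X, y = make_classification(n_samples=299, n_features=12, random_state=0)

        searches = {
            "random_forest": GridSearchCV(
                RandomForestClassifier(random_state=0),
                {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}, cv=5),
            "svm": GridSearchCV(
                SVC(), {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}, cv=5),
        }
        for name, gs in searches.items():
            gs.fit(X, y)
            print(name, gs.best_params_, round(gs.best_score_, 3))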

    Interpretable Survival Analysis for Heart Failure Risk Prediction

    Survival analysis, or time-to-event analysis, is an important and widespread problem in healthcare research. Medical research has traditionally relied on Cox models for survival analysis, due to their simplicity and interpretability. Cox models assume a log-linear hazard function as well as proportional hazards over time, and can perform poorly when these assumptions fail. Newer survival models based on machine learning avoid these assumptions and offer improved accuracy, yet sometimes at the expense of model interpretability, which is vital for clinical use. We propose a novel survival analysis pipeline that is both interpretable and competitive with state-of-the-art survival models. Specifically, we use an improved version of survival stacking to transform a survival analysis problem to a classification problem, ControlBurn to perform feature selection, and Explainable Boosting Machines to generate interpretable predictions. To evaluate our pipeline, we predict risk of heart failure using a large-scale EHR database. Our pipeline achieves state-of-the-art performance and provides interesting and novel insights about risk factors for heart failure.
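
    The survival-stacking step that underpins the pipeline can be illustrated with a small sketch: each subject contributes one classification row per discrete time bin they are at risk in, labelled 1 only in the bin where their event occurs, and the stacked data are then fed to interpret's Explainable Boosting Machine. The binning scheme and toy data are assumptions; the paper's improved stacking variant and its ControlBurn feature-selection step are not reproduced here.

        # Sketch: survival stacking, then an EBM fit on the stacked rows.
        import numpy as np
        from interpret.glassbox import ExplainableBoostingClassifier

        def survival_stack(X, time, event, n_bins=5):
            """One row per (subject, time-bin-at-risk); label 1 in the event bin."""
            edges = np.quantile(time, np.linspace(0, 1, n_bins + 1))[1:-1]
            bin_idx = np.searchsorted(edges, time)  # bin holding each subject's time
            rows, labels = [], []
            for i in range(len(X)):
                for j in range(bin_idx[i] + 1):     # subject is at risk through bin j
                    rows.append(np.append(X[i], j)) # the bin index becomes a feature
                    labels.append(int(j == bin_idx[i] and event[i] == 1))
            return np.array(rows), np.array(labels)

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 4))               # toy covariates
        time = rng.exponential(scale=10, size=200)  # toy follow-up times
        event = rng.integers(0, 2, size=200)        # toy event indicators

        X_stacked, y_stacked = survival_stack(X, time, event)
        ebm = ExplainableBoostingClassifier().fit(X_stacked, y_stacked)
        # ebm.explain_global() then exposes per-feature risk shape functions.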

    A New Efficiency Improvement of Ensemble Learning for Heart Failure Classification by Least Error Boosting

    Heart failure is a very common disease and often a silent threat; it is also costly to detect and treat, and its incidence continues to rise. Although researchers have developed classification algorithms and applied various ensemble learning methods to cardiovascular disease data, classification efficiency has not been high enough, owing to the cumulative error that any weak learner can contribute and to the accuracy of the vote-predicted class label. The objective of this research is the development of a new algorithm that improves the efficiency of classifying patients with heart failure. This paper proposes Least Error Boosting (LEBoosting), a new algorithm that improves on AdaBoost.M1 for higher classification accuracy. The learning algorithm finds the lowest error among various weak learners, using the lowest possible errors to update the distribution and create the best final hypothesis. Our trial uses the heart failure clinical records dataset, which contains 13 features of cardiac patients. Performance is measured through precision, recall, F-measure, accuracy, and the ROC curve. The experiment found that the proposed method performed well compared to naïve Bayes, k-NN, and decision tree, and outperformed other ensembles including bagging, LogitBoost, LPBoost, and AdaBoost.M1, with an accuracy of 98.89%; it also classified patients who died accurately, whereas decision tree and bagging were completely unable to distinguish them. The findings show that LEBoosting maximizes error reduction in the weak learners' training process, maximizing the effectiveness of cardiology classifiers and providing theoretical guidance for developing models to analyze and predict heart disease. The novelty of this research is improving original ensemble learning by finding the weak learner with the lowest error in order to propagate the best distribution to the final hypothesis, which gives LEBoosting the highest classification efficiency. DOI: 10.28991/ESJ-2023-07-01-010
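
    The abstract's central mechanism, picking the weak learner with the lowest weighted error at each boosting round before the AdaBoost.M1-style distribution update, can be sketched as follows. The candidate set, the {-1, +1} label encoding, and all other implementation details are assumptions for illustration, not the authors' published code.

        # Sketch: boosting loop that keeps the lowest-error candidate per round.
        import numpy as np
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier

        def le_boost(X, y, n_rounds=20):            # y in {-1, +1}
            n = len(X)
            w = np.full(n, 1.0 / n)                 # boosting distribution
            learners, alphas = [], []
            for _ in range(n_rounds):
                candidates = [DecisionTreeClassifier(max_depth=1), GaussianNB()]
                fitted = [c.fit(X, y, sample_weight=w * n) for c in candidates]
                errs = [np.sum(w * (f.predict(X) != y)) for f in fitted]
                best = int(np.argmin(errs))         # least-error weak learner
                err = max(errs[best], 1e-10)        # guard against log(0)
                if err >= 0.5:                      # no candidate beats chance
                    break
                alpha = 0.5 * np.log((1 - err) / err)
                w *= np.exp(-alpha * y * fitted[best].predict(X))  # upweight mistakes
                w /= w.sum()
                learners.append(fitted[best])
                alphas.append(alpha)
            return learners, alphas

        def le_predict(learners, alphas, X):
            return np.sign(sum(a * l.predict(X) for l, a in zip(learners, alphas)))

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 5))               # toy data
        y = np.where(X[:, 0] + rng.normal(0, 0.5, 200) > 0, 1, -1)
        learners, alphas = le_boost(X, y)
        print("train acc:", (le_predict(learners, alphas, X) == y).mean())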

    Reinforcing synthetic data for meticulous survival prediction of patients suffering from left ventricular systolic dysfunction

    Congestive heart failure is among the leading causes of concern requiring immediate medical attention. Among the various cardiac disorders, left ventricular systolic dysfunction is a well-known cardiovascular disease that causes sudden congestive heart failure. Irregular functioning of the heart can be diagnosed through clinical attributes such as ejection fraction and serum creatinine. However, because only limited data are available on the death events of patients suffering from left ventricular systolic dysfunction, critical thresholds of the clinical attributes cannot be estimated with high precision. Hence, this paper proposes a novel pseudo reinforcement learning algorithm that overcomes majority-class skewness in a limited dataset by appending a synthetic dataset across the minority data space. The proposed pseudo agent continuously senses the state of the dataset (the pseudo environment) and takes an appropriate action to populate the dataset, resulting in a higher reward. In addition, the paper investigates the role of statistically significant clinical attributes, such as age, ejection fraction, and serum creatinine, which tend to efficiently predict the association of death events among patients suffering from left ventricular systolic dysfunction.
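
    Reading the abstract, one plausible rendering of the agent loop is: sense the class skew (state), interpolate new minority samples (action), and keep an action only when a validation-based reward improves. The sketch below is a loose reconstruction under those assumptions, not the paper's algorithm; the interpolation action and the reward choice are both illustrative.

        # Loose sketch: reward-guided synthetic augmentation of the minority class.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        def reward(X, y):
            """Reward signal: cross-validated balanced accuracy of a probe model."""
            model = LogisticRegression(max_iter=1000)
            return cross_val_score(model, X, y,
                                   scoring="balanced_accuracy", cv=3).mean()

        def augment(X, y, minority=1, batch=10, max_steps=20, seed=0):
            rng = np.random.default_rng(seed)
            best = reward(X, y)
            for _ in range(max_steps):
                if (y == minority).mean() >= 0.5:       # state: skew resolved
                    break
                Xm = X[y == minority]
                i, j = rng.integers(0, len(Xm), size=(2, batch))
                lam = rng.random((batch, 1))
                X_new = Xm[i] + lam * (Xm[j] - Xm[i])   # action: interpolate pairs
                X2 = np.vstack([X, X_new])
                y2 = np.concatenate([y, np.full(batch, minority)])
                r = reward(X2, y2)
                if r >= best:                           # keep only rewarded actions
                    X, y, best = X2, y2, r
            return X, y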

    Circulation

    Introduction: Heart failure with preserved ejection fraction (HFpEF) is a heterogeneous clinical syndrome in need of improved phenotypic classification. We sought to evaluate whether unbiased clustering analysis using dense phenotypic data ("phenomapping") could identify phenotypically distinct HFpEF categories.
    Methods and Results: We prospectively studied 397 HFpEF patients and performed detailed clinical, laboratory, electrocardiographic, and echocardiographic phenotyping of the study participants. We used several statistical learning algorithms, including unbiased hierarchical cluster analysis of phenotypic data (67 continuous variables) and penalized model-based clustering, to define and characterize mutually exclusive groups comprising a novel classification of HFpEF. All phenomapping analyses were performed blinded to clinical outcomes, and Cox regression was used to demonstrate the clinical validity of phenomapping. The mean age was 65±12 years, 62% were female, 39% were African-American, and comorbidities were common. Although all patients met published criteria for the diagnosis of HFpEF, phenomapping analysis classified study participants into 3 distinct groups that differed markedly in clinical characteristics, cardiac structure/function, invasive hemodynamics, and outcomes (e.g., pheno-group #3 had an increased risk of HF hospitalization [hazard ratio 4.2, 95% CI 2.0–9.1] even after adjustment for traditional risk factors [P<0.001]). The HFpEF pheno-group classification, including its ability to stratify risk, was successfully replicated in a prospective validation cohort (n=107).
    Conclusions: Phenomapping results in a novel classification of HFpEF. Statistical learning algorithms, applied to dense phenotypic data, may allow for improved classification of heterogeneous clinical syndromes, with the ultimate goal of defining therapeutically homogeneous patient subclasses.
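
    The unsupervised core of the method, hierarchical clustering of standardized continuous phenotype variables followed by cutting the tree into candidate pheno-groups, can be sketched briefly. The toy matrix and the Ward linkage choice are illustrative assumptions, not the study's exact configuration.

        # Sketch: hierarchical clustering of dense phenotype data into groups.
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        pheno = rng.normal(size=(397, 67))      # stand-in for 67 continuous variables

        Z = linkage(StandardScaler().fit_transform(pheno), method="ward")
        groups = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 pheno-groups
        print(np.bincount(groups)[1:])          # group sizes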

    The ClusterMix K-Prototypes Algorithm for Capturing Patient Characteristics Based on Mortality-Predictor Variables in Heart Failure Patients

    Cardiovascular disease (CVD) is one of the major causes of death worldwide and frequently culminates in heart failure. The World Health Organization (WHO) reports that an average of 17.9 million people die from cardiovascular disease involving heart failure each year, about 31 percent of all deaths globally. To detect mortality factors in heart failure patients, segmentation is needed to reduce the chance of death from heart failure; one approach is to use the variables that characterize heart-failure mortality by applying the k-prototypes algorithm. The clustering produced 2 clusters, considered optimal based on the highest silhouette coefficient of 0.5777. Segmenting patients by the mortality-predictor variables showed that cluster 1 is a group of patients at low risk of mortality from heart failure, while cluster 2 is a group of patients at high risk. The segmentation is based on the mean value of each mortality-predictor variable in each cluster, compared against normal conditions for serum creatinine, ejection fraction, age, serum sodium, blood pressure, anaemia, creatinine phosphokinase, platelets, smoking, sex, and diabetes.
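
    A two-cluster k-prototypes run on mixed numeric and nominal heart-failure variables can be sketched with the kmodes package. The column names follow the public UCI version of the dataset and the variables listed above; the file path and the parameter choices are assumptions.

        # Sketch: k-prototypes clustering of mixed numeric/nominal variables.
        import pandas as pd
        from kmodes.kprototypes import KPrototypes

        df = pd.read_csv("heart_failure_clinical_records_dataset.csv")  # assumed path
        numeric = ["age", "creatinine_phosphokinase", "ejection_fraction",
                   "platelets", "serum_creatinine", "serum_sodium"]
        nominal = ["anaemia", "diabetes", "high_blood_pressure", "sex", "smoking"]
        X = df[numeric + nominal].to_numpy()

        kp = KPrototypes(n_clusters=2, init="Cao", random_state=0)
        labels = kp.fit_predict(
            X, categorical=list(range(len(numeric), len(numeric) + len(nominal))))

        # Compare per-cluster means of each variable against clinical norms.
        df["cluster"] = labels
        print(df.groupby("cluster")[numeric].mean())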

    INTEGRATION OF SVM AND SMOTE-NC FOR CLASSIFICATION OF HEART FAILURE PATIENTS

    SMOTE (Synthetic Minority Over-sampling Technique) and SMOTE-NC (SMOTE for Nominal and Continuous features) are variations of the original SMOTE algorithm designed to handle imbalanced datasets; they differ primarily in their ability to generate synthetic examples for the minority class when the data contain both continuous and nominal features. We employed a dataset comprising continuous and nominal features from heart failure patients. The distribution of patient status, deceased or alive, was imbalanced, so we balanced the data with SMOTE-NC before conducting the classification analysis with SVM. The combination of SVM and SMOTE-NC gave better results than SVM alone, as seen from the higher accuracy and F1 score. The F1 score is less sensitive to class imbalance than accuracy; when there is a significant imbalance in the number of instances between classes, the F1 score can be a more informative metric for evaluating a classifier's performance, especially when the minority class is of interest.
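
    A minimal sketch of this combination, using imbalanced-learn's SMOTENC and scikit-learn's SVC, is shown below. The file path and the choice of nominal columns are assumptions based on the public heart-failure dataset, not necessarily the authors' exact setup.

        # Sketch: SMOTE-NC balancing of the training split, then an SVM fit.
        import pandas as pd
        from imblearn.over_sampling import SMOTENC
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import accuracy_score, f1_score

        df = pd.read_csv("heart_failure_clinical_records_dataset.csv")  # assumed path
        X, y = df.drop(columns=["DEATH_EVENT"]), df["DEATH_EVENT"]
        nominal = [X.columns.get_loc(c) for c in
                   ["anaemia", "diabetes", "high_blood_pressure", "sex", "smoking"]]

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
        X_res, y_res = SMOTENC(categorical_features=nominal,
                               random_state=42).fit_resample(X_tr, y_tr)

        clf = SVC().fit(X_res, y_res)
        pred = clf.predict(X_te)
        print("accuracy:", accuracy_score(y_te, pred), "F1:", f1_score(y_te, pred))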