    Rule-based Disease Classification using Text Mining on Symptoms Extraction from Electronic Medical Records in Indonesian

    Recently, electronic medical records (EMRs) have become a source of many insights for clinicians and hospital management. An EMR stores a great deal of important information and new knowledge relevant to the competitive advantage of hospitals and clinicians. It is valuable not only for mining the data patterns it contains on patient symptoms, medication, and treatment, but also as a repository of new strategies and future trends in the medical world. However, EMRs remain a challenge for many clinicians because of their unstructured form. Information extraction helps to find valuable information in unstructured data. This study focuses on information about disease symptoms recorded as free text. Only the diseases with the highest prevalence rates in Indonesia, namely tuberculosis, malignant neoplasm, diabetes mellitus, hypertension, and renal failure, are analyzed. Pre-processing techniques such as data cleansing and correction play a significant role in obtaining the features. Since the data are imbalanced, the SMOTE technique is applied to compensate. Symptoms are extracted from the EMR data with a rule-based algorithm, and two algorithms, SVM and Random Forest, are then used to classify the disease from the extracted features. The results show that rule-based symptom extraction works well in extracting valuable information from unstructured EMRs, and the classifiers reach accuracies of 78% (SVM) and 89% (Random Forest).
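
    A minimal sketch of the classification stage described above, assuming the rule-based extractor has already produced a numeric symptom-feature matrix. The synthetic data, class weights, and model settings below are illustrative placeholders, not the paper's configuration.

        # Toy stand-in for the extracted symptom features (imbalanced, multi-class).
        from imblearn.over_sampling import SMOTE
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import accuracy_score
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=1500, n_features=40, n_informative=12,
                                   n_classes=5, weights=[0.4, 0.3, 0.15, 0.1, 0.05],
                                   random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                                  random_state=0)

        # Oversample the minority disease classes on the training split only.
        X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

        for name, clf in [("SVM", SVC()),
                          ("Random Forest", RandomForestClassifier(n_estimators=200))]:
            clf.fit(X_res, y_res)
            print(name, round(accuracy_score(y_te, clf.predict(X_te)), 3))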

    Development and validation of explainable machine learning models for risk of mortality in transcatheter aortic valve implantation: TAVI risk machine scores.

    AIMS Identification of high-risk patients and individualized decision support based on objective criteria for rapid discharge after transcatheter aortic valve implantation (TAVI) are key requirements of contemporary TAVI treatment. This study aimed to predict 30-day mortality following TAVI based on machine learning (ML) using data from the German Aortic Valve Registry. METHODS AND RESULTS Mortality risk was determined using a random forest ML model that was condensed into the newly developed TAVI Risk Machine (TRIM) scores, designed to represent clinically meaningful risk modelling before (TRIMpre) and, in particular, after (TRIMpost) TAVI. The algorithm was trained and cross-validated on data from 22,283 patients (729 died within 30 days post-TAVI), and generalisation was examined on data from 5,864 patients (146 died). TRIMpost demonstrated significantly better performance (C-statistic 0.79; 95% confidence interval (CI) [0.74; 0.83]) than the Society of Thoracic Surgeons (STS) score (C-statistic 0.69; 95% CI [0.65; 0.74]). An abridged score (aTRIMpost) comprising 25 features (calculated using a web interface) also exhibited significantly higher performance than traditional scores (C-statistic 0.74; 95% CI [0.70; 0.78]). Validation on external data from 6,693 patients (205 died within 30 days post-TAVI) of the Swiss TAVI Registry confirmed significantly better performance for TRIMpost (C-statistic 0.75; 95% CI [0.72; 0.79]) than for STS (C-statistic 0.67; 95% CI [0.63; 0.70]). CONCLUSION The TRIM scores demonstrate good performance for risk estimation before and after TAVI. Together with clinical judgement, they may support standardised and objective decision-making before and after TAVI.
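
    As a rough illustration of the evaluation metric used above, the sketch below fits a random forest on synthetic, imbalanced data and reports a cross-validated C-statistic (area under the ROC curve). The data, feature count, and settings are placeholders, not the registry data or the TRIM configuration.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        # Toy stand-in for 25 patient features and a rare 30-day mortality label.
        X, y = make_classification(n_samples=5000, n_features=25, weights=[0.97, 0.03],
                                   random_state=0)

        rf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=0)
        auc = cross_val_score(rf, X, y, cv=5, scoring="roc_auc")  # C-statistic per fold
        print(f"C-statistic: {auc.mean():.2f} +/- {auc.std():.2f}")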

    Characterizing Clustering Models of High-dimensional Remotely Sensed Data Using Subsampled Field-subfield Spatial Cross-validated Random Forests

    Clustering models are regularly used to construct meaningful groups of observations within complex datasets, and they are an exceptional tool for spatial exploratory analysis. The clusters detected in a recent spatio-temporal cluster analysis of leaf area index (LAI) in the Columbia River Basin (CRB) require further investigation, since they were derived using only a single greenness metric. It is of great interest to understand further how greening indices can be used to determine the separation of sites across an array of remotely sensed environmental attributes. In that prior work, several highly localized minority clusters were found to be most dissimilar from the remaining clusters, as determined by annual variation in remotely sensed LAI. The objective of this study is to discern which other environmental factors are important predictors of cluster allocation from that cluster analysis and, secondarily, to construct a predictive model that prioritizes the minority clusters. A random forest classification is used to examine the importance of various site attributes in predicting cluster allocation. To satisfy these objectives, I propose an application-specific process that integrates spatial sub-sampling and cross-validation to improve the interpretability and utility of random forests for spatially autocorrelated, highly localized, and unbalanced class-size response variables. The final random forest model shows that the cluster allocation based only on LAI separates sites significantly across many other environmental attributes, and further that elevation, slope, and water-storage potential are the most important predictors of cluster allocation. Most importantly, the clusters identified as most dissimilar by the cluster model have the lowest misclassification rates, which fulfils the secondary objective of aligning the priorities of the predictive model with the prior cluster model.
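
    The grouped (spatial) cross-validation idea can be sketched as below, where whole spatial blocks (a hypothetical field_id) are held out together so spatial autocorrelation does not leak between training and test folds. All variable names and data are illustrative, not the study's LAI dataset.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import GroupKFold, cross_val_score

        rng = np.random.default_rng(0)
        n_sites = 2000
        X = rng.normal(size=(n_sites, 8))              # e.g. elevation, slope, water storage, ...
        cluster = rng.integers(0, 5, size=n_sites)     # response: cluster allocation from the LAI analysis
        field_id = rng.integers(0, 100, size=n_sites)  # spatial blocks (fields / subfields)

        rf = RandomForestClassifier(n_estimators=500, class_weight="balanced", random_state=0)
        # Hold out entire fields per fold instead of individual, spatially correlated sites.
        scores = cross_val_score(rf, X, cluster, groups=field_id, cv=GroupKFold(n_splits=5))
        print(scores.mean())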

    Computational Approaches Based On Image Processing for Automated Disease Identification On Chili Leaf Images: A Review

    Chili, an important crop whose fruit is used as a spice, is significantly affected by chili diseases. While these diseases are a major concern for farmers because they impair the supply of spices to the market, they can be managed and monitored to lessen their impact. Identifying chili diseases with an appropriate approach is therefore of enormous importance. Over the years, computational approaches based on image processing have found application in automated disease identification, providing a reliable monitoring tool with promising results for chili. Numerous research papers on identifying chili diseases with these approaches have been published; still, to the best of the author's knowledge, there has been no systematic attempt to analyze these papers across the main steps of diagnosis, including pre-processing, segmentation, feature extraction, and identification techniques. Thus, a total of 50 research publications on the identification of chili diseases, published between 2013 and 2021, are reviewed in this paper. The findings make it possible to understand the development trend in applying image-processing-based computational approaches to the identification of chili diseases, as well as the challenges and future directions that require attention from the research community.
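
    A minimal illustration of the pre-processing, segmentation, and feature-extraction steps that such pipelines typically follow; the file name, HSV thresholds, and colour-statistics features are arbitrary assumptions, not taken from any of the reviewed papers.

        import cv2
        import numpy as np

        # Load a leaf image (path is a placeholder) and convert to HSV for segmentation.
        img = cv2.imread("chili_leaf.jpg")
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

        # Crude segmentation: treat non-green pixels as candidate lesion regions.
        green = cv2.inRange(hsv, (25, 40, 40), (95, 255, 255))
        lesion_mask = cv2.bitwise_not(green)

        # Simple colour-statistics features from the lesion regions.
        lesion_pixels = hsv[lesion_mask > 0]
        features = np.concatenate([lesion_pixels.mean(axis=0), lesion_pixels.std(axis=0)])
        print(features)  # would be fed to a classifier (SVM, random forest, CNN, ...)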

    The Global Conflict Risk Index: Artificial intelligence for conflict prevention

    The Global Conflict Risk Index (GCRI), designed by the European Commission’s Joint Research Centre (JRC), is the quantitative starting point of the EU’s conflict Early Warning System. Taking into consideration the need of policy-makers to prioritize actions towards conflict prevention, the GCRI expresses the statistical risk of violent conflict in a given country within the upcoming one to four years. It is based on open-source data and grounded in the assumption that the occurrence of conflict is linked to structural conditions, which are used to compute the probability and intensity of conflicts. While the initial GCRI model was estimated by means of linear and logistic regression, this report presents a new GCRI model based on the Artificial Intelligence (AI) random forest (RF) approach. The models’ hyperparameters are optimized using ten-fold cross-validation. Overall, it is demonstrated that the random forest GCRI models are internally stable, do not overfit, and have good predictive power. The precision and accuracy metrics are above 98%, both for the national power and subnational power conflict models. The AI GCRI, as a supplementary modelling method for the European conflict prevention policy agenda, is scientifically robust as a baseline quantitative evaluation of armed conflict risk, in addition to the linear and logistic regression GCRI.
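
    The ten-fold cross-validated hyperparameter optimization mentioned above could look roughly like the sketch below; the parameter grid, scoring choice, and synthetic country-year data are assumptions for illustration, not the JRC setup.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import GridSearchCV

        # Toy stand-in for structural indicators per country-year and a conflict label.
        X, y = make_classification(n_samples=3000, n_features=20, weights=[0.9, 0.1],
                                   random_state=0)

        search = GridSearchCV(
            RandomForestClassifier(random_state=0),
            param_grid={"n_estimators": [200, 500], "max_depth": [None, 10, 20]},
            cv=10,                # ten-fold cross-validation
            scoring="accuracy")
        search.fit(X, y)
        print(search.best_params_, search.best_score_)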

    Experimental evaluation of ensemble classifiers for imbalance in Big Data

    Datasets are growing in size and complexity at a pace never seen before, forming ever larger collections known as Big Data. A common problem for classification, especially in Big Data, is that the numbers of examples of the different classes may not be balanced. Imbalanced classification was therefore introduced some decades ago to correct the tendency of classifiers to show bias in favor of the majority class and to ignore the minority one. To date, although the number of imbalanced classification methods has increased, they continue to focus on normal-sized datasets rather than on the new reality of Big Data. In this paper, in-depth experimentation with ensemble classifiers is conducted in the context of imbalanced Big Data classification, using two popular ensemble families (Bagging and Boosting) and different resampling methods. All the experiments were run on Spark clusters, comparing ensemble performance and execution times with statistical test results, including the newest tests based on the Bayesian approach. One very interesting conclusion of the study is that simpler methods applied to imbalanced datasets in the context of Big Data provided better results than complex methods. The additional complexity of some of the sophisticated methods, which appears necessary to process and to reduce imbalance in normal-sized datasets, was not effective for imbalanced Big Data. Funding: “la Caixa” Foundation, Spain, under agreement LCF/PR/PR18/51130007. This work was supported by the Junta de Castilla y León, Spain, under project BU055P20 (JCyL/FEDER, UE), co-financed through European Union FEDER funds, and by the Consejería de Educación of the Junta de Castilla y León and the European Social Fund, Spain, through a pre-doctoral grant (EDU/1100/2017).
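
    One of the simpler resampling-plus-ensemble combinations can be sketched on a single machine as below (Bagging over balanced, undersampled bootstraps via imbalanced-learn). This toy example does not reproduce the Spark-cluster experiments or the exact methods compared in the paper.

        from imblearn.ensemble import BalancedBaggingClassifier
        from sklearn.datasets import make_classification
        from sklearn.metrics import balanced_accuracy_score
        from sklearn.model_selection import train_test_split

        # Toy imbalanced dataset standing in for a Big Data classification task.
        X, y = make_classification(n_samples=100_000, n_features=20,
                                   weights=[0.99, 0.01], random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        # Each bagged tree is trained on a bootstrap where the majority class is undersampled.
        clf = BalancedBaggingClassifier(n_estimators=100, random_state=0)
        clf.fit(X_tr, y_tr)
        print(balanced_accuracy_score(y_te, clf.predict(X_te)))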

    Modelling for Radiation Treatment Outcome

    Modelling of tumour control probability (TCP) and normal tissue complication probability (NTCP) has long been used to estimate the therapeutic window of radiotherapy. In recent years, the available data on tumour and normal tissue biology and from multimodal imaging have increased substantially, in particular due to image-guided radiotherapy (see previous chapters of this book) and novel high-throughput sequencing technologies. Accordingly, more complex modelling algorithms are applied, and issues of data quality, structured modelling procedures, and model validation need to be addressed. This chapter outlines general modelling principles in the era of big data, provides definitions of classical TCP and NTCP models, and presents two applications of outcome modelling in radiotherapy: the model-based approach for assigning patients to photon or proton-beam therapy, and radiomics analyses based on clinical imaging data.
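
    For readers unfamiliar with the classical models mentioned here, the sketch below evaluates a Poisson linear-quadratic TCP model and a Lyman (probit) NTCP model for a uniformly irradiated volume; the parameter values are generic textbook-style placeholders, not values from this chapter.

        import numpy as np
        from scipy.stats import norm

        def tcp_poisson(D, d=2.0, alpha=0.3, beta=0.03, n0=1e7):
            """Poisson TCP with linear-quadratic cell survival.
            D: total dose (Gy), d: dose per fraction (Gy), n0: initial clonogen number."""
            surviving_cells = n0 * np.exp(-alpha * D - beta * d * D)
            return np.exp(-surviving_cells)

        def ntcp_lyman(D, td50=50.0, m=0.2):
            """Lyman NTCP for uniform whole-organ irradiation (probit in dose)."""
            t = (D - td50) / (m * td50)
            return norm.cdf(t)

        doses = np.array([40.0, 50.0, 60.0, 70.0])
        print(tcp_poisson(doses))   # tumour control probability per dose level
        print(ntcp_lyman(doses))    # normal tissue complication probability per dose level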