
    Imbalance Learning and Its Application on Medical Datasets

    To extract more valuable information from ever-growing volumes of data, data mining has attracted increasing attention over the past two decades. One of the challenges in data mining is imbalance learning, which refers to learning from imbalanced datasets. An imbalanced dataset is dominated by some classes (the majority), while other classes (the minority) are under-represented. Imbalanced datasets degrade the learning ability of traditional methods, which are designed on the assumption that all classes are balanced and have equal misclassification costs, and this leads to poor performance on the minority classes. This phenomenon is usually called the class imbalance problem. However, the minority classes are usually of greater interest and importance, such as sick cases in a medical dataset. Additionally, traditional methods are optimized to achieve maximum accuracy, which is not a suitable metric for evaluating performance on imbalanced datasets. From the view of data space, class imbalance can be classified as extrinsic or intrinsic. Extrinsic imbalance is caused by external factors, such as data transmission or data storage, while intrinsic imbalance means the dataset is inherently imbalanced due to its nature. As extrinsic imbalance could be fixed by collecting more samples, this thesis mainly focuses on two scenarios of intrinsic imbalance: machine learning for imbalanced structured datasets and deep learning for imbalanced image datasets. The solutions for the class imbalance problem are called imbalance learning methods, which can be grouped into data-level (re-sampling) methods, algorithm-level (re-weighting) methods, and hybrid methods. Data-level methods modify the class distribution of the training dataset to create balanced training sets; typical examples are over-sampling and under-sampling.
Instead of modifying the data distribution, algorithm-level methods adjust the misclassification cost to alleviate the class imbalance problem; a typical example is cost-sensitive learning. Hybrid methods usually combine data-level methods and algorithm-level methods. However, existing imbalance learning methods encounter different kinds of problems. Over-sampling methods add minority samples to create balanced training sets, which might lead the trained model to overfit to the minority class. Under-sampling methods create balanced training sets by discarding majority samples, which leads to information loss and poor performance of the trained model. Cost-sensitive methods usually need assistance from domain experts to define the misclassification costs, which are task-specific. Thus, the generalization ability of cost-sensitive methods is poor. In particular, for deep learning under class imbalance, re-sampling methods may introduce a large computation cost and existing re-weighting methods could lead to poor performance. The objective of this dissertation is to understand feature differences under class imbalance and to improve classification performance on structured datasets and image datasets. This thesis proposes two machine learning methods for imbalanced structured datasets and one deep learning method for imbalanced image datasets. The proposed methods are evaluated on several medical datasets, which are intrinsically imbalanced. Firstly, we study the feature difference between the majority class and the minority class of an imbalanced medical dataset collected from a Chinese hospital. After data cleaning and structuring, we obtain 3292 kidney stone cases treated by percutaneous nephrolithotomy from 2012 to 2019. Of these, 651 cases (19.78%) had postoperative complications, which makes complication prediction an imbalanced classification task.
We propose a sampling-based method, SMOTE-XGBoost, and use it to build a postoperative complication prediction model. Experimental results show that the proposed method outperforms classic machine learning methods. Furthermore, traditional prediction models for percutaneous nephrolithotomy are designed to predict the kidney stone status and overlook complication-related features, which could degrade their performance on complication prediction tasks. To this end, we merge more features into the proposed sampling-based method and further improve the classification performance. Overall, SMOTE-XGBoost achieves an AUC of 0.7077, which is 41.54% higher than that of S.T.O.N.E. nephrolithometry, a traditional prediction model for percutaneous nephrolithotomy. After reviewing the existing machine learning methods under class imbalance, we propose a novel ensemble learning approach called Multiple bAlance Subset Stacking (MASS). MASS first cuts the majority class into multiple subsets, each the size of the minority set, and combines each majority subset with the minority set to form one balanced subset. In this way, MASS could overcome the problem of information loss because it does not discard any majority sample. Each balanced subset is used to train one base classifier. Then, the original dataset is fed to all the trained base classifiers, whose outputs are used to generate the stacking dataset. One stacking model is trained on the stacking dataset to get the optimal weights for the base classifiers. Because the stacking dataset keeps the same labels as the original dataset, the overfitting problem could be avoided. Finally, we obtain a strong ensemble model based on the trained base classifiers and the stacking model. Extensive experimental results on three medical datasets show that MASS outperforms baseline methods. The robustness of MASS is demonstrated by implementing it with different base classifiers. We design a parallel version of MASS to reduce the training time cost.
The speedup analysis shows that Parallel MASS could greatly reduce the training time cost when applied to large datasets. In particular, Parallel MASS reduces training time by up to 101.8% compared with MASS in our experiments. When it comes to the class imbalance problem of image datasets, existing imbalance learning methods suffer from large training cost and poor performance. After introducing the problems of applying re-sampling methods to image classification tasks, we demonstrate the issues of a re-weighting strategy based on class frequencies through experimental results on one medical image dataset. We propose a novel re-weighting method, Hardness Aware Dynamic (HAD) loss, to solve the class imbalance problem of image datasets. After each training epoch of the deep neural network, we compute the classification hardness of each class. In the next epoch, we assign higher class weights to the classes that have large classification hardness values, and vice versa. In this way, HAD could tune the weight of each sample in the loss function dynamically during the training process. The experimental results show that HAD significantly outperforms the state-of-the-art methods. Moreover, HAD greatly improves the classification accuracies of minority classes while making only a small compromise on majority class accuracies. In particular, HAD loss improves average precision by 10.04% compared with the best baseline, Focal loss, on the HAM10000 dataset. Finally, I conclude this dissertation with our contributions to imbalance learning and provide an overview of potential directions for future research, which include extensions of the three proposed methods, development of task-specific algorithms, and addressing the challenges of within-class imbalance. 2021-06-0
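
The MASS procedure described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the nearest-centroid base learner, the logistic-regression stacker, the toy Gaussian data, and all function names are assumptions made for the example. The last majority chunk may be smaller than the minority set when the class sizes do not divide evenly.

```python
import math
import random


def make_balanced_subsets(majority, minority, seed=0):
    """Cut the majority class into minority-sized chunks and pair each with the minority set."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    k = len(minority)
    chunks = [shuffled[i:i + k] for i in range(0, len(shuffled), k)]
    return [chunk + minority for chunk in chunks]  # no majority sample is discarded


def train_centroid(subset):
    """Hypothetical base learner: store the mean feature vector of each class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in subset if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def score(centroids, x):
    """Score in [0, 1]; higher means closer to the minority (label 1) centroid."""
    d0 = math.dist(x, centroids[0])
    d1 = math.dist(x, centroids[1])
    return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5


def build_stacking_dataset(models, data):
    """Feed the ORIGINAL dataset to every base model; keep the original labels."""
    return [([score(m, x) for m in models], y) for x, y in data]


def train_stacker(stack_data, lr=0.5, epochs=200):
    """Assumed stack model: logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0] * len(stack_data[0][0]), 0.0
    for _ in range(epochs):
        for z, y in stack_data:
            logit = sum(wi * zi for wi, zi in zip(w, z)) + b
            grad = 1.0 / (1.0 + math.exp(-logit)) - y
            w = [wi - lr * grad * zi for wi, zi in zip(w, z)]
            b -= lr * grad
    return w, b


# Toy imbalanced data (assumption): 90 majority points, 10 minority points.
rng = random.Random(42)
majority = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(90)]
minority = [([rng.gauss(3, 1), rng.gauss(3, 1)], 1) for _ in range(10)]

subsets = make_balanced_subsets(majority, minority)   # 9 subsets of 20 samples
models = [train_centroid(s) for s in subsets]         # one base model per subset
stack_data = build_stacking_dataset(models, majority + minority)
w, b = train_stacker(stack_data)                      # one weight per base model
```

Each base model sees a balanced view of the data, while the stacker is trained on all 100 original samples, which mirrors the two properties the abstract emphasizes.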

    Transforming urinary stone disease management by artificial intelligence-based methods: A comprehensive review

    Objective: To provide a comprehensive review on the existing research and evidence regarding artificial intelligence (AI) applications in the assessment and management of urinary stone disease. Methods: A comprehensive literature review was performed using PubMed, Scopus, and Google Scholar databases to identify publications about innovative concepts or supporting applications of AI in the improvement of every medical procedure relating to stone disease. The terms "endourology", "artificial intelligence", "machine learning", and "urolithiasis" were used for searching eligible reports, while review articles, articles referring to automated procedures without AI application, and editorial comments were excluded from the final set of publications. The search was conducted from January 2000 to September 2023 and included manuscripts in the English language. Results: A total of 69 studies were identified. The main subjects were related to the detection of urinary stones, the prediction of the outcome of conservative or operative management, the optimization of operative procedures, and the elucidation of the relation of urinary stone chemistry with various factors. Conclusion: AI represents a useful tool that provides urologists with numerous amenities, which explains the fact that it has gained ground in the pursuit of stone disease management perfection. The effectiveness of diagnosis and therapy can be increased by using it as an alternative or adjunct to the already existing data. However, little is known concerning the potential of this vast field. Electronic patient records, containing big data, offer AI the opportunity to develop and analyze more precise and efficient diagnostic and treatment algorithms. Nevertheless, the existing applications are not generalizable in real-life practice, and high-quality studies are needed to establish the integration of AI in the management of urinary stone disease.

    ROLE OF MACHINE VISION FOR IDENTIFICATION OF KIDNEY STONES USING MULTI FEATURES ANALYSIS

    The purpose of this study is to highlight the significance of machine vision for kidney stone identification. A novel optimized fused texture feature framework was designed to identify stones in the kidney. A fused set of 234 texture features (GLCM, RLM, and histogram) was acquired from each region of interest (ROI). On each image, 8 ROIs of sizes 16x16, 20x20, and 22x22 were taken. The resulting feature space of 280,800 values (1200x234) was difficult to handle. To overcome this data handling issue, we applied a feature optimization technique, POE+ACC, and acquired the 30 most optimized features for each ROI. The optimized fused feature dataset of 36,000 values (1200x30) was fed to four machine vision classifiers: Random Forest, MLP, J48, and Naïve Bayes. Finally, Random Forest provided the best result among the deployed classifiers, with 90% accuracy on the 22x22 ROI.
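
As background for the GLCM texture features mentioned above, the following is a generic sketch of how a gray-level co-occurrence matrix and one derived feature (contrast) can be computed for a small ROI. It is not the study's pipeline: the 4x4 ROI, the four gray levels, the horizontal offset, and the function names are assumptions chosen for illustration.

```python
def glcm(roi, levels, dx=1, dy=0):
    """Count how often gray level i is followed by gray level j at offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(roi), len(roi[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[roi[r][c]][roi[r2][c2]] += 1
    return counts


def contrast(counts):
    """GLCM contrast: squared gray-level difference weighted by co-occurrence probability."""
    total = sum(map(sum, counts))
    return sum(
        (i - j) ** 2 * n / total
        for i, row in enumerate(counts)
        for j, n in enumerate(row)
    )


# Toy ROI with gray levels 0..3 (assumption, not data from the study).
roi = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
counts = glcm(roi, levels=4)
roi_contrast = contrast(counts)  # 7/12, about 0.583

# Size of the full and reduced feature matrices quoted in the abstract.
full_size = 1200 * 234     # 280800
reduced_size = 1200 * 30   # 36000
```

A real feature set would repeat this over several offsets and derive further statistics (energy, homogeneity, correlation) alongside run-length and histogram features.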

    Radiomics in urolithiasis: Systematic review of current applications, limitations, and future directions

    Radiomics is increasingly applied to the diagnosis, management, and outcome prediction of various urological conditions. Urolithiasis is a common benign condition with a high incidence and recurrence rate. The purpose of this scoping review is to evaluate the current evidence of the application of radiomics in urolithiasis, especially its utility in diagnostics and therapeutics. An electronic literature search on radiomics in the setting of urolithiasis was conducted on PubMed, EMBASE, and Scopus from inception to 21 March 2022. A total of 7 studies were included. Radiomics has been successfully applied in the field of urolithiasis to differentiate phleboliths from calculi and classify stone types and composition pre-operatively. More importantly, it has also been utilized to predict outcomes and complications after endourological procedures. Although radiomics in urolithiasis is still in its infancy, it has the potential for large-scale implementation. Its greatest potential lies in the correlation with conventional established diagnostic and therapeutic factors

    Minimally Invasive Urological Procedures and Related Technological Developments

    The landscape of minimally invasive urological intervention is changing. Many innovations and technological developments have emerged over the last three decades. Laparoscopy and robotic surgery have revolutionised kidney and prostate cancer treatment, with more minimally invasive procedures now being carried out than ever before. At the same time, technological advancements and the use of lasers have changed the face of endourology. Several innovative treatments are now commonplace for benign prostate enlargement (BPE). Management of prostate cancer now involves procedures such as robotic prostatectomy, brachytherapy, radiotherapy, cryotherapy and HIFU. Robotic partial nephrectomy and cryotherapy have changed the face of renal cancer. En-bloc resection of bladder cancer is challenging the traditional management of non-muscle invasive bladder cancer and becoming commonplace, while robotic cystectomy is also gaining popularity for muscle invasive bladder cancer. Newer surgical interventions for BPE include laser (holmium, thulium and green light), water-based treatment (Rezum, Aquablation) and other minimally invasive procedures such as prostate artery embolisation (PAE) and Urolift. Endourological procedures have incorporated newer laser types and settings such as Moses technology, disposable ureteroscopes (URS) and miniaturisation of percutaneous nephrolithotomy (PCNL) instruments. All these technological innovations and improvements have led to shorter hospital stays, reduced cost, a potential reduction in complications and improvement in the quality of life (QoL).

    Efficient Feature Selection and ML Algorithm for Accurate Diagnostics

    Machine learning algorithms have been deployed in numerous optimization, prediction and classification problems. This has made them attractive for application in fields such as computer networks and medical diagnosis. Although these machine learning algorithms achieve convincing results in these fields, they face numerous challenges when deployed on imbalanced datasets. Consequently, these algorithms are often biased towards the majority class and hence unable to generalize the learning process. In addition, they are unable to deal effectively with high-dimensional datasets. Moreover, conventional feature selection techniques based on attribute significance render them ineffective for the majority of diagnosis applications. In this paper, feature selection is executed using the more effective Neighbourhood Components Analysis (NCA). During the classification process, an ensemble classifier comprising K-Nearest Neighbours (KNN), Naive Bayes (NB), Decision Tree (DT) and Support Vector Machine (SVM) is built, trained and tested. Finally, cross validation is carried out to evaluate the developed ensemble model. The results show that the proposed classifier has the best performance in terms of precision, recall, F-measure and classification accuracy.
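
The abstract above does not state how the four base classifiers are combined, so the following sketch assumes simple hard (majority) voting and shows how precision, recall, F-measure and accuracy are computed from the combined predictions. The prediction lists are made-up toy values, not results from the paper, and the function names are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists by hard voting; ties go to the label seen first."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F-measure and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Toy labels and per-classifier predictions (assumptions for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
knn = [1, 0, 1, 0, 0, 1, 0, 0]
nb = [1, 1, 0, 0, 0, 0, 0, 1]
dt = [0, 1, 1, 0, 1, 0, 0, 0]
svm = [1, 1, 1, 0, 0, 0, 0, 0]

ensemble = majority_vote([knn, nb, dt, svm])
ensemble_metrics = binary_metrics(y_true, ensemble)
knn_metrics = binary_metrics(y_true, knn)
```

In this toy case each weak classifier errs on different samples, so the vote corrects the individual mistakes; with correlated errors the gain would be smaller.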

    Application and Extension of Weighted Quantile Sum Regression for the Development of a Clinical Risk Prediction Tool

    In clinical settings, the diagnosis of medical conditions is often aided by measurement of various serum biomarkers through the use of laboratory tests. These biomarkers provide information about different aspects of a patient’s health and the overall function of different organs. In this dissertation, we develop and validate a weighted composite index that aggregates the information from a variety of health biomarkers covering multiple organ systems. The index can be used for predicting all-cause mortality and could also be used as a holistic measure of overall physiological health status. We refer to it as the Health Status Metric (HSM). Validation analysis shows that the HSM is predictive of long-term mortality risk and exhibits a robust association with concurrent chronic conditions, recent hospital utilization, and self-rated health. We develop the HSM using Weighted Quantile Sum (WQS) regression (Gennings et al., 2013; Carrico, 2013), a novel penalized regression technique that imposes nonnegativity and unit-sum constraints on the coefficients used to weight index components. In this dissertation, we develop a number of extensions to the WQS regression technique and apply them to the construction of the HSM. We introduce a new guided approach for the standardization of index components which accounts for potential nonlinear relationships with the outcome of interest. An extended version of the WQS that accommodates interaction effects among index components is also developed and implemented. In addition, we demonstrate that ensemble learning methods borrowed from the field of machine learning can be used to improve the predictive power of the WQS index. Specifically, we show that the use of techniques such as weighted bagging, the random subspace method and stacked generalization in conjunction with the WQS model can produce an index with substantially enhanced predictive accuracy. Finally, practical applications of the HSM are explored. 
A comparative study is performed to evaluate the feasibility and effectiveness of a number of ‘real-time’ imputation strategies in potential software applications for computing the HSM. In addition, the efficacy of the HSM as a predictor of hospital readmission is assessed in a cohort of emergency department patients
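
The core of a weighted quantile sum index, as described in the abstract above, is a weighted sum of quantile-scored components whose weights are nonnegative and sum to one. The sketch below illustrates only that construction. The biomarker values and weights are hypothetical, the rank-based quantile scoring is a simplification, and the estimation of the weights (which WQS regression performs against an outcome) is not shown.

```python
def quantile_scores(values, n_quantiles=4):
    """Map each value to a quantile bin 0..n_quantiles-1 by rank (ties broken by order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * n_quantiles // len(values)
    return scores


def wqs_index(biomarkers, weights, n_quantiles=4):
    """Weighted sum of quantile-scored components under the WQS weight constraints."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    scored = [quantile_scores(column, n_quantiles) for column in biomarkers]
    # zip(*scored) iterates over subjects; each row holds one score per biomarker.
    return [sum(w * q for w, q in zip(weights, row)) for row in zip(*scored)]


# Two hypothetical biomarkers measured on four subjects (one list per biomarker).
biomarker_a = [1.0, 4.0, 2.0, 3.0]
biomarker_b = [10.0, 20.0, 40.0, 30.0]
weights = [0.75, 0.25]  # assumed weights, not estimated from data

index = wqs_index([biomarker_a, biomarker_b], weights)
# index == [0.0, 2.5, 1.5, 2.0]
```

Because the weights are constrained to a unit sum, each one can be read as the relative contribution of its component to the overall index.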