4 research outputs found

    Feature selection of unbalanced breast cancer data using particle swarm optimization

    Get PDF
    Breast cancer is one of the leading causes of death among women worldwide. High accuracy in cancer prediction models is therefore vital to improving patients' treatment quality and survival rate. In this work, we present a new method, the improved balancing particle swarm optimization (IBPSO) algorithm, to predict the stage of breast cancer from unbalanced Surveillance, Epidemiology, and End Results (USEER) data. The work contributes in two directions. First, we design and implement an improved particle swarm optimization (IPSO) algorithm that avoids local minima while reducing the dimensionality of the USEER data. The improvement comes primarily from employing the cross-over ability of the genetic algorithm as a fitness function and using a correlation-based function to guide the selection task toward a minimal feature subset of USEER that sufficiently describes the universe. Second, we develop an improved synthetic minority over-sampling technique (ISMOTE) that avoids overfitting while efficiently balancing the USEER data. ISMOTE generates new objects based on the average of the two objects with the smallest and largest distance from the centroid object of the minority class. The experiments and analysis show that the proposed IBPSO is feasible and effective, outperforming other state-of-the-art methods in minimizing the number of features with an accuracy of 98.45%.
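    As a rough illustration of the oversampling step described above, the following NumPy sketch generates synthetic minority samples around the average of the two minority objects nearest to and farthest from the class centroid. The function name, the interpolation toward the centroid, and all parameters are assumptions made for illustration, not the authors' ISMOTE implementation.

    ```python
    import numpy as np

    def ismote_oversample(X_minority, n_new, seed=0):
        # Centroid of the minority class and each object's distance to it.
        centroid = X_minority.mean(axis=0)
        dists = np.linalg.norm(X_minority - centroid, axis=1)
        nearest = X_minority[np.argmin(dists)]   # object with the smallest distance
        farthest = X_minority[np.argmax(dists)]  # object with the largest distance
        base = (nearest + farthest) / 2.0        # average of the two objects
        # Assumption: spread the synthetic samples between this average and the
        # centroid so they are not all identical; the abstract does not specify this.
        rng = np.random.default_rng(seed)
        alphas = rng.uniform(0.0, 1.0, size=(n_new, 1))
        return base + alphas * (centroid - base)

    # Usage: grow a toy minority class of 5 samples to 20 samples.
    X_min = np.random.default_rng(1).normal(size=(5, 3))
    X_balanced = np.vstack([X_min, ismote_oversample(X_min, 15)])
    print(X_balanced.shape)  # (20, 3)
    ```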

    Adversarial Learning on Incomplete and Imbalanced Medical Data for Robust Survival Prediction of Liver Transplant Patients

    Get PDF
    The scarcity of liver transplants necessitates prioritizing patients based on their health condition to minimize deaths on the waiting list. Recently, machine learning methods have gained popularity for automating liver transplant allocation systems, enabling prompt and suitable selection of recipients. Nevertheless, raw medical data often contain complexities such as missing values and class imbalance that reduce the reliability of the constructed model. This paper aims to address these challenges to ensure the reliability of the decision-making process. To this end, we first propose a novel deep learning method that simultaneously handles these challenges and predicts the patients' chance of survival. Second, we design a hybrid framework with three main modules for missing data imputation, class imbalance learning, and classification, each of which employs multiple advanced techniques for the given task. The two approaches are compared and evaluated on a real clinical case study. The experimental results indicate the robust and superior performance of the proposed deep learning method in terms of F-measure and area under the receiver operating characteristic curve (AUC).
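    The abstract names the three modules of the hybrid framework but not the techniques inside them. Purely as a sketch of that imputation-then-balancing-then-classification order, the snippet below wires together off-the-shelf components (SimpleImputer, SMOTE, and a random forest from scikit-learn and imbalanced-learn) and reports F-measure and AUC; these components are stand-ins chosen for illustration, not the methods used in the paper.

    ```python
    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import f1_score, roc_auc_score
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline  # pipeline that allows resampling steps

    # Toy imbalanced data with missing values, standing in for the clinical data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))
    X[rng.random(X.shape) < 0.1] = np.nan          # roughly 10% missing entries
    y = (rng.random(500) < 0.15).astype(int)       # roughly 15% minority class

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # Module order mirrors the framework: imputation -> imbalance learning -> classification.
    pipe = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("balance", SMOTE(random_state=0)),
        ("clf", RandomForestClassifier(random_state=0)),
    ])
    pipe.fit(X_tr, y_tr)

    proba = pipe.predict_proba(X_te)[:, 1]
    print("F-measure:", f1_score(y_te, pipe.predict(X_te)))
    print("AUC:", roc_auc_score(y_te, proba))
    ```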

    Improved K-means clustering algorithms : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, Massey University, New Zealand

    Get PDF
    The K-means clustering algorithm divides samples into subsets with the goal of maximizing intra-subset similarity and inter-subset dissimilarity, where the similarity measures the relationship between two samples. As an unsupervised learning technique, K-means is one of the most widely used clustering algorithms and has been applied in areas such as artificial intelligence, data mining, biology, psychology, marketing, and medicine. However, K-means is not robust, and its clustering result depends on the initialization, the similarity measure, and the predefined cluster number. Previous research has addressed some of these issues individually but not in a unified framework, and fixing one issue alone does not guarantee the best performance. Improving K-means by solving these issues simultaneously is therefore both challenging and significant. This thesis conducts extensive research on the K-means clustering algorithm with the aim of improving it. First, we propose the Initialization-Similarity (IS) clustering algorithm to address the initialization and similarity-measure issues of K-means in a unified way. Specifically, we fix the initialization of the clustering by using sum-of-norms (SON), which outputs a new representation of the original samples, and we learn the similarity matrix based on the data distribution; the derived representation is then used to conduct K-means clustering. Second, we propose a Joint Feature Selection with Dynamic Spectral (FSDS) clustering algorithm to address cluster number determination, the similarity measure, and the robustness of the clustering by selecting effective features and reducing the influence of outliers simultaneously. Specifically, we learn the similarity matrix based on the data distribution and add a rank constraint on the Laplacian matrix of the learned similarity matrix to automatically output the cluster number. Furthermore, the proposed algorithm employs the L2,1-norm as the sparse constraint on the regularization term and the loss function to remove redundant features and reduce the influence of outliers, respectively. Third, we propose a Joint Robust Multi-view (JRM) spectral clustering algorithm that clusters multi-view data while solving the initialization issue, cluster number determination, similarity measure learning, removal of redundant features, and reduction of outlier influence in a unified way. The proposed algorithms outperform state-of-the-art clustering algorithms on real data sets, and we theoretically prove the convergence of the proposed optimization methods for the proposed objective functions.
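    For context on the baseline being improved, the minimal sketch below implements standard (Lloyd's) K-means in NumPy; it makes visible the three fixed choices the thesis targets: the random initialization, the hard-coded Euclidean similarity, and the preset cluster number k. This is generic illustrative code, not the proposed IS, FSDS, or JRM algorithms.

    ```python
    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        # Random initialization: pick k distinct samples as starting centroids.
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Similarity measure is fixed to Euclidean distance to each centroid.
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Update each centroid to the mean of its assigned samples.
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        return labels, centroids

    # Different seeds (initializations) can yield different partitions of the same data.
    X = np.vstack([np.random.default_rng(1).normal(loc=m, size=(50, 2)) for m in (0, 4, 8)])
    for seed in (0, 1, 2):
        labels, _ = kmeans(X, k=3, seed=seed)
        print(seed, np.bincount(labels))
    ```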