3,763 research outputs found

    Artificial immune systems based committee machine for classification application

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. A new adaptive learning Artificial Immune System (AIS) based committee machine is developed in this thesis. The proposed approach efficiently tackles the general problem of clustering high-dimensional data. In addition, it helps in deriving useful decisions and results in other application domains such as classification and prediction. AIS is a branch of computational intelligence inspired by the biological immune system, and it has gained increasing interest among researchers developing immune-based models and techniques to solve diverse complex computational or engineering problems. This work presents some applications of AIS techniques to health problems, and a thorough survey of existing AIS models and algorithms. The main focus of this research is building an ensemble model that integrates different AIS techniques (i.e. Artificial Immune Networks, Clonal Selection, and Negative Selection) to achieve better classification results. A new AIS-based ensemble architecture with adaptive learning features is proposed by integrating different learning and adaptation techniques to overcome individual limitations and to achieve synergetic effects through their combination. Various techniques related to the design and enhancement of the new adaptive learning architecture are studied, including a neuro-fuzzy based detector and an optimizer using the particle swarm optimization method to achieve enhanced classification performance. An evaluation study was conducted to show the performance of the proposed adaptive learning ensemble and to compare it to alternative combining techniques. Several experiments on different medical datasets are presented for the classification problem, and the findings and outcomes are discussed.
The new adaptive learning architecture improves the accuracy of the ensemble. Moreover, there is an improvement over the existing aggregation techniques. The research concludes with the outcomes, assumptions and limitations of the proposed methods and their implications for further research in this area.

    Statistical Analysis with Machine and Neural Learning-Based Model on Cardiovascular Diseases and Stroke Prediction

    Several risk factors, such as hypertension, hyperlipidemia, and an irregular heart rhythm, make an early diagnosis of cardiovascular disease challenging. Reducing cardiac risk calls for precise diagnosis and therapy. Clinical practice in healthcare is likely to evolve in tandem with advancements in machine learning. Therefore, scientists and doctors need to acknowledge machine learning's significance. The fundamental purpose of this research is to develop a reliable method for analyzing risk factors for cardiovascular disease that makes use of machine learning. Well-known cardiovascular datasets are classified using state-of-the-art machine learning techniques and neural network algorithms. Several statistical and visualization indicators were used to assess the efficacy of the suggested approaches and to determine the optimal machine-learning and neural-network approach. These modeling methods achieved high accuracy on stroke and heart disease prediction.

    Country-level pandemic risk and preparedness classification based on COVID-19 data: A machine learning approach

    In this work we present a three-stage Machine Learning strategy for country-level risk classification based on countries that are reporting COVID-19 information. A K% binning discretisation (K = 25) is used to create four risk groups of countries based on the risk of transmission (coronavirus cases per million population), risk of mortality (coronavirus deaths per million population), and risk of inability to test (coronavirus tests per million population). The four risk groups produced by K% binning are labelled as ‘low’, ‘medium-low’, ‘medium-high’, and ‘high’. Coronavirus-related data are then removed, and the attributes used to predict the three types of risk are the geopolitical and demographic data describing each country. Thus, the class label is calculated from coronavirus data, but the input attributes are country-level information independent of coronavirus data. The three four-class classification problems are then explored and benchmarked through leave-one-country-out cross validation to find the strongest model, producing a Stack of Gradient Boosting and Decision Tree algorithms for risk of transmission, a Stack of Support Vector Machine and Extra Trees for risk of mortality, and a Gradient Boosting algorithm for the risk of inability to test. It is noted that high risk of inability to test is often coupled with low risks of transmission and mortality; therefore, the risk of inability to test should be interpreted first, before the predicted transmission and mortality risks are considered. Finally, the approach is applied to more recent risk levels using data from September 2020, and weaker results are noted because the growth of international collaboration reduces the useful knowledge carried by country-level attributes. This suggests that similar machine learning approaches are more useful early on, before a situation has unfolded.
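The labelling step in this abstract can be illustrated with a small sketch. It assumes that "K% binning" with K = 25 means equal-frequency bins, each holding roughly a quarter of the countries; the abstract does not confirm this. The function name `quartile_risk_labels` and the sample figures are made up for illustration and are not taken from the paper.

```python
def quartile_risk_labels(values):
    """Assign each value to one of four equal-frequency (25%) bins.

    Assumption: K% binning with K = 25 is treated as quartile binning.
    Higher values are treated as higher risk.
    """
    labels = ["low", "medium-low", "medium-high", "high"]
    n = len(values)
    # Indices of the inputs, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    result = [None] * n
    for rank, idx in enumerate(order):
        # rank * 4 // n maps ranks 0..n-1 onto bins 0..3 of near-equal size.
        result[idx] = labels[rank * 4 // n]
    return result


# Hypothetical cases-per-million figures for eight countries.
cases_per_million = [120.0, 4500.0, 38.0, 980.0, 15000.0, 2300.0, 610.0, 7.0]
print(quartile_risk_labels(cases_per_million))
```

For the risk of inability to test, where fewer tests per million would presumably mean higher risk, the label order would need to be reversed.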

    ROBUST DETECTION OF CORONARY HEART DISEASE USING MACHINE LEARNING ALGORITHMS

    Predicting whether or not someone will develop heart disease is now one of the most difficult tasks in medicine. Heart disease is responsible for about one death per minute in the contemporary age. Processing the vast amounts of data generated in healthcare is an important application for data science. Because predicting cardiac disease is difficult, there is a pressing need to automate the prediction process to minimize the associated dangers and provide the patient with timely warning. Chapter one of this thesis highlights the importance of this problem and identifies the need to augment current technological efforts to produce a more accurate system that facilitates timely decisions. Chapter one also presents the current literature on the theories and systems developed and assessed in this direction. This thesis uses the heart disease dataset from the UCI machine learning repository. Using a variety of data mining strategies, such as Naive Bayes, Decision Tree, Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), and Random Forest, the work estimates the likelihood that a patient will develop heart disease and can categorize the patient's degree of risk. The performance of the chosen classifiers is tested on a feature space chosen with the help of a feature selection algorithm. The models were trained and tested on the Cleveland heart disease dataset. To assess the usefulness and strength of each model, several performance metrics are used, including sensitivity, specificity, accuracy, AUC, ROC curve and F1-score. This research conducts a comparative analysis by computing the performance of several machine learning algorithms.
The experimental results demonstrate that the Random Forest and Support Vector Machine algorithms achieved the best accuracy (94.50% and 91.73% respectively) on the selected feature space when compared to the other machine learning methods employed. Thus, these two classifiers turned out to be promising for heart disease prediction. The computational complexity of each classifier was also investigated. Based on the computational complexity and comparative experimental results, a robust heart disease prediction system is proposed for an embedded platform, where the benefits of multiple classifiers are combined. The system proposes that heart disease can be detected with higher confidence if and only if many of these classifiers detect it. Finally, the experimental results are summarized and possible future strategies for enhancing this effort are discussed.
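The decision rule described at the end of this abstract, detection only when many classifiers agree, can be sketched as a k-of-n vote. This is a minimal illustration, not the thesis's implementation: the function name `ensemble_detects`, the `min_agree` threshold, and the example votes are all assumptions, since the abstract does not say how many classifiers count as "many".

```python
def ensemble_detects(votes, min_agree):
    """Return True only when at least `min_agree` classifiers vote positive.

    `votes` is a list of booleans, one per classifier (True = disease detected).
    `min_agree` is a hypothetical threshold standing in for "many".
    """
    if not 1 <= min_agree <= len(votes):
        raise ValueError("min_agree must be between 1 and the number of votes")
    return sum(1 for v in votes if v) >= min_agree


# Made-up per-classifier predictions for one patient.
votes = {
    "naive_bayes": True,
    "decision_tree": False,
    "svm": True,
    "knn": True,
    "random_forest": True,
}
print(ensemble_detects(list(votes.values()), min_agree=4))  # four of five agree
```

Raising `min_agree` trades sensitivity for confidence: fewer positives are reported, but each one is backed by more classifiers.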

    Methods to Improve the Prediction Accuracy and Performance of Ensemble Models

    The application of ensemble predictive models has been an important research area in predicting medical diagnostics, engineering diagnostics, and other related smart devices and technologies. Most current predictive models are complex and unreliable despite numerous past efforts by the research community. The expected accuracy of predictive models has not always been realised due to factors such as complexity and class imbalance. Therefore, there is a need to improve the predictive accuracy of current ensemble models and to enhance their applications and reliability and non-visual predictive tools. The research presented in this thesis adopted a pragmatic phased approach to propose and develop new ensemble models using multiple methods, and validated the methods through rigorous testing and implementation in different phases. The first phase comprises empirical investigations on standalone and ensemble algorithms, carried out to ascertain the effects of classifier complexity and simplicity on performance. The second phase comprises an improved ensemble model based on the integration of Extended Kalman Filter (EKF), Radial Basis Function Network (RBFN) and AdaBoost algorithms. The third phase comprises an extended model based on early stop concepts, the AdaBoost algorithm, and the statistical performance of the training samples, designed to minimize overfitting of the proposed model. The fourth phase comprises an enhanced analytical multivariate logistic regression predictive model developed to minimize the complexity and improve the prediction accuracy of the logistic regression model. To facilitate the practical application of the proposed models, an ensemble non-invasive analytical tool is proposed and developed. The tool bridges the gap between theoretical concepts and the practical application of theories to predict breast cancer survivability.
The empirical findings suggested that: (1) increasing the complexity and topology of algorithms does not necessarily lead to better algorithmic performance, (2) boosting by resampling performs slightly better than boosting by reweighting, (3) the proposed ensemble EKF-RBFN-AdaBoost model achieved better prediction accuracy than several established ensemble models, (4) the proposed early stopped model converges faster and minimizes overfitting better compared with other models, (5) the proposed multivariate logistic regression concept minimizes the complexity of the models, and (6) the proposed analytical non-invasive tool performed comparatively better than many of the benchmark analytical tools used in predicting breast cancer and diabetes. The research contributions to ensemble practice are: (1) the integration and development of EKF, RBFN and AdaBoost algorithms as an ensemble model, (2) the development and validation of an ensemble model based on early stop concepts, AdaBoost, and statistical concepts of the training samples, (3) the development and validation of a predictive logistic regression model based on breast cancer, and (4) the development and validation of a non-invasive breast cancer analytic tool based on the predictive models proposed and developed in this thesis. To validate the prediction accuracy of the ensemble models, the proposed models were applied to modelling breast cancer survivability and diabetes diagnostic tasks. In comparison with other established models, the simulation results showed improved predictive accuracy. The research outlines the benefits of the proposed models and proposes new directions for future work that could further extend and improve them.

    Integrated Machine Learning and Bioinformatics Approaches for Prediction of Cancer-Driving Gene Mutations

    Cancer arises from the accumulation of somatic mutations and genetic alterations in cell division checkpoints and apoptosis, which often leads to abnormal tumor proliferation. Proper classification of cancer-linked driver mutations will considerably help our understanding of the molecular dynamics of cancer. In this study, we compared several cancer-specific predictive models for prediction of driver mutations in cancer-linked genes; the models were validated on canonical data sets of functionally validated mutations and applied to raw cancer genomics data. By analyzing pathogenicity prediction and conservation scores, we have shown that evolutionary conservation scores play a pivotal role in the classification of cancer drivers and were the most informative features in the driver mutation classification. Through extensive comparative analysis with structure-functional experiments and multicenter mutational calling data from PanCancer Atlas studies, we have demonstrated the robustness of our models and addressed the validity of computational predictions. We evaluated the performance of our models using standard diagnostic metrics such as sensitivity, specificity, area under the curve and F-measure. To address the interpretability of cancer-specific classification models and obtain novel insights about molecular signatures of driver mutations, we have complemented machine learning predictions with structure-functional analysis of cancer driver mutations in several key tumor suppressor genes and oncogenes. Through the experiments carried out in this study, we found that evolutionary-based features have the strongest signal in the machine learning classification of driver mutations and provide orthogonal information to the ensemble-based scores that are prominent in the ranking of feature importance.
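Several of the diagnostic metrics named in this abstract follow directly from confusion-matrix counts. The sketch below computes sensitivity, specificity and the F-measure (F1) for a binary driver-versus-passenger labelling. The function name `binary_metrics` and the sample labels are illustrative assumptions, not data from the study. Area under the curve is omitted because it needs ranked scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Compute sensitivity, specificity and F1 from binary labels (1 = driver)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


# Made-up labels: 4 true drivers, 6 true passengers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))
```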