12 research outputs found

    Determinant of Covariance Matrix Model Coupled with AdaBoost Classification Algorithm for EEG Seizure Detection

    Experts usually inspect electroencephalogram (EEG) recordings page by page to identify epileptic seizures, which leads to heavy workloads and is time consuming. Hence, the efficient extraction and effective selection of informative EEG features is crucial in assisting clinicians to diagnose epilepsy accurately. In this paper, a determinant of covariance matrix (Cov–Det) model is proposed for reducing EEG dimensionality. First, EEG signals are segmented into intervals using a sliding-window technique. Then, Cov–Det is applied to each interval. To construct a feature vector, a set of statistical features is extracted from each interval. To eliminate redundant features, the Kolmogorov–Smirnov (KST) and Mann–Whitney U (MWUT) tests are integrated: the extracted features are ranked based on the KST and MWUT metrics, and arithmetic operators are used to select the most pertinent features for each pair of EEG signal groups. The selected features are then fed into the proposed AdaBoost Back-Propagation neural network (AB_BP_NN) to classify EEG signals into seizure and seizure-free segments. Finally, AB_BP_NN is compared with several classical machine learning techniques; the results demonstrate that the proposed AB_BP_NN model provides negligible false positive rates, a simpler design, and robustness in classifying epileptic signals. Two datasets, the Bern–Barcelona and Bonn datasets, are used for performance evaluation. The proposed technique achieved average accuracies of 100% and 98.86% on the Bern–Barcelona and Bonn datasets, respectively, a noteworthy improvement over current state-of-the-art methods.
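    As a rough illustration of the windowing and Cov–Det steps described above (not the authors' exact pipeline), the following Python sketch assumes multi-channel EEG stored as a channels-by-samples NumPy array; the window length, step size and statistical features are illustrative only.

        import numpy as np
        from scipy.stats import kurtosis, skew

        def cov_det_features(eeg, win_len=512, step=256):
            """eeg: (n_channels, n_samples) array; returns one feature row per window."""
            n_channels, n_samples = eeg.shape
            rows = []
            for start in range(0, n_samples - win_len + 1, step):
                seg = eeg[:, start:start + win_len]
                cov_det = np.linalg.det(np.cov(seg))   # determinant of the channel covariance
                flat = seg.ravel()
                rows.append([cov_det, flat.mean(), flat.std(),
                             skew(flat), kurtosis(flat), np.median(flat)])
            return np.asarray(rows)

        # toy usage: 4 synthetic channels, 10 s at ~173.6 Hz (roughly the Bonn sampling rate)
        rng = np.random.default_rng(0)
        features = cov_det_features(rng.standard_normal((4, 1736)))
        print(features.shape)   # (n_windows, n_features)

    From there, scipy.stats.ks_2samp and scipy.stats.mannwhitneyu can be used to rank each feature column between the seizure and seizure-free groups, in the spirit of the KST/MWUT selection step.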

    Predictive Ensemble Modelling: An Experimental Comparison of Boosting Implementation Methods

    This paper presents an empirical comparison of boosting implemented by reweighting and by resampling. The goal is to determine which of the two methods performs better. In the study, we used four algorithms, namely Decision Stump, Neural Network, Random Forest and Support Vector Machine, as base classifiers and AdaBoost as the technique for developing the ensemble models. We applied 10-fold cross-validation to measure and evaluate the performance metrics of the models. The results show that with both methods the average rates of correctly and incorrectly classified instances are roughly the same, and the average RMSE values of the two methods do not differ significantly. The results further show that the relative behaviour of the two methods is independent of the dataset and the base classifier used. Additionally, we found that the complexity of the chosen ensemble technique and boosting method does not necessarily lead to better performance.
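    As a minimal sketch of the two boosting variants being compared (not the paper's exact experimental protocol), the loop below implements binary AdaBoost with a decision stump base learner and a toggle between reweighting (passing sample weights to the learner) and resampling (drawing a weighted bootstrap sample each round); the synthetic dataset and hyper-parameters are placeholders.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        def adaboost(X, y, rounds=50, resample=False, seed=0):
            """Binary AdaBoost with labels in {-1, +1}; `resample` toggles the two variants."""
            rng = np.random.default_rng(seed)
            n = len(y)
            w = np.full(n, 1.0 / n)
            learners, alphas = [], []
            for _ in range(rounds):
                stump = DecisionTreeClassifier(max_depth=1)
                if resample:                               # boosting by resampling
                    idx = rng.choice(n, size=n, replace=True, p=w)
                    stump.fit(X[idx], y[idx])
                else:                                      # boosting by reweighting
                    stump.fit(X, y, sample_weight=w)
                pred = stump.predict(X)
                err = np.sum(w[pred != y])                 # weighted error (weights sum to 1)
                if err <= 0 or err >= 0.5:                 # perfect or too weak: stop boosting
                    if err <= 0:
                        learners.append(stump); alphas.append(1.0)
                    break
                alpha = 0.5 * np.log((1 - err) / err)
                w *= np.exp(-alpha * y * pred)             # up-weight misclassified points
                w /= w.sum()
                learners.append(stump); alphas.append(alpha)
            return lambda Xq: np.sign(sum(a * h.predict(Xq) for a, h in zip(alphas, learners)))

        # toy comparison on a synthetic problem
        X, y = make_classification(n_samples=600, random_state=1)
        y = 2 * y - 1
        Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=1)
        for mode in (False, True):
            acc = (adaboost(Xtr, ytr, resample=mode)(Xte) == yte).mean()
            print("resample" if mode else "reweight", round(float(acc), 3))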

    ACCURACY DRIVEN ARTIFICIAL NEURAL NETWORKS IN STOCK MARKET PREDICTION


    Methods to Improve the Prediction Accuracy and Performance of Ensemble Models

    The application of ensemble predictive models has been an important research area in medical diagnostics, engineering diagnostics, and related smart devices and technologies. Most current predictive models are complex and not reliable despite numerous past efforts by the research community. The accuracy of predictive models has not always been realised due to factors such as complexity and class imbalance. There is therefore a need to improve the predictive accuracy of current ensemble models and to enhance their applications, reliability, and use in non-invasive predictive tools. The research work presented in this thesis adopts a pragmatic, phased approach to propose and develop new ensemble models using multiple methods, validated through rigorous testing and implementation in different phases. The first phase comprises empirical investigations on standalone and ensemble algorithms, carried out to ascertain the effects of classifier complexity and simplicity on performance. The second phase comprises an improved ensemble model based on the integration of the Extended Kalman Filter (EKF), Radial Basis Function Network (RBFN) and AdaBoost algorithms. The third phase comprises an extended model based on early-stop concepts, the AdaBoost algorithm, and statistical properties of the training samples, intended to minimise overfitting in the proposed model. The fourth phase comprises an enhanced analytical multivariate logistic regression model developed to reduce complexity and improve the prediction accuracy of logistic regression. To facilitate the practical application of the proposed models, an ensemble non-invasive analytical tool is proposed and developed; the tool bridges the gap between theoretical concepts and their practical application in predicting breast cancer survivability. The empirical findings suggest that: (1) increasing the complexity and topology of algorithms does not necessarily lead to better algorithmic performance; (2) boosting by resampling performs slightly better than boosting by reweighting; (3) the proposed ensemble EKF-RBFN-AdaBoost model predicts more accurately than several established ensemble models; (4) the proposed early-stopped model converges faster and limits overfitting better than comparable models; (5) the proposed multivariate logistic regression concept reduces model complexity; and (6) the proposed non-invasive analytical tool performs comparatively better than many of the benchmark analytical tools used in predicting breast cancer and diabetic ailments. The research contributions to ensemble practice are: (1) the integration and development of the EKF, RBFN and AdaBoost algorithms as an ensemble model; (2) the development and validation of an ensemble model based on early-stop concepts, AdaBoost, and statistical properties of the training samples; (3) the development and validation of a predictive logistic regression model for breast cancer; and (4) the development and validation of non-invasive breast cancer analytic tools based on the predictive models developed in this thesis. To validate the prediction accuracy of the ensemble models, the proposed models were applied to breast cancer survivability and diabetes diagnosis tasks; in comparison with other established models, the simulation results showed improved predictive accuracy. The research outlines the benefits of the proposed models and proposes new directions for future work that could further extend and improve them.
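    The thesis's own early-stop criterion, based on statistical properties of the training samples, is not reproduced here; as a generic illustration of stopping a boosted ensemble at the round that generalises best, the sketch below uses scikit-learn's staged predictions on a held-out validation split (the Wisconsin breast-cancer dataset and the hyper-parameters are stand-ins).

        import numpy as np
        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.model_selection import train_test_split

        X, y = load_breast_cancer(return_X_y=True)
        X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

        # Fit more rounds than we expect to need, then keep only the round with the
        # best validation accuracy instead of always using all of them.
        model = AdaBoostClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
        val_acc = [np.mean(pred == y_val) for pred in model.staged_predict(X_val)]
        best_round = int(np.argmax(val_acc)) + 1
        print(f"stop after {best_round} rounds: "
              f"val acc {val_acc[best_round - 1]:.3f} vs {val_acc[-1]:.3f} at round 200")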

    To weight or not to weight, that is the question: the design of a composite indicator of landscape fragmentation

    Composite indicators (CIs), i.e., combinations of many indicators into a single synthesizing measure, are useful for disentangling multisector phenomena. A prominent question concerns the weighting of indicators, which implies time-consuming activities and should be properly justified. Landscape fragmentation (LF), the subdivision of habitats into smaller and more isolated patches, has been studied through the composite index of landscape fragmentation (CILF), originally proposed by us as an unweighted combination of three LF indicators for studying the phenomenon in Sardinia, Italy. In this paper, we present a weighted release of the CILF and address the Hamletian question of whether weighting is worthwhile or not. We focus on the sensitivity of the composite to different algorithms combining three weighting patterns (equalization, extraction by principal component analysis, and expert judgment) and three aggregation rules (weighted arithmetic mean, weighted geometric mean, and weighted generalized geometric mean). The exercise provides meaningful results: higher sensitivity values signal that the effort of weighting leads to more informative composites, while high robustness does not mean that weighting was not worthwhile, since weighting per se can make decisional processes more acceptable and viable.
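    The CILF indicators and the weights actually used are not given in the abstract; as a small numerical illustration of the three aggregation rules (with made-up normalised indicator values and weights, and assuming the "generalized geometric mean" corresponds to a weighted power mean), the sketch below compares the aggregates under an equal and an expert-style weighting.

        import numpy as np

        def weighted_arithmetic(x, w):
            w = np.asarray(w, float) / np.sum(w)
            return float(np.dot(w, x))

        def weighted_geometric(x, w):
            w = np.asarray(w, float) / np.sum(w)
            return float(np.exp(np.dot(w, np.log(x))))          # prod x_i ** w_i

        def weighted_power_mean(x, w, p):
            """Weighted generalized mean; the geometric mean is its limit as p -> 0."""
            w = np.asarray(w, float) / np.sum(w)
            return float(np.dot(w, np.asarray(x, float) ** p) ** (1.0 / p))

        # three hypothetical LF indicators, already normalised to (0, 1]
        x = np.array([0.62, 0.35, 0.80])
        for name, w in [("equal weights", [1, 1, 1]), ("expert weights", [0.5, 0.3, 0.2])]:
            print(name,
                  round(weighted_arithmetic(x, w), 3),
                  round(weighted_geometric(x, w), 3),
                  round(weighted_power_mean(x, w, p=0.5), 3))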

    Clustering Based on Evidence Accumulation with a Possibilistic Fuzzy C-Means Approach for Disease Diagnosis

    Traditionally, supervised machine learning methods are the first choice for tasks involving classification of data. This study provides a non-conventional hybrid alternative technique (pEAC) that blends the Possibilistic Fuzzy C-Means (PFCM) algorithm, as the base cluster-generating algorithm, into the 'standard' Evidence Accumulation Clustering (EAC) method. PFCM coalesces the separate properties of the Possibilistic C-Means (PCM) and Fuzzy C-Means (FCM) algorithms into a sophisticated clustering algorithm. Notwithstanding the capabilities offered by this hybrid technique, in terms of structure it resembles the hEAC and fEAC ensemble clustering techniques, which are realised by integrating the K-Means and FCM clustering algorithms into the EAC technique. To validate the new technique's effectiveness, its performance on both synthetic and real medical datasets was evaluated alongside individual runs of well-known clustering methods, other unsupervised ensemble clustering techniques, and some supervised machine learning methods. Our results show that the proposed pEAC technique outperformed the individual runs of the clustering methods and the other unsupervised ensemble techniques in terms of accuracy for the diagnosis of the hepatitis, cardiovascular, breast cancer, and diabetes ailments used in the experiments. Remarkably, compared with selected supervised machine learning classification models, the proposed pEAC ensemble technique exhibits better diagnostic accuracy for the two breast cancer datasets used, which suggests that even without labelled data the proposed technique offers efficient medical data classification.
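    PFCM is not available in the common Python libraries, so the sketch below substitutes k-means base partitions; it illustrates only the evidence-accumulation idea that pEAC builds on (a co-association matrix accumulated over many base clusterings, then hierarchical clustering on its complement), not the authors' pEAC itself.

        import numpy as np
        from sklearn.cluster import AgglomerativeClustering, KMeans
        from sklearn.datasets import make_blobs

        def evidence_accumulation(X, n_runs=30, final_k=3, seed=0):
            """Build a co-association matrix from many base partitions, then cut it."""
            rng = np.random.default_rng(seed)
            n = len(X)
            coassoc = np.zeros((n, n))
            for _ in range(n_runs):
                k = int(rng.integers(2, 11))               # random k per run, as in classic EAC
                labels = KMeans(n_clusters=k, n_init=5,
                                random_state=int(rng.integers(0, 1_000_000))).fit_predict(X)
                coassoc += labels[:, None] == labels[None, :]
            coassoc /= n_runs
            # hierarchical clustering on the "1 - co-association" dissimilarity
            # (use affinity="precomputed" instead of metric= on scikit-learn < 1.2)
            final = AgglomerativeClustering(n_clusters=final_k, metric="precomputed",
                                            linkage="average")
            return final.fit_predict(1.0 - coassoc)

        X, _ = make_blobs(n_samples=300, centers=3, random_state=1)
        print(np.bincount(evidence_accumulation(X)))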

    Secure steganography, compression and diagnoses of electrocardiograms in wireless body sensor networks

    The usage of e-health applications is increasing in the modern era. Remote monitoring of cardiac patients is an important example of these e-health applications. Diagnosing cardiac disease in time is of crucial importance to save many patients' lives. More than 3.5 million Australians suffer from long-term cardiac diseases, so, ideally, a continuous cardiac monitoring system should be provided for this large number of patients. However, health-care providers lack the technology required to achieve this objective. Cloud services can be utilized to fill this technology gap, but three main problems prevent health-care providers from using them: privacy, performance, and accuracy of diagnoses. This thesis addresses these three problems. To provide strong privacy protection, two steganography techniques are proposed. Both techniques achieve promising results in terms of security and distortion measurement: the differences between the original and the resultant watermarked ECG signals were less than 1%, so the resultant ECG signal can still be used for diagnosis, and only authorized persons who have the required security information can extract the hidden secret data from the ECG signal. To solve the performance problem of storing huge amounts of ECG data in the cloud, two types of compression techniques are introduced: a fractal-based lossy compression technique and a Gaussian-based lossless compression technique. This thesis shows that fractal models can be used efficiently for ECG lossy compression; moreover, the proposed fractal technique is multi-processing ready, making it suitable for implementation inside a cloud to exploit its multi-processing capability, and a high compression ratio is achieved with low distortion. The Gaussian lossless compression technique is proposed to provide a high compression ratio. Moreover, because the compressed files are stored in the cloud, cloud services should be able to diagnose compressed ECG files without a decompression stage, to reduce additional processing overhead; the proposed Gaussian compression therefore provides the ability to diagnose the resultant compressed file directly. To exploit this homomorphic feature of the proposed Gaussian compression algorithm, this thesis introduces a new diagnosis technique that can detect life-threatening cardiac conditions such as Ventricular Tachycardia and Ventricular Fibrillation. The proposed technique is applied directly to the compressed ECG files without going through a decompression stage, and achieves accuracy near 100% for detecting Ventricular Arrhythmia and 96% for detecting Left Bundle Branch Block. Finally, we believe that this thesis takes the first steps towards encouraging health-care providers to use cloud services, although this journey is still long.
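    The watermarking and compression algorithms themselves are beyond the abstract; as a small illustration of how the "less than 1%" distortion between original and watermarked ECG signals can be quantified, the sketch below computes the percentage root-mean-square difference (PRD), a standard ECG distortion measure, on a synthetic signal with a toy watermark-like perturbation.

        import numpy as np

        def prd(original, processed):
            """Percentage root-mean-square difference between two equal-length signals."""
            original = np.asarray(original, float)
            processed = np.asarray(processed, float)
            return 100.0 * np.sqrt(np.sum((original - processed) ** 2) / np.sum(original ** 2))

        # toy example: a synthetic "ECG" plus a small watermark-like perturbation
        t = np.linspace(0.0, 1.0, 360)
        ecg = np.sin(2 * np.pi * 5 * t) + 0.25 * np.sin(2 * np.pi * 15 * t)
        watermarked = ecg + 0.002 * np.sign(np.sin(2 * np.pi * 50 * t))   # exaggerated hidden bits
        print(f"PRD = {prd(ecg, watermarked):.3f}%")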

    Improving Prediction Accuracy of Breast Cancer Survivability and Diabetes Diagnosis via RBF Networks trained with EKF models

    The continued reliance on machine learning algorithms and robotic devices in medical and engineering practice has increased the need for accurate prediction from such systems. The problem has attracted many researchers in recent years and has led to the development of various ensemble and standalone models to address prediction accuracy issues. This study investigates the integration of the Extended Kalman Filter (EKF), Radial Basis Function (RBF) networks and AdaBoost as an ensemble model to improve prediction accuracy. In this study, we propose a model termed EKF-RBFN-ADABOOST.
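    EKF training of the RBF network is beyond a short sketch, so the illustration below uses a simplified RBF network (k-means centres, a Gaussian hidden layer, and a weighted least-squares output layer in place of EKF training) wrapped so that scikit-learn's AdaBoostClassifier can boost it; the class name, hyper-parameters and the Wisconsin breast-cancer data are illustrative, and the classifier is binary only.

        import numpy as np
        from sklearn.base import BaseEstimator, ClassifierMixin
        from sklearn.cluster import KMeans
        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.metrics.pairwise import euclidean_distances
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        class SimpleRBFN(BaseEstimator, ClassifierMixin):
            """Binary RBF-network classifier: k-means centres + weighted least-squares output."""
            def __init__(self, n_centres=10, gamma=0.05, reg=1e-3, random_state=0):
                self.n_centres = n_centres
                self.gamma = gamma
                self.reg = reg
                self.random_state = random_state

            def _hidden(self, X):
                d2 = euclidean_distances(X, self.centres_, squared=True)
                return np.hstack([np.exp(-self.gamma * d2), np.ones((len(X), 1))])  # + bias unit

            def fit(self, X, y, sample_weight=None):
                X = np.asarray(X, float)
                self.classes_, yi = np.unique(y, return_inverse=True)
                t = 2.0 * yi - 1.0                                   # +/-1 targets (binary only)
                km = KMeans(n_clusters=self.n_centres, n_init=5,
                            random_state=self.random_state).fit(X)
                self.centres_ = km.cluster_centers_
                H = self._hidden(X)
                w = np.ones(len(X)) if sample_weight is None else np.asarray(sample_weight, float)
                sw = np.sqrt(w)[:, None]                             # weighted least squares
                A = (sw * H).T @ (sw * H) + self.reg * np.eye(H.shape[1])
                self.beta_ = np.linalg.solve(A, (sw * H).T @ (sw.ravel() * t))
                return self

            def predict(self, X):
                score = self._hidden(np.asarray(X, float)) @ self.beta_
                return self.classes_[(score > 0).astype(int)]

        # boost the RBF network; algorithm="SAMME" is explicit for older scikit-learn
        # versions (newer releases use SAMME only and may drop the argument entirely)
        X, y = load_breast_cancer(return_X_y=True)
        boosted = make_pipeline(StandardScaler(),
                                AdaBoostClassifier(estimator=SimpleRBFN(),
                                                   n_estimators=25, algorithm="SAMME"))
        print(round(float(cross_val_score(boosted, X, y, cv=5).mean()), 3))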

    Developing artificial intelligence models for classification of brain disorder diseases based on statistical techniques

    The abstract is currently unavailable due to the thesis being under embargo.

    EXPLAINABLE FEATURE- AND DECISION-LEVEL FUSION

    Information fusion is the process of aggregating knowledge from multiple data sources to produce more consistent, accurate, and useful information than any one individual source can provide. In general, there are three primary sources of data/information: humans, algorithms, and sensors. Typically, objective data, e.g., measurements, arise from sensors. Using these data sources, applications such as computer vision and remote sensing have long been applying fusion at different levels (signal, feature, decision, etc.). Furthermore, daily advancements in engineering technologies like smart cars, which operate in complex and dynamic environments using multiple sensors, are raising both the demand for and the complexity of fusion. There is a great need to discover new theories to combine and analyze heterogeneous data arising from one or more sources. The work collected in this dissertation addresses the problem of feature- and decision-level fusion. Specifically, this work focuses on fuzzy Choquet integral (ChI)-based data fusion methods. Most mathematical approaches for data fusion have focused on combining inputs under the assumption of independence between them. However, often there are rich interactions (e.g., correlations) between inputs that should be exploited. The ChI is a powerful aggregation tool that is capable of modeling these interactions. Consider the fusion of m sources, where there are 2^m unique subsets (interactions); the ChI is capable of learning the worth of each of these possible source subsets. However, the complexity of fuzzy integral-based methods grows quickly, as the number of trainable parameters for the fusion of m sources scales as 2^m. Hence, we require a large amount of training data to avoid the problem of over-fitting. This work addresses the over-fitting problem of ChI-based data fusion with novel regularization strategies. These regularization strategies alleviate the issue of over-fitting while training with limited data and also enable the user to consciously push the learned methods to take a predefined, or perhaps known, structure. Also, the existing methods for training the ChI for decision- and feature-level data fusion involve quadratic programming (QP). The QP-based learning approach for learning ChI-based data fusion solutions has a high space complexity. This has limited the practical application of ChI-based data fusion methods to six or fewer input sources. To address the space complexity issue, this work introduces an online training algorithm for learning the ChI. The online method is an iterative gradient descent approach that processes one observation at a time, enabling the applicability of ChI-based data fusion to higher-dimensional data sets. In many real-world data fusion applications, it is imperative to have an explanation or interpretation. This may include providing information on what was learned, what the worth of individual sources is, why a decision was reached, what evidence process(es) were used, and what confidence the system has in its decision. However, most existing machine learning solutions for data fusion are black boxes, e.g., deep learning. In this work, we designed methods and metrics that help answer these questions of interpretation, and we also developed visualization methods that help users better understand the machine learning solution and its behavior for different instances of data.
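    As a concrete illustration of the aggregation operator at the core of this work (not the dissertation's learned models or regularizers), the sketch below evaluates the discrete Choquet integral of m = 3 inputs with respect to a fuzzy measure given explicitly on all 2^3 subsets; the measure values are made up but monotone.

        import numpy as np

        def choquet_integral(h, g):
            """Discrete Choquet integral of the inputs h (length m) w.r.t. fuzzy measure g.

            g maps frozensets of source indices to [0, 1], with g(empty set) = 0 and
            g(all sources) = 1, and must be monotone with respect to set inclusion.
            """
            order = np.argsort(h)[::-1]                  # sort sources by descending value
            total, prev, chain = 0.0, 0.0, set()
            for i in order:
                chain.add(int(i))
                g_cur = g[frozenset(chain)]
                total += h[i] * (g_cur - prev)           # weight = increment of the measure
                prev = g_cur
            return total

        # toy fuzzy measure on 3 sources (all 2^3 subsets; values are illustrative only)
        g = {frozenset(): 0.0,
             frozenset({0}): 0.4, frozenset({1}): 0.3, frozenset({2}): 0.2,
             frozenset({0, 1}): 0.8, frozenset({0, 2}): 0.5, frozenset({1, 2}): 0.6,
             frozenset({0, 1, 2}): 1.0}
        print(choquet_integral(np.array([0.7, 0.2, 0.9]), g))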