
    A comparison of machine learning algorithms for multilabel classification of CAN

    This article investigates and compares several important machine learning algorithms in their ability to obtain multilabel classifications of the stages of cardiac autonomic neuropathy (CAN). Data was collected by the Diabetes Complications Screening Research Initiative at Charles Sturt University. Our experiments have achieved better results than those published previously in the literature for similar CAN identification tasks.

    Rule-based classifiers and meta classifiers for identification of cardiac autonomic neuropathy progression

    We investigate and compare several rule-based classifiers and meta classifiers in their ability to obtain multi-class classifications of cardiac autonomic neuropathy (CAN) and its progression. The best results obtained in our experiments are significantly better than the outcomes published previously in the literature for analogous CAN identification tasks or simpler binary classification tasks.

    Decision trees and multi-level ensemble classifiers for neurological diagnostics

    Cardiac autonomic neuropathy (CAN) is a well-known complication of diabetes that leads to impaired regulation of blood pressure and heart rate and increases the risk of cardiac-associated mortality in diabetes patients. The neurological diagnostics of CAN progression is an important problem that is being actively investigated. This paper uses data collected as part of a large and unique Diabetes Screening Complications Research Initiative (DiScRi) in Australia, with data from numerous tests related to diabetes, to classify CAN progression. The paper is devoted to recent experimental investigations of the effectiveness of decision trees, ensemble classifiers and multi-level ensemble classifiers for neurological diagnostics of CAN. We present the results of experiments comparing the effectiveness of the ADTree, J48, NBTree, RandomTree, REPTree and SimpleCart decision tree classifiers. Our results show that SimpleCart was the most effective for the DiScRi data set in classifying CAN. We also investigated and compared the effectiveness of AdaBoost, Bagging, MultiBoost, Stacking, Decorate, Dagging and Grading, based on Ripple Down Rules, as examples of ensemble classifiers. Further, we investigated the effectiveness of these ensemble methods as a function of the base classifiers, and determined that Random Forest performed best as a base classifier, and that AdaBoost, Bagging and Decorate achieved the best outcomes as meta-classifiers in this setting. Finally, we investigated the ability of the best-performing meta-classifiers to enhance performance further within the framework of a multi-level classification paradigm. Experimental results show that the multi-level paradigm performed best when Bagging and Decorate were combined in the construction of a multi-level ensemble classifier.

    Machine learning algorithms for analysis of DNA data sets

    The applications of machine learning algorithms to the analysis of data sets of DNA sequences are very important. The present chapter is devoted to the experimental investigation of applications of several machine learning algorithms for the analysis of a JLA data set consisting of DNA sequences derived from non-coding segments in the junction of the large single copy region and inverted repeat A of the chloroplast genome in Eucalyptus, collected by Australian biologists. Data sets of this sort represent a new situation, where sophisticated alignment scores have to be used as a measure of similarity. The alignment scores do not satisfy the properties of the Minkowski metric, and new machine learning approaches have to be investigated. The authors' experiments show that machine learning algorithms based on local alignment scores achieve very good agreement with known biological classes for this data set. A new machine learning algorithm based on graph partitioning performed best for clustering of the JLA data set. The authors' novel k-committees algorithm produced the most accurate results for classification. Two new examples of synthetic data sets demonstrate that the k-committees algorithm can outperform both the Nearest Neighbour and k-medoids algorithms simultaneously.
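
To make the idea of classifying by local alignment score concrete, the following is a minimal sketch, not the chapter's method. It assumes the standard Smith-Waterman recurrence with a linear gap penalty and arbitrary illustrative scoring values (`match=2`, `mismatch=-1`, `gap=-2`), and it uses plain nearest-neighbour classification rather than the k-committees algorithm, whose details are not given in the abstract. The function names and toy sequences are hypothetical.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the source.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def nearest_neighbour_label(query, labelled):
    """Return the label of the training sequence most similar to `query`.

    `labelled` is a list of (sequence, label) pairs; similarity is the
    alignment score, so the maximum is taken instead of a minimum distance.
    """
    _, label = max(labelled, key=lambda item: local_alignment_score(query, item[0]))
    return label


# Toy data, invented for illustration only.
training = [("ACGTACGT", "class_a"), ("TTGGCCAA", "class_b")]
predicted = nearest_neighbour_label("CGTACG", training)
```

Note that the score of a sequence against itself grows with its length (4 for `"AC"`, 8 for `"ACGT"` under these settings), which is one simple way to see that such a similarity does not behave like a distance.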

    Automatic generation of meta classifiers with large levels for distributed computing and networking

    This paper is devoted to a case study of a new construction of classifiers, called automatically generated multi-level meta classifiers (AGMLMC). The construction combines diverse meta classifiers in a new way to create a unified system. It can be generated automatically, producing classifiers with large levels. Different meta classifiers are incorporated as low-level integral parts of another meta classifier at the top level. The construction is intended for distributed computing and networking. The AGMLMC classifiers are unified classifiers with many parts that can operate in parallel, which makes it easy to adopt them in distributed applications. This paper introduces the new construction and undertakes an experimental study of its performance in the special case of the detection and filtering of phishing emails, a potentially important application area for such large and distributed classification systems. Our experiments investigate the effectiveness of combining diverse meta classifiers into one AGMLMC classifier. The results show that the new classifiers with large levels achieved better performance than the base classifiers and simple meta classifiers. This demonstrates that the new technique can be applied to increase performance if diverse meta classifiers are included in the system.
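
The nesting described above, with meta classifiers acting as members of a higher-level meta classifier, can be sketched as follows. This is a simplified illustration under stated assumptions, not the AGMLMC construction itself: it nests only Bagging (the paper combines diverse meta classifiers), uses a one-threshold decision stump as the base classifier, and all class and function names (`Stump`, `Bagging`, `build_multilevel`) are hypothetical.

```python
import random
from collections import Counter


class Stump:
    """Base classifier: one threshold on one feature, minimising training errors."""

    def fit(self, X, y):
        best = None
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                left = [lab for row, lab in zip(X, y) if row[f] <= t]
                right = [lab for row, lab in zip(X, y) if row[f] > t]
                l_lab = Counter(left).most_common(1)[0][0]
                r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
                errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
                if best is None or errors < best[0]:
                    best = (errors, f, t, l_lab, r_lab)
        _, self.f, self.t, self.l_lab, self.r_lab = best
        return self

    def predict(self, row):
        return self.l_lab if row[self.f] <= self.t else self.r_lab


class Bagging:
    """Meta classifier: majority vote over members trained on bootstrap samples."""

    def __init__(self, make_member, n_members=3, seed=0):
        self.make_member, self.n_members, self.seed = make_member, n_members, seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        self.members = []
        for _ in range(self.n_members):
            idx = [rng.randrange(len(X)) for _ in X]  # sample with replacement
            member = self.make_member()
            self.members.append(member.fit([X[i] for i in idx], [y[i] for i in idx]))
        return self

    def predict(self, row):
        return Counter(m.predict(row) for m in self.members).most_common(1)[0][0]


def build_multilevel(levels, n_members=3):
    """Wrap the base classifier in `levels` nested meta-classifier layers."""
    factory = Stump
    for _ in range(levels):
        # Bind the current factory so each layer builds the layer below it.
        factory = (lambda inner: lambda: Bagging(inner, n_members))(factory)
    return factory()


# Toy one-feature data, invented for illustration only.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
model = build_multilevel(levels=2).fit(X, y)
```

Because each member is trained independently of the others, the members of any one level could in principle be fitted in parallel, which is the property the abstract highlights for distributed settings.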

    Empirical investigation of decision tree ensembles for monitoring cardiac complications of diabetes

    Cardiac complications of diabetes require continuous monitoring since they may lead to increased morbidity or sudden death of patients. In order to monitor clinical complications of diabetes using wearable sensors, a small set of features has to be identified and effective algorithms for their processing need to be investigated. This article focuses on detecting and monitoring cardiac autonomic neuropathy (CAN) in diabetes patients. The authors investigate and compare the effectiveness of classifiers based on the following decision trees: ADTree, J48, NBTree, RandomTree, REPTree, and SimpleCart. The authors perform a thorough study comparing these decision trees as well as several decision tree ensembles created by applying the following ensemble methods: AdaBoost, Bagging, Dagging, Decorate, Grading, MultiBoost, Stacking, and two multi-level combinations of AdaBoost and MultiBoost with Bagging, for the processing of data from diabetes patients for pervasive health monitoring of CAN. The paper concentrates on the particular task of applying decision tree ensembles to the detection and monitoring of CAN using these features. The experimental outcomes show that the authors' application of decision tree ensembles to the detection and monitoring of CAN in diabetes patients achieved better performance than the results reported previously in the literature.

    Applying clustering and ensemble clustering approaches to phishing profiling

    This paper describes a novel approach to profiling phishing emails based on the combination of multiple independent clusterings of the email documents. Each clustering is motivated by a natural representation of the emails. A data set of 2048 phishing emails provided by a major Australian financial institution was pre-processed to extract features describing the textual content, hyperlinks and orthographic structure of the emails. Independent clusterings using different techniques were performed on each representation, and these clusterings were then ensembled using a variety of consensus functions. This paper concentrates on using several clustering approaches to determine the most likely number of phishing groups and explores ways in which individual and combined results relate. The approach suggests a number of phishing groups and the structure of the approach can aid the development of profiles based on the individual clusters. The actual profiling is not carried out in this paper.
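
One common way to combine independent clusterings is a co-association (evidence accumulation) consensus, sketched below. The abstract does not say which consensus functions the paper used, so this is an assumed example rather than the authors' procedure; the `threshold` value, the function names and the toy labelings are all hypothetical.

```python
def co_association(clusterings):
    """Fraction of clusterings in which each pair of items shares a cluster."""
    n = len(clusterings[0])
    return [[sum(c[i] == c[j] for c in clusterings) / len(clusterings)
             for j in range(n)]
            for i in range(n)]


def consensus_clusters(clusterings, threshold=0.5):
    """Group items whose co-association exceeds `threshold`.

    Groups are the connected components of the thresholded co-association
    graph; the number of distinct labels returned is the suggested group count.
    """
    co = co_association(clusterings)
    n = len(co)
    labels = [None] * n
    next_label = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] is None and co[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Toy labelings of five emails under three representations (invented data).
by_text = [0, 0, 1, 1, 1]
by_links = [0, 0, 0, 1, 1]
by_orthography = [1, 1, 0, 0, 0]  # label names need not agree across clusterings
consensus = consensus_clusters([by_text, by_links, by_orthography])
```

Only co-membership matters here, so the three clusterings can use unrelated label names, and the number of distinct consensus labels gives one estimate of the number of groups.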

    Phishing detection and traceback mechanism

    Isredza Rahmi A Hamid's thesis, entitled Phishing Detection and Traceback Mechanism, investigates the detection of phishing attacks through email, a novel method to profile the attacker, and tracking the attack back to its origin.