29 research outputs found

    Decision trees and multi-level ensemble classifiers for neurological diagnostics

    Cardiac autonomic neuropathy (CAN) is a well-known complication of diabetes that impairs the regulation of blood pressure and heart rate and increases the risk of cardiac-associated mortality in diabetes patients. The neurological diagnostics of CAN progression is an important problem that is being actively investigated. This paper uses data collected as part of a large and unique Diabetes Screening Complications Research Initiative (DiScRi) in Australia, with data from numerous tests related to diabetes, to classify CAN progression. The paper reports recent experimental investigations of the effectiveness of decision trees, ensemble classifiers and multi-level ensemble classifiers for neurological diagnostics of CAN. We present the results of experiments comparing the effectiveness of the ADTree, J48, NBTree, RandomTree, REPTree and SimpleCart decision tree classifiers. Our results show that SimpleCart was the most effective for the DiScRi data set in classifying CAN. We also investigated and compared the effectiveness of AdaBoost, Bagging, MultiBoost, Stacking, Decorate, Dagging, and Grading, based on Ripple Down Rules, as examples of ensemble classifiers. Further, we investigated the effectiveness of these ensemble methods as a function of the base classifiers, and determined that Random Forest performed best as a base classifier, and that AdaBoost, Bagging and Decorate achieved the best outcomes as meta-classifiers in this setting. Finally, we investigated how far the best-performing meta-classifiers could enhance performance further within a multi-level classification paradigm. Experimental results show that the multi-level paradigm performed best when Bagging and Decorate were combined in the construction of a multi-level ensemble classifier.

    Performance evaluation of multi-tier ensemble classifiers for phishing websites

    This article is devoted to large multi-tier ensemble classifiers generated as ensembles of ensembles and applied to phishing websites. Our new ensemble construction is a special case of the general and productive multi-tier approach well known in information security. Many efficient multi-tier classifiers have been considered in the literature. Our new contribution is in generating new large systems as ensembles of ensembles by linking a top-tier ensemble to another middle-tier ensemble instead of a base classifier, so that the top-tier ensemble can generate the whole system. This automatic generation capability includes many large ensemble classifiers in two tiers simultaneously and automatically combines them into one hierarchical unified system, so that one ensemble is an integral part of another one. This new construction makes it easy to set up and run such large systems. The present article concentrates on the performance of these new multi-tier ensembles for the example of detection of phishing websites. We carried out systematic experiments evaluating several essential ensemble techniques as well as more recent approaches and studying their performance as parts of multi-level ensembles with three tiers. The results presented here demonstrate that the new three-tier ensemble classifiers performed better than the base classifiers and standard ensembles included in the system. This application to the classification of phishing websites shows that the new method of combining diverse ensemble techniques into a unified hierarchical three-tier ensemble can be applied to increase the performance of classifiers in situations where data can be processed on a large computer.
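The "ensemble of ensembles" idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation (their experiments combine several different ensemble techniques, whereas this sketch nests only bagging); it uses pure Python and only shows the linking step: a top-tier ensemble receives a factory for a middle-tier ensemble where it would normally receive a base classifier. The names `Stump`, `Bagging`, `build_multi_tier` and `demo_accuracy` are hypothetical, and the toy grid data is an assumption made for illustration.

```python
import itertools
import random
from collections import Counter


class Stump:
    """Bottom tier: a one-feature threshold rule for 0/1 labels."""

    def fit(self, X, y):
        best = None
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                for left, right in ((0, 1), (1, 0)):
                    errors = sum((left if row[f] <= t else right) != label
                                 for row, label in zip(X, y))
                    if best is None or errors < best[0]:
                        best = (errors, f, t, left, right)
        _, self.f, self.t, self.left, self.right = best
        return self

    def predict(self, X):
        return [self.left if row[self.f] <= self.t else self.right for row in X]


class Bagging:
    """Majority vote over members trained on bootstrap resamples.

    `make_member` is any zero-argument factory returning an object with
    fit/predict, so a member can itself be another ensemble.
    """

    def __init__(self, make_member, n_members, seed):
        self.make_member = make_member
        self.n_members = n_members
        self.rng = random.Random(seed)

    def fit(self, X, y):
        self.members = []
        n = len(X)
        for _ in range(self.n_members):
            idx = [self.rng.randrange(n) for _ in range(n)]
            member = self.make_member()
            member.fit([X[i] for i in idx], [y[i] for i in idx])
            self.members.append(member)
        return self

    def predict(self, X):
        votes = [m.predict(X) for m in self.members]
        return [Counter(col).most_common(1)[0][0] for col in zip(*votes)]


def build_multi_tier(n_top=5, n_middle=5, seed=0):
    """Top-tier bagging whose members are middle-tier bagging ensembles of stumps."""
    seeds = itertools.count(seed + 1)  # a distinct seed for every middle-tier ensemble

    def middle_tier():
        return Bagging(Stump, n_middle, next(seeds))

    return Bagging(middle_tier, n_top, seed)


def demo_accuracy():
    """Training accuracy on a toy 10x10 grid (label 1 above the anti-diagonal)."""
    X = [[i / 10, j / 10] for i in range(10) for j in range(10)]
    y = [int(i + j >= 10) for i in range(10) for j in range(10)]
    model = build_multi_tier().fit(X, y)
    return sum(p == t for p, t in zip(model.predict(X), y)) / len(y)


if __name__ == "__main__":
    print(f"training accuracy: {demo_accuracy():.2f}")
```

Because `Bagging` only requires a factory with `fit` and `predict`, calling `build_multi_tier()` is enough to generate the whole hierarchy; swapping `middle_tier` for a different ensemble type would give a more diverse system of the kind the abstract describes.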

    Improving classifications for cardiac autonomic neuropathy using multi-level ensemble classifiers and feature selection based on random forest

    This paper is devoted to an empirical investigation of novel multi-level ensemble meta classifiers for the detection and monitoring of progression of cardiac autonomic neuropathy (CAN) in diabetes patients. Our experiments relied on an extensive database and concentrated on ensembles of ensembles, or multi-level meta classifiers, for the classification of cardiac autonomic neuropathy progression. First, we carried out a thorough investigation comparing the performance of various base classifiers for several known sets of the most essential features in this database and determined that Random Forest significantly and consistently outperforms all other base classifiers in this new application. Second, we used the feature selection and ranking implemented in Random Forest. It identified a new set of features, which turned out better than all other sets previously considered for this large and well-known database. Random Forest remained the best classifier for the new set of features too. Third, we investigated meta classifiers and new multi-level meta classifiers based on Random Forest, which improved its performance. The results show that the novel multi-level meta classifiers achieved further improvement and obtained outcomes that are significantly better than those previously published in the literature for cardiac autonomic neuropathy.

    Automatic generation of meta classifiers with large levels for distributed computing and networking

    This paper is devoted to a case study of a new construction of classifiers. These classifiers are called automatically generated multi-level meta classifiers (AGMLMC). The construction combines diverse meta classifiers in a new way to create a unified system. This original construction can be generated automatically, producing classifiers with large levels. Different meta classifiers are incorporated as low-level integral parts of another meta classifier at the top level. The construction is intended for distributed computing and networking. The AGMLMC classifiers are unified classifiers with many parts that can operate in parallel. This makes it easy to adopt them in distributed applications. This paper introduces the new construction of classifiers and undertakes an experimental study of their performance. We look at a case study of their effectiveness in the special case of the detection and filtering of phishing emails. This is a possible important application area for such large and distributed classification systems. Our experiments investigate the effectiveness of combining diverse meta classifiers into one AGMLMC classifier in this case study. The results show that the new classifiers with large levels achieved better performance than the base classifiers and simple meta classifiers. This demonstrates that the new technique can be applied to increase performance if diverse meta classifiers are included in the system.

    Empirical investigation of decision tree ensembles for monitoring cardiac complications of diabetes

    Cardiac complications of diabetes require continuous monitoring since they may lead to increased morbidity or sudden death of patients. In order to monitor clinical complications of diabetes using wearable sensors, a small set of features has to be identified and effective algorithms for their processing need to be investigated. This article focuses on detecting and monitoring cardiac autonomic neuropathy (CAN) in diabetes patients. The authors investigate and compare the effectiveness of classifiers based on the following decision trees: ADTree, J48, NBTree, RandomTree, REPTree, and SimpleCart. The authors perform a thorough study comparing these decision trees as well as several decision tree ensembles created by applying the following ensemble methods: AdaBoost, Bagging, Dagging, Decorate, Grading, MultiBoost, Stacking, and two multi-level combinations of AdaBoost and MultiBoost with Bagging, for the processing of data from diabetes patients for pervasive health monitoring of CAN. This paper concentrates on the particular task of applying decision tree ensembles for the detection and monitoring of cardiac autonomic neuropathy using these features. Experimental outcomes presented here show that the authors' application of the decision tree ensembles for the detection and monitoring of CAN in diabetes patients achieved better performance parameters than the results obtained previously in the literature.

    Fusion of Bayesian algorithms and classification trees as a proposal for the supervised classification of equipment failures in a computer laboratory

    Algorithms based on Bayesian networks and decision trees have proven efficient for solving classification problems. This work aims to combine these algorithms in order to obtain a hybrid model that exploits and combines the advantages of both. The strategy is intended to increase the accuracy of supervised classification results. The work sets out the degree of accuracy achieved when Bayesian algorithms are combined with decision trees using the Grading and Vote fusion (ensemble) methods. The resulting hybrid models will be applied to the classification of failure events in equipment belonging to a computer laboratory, with the aim of increasing its availability and maintainability. Track: Agents and Intelligent Systems. Red de Universidades con Carreras en Informática (RedUNCI

    GA-stacking: Evolutionary stacked generalization

    Stacking is a widely used technique for combining classifiers and improving prediction accuracy. Early research in Stacking showed that selecting the right classifiers, their parameters and the meta-classifiers was a critical issue. Most of the research on this topic hand picks the right combination of classifiers and their parameters. Instead of starting from these initial strong assumptions, our approach uses genetic algorithms to search for good Stacking configurations. Since this can lead to overfitting, one of the goals of this paper is to empirically evaluate the overall efficiency of the approach. A second goal is to compare our approach with the current best Stacking building techniques. The results show that our approach finds Stacking configurations that, in the worst case, perform as well as the best techniques, with the advantage of not having to manually set up the structure of the Stacking system. This work has been partially supported by the Spanish MCyT under projects TRA2007-67374-C02-02 and TIN-2005-08818-C04. Also, it has been supported under MEC grant by TIN2005-08945-C06-05. We thank anonymous reviewers for their helpful comments. Publicad
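To make the idea of searching for Stacking configurations with a genetic algorithm concrete, here is a small sketch in pure Python. It does not reproduce the paper's encoding, operators or classifiers, none of which are given in the abstract. The assumptions: a chromosome is a bit mask over four hypothetical threshold rules acting as level-0 classifiers, the level-1 learner is a lookup table trained on their predictions, and fitness is accuracy on a held-out validation set. Scoring on held-out data is one common guard against the overfitting the abstract mentions.

```python
import random
from collections import Counter, defaultdict


def make_data(n, rng):
    """Toy data: four uniform features; the label depends only on the first two."""
    rows = []
    for _ in range(n):
        x = [rng.random() for _ in range(4)]
        rows.append((x, int(x[0] > 0.5 and x[1] > 0.5)))
    return rows


# Level-0 classifiers: one fixed threshold rule per feature (hypothetical stand-ins).
BASE = [lambda x, f=f: int(x[f] > 0.5) for f in range(4)]


def level0(x, mask):
    """Predictions of the level-0 classifiers switched on by the chromosome."""
    return tuple(clf(x) for clf, bit in zip(BASE, mask) if bit)


def fit_meta(rows, mask):
    """Level-1 learner: table from level-0 prediction tuples to the majority label."""
    table = defaultdict(Counter)
    for x, y in rows:
        table[level0(x, mask)][y] += 1
    return {key: votes.most_common(1)[0][0] for key, votes in table.items()}


def fitness(mask, meta_train, valid):
    """Validation accuracy of the stacked model described by `mask`."""
    meta = fit_meta(meta_train, mask)
    hits = sum(meta.get(level0(x, mask), 0) == y for x, y in valid)
    return hits / len(valid)


def ga_search(meta_train, valid, rng, pop_size=8, generations=10, p_mut=0.1):
    """Elitist GA: tournament selection, one-point crossover, bit-flip mutation."""

    def score(mask):
        return fitness(mask, meta_train, valid)

    pop = [[rng.randint(0, 1) for _ in BASE] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=score)]  # keep the best configuration unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 2), key=score)
            b = max(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, len(BASE))
            child = a[:cut] + b[cut:]
            nxt.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = nxt
    return max(pop, key=score)


if __name__ == "__main__":
    rng = random.Random(0)
    train, valid = make_data(200, rng), make_data(100, rng)
    best = ga_search(train, valid, rng)
    print("selected level-0 classifiers:", best)
    print("validation accuracy:", fitness(best, train, valid))
```

A fuller configuration search would also encode classifier parameters and the choice of meta-classifier in the chromosome, and would report final accuracy on a third data split that the search never sees.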

    Decision support system for failure management in industrial equipment, based on ensemble methods

    Failures in industrial equipment are critical events for any organization. Classifying and characterizing them is an important factor in supporting decision making in maintenance activities. Data mining has played a significant role in evaluating and classifying the failures that occur. Algorithms based on Bayesian networks and decision trees have been used, individually and together, to build hybrid classification models for evaluating and characterizing failures. This work proposes the development of hybrid models using the Grading and Vote ensemble methods, combining Bayesian network techniques (BayesNet and NaiveBayesUpdateable) with decision trees (RandomTree). The accuracy of the ensemble methods with the different algorithms is determined through experiments on the same partitioned data set. Sociedad Argentina de Informática e Investigación Operativ