51 research outputs found

    Empirical investigation of decision tree ensembles for monitoring cardiac complications of diabetes

    Cardiac complications of diabetes require continuous monitoring since they may lead to increased morbidity or sudden death of patients. In order to monitor clinical complications of diabetes using wearable sensors, a small set of features has to be identified and effective algorithms for processing them need to be investigated. This article focuses on detecting and monitoring cardiac autonomic neuropathy (CAN) in diabetes patients. The authors investigate and compare the effectiveness of classifiers based on the following decision trees: ADTree, J48, NBTree, RandomTree, REPTree, and SimpleCart. The authors perform a thorough study comparing these decision trees as well as several decision tree ensembles created by applying the following ensemble methods: AdaBoost, Bagging, Dagging, Decorate, Grading, MultiBoost, Stacking, and two multi-level combinations of AdaBoost and MultiBoost with Bagging, for the processing of data from diabetes patients for pervasive health monitoring of CAN. This paper concentrates on the particular task of applying decision tree ensembles for the detection and monitoring of cardiac autonomic neuropathy using these features. The experimental outcomes presented here show that the authors' application of decision tree ensembles to the detection and monitoring of CAN in diabetes patients achieved better performance than the results previously reported in the literature.

    Improving classifications for cardiac autonomic neuropathy using multi-level ensemble classifiers and feature selection based on random forest

    This paper is devoted to an empirical investigation of novel multi-level ensemble meta classifiers for the detection and monitoring of the progression of cardiac autonomic neuropathy (CAN) in diabetes patients. Our experiments relied on an extensive database and concentrated on ensembles of ensembles, or multi-level meta classifiers, for the classification of cardiac autonomic neuropathy progression. First, we carried out a thorough investigation comparing the performance of various base classifiers for several known sets of the most essential features in this database and determined that Random Forest significantly and consistently outperforms all other base classifiers in this new application. Second, we used the feature selection and ranking implemented in Random Forest. It identified a new set of features, which turned out to be better than all other sets previously considered for this large and well-known database. Random Forest remained the very best classifier for the new set of features too. Third, we investigated meta classifiers and new multi-level meta classifiers based on Random Forest, which improved its performance. The results obtained show that the novel multi-level meta classifiers achieved further improvement and produced outcomes that are significantly better than those previously published in the literature for cardiac autonomic neuropathy.

    A Comparative Study of Machine Learning Classifiers for Credit Card Fraud Detection

    Credit card transactions have been gaining popularity with the growth of e-commerce and show tremendous opportunity for the future. Given this surge in credit card transactions, securing them is a pressing need. Although vendors and card-issuing authorities are dedicated to securing the details of these transactions, researchers continue to search for new techniques to ensure absolute security, which the times demand. Machine learning and computational intelligence, along with other technologies, can play a vital role in detecting credit card fraud. To detect credit card anomalies, this paper analyzes and compares several popular classifier algorithms, focusing on their performance. The UCSD-FICO Data Mining Contest 2009 dataset was used to measure the performance of the classifiers. The final results of the experiment suggest that (1) meta and tree classifiers perform better than other types of classifiers, and (2) although the classification accuracy rate is high, the fraud detection success rate is low. Finally, the fraud detection rate should be taken into consideration when assessing the performance of classifiers in a credit card fraud detection system.
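The gap between a high accuracy rate and a low fraud detection rate can be illustrated with a small sketch. The function name and the transaction counts below are hypothetical and are not taken from the paper or from the UCSD-FICO dataset; the sketch assumes that "fraud detection rate" means recall on the fraud class.

```python
def accuracy_and_detection_rate(y_true, y_pred, fraud_label=1):
    """Return (accuracy, fraud detection rate); the latter is recall on the fraud class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    fraud_total = sum(t == fraud_label for t in y_true)
    fraud_caught = sum(
        t == fraud_label and p == fraud_label for t, p in zip(y_true, y_pred)
    )
    accuracy = correct / len(y_true)
    detection_rate = fraud_caught / fraud_total if fraud_total else 0.0
    return accuracy, detection_rate


# Hypothetical, heavily imbalanced data: 990 legitimate and 10 fraudulent transactions.
y_true = [0] * 990 + [1] * 10
# A hypothetical classifier that flags only 2 of the 10 frauds and raises no false alarms.
y_pred = [0] * 990 + [1] * 2 + [0] * 8

accuracy, detection_rate = accuracy_and_detection_rate(y_true, y_pred)
print(f"accuracy={accuracy:.3f}, fraud detection rate={detection_rate:.3f}")
```

With class imbalance of this kind, a classifier can be right on almost every transaction while still missing most frauds, which is why the two figures are worth reporting side by side.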

    Bagged Randomized Conceptual Machine Learning Method

    Formal concept analysis (FCA) is a scientific approach that aims to investigate, analyze, and represent the conceptual knowledge deduced from data in conceptual structures (lattices). Recently, many researchers have been counting on the potential of FCA to resolve, or contribute to addressing, machine learning problems. However, some of these heuristics are still far from achieving this goal. In another context, ensemble learning methods are deemed effective in addressing the classification problem; in addition, introducing randomness into ensemble learning has been found effective in certain scenarios. We exploit the potential of FCA and the notion of randomness in ensemble learning, and propose a new machine learning method based on random conceptual decomposition. We also propose a novel approach for rule optimization. We develop an effective learning algorithm that is capable of handling some aspects of the learning problem, with results that are comparable to those of other ensemble learning algorithms.

    Landslide susceptibility mapping using machine learning: A literature survey

    Landslide is a devastating natural disaster, causing loss of life and property. It is likely to occur more frequently due to increasing urbanization, deforestation, and climate change. Landslide susceptibility mapping is vital to safeguard life and property. This article surveys machine learning (ML) models used for landslide susceptibility mapping to understand the current trend by analyzing published articles based on the ML models, landslide causative factors (LCFs), study location, datasets, evaluation methods, and model performance. Existing literature considered in this comprehensive survey is systematically selected using the ROSES protocol. The trend indicates a growing interest in the field. The choice of LCFs depends on data availability and case study location; China is the most studied location, and area under the receiver operating characteristic curve (AUC) is considered the best evaluation metric. Many ML models have achieved an AUC value > 0.90, indicating high reliability of the susceptibility map generated. This paper also discusses the recently developed hybrid, ensemble, and deep learning (DL) models in landslide susceptibility mapping. Generally, hybrid, ensemble, and DL models outperform conventional ML models. Based on the survey, a few recommendations and future works which may help the new researchers in the field are also presented. Web of Science 1413 art. no. 302
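Since AUC is the evaluation metric singled out above, a minimal sketch of how it can be computed may help. It uses the pairwise-ranking interpretation of AUC (the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half). The function name and the toy labels and scores are illustrative assumptions, not data from the survey.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores for four map cells (1 = landslide occurred).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.2f}")
```

This quadratic pairwise form is chosen for readability; for large datasets a sort-based computation would be the usual choice.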

    Performance evaluation of multi-tier ensemble classifiers for phishing websites

    This article is devoted to large multi-tier ensemble classifiers generated as ensembles of ensembles and applied to phishing websites. Our new ensemble construction is a special case of the general and productive multi-tier approach well known in information security. Many efficient multi-tier classifiers have been considered in the literature. Our new contribution is in generating new large systems as ensembles of ensembles by linking a top-tier ensemble to another middle-tier ensemble instead of a base classifier so that the top-tier ensemble can generate the whole system. This automatic generation capability includes many large ensemble classifiers in two tiers simultaneously and automatically combines them into one hierarchical unified system so that one ensemble is an integral part of another one. This new construction makes it easy to set up and run such large systems. The present article concentrates on the investigation of performance of these new multi-tier ensembles for the example of detection of phishing websites. We carried out systematic experiments evaluating several essential ensemble techniques as well as more recent approaches and studying their performance as parts of multi-level ensembles with three tiers. The results presented here demonstrate that new three-tier ensemble classifiers performed better than the base classifiers and standard ensembles included in the system. This example of application to the classification of phishing websites shows that the new method of combining diverse ensemble techniques into a unified hierarchical three-tier ensemble can be applied to increase the performance of classifiers in situations where data can be processed on a large computer.
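The idea of linking a top-tier ensemble to a middle-tier ensemble instead of a base classifier can be sketched in a few lines. This is only an illustration under strong simplifying assumptions: it uses bagging at both tiers and a one-feature threshold rule as the base classifier, whereas the article combines diverse ensemble techniques across three tiers. All names (`train_stump`, `make_bagger`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def train_stump(data):
    """Base classifier: the threshold rule on x with the fewest training errors."""
    best = None
    for threshold, _ in data:
        for predict_one_above in (True, False):
            errors = sum(
                ((x >= threshold) == predict_one_above) != (y == 1) for x, y in data
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, predict_one_above)
    _, threshold, predict_one_above = best
    return lambda x: int((x >= threshold) == predict_one_above)


def make_bagger(base_trainer, n_members, rng):
    """Wrap any trainer in bagging: bootstrap resamples plus a majority vote."""
    def train(data):
        members = [
            base_trainer([rng.choice(data) for _ in data]) for _ in range(n_members)
        ]
        def predict(x):
            return Counter(m(x) for m in members).most_common(1)[0][0]
        return predict
    return train


rng = random.Random(0)
# Toy one-dimensional data: label 0 for x in 0..9, label 1 for x in 10..19.
data = [(x, 0) for x in range(10)] + [(x, 1) for x in range(10, 20)]

middle_tier = make_bagger(train_stump, n_members=5, rng=rng)
# The top tier receives the middle-tier ensemble in place of a base classifier.
top_tier = make_bagger(middle_tier, n_members=5, rng=rng)
model = top_tier(data)
print(model(2), model(17))
```

Because `make_bagger` accepts any trainer, the top tier generates the whole hierarchy on its own: training it builds five middle-tier ensembles, each of which builds five stumps.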

    Modelo de ensembles multiníveis para classificadores

    Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2018. A committee machine, or ensemble, is a combination of several classifiers by means of a pre-established strategy. Its use has been common in the literature to improve generalization in classification problems. However, a good diversity strategy is essential to ensure the quality of the results. Therefore, this research proposes the construction of a multi-level model in which the final decision is made by combining the outputs of ensembles. This model is referred to here as a committee of ensembles. The dissertation sought to advance the state of the art by proposing a strategy for building the committee of ensembles. It also proposes combining ensembles whose member classifiers are similar to one another, so that each ensemble in the committee specializes in a particular learning paradigm (family). A further increase in diversity is thus sought. The proposed model (level 2) was applied to public databases with different characteristics and evaluated by means of accuracy, area under the ROC curve (AUC), and execution time. The results showed similar performance for levels 0 and 1. The proposed model achieved an average improvement of up to 14% in accuracy and 10% in AUC relative to levels 0 and 1. The family with the best results was the Bayesian one: it ran 949 times faster than the committee of ensembles, with accuracy and AUC results that were more stable and slightly better than those of the other families (level 1). Finally, statistical analysis at a 5% significance level (α = 0.05) confirmed the good performance of the committee of ensembles in almost all comparisons with the other levels in terms of both accuracy and AUC, although at the cost of a high execution time.

    Analyzing and enhancing music mood classification : an empirical study

    In the computer age, managing large data repositories is a common challenge, especially for music data. Categorizing, manipulating, and refining music tracks are among the most complex tasks in Music Information Retrieval (MIR). Classification is one of the core functions in MIR, which classifies music data from different perspectives, from genre to instrument to mood. The primary focus of this study is music mood classification. Mood is a subjective phenomenon in MIR, which involves different considerations, such as psychology, musicology, culture, and social behavior. One of the most significant prerequisites in music mood classification is answering these questions: What combination of acoustic features helps to improve the accuracy of classification in this area? What type of classifier is appropriate for music mood classification? How can we increase the accuracy of music mood classification using several classifiers? To answer these questions, we empirically explored different acoustic features and classification schemes for mood classification in music data. We also identified two approaches for using several classifiers simultaneously to label music tracks with moods automatically. These methods are two voting procedures, namely Plurality Voting and Borda Count. They belong to the family of ensemble techniques, which combine a group of classifiers to reach better accuracy. The proposed ensemble methods are implemented and verified through empirical experiments. The results of the experiments show that the proposed approaches can improve the accuracy of music mood classification.
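The two voting procedures named above can be sketched as follows. The sketch assumes that each classifier outputs a full ranking of mood labels from most to least likely; the function names, the mood labels, and the four example rankings are hypothetical, and the study's exact formulation may differ.

```python
from collections import Counter


def plurality_vote(rankings):
    """Only each classifier's top-ranked label counts; the most frequent one wins."""
    return Counter(ranking[0] for ranking in rankings).most_common(1)[0][0]


def borda_count(rankings):
    """A label in position i of an n-label ranking earns n - 1 - i points."""
    scores = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] += n - 1 - position
    # Ties, if any, go to the label that received points first.
    return max(scores, key=scores.get)


# Hypothetical rankings from four classifiers for one music track.
rankings = [
    ["happy", "calm", "sad"],
    ["happy", "calm", "sad"],
    ["calm", "sad", "happy"],
    ["sad", "calm", "happy"],
]
plurality_winner = plurality_vote(rankings)
borda_winner = borda_count(rankings)
print(plurality_winner, borda_winner)
```

In this example the two procedures disagree: "happy" has the most first-place votes, while "calm" collects the most points overall because no classifier ranks it last. That difference is the reason to consider both procedures.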

    Do we need hundreds of classifiers to solve real world classification problems?

    We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today. We use 121 data sets, which represent the whole UCI data base (excluding the large-scale problems) and other real problems of our own, in order to achieve significant conclusions about classifier behavior that do not depend on the data set collection. The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. However, the difference is not statistically significant with the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively). We would like to acknowledge support from the Spanish Ministry of Science and Innovation (MICINN), which supported this work under projects TIN2011-22935 and TIN2012-32262S.

    A window-based time series feature extraction method

    This study proposes a robust similarity score-based time series feature extraction method termed Window-based Time series Feature ExtraCtion (WTC). Specifically, WTC generates domain-interpretable results and has significantly low computational complexity, which makes it useful for densely sampled and populated time series datasets. In this study, WTC is applied to a proprietary action potential (AP) time series dataset on human cardiomyocytes and to three precordial leads from a publicly available electrocardiogram (ECG) dataset. WTC is then compared, in terms of predictive accuracy and computational complexity, with the shapelet transform and the fast shapelet transform (an accelerated variant of the shapelet transform). The results indicate that WTC achieves slightly higher classification performance with significantly lower execution time than its shapelet-based alternatives. With respect to its interpretable features, WTC has the potential to enable medical experts to explore definitive common trends in novel datasets. © 2017 Elsevier Lt