
    Identify error-sensitive patterns by decision tree

    © Springer International Publishing Switzerland 2015. When errors are inevitable during data classification, finding the particular part of a classification model that is more susceptible to error than others, rather than searching for the model's Achilles’ heel in a casual way, may help uncover specific error-sensitive value patterns and lead to additional error-reduction measures. As an initial phase of the investigation, this study narrows the scope of the problem by focusing on decision trees as a pilot model. It develops a simple and effective tagging method that digitizes the individual nodes of a binary decision tree for node-level analysis. The tags link and track classification statistics for each node in a transparent way, which facilitates the identification and examination of the potentially “weakest” nodes and error-sensitive value patterns in decision trees and assists cause analysis and the development of enhancements. This digitization method is not an attempt to re-develop or transform the existing decision tree model; rather, it is a pragmatic node ID formulation that crafts numeric values reflecting the tree structure and decision-making paths, extending post-classification analysis down to the node level. Initial experiments have successfully located potentially high-risk attribute and value patterns, an encouraging sign that this study is worth further exploration.
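    The abstract does not give the exact node ID formula, so the sketch below is only an illustration under a stated assumption: it uses heap-style numbering (root = 1, left child = 2*id, right child = 2*id + 1), whose binary digits after the leading 1 spell out the decision path. The Node class, the helper names and the toy data are hypothetical. Each sample's path IDs are recorded so that an error rate can be attached to every node, and the node with the highest rate is reported as the "weakest".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Node:
    """Hypothetical binary tree node: internal nodes test x[feature] <= threshold,
    leaves carry a predicted class label."""
    feature: Optional[int] = None
    threshold: float = 0.0
    label: Optional[int] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def classify_with_path(root: Node, x: Sequence[float]) -> Tuple[int, List[int]]:
    """Classify x and return the IDs of every node on its decision path.

    Assumed tagging scheme: root = 1, left child = 2*id, right child = 2*id + 1.
    """
    node, node_id, path = root, 1, [1]
    while node.label is None:
        go_left = x[node.feature] <= node.threshold
        node = node.left if go_left else node.right
        node_id = 2 * node_id if go_left else 2 * node_id + 1
        path.append(node_id)
    return node.label, path


def node_error_rates(root: Node, X, y) -> Dict[int, float]:
    """Error rate of the samples routed through each node ID."""
    visits: Dict[int, int] = defaultdict(int)
    errors: Dict[int, int] = defaultdict(int)
    for x, target in zip(X, y):
        predicted, path = classify_with_path(root, x)
        for node_id in path:
            visits[node_id] += 1
            errors[node_id] += predicted != target  # a bool adds 0 or 1
    return {node_id: errors[node_id] / visits[node_id] for node_id in visits}


# Toy tree: split on feature 0, then (right branch only) on feature 1.
tree = Node(feature=0, threshold=0.5,
            left=Node(label=0),
            right=Node(feature=1, threshold=0.5,
                       left=Node(label=1), right=Node(label=0)))
X = [[0.2, 0.0], [0.8, 0.1], [0.9, 0.9], [0.1, 0.7]]
y = [0, 1, 1, 0]
rates = node_error_rates(tree, X, y)
weakest = max(rates, key=rates.get)
# bin(weakest)[3:] drops "0b1"; the remaining bits are the path (0 = left, 1 = right).
print(rates, "weakest node:", weakest, "path bits:", bin(weakest)[3:])
```

    Because the ID alone encodes the path, per-node statistics can be stored in a flat dictionary and joined back to the tree later without modifying the model itself, which matches the abstract's point that the tree is not re-developed.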

    Feature Selection using Tabu Search with Learning Memory: Learning Tabu Search

    International audience. Feature selection in classification can be modeled as a combinatorial optimization problem. One of the main particularities of this problem is the large amount of time that may be needed to evaluate the quality of a subset of features. In this paper, we propose to solve this problem with a tabu search algorithm that integrates a learning mechanism. To do so, we adapt a learning tabu search algorithm, originally designed for a railway network problem in which evaluating a solution is time-consuming, to the feature selection problem. The experiments conducted show the benefit of using a learning mechanism to solve hard instances from the literature.
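    The paper's learning mechanism is not described in enough detail to reproduce, so the following is only a minimal sketch of plain tabu search for feature selection, assuming a one-feature flip neighbourhood, a fixed tabu tenure and an aspiration criterion. As a simple stand-in for the cost concern the abstract raises, it caches every evaluated subset; this cache is not the authors' learning mechanism. The function name, parameters and toy objective are hypothetical.

```python
import random
from typing import Callable, Dict, FrozenSet, Tuple

Subset = FrozenSet[int]


def tabu_feature_selection(
    n_features: int,
    evaluate: Callable[[Subset], float],
    n_iter: int = 50,
    tenure: int = 3,
    seed: int = 0,
) -> Tuple[Subset, float, int]:
    """Plain tabu search over feature subsets (higher score = better).

    Neighbourhood: flip one feature in or out. A flipped feature stays tabu for
    `tenure` iterations unless the move beats the best score found so far
    (aspiration). Evaluated subsets are cached because evaluation is assumed
    to be the expensive step.
    """
    rng = random.Random(seed)
    current: Subset = frozenset(f for f in range(n_features) if rng.random() < 0.5)
    cache: Dict[Subset, float] = {}

    def score(subset: Subset) -> float:
        if subset not in cache:
            cache[subset] = evaluate(subset)
        return cache[subset]

    best, best_score = current, score(current)
    tabu_until: Dict[int, int] = {}  # feature -> last iteration in which it is tabu
    for it in range(n_iter):
        moves = []
        for f in range(n_features):
            neighbour = current ^ {f}  # frozenset ^ set stays a frozenset
            s = score(neighbour)
            if tabu_until.get(f, -1) >= it and s <= best_score:
                continue  # tabu move that does not satisfy aspiration
            moves.append((s, f, neighbour))
        if not moves:
            continue
        s, f, current = max(moves, key=lambda m: m[0])  # best admissible move, even if worse
        tabu_until[f] = it + tenure
        if s > best_score:
            best, best_score = current, s
    return best, best_score, len(cache)


USEFUL = {0, 2, 5}  # hypothetical: only these features carry signal


def objective(subset: Subset) -> float:
    """Toy objective: reward useful features, penalise the others."""
    return len(subset & USEFUL) - 0.5 * len(subset - USEFUL)


best, best_score, n_evaluated = tabu_feature_selection(8, objective)
print(sorted(best), best_score, n_evaluated)
```

    In a real setting `evaluate` would train and score a classifier on the subset, which is exactly why avoiding repeated evaluations matters.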

    Engine Misfire Detection with Pervasive Mobile Audio

    We address the problem of detecting whether an engine is misfiring by applying machine learning techniques to transformed audio data collected from a smartphone. We recorded audio samples in an uncontrolled environment and extracted Fourier, wavelet and Mel-frequency cepstrum features from normal and abnormal engines. We then applied Fisher score and Relief score based variable ranking to obtain an informative, reduced feature set for training and testing classification algorithms. Using this feature set, we obtained a model accuracy of over 99% with a linear SVM applied to out-of-sample data. This application of machine learning to vehicle subsystem monitoring simplifies traditional engine diagnostics, aids vehicle owners in the maintenance process, and opens up new avenues for pervasive mobile sensing and automotive diagnostics. Keywords: Pervasive sensing, Mobile phones, Sound classification, Audio processing, Fault detection, Machine learning
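    The Fisher score ranking mentioned above can be sketched as the ratio of between-class scatter to within-class scatter for each feature, F_j = sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k sigma_kj^2. This sketch assumes population variances for the class statistics (some definitions differ), uses made-up data, and does not cover the Relief score.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Fisher score per feature: between-class scatter / within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        overall_mean = fmean([row[j] for row in X])
        between = within = 0.0
        for c in classes:
            values = [row[j] for row, label in zip(X, y) if label == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating if class means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


# Hypothetical data: feature 0 separates the classes, feature 1 mostly does not.
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]
y = [0, 0, 1, 1]
scores = fisher_scores(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

    Keeping only the top-ranked features from such a list is one straightforward way to obtain a reduced feature set before training a classifier such as a linear SVM.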

    A filter-dominating hybrid sequential forward floating search method for feature subset selection in high-dimensional space

    Sequential forward floating search (SFFS) is widely recognized as one of the best feature selection methods. This paper proposes a filter-dominating hybrid SFFS method that aims at high efficiency with an insignificant sacrifice in accuracy for feature subset selection in high-dimensional spaces. Experiments with this new hybrid approach were conducted on five feature data sets, using different combinations of classifiers and separability indexes as alternative criteria for evaluating the performance of candidate feature subsets. The classifiers considered include a linear discriminant analysis classifier, a support vector machine and a k-nearest neighbors classifier; the separability indexes include the Davies-Bouldin index and a mutual-information-based index. The experimental results demonstrate the advantages and usefulness of the proposed method for high-dimensional feature subset selection. © 2012 Springer-Verlag Berlin Heidelberg
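    For reference, a minimal sketch of the plain SFFS loop (not the paper's filter-dominating hybrid, whose details the abstract omits): each step adds the best feature, then keeps removing the least useful one while doing so beats the best subset previously recorded at the smaller size. The criterion is an arbitrary user-supplied function assumed to be "higher is better"; the lookup table below is invented to show backtracking.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Subset = FrozenSet[int]


def sffs(n_features: int, criterion: Callable[[Subset], float],
         target_size: int) -> Dict[int, Tuple[float, Subset]]:
    """Sequential forward floating search (higher criterion = better).

    Returns the best (score, subset) recorded at every size 1..target_size.
    """
    if not 0 < target_size <= n_features:
        raise ValueError("target_size must be in 1..n_features")
    selected: Subset = frozenset()
    best: Dict[int, Tuple[float, Subset]] = {}
    while len(selected) < target_size:
        # Inclusion: add the feature that maximises the criterion.
        remaining = [f for f in range(n_features) if f not in selected]
        f_add = max(remaining, key=lambda f: criterion(selected | {f}))
        selected = selected | {f_add}
        score = criterion(selected)
        if len(selected) not in best or score > best[len(selected)][0]:
            best[len(selected)] = (score, selected)
        # Conditional exclusion: drop a feature only if the smaller subset
        # beats the best one previously recorded at that size.
        while len(selected) > 2:
            f_del = max(selected, key=lambda f: criterion(selected - {f}))
            reduced = selected - {f_del}
            reduced_score = criterion(reduced)
            if reduced_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (reduced_score, reduced)
            else:
                break
    return best


# Hypothetical criterion: features 1 and 2 are only strong together, so plain
# forward selection would still hold {0, 1} as its size-2 subset.
table = {
    frozenset({0}): 1.0, frozenset({1}): 0.6, frozenset({2}): 0.6,
    frozenset({0, 1}): 1.1, frozenset({0, 2}): 1.1, frozenset({1, 2}): 2.0,
    frozenset({0, 1, 2}): 2.05,
}
best = sffs(3, lambda s: table[s], target_size=3)
for size, (score, subset) in sorted(best.items()):
    print(size, sorted(subset), score)
```

    A wrapper criterion would return, for example, the cross-validated accuracy of LDA, SVM or k-NN on the subset; a lower-is-better index such as Davies-Bouldin would need to be negated first.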

    An optimized artificial neural network model for the prediction of rate of hazardous chemical and healthcare waste generation at the national level

    This paper presents the development of general regression neural network (a form of artificial neural network) models for predicting the annual quantities of hazardous chemical and healthcare waste at the national level. Hazardous waste is generated from many different sources, so traditional methodologies cannot accurately predict its total amount. Since they represent about 40% of the total hazardous waste in the European Union, chemical and healthcare waste were specifically selected for this research. Broadly available social, economic, industrial and sustainability indicators were used as input variables, and the optimal sets were selected using correlation analysis and sensitivity analysis. The coefficients of determination obtained for the final models were 0.999 for chemical hazardous waste and 0.975 for healthcare and biological hazardous waste. The predictive capability of both models is high, since no prediction had an error greater than 25%. The results also demonstrate that the human development index can replace gross domestic product and, in this context, can even be a better indicator of socio-economic conditions at the national level.
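    A general regression neural network, in Specht's formulation, predicts a Gaussian-kernel weighted average of the training targets, with a single smoothing parameter sigma. The sketch below shows that prediction rule together with the coefficient of determination used to report the models' fit. The indicator inputs, sigma values and data are hypothetical; the paper's actual settings are not given in the abstract.

```python
import math
from typing import Sequence


def grnn_predict(X_train: Sequence[Sequence[float]], y_train: Sequence[float],
                 x: Sequence[float], sigma: float) -> float:
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    sq_dists = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi in X_train]
    d_min = min(sq_dists)  # shift so the largest weight is exp(0) = 1 (avoids underflow)
    weights = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in sq_dists]
    return sum(w * t for w, t in zip(weights, y_train)) / sum(weights)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


X_train = [[0.0], [1.0], [2.0]]
y_train = [0.0, 1.0, 4.0]
print(grnn_predict(X_train, y_train, [1.0], sigma=0.1))     # close to 1.0: nearest sample dominates
print(grnn_predict(X_train, y_train, [1.0], sigma=1000.0))  # close to mean(y_train) = 5/3
```

    The two calls illustrate the role of sigma: a small value makes the network behave like nearest-neighbour lookup, while a large value flattens predictions toward the mean, which is why sigma is normally tuned on held-out data.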