
    On The Stability of Interpretable Models

    Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model is widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process, and bias in data collection and preparation, or in a model's construction, may severely affect the accountability of the design process. We conduct an experimental study of the stability of interpretable models with respect to feature selection, instance selection, and model selection. Our conclusions should raise the awareness of the scientific community about the need for a stability impact assessment of interpretable models.
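The stability this abstract studies can be quantified, for example, as the agreement between the feature subsets a pipeline selects on different resamples of the data. The sketch below uses mean pairwise Jaccard similarity as an illustrative stability index; it is not the paper's protocol, and the function names and example subsets are hypothetical.

```python
# Sketch: measure feature-selection stability as the mean pairwise
# Jaccard similarity between subsets chosen on resampled data.
# Illustrative measure only, not the one used in the paper.

def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def selection_stability(feature_sets):
    """Mean pairwise Jaccard similarity across resample runs (1.0 = perfectly stable)."""
    pairs = [(i, j) for i in range(len(feature_sets))
             for j in range(i + 1, len(feature_sets))]
    if not pairs:
        return 1.0
    return sum(jaccard(feature_sets[i], feature_sets[j])
               for i, j in pairs) / len(pairs)

# Hypothetical subsets selected on three bootstrap samples:
subsets = [{"age", "bmi", "bp"}, {"age", "bmi"}, {"age", "bp", "chol"}]
print(round(selection_stability(subsets), 3))  # 0.472
```

A value near 1.0 indicates the pipeline keeps choosing the same features; values well below 1.0 flag the kind of instability the authors argue should be assessed.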

    Feature Selection Techniques for Wood Density Prediction in Forest Dataset

    Feature selection becomes important especially in data sets with a large number of variables and features: it eliminates unimportant variables and increases classification accuracy and performance. Because of the rapid growth of data in numerous industries, much data is high-dimensional and contains essential but complex hidden relationships, posing new problems for feature selection: i) how to extract the underlying relationships available in the data, and ii) how to apply the learnt relations to improve feature selection. To address these issues, we use six feature selection approaches as a pre-processing step in the analysis, which can learn and apply the underlying sample relations and feature relations, to avoid overfitting and potential model underperformance. This study compared six feature selection approaches (including Pearson Coefficient, Correlation matrix, Variable Importance, Forward selection, and Backward Elimination) for determining the decomposition level of forest trees. Our trials provide a comparative evaluation of the Wrapper approach from several angles. Furthermore, we compare the dataset results against the critical attributes to obtain the highest accuracy. The experimental results show that the wrapper technique outperforms all other methods in all experiment groups.
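Forward selection, one of the wrapper methods the study compares, can be sketched as a greedy loop that repeatedly adds whichever feature most improves an evaluation score. The `score` callback below stands in for something like cross-validated model accuracy; the toy scoring function and feature names are invented for illustration and are not the paper's setup.

```python
# Sketch of wrapper-style forward selection: greedily add the feature
# whose inclusion most improves a model-evaluation score, stopping when
# no candidate yields a meaningful gain.

def forward_select(features, score, min_gain=1e-9):
    """Greedy forward selection; returns (selected features, final score)."""
    selected, best = [], score([])
    remaining = list(features)
    while remaining:
        # evaluate adding each remaining feature to the current subset
        new_best, f = max((score(selected + [f]), f) for f in remaining)
        if new_best - best < min_gain:
            break  # no candidate improves the score
        selected.append(f)
        remaining.remove(f)
        best = new_best
    return selected, best

# Toy score: reward two informative features, penalize subset size.
useful = {"density": 0.3, "rings": 0.2}
score = lambda s: sum(useful.get(f, 0.0) for f in s) - 0.05 * len(s)
subset, acc = forward_select(["density", "rings", "noise1", "noise2"], score)
print(subset)  # ['density', 'rings']
```

Backward elimination is the mirror image: start from the full set and greedily drop the feature whose removal hurts the score least.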

    Minimum Redundancy Maximum Relevance(mRMR) Based Feature Selection Technique for Pattern Classification System

    Feature selection is an important hurdle in classification systems. We study how to select good features by building the covariance matrix of each sample data set and extracting the features from it. Then, we try to find the length of each sample by computing the error rate. We perform an experimental comparison of our algorithm and other methods using two data sets (binary and functional) and three different classifiers (support vector machine, linear discriminant analysis, and naïve Bayes). The results show that the mRMR features are less correlated with each other than those of other methods, and hence improve the classification accuracy.
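The mRMR criterion picks, at each step, the candidate feature with the highest relevance to the target minus its mean redundancy with the features already selected. A minimal sketch follows, with absolute Pearson correlation standing in for the mutual information mRMR normally uses — an assumption made to keep the example dependency-free; the feature names and data are hypothetical.

```python
# Minimal mRMR sketch: greedily pick the feature maximizing
# relevance(feature, target) - mean redundancy(feature, selected).
from statistics import mean, pstdev

def corr(x, y):
    """Absolute Pearson correlation (surrogate for mutual information)."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    sx, sy = pstdev(x), pstdev(y)
    return abs(cov / (sx * sy)) if sx and sy else 0.0

def mrmr(features, target, k):
    """features: dict name -> values. Returns k names in selection order."""
    selected = []
    candidates = list(features)
    while candidates and len(selected) < k:
        def score(f):
            relevance = corr(features[f], target)
            redundancy = (mean(corr(features[f], features[s]) for s in selected)
                          if selected else 0.0)
            return relevance - redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

feats = {
    "x1": [1, 2, 3, 4, 5],     # tracks the target strongly
    "x2": [2, 4, 6, 8, 9],     # noisy near-copy of x1 (redundant)
    "x3": [1, -1, 1, -1, 1],   # uncorrelated with x1, adds new signal
}
target = [2, 1, 4, 3, 6]       # = x1 + x3 elementwise
print(mrmr(feats, target, 2))  # ['x1', 'x3']
```

Note that `x2`, despite being more relevant than `x3` on its own, is skipped because it is nearly redundant with the already-selected `x1` — the behavior the abstract credits for mRMR's lower inter-feature correlation.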

    Effectiveness Of Alternative Heuristic Algorithms For Identifying Indicative Minimum Requirements For Conservation Reserves

    We compared the results of 30 heuristic reserve selection algorithms on the same large data set. Twelve of the algorithms were for presence-absence representation goals, designed to find a set of sites that represents all the land types in the study region at least once. Eighteen algorithms were intended to represent a minimum percentage of the total area of each land type. We varied the rules of the algorithms systematically to find the influence of individual rules or sequences of rules on efficiency of representation. Rankings of the algorithms according to relative numbers or areas of selected sites needed to achieve a specified representation target varied between the full data set and a subset, and so appear to be data-dependent. We also ran optimizing algorithms to indicate the degree of suboptimality of the heuristics. For the presence-absence problems, the optimizing algorithms had the advantage of guaranteeing an optimal solution but had much longer running times than the heuristics. They showed that the solutions from good heuristics were 5-10% larger than optimal. The optimizing algorithms failed to solve the proportional area problems, although heuristics solved them quickly. Both heuristics and optimizing algorithms have important roles to play in conservation planning. The choice of method will depend on the size of data sets, the representation goal, the required time for analysis, and the importance of a guaranteed optimal solution.
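A presence-absence heuristic of the kind compared here can be sketched as greedy set cover: repeatedly add the site that represents the most land types not yet covered, until every land type occurs at least once. This "greedy richness" rule is just one illustrative variant of the thirty algorithms studied, and the site and land-type names are hypothetical.

```python
# Sketch of a presence-absence reserve heuristic as greedy set cover:
# at each step, add the site covering the most still-unrepresented
# land types. Greedy solutions can exceed the optimum, which is why
# the study benchmarks heuristics against exact optimizers.

def greedy_reserve(sites):
    """sites: dict site -> set of land types present. Returns chosen sites."""
    uncovered = set().union(*sites.values())
    chosen = []
    while uncovered:
        best = max(sites, key=lambda s: len(sites[s] & uncovered))
        if not sites[best] & uncovered:
            break  # remaining types occur in no site
        chosen.append(best)
        uncovered -= sites[best]
    return chosen

sites = {
    "A": {"heath", "bog"},
    "B": {"bog", "forest", "dune"},
    "C": {"forest"},
    "D": {"dune", "heath"},
}
print(greedy_reserve(sites))  # ['B', 'A']
```

The exact counterpart would formulate the same problem as an integer program (minimize sites subject to every land type being covered), which guarantees optimality at the cost of the much longer running times noted above.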

    Role of Feature Selection in Building High Performance Heart Disease Prediction Systems

    In the last few years, there has been a tremendous rise in the number of deaths due to heart disease all over the world. In low- and middle-income countries, heart diseases are usually not detected in early stages, which makes treatment difficult. Early diagnosis can help significantly in preventing these diseases. Machine learning-based prediction systems offer a cost-effective and efficient way to diagnose these diseases at an early stage. Research is being carried out to increase the performance of these systems. Redundant and irrelevant features in a medical dataset deteriorate the performance of prediction systems. In this paper, an exhaustive study has been done to improve the performance of the prediction systems by applying four feature selection algorithms. Experimental results show that the use of feature selection algorithms provides a substantial increase in the accuracy and execution speed of the prediction system. The prediction system proposed in this study should prove a great help in preventing heart disease by enabling medical practitioners to detect it in its early stages.
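One generic way redundant features of the kind described here are removed is a correlation filter: drop any feature that is nearly a duplicate of one already kept. This is a pre-processing sketch under that assumption, not one of the four algorithms from the paper (which the abstract does not name); the feature names and values are hypothetical.

```python
# Illustrative redundancy filter: keep the first feature of each group
# of highly correlated features, drop the near-duplicates.
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    sx, sy = pstdev(x), pstdev(y)
    return cov / (sx * sy) if sx and sy else 0.0

def drop_redundant(features, threshold=0.95):
    """features: dict name -> values. Keep a feature only if it is not
    strongly correlated with any feature already kept."""
    kept = {}
    for name, vals in features.items():
        if all(abs(pearson(vals, kv)) < threshold for kv in kept.values()):
            kept[name] = vals
    return list(kept)

feats = {
    "chol":    [180, 220, 240, 260, 300],
    "chol_mg": [180, 221, 239, 261, 299],  # near-duplicate of chol
    "max_hr":  [170, 150, 175, 140, 160],
}
print(drop_redundant(feats))  # ['chol', 'max_hr']
```

Dropping such near-duplicates shrinks the model's input, which is one route to the accuracy and execution-speed gains the paper reports from feature selection.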