13,750 research outputs found

    Inducing safer oblique trees without costs

    Get PDF
    Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one of the classes is significantly higher than in the other class. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms or have suggested ways of changing the distribution of training examples to bias the decision tree learning process so as to take account of costs. A prerequisite for applying such algorithms is the availability of costs of misclassification. Although this may be possible for some applications, obtaining reasonable estimates of costs of misclassification is not easy in the area of safety. This paper presents a new algorithm for applications where the cost of misclassifications cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in another class. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes and then carries out an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated with respect to one of the best known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1) and an algorithm that utilizes robust linear programming

    Nonparametric Two-Group Classification: Concepts and a SAS-Based Software Package

    Get PDF
    In this paper, we introduce BestClass, a set of SAS macros, available in the mainframe and workstation environment, designed for solving two-group classification problems using a class of recently developed nonparametric classification methods. The criteria used to estimate the classification function are based on either minimizing a function of the absolute deviations from the surface which separates the groups, or directly minimizing a function of the number of misclassified entities in the training sample. The solution techniques used by BestClass to estimate the classification rule utilize the mathematical programming routines of the SAS/OR@ software. Recently, a number of research studies have reported that under certain data conditions this class of classification methods can provide more accurate classification results than existing methods, such as Fisher's linear discriminant function and logistic regression. However, these robust classification methods have not yet been implemented in the major statistical packages, and hence are beyond the reach of those statistical analysts who are unfamiliar with mathematical programming techniques. We use a limited simulation experiment and an example to compare and contrast properties of the methods included in BestClass with existing parametric and nonparametric methods. We believe that BestClass contributes significantly to the field of nonparametric classification analysis, in that it provides the statistical community with convenient access to this recently developed class of methods. BestClass is available from the authors

    A Multi-Gene Genetic Programming Application for Predicting Students Failure at School

    Full text link
    Several efforts to predict student failure rate (SFR) at school accurately still remains a core problem area faced by many in the educational sector. The procedure for forecasting SFR are rigid and most often times require data scaling or conversion into binary form such as is the case of the logistic model which may lead to lose of information and effect size attenuation. Also, the high number of factors, incomplete and unbalanced dataset, and black boxing issues as in Artificial Neural Networks and Fuzzy logic systems exposes the need for more efficient tools. Currently the application of Genetic Programming (GP) holds great promises and has produced tremendous positive results in different sectors. In this regard, this study developed GPSFARPS, a software application to provide a robust solution to the prediction of SFR using an evolutionary algorithm known as multi-gene genetic programming. The approach is validated by feeding a testing data set to the evolved GP models. Result obtained from GPSFARPS simulations show its unique ability to evolve a suitable failure rate expression with a fast convergence at 30 generations from a maximum specified generation of 500. The multi-gene system was also able to minimize the evolved model expression and accurately predict student failure rate using a subset of the original expressionComment: 14 pages, 9 figures, Journal paper. arXiv admin note: text overlap with arXiv:1403.0623 by other author

    Nontraditional Approaches to Statistical Classification: Some Perspectives on Lp-Norm Methods

    Get PDF
    The body of literature on classification method which estimate boundaries between the groups (classes) by optimizing a function of the L_{p}-norm distances of observations in each group from these boundaries, is maturing fast. The number of published research articles on this topic, especially on mathematical programming (MP) formulations and techniques for L_{p}-norm classification, is now sizable. This paper highlights historical developments that have defined the field, and looks ahead at challenges that may shape new research directions in the next decade. In the first part, the paper summarizes basic concepts and ideas, and briefly reviews past research. Throughout, an attempt is made to integrate a number of the most important L_{p}-norm methods proposed to date within a unified framework, emphasizing their conceptual differences and similarities, rather than focusing on mathematical detail. In the second part, the paper discusses several potential directions for future research in this area. The long-term prospects of L_{p}-norm classification (and discriminant) research may well hinge upon whether or not the channels of communication between on the one hand researchers active in L_{p}-norm classification, who tend to have their roots primarily in decision sciences, the management sciences, computer sciences and engineering, and on the other hand practitioners and researchers in the statistical classification community, will be improved. This paper offers potential reasons for the lack of communication between these groups, and suggests ways in which L_{p}-norm research may be strengthened from a statistical viewpoint. The results obtained in L_{p}-norm classification studies are clearly relevant and of importance to all researchers and practitioners active in classification and discrimination analysis. The paper also briefly discusses artificial neural networks, a promising nontraditional method for classification which has recently emerged, and suggests that it may be useful to explore hybrid classification methods that take advantage of the complementary strengths of different methods, e.g., neural network and L_{p}-norm methods

    Pilot interaction with automated airborne decision making systems

    Get PDF
    An investigation was made of interaction between a human pilot and automated on-board decision making systems. Research was initiated on the topic of pilot problem solving in automated and semi-automated flight management systems and attempts were made to develop a model of human decision making in a multi-task situation. A study was made of allocation of responsibility between human and computer, and discussed were various pilot performance parameters with varying degrees of automation. Optimal allocation of responsibility between human and computer was considered and some theoretical results found in the literature were presented. The pilot as a problem solver was discussed. Finally the design of displays, controls, procedures, and computer aids for problem solving tasks in automated and semi-automated systems was considered

    Worst-Case Linear Discriminant Analysis as Scalable Semidefinite Feasibility Problems

    Full text link
    In this paper, we propose an efficient semidefinite programming (SDP) approach to worst-case linear discriminant analysis (WLDA). Compared with the traditional LDA, WLDA considers the dimensionality reduction problem from the worst-case viewpoint, which is in general more robust for classification. However, the original problem of WLDA is non-convex and difficult to optimize. In this paper, we reformulate the optimization problem of WLDA into a sequence of semidefinite feasibility problems. To efficiently solve the semidefinite feasibility problems, we design a new scalable optimization method with quasi-Newton methods and eigen-decomposition being the core components. The proposed method is orders of magnitude faster than standard interior-point based SDP solvers. Experiments on a variety of classification problems demonstrate that our approach achieves better performance than standard LDA. Our method is also much faster and more scalable than standard interior-point SDP solvers based WLDA. The computational complexity for an SDP with mm constraints and matrices of size dd by dd is roughly reduced from O(m3+md3+m2d2)\mathcal{O}(m^3+md^3+m^2d^2) to O(d3)\mathcal{O}(d^3) (m>dm>d in our case).Comment: 14 page
    corecore