    Using Output Codes for Two-class Classification Problems

    Error-correcting output codes (ECOCs) have been widely used in many applications for multi-class classification problems. The problem is that ECOCs cannot be applied directly to two-class datasets. The goal of this thesis is to design and evaluate an approach to solve this problem, and then investigate whether the approach can yield better classification models. To be able to use ECOCs, we first turn two-class datasets into multi-class datasets by using clustering. With the resulting multi-class datasets in hand, we evaluate three different encoding methods for ECOCs: exhaustive coding, random coding and a “pre-defined” code that is found using random search. The exhaustive coding method has the highest error-correcting ability. However, this method is limited by the exponential growth of bit columns in the codeword matrix, which precludes it from being used for problems with large numbers of classes. Random coding can be used to cover situations with large numbers of classes in the data. To improve on completely random matrices, “pre-defined” codeword matrices can be generated by a random search that optimizes row separation, yielding better error correction than a purely random matrix. To speed up the process of finding good matrices, GPU parallel programming is investigated in this thesis. From the empirical results, we can say that the new algorithm, which applies multi-class ECOCs to two-class data using clustering, does improve the performance of some base learners when compared to applying them directly to the original two-class datasets.
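
    The exhaustive coding idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common textbook construction in which every distinct, non-trivial split of k classes contributes one bit column, giving 2^(k-1) - 1 columns; the function names are hypothetical and are not taken from the thesis, and the clustering step is not shown.

```python
from itertools import product


def exhaustive_code(k):
    """Build a k x (2**(k-1) - 1) codeword matrix (one row per class).

    Assumption: each column is a distinct split of the classes. The first
    class is fixed to bit 1 so that a column and its complement are not
    both generated, and the all-ones column is dropped because it does
    not separate any classes.
    """
    columns = [
        (1,) + rest
        for rest in product((0, 1), repeat=k - 1)
        if not all(rest)
    ]
    return [[col[i] for col in columns] for i in range(k)]


def hamming(a, b):
    """Number of positions in which two bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def min_row_distance(matrix):
    """Smallest Hamming distance between any two class codewords."""
    return min(
        hamming(matrix[i], matrix[j])
        for i in range(len(matrix))
        for j in range(i + 1, len(matrix))
    )


def decode(matrix, bits):
    """Predict the class whose codeword is closest to the observed bits."""
    return min(range(len(matrix)), key=lambda c: hamming(matrix[c], bits))


matrix = exhaustive_code(4)          # 4 classes -> 7 binary problems
noisy = list(matrix[2])
noisy[0] ^= 1                        # simulate one wrong base classifier
predicted = decode(matrix, noisy)    # still recovers class 2
```

    The number of columns doubles with every additional class, which is the exponential growth the abstract refers to.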

    Tree-structured multiclass probability estimators

    Nested dichotomies are used as a method of transforming a multiclass classification problem into a series of binary problems. A binary tree structure is constructed over the label space that recursively splits the set of classes into subsets, and a binary classification model learns to discriminate between the two subsets of classes at each node. Several distinct nested dichotomy structures can be built in an ensemble for superior performance. In this thesis, we introduce two new methods for constructing more accurate nested dichotomies. Random-pair selection is a subset selection method that aims to group similar classes together in a non-deterministic fashion to easily enable the construction of accurate ensembles. Multiple subset evaluation takes this, and other subset selection methods, further by evaluating several different splits and choosing the best-performing one. Finally, we also discuss the calibration of the probability estimates produced by nested dichotomies. We observe that nested dichotomies systematically produce under-confident predictions, even if the binary classifiers are well calibrated, and especially when the number of classes is high. Furthermore, substantial performance gains can be made when probability calibration methods are also applied to the internal models.
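
    As a rough illustration of how a nested dichotomy turns binary outputs into multiclass probabilities, the sketch below multiplies the branch probabilities along the path from the root to each leaf. The tree encoding, the node names and the fixed branch probabilities are hypothetical stand-ins for trained binary models; they are not taken from the thesis.

```python
def class_probabilities(tree, p_left, prob=1.0):
    """Return {class_label: probability} for one nested dichotomy.

    tree   -- either a class label (leaf) or a tuple (node_name, left, right)
    p_left -- maps node_name to the probability that the instance belongs
              to the left subset of classes (assumed to come from the
              binary model stored at that node)
    prob   -- probability mass that reaches this subtree
    """
    if not isinstance(tree, tuple):
        return {tree: prob}
    name, left, right = tree
    result = class_probabilities(left, p_left, prob * p_left[name])
    result.update(
        class_probabilities(right, p_left, prob * (1.0 - p_left[name]))
    )
    return result


# Hypothetical four-class problem: {a, b} versus {c, d} at the root.
tree = ("root", ("n1", "a", "b"), ("n2", "c", "d"))
p_left = {"root": 0.8, "n1": 0.75, "n2": 0.5}
probs = class_probabilities(tree, p_left)
```

    Because each class estimate is a product of several values below one, deeper trees tend to shrink the largest class probability, which is in line with the under-confidence the abstract reports.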

    Building ensembles of adaptive nested dichotomies with random-pair selection

    A system of nested dichotomies is a method of decomposing a multi-class problem into a collection of binary problems. Such a system recursively applies binary splits to divide the set of classes into two subsets, and trains a binary classifier for each split. Although ensembles of nested dichotomies with random structure have been shown to perform well in practice, a more sophisticated class subset selection method can be used to improve classification accuracy. We investigate an approach to this problem called random-pair selection, and evaluate its effectiveness compared to other published methods of subset selection. We show that our method outperforms other methods in many cases when forming ensembles of nested dichotomies, and is at least on par in all other cases. The software related to this paper is available at https://svn.cms.waikato.ac.nz/svn/weka/trunk/packages/internal/ensemblesOfNestedDichotomies/

    The Connectivity of Boolean Satisfiability: Dichotomies for Formulas and Circuits

    For Boolean satisfiability problems, the structure of the solution space is characterized by the solution graph, where the vertices are the solutions, and two solutions are connected iff they differ in exactly one variable. In 2006, Gopalan et al. studied connectivity properties of the solution graph and related complexity issues for CSPs, motivated mainly by research on satisfiability algorithms and the satisfiability threshold. They proved dichotomies for the diameter of connected components and for the complexity of the st-connectivity question, and conjectured a trichotomy for the connectivity question. Recently, we were able to establish the trichotomy [arXiv:1312.4524]. Here, we consider connectivity issues of satisfiability problems defined by Boolean circuits and propositional formulas that use gates, resp. connectives, from a fixed set of Boolean functions. We obtain dichotomies for the diameter and the two connectivity problems: on one side, the diameter is linear in the number of variables, and both problems are in P, while on the other side, the diameter can be exponential, and the problems are PSPACE-complete. For partially quantified formulas, we show an analogous dichotomy. Comment: 20 pages, several improvements
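
    The solution graph defined at the start of this abstract can be made concrete with a small brute-force sketch. It enumerates all assignments, so it takes exponential time and only suits tiny formulas; the function name and the example formulas are hypothetical and are not from the paper.

```python
from itertools import product


def solution_components(formula, n):
    """Connected components of the solution graph of an n-variable formula.

    Vertices are the satisfying assignments; two of them are adjacent
    iff they differ in exactly one variable.
    """
    solutions = {
        a for a in product((False, True), repeat=n) if formula(*a)
    }
    unseen = set(solutions)
    components = []
    while unseen:
        start = unseen.pop()
        component = {start}
        stack = [start]
        while stack:
            current = stack.pop()
            for i in range(n):
                # Flip variable i to reach a potential neighbour.
                neighbour = current[:i] + (not current[i],) + current[i + 1:]
                if neighbour in unseen:
                    unseen.remove(neighbour)
                    component.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


# x XOR y: the two solutions differ in both variables, so the graph
# is disconnected.
xor_parts = solution_components(lambda x, y: x != y, 2)

# x OR y: the three solutions form a path through (True, True).
or_parts = solution_components(lambda x, y: x or y, 2)
```

    The connectivity question asks whether there is a single component; the st-connectivity question asks whether two given solutions lie in the same one.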

    Knowledge hubs and knowledge clusters: Designing a knowledge architecture for development.

    With globalisation and knowledge-based production, firms may cooperate on a global scale, outsource parts of their administrative or productive units and negate location altogether. The extremely low transaction costs of data, information and knowledge seem to invalidate the theory of agglomeration and the spatial clustering of firms, going back to the classical work by Alfred Weber (1868-1958) and Alfred Marshall (1842-1924), who emphasized the microeconomic benefits of industrial co-location. This paper will argue against this view and show why the growth of knowledge societies will increase rather than decrease the relevance of location by creating knowledge clusters and knowledge hubs. A knowledge cluster is a local innovation system organized around universities, research institutions and firms which successfully drive innovations and create new industries. Knowledge hubs are localities with high internal and external networking and knowledge sharing capabilities. Both form a new knowledge architecture within an epistemic landscape of knowledge creation and dissemination, structured by knowledge gaps and areas of low knowledge intensity. The paper will focus on the internal dynamics of knowledge clusters and knowledge hubs and show why clustering takes place despite globalisation and the rapid growth of ICT. The basic argument that firms and their delivery chains attempt to reduce transport (transaction) costs by choosing the same location is still valid for most industrial economies, but knowledge hubs have different dynamics relating to externalities produced from knowledge sharing and research and development outputs. The paper draws on empirical data derived from ongoing research in the Lee Kong Chian School of Business, Singapore Management University and in the Center for Development Research (ZEF), University of Bonn, supported by the German Aeronautics and Space Agency (DLR).