
    Large dataset complexity reduction for classification: An optimization perspective

    Doctor of Philosophy
    Computational complexity in data mining is attributed to algorithms but lies largely with the data. Different algorithms may exist to solve the same problem, but the simplest is not always the best. At the same time, data of astronomical proportions is now common, boosted by automation, and the fuller the data, the better the resolution of the concept it projects. Paradoxically, it is computing power that is lacking: a fast algorithm may be run on the data, but not the optimal one, and even then any modeling is heavily constrained, involving the serial application of many algorithms. The only other way to relieve the computational load is to make the data lighter. Any representative subset has to preserve the essence of the data and, ideally, suit any algorithm. The reduction should minimize the approximation error while trading precision for performance. Data mining is a wide field; we concentrate on classification. In the literature review we present a variety of methods, emphasizing the efforts of the past decade. The two major objects of reduction are instances and attributes; the data can also be recast into a more economical format. We address sampling, noise reduction, class domain binarization, feature ranking, feature subset selection, feature extraction, and discretization of continuous features. Achievements are tremendous, but so are the possibilities. We improve an existing technique of data cleansing and propose a data-condensing extension of it; we also touch on noise reduction. Instance similarity, excepting the class mix, prompts a technique of feature selection. Additionally, we consider multivariate discretization, enabling a compact data representation without changing the data size. We compare the proposed methods with alternative techniques, which we either introduce, implement, or use as available.
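    The instance-reduction idea in the abstract, a representative subset that preserves the class mix, can be illustrated with a minimal stratified-subsampling sketch. This is an assumption-laden example for illustration only, not the thesis's actual condensing method; the function name and parameters are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(labels, fraction, seed=0):
    """Return indices of a class-proportional subset of the data.

    Illustrative sketch: keep roughly `fraction` of each class so the
    reduced set preserves the original class mix (not the thesis's
    specific condensing technique).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    keep = []
    for idxs in by_class.values():
        rng.shuffle(idxs)                      # random within each class
        k = max(1, round(fraction * len(idxs)))  # at least one per class
        keep.extend(idxs[:k])
    return sorted(keep)

# Ten instances, two classes; keep half of each class.
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
subset = stratified_sample(y, fraction=0.5)
```

    Here the reduced set contains three instances of class 0 and two of class 1, matching the 6:4 proportions of the full data.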

    A unified view of rank-based decision combination

    No full text
    This study presents a theoretical investigation of the rank-based multiple-classifier decision problem for closed-set pattern classification. The case in which the raw classifier outputs take the form of candidate class rankings is considered and formulated as a discrete optimization problem whose objective function is the total probability of a correct decision. The problem has a globally optimal solution but is of prohibitive dimensionality. We present a partitioning formalism under which this dimensionality can be reduced by incorporating prior knowledge about the problem domain and the structure of the training data. The formalism can effectively explain a number of rank-based combination approaches used successfully in the literature, one of which is discussed.
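    One well-known rank-based combination rule of the kind the abstract alludes to is the Borda count; the sketch below shows it as a concrete instance of combining candidate class rankings. This is an illustrative assumption, not necessarily the specific approach discussed in the paper.

```python
from collections import defaultdict

def borda_combine(rankings):
    """Combine best-first class rankings from several classifiers.

    Borda-count sketch: a class at position p in a ranking of n classes
    earns n - p points; the class with the highest total wins.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for pos, cls in enumerate(ranking):
            scores[cls] += n - pos
    return max(scores, key=scores.get)

# Three classifiers each rank classes A, B, C best-first.
decision = borda_combine([["A", "B", "C"],
                          ["B", "A", "C"],
                          ["A", "C", "B"]])
```

    With these rankings, A totals 8 points, B totals 6, and C totals 4, so the combined decision is A even though one classifier ranked B first.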
