
    A Training Sample Sequence Planning Method for Pattern Recognition Problems

    In solving pattern recognition problems, many classification methods, such as the nearest-neighbor (NN) rule, need to determine prototypes from a training set. To help these classifiers find an efficient set of prototypes, this paper introduces a training sample sequence planning method. In particular, by estimating how close each training sample lies to the decision boundary, the proposed approach incrementally increases the number of prototypes until the desired classification accuracy is reached. The approach has been tested with an NN classification method and a neural network training approach. Studies based on both artificial and real data demonstrate that higher classification accuracy can be achieved with fewer prototypes.
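    The incremental procedure summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the nearness score (distance to the nearest same-class sample divided by distance to the nearest other-class sample), the stopping test based on 1-NN accuracy over the training set, and the helper names `boundary_nearness`, `nn_accuracy`, and `plan_prototypes` are all hypothetical.

```python
import math


def boundary_nearness(X, y):
    """Hypothetical score: larger means the sample is (heuristically) nearer the boundary."""
    scores = []
    for i, xi in enumerate(X):
        same = min((math.dist(xi, xj) for j, xj in enumerate(X)
                    if j != i and y[j] == y[i]), default=math.inf)
        other = min((math.dist(xi, xj) for j, xj in enumerate(X)
                     if y[j] != y[i]), default=math.inf)
        if other == 0:
            scores.append(math.inf)   # coincides with a differently labeled sample
        elif math.isinf(other):
            scores.append(0.0)        # no other class present
        else:
            scores.append(same / other)
    return scores


def nn_accuracy(prototypes, X, y):
    """Fraction of samples whose nearest prototype carries the same label."""
    correct = 0
    for xi, yi in zip(X, y):
        nearest = min(prototypes, key=lambda j: math.dist(X[j], xi))
        correct += (y[nearest] == yi)
    return correct / len(X)


def plan_prototypes(X, y, target=1.0):
    """Add samples in order of boundary nearness until the target accuracy is met."""
    scores = boundary_nearness(X, y)
    order = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
    prototypes = []
    for i in order:
        prototypes.append(i)
        if nn_accuracy(prototypes, X, y) >= target:
            break
    return prototypes
```

    For example, with six points on a line split into two classes at the gap between x = 2 and x = 5, the sketch selects the two points adjacent to the gap and stops, since they already classify every training point correctly.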

    Designing labeled graph classifiers by exploiting the Rényi entropy of the dissimilarity representation

    Representing patterns as labeled graphs is becoming increasingly common in the broad field of computational intelligence. Accordingly, a wide repertoire of pattern recognition tools, such as classifiers and knowledge discovery procedures, is now available and has been tested on various datasets of labeled graphs. However, designing effective learning procedures that operate in the space of labeled graphs remains challenging, especially from the viewpoint of computational complexity. In this paper, we present a major improvement of a general-purpose classifier for graphs, which is built on an interplay between dissimilarity representation, clustering, information-theoretic techniques, and evolutionary optimization algorithms. The improvement focuses on a key subroutine devised to compress the input data. We prove several theorems that are fundamental to setting the parameters controlling this compression operation. We demonstrate the effectiveness of the resulting classifier by benchmarking the developed variants on well-known datasets of labeled graphs, using as distinct performance indicators the classification accuracy, the computing time, and the parsimony of the synthesized classification models in terms of structural complexity. The results show state-of-the-art test set accuracy and a considerable speed-up in computing time.
    Comment: Revised version
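    The abstract does not state the entropy estimator itself. As a hedged illustration, the sketch below computes a Parzen-window estimate of the quadratic (order-2) Rényi entropy directly from a dissimilarity matrix, using a Gaussian kernel of assumed width `sigma` and dropping the kernel's normalizing constant, so the value is defined only up to an additive constant. The function name and this particular estimator are assumptions, not necessarily what the paper uses.

```python
import math


def renyi_quadratic_entropy(D, sigma):
    """Quadratic Renyi entropy estimate from an n x n dissimilarity matrix D.

    Assumed estimator: H2 = -log((1/n^2) * sum_ij exp(-D[i][j]^2 / (4 sigma^2))),
    i.e. a Gaussian Parzen window with its normalizing constant omitted.
    """
    n = len(D)
    # "Information potential": average pairwise kernel value over all ordered pairs.
    potential = sum(
        math.exp(-D[i][j] ** 2 / (4 * sigma ** 2))
        for i in range(n)
        for j in range(n)
    ) / n ** 2
    return -math.log(potential)
```

    Identical objects (all dissimilarities zero) give an entropy of 0, while two objects that are far apart relative to `sigma` give approximately log 2, so the estimate grows as the represented data spread out.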

    Online and Offline Character Recognition Using Alignment to Prototypes

    Nearest neighbor classifiers are simple to implement, yet they can model complex non-parametric distributions and provide state-of-the-art recognition accuracy on OCR databases. At the same time, they may be too slow for practical character recognition, especially when they rely on similarity measures that require computationally expensive pairwise alignments between characters. This paper proposes an efficient method for computing an approximate similarity score between two characters based on their exact alignment to a small number of prototypes. The proposed method is applied to both online and offline character recognition, where similarity is based on widely used, computationally expensive alignment methods: Dynamic Time Warping for online recognition and the Hungarian method for offline recognition. In both cases, a significant recognition speedup is obtained at the expense of only a minor increase in recognition error.
    Office of Naval Research (N00014-03-1-0108); National Science Foundation (IIS-0308213, EIA-0202067
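    One way to picture "similarity from alignment to prototypes" is to align each character exactly to a prototype and compose the two alignments, instead of aligning the characters directly to each other. The sketch below does this for 1-D sequences with Dynamic Time Warping. The composition rule (first matched index at each step), the absolute-difference cost, taking the minimum over prototypes, and all function names are illustrative assumptions rather than the paper's exact method.

```python
import math


def dtw_path(a, b):
    """Exact DTW between 1-D sequences a and b; returns (cost, warping path)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (n, m) to (0, 0), always stepping to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path


def first_match(path):
    """Map each index of the first sequence to the first index it is aligned with."""
    mapping = {}
    for i, k in path:
        mapping.setdefault(i, k)
    return mapping


def approx_distance(x, y, prototypes):
    """Hypothetical: approximate x-y dissimilarity by routing alignment through prototypes."""
    best = math.inf
    for p in prototypes:
        x_to_p = first_match(dtw_path(x, p)[1])  # query-to-prototype alignment
        p_to_y = first_match(dtw_path(p, y)[1])  # could be precomputed for stored characters
        cost = sum(abs(x[i] - y[p_to_y[x_to_p[i]]]) for i in range(len(x)))
        best = min(best, cost)
    return best
```

    The intended saving is that a query is aligned only to the few prototypes, while prototype-to-database alignments can be computed once offline, so no expensive alignment between the query and every stored character is needed. This sketch recomputes both alignments on each call for brevity.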

    A dissimilarity-based approach for Classification

    The Nearest Neighbor classifier has been shown to be a powerful tool for multiclass classification. In this note we explore both the theoretical properties and the empirical behavior of a variant of this method in which the Nearest Neighbor rule is applied after selecting a set of so-called prototypes, whose cardinality is fixed in advance, by minimizing the empirical misclassification cost. This alleviates the two serious drawbacks of the Nearest Neighbor method: high storage requirements and time-consuming queries. The problem is shown to be NP-hard. Mixed Integer Programming (MIP) formulations are proposed, compared theoretically, and solved with a standard MIP solver for small problem instances. Large problem instances are solved by a metaheuristic that yields good classification rules in reasonable time.
    operations research and management science
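    The selection objective described above, choosing a fixed number k of prototypes to minimize the number of training objects misclassified by the 1-NN rule, can be illustrated with the sketch below. It is a hedged stand-in, not the paper's formulation: exhaustive search replaces the MIP models for tiny instances (its cost grows combinatorially, in line with the problem being NP-hard), and a simple random-swap local search replaces the paper's metaheuristic. The input is an assumed precomputed dissimilarity matrix `D`, and all function names are hypothetical.

```python
import itertools
import random


def nn_errors(D, labels, prototypes):
    """Number of training objects misclassified by the 1-NN rule over `prototypes`."""
    errors = 0
    for i in range(len(labels)):
        nearest = min(prototypes, key=lambda p: D[i][p])
        if labels[nearest] != labels[i]:
            errors += 1
    return errors


def exact_selection(D, labels, k):
    """Exhaustive search over all k-subsets; feasible only for very small n."""
    best = min(itertools.combinations(range(len(labels)), k),
               key=lambda S: nn_errors(D, labels, S))
    return list(best), nn_errors(D, labels, best)


def swap_search(D, labels, k, iters=200, seed=0):
    """Random-swap local search: replace one prototype, keep the change if not worse."""
    rng = random.Random(seed)
    n = len(labels)
    current = rng.sample(range(n), k)
    best = nn_errors(D, labels, current)
    if k >= n:
        return sorted(current), best
    for _ in range(iters):
        outsiders = [i for i in range(n) if i not in current]
        candidate = current.copy()
        candidate[rng.randrange(k)] = rng.choice(outsiders)
        errors = nn_errors(D, labels, candidate)
        if errors <= best:
            current, best = candidate, errors
    return sorted(current), best
```

    On two well-separated clusters, both functions find a pair of prototypes, one per cluster, that classifies every training object correctly; on larger data only the local search remains practical.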