
    CNN training with graph-based sample preselection: application to handwritten character recognition

    In this paper, we present a study on sample preselection in large training data sets for CNN-based classification. To do so, we structure the input data set in a network representation, namely the Relative Neighbourhood Graph, and then extract some vectors of interest. The proposed preselection method is evaluated in the context of handwritten character recognition, using two data sets of up to several hundred thousand images. It is shown that graph-based preselection can reduce the training data set without degrading the recognition accuracy of a shallow CNN model that is not pretrained. Comment: 10-page paper. Minor spelling corrections relative to v2. Accepted as an oral paper at the 13th IAPR International Workshop on Document Analysis Systems (DAS 2018).
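    The graph step described above can be illustrated with a minimal sketch. It builds a Relative Neighbourhood Graph by brute force and then applies a hypothetical selection rule (keep samples that have a graph neighbour of a different class). The abstract does not say how the "vectors of interest" are extracted, so the function `preselect_border_samples` and its rule are assumptions made for illustration, not the authors' method.

```python
from itertools import combinations
from math import dist


def relative_neighbourhood_graph(points):
    """Return RNG edges as a set of index pairs (i, j) with i < j.

    Brute force, O(n^3): adequate for a sketch, not for large data sets.
    """
    n = len(points)
    edges = set()
    for i, j in combinations(range(n), 2):
        d_ij = dist(points[i], points[j])
        # (i, j) is an edge unless some third point k is closer to both ends
        blocked = any(
            max(dist(points[i], points[k]), dist(points[j], points[k])) < d_ij
            for k in range(n)
            if k != i and k != j
        )
        if not blocked:
            edges.add((i, j))
    return edges


def preselect_border_samples(points, labels):
    """Hypothetical rule: keep samples with an RNG neighbour of another class."""
    keep = set()
    for i, j in relative_neighbourhood_graph(points):
        if labels[i] != labels[j]:
            keep.update((i, j))
    return sorted(keep)


# Toy data: five collinear points, two classes
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
labels = ["a", "a", "a", "b", "b"]
edges = relative_neighbourhood_graph(points)
selected = preselect_border_samples(points, labels)
```

    On this toy input only the two samples facing each other across the class boundary are kept; a real pipeline would need a faster graph construction and a selection rule taken from the paper itself.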

    Representing Style by Feature Space Archetypes: Description and Emulation of Spatial Styles in an Architectural Context


    A survey of outlier detection methodologies

    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, and so purify the data for processing. The original outlier detection methods were arbitrary, but now principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.

    Prototype Classifiers and the Big Fish. The Case of Prototype (Instance) Selection


    One-class classifiers based on entropic spanning graphs

    One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach can also process non-numeric data by means of an embedding procedure. The spanning graph is learned on the embedded input data, and the resulting partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the α-Jensen difference. Once training is completed, a graph-based fuzzy model is constructed in order to associate a confidence level with the classifier decision. The fuzzification process is based only on topological information of the vertices of the entropic spanning graph. As such, the proposed one-class classifier is also suitable for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations for the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches. Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canada.

    An efficient randomised sphere cover classifier

    This paper describes an efficient randomised sphere cover classifier (aRSC) that reduces the training data set size without loss of accuracy when compared to nearest neighbour classifiers. The motivation for developing this algorithm is the desire to have a non-deterministic, fast, instance-based classifier that performs well in isolation but is also ideal for use with ensembles. We use 24 benchmark datasets from the UCI repository and six gene expression datasets for evaluation. The first set of experiments demonstrates the basic benefits of sphere covering. The second set of experiments demonstrates that when we set the a parameter through cross validation, the resulting aRSC algorithm outperforms several well known classifiers when compared using the Friedman rank sum test. Thirdly, we test the usefulness of aRSC when used with three feature filters on six gene expression datasets. Finally, we highlight the benefits of pruning with a bias/variance decomposition.
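    The general idea of sphere covering can be sketched as follows, assuming a simple greedy scheme: pick a random uncovered sample, grow a sphere up to the nearest sample of another class, and store only the spheres. The abstract does not give the algorithm's details, so the names `build_sphere_cover` and `predict`, the `min_cover` pruning threshold, and the nearest-boundary decision rule are all illustrative assumptions and may differ from the published aRSC.

```python
import random
from math import dist


def build_sphere_cover(points, labels, min_cover=1, seed=0):
    """Greedy randomised cover: returns a list of (centre, radius, label)."""
    rng = random.Random(seed)
    n = len(points)
    uncovered = set(range(n))
    spheres = []
    while uncovered:
        c = rng.choice(sorted(uncovered))
        # The radius reaches up to the nearest sample of a different class
        radius = min(
            (dist(points[c], points[k]) for k in range(n) if labels[k] != labels[c]),
            default=float("inf"),
        )
        members = {
            k for k in range(n)
            if labels[k] == labels[c] and dist(points[c], points[k]) < radius
        }
        # Always mark the centre as handled so the loop terminates
        uncovered -= members | {c}
        # Hypothetical pruning: drop spheres covering fewer than min_cover samples
        if len(members) >= min_cover:
            spheres.append((points[c], radius, labels[c]))
    return spheres


def predict(spheres, x):
    """Assign the label of the sphere whose boundary is nearest to x."""
    _, _, label = min(spheres, key=lambda s: dist(x, s[0]) - s[1])
    return label


# Toy data: two well-separated classes of three samples each
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
spheres = build_sphere_cover(points, labels)
```

    Here six training samples are summarised by two spheres, which is the data reduction the abstract refers to; changing `seed` changes which samples become centres, reflecting the non-deterministic construction.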