23 research outputs found

    Reducing Spatial Data Complexity for Classification Models

    Intelligent data analytics is gradually becoming a day-to-day reality for today's businesses. However, despite rapidly increasing storage and computational power, current state-of-the-art predictive models still cannot handle massive and noisy corporate data warehouses. Moreover, adaptive and real-time operational environments require multiple models to be retrained frequently, which further hinders their use. Various data reduction techniques, ranging from data sampling to density retention models, attempt to address this challenge by capturing a summarised data structure, yet they either do not account for labelled data or degrade the classification performance of the model trained on the condensed dataset. In response, we propose a new general framework for reducing the complexity of labelled data by means of controlled spatial redistribution of class densities in the input space. Using the Parzen Labelled Data Compressor (PLDC) as an example, we demonstrate a simulated data condensation process directly inspired by electrostatic field interaction, in which data points are moved and merged following attracting and repelling interactions with other labelled data. The process is controlled by a class density function built on the original data, which acts as a class-sensitive potential field that preserves the original class density distributions while allowing data to rearrange and merge, joining together their soft class partitions. As a result, we obtain a model that reduces labelled datasets much further than competing approaches while maximally retaining the original class densities and hence the classification performance.
PLDC leaves the reduced dataset with soft accumulative class weights, allowing for efficient online updates, and, as shown in a series of experiments, when coupled with the Parzen Density Classifier (PDC) it significantly outperforms competing data condensation methods in terms of classification performance at comparable compression levels.

    A variable metric forward--backward method with extrapolation

    Forward-backward methods are a very useful tool for minimizing a functional given by the sum of a differentiable term and a nondifferentiable one, and they have attracted considerable research effort over the last decade. In this paper we focus on the convex case and, inspired by recent approaches for accelerating first-order iterative schemes, we develop a scaled inertial forward-backward algorithm based on a metric that changes at each iteration and on a suitable extrapolation step. Unlike standard forward-backward methods with extrapolation, our scheme is able to handle functions whose domain is not the entire space. Both an $\mathcal{O}(1/k^2)$ convergence rate estimate on the objective function values and the convergence of the sequence of iterates are proved. Numerical experiments on several test problems arising from image processing, compressed sensing and statistical inference show the effectiveness of the proposed method in comparison with well-performing state-of-the-art algorithms.
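
    For orientation, a minimal sketch of a standard forward-backward iteration with extrapolation (the FISTA scheme), applied to an l1-regularised least-squares problem. This is not the variable-metric algorithm of the abstract above: the metric is fixed to the identity, the domain is the whole space, and the names `fista_lasso`, `soft_threshold`, `lam` and `n_iter` are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (the "backward" step).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_lasso(A, b, lam, n_iter=500):
    # Minimise 0.5 * ||A x - b||^2 + lam * ||x||_1.
    # Assumption: fixed step 1/L with L the Lipschitz constant of the
    # gradient (squared spectral norm of A); no variable metric is used.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)                         # forward (gradient) step
        x_new = soft_threshold(y - grad / L, lam / L)    # backward (proximal) step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)    # extrapolation step
        x, t = x_new, t_new
    return x

def lasso_objective(A, b, lam, x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))
```

    When `A` is the identity, the minimiser is `soft_threshold(b, lam)`, which gives a quick sanity check of the iteration.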

    Multiple Resolution Nonparametric Classifiers

    Bayesian discriminant functions provide optimal classification decision boundaries in the sense of minimizing the average error rate. An operational assumption is that the probability density functions for the individual classes are either known a priori or can be estimated from the data. Parzen windows are a popular and theoretically sound choice for such estimation. However, while the minimal average error rate can be achieved when combining Bayes' rule with Parzen-window density estimation, the latter is computationally costly, to the point where it may lead to unacceptable run-time performance. We present the Multiple Resolution Nonparametric (MRN) classifier as a new approach for significantly reducing the computational cost of using Parzen-window density estimates without sacrificing the virtues of Bayesian discriminant functions. Performance is evaluated against a standard Parzen-window classifier on several common datasets.
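
    As a point of reference, a minimal sketch of the baseline the abstract compares against: a standard Parzen-window classifier that applies Bayes' rule to per-class kernel density estimates. It is not the MRN classifier; the Gaussian kernel, the bandwidth `h`, and the function names are assumptions made for illustration.

```python
import numpy as np

def parzen_density(x, samples, h):
    # Gaussian Parzen-window estimate of the density at point x.
    # Every training sample is visited, which is the run-time cost
    # the abstract refers to.
    d = samples.shape[1]
    sq_dist = np.sum((samples - x) ** 2, axis=1)
    norm = (2.0 * np.pi * h * h) ** (d / 2.0)
    return np.mean(np.exp(-sq_dist / (2.0 * h * h))) / norm

def parzen_classify(x, X, y, h=0.5):
    # Bayes' rule: pick the class maximising prior * class-conditional density.
    classes = np.unique(y)
    scores = []
    for c in classes:
        Xc = X[y == c]
        prior = len(Xc) / len(X)
        scores.append(prior * parzen_density(x, Xc, h))
    return classes[int(np.argmax(scores))]
```

    Each call to `parzen_classify` costs time proportional to the number of training points, so the per-query cost grows with the dataset.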