    Support Vector Machinery for Infinite Ensemble Learning

    Ensemble learning algorithms such as boosting can achieve better performance by averaging over the predictions of some base hypotheses. Nevertheless, most existing algorithms are limited to combining only a finite number of hypotheses, and the generated ensemble is usually sparse. Thus, it is not clear whether we should construct an ensemble classifier with a larger or even an infinite number of hypotheses. In addition, constructing an infinite ensemble itself is a challenging task. In this paper, we formulate an infinite ensemble learning framework based on the support vector machine (SVM). The framework can output an infinite and nonsparse ensemble by embedding infinitely many hypotheses into an SVM kernel. We use the framework to derive two novel kernels, the stump kernel and the perceptron kernel. The stump kernel embodies infinitely many decision stumps, and the perceptron kernel embodies infinitely many perceptrons. We also show that the Laplacian radial basis function kernel embodies infinitely many decision trees, and can thus be explained through infinite ensemble learning. Experimental results show that SVM with these kernels is superior to boosting with the same base hypothesis set. In addition, SVM with the stump kernel or the perceptron kernel performs similarly to SVM with the Gaussian radial basis function kernel, but enjoys the benefit of faster parameter selection. These properties make the novel kernels favorable choices in practice.
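
    As a rough illustration of how these kernels can be used: up to a constant offset, the stump kernel reduces to the negated pairwise L1 distance between inputs and the perceptron kernel to the negated L2 distance, so both can be fed to any SVM implementation that accepts a precomputed Gram matrix. Below is a minimal sketch using scikit-learn; the kernel forms follow the paper, but the choice of the offset `delta` and the synthetic data are illustrative assumptions, not the paper's experimental setup.

    ```python
    # Minimal sketch: SVM with the stump / perceptron kernels via a
    # precomputed Gram matrix. Stump kernel: Delta - ||x - x'||_1;
    # perceptron kernel: Delta - ||x - x'||_2. The value of Delta and
    # the synthetic data below are assumptions for illustration.
    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    def stump_kernel(X, Y, delta):
        """Stump kernel: delta minus the pairwise L1 distance."""
        return delta - cdist(X, Y, metric="cityblock")

    def perceptron_kernel(X, Y, delta):
        """Perceptron kernel: delta minus the pairwise L2 distance."""
        return delta - cdist(X, Y, metric="euclidean")

    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Any delta at least as large as the largest pairwise training distance
    # keeps the Gram entries nonnegative; the uniform offset is largely
    # absorbed by the SVM bias term.
    delta = cdist(X_tr, X_tr, "cityblock").max()

    clf = SVC(kernel="precomputed", C=1.0)
    clf.fit(stump_kernel(X_tr, X_tr, delta), y_tr)
    print("stump-kernel SVM accuracy:",
          clf.score(stump_kernel(X_te, X_tr, delta), y_te))
    ```

    Note that such a kernel leaves only the soft-margin parameter C to tune (there is no kernel width), which is the faster parameter selection the abstract highlights relative to the Gaussian radial basis function kernel.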

    Soft Methodology for Cost-and-error Sensitive Classification

    Many real-world data mining applications involve different costs for different types of classification errors and thus call for cost-sensitive classification algorithms. Existing algorithms for cost-sensitive classification are successful in terms of minimizing the cost, but can result in a high error rate as the trade-off. The high error rate holds back the practical use of those algorithms. In this paper, we propose a novel cost-sensitive classification methodology that takes both the cost and the error rate into account. The methodology, called soft cost-sensitive classification, is established from a multicriteria optimization problem of the cost and the error rate, and can be viewed as regularizing cost-sensitive classification with the error rate. The simple methodology allows immediate improvements of existing cost-sensitive classification algorithms. Experiments on benchmark and real-world data sets show that our proposed methodology indeed achieves lower test error rates than existing cost-sensitive classification algorithms, with similar (and sometimes lower) test costs. We also demonstrate that the methodology can be extended to consider the weighted error rate instead of the original error rate. This extension is useful for tackling unbalanced classification problems.
    Comment: A shorter version appeared in KDD '1
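
    The core idea, blending the application's cost matrix with the plain 0/1 error before cost-sensitive training, can be sketched in a few lines. In the illustration below, `soften_costs` forms a convex combination of the cost matrix and 0/1 loss, and a simple per-example-weight reduction trains a binary classifier on the result; the blend parameter `alpha`, the normalization, and the weighting reduction are illustrative assumptions rather than the paper's exact construction.

    ```python
    # Minimal sketch of the "soft" idea: blend the application cost matrix
    # with plain 0/1 loss before cost-sensitive training. alpha, the
    # normalization, and the per-example-weight reduction are illustrative
    # assumptions, not the paper's exact algorithm.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def soften_costs(C, alpha):
        """Convex combination of a cost matrix with 0/1 loss."""
        zero_one = 1.0 - np.eye(C.shape[0])  # 0 on the diagonal, 1 elsewhere
        return (1.0 - alpha) * C + alpha * zero_one

    # Binary task where missing a positive (true class 1 -> predicted 0)
    # is ten times as costly as a false alarm.
    C = np.array([[0.0, 1.0],
                  [10.0, 0.0]])
    C_soft = soften_costs(C / C.max(), alpha=0.3)

    # Simple cost-sensitive reduction for binary labels: weight each
    # example by the cost of misclassifying it.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)
    weights = np.where(y == 1, C_soft[1, 0], C_soft[0, 1])

    clf = LogisticRegression().fit(X, y, sample_weight=weights)
    ```

    Setting alpha = 0 recovers ordinary cost-sensitive training, while alpha = 1 ignores the costs and minimizes the plain error rate; intermediate values trade the two objectives off, which is the regularization effect the abstract describes.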

    GRETL 2019. Proceedings of the International Conference on the Gnu Regression, Econometrics and Time-series Library

    [English]: This book collects the papers presented at the 6th Gretl Conference, which took place at the Dipartimento di Scienze Politiche, University of Naples, on 13-14 June 2019. The papers (selected through a refereeing process) cover theoretical work on computational problems as well as empirical applications using the program, spanning a wide range of topics in statistics and applied econometrics, among which Generalized Dynamic Factor Models, Propensity Score Matching, Bayesian Model Averaging, Spatial Models, Cointegration and Bootstrap Techniques. / [Italian, translated]: The volume collects the contributions presented at the sixth Gretl Conference, held at the Dipartimento di Scienze Politiche, Università di Napoli Federico II, on 13-14 June 2019. Gretl (Gnu Regression, Econometrics and Time-series Library) is a free, open-source program for statistical and econometric analysis, written in C and available on several platforms. Born as a versatile teaching tool, over its 10 years of existence it has also grown enormously as a research tool. The conference featured both theoretical and applied studies, with the aim of sharing the software's most recent developments and new features. The volume includes 15 contributions by scholars on a range of topics, among them Generalized Dynamic Factor Models, Propensity Score Matching, Spatial Models and Models for Ordinal Data, Cointegration and Bootstrap techniques, all accepted after an anonymous refereeing process.