Cost-sensitive feature selection for support vector machines
Feature Selection (FS) is a crucial procedure in Data Science tasks such as
Classification, since it identifies the relevant variables, thus making the classification procedures more interpretable and more effective by reducing noise and data overfitting. The relevance of features in a classification procedure is linked to the fact that misclassification costs are frequently asymmetric, since false positive and false negative cases may have very different consequences. However, off-the-shelf FS procedures seldom take such cost-sensitivity of errors into account. In this paper we propose a mathematical-optimization-based FS procedure embedded in one of the most popular classification procedures, namely Support Vector Machines (SVM), accommodating asymmetric misclassification costs. The key idea is to replace the traditional margin maximization by minimization of the number of features selected, subject to upper bounds on the false positive and false negative rates. The problem is written as an integer linear problem plus a convex quadratic problem for SVM with both linear and radial kernels. The reported numerical experience demonstrates the usefulness of the proposed FS procedure. Indeed, our results on benchmark data sets show that a substantial decrease in the number of features is obtained, whilst the desired trade-off between false positive and false negative rates is achieved.
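As a rough, hedged illustration of the key idea (not the paper's integer-programming formulation), the Python sketch below shrinks the feature set of an off-the-shelf linear SVM with recursive feature elimination and keeps the smallest subset whose held-out false positive and false negative rates stay under user-chosen caps; the caps, the synthetic data, and the elimination routine are all placeholder assumptions.

```python
# Illustrative only: greedy stand-in for "minimize features subject to FPR/FNR caps".
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

MAX_FPR, MAX_FNR = 0.10, 0.20   # illustrative asymmetric error budgets

X, y = make_classification(n_samples=600, n_features=40, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def error_rates(model, X, y):
    tn, fp, fn, tp = confusion_matrix(y, model.predict(X)).ravel()
    return fp / (fp + tn), fn / (fn + tp)   # (FPR, FNR)

# Look for the smallest feature subset whose held-out FPR and FNR stay under the caps.
for k in range(1, X.shape[1] + 1):
    selector = RFE(LinearSVC(dual=False), n_features_to_select=k).fit(X_tr, y_tr)
    fpr, fnr = error_rates(selector, X_te, y_te)
    if fpr <= MAX_FPR and fnr <= MAX_FNR:
        print(f"kept {k} features  FPR={fpr:.2f}  FNR={fnr:.2f}")
        break
```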
Cost-sensitive probabilistic predictions for support vector machines
Support vector machines (SVMs) are widely used and constitute one of the most
thoroughly examined machine learning models for two-class classification.
Classification in SVM is based on a score procedure, yielding a deterministic
classification rule, which can be transformed into a probabilistic rule (as
implemented in off-the-shelf SVM libraries), but is not probabilistic in
nature. On the other hand, the tuning of the regularization parameters in SVM
is known to require a high computational effort and generates information
that is not fully exploited, since it is not used to build a probabilistic
classification rule. In this paper we propose a novel approach to
generate probabilistic outputs for the SVM. The new method has the following
three properties. First, it is designed to be cost-sensitive, and thus the
different importance of sensitivity (or true positive rate, TPR) and
specificity (true negative rate, TNR) is readily accommodated in the model. As
a result, the model can deal with imbalanced datasets which are common in
operational business problems as churn prediction or credit scoring. Second,
the SVM is embedded in an ensemble method to improve its performance, making
use of the valuable information generated in the parameters tuning process.
Finally, the probability estimates are obtained via bootstrap, avoiding the
parametric models used by competing approaches. Numerical tests on a wide
range of datasets show the advantages of our approach over benchmark
procedures.
Published in the European Journal of Operational Research (2023).
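A minimal sketch of the general idea, not the authors' exact algorithm: reuse the SVMs fitted across a regularization grid (the information generated during tuning) as an ensemble over bootstrap resamples, encode asymmetric error costs via class weights, and read the ensemble vote fraction as a probability estimate. The grid, the class-weight ratio, and the number of resamples are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.utils import resample

X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = []
for C in [0.1, 1.0, 10.0]:              # the grid a tuning procedure would explore
    for b in range(20):                 # bootstrap resamples per grid point
        Xb, yb = resample(X_tr, y_tr, random_state=b)
        clf = SVC(C=C, kernel="rbf", class_weight={0: 1.0, 1: 5.0})  # asymmetric costs
        models.append(clf.fit(Xb, yb))

# Probability estimate for the positive class = fraction of ensemble votes for it.
p_positive = np.mean([m.predict(X_te) for m in models], axis=0)
print(p_positive[:10])
```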
Supervised Machine Learning Under Test-Time Resource Constraints: A Trade-off Between Accuracy and Cost
The past decade has witnessed how the field of machine learning has established itself as a necessary component in several multi-billion-dollar industries. The real-world industrial setting introduces an interesting new problem to machine learning research: computational resources must be budgeted and cost must be strictly accounted for during test-time. A typical problem is that if an application consumes x additional units of cost during test-time, but will improve accuracy by y percent, should the additional x resources be allocated? The core of this problem is a trade-off between accuracy and cost. In this thesis, we examine components of test-time cost, and develop different strategies to manage this trade-off.
We first investigate test-time cost and discover that it typically consists of two parts: feature extraction cost and classifier evaluation cost. The former reflects the computational effort of transforming data instances into feature vectors, and can be highly variable when features are heterogeneous. The latter reflects the effort of evaluating a classifier, which can be substantial, in particular for nonparametric algorithms. We then propose three strategies to explicitly trade off accuracy against the two components of test-time cost during classifier training.
To budget the feature extraction cost, we first introduce two algorithms: GreedyMiser and Anytime Representation Learning (AFR). GreedyMiser employs a strategy that incorporates the extraction cost information during classifier training to explicitly minimize the test-time cost. AFR extends GreedyMiser to learn a cost-sensitive feature representation rather than a classifier, and turns traditional Support Vector Machines (SVM) into test-time cost-sensitive anytime classifiers. GreedyMiser and AFR are evaluated on two real-world data sets from two different application domains, and both achieve record performance.
We then introduce Cost Sensitive Tree of Classifiers (CSTC) and Cost Sensitive Cascade of Classifiers (CSCC), which share a common strategy that trades off accuracy against the amortized test-time cost. CSTC introduces a tree structure and directs test inputs along different tree traversal paths, each of which is optimized for a specific sub-partition of the input space and extracts a different, specialized subset of features. CSCC extends CSTC and builds a linear cascade, instead of a tree, to cope with class-imbalanced binary classification tasks. Since both CSTC and CSCC extract different features for different inputs, the amortized test-time cost is greatly reduced while high accuracy is maintained. Both approaches outperform the current state-of-the-art on real-world data sets.
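A toy two-stage cascade in the spirit of the amortized-cost idea (not CSCC itself): a cheap linear stage handles confident inputs, and only uncertain ones pay for the costly kernel stage. The confidence threshold and data are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

cheap = LinearSVC(dual=False).fit(X_tr, y_tr)    # low evaluation cost
costly = SVC(kernel="rbf").fit(X_tr, y_tr)       # high evaluation cost

pred = cheap.predict(X_te)
uncertain = np.abs(cheap.decision_function(X_te)) < 0.5   # illustrative threshold
pred[uncertain] = costly.predict(X_te[uncertain])          # only these pay the kernel cost

print(f"{(~uncertain).mean():.0%} of inputs stop at the cheap stage, "
      f"accuracy {(pred == y_te).mean():.3f}")
```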
To trade off accuracy against the high classifier evaluation cost of nonparametric classifiers, we propose a model compression strategy and develop Compressed Vector Machines (CVM). CVM focuses on nonparametric kernel Support Vector Machines (SVMs), whose test-time evaluation cost is typically substantial when learned from large training sets. CVM is a post-processing algorithm which compresses the learned SVM model by reducing and optimizing the support vectors. On several benchmark data sets, CVM maintains high test accuracy while reducing the test-time evaluation cost by several orders of magnitude.
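A quick, hedged illustration of the cost being targeted rather than of the CVM algorithm itself: the prediction time of a kernel SVM grows with its support-vector count, which is what post-hoc compression aims to cut. Sizes and timings here are arbitrary.

```python
import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# More (noisy) training data -> more support vectors -> slower prediction.
for n in [500, 2000, 8000]:
    X, y = make_classification(n_samples=n, n_features=20, flip_y=0.2, random_state=0)
    clf = SVC(kernel="rbf").fit(X, y)
    start = time.perf_counter()
    clf.predict(X[:200])
    ms = 1000 * (time.perf_counter() - start)
    print(f"train size {n:5d}  support vectors {clf.n_support_.sum():5d}  "
          f"predict 200 points: {ms:.1f} ms")
```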
Randomized Sketches of Convex Programs with Sharp Guarantees
Random projection (RP) is a classical technique for reducing storage and
computational costs. We analyze RP-based approximations of convex programs, in
which the original optimization problem is approximated by the solution of a
lower-dimensional problem. Such dimensionality reduction is essential in
computation-limited settings, since the complexity of general convex
programming can be quite high (e.g., cubic for quadratic programs, and
substantially higher for semidefinite programs). In addition to computational
savings, random projection is also useful for reducing memory usage, and has
useful properties for privacy-sensitive optimization. We prove that the
approximation ratio of this procedure can be bounded in terms of the geometry
of constraint set. For a broad class of random projections, including those
based on various sub-Gaussian distributions as well as randomized Hadamard and
Fourier transforms, the data matrix defining the cost function can be projected
down to the statistical dimension of the tangent cone of the constraints at the
original solution, which is often substantially smaller than the original
dimension. We illustrate consequences of our theory for various cases,
including unconstrained and ℓ1-constrained least squares, support vector
machines, and low-rank matrix estimation, and discuss implications for
privacy-sensitive optimization and some connections with de-noising and
compressed sensing.
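A hedged sketch of the basic mechanism for the simplest case (unconstrained least squares): project the tall data matrix with a Gaussian sketching matrix and solve the much smaller problem. The dimensions and noise level are arbitrary; the paper's contribution is the sharp bound on how well such a sketched solution approximates the original one in terms of the constraint set's geometry.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20000, 50, 500                     # rows, cols, sketch size (m << n)
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

S = rng.normal(size=(m, n)) / np.sqrt(m)     # Gaussian (sub-Gaussian) sketching matrix
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)        # original problem
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)  # projected problem

print("relative error of sketched solution:",
      np.linalg.norm(x_sketch - x_full) / np.linalg.norm(x_full))
```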
Constrained Classification and Policy Learning
Modern machine learning approaches to classification, including AdaBoost,
support vector machines, and deep neural networks, utilize surrogate loss
techniques to circumvent the computational complexity of minimizing empirical
classification risk. These techniques are also useful for causal policy
learning problems, since estimation of individualized treatment rules can be
cast as a weighted (cost-sensitive) classification problem. Consistency of the
surrogate loss approaches studied in Zhang (2004) and Bartlett et al. (2006)
crucially relies on the assumption of correct specification, meaning that the
specified set of classifiers is rich enough to contain a first-best classifier.
This assumption is, however, less credible when the set of classifiers is
constrained by interpretability or fairness, leaving the applicability of
surrogate loss based algorithms unknown in such second-best scenarios. This
paper studies consistency of surrogate loss procedures under a constrained set
of classifiers without assuming correct specification. We show that in the
setting where the constraint restricts the classifier's prediction set only,
hinge losses (i.e., ℓ1-support vector machines) are the only surrogate
losses that preserve consistency in second-best scenarios. If the constraint
additionally restricts the functional form of the classifier, consistency of a
surrogate loss approach is not guaranteed even with hinge loss. We therefore
characterize conditions for the constrained set of classifiers that can
guarantee consistency of hinge risk minimizing classifiers. Exploiting our
theoretical results, we develop robust and computationally attractive
hinge-loss-based procedures for a monotone classification problem.
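The reduction from policy learning to weighted classification mentioned above can be sketched as follows; the outcome-weighted labeling scheme, the synthetic data-generating process, and the 0.5 propensity are illustrative assumptions rather than the paper's construction.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))
T = rng.integers(0, 2, size=n)          # randomly assigned treatment (propensity 0.5)
tau = X[:, 0]                           # toy heterogeneous treatment effect
Y = X @ rng.normal(size=5) + tau * T + rng.normal(size=n)

# Outcome-weighted reduction: label = observed treatment if the outcome is good,
# flipped otherwise; weight = |outcome| / propensity. Hinge loss via a linear SVM.
labels = np.where(Y > 0, T, 1 - T)
weights = np.abs(Y) / 0.5

policy = LinearSVC(dual=False).fit(X, labels, sample_weight=weights)
print(f"estimated rule treats {policy.predict(X).mean():.0%} of units")
```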
AN APPLICATION OF MACHINE LEARNING TO BAD PAGE PREDICTION IN MULTILEVEL FLASH
Flash memory is prone to failures as the number of program-erase cycles increases. These physical failures in the flash result in an increase in the bit error rates. Once the bit error count exceeds a certain threshold, the error correction engines are incapable of correcting the errors without adversely impacting system performance, or may even fail entirely. This leads to an interest in learning the behavior of error-count growth and page failure in the flash memory and obtaining the ability to make failure predictions. We tackle this problem using a machine learning approach. However, standard machine learning techniques may not work well with the particular data at hand. This is because the error counts are collected from actual flash memory, and one can expect to see more pages with a lower error count than pages with a higher error count. This feature of the dataset leads to a formulation of our goal as a classification problem with significant class imbalance in the underlying data. We have investigated various classification methods that address such class imbalance. Among those considered are cost-sensitive boosting techniques, bagging procedures, bagging ensemble support vector machines (SVMs), and cost-sensitive neural networks.
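As a hedged illustration of one of the imbalance-aware methods named above (a bagging ensemble of class-weighted SVMs), the sketch below uses synthetic stand-in data rather than actual flash error counts; the imbalance ratio, ensemble size, and evaluation metric are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the imbalanced page data: few positives ("bad pages").
X, y = make_classification(n_samples=2000, n_features=15, weights=[0.97, 0.03],
                           random_state=0)

# Bagging ensemble of class-weighted SVMs (scikit-learn >= 1.2 parameter names).
ensemble = BaggingClassifier(
    estimator=SVC(kernel="rbf", class_weight="balanced"),
    n_estimators=25, max_samples=0.5, random_state=0)

print("ROC AUC:", cross_val_score(ensemble, X, y, scoring="roc_auc", cv=3).mean())
```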