Cost-sensitive feature selection for support vector machines
Feature Selection (FS) is a crucial procedure in Data Science tasks such as
Classification, since it identifies the relevant variables, thus making the classification procedures more interpretable and more effective by reducing noise and data overfitting. The relevance of features in a classification procedure is linked to the fact that misclassification costs are frequently asymmetric, since false positive and false negative cases may have very different consequences. However, off-the-shelf FS procedures seldom take such cost-sensitivity of errors into account. In this paper we propose a mathematical-optimization-based FS procedure embedded in one of the most popular classification procedures, namely Support Vector Machines (SVM), accommodating asymmetric misclassification costs. The key idea is to replace the traditional margin maximization by minimization of the number of features selected, subject to upper bounds on the false positive and false negative rates. The problem is written as an integer linear problem plus a convex quadratic problem for SVM with both linear and radial kernels. The reported numerical experience demonstrates the usefulness of the proposed FS procedure. Indeed, our results on benchmark data sets show that a substantial decrease in the number of features is obtained, whilst the desired trade-off between false positive and false negative rates is achieved.
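As a rough, hedged illustration of the key idea (not the paper's integer-programming formulation), the Python sketch below shrinks the feature set of an off-the-shelf linear SVM with recursive feature elimination and keeps the smallest subset whose held-out false positive and false negative rates stay under user-chosen caps; the caps, the synthetic data, and the elimination routine are all placeholder assumptions.

```python
# Illustrative only: greedy stand-in for "minimize features subject to FPR/FNR caps".
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

MAX_FPR, MAX_FNR = 0.10, 0.20   # illustrative asymmetric error budgets

X, y = make_classification(n_samples=600, n_features=40, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def error_rates(model, X, y):
    tn, fp, fn, tp = confusion_matrix(y, model.predict(X)).ravel()
    return fp / (fp + tn), fn / (fn + tp)   # (FPR, FNR)

# Look for the smallest feature subset whose held-out FPR and FNR stay under the caps.
for k in range(1, X.shape[1] + 1):
    selector = RFE(LinearSVC(dual=False), n_features_to_select=k).fit(X_tr, y_tr)
    fpr, fnr = error_rates(selector, X_te, y_te)
    if fpr <= MAX_FPR and fnr <= MAX_FNR:
        print(f"kept {k} features  FPR={fpr:.2f}  FNR={fnr:.2f}")
        break
```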
Cost-sensitive probabilistic predictions for support vector machines
Support vector machines (SVMs) are widely used and constitute one of the most
thoroughly examined machine learning models for two-class classification.
Classification in SVM is based on a score procedure, yielding a deterministic
classification rule, which can be transformed into a probabilistic rule (as
implemented in off-the-shelf SVM libraries), but is not probabilistic in
nature. On the other hand, the tuning of the regularization parameters in SVM
is known to require a high computational effort and generates information
that is not fully exploited, since it is not used to build a probabilistic
classification rule. In this paper we propose a novel approach to
generate probabilistic outputs for the SVM. The new method has the following
three properties. First, it is designed to be cost-sensitive, and thus the
different importance of sensitivity (or true positive rate, TPR) and
specificity (true negative rate, TNR) is readily accommodated in the model. As
a result, the model can deal with imbalanced datasets which are common in
operational business problems as churn prediction or credit scoring. Second,
the SVM is embedded in an ensemble method to improve its performance, making
use of the valuable information generated in the parameters tuning process.
Finally, the probability estimates are obtained via bootstrap, avoiding the
parametric models used by competing approaches. Numerical tests on a wide
range of datasets show the advantages of our approach over benchmark
procedures.
Published in the European Journal of Operational Research (2023).
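A minimal sketch of the general idea, not the authors' exact algorithm: reuse the SVMs fitted across a regularization grid (the information generated during tuning) as an ensemble over bootstrap resamples, encode asymmetric error costs via class weights, and read the ensemble vote fraction as a probability estimate. The grid, the class-weight ratio, and the number of resamples are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.utils import resample

X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = []
for C in [0.1, 1.0, 10.0]:              # the grid a tuning procedure would explore
    for b in range(20):                 # bootstrap resamples per grid point
        Xb, yb = resample(X_tr, y_tr, random_state=b)
        clf = SVC(C=C, kernel="rbf", class_weight={0: 1.0, 1: 5.0})  # asymmetric costs
        models.append(clf.fit(Xb, yb))

# Probability estimate for the positive class = fraction of ensemble votes for it.
p_positive = np.mean([m.predict(X_te) for m in models], axis=0)
print(p_positive[:10])
```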
Supervised Machine Learning Under Test-Time Resource Constraints: A Trade-off Between Accuracy and Cost
The past decade has witnessed how the field of machine learning has established itself as a necessary component in several multi-billion-dollar industries. The real-world industrial setting introduces an interesting new problem to machine learning research: computational resources must be budgeted and cost must be strictly accounted for during test-time. A typical problem is that if an application consumes x additional units of cost during test-time, but will improve accuracy by y percent, should the additional x resources be allocated? The core of this problem is a trade-off between accuracy and cost. In this thesis, we examine components of test-time cost, and develop different strategies to manage this trade-off.
We first investigate test-time cost and discover that it typically consists of two parts: feature extraction cost and classifier evaluation cost. The former reflects the computational effort of transforming data instances into feature vectors, and can be highly variable when features are heterogeneous. The latter reflects the effort of evaluating a classifier, which can be substantial, in particular for nonparametric algorithms. We then propose three strategies to explicitly trade off accuracy against the two components of test-time cost during classifier training.
To budget the feature extraction cost, we first introduce two algorithms: GreedyMiser and Anytime Representation Learning (AFR). GreedyMiser employs a strategy that incorporates the extraction cost information during classifier training to explicitly minimize the test-time cost. AFR extends GreedyMiser to learn a cost-sensitive feature representation rather than a classifier, and turns traditional Support Vector Machines (SVM) into test-time cost-sensitive anytime classifiers. GreedyMiser and AFR are evaluated on two real-world data sets from two different application domains, and both achieve record performance.
We then introduce Cost Sensitive Tree of Classifiers (CSTC) and Cost Sensitive Cascade of Classifiers (CSCC), which share a common strategy that trades off accuracy against the amortized test-time cost. CSTC introduces a tree structure and directs test inputs along different tree traversal paths, each of which is optimized for a specific sub-partition of the input space and extracts a different, specialized subset of features. CSCC extends CSTC and builds a linear cascade, instead of a tree, to cope with class-imbalanced binary classification tasks. Since both CSTC and CSCC extract different features for different inputs, the amortized test-time cost is greatly reduced while high accuracy is maintained. Both approaches outperform the current state-of-the-art on real-world data sets.
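A toy two-stage cascade in the spirit of the amortized-cost idea (not CSCC itself): a cheap linear stage handles confident inputs, and only uncertain ones pay for the costly kernel stage. The confidence threshold and data are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

cheap = LinearSVC(dual=False).fit(X_tr, y_tr)    # low evaluation cost
costly = SVC(kernel="rbf").fit(X_tr, y_tr)       # high evaluation cost

pred = cheap.predict(X_te)
uncertain = np.abs(cheap.decision_function(X_te)) < 0.5   # illustrative threshold
pred[uncertain] = costly.predict(X_te[uncertain])          # only these pay the kernel cost

print(f"{(~uncertain).mean():.0%} of inputs stop at the cheap stage, "
      f"accuracy {(pred == y_te).mean():.3f}")
```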
To trade off accuracy against the high classifier evaluation cost of nonparametric classifiers, we propose a model compression strategy and develop Compressed Vector Machines (CVM). CVM focuses on nonparametric kernel Support Vector Machines (SVMs), whose test-time evaluation cost is typically substantial when learned from large training sets. CVM is a post-processing algorithm which compresses the learned SVM model by reducing and optimizing the support vectors. On several benchmark data sets, CVM maintains high test accuracy while reducing the test-time evaluation cost by several orders of magnitude.
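A quick, hedged illustration of the cost being targeted rather than of the CVM algorithm itself: the prediction time of a kernel SVM grows with its support-vector count, which is what post-hoc compression aims to cut. Sizes and timings here are arbitrary.

```python
import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# More (noisy) training data -> more support vectors -> slower prediction.
for n in [500, 2000, 8000]:
    X, y = make_classification(n_samples=n, n_features=20, flip_y=0.2, random_state=0)
    clf = SVC(kernel="rbf").fit(X, y)
    start = time.perf_counter()
    clf.predict(X[:200])
    ms = 1000 * (time.perf_counter() - start)
    print(f"train size {n:5d}  support vectors {clf.n_support_.sum():5d}  "
          f"predict 200 points: {ms:.1f} ms")
```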
Randomized Sketches of Convex Programs with Sharp Guarantees
Random projection (RP) is a classical technique for reducing storage and
computational costs. We analyze RP-based approximations of convex programs, in
which the original optimization problem is approximated by the solution of a
lower-dimensional problem. Such dimensionality reduction is essential in
computation-limited settings, since the complexity of general convex
programming can be quite high (e.g., cubic for quadratic programs, and
substantially higher for semidefinite programs). In addition to computational
savings, random projection is also useful for reducing memory usage, and has
useful properties for privacy-sensitive optimization. We prove that the
approximation ratio of this procedure can be bounded in terms of the geometry
of constraint set. For a broad class of random projections, including those
based on various sub-Gaussian distributions as well as randomized Hadamard and
Fourier transforms, the data matrix defining the cost function can be projected
down to the statistical dimension of the tangent cone of the constraints at the
original solution, which is often substantially smaller than the original
dimension. We illustrate consequences of our theory for various cases,
including unconstrained and ℓ1-constrained least squares, support vector
machines, and low-rank matrix estimation, and discuss implications for
privacy-sensitive optimization and some connections with de-noising and
compressed sensing.
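A hedged sketch of the basic mechanism for the simplest case (unconstrained least squares): project the tall data matrix with a Gaussian sketching matrix and solve the much smaller problem. The dimensions and noise level are arbitrary; the paper's contribution is the sharp bound on how well such a sketched solution approximates the original one in terms of the constraint set's geometry.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20000, 50, 500                     # rows, cols, sketch size (m << n)
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

S = rng.normal(size=(m, n)) / np.sqrt(m)     # Gaussian (sub-Gaussian) sketching matrix
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)        # original problem
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)  # projected problem

print("relative error of sketched solution:",
      np.linalg.norm(x_sketch - x_full) / np.linalg.norm(x_full))
```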
Constrained Classification and Policy Learning
Modern machine learning approaches to classification, including AdaBoost,
support vector machines, and deep neural networks, utilize surrogate loss
techniques to circumvent the computational complexity of minimizing empirical
classification risk. These techniques are also useful for causal policy
learning problems, since estimation of individualized treatment rules can be
cast as a weighted (cost-sensitive) classification problem. Consistency of the
surrogate loss approaches studied in Zhang (2004) and Bartlett et al. (2006)
crucially relies on the assumption of correct specification, meaning that the
specified set of classifiers is rich enough to contain a first-best classifier.
This assumption is, however, less credible when the set of classifiers is
constrained by interpretability or fairness, leaving the applicability of
surrogate loss based algorithms unknown in such second-best scenarios. This
paper studies consistency of surrogate loss procedures under a constrained set
of classifiers without assuming correct specification. We show that in the
setting where the constraint restricts the classifier's prediction set only,
hinge losses (i.e., ℓ1-support vector machines) are the only surrogate
losses that preserve consistency in second-best scenarios. If the constraint
additionally restricts the functional form of the classifier, consistency of a
surrogate loss approach is not guaranteed even with hinge loss. We therefore
characterize conditions for the constrained set of classifiers that can
guarantee consistency of hinge risk minimizing classifiers. Exploiting our
theoretical results, we develop robust and computationally attractive
hinge-loss-based procedures for a monotone classification problem.
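The reduction from policy learning to weighted classification mentioned above can be sketched as follows; the outcome-weighted labeling scheme, the synthetic data-generating process, and the 0.5 propensity are illustrative assumptions rather than the paper's construction.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))
T = rng.integers(0, 2, size=n)          # randomly assigned treatment (propensity 0.5)
tau = X[:, 0]                           # toy heterogeneous treatment effect
Y = X @ rng.normal(size=5) + tau * T + rng.normal(size=n)

# Outcome-weighted reduction: label = observed treatment if the outcome is good,
# flipped otherwise; weight = |outcome| / propensity. Hinge loss via a linear SVM.
labels = np.where(Y > 0, T, 1 - T)
weights = np.abs(Y) / 0.5

policy = LinearSVC(dual=False).fit(X, labels, sample_weight=weights)
print(f"estimated rule treats {policy.predict(X).mean():.0%} of units")
```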
AN APPLICATION OF MACHINE LEARNING TO BAD PAGE PREDICTION IN MULTILEVEL FLASH
Flash memory is prone to failures as the number of program-erase cycles increases. These physical failures in the flash result in an increase in the bit error rates. Once the bit error count exceeds a certain threshold, the error correction engines are incapable of correcting the errors without adversely impacting system performance, or may even fail entirely. This leads to an interest in learning the behavior of error-count growth and page failure in the flash memory and obtaining the ability to make failure predictions. We tackle this problem using a machine learning approach. However, standard machine learning techniques may not work well with the particular data at hand. This is because the error counts are collected from actual flash memory, and one can expect to see more pages with a lower error count than pages with a higher error count. This feature of the dataset leads to a formulation of our goal as a classification problem with significant class imbalance in the underlying data. We have investigated various classification methods that address such class imbalance. Among those considered are cost-sensitive boosting techniques, bagging procedures, bagging ensemble support vector machines (SVMs), and cost-sensitive neural networks.
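As a hedged illustration of one of the imbalance-aware methods named above (a bagging ensemble of class-weighted SVMs), the sketch below uses synthetic stand-in data rather than actual flash error counts; the imbalance ratio, ensemble size, and evaluation metric are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the imbalanced page data: few positives ("bad pages").
X, y = make_classification(n_samples=2000, n_features=15, weights=[0.97, 0.03],
                           random_state=0)

# Bagging ensemble of class-weighted SVMs (scikit-learn >= 1.2 parameter names).
ensemble = BaggingClassifier(
    estimator=SVC(kernel="rbf", class_weight="balanced"),
    n_estimators=25, max_samples=0.5, random_state=0)

print("ROC AUC:", cross_val_score(ensemble, X, y, scoring="roc_auc", cv=3).mean())
```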