
Data Selection for Support Vector Machine Classifiers

By Olvi Mangasarian and Glenn Fung


The problem of extracting a minimal number of data points from a large dataset, in order to generate a support vector machine (SVM) classifier, is formulated as a concave minimization problem and solved by a finite number of linear programs. This minimal set of data points, which consists of the smallest number of support vectors that completely characterize a separating-plane classifier, is considerably smaller than the set required by a standard 1-norm support vector machine with or without feature selection. The proposed approach also incorporates a feature selection procedure that results in a minimal number of input features used by the classifier. Under tenfold cross-validation, the proposed minimal support vector machine (MSVM) classifier, based on the smaller set of data points, gives test results as good as or better than those of a standard 1-norm support vector machine classifier. The reduction in data points used by an MSVM classifier over those used by a 1-norm SVM classifier averaged 66% on seven public datasets and was as high as 81%. This makes MSVM a useful incremental classification tool, which maintains only a small fraction of a large dataset before merging and processing it with new incoming data.
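
The abstract states only that the concave minimization problem is solved by a finite number of linear programs. The following is a minimal sketch of how such a scheme can work, assuming a successive-linearization approach and the exponential surrogate sum_i (1 - exp(-alpha * v_i)) for counting the positive entries of a nonnegative vector v. At each step the concave objective is replaced by its linearization at the current point, one linear program is solved, and the loop stops once the linear objective can no longer be decreased. The names `step_approx`, `step_grad`, `lp_over_vertices`, `successive_linearization` and `exact_count`, the value alpha = 5, and the three-vertex polytope are all hypothetical. A polytope given by its vertex list stands in for the SVM constraint set, and vertex enumeration stands in for a real LP solver; this is not the MSVM program from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def step_approx(v: Sequence[float], alpha: float = 5.0) -> float:
    # Concave surrogate (assumed) for the number of positive entries of v >= 0:
    # sum_i (1 - exp(-alpha * v_i)); it approaches the exact count as alpha grows.
    return sum(1.0 - math.exp(-alpha * vi) for vi in v)


def step_grad(v: Sequence[float], alpha: float = 5.0) -> List[float]:
    # Gradient of step_approx; it becomes the cost vector of the next LP.
    return [alpha * math.exp(-alpha * vi) for vi in v]


def lp_over_vertices(cost: Sequence[float], vertices: Sequence[Vector]) -> Vector:
    # Hypothetical stand-in for an LP solver: a linear objective over a bounded
    # polytope attains its minimum at a vertex, so enumerating vertices suffices
    # for this toy example. A real implementation would call an LP solver here.
    return min(vertices, key=lambda z: sum(c * zi for c, zi in zip(cost, z)))


def successive_linearization(start: Vector, vertices: Sequence[Vector],
                             alpha: float = 5.0, tol: float = 1e-12,
                             max_iter: int = 100) -> Tuple[Vector, int]:
    # Returns the final point and the number of LPs solved.
    z = start
    for k in range(1, max_iter + 1):
        g = step_grad(z, alpha)
        z_next = lp_over_vertices(g, vertices)
        # Change in the linearized objective; stop when it is no longer negative.
        decrease = sum(gi * (a - b) for gi, a, b in zip(g, z_next, z))
        if decrease >= -tol:
            return z, k
        z = z_next
    return z, max_iter


def exact_count(v: Sequence[float], tol: float = 1e-9) -> int:
    # Number of entries that are strictly positive (up to tol).
    return sum(1 for vi in v if vi > tol)


vertices = [(2.0, 2.0, 2.0), (1.0, 1.0, 1.0), (2.5, 0.0, 0.0)]
z, n_lps = successive_linearization((2.0, 2.0, 2.0), vertices)
print(z, n_lps, exact_count(z))
```

Run as-is, it prints `(2.5, 0.0, 0.0) 2 1`: the first LP moves from the dense starting point to the sparsest vertex, and the second LP confirms that no further decrease is possible. Because the surrogate is concave, each accepted step strictly decreases it, and because a polytope has finitely many vertices, the loop stops after finitely many LPs; in general it reaches a stationary point rather than a guaranteed global minimum.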

Topics: linear programming, concave minimization, data selection, data classification, support vector machines
Year: 2000
DOI identifier: 10.1145/347090.347105
OAI identifier: oai:minds.wisconsin.edu:1793/64282
