Mixed-Integer Quadratic Optimization and Iterative Clustering Techniques for Semi-Supervised Support Vector Machines
Among the most famous algorithms for solving classification problems are
support vector machines (SVMs), which find a separating hyperplane for a set of
labeled data points. In some applications, however, labels are only available
for a subset of points. Furthermore, this subset can be non-representative,
e.g., due to self-selection in a survey. Semi-supervised SVMs tackle the
setting of labeled and unlabeled data and can often improve the reliability of
the results. Moreover, additional information about the size of the classes can
be available from undisclosed sources. We propose a mixed-integer quadratic
optimization (MIQP) model that covers the setting of labeled and unlabeled data
points as well as the overall number of points in each class. Since the MIQP's
solution time rapidly grows as the number of variables increases, we introduce
an iterative clustering approach to reduce the model's size. Moreover, we
present an update rule for the required big-$M$ values, prove the correctness
of the iterative clustering method, and derive tailored
dimension-reduction and warm-starting techniques. Our numerical results show
that our approach achieves accuracy and precision similar to those of the MIQP
formulation but at much lower computational cost. Thus, we can solve
larger problems. With respect to the original SVM formulation, we observe that
our approach has even better accuracy and precision for biased samples.
Comment: 33 pages, 18 figures