Model Selection for Support Vector Machine Classification
We address the problem of model selection for Support Vector Machine (SVM)
classification. For fixed functional form of the kernel, model selection
amounts to tuning kernel parameters and the slack penalty coefficient C. We
begin by reviewing a recently developed probabilistic framework for SVM
classification. An extension to the case of SVMs with quadratic slack penalties
is given and a simple approximation for the evidence is derived, which can be
used as a criterion for model selection. We also derive the exact gradients of
the evidence in terms of posterior averages and describe how they can be
estimated numerically using Hybrid Monte Carlo techniques. Though
computationally demanding, the resulting gradient ascent algorithm is a useful
baseline tool for probabilistic SVM model selection, since it can locate maxima
of the exact (unapproximated) evidence. We then perform extensive experiments
on several benchmark data sets. The aim of these experiments is to compare the
performance of probabilistic model selection criteria with alternatives based
on estimates of the test error, namely the so-called "span estimate" and
Wahba's Generalized Approximate Cross-Validation (GACV) error. We find that all
the "simple" model criteria (Laplace evidence approximations, and the Span
and GACV error estimates) exhibit multiple local optima with respect to the
hyperparameters. While some of these give performance that is competitive with
results from other approaches in the literature, a significant fraction lead to
rather higher test errors. The results for the evidence gradient ascent method
show that the exact evidence also exhibits local optima, but these give test
errors which are much less variable and also consistently lower than for the
simpler model selection criteria.
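For orientation, the role of the slack penalty coefficient C can be written out in the standard textbook soft-margin formulation. This is only a sketch in generic notation: the symbols w, b, ξ_i and the feature map φ are assumptions for illustration and are not taken from the paper, whose own notation and scaling may differ. It contrasts the usual linear slack penalty with the quadratic one mentioned in the abstract.

```latex
\begin{align}
\text{linear slack:}\quad
  & \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{quadratic slack:}\quad
  & \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + \tfrac{C}{2}\sum_{i=1}^{n}\xi_i^{2} \\
\text{subject to}\quad
  & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0 .
\end{align}
```

In both cases the kernel enters through φ, so model selection in this sense means choosing C together with the parameters of the kernel. With the quadratic penalty the constraint ξ_i ≥ 0 is redundant at the optimum but harmless to state.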
A sparse multinomial probit model for classification
A recent development in penalized probit modelling using a hierarchical Bayesian approach has led to a sparse binomial (two-class) probit classifier that can be trained via an EM algorithm. A key advantage of the formulation is that no tuning of hyperparameters relating to the penalty is needed, thus simplifying the model selection process. The resulting model demonstrates excellent classification performance and a high degree of sparsity when used as a kernel machine. It is, however, restricted to the binary classification problem and can only be used in the multinomial situation via a one-against-all or one-against-many strategy. To overcome this, we apply the idea to the multinomial probit model. This leads to a direct multi-classification approach and is shown to give a sparse solution with accuracy and sparsity comparable with the current state-of-the-art. Comparative numerical benchmark examples are used to demonstrate the method.
Hyperparameter Importance Across Datasets
With the advent of automated machine learning, automated hyperparameter
optimization methods are by now routinely used in data mining. However, this
progress is not yet matched by equal progress on automatic analyses that yield
information beyond performance-optimizing hyperparameter settings. In this
work, we aim to answer the following two questions: Given an algorithm, what
are generally its most important hyperparameters, and what are typically good
values for these? We present methodology and a framework to answer these
questions based on meta-learning across many datasets. We apply this
methodology using the experimental meta-data available on OpenML to determine
the most important hyperparameters of support vector machines, random forests
and Adaboost, and to infer priors for all their hyperparameters. The results,
obtained fully automatically, provide a quantitative basis to focus efforts in
both manual algorithm design and in automated hyperparameter optimization. The
conducted experiments confirm that the hyperparameters selected by the proposed
method are indeed the most important ones and that the obtained priors also
lead to statistically significant improvements in hyperparameter optimization.
Comment: © 2018. Copyright is held by the owner/author(s).
Publication rights licensed to ACM. This is the author's version of the work.
It is posted here for your personal use, not for redistribution. The
definitive Version of Record was published in Proceedings of the 24th ACM
SIGKDD International Conference on Knowledge Discovery & Data Minin
Mean field variational Bayesian inference for support vector machine classification
A mean field variational Bayes approach to support vector machines (SVMs)
using the latent variable representation of Polson & Scott (2012) is presented.
This representation circumvents many of the shortcomings associated with
classical SVMs: it allows automatic penalty parameter selection and can
handle dependent samples, missing data and variable selection. We
demonstrate on simulated and real datasets that our approach is easily
extendable to non-standard situations and outperforms the classical SVM
approach whilst remaining computationally efficient.
Comment: 18 pages, 4 figures
Sparse multinomial kernel discriminant analysis (sMKDA)
Dimensionality reduction via canonical variate analysis (CVA) is important for pattern recognition and has been extended variously to permit more flexibility, e.g. by "kernelizing" the formulation. This can lead to over-fitting, usually ameliorated by regularization. Here, a method for sparse, multinomial kernel discriminant analysis (sMKDA) is proposed, using a sparse basis to control complexity. It is based on the connection between CVA and least-squares, and uses forward selection via orthogonal least-squares to approximate a basis, generalizing a similar approach for binomial problems. Classification can be performed directly via minimum Mahalanobis distance in the canonical variates. sMKDA achieves state-of-the-art performance in terms of accuracy and sparseness on 11 benchmark datasets.
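The final classification step, assigning a point to the class with the minimum Mahalanobis distance, can be illustrated with a minimal sketch. This is not the sMKDA implementation: the function names `mahalanobis_sq` and `classify`, the example class means, and the assumption of a single shared inverse covariance matrix supplied by the caller are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def mahalanobis_sq(x: Sequence[float],
                   mean: Sequence[float],
                   cov_inv: Sequence[Sequence[float]]) -> float:
    """Squared Mahalanobis distance (x - mean)^T cov_inv (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    n = len(d)
    return sum(d[i] * cov_inv[i][j] * d[j] for i in range(n) for j in range(n))


def classify(x: Sequence[float],
             class_means: Dict[str, List[float]],
             cov_inv: Sequence[Sequence[float]]) -> str:
    """Return the label whose mean is nearest to x in Mahalanobis distance.

    Assumes one inverse covariance matrix shared by all classes
    (an illustrative simplification, not taken from the paper).
    """
    return min(class_means,
               key=lambda label: mahalanobis_sq(x, class_means[label], cov_inv))


# Hypothetical class means in a two-dimensional projected space.
means = {"a": [0.0, 0.0], "b": [3.0, 2.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to squared Euclidean distance
stretched = [[0.01, 0.0], [0.0, 1.0]]  # first axis has large variance

point = [2.5, 0.2]
label_euclidean = classify(point, means, identity)
label_mahalanobis = classify(point, means, stretched)
```

With the identity matrix the point is assigned to the class whose mean is closest in ordinary Euclidean terms; once the first axis is down-weighted for its large variance, the assignment changes, which is the reason for using a Mahalanobis rather than a Euclidean metric.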