Recurrent kernel machines: computing with infinite echo state networks
Echo state networks (ESNs) are large, random recurrent neural networks with a single trained linear readout layer. Despite the untrained nature of the recurrent weights, they are capable of performing universal computations on temporal input data, which makes them interesting for both theoretical research and practical applications. The key to their success lies in the fact that the network computes a broad set of nonlinear, spatiotemporal mappings of the input data, on which linear regression or classification can easily be performed. One could consider the reservoir as a spatiotemporal kernel, in which the mapping to a high-dimensional space is computed explicitly. In this letter, we build on this idea and extend the concept of ESNs to infinite-sized recurrent neural networks, which can be considered recursive kernels that subsequently can be used to create recursive support vector machines. We present the theoretical framework, provide several practical examples of recursive kernels, and apply them to typical temporal tasks.
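The ESN recipe described above can be sketched in a few lines: a fixed random reservoir computes nonlinear spatiotemporal features of the input, and only a linear readout is trained. This is a minimal illustration, not code from the letter itself; the reservoir size, spectral radius, ridge parameter, and the toy sine-prediction task are all arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for the sketch: 1-D input, 100 reservoir units.
n_in, n_res, T = 1, 100, 500

# Random, untrained reservoir weights; W is rescaled to spectral
# radius 0.9 so the network has the echo state property.
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(u):
    """Drive the reservoir with an input sequence u of shape (T, n_in)."""
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t, u_t in enumerate(u):
        # Nonlinear spatiotemporal mapping of the input history.
        x = np.tanh(W @ x + W_in @ u_t)
        states[t] = x
    return states

# Toy task: predict a sine wave one step ahead from its past.
u = np.sin(np.linspace(0, 20 * np.pi, T))[:, None]
X = run_reservoir(u[:-1])
y = u[1:, 0]

# Only the linear readout is trained, here by ridge regression.
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
pred = X @ W_out
print("train MSE:", np.mean((pred - y) ** 2))
```

The point of the sketch is that all learning happens in the final linear solve; the recurrent weights W and W_in are never updated.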
Scalable and Interpretable One-class SVMs with Deep Learning and Random Fourier features
The one-class support vector machine (OC-SVM) has long been one of the
most effective anomaly detection methods and is widely adopted in both
research and industrial applications. Its biggest limitation is the
difficulty of operating on large, high-dimensional datasets due to
optimization complexity. These problems can be mitigated by dimensionality
reduction techniques such as manifold learning or autoencoders; however,
previous work often treats representation learning and anomaly prediction
separately. In this paper, we propose the autoencoder-based one-class support
vector machine (AE-1SVM), which brings the OC-SVM, with random Fourier
features approximating the radial basis kernel, into a deep learning context
by combining it with a representation learning architecture and training both
jointly end-to-end with stochastic gradient descent. Interestingly, this
also opens up the use of gradient-based attribution methods to explain
the decision making in anomaly detection, which has always been challenging
because of the implicit mapping between the input space and the kernel space.
To the best of our knowledge, this is the first work to study the
interpretability of deep learning in anomaly detection. We evaluate our method
on a wide range of unsupervised anomaly detection tasks, on which our
end-to-end training architecture significantly outperforms previous work
using separate training.
Comment: Accepted at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) 201
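The random Fourier feature trick this abstract relies on can be illustrated directly: an explicit random feature map whose inner products approximate the RBF kernel, so that a linear (and hence SGD-trainable) model can stand in for a kernel machine. This is a generic sketch of the Rahimi–Rechт construction, not the AE-1SVM implementation; the feature count and gamma are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_map(X, n_features=500, gamma=1.0, rng=rng):
    """Random Fourier feature map z such that
    z(x) @ z(y) approximates exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Sample frequencies from the Fourier transform of the RBF
    # kernel, a Gaussian with standard deviation sqrt(2 * gamma).
    W = rng.normal(0.0, np.sqrt(2 * gamma), (d, n_features))
    b = rng.uniform(0, 2 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Compare the approximate and exact RBF kernel matrices.
X = rng.normal(size=(5, 3))
Z = rff_map(X, n_features=20000)
K_approx = Z @ Z.T
K_exact = np.exp(-1.0 * ((X[:, None] - X[None, :]) ** 2).sum(-1))
print("max abs error:", np.abs(K_approx - K_exact).max())
```

Because the kernel is now an explicit, finite-dimensional map, the OC-SVM objective becomes an ordinary differentiable loss over Z, which is what makes joint end-to-end training with an autoencoder possible.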
Fast rates for support vector machines using Gaussian kernels
For binary classification we establish learning rates up to the order of
$n^{-1}$ for support vector machines (SVMs) with hinge loss and Gaussian RBF
kernels. These rates are in terms of two assumptions on the considered
distributions: Tsybakov's noise assumption to establish a small estimation
error, and a new geometric noise condition which is used to bound the
approximation error. Unlike previously proposed concepts for bounding the
approximation error, the geometric noise assumption does not employ any
smoothness assumption.
Comment: Published at http://dx.doi.org/10.1214/009053606000001226 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Model Selection for Support Vector Machine Classification
We address the problem of model selection for Support Vector Machine (SVM)
classification. For fixed functional form of the kernel, model selection
amounts to tuning kernel parameters and the slack penalty coefficient $C$. We
begin by reviewing a recently developed probabilistic framework for SVM
classification. An extension to the case of SVMs with quadratic slack penalties
is given and a simple approximation for the evidence is derived, which can be
used as a criterion for model selection. We also derive the exact gradients of
the evidence in terms of posterior averages and describe how they can be
estimated numerically using Hybrid Monte Carlo techniques. Though
computationally demanding, the resulting gradient ascent algorithm is a useful
baseline tool for probabilistic SVM model selection, since it can locate maxima
of the exact (unapproximated) evidence. We then perform extensive experiments
on several benchmark data sets. The aim of these experiments is to compare the
performance of probabilistic model selection criteria with alternatives based
on estimates of the test error, namely the so-called "span estimate" and
Wahba's Generalized Approximate Cross-Validation (GACV) error. We find that all
the "simple" model criteria (Laplace evidence approximations, and the Span
and GACV error estimates) exhibit multiple local optima with respect to the
hyperparameters. While some of these give performance that is competitive with
results from other approaches in the literature, a significant fraction lead to
rather higher test errors. The results for the evidence gradient ascent method
show that the exact evidence also exhibits local optima, but these give test
errors that are much less variable and consistently lower than those from the
simpler model selection criteria.
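The model selection problem described here, tuning the kernel parameters and the slack penalty $C$, is most often attacked in practice with cross-validated test-error estimates rather than the evidence. As a point of contrast with the probabilistic criteria above, here is a minimal grid-search sketch; the dataset, grid values, and fold count are arbitrary assumptions, and scikit-learn's GridSearchCV stands in for the criteria discussed in the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic binary classification data for the sketch.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Hypothetical grid over the slack penalty C and the RBF kernel
# width gamma; 5-fold cross-validation error plays the role of the
# model selection criterion.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print("best hyperparameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)
```

A grid evaluates every hyperparameter combination and so sidesteps the local optima the abstract reports for gradient-based criteria, at the cost of scaling exponentially with the number of hyperparameters.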