Deep Learning Using Linear Support Vector Machines

Abstract

Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and bioinformatics. For classification tasks, most of these "deep learning" models employ the softmax activation function for prediction and minimize cross-entropy loss. In this paper, we demonstrate a small but consistent advantage of replacing the softmax layer with a linear support vector machine. Learning minimizes a margin-based loss instead of the cross-entropy loss. While there have been various combinations of neural nets and SVMs in prior art, our results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets: MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.
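The replacement described above amounts to swapping the cross-entropy objective for a squared-hinge (L2-SVM) loss on the network's top-layer scores, with one-vs-rest targets in {-1, +1}. A minimal sketch in NumPy, assuming raw class scores from the final linear layer (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def l2_svm_loss(scores, labels):
    """Squared-hinge (L2-SVM) loss over one-vs-rest targets.

    scores: (batch, n_classes) raw outputs of the top linear layer
    labels: (batch,) integer class labels
    """
    batch, n_classes = scores.shape
    # One-vs-rest targets: +1 for the true class, -1 elsewhere
    t = -np.ones((batch, n_classes))
    t[np.arange(batch), labels] = 1.0
    # Squared hinge: max(0, 1 - t * score)^2, summed over classes,
    # averaged over the batch
    margins = np.maximum(0.0, 1.0 - t * scores)
    return np.mean(np.sum(margins ** 2, axis=1))
```

Scores that satisfy every margin (true-class score above +1, others below -1) incur zero loss, so the gradient vanishes for confidently correct examples, unlike cross-entropy, which penalizes all examples.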

oai:CiteSeerX.psu:10.1.1.767.4102 (last updated 10/30/2017)

This paper was published in CiteSeerX.
