Deep learning using linear support vector machines
Abstract
Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and bioinformatics. For classification tasks, most of these "deep learning" models employ the softmax activation function for prediction and minimize cross-entropy loss. In this paper, we demonstrate a small but consistent advantage of replacing the softmax layer with a linear support vector machine. Learning minimizes a margin-based loss instead of the cross-entropy loss. While there have been various combinations of neural nets and SVMs in prior work, our results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets: MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's facial expression recognition challenge.
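To make the abstract's central idea concrete, the sketch below shows a squared-hinge (L2-SVM) loss that could stand in for softmax cross-entropy on the network's top-layer scores. The one-vs-rest ±1 target encoding, the unit margin, and the mean reduction are illustrative assumptions, not details taken from the paper itself:

```python
import numpy as np

def l2_svm_loss(scores, labels, margin=1.0):
    """Illustrative multiclass L2-SVM (squared hinge) loss, one-vs-rest.

    scores: (N, K) raw outputs of the network's top layer (no softmax).
    labels: (N,) integer class labels in [0, K).
    Each class is treated as a binary problem with target +1 for the
    true class and -1 otherwise; the per-entry loss is
    max(0, margin - t * s)^2, averaged over all samples and classes.
    """
    n, k = scores.shape
    targets = -np.ones((n, k))
    targets[np.arange(n), labels] = 1.0   # +1 on the true class
    hinge = np.maximum(0.0, margin - targets * scores)
    return np.mean(hinge ** 2)
```

Because the squared hinge is differentiable, this loss can be minimized by backpropagation exactly like cross-entropy; scores that clear the margin with the correct sign contribute zero loss.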