A Family of Maximum Margin Criterion for Adaptive Learning
In recent years, pattern analysis has played an important role in data mining and
recognition, and many variants have been proposed to handle complicated scenarios.
High dimensionality of data samples is well known in the literature, and both
high-dimensional and large-scale data have become commonplace in real-world
applications. In this work, an improved maximum margin criterion (MMC) method is
first introduced. Based on the new definition of MMC, several variants, including
random MMC, layered MMC, and 2D^2 MMC, are designed to make adaptive learning
applicable. In particular, the MMC network is developed to learn deep image
features in the spirit of simple deep networks. Experimental results on a diverse
collection of data sets demonstrate that the proposed MMC methods have sufficient
discriminant ability to be adopted in complicated application scenarios.

Comment: 14 pages
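As background for the criterion this family builds on, the sketch below shows the classical MMC projection, which maximizes tr(W^T (S_b - S_w) W) by taking the leading eigenvectors of the between-class minus within-class scatter. This is a minimal sketch of the standard formulation, not the paper's improved definition or its random/layered/2D^2 variants; the function name and interface are illustrative.

```python
import numpy as np

def mmc_projection(X, y, n_components):
    """Classical maximum margin criterion (MMC) projection: keep the leading
    eigenvectors of S_b - S_w (between-class minus within-class scatter).
    Illustrative baseline only, not the paper's improved MMC."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += Xc.shape[0] * diff @ diff.T          # between-class scatter
        S_w += (Xc - mean_c).T @ (Xc - mean_c)      # within-class scatter
    # S_b - S_w is symmetric, so eigh applies; unlike Fisher LDA, no matrix
    # inversion is needed, which is the usual argument for MMC.
    eigvals, eigvecs = np.linalg.eigh(S_b - S_w)
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:n_components]]
    return X @ W, W
```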
Reversible Architectures for Arbitrarily Deep Residual Neural Networks
Recently, deep residual networks have been successfully applied in many
computer vision and natural language processing tasks, pushing the
state-of-the-art performance with deeper and wider architectures. In this work,
we interpret deep residual networks as ordinary differential equations (ODEs),
which have long been studied in mathematics and physics with rich theoretical
and empirical success. From this interpretation, we develop a theoretical
framework on stability and reversibility of deep neural networks, and derive
three reversible neural network architectures that can go arbitrarily deep in
theory. The reversibility property allows a memory-efficient implementation,
which does not need to store the activations for most hidden layers. Together
with the stability of our architectures, this enables training deeper networks
using only modest computational resources. We provide both theoretical analyses
and empirical results. Experimental results demonstrate the efficacy of our
architectures against several strong baselines on CIFAR-10, CIFAR-100 and
STL-10, achieving performance superior to or on par with the state of the art.
Furthermore, we show that our architectures yield superior results when trained
with less training data.

Comment: Accepted at AAAI 2018
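To make the memory argument concrete, here is a minimal sketch of a reversible residual step written in the additive-coupling form popularized by RevNets: inputs can be reconstructed exactly from outputs, so hidden activations need not be stored. The paper's three architectures are derived from an ODE stability analysis and differ in detail; the residual functions F and G below are toy placeholders, not learned layers.

```python
import numpy as np

def reversible_forward(x1, x2, F, G):
    """One reversible residual step on a channel-split state (x1, x2)."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def reversible_inverse(y1, y2, F, G):
    """Reconstruct the inputs exactly from the outputs, so intermediate
    activations never have to be cached for backpropagation."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F = lambda z: np.tanh(z)        # stand-in for a learned residual branch
    G = lambda z: 0.5 * np.sin(z)   # stand-in for a second residual branch
    x1, x2 = rng.standard_normal(8), rng.standard_normal(8)
    y1, y2 = reversible_forward(x1, x2, F, G)
    r1, r2 = reversible_inverse(y1, y2, F, G)
    assert np.allclose(r1, x1) and np.allclose(r2, x2)  # exact round trip
```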
Class2Str: End to End Latent Hierarchy Learning
Deep neural networks for image classification typically consist of a
convolutional feature extractor followed by a fully connected classifier
network. The predicted and ground-truth labels are represented as one-hot
vectors. Such a representation assumes that all classes are equally dissimilar.
However, classes have visual similarities and often form a hierarchy. Learning
this latent hierarchy explicitly in the architecture could provide invaluable
insights. We propose an alternative architecture to the classifier network,
called the Latent Hierarchy (LH) Classifier, together with an end-to-end learned
Class2Str mapping that discovers a latent hierarchy of the classes. We show that,
for some of the best performing architectures on the CIFAR and Imagenet datasets,
replacing the classifier with the proposed LH classifier and retraining recovers
the accuracy with a fraction of the number of parameters in the classifier part.
Compared to the previous work HDCNN, which also learns a two-level hierarchy, we
are able to learn a hierarchy with an arbitrary number of levels and obtain an
accuracy improvement over it on the Imagenet classification task. We also verify
that many visually similar classes are grouped together under the learnt
hierarchy.

Comment: 6 pages, ICPR 2018, Beijing
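A hypothetical sketch of the parameter argument: if the head predicts an L-symbol string (L roughly log2 of the number of classes) instead of C one-hot logits, it needs on the order of D*L weights rather than D*C. The binary alphabet, fixed class-to-string table, and threshold decoding below are illustrative assumptions, not the paper's learned Class2Str mapping or its end-to-end training procedure.

```python
import numpy as np

def string_head_logits(features, W):
    """Hypothetical latent-hierarchy head: emit L binary decisions (one per
    level of a depth-L hierarchy) instead of C class logits.
    features: (N, D), W: (D, L) -> per-level logits (N, L)."""
    return features @ W

def decode_to_class(bit_logits, class_to_string):
    """Map predicted bit strings back to class labels via a class-to-string
    table (a fixed dict here; in the paper this mapping is learned)."""
    bits = (bit_logits > 0).astype(int)
    strings = ["".join(map(str, row)) for row in bits]
    string_to_class = {s: c for c, s in class_to_string.items()}
    return [string_to_class.get(s, -1) for s in strings]

# Toy illustration: 4 classes encoded by 2-bit strings, D = 8 features.
# With C classes and string length L ~ log2(C), the head stores D*L weights
# instead of D*C, which is where the parameter saving comes from.
class_to_string = {0: "00", 1: "01", 2: "10", 3: "11"}
rng = np.random.default_rng(0)
features = rng.standard_normal((5, 8))
W = rng.standard_normal((8, 2))
print(decode_to_class(string_head_logits(features, W), class_to_string))
```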