18 research outputs found

    A Family of Maximum Margin Criterion for Adaptive Learning

    Full text link
    In recent years, pattern analysis plays an important role in data mining and recognition, and many variants have been proposed to handle complicated scenarios. In the literature, it has been quite familiar with high dimensionality of data samples, but either such characteristics or large data have become usual sense in real-world applications. In this work, an improved maximum margin criterion (MMC) method is introduced firstly. With the new definition of MMC, several variants of MMC, including random MMC, layered MMC, 2D^2 MMC, are designed to make adaptive learning applicable. Particularly, the MMC network is developed to learn deep features of images in light of simple deep networks. Experimental results on a diversity of data sets demonstrate the discriminant ability of proposed MMC methods are compenent to be adopted in complicated application scenarios.Comment: 14 page

    Reversible Architectures for Arbitrarily Deep Residual Neural Networks

    Full text link
    Recently, deep residual networks have been successfully applied in many computer vision and natural language processing tasks, pushing the state-of-the-art performance with deeper and wider architectures. In this work, we interpret deep residual networks as ordinary differential equations (ODEs), which have long been studied in mathematics and physics with rich theoretical and empirical success. From this interpretation, we develop a theoretical framework on stability and reversibility of deep neural networks, and derive three reversible neural network architectures that can go arbitrarily deep in theory. The reversibility property allows a memory-efficient implementation, which does not need to store the activations for most hidden layers. Together with the stability of our architectures, this enables training deeper networks using only modest computational resources. We provide both theoretical analyses and empirical results. Experimental results demonstrate the efficacy of our architectures against several strong baselines on CIFAR-10, CIFAR-100 and STL-10 with superior or on-par state-of-the-art performance. Furthermore, we show our architectures yield superior results when trained using fewer training data.Comment: Accepted at AAAI 201

    Class2Str: End to End Latent Hierarchy Learning

    Full text link
    Deep neural networks for image classification typically consists of a convolutional feature extractor followed by a fully connected classifier network. The predicted and the ground truth labels are represented as one hot vectors. Such a representation assumes that all classes are equally dissimilar. However, classes have visual similarities and often form a hierarchy. Learning this latent hierarchy explicitly in the architecture could provide invaluable insights. We propose an alternate architecture to the classifier network called the Latent Hierarchy (LH) Classifier and an end to end learned Class2Str mapping which discovers a latent hierarchy of the classes. We show that for some of the best performing architectures on CIFAR and Imagenet datasets, the proposed replacement and training by LH classifier recovers the accuracy, with a fraction of the number of parameters in the classifier part. Compared to the previous work of HDCNN, which also learns a 2 level hierarchy, we are able to learn a hierarchy at an arbitrary number of levels as well as obtain an accuracy improvement on the Imagenet classification task over them. We also verify that many visually similar classes are grouped together, under the learnt hierarchy.Comment: 6 pages, ICPR 2018, Beijin
    corecore