
    Regularizing Deep Models for Visual Recognition

Image understanding is a shared goal across computer vision problems. It involves decomposing an image into a set of primitive components through which one can perform region segmentation, region labeling, object recognition and, finally, model the interactions between recognized objects. However, due to the large intra-class variations in appearance, shape and structure, extracting image primitives is highly challenging. While images arrive as intensity matrices, coping with these variations requires a high-level abstraction of the image. The main challenge is therefore to bridge the gap between the low-level pixel representation and high-level abstract image descriptors.

In recent years, learned image descriptors based on deep networks have become strikingly popular for visual recognition. The multi-layer architecture of these networks is particularly well suited to capturing the hierarchical structure of image data: simple features are detected at lower layers and fed into higher layers to extract more complex and abstract representations. Despite their remarkable representational power, deep networks are computationally expensive to train. Moreover, given the lack of sufficient labeled training data in many applications, over-fitting is a serious threat for models with a large number of free parameters, and the gradient-based optimization procedure used for parameter learning has its own innate issues.

This research addresses these issues by leveraging domain knowledge. In particular, we tailor deep networks for visual recognition by exploiting the characteristics of image data. These modifications tend to regularize deep models and therefore improve their generalization performance. We propose novel ways to incorporate image-specific domain knowledge into deep networks. As part of this thesis, we show how one can significantly decrease the number of free parameters in fully-connected architectures by exploiting the global characteristics of image data. For convolutional networks, we introduce a new multi-neighborhood architecture that captures scale-dependent features: fine-scale image structures (i.e., appearance features) are captured using a small-sized neighborhood, while coarse-scale characteristics (i.e., shape features) are detected by considering a wider area around each pixel. We also propose an effective regularization method for deep networks in which a frequency parameter is devised to specifically address the issues of gradient-based optimization during training. Finally, we introduce a stage-wise training framework in which the learning process is broken down into a number of related sub-tasks completed stage by stage, where the parameters learned at each stage act as a prior for the next. This is achieved through "gradual" injection of the information present in the training data, so that the "coarse-scale" properties of the data are captured in the early stages of training while the "finer-scale" characteristics are learned in later stages. The performance of the proposed methods is assessed on a number of image classification data sets. Our comprehensive empirical analysis demonstrates that these "regularized" networks offer better discrimination and generalization performance than their domain-oblivious counterparts.
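The multi-neighborhood idea described above can be illustrated with a minimal sketch. The snippet below is one plausible reading of that description rather than the thesis's exact architecture: two parallel convolutional branches over different neighborhood sizes, one small window for fine-scale appearance and one wide window for coarse-scale shape, whose feature maps are concatenated. The framework (PyTorch), the 3x3 and 9x9 kernel sizes, the channel counts, and the class name MultiNeighborhoodBlock are all illustrative assumptions.

```python
# Minimal sketch of a "multi-neighborhood" convolutional block (assumed
# kernel sizes and channel split; not the thesis's exact architecture).
import torch
import torch.nn as nn

class MultiNeighborhoodBlock(nn.Module):
    """Extracts features from two neighborhood sizes around each pixel:
    a small window for fine-scale appearance and a wide window for
    coarse-scale shape, then concatenates the two feature maps."""
    def __init__(self, in_channels, fine_channels=32, coarse_channels=32):
        super().__init__()
        # Small neighborhood: fine-scale appearance features.
        self.fine = nn.Conv2d(in_channels, fine_channels, kernel_size=3, padding=1)
        # Wide neighborhood: coarse-scale shape features.
        self.coarse = nn.Conv2d(in_channels, coarse_channels, kernel_size=9, padding=4)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return torch.cat([self.act(self.fine(x)), self.act(self.coarse(x))], dim=1)

# Usage on a batch of 32x32 RGB images (e.g., CIFAR-sized inputs).
block = MultiNeighborhoodBlock(in_channels=3)
features = block(torch.randn(8, 3, 32, 32))   # -> shape (8, 64, 32, 32)
```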
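Likewise, the stage-wise, coarse-to-fine training framework could be sketched as below, assuming that the "gradual injection" of information is realized by low-pass filtering the inputs: early stages see heavily blurred images (coarse-scale structure only) and later stages see progressively sharper ones, with each stage starting from the parameters learned in the previous stage. The blur schedule, optimizer settings, and the function name train_stagewise are illustrative assumptions, not the thesis's actual procedure.

```python
# Minimal sketch of stage-wise, coarse-to-fine training via an assumed
# input-blurring schedule (illustrative only).
import torch
from torchvision.transforms.functional import gaussian_blur

def train_stagewise(model, loader, blur_sigmas=(4.0, 2.0, 1.0, 0.0), epochs_per_stage=5):
    criterion = torch.nn.CrossEntropyLoss()
    for sigma in blur_sigmas:                      # one stage per blur level
        # Parameters learned so far serve as the starting point (prior) for this stage.
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
        for _ in range(epochs_per_stage):
            for images, labels in loader:
                if sigma > 0:
                    # Low-pass filter the batch: stronger blur = coarser information.
                    images = gaussian_blur(images, kernel_size=9, sigma=sigma)
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()
    return model
```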