
    A Taxonomy of Deep Convolutional Neural Nets for Computer Vision

    Traditional architectures for solving computer vision problems, and the degree of success they enjoyed, have relied heavily on hand-crafted features. Of late, however, deep learning techniques have offered a compelling alternative: automatically learning problem-specific features. With this new paradigm, every problem in computer vision is being re-examined from a deep learning perspective. It has therefore become important to understand what kinds of deep networks are suitable for a given problem. Although general surveys of this fast-moving paradigm exist, a survey specific to computer vision is missing. We consider one form of deep network widely used in computer vision: convolutional neural networks (CNNs). We start with "AlexNet" as our base CNN and then examine the broad variations proposed over time to suit different applications. We hope that our recipe-style survey will serve as a guide, particularly for novice practitioners intending to use deep-learning techniques for computer vision. Comment: Published in Frontiers in Robotics and AI (http://goo.gl/6691Bm)

    Face Centered Image Analysis Using Saliency and Deep Learning Based Techniques

    Image analysis begins with the goal of building vision machines that can perceive like humans: intelligently inferring general principles and sensing the surrounding situation from imagery. This dissertation studies face-centered image analysis as a core problem in high-level computer vision research and addresses it by tackling a series of challenging questions: Is there anything interesting in the image? If so, what is it? If a person is present, who is he/she, what expression is he/she performing, and can we estimate his/her age? Answering these questions leads to saliency-based object detection, deep-learning-based object categorization and recognition, facial landmark detection, and multi-task biometrics. For object detection, a three-level saliency detection method based on the self-similarity technique (SMAP) is first proposed. The first level of SMAP uses statistical methods to generate proto-background patches; the second level computes local contrast based on image self-similarity characteristics; finally, a spatial color-distribution constraint is applied to produce the saliency map. The outcome of the algorithm is a full-resolution image with highlighted salient objects and well-defined edges. For object recognition, the Adaptive Deconvolution Network (ADN) is used to categorize the objects extracted by saliency detection. To improve system performance, an L1/2-norm-regularized ADN is proposed and tested in different applications; the results demonstrate the efficiency and significance of the new structure. To fully understand the facial-biometric activity contained in an image, low-rank matrix decomposition is introduced to help locate landmark points on face images. The natural extension of this work benefits research on human facial expression recognition and facial feature parsing.
To facilitate understanding of the detected facial image, automatic facial image analysis becomes essential. We present a novel deeply learned tree-structured face representation that uniformly models the human face at different semantic levels. We show that the proposed feature yields a unified representation for multi-task facial biometrics and that the multi-task learning framework is applicable to many other computer vision tasks.
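The second SMAP level described above scores each image patch by how much it stands out from the rest of the image. The following is a minimal, hedged sketch of that idea (patch-based local contrast via self-dissimilarity); it is not the authors' implementation, and the patch size and normalization are illustrative assumptions.

```python
# Hedged sketch of patch-based local-contrast saliency, loosely in the spirit
# of SMAP's second level. All parameters here are illustrative assumptions.
import numpy as np

def local_contrast_saliency(gray, patch=8):
    """Score each non-overlapping patch by its mean L2 distance to all patches."""
    h, w = gray.shape
    ph, pw = h // patch, w // patch
    # Cut the image into a (ph x pw) grid of flattened patches.
    patches = gray[:ph * patch, :pw * patch].reshape(ph, patch, pw, patch)
    patches = patches.transpose(0, 2, 1, 3).reshape(ph * pw, patch * patch)
    # Self-dissimilarity: mean pairwise distance from each patch to every patch.
    d = np.linalg.norm(patches[:, None, :] - patches[None, :, :], axis=-1)
    scores = d.mean(axis=1).reshape(ph, pw)
    # Normalize to [0, 1]; high values mark patches unlike the common background.
    return (scores - scores.min()) / (scores.max() - scores.min() + 1e-8)

# Toy input: a bright square on a dark background.
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0
sal = local_contrast_saliency(img)  # the square's patches score highest
```

A real pipeline would combine this contrast term with the proto-background and spatial color-distribution levels the abstract mentions, rather than using it alone.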

    CNN Feature Map Interpretation and Key-Point Detection Using Statistics of Activation Layers

    Convolutional Neural Networks (CNNs) have evolved to be highly accurate at classifying objects in a single image or in video frames. A core function of a CNN model is the extraction and encoding of features from training or ground-truth images; simple CNN models are trained to identify a dominant object in an image from these feature encodings, while more complex models such as R-CNN and its successors can identify and locate multiple objects. Feature maps from trained CNNs contain useful information beyond the encoding used for classification or detection. By examining the maximum activation values and statistics of early-layer feature maps, it is possible to identify key points of objects, including their locations, particularly for object types included in the original training data set. We introduce methods that leverage key points extracted from these early layers to isolate objects for more accurate classification and detection, using simpler networks than complex, integrated ones. An examination of the feature-extraction process provides insight into the information available in the various feature-map layers of a CNN. While a basic CNN model does not explicitly create instances of visual or other types of information, it is possible to examine the feature-map layers and build a framework for interpreting them. This is valuable for a variety of goals, such as object location and size, feature statistics, and redundancy analysis. In this thesis we examine in detail the interpretation of feature maps in CNN models and develop a method for extracting information from trained convolutional layers to locate objects belonging to a pre-trained image data set. A major contribution of this work is the analysis of the statistical characteristics of early-layer feature maps and the development of a method for identifying key points of objects without the benefit of information from deeper layers.
A second contribution is an analysis of the accuracy of these selections as key points of objects present in the image. A third contribution is the clustering of key points into partitions used to crop the original image and run detection with the simple CNN model. This key-point detection method has the potential to greatly improve the classification capability of simple CNNs by making it possible to identify multiple objects in a complex input image at modest computational cost, while also providing localization information.
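The core idea above, flagging spatial locations whose early-layer activations are statistical outliers as candidate key points, can be sketched as follows. This is a hedged illustration, not the thesis's exact method: the channel-max pooling, the mean-plus-k-sigma threshold, and the synthetic feature map are all assumptions made for the example.

```python
# Hedged sketch: treat locations whose activation is a statistical outlier
# in an early feature map as candidate key points. The threshold rule
# (mean + k * std) is an illustrative assumption, not the thesis's method.
import numpy as np

def keypoints_from_feature_map(fmap, k=2.5):
    """Return (row, col) positions where the per-location channel-max
    activation exceeds mean + k * std over the whole map."""
    strength = fmap.max(axis=0)  # strongest response at each location, (H, W)
    thresh = strength.mean() + k * strength.std()
    rows, cols = np.nonzero(strength > thresh)
    return list(zip(rows.tolist(), cols.tolist()))

# Toy "early-layer" feature map: 4 channels of noise plus one strong
# activation at spatial position (5, 9).
rng = np.random.default_rng(0)
fmap = rng.normal(0.0, 0.1, size=(4, 16, 16))
fmap[2, 5, 9] = 5.0
pts = keypoints_from_feature_map(fmap)  # contains (5, 9)
```

In practice the feature map would come from a trained network's early convolutional layer (e.g. captured with a forward hook), and the detected key points would then be clustered into partitions for cropping, as the third contribution describes.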