    Unsupervised Feature Learning by Deep Sparse Coding

    In this paper, we propose a new unsupervised feature learning framework, Deep Sparse Coding (DeepSC), that extends sparse coding to a multi-layer architecture for visual object recognition tasks. The main innovation of the framework is that it connects the sparse encoders from different layers by a sparse-to-dense module. The sparse-to-dense module is a composition of a local spatial pooling step and a low-dimensional embedding process, which takes advantage of the spatial smoothness information in the image. As a result, the new method is able to learn several levels of sparse representation of the image, which capture features at a variety of abstraction levels and simultaneously preserve the spatial smoothness between neighboring image patches. Combining the feature representations from multiple layers, DeepSC achieves state-of-the-art performance on multiple object recognition tasks. Comment: 9 pages, submitted to ICL
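
The sparse-encoding step that each layer relies on can be illustrated with a minimal sketch. The abstract does not say which solver DeepSC uses, so the code below assumes plain ISTA (iterative soft-thresholding) on the standard L1-regularised objective; the names `soft_threshold` and `ista` and all parameter values are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal map of the L1 norm)."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def ista(D, x, lam=0.1, step=0.1, n_iter=200):
    """Approximately minimise 0.5*||x - D z||^2 + lam*||z||_1 over z.

    D is a list of rows (signal_dim x n_atoms). For convergence, step
    should be at most 1 / (largest eigenvalue of D^T D).
    Assumption: ISTA is used only for illustration, not taken from the paper.
    """
    n_dim, n_atoms = len(D), len(D[0])
    z = [0.0] * n_atoms
    for _ in range(n_iter):
        # Residual r = D z - x
        r = [sum(D[i][j] * z[j] for j in range(n_atoms)) - x[i] for i in range(n_dim)]
        # Gradient of the quadratic term: g = D^T r
        g = [sum(D[i][j] * r[i] for i in range(n_dim)) for j in range(n_atoms)]
        # Gradient step followed by soft-thresholding
        z = soft_threshold([z[j] - step * g[j] for j in range(n_atoms)], step * lam)
    return z
```

With an identity dictionary and `step=1.0`, the code is simply the soft-thresholded input, so small entries are set to zero and large ones shrink by `lam`.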

    ART Neural Networks for Remote Sensing Image Analysis

    ART and ARTMAP neural networks for adaptive recognition and prediction have been applied to a variety of problems, including automatic mapping from remote sensing satellite measurements, parts design retrieval at the Boeing Company, medical database prediction, and robot vision. This paper features a self-contained introduction to ART and ARTMAP dynamics. An application of these networks to image processing is illustrated by means of a remote sensing example. The basic ART and ARTMAP networks feature winner-take-all (WTA) competitive coding, which groups inputs into discrete recognition categories. WTA coding in these networks enables fast learning, which allows the network to encode important rare cases but may lead to inefficient category proliferation with noisy training inputs. This problem is partially solved by ART-EMAP, which uses WTA coding for learning but distributed category representations for test-set prediction. Recently developed ART models (dART and dARTMAP) retain stable coding, recognition, and prediction, but allow arbitrarily distributed category representation during learning as well as performance
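
Winner-take-all coding with fast learning can be sketched as follows. The abstract does not specify an ART variant, so this sketch assumes the Fuzzy ART formulation (complement coding, choice function, vigilance test); the function names and the values of `rho` and `alpha` are hypothetical.

```python
def complement_code(a):
    """Complement coding: append 1 - a_i so every input has the same L1 norm."""
    return list(a) + [1.0 - x for x in a]


def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND used by Fuzzy ART."""
    return [min(x, y) for x, y in zip(a, b)]


def present(weights, inp, rho=0.75, alpha=0.001):
    """Present one complement-coded input and return the chosen category index.

    weights is a list of category weight vectors and is updated in place.
    Assumption: Fuzzy ART with fast learning (learning rate 1).
    """
    norm_in = sum(inp)

    def choice(j):
        # Choice function T_j = |I ^ w_j| / (alpha + |w_j|)
        return sum(fuzzy_and(inp, weights[j])) / (alpha + sum(weights[j]))

    # Winner-take-all search: try categories in order of decreasing choice value
    for j in sorted(range(len(weights)), key=choice, reverse=True):
        match = sum(fuzzy_and(inp, weights[j])) / norm_in
        if match >= rho:
            # Vigilance test passed: the winner learns in one step
            weights[j] = fuzzy_and(inp, weights[j])
            return j
    # No existing category matched well enough: recruit a new one
    weights.append(list(inp))
    return len(weights) - 1
```

Because every poorly matched input recruits a new category, noisy training data inflates the number of categories, which is the proliferation problem the abstract mentions.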

    Regularity scalable image coding based on wavelet singularity detection

    In this paper, we propose an adaptive algorithm for scalable wavelet image coding, which is based on a general feature of images, their regularity. In pattern recognition and computer vision, the regularity of images is estimated from the oriented wavelet coefficients and quantified by the Lipschitz exponents. To estimate the Lipschitz exponents, evaluating the interscale evolution of the wavelet transform modulus sum (WTMS) over the directional cone of influence has been shown to be a better approach than tracing the wavelet transform modulus maxima (WTMM). This is because the irregular sampling nature of the WTMM complicates the reconstruction process. Moreover, examples have been found showing that the WTMM representation cannot uniquely characterize a signal, which implies that the reconstruction of a signal from its WTMM may not be consistently stable. Furthermore, the WTMM approach requires much more computational effort. Therefore, we use the WTMS approach to estimate the regularity of images from the separable wavelet-transformed coefficients. Since localization is not a concern here, we allow decimation when we evaluate the interscale evolution. After the regularity is estimated, this information is used in our proposed adaptive regularity scalable wavelet image coding algorithm. This algorithm can be embedded into any wavelet image coder, so it is compatible with existing scalable coding techniques, such as resolution scalable and signal-to-noise ratio (SNR) scalable coding, without changing the bitstream format, while providing more scalable levels with higher peak signal-to-noise ratios (PSNRs) and lower bit rates. In comparison with other feature-based wavelet scalable coding algorithms, the proposed algorithm outperforms them in terms of visual perception, computational complexity and coding efficiency
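
Estimating regularity from interscale evolution can be sketched as a slope fit. Assuming the modulus sum at dyadic scale 2^j grows roughly like A * 2^(alpha * j), the exponent alpha is the slope of log2(WTMS) against j. The abstract does not give the paper's exact estimator or normalisation, so the helper below is a hypothetical illustration; a different wavelet normalisation would shift the slope by a constant.

```python
import math


def estimate_lipschitz_exponent(wtms):
    """Least-squares slope of log2(WTMS_j) against the scale index j = 1, 2, ...

    wtms holds positive modulus sums at dyadic scales 2^1, 2^2, ...
    Assumption: WTMS_j ~ A * 2**(alpha * j), so the slope estimates alpha.
    """
    js = list(range(1, len(wtms) + 1))
    ys = [math.log2(v) for v in wtms]
    j_mean = sum(js) / len(js)
    y_mean = sum(ys) / len(ys)
    num = sum((j - j_mean) * (y - y_mean) for j, y in zip(js, ys))
    den = sum((j - j_mean) ** 2 for j in js)
    return num / den
```

At least two scales with positive values are needed; more scales make the fit less sensitive to noise in any single level.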

    Towards Arabic Alphabet and Numbers Sign Language Recognition

    This paper proposes a new Arabic sign language recognition method based on Restricted Boltzmann Machines and the direct use of tiny images. Restricted Boltzmann Machines are able to code images as a superposition of a limited number of features taken from a larger alphabet. Repeating this process in a deep architecture (Deep Belief Networks) leads to an efficient sparse representation of the initial data in the feature space. A complex classification problem in the input space is thus transformed into an easier one in the feature space. After appropriate coding, a softmax regression in the feature space should be sufficient to recognize a hand sign from the input image. To our knowledge, this is the first attempt to show that feature extraction from tiny images using a deep architecture is a simpler alternative approach for Arabic sign language recognition that deserves to be considered and investigated
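
The final classification step, softmax regression on the learned features, can be sketched in a few lines. The weights `W`, biases `b` and feature vector below are hypothetical placeholders; in the described pipeline the features would come from the trained deep architecture.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities (max is subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, W, b):
    """Return (predicted class index, class probabilities) for one feature vector.

    W has one weight row per class and b one bias per class.
    """
    scores = [sum(w * f for w, f in zip(row, features)) + bias
              for row, bias in zip(W, b)]
    probs = softmax(scores)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs
```

For example, `predict([2.0, 0.5], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])` picks class 0, since that class has the larger score.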

    Novel image descriptors and learning methods for image classification applications

    Image classification is an active and rapidly expanding research area in computer vision and machine learning due to its broad applications. With the advent of big data, the need for robust image descriptors and learning methods to process a large number of images for different kinds of visual applications has greatly increased. Towards that end, this dissertation focuses on exploring new image descriptors and learning methods by incorporating important visual aspects and enhancing the feature representation in the discriminative space for advancing image classification. First, an innovative sparse representation model using the complete marginal Fisher analysis (CMFA-SR) framework is proposed for improving image classification performance. In particular, the complete marginal Fisher analysis method extracts the discriminatory features in both the column space of the local-samples-based within-class scatter matrix and the null space of its transformed matrix. To further improve the classification capability, a discriminative sparse representation model is proposed by integrating a representation criterion, such as the sparse representation, with a discriminative criterion. Second, the discriminative dictionary distribution based sparse coding (DDSC) method is presented, which utilizes both discriminative and generative information to enhance the feature representation. Specifically, the dictionary distribution criterion reveals the class conditional probability of each dictionary item by using the dictionary distribution coefficients, and the discriminative criterion applies new within-class and between-class scatter matrices for discriminant analysis.
Third, a fused color Fisher vector (FCFV) feature is developed by integrating the most expressive features of the DAISY Fisher vector (D-FV) feature, the WLD-SIFT Fisher vector (WS-FV) feature, and the SIFT-FV feature in different color spaces to capture the local, color, spatial, relative intensity, as well as the gradient orientation information. Furthermore, a sparse kernel manifold learner (SKML) method is applied to the FCFV features for learning a discriminative sparse representation by considering the local manifold structure and the label information based on the marginal Fisher criterion. Finally, a novel multiple anthropological Fisher kernel framework (M-AFK) is presented to extract and enhance the facial genetic features for kinship verification. The proposed method is derived by applying a novel similarity enhancement approach based on SIFT flow and learning an inheritable transformation on the multiple Fisher vector features that uses the criterion of minimizing the distance among the kinship samples and maximizing the distance among the non-kinship samples. The effectiveness of the proposed methods is assessed on numerous image classification tasks, such as face recognition, kinship verification, scene classification, object classification, and computational fine art painting categorization. The experimental results on popular image datasets show the feasibility of the proposed methods

    Distortion Robust Biometric Recognition

    Information forensics and security have come a long way in just a few years thanks to the recent advances in biometric recognition. The main challenge remains a proper design of a biometric modality that can be resilient to unconstrained conditions, such as quality distortions. This work presents a solution to face and ear recognition under unconstrained visual variations, with a main focus on recognition in the presence of blur, occlusion and additive noise distortions. First, the dissertation addresses the problem of scene variations in the presence of blur, occlusion and additive noise distortions resulting from capture, processing and transmission. Despite their excellent performance, 'deep' methods are susceptible to visual distortions, which significantly reduce their performance. Sparse representations, on the other hand, have shown great potential in handling problems such as occlusion and corruption. In this work, an augmented SRC (ASRC) framework is presented to improve the performance of the Sparse Representation Classifier (SRC) in the presence of blur, additive noise and block occlusion, while preserving its robustness to scene-dependent variations. Different feature types are considered in the performance evaluation, including raw image pixels, HoG and deep learning VGG-Face features. The proposed ASRC framework is shown to outperform the conventional SRC in terms of recognition accuracy, as well as other existing sparse-based methods and blur-invariant methods at medium to high levels of distortion, particularly when used with discriminative features. In order to assess the quality of features in improving both the sparsity of the representation and the classification accuracy, a feature sparse coding and classification index (FSCCI) is proposed and used for feature ranking and selection within both the SRC and ASRC frameworks.
The second part of the dissertation presents a method for unconstrained ear recognition using deep learning features. The unconstrained ear recognition is performed using transfer learning with deep neural networks (DNNs) as a feature extractor followed by a shallow classifier. Data augmentation is used to improve the recognition performance by augmenting the training dataset with image transformations. The recognition performance of the feature extraction models is compared with an ensemble of fine-tuned networks. The results show that, in the case where long training time is not desirable or a large amount of data is not available, the features from pre-trained DNNs can be used with a shallow classifier to give a comparable recognition accuracy to the fine-tuned networks. Dissertation/Thesis, Doctoral Dissertation, Electrical Engineering 201

    FACE RECOGNITION AND VERIFICATION IN UNCONSTRAINED ENVIRONMENTS

    Face recognition has been a long-standing problem in computer vision. General face recognition is challenging because of large appearance variability due to factors including pose, ambient lighting, expression, size of the face, age, and distance from the camera. There are very accurate techniques to perform face recognition in controlled environments, especially when large numbers of samples are available for each face (individual). However, face identification under uncontrolled (unconstrained) environments or with limited training data is still an unsolved problem. There are two face recognition tasks: face identification (who is who in a probe face set, given a gallery face set) and face verification (same or not, given two faces). In this work, we study both face identification and verification in unconstrained environments. Firstly, we propose a face verification framework that combines Partial Least Squares (PLS) and the One-Shot similarity model[1]. The idea is to describe a face with a large feature set combining shape, texture and color information. PLS regression is applied to perform multi-channel feature weighting on this large feature set. Finally, the PLS regression is used to compute the similarity score of an image pair by One-Shot learning (using a fixed negative set). Secondly, we study face identification with image sets, where the gallery and probe are sets of face images of an individual. We model a face set by its covariance matrix (COV), which is a natural 2nd-order statistic of a sample set. By exploring an efficient metric for SPD matrices, i.e., the Log-Euclidean Distance (LED), we derive a kernel function that explicitly maps the covariance matrix from the Riemannian manifold to Euclidean space. Then, discriminative learning is performed on the COV manifold: the learning aims to maximize the between-class COV distance and minimize the within-class COV distance.
Sparse representation and dictionary learning have been widely used in face recognition, especially when large numbers of samples are available for each face (individual). Sparse coding is promising since it provides a more stable and discriminative face representation. In the last part of our work, we explore sparse coding and dictionary learning for face verification. More specifically, in one approach, we apply sparse representations to face verification in two ways, using a fixed reference set as the dictionary. In the other approach, we propose a dictionary learning framework with explicit pairwise constraints, which unifies discriminative dictionary learning for pair matching (face verification) and classification (face recognition) problems