
    MULTIPLE DICTIONARY FOR SPARSE MODELING

    Much of the progress made in image processing in recent decades can be attributed to better modeling of image content and a wise deployment of these models in relevant applications. In this paper, we review the role of the sparse and redundant representation model in image processing, its rationale, and models related to it. As it turns out, the field of image processing is one of the main beneficiaries of the recent progress made in the theory and practice of sparse and redundant representations. Sparse coding is a key principle that underlies the wavelet representation of images. Sparse representation based classification has led to interesting image recognition results, and the dictionary used for sparse coding plays a key role in it. In general, the choice of a proper dictionary can be made in one of two ways: i) building a sparsifying dictionary based on a mathematical model of the data, or ii) learning a dictionary to perform best on a training set.
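
    The two routes to a dictionary contrasted above can be made concrete with a short sketch. The NumPy snippet below is illustrative only (the omp helper, the overcomplete DCT construction, and the toy signal sizes are assumptions, not taken from the paper): it sparse-codes a synthetic signal over an analytic dictionary with orthogonal matching pursuit, and notes where a learned dictionary such as K-SVD would differ.

        import numpy as np

        def omp(D, y, k):
            # Greedy Orthogonal Matching Pursuit: approximate y with k atoms of D.
            residual, support = y.copy(), []
            for _ in range(k):
                support.append(int(np.argmax(np.abs(D.T @ residual))))   # most correlated atom
                coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
                residual = y - D[:, support] @ coeffs
            x = np.zeros(D.shape[1])
            x[support] = coeffs
            return x

        # (i) analytic choice: an overcomplete DCT dictionary for 8-sample signals
        n_samples, n_atoms = 8, 16
        D = np.cos(np.pi * np.outer(np.arange(n_samples) + 0.5, np.arange(n_atoms)) / n_atoms)
        D /= np.linalg.norm(D, axis=0)

        y = D[:, [2, 9]] @ np.array([1.0, -0.5])        # a synthetic 2-sparse signal
        x = omp(D, y, k=2)
        print(np.nonzero(x)[0], np.linalg.norm(y - D @ x))   # typically recovers atoms 2 and 9
        # (ii) the learned alternative (e.g. K-SVD) would alternate this coding step
        # with updates of the atoms themselves over a training set.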

    Fisher Discrimination Dictionary Learning for sparse representation

    2011 IEEE International Conference on Computer Vision (ICCV 2011), Barcelona, 6-13 November 2011. Sparse representation based classification has led to interesting image recognition results, while the dictionary used for sparse coding plays a key role in it. This paper presents a novel dictionary learning (DL) method to improve the pattern classification performance. Based on the Fisher discrimination criterion, a structured dictionary, whose dictionary atoms have correspondence to the class labels, is learned so that the reconstruction error after sparse coding can be used for pattern classification. Meanwhile, the Fisher discrimination criterion is imposed on the coding coefficients so that they have small within-class scatter but large between-class scatter. A new classification scheme associated with the proposed Fisher discrimination DL (FDDL) method is then presented by using both the discriminative information in the reconstruction error and the sparse coding coefficients. The proposed FDDL is extensively evaluated on benchmark image databases in comparison with existing sparse representation and DL based classification methods.
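
    As a rough illustration of the two ingredients named above, the sketch below (NumPy, with illustrative names; it is not the authors' FDDL optimization, which also includes an l1 penalty and uses class-specific coefficient means at test time) computes a Fisher-style scatter penalty on coding coefficients and the class-wise reconstruction-error rule for a structured dictionary whose atoms carry class labels.

        import numpy as np

        def fisher_term(X, labels):
            # tr(S_W) - tr(S_B) on coding coefficients: favors small within-class
            # scatter and large between-class scatter, as described in the abstract.
            mean_all = X.mean(axis=1, keepdims=True)
            sw = sb = 0.0
            for c in np.unique(labels):
                Xc = X[:, labels == c]
                mc = Xc.mean(axis=1, keepdims=True)
                sw += ((Xc - mc) ** 2).sum()                      # within-class scatter
                sb += Xc.shape[1] * ((mc - mean_all) ** 2).sum()  # between-class scatter
            return sw - sb

        def classify(y, D, atom_labels, code):
            # Assign y to the class whose sub-dictionary reconstructs it best.
            classes = np.unique(atom_labels)
            errs = [np.linalg.norm(y - D[:, atom_labels == c] @ code[atom_labels == c])
                    for c in classes]
            return classes[int(np.argmin(errs))]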

    DISCRIMINATIVE LEARNING AND RECOGNITION USING DICTIONARIES

    In recent years, the theory of sparse representation has emerged as a powerful tool for efficient processing of data in non-traditional ways. This is mainly due to the fact that most signals and images of interest tend to be sparse or compressible in some dictionary. In other words, they can be well approximated by a linear combination of a few elements (also known as atoms) of a dictionary. This dictionary can either be an analytic dictionary composed of wavelets or a Fourier basis, or it can be trained directly from data. It has been observed that dictionaries learned directly from data provide better representations and hence can improve the performance of many practical applications such as restoration and classification. In this dissertation, we study dictionary learning and recognition under supervised, unsupervised, and semi-supervised settings.
    In the supervised case, we propose an approach to recognize humans in unconstrained videos, where the main challenge is exploiting the identity information in multiple frames and the accompanying dynamic signature. These identity cues include face, body, and motion. Our approach is based on video-dictionaries for face and body, designed to implicitly encode temporal, pose, and illumination information. Next, we propose a novel multivariate sparse representation method that jointly represents all the video data by a sparse linear combination of training data. To increase the ability of our algorithm to learn nonlinearities, we apply kernel methods to learn the dictionaries. We then address the problem of matching faces across changes in pose in unconstrained videos. Our approach consists of two methods, based on 3D rotation and sparse representation, that compensate for changes in pose. We demonstrate the superior performance of our approach over several state-of-the-art algorithms through extensive experiments on unconstrained video datasets.
    In the unsupervised case, we present an approach that simultaneously clusters images and learns dictionaries from the clusters, with the dictionaries learned in the Radon transform domain. The main feature of the proposed approach is that it provides in-plane rotation and scale invariant clustering, which is useful in many applications such as Content Based Image Retrieval (CBIR). We demonstrate through experiments that the proposed rotation and scale invariant clustering provides not only good retrieval performance but also substantial improvements and robustness compared to traditional Gabor-based and several state-of-the-art shape-based methods.
    We then extend the dictionary learning problem to a generalized semi-supervised formulation, where each training sample is provided with a set of possible labels and only one label among them is the true one. Such situations arise in image and video collections where one often has only partially labeled data. For instance, given an image with multiple faces and a caption specifying the names, we can be sure that each of the faces belongs to one of the names specified, while the exact identity of each face is not known. Labeling involves a significant amount of human effort and is expensive, which has motivated researchers to develop learning algorithms from partially labeled training data. In this work, we develop dictionary learning algorithms that utilize such partially labeled data. The proposed method aims to solve the problem of ambiguously labeled multiclass classification using an iterative algorithm; the dictionaries are updated using either soft (EM-based) or hard decision rules. Extensive evaluations on existing datasets demonstrate that the proposed method performs significantly better than state-of-the-art approaches for learning from ambiguously labeled data.
    As sparsity plays a major role in our research, we further present a sparse representation-based approach to find the salient views of 3D objects. The salient views are categorized into two groups. The first are boundary representative views that have several visible sides and object surfaces that may be attractive to humans. The second are side representative views that best represent side views of the approximating convex shape; these are class-specific views and possess the most representative power compared to other within-class views. Using the concept of characteristic view class, we first present a sparse representation-based approach for estimating the boundary representative views. With the estimated boundaries, we determine the side representative views based on a minimum reconstruction error criterion. Furthermore, to evaluate our method, we introduce the notion of geometric dictionaries built from salient views for applications in 3D object recognition, retrieval, and sparse-to-full reconstruction. Through a series of experiments on four publicly available 3D object datasets, we demonstrate the effectiveness of our approach over state-of-the-art algorithms and baseline methods.
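
    The soft (EM-based) decision rule mentioned above can be read roughly as follows; the sketch is an illustrative interpretation rather than the dissertation's algorithm: plain least-squares coding stands in for sparse coding, the per-class dictionaries are assumed given, and beta is a made-up temperature parameter. Each sample's candidate labels are reweighted by how well the corresponding class dictionary reconstructs it, and those weights would then drive the dictionary update.

        import numpy as np

        def soft_label_weights(Y, class_dicts, candidate_sets, beta=1.0):
            # E-step sketch: weight each sample's candidate labels by how well the
            # corresponding class dictionary reconstructs it (soft decision rule).
            weights = []
            for y, candidates in zip(Y.T, candidate_sets):
                errs = []
                for c in candidates:
                    D = class_dicts[c]
                    code, *_ = np.linalg.lstsq(D, y, rcond=None)   # stand-in for sparse coding
                    errs.append(np.linalg.norm(y - D @ code))
                w = np.exp(-beta * np.square(errs))
                weights.append(dict(zip(candidates, w / w.sum())))
            return weights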

    Essays on hyperspectral image analysis: classification and target detection

    Over the past few decades, hyperspectral imaging has drawn significant attention and become an important scientific tool for various fields of real-world applications. Among the research topics of hyperspectral image (HSI) analysis, two major topics -- HSI classification and HSI target detection -- have been intensively studied. Statistical learning has played a pivotal role in promoting the development of algorithms and methodologies for the two topics. Among the existing methods for HSI classification, sparse representation classification (SRC) has been widely investigated; it is based on the assumption that a signal can be represented by a linear combination of a small number of redundant bases (so-called dictionary atoms). By virtue of the signal coherence in HSIs, a joint sparse model (JSM) has been successfully developed for HSI classification and has achieved promising performance. However, JSM-based dictionary learning for HSIs has been barely discussed, and the non-negativity properties of the coefficients in the JSM have received little attention. HSI target detection can be regarded as a special case of classification, i.e. binary classification, but faces more challenges. Traditional statistical methods regard a test HSI pixel as a linear combination of several endmembers with corresponding fractions, i.e. they are based on the linear mixing model (LMM). However, due to the complicated environments in real-world problems, complex mixing effects may exist in HSIs and make the detection of targets more difficult. As a consequence, the performance of the traditional LMM is limited. In this thesis, we focus on the topics of HSI classification and HSI target detection and propose five new methods to tackle the aforementioned issues in the two tasks. For HSI classification, two new methods are proposed based on the JSM. The first proposed method focuses on dictionary learning: it incorporates the JSM in the discriminative K-SVD learning algorithm in order to learn a quality dictionary with rich information for improving the classification performance. The second proposed method focuses on developing a convex cone-based JSM, i.e. incorporating non-negativity constraints on the coefficients in the JSM. For HSI target detection, three approaches are proposed based on the LMM. The first approach takes account of interaction effects to tackle the mixing problems in HSI target detection. The second approach, called the matched shrunken subspace detector (MSSD), and the third approach, called the matched shrunken cone detector (MSCD), both offer Bayesian derivatives of the regularisation-constrained LMM. Specifically, the proposed MSSD is a regularised subspace representation of the LMM, while the proposed MSCD is a regularised cone representation of the LMM.
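
    To make the linear-mixing-model viewpoint concrete, here is a minimal detection sketch (not the proposed MSSD/MSCD detectors, which add Bayesian shrinkage on top of the LMM): a pixel is unmixed twice under non-negativity constraints, once with background endmembers only and once with the target spectra added, and the drop in residual serves as a detection statistic. The endmember matrices and the statistic's normalization are illustrative assumptions.

        import numpy as np
        from scipy.optimize import nnls

        def lmm_detector(pixel, background_endmembers, target_endmembers):
            # Compare non-negative least-squares fits under two hypotheses:
            # background endmembers only vs. background plus target endmembers.
            B = background_endmembers                       # bands x n_background
            BT = np.hstack([B, target_endmembers])          # bands x (n_background + n_target)
            _, r0 = nnls(B, pixel)                          # residual norm, background only
            _, r1 = nnls(BT, pixel)                         # residual norm, with target added
            return (r0 - r1) / max(r0, 1e-12)               # large drop suggests target present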

    KCRC-LCD: Discriminative Kernel Collaborative Representation with Locality Constrained Dictionary for Visual Categorization

    We consider the image classification problem via kernel collaborative representation classification with a locality constrained dictionary (KCRC-LCD). Specifically, we propose a kernel collaborative representation classification (KCRC) approach in which the kernel method is used to improve the discrimination ability of collaborative representation classification (CRC). We then measure the similarities between the query and the atoms in the global dictionary in order to construct a locality constrained dictionary (LCD) for KCRC. In addition, we discuss several similarity measures for the LCD and further present a simple yet effective unified similarity measure whose superiority is validated in experiments. There are several appealing aspects associated with the LCD. First, the LCD can be nicely incorporated under the framework of KCRC: the LCD similarity measure can be kernelized under KCRC, which theoretically links CRC and the LCD under the kernel method. Second, KCRC-LCD scales better with both the training set size and the feature dimension. A worked example shows that KCRC can perfectly classify data with a certain distribution on which conventional CRC fails completely. Comprehensive experiments on many public datasets also show that KCRC-LCD is a robust discriminative classifier with both excellent performance and good scalability, comparable to or outperforming many other state-of-the-art approaches.
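
    The sketch below shows one way the pieces described above could fit together (illustrative only; the paper's unified similarity measure and exact formulation are not reproduced, and the kernel inputs, n_local, and lam are assumptions): the locality-constrained dictionary keeps the atoms most similar to the query under the kernel, the collaborative code is a ridge solution over those atoms in feature space, and each class is scored by its reconstruction residual in the kernel-induced norm.

        import numpy as np

        def kcrc_lcd(k_xx, k_xD, K_DD, atom_labels, n_local=50, lam=1e-3):
            # k_xx: k(x, x); k_xD: kernel values between query and all atoms;
            # K_DD: kernel Gram matrix of the atoms; atom_labels: class of each atom.
            idx = np.argsort(-k_xD)[:n_local]                # locality constraint: nearest atoms
            k, K = k_xD[idx], K_DD[np.ix_(idx, idx)]
            alpha = np.linalg.solve(K + lam * np.eye(len(idx)), k)   # collaborative (ridge) code
            scores = {}
            for c in np.unique(atom_labels[idx]):
                m = atom_labels[idx] == c
                # Squared residual ||phi(x) - Phi_c alpha_c||^2 written with kernel values.
                scores[c] = k_xx - 2 * k[m] @ alpha[m] + alpha[m] @ K[np.ix_(m, m)] @ alpha[m]
            return min(scores, key=scores.get)

    Restricting the code to a local sub-dictionary keeps the linear system small as the training set grows, which is in line with the scalability argument made above.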

    Collaborative Representation based Classification for Face Recognition

    By coding a query sample as a sparse linear combination of all training samples and then classifying it by evaluating which class leads to the minimal coding residual, sparse representation based classification (SRC) leads to interesting results for robust face recognition. It is widely believed that the l1-norm sparsity constraint on the coding coefficients plays a key role in the success of SRC, while its use of all training samples to collaboratively represent the query sample is largely overlooked. In this paper we discuss how SRC works and show that the collaborative representation mechanism used in SRC is much more crucial to its success in face classification. SRC is a special case of collaborative representation based classification (CRC), which has various instantiations obtained by applying different norms to the coding residual and the coding coefficients. More specifically, the l1 or l2 norm characterization of the coding residual is related to the robustness of CRC to outlier facial pixels, while the l1 or l2 norm characterization of the coding coefficients is related to the degree of discrimination of facial features. Extensive experiments were conducted to verify the face recognition accuracy and efficiency of CRC with different instantiations. Comment: This is a substantial revision of a previous conference paper (L. Zhang, M. Yang, et al., "Sparse Representation or Collaborative Representation: Which Helps Face Recognition?", ICCV 2011).
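
    A compact sketch of the l2-norm instantiation discussed above (a CRC-style rule written from the abstract's description; the regularization value and the regularized-residual scoring are assumptions rather than the paper's exact classifier):

        import numpy as np

        def crc_rls(X_train, labels, y, lam=1e-3):
            # Code the query collaboratively over ALL training samples with ridge
            # regression, then score each class by its regularized reconstruction residual.
            n = X_train.shape[1]
            P = np.linalg.solve(X_train.T @ X_train + lam * np.eye(n), X_train.T)
            alpha = P @ y                                    # collaborative coding vector
            scores = {}
            for c in np.unique(labels):
                m = labels == c
                scores[c] = np.linalg.norm(y - X_train[:, m] @ alpha[m]) / (np.linalg.norm(alpha[m]) + 1e-12)
            return min(scores, key=scores.get)

    Because the projection matrix P depends only on the training data, it can be precomputed once, which is one reason an l2-regularized collaborative coder is efficient relative to l1-minimization-based SRC.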