96 research outputs found

    Learning Multimodal Structures in Computer Vision

    Get PDF
    A phenomenon or event can be received from various kinds of detectors or under different conditions. Each such acquisition framework is a modality of the phenomenon. Due to the relation between the modalities of multimodal phenomena, a single modality cannot fully describe the event of interest. Since several modalities report on the same event introduces new challenges comparing to the case of exploiting each modality separately. We are interested in designing new algorithmic tools to apply sensor fusion techniques in the particular signal representation of sparse coding which is a favorite methodology in signal processing, machine learning and statistics to represent data. This coding scheme is based on a machine learning technique and has been demonstrated to be capable of representing many modalities like natural images. We will consider situations where we are not only interested in support of the model to be sparse, but also to reflect a-priorily known knowledge about the application in hand. Our goal is to extract a discriminative representation of the multimodal data that leads to easily finding its essential characteristics in the subsequent analysis step, e.g., regression and classification. To be more precise, sparse coding is about representing signals as linear combinations of a small number of bases from a dictionary. The idea is to learn a dictionary that encodes intrinsic properties of the multimodal data in a decomposition coefficient vector that is favorable towards the maximal discriminatory power. We carefully design a multimodal representation framework to learn discriminative feature representations by fully exploiting, the modality-shared which is the information shared by various modalities, and modality-specific which is the information content of each modality individually. Plus, it automatically learns the weights for various feature components in a data-driven scheme. In other words, the physical interpretation of our learning framework is to fully exploit the correlated characteristics of the available modalities, while at the same time leverage the modality-specific character of each modality and change their corresponding weights for different parts of the feature in recognition

    Sparse Modeling for Image and Vision Processing

    Get PDF
    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection---that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, or computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts.Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Visio

    Sparse Coding with Structured Sparsity Priors and Multilayer Architecture for Image Classification

    Get PDF
    Applying sparse coding on large dataset for image classification is a long standing problem in the field of computer vision. It has been found that the sparse coding models exhibit disappointing performance on these large datasets where variability is broad and anomalies are common. Conversely, deep neural networks thrive on bountiful data. Their success has encouraged researchers to try and augment the learning capacity of traditionally shallow sparse coding methods by adding layers. Multilayer sparse coding networks are expected to combine the best of both sparsity regularizations and deep architectures. To date, however, endeavors to marry the two techniques have not achieved significant improvements over their individual counterparts. In this thesis, we first briefly review multiple structured sparsity priors as well as various supervised dictionary learning techniques with applications on hyperspectral image classification. Based on the structured sparsity priors and dictionary learning techniques, we then develop a novel multilayer sparse coding network that contains thirteen sparse coding layers. The proposed sparse coding network learns both the dictionaries and the regularization parameters simultaneously using an end-to-end supervised learning scheme. We show empirical evidence that the regularization parameters can adapt to the given training data. We also propose applying dimension reduction within sparse coding networks to dramatically reduce the output dimensionality of the sparse coding layers and mitigate computational costs. Moreover, our sparse coding network is compatible with other powerful deep learning techniques such as drop out, batch normalization and shortcut connections. Experimental results show that the proposed multilayer sparse coding network produces classification accuracy competitive with the deep neural networks while using significantly fewer parameters and layers

    Support matrix machine: A review

    Full text link
    Support vector machine (SVM) is one of the most studied paradigms in the realm of machine learning for classification and regression problems. It relies on vectorized input data. However, a significant portion of the real-world data exists in matrix format, which is given as input to SVM by reshaping the matrices into vectors. The process of reshaping disrupts the spatial correlations inherent in the matrix data. Also, converting matrices into vectors results in input data with a high dimensionality, which introduces significant computational complexity. To overcome these issues in classifying matrix input data, support matrix machine (SMM) is proposed. It represents one of the emerging methodologies tailored for handling matrix input data. The SMM method preserves the structural information of the matrix data by using the spectral elastic net property which is a combination of the nuclear norm and Frobenius norm. This article provides the first in-depth analysis of the development of the SMM model, which can be used as a thorough summary by both novices and experts. We discuss numerous SMM variants, such as robust, sparse, class imbalance, and multi-class classification models. We also analyze the applications of the SMM model and conclude the article by outlining potential future research avenues and possibilities that may motivate academics to advance the SMM algorithm
    • …
    corecore