
    Visual Understanding via Multi-Feature Shared Learning with Global Consistency

    Image/video data are usually represented with multiple visual features, and fusing this multi-source information to establish the attributes of the data is a widely recognized strategy. Multi-feature visual recognition has recently received much attention in multimedia applications. This paper studies visual understanding via a newly proposed l_2-norm based multi-feature shared learning framework, which simultaneously learns a global label matrix and multiple sub-classifiers from labeled multi-feature data. Additionally, a group graph manifold regularizer composed of Laplacian and Hessian graphs is proposed to better preserve the manifold structure of each feature, so that the label prediction power is much improved through semi-supervised learning with global label consistency. For convenience, we call the proposed approach the Global-Label-Consistent Classifier (GLCC). The merits of the proposed method include: 1) the manifold structure information of each feature is exploited in learning, resulting in more faithful classification owing to the global label consistency; 2) a group graph manifold regularizer based on Laplacian and Hessian regularization is constructed; 3) an efficient alternating optimization method is introduced as a fast solver owing to the convex sub-problems. Experiments on several benchmark visual datasets for multimedia understanding, such as the 17-category Oxford Flower dataset, the challenging 101-category Caltech dataset, the YouTube & Consumer Videos dataset, and the large-scale NUS-WIDE dataset, demonstrate that the proposed approach compares favorably with state-of-the-art algorithms. An extensive experiment on deep convolutional activation features also shows the effectiveness of the proposed approach. The code is available at http://www.escience.cn/people/lei/index.html
    Comment: 13 pages, 6 figures; this paper is accepted for publication in IEEE Transactions on Multimedia
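As background for the graph manifold regularizer in the abstract above, the Laplacian component can be sketched as follows. This is a generic illustration rather than the authors' code: the k-NN construction, the Gaussian kernel, and all parameter values are assumptions, and the Hessian graph part of the group regularizer is omitted.

```python
import numpy as np

def knn_graph_laplacian(X, k=5, sigma=1.0):
    """Build a k-NN affinity matrix with a Gaussian kernel and return
    the unnormalized graph Laplacian L = D - W for one feature view."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(d2, np.inf)               # exclude self-connections
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]           # k nearest neighbours of sample i
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma**2))
    W = np.maximum(W, W.T)                     # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    return D - W

# Manifold smoothness of a soft label matrix F is the quadratic form tr(F^T L F);
# penalizing it encourages nearby samples to share labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))                   # 30 samples in one feature view
F = rng.random((30, 3))                        # soft labels over 3 classes
L = knn_graph_laplacian(X, k=4)
smoothness = np.trace(F.T @ L @ F)
```

In a multi-feature setting, one such Laplacian would be built per feature view and the per-view smoothness terms combined in the joint objective, which is the role the group graph regularizer plays here.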

    On the application of reservoir computing networks for noisy image recognition

    Reservoir Computing Networks (RCNs) are a special type of single-layer recurrent neural network in which the input and recurrent connections are randomly generated and only the output weights are trained. Besides the ability to process temporal information, the key strengths of RCNs are easy training and robustness against noise. Recently, we introduced a simple strategy to tune the parameters of RCNs, and evaluation in the domain of noise-robust speech recognition proved that this method was effective. The aim of this work is to extend that study to the field of image processing, by showing that the proposed parameter tuning procedure is equally valid for images and by confirming that RCNs are apt at temporal modeling and robust with respect to noise. In particular, we investigate the potential of RCNs for achieving competitive performance on the well-known MNIST dataset by following the aforementioned parameter optimization strategy. Moreover, we achieve good noise-robust recognition by using such a network to denoise images and supplying them to a recognizer trained solely on clean images. The experiments demonstrate that the proposed RCN-based handwritten digit recognizer achieves an error rate of 0.81 percent on the clean test data of the MNIST benchmark, and that the proposed RCN-based denoiser can effectively reduce the error rate under various types of noise. (c) 2017 Elsevier B.V. All rights reserved.
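A minimal sketch of the reservoir idea described above: the input and recurrent weights are random and fixed, and only a linear readout is trained. All sizes, scalings, and the ridge-regression readout below are illustrative assumptions, not the tuning procedure from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_res = 4, 50
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))        # fixed random input weights
W = rng.uniform(-0.5, 0.5, (n_res, n_res))          # fixed random recurrent weights
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))     # rescale spectral radius to 0.9

def reservoir_states(U):
    """Run an input sequence U (T x n_in) through the reservoir."""
    x = np.zeros(n_res)
    states = []
    for u in U:
        x = np.tanh(W_in @ u + W @ x)               # leaky-free state update
        states.append(x)
    return np.array(states)

def train_readout(final_states, targets, ridge=1e-2):
    """Ridge regression: the only trained part of the network."""
    A = final_states
    return np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ targets)

# Toy usage: classify 10 random sequences from their final reservoir state.
seqs = rng.normal(size=(10, 20, n_in))
finals = np.array([reservoir_states(u)[-1] for u in seqs])
y = rng.integers(0, 2, size=(10, 1)).astype(float)
W_out = train_readout(finals, y)
S = reservoir_states(seqs[0])
```

For images, a common trick (and the reason temporal modeling applies at all) is to feed the image in column by column or row by row, so each image becomes a short sequence.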

    Single Image Super-Resolution Using a Deep Encoder-Decoder Symmetrical Network with Iterative Back Projection

    Image super-resolution (SR) usually refers to reconstructing a high-resolution (HR) image from a low-resolution (LR) image without losing high-frequency details or reducing image quality. Recently, image SR based on a convolutional neural network (SRCNN) was proposed and has received much attention due to its end-to-end mapping simplicity and superior performance. This method, however, uses only three convolution layers to learn the mapping from LR to HR, usually converges slowly, and significantly reduces the size of the output image. To address these issues, we propose a novel deep encoder-decoder symmetrical neural network (DEDSN) for single-image SR. This deep network is composed entirely of symmetrical multiple layers of convolution and deconvolution, with no pooling (down-sampling or up-sampling) operations anywhere in the network, so that the image-detail degradation that occurs in traditional convolutional frameworks is prevented. Additionally, in view of the success of the iterative back projection (IBP) algorithm in image SR, we further combine DEDSN with a network realization of IBP. The new DEDSN-IBP model introduces the down-sampled version of the ground truth image and calculates the simulation error as the prior guidance. Experimental results on benchmark datasets demonstrate that the proposed DEDSN model achieves better performance than SRCNN and that the improved DEDSN-IBP outperforms the reported state-of-the-art methods.
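The classic IBP loop that DEDSN-IBP builds on can be sketched as follows: the HR estimate is refined so that its simulated LR version matches the observed LR image, with the simulation error back-projected into HR space. This is a generic illustration with an assumed block-average degradation model and nearest-neighbour upsampling, not the authors' network realization.

```python
import numpy as np

def downsample(img, s):
    """Block-average downsampling by factor s (an assumed degradation model)."""
    h, w = img.shape
    return img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(img, s):
    """Nearest-neighbour upsampling by factor s."""
    return np.kron(img, np.ones((s, s)))

def ibp(lr, s, n_iter=30, step=1.0):
    """Iterative back projection: refine the HR estimate until its
    simulated LR version agrees with the observed LR image."""
    hr = upsample(lr, s)                       # crude initial HR estimate
    for _ in range(n_iter):
        err = lr - downsample(hr, s)           # simulation error in LR space
        hr += step * upsample(err, s)          # back-project the error into HR space
    return hr

rng = np.random.default_rng(2)
truth = rng.random((16, 16))
lr = downsample(truth, 2)                      # simulated LR observation
hr = ibp(lr, 2)
```

The fixed point of this loop is any HR image whose simulated downsampling reproduces the LR input, which is exactly the consistency prior the DEDSN-IBP model exploits as guidance.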
