
    Learning Convolutional Text Representations for Visual Question Answering

    Visual question answering is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to traditional computer vision tasks, such as object recognition and image classification, visual question answering raises a different need for textual representation as compared to other natural language processing tasks. In this work, we perform a detailed analysis on natural language questions in visual question answering. Based on the analysis, we propose to rely on convolutional neural networks for learning textual representations. By exploring the various properties of convolutional neural networks specialized for text data, such as width and depth, we present our "CNN Inception + Gate" model. We show that our model improves question representations and thus the overall accuracy of visual question answering models. We also show that the text representation requirement in visual question answering is more complicated and comprehensive than that in conventional natural language processing tasks, making it a better task to evaluate textual representation methods. Shallow models like fastText, which can obtain comparable results with deep learning models in tasks like text classification, are not suitable in visual question answering.
    Comment: Conference paper at SDM 2018. https://github.com/divelab/sva
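    The abstract does not specify the "CNN Inception + Gate" architecture; as a rough illustration of the general idea it names (convolutions of several widths over word embeddings, max-pooling over time, and a multiplicative gate), here is a minimal NumPy sketch. All function names, filter counts, and the sigmoid gate are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_text(emb, width, n_filters, rng):
    """Slide `n_filters` random filters of a given width over a
    (seq_len, dim) word-embedding matrix; returns (seq_len-width+1, n_filters)."""
    seq_len, dim = emb.shape
    W = rng.standard_normal((n_filters, width * dim)) * 0.1
    windows = np.stack([emb[i:i + width].ravel()
                        for i in range(seq_len - width + 1)])
    return windows @ W.T

def encode_question(emb, widths=(1, 2, 3), n_filters=8, rng=rng):
    """Inception-style branches: one conv per width, max-pool over time,
    then a hypothetical element-wise sigmoid gate on the concatenated features."""
    feats = [conv1d_text(emb, w, n_filters, rng).max(axis=0) for w in widths]
    h = np.concatenate(feats)             # (len(widths) * n_filters,)
    gate = 1.0 / (1.0 + np.exp(-h))       # assumed gating mechanism
    return h * gate

emb = rng.standard_normal((7, 16))        # 7 tokens, 16-dim embeddings
q = encode_question(emb)
print(q.shape)                            # (24,)
```

In a trained model the filters and gate parameters would be learned jointly with the rest of the VQA network rather than drawn at random.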

    End-to-End Kernel Learning with Supervised Convolutional Kernel Networks

    In this paper, we introduce a new image representation based on a multilayer kernel machine. Unlike traditional kernel methods where data representation is decoupled from the prediction task, we learn how to shape the kernel with supervision. We proceed by first proposing improvements of the recently-introduced convolutional kernel networks (CKNs) in the context of unsupervised learning; then, we derive backpropagation rules to take advantage of labeled training data. The resulting model is a new type of convolutional neural network, where optimizing the filters at each layer is equivalent to learning a linear subspace in a reproducing kernel Hilbert space (RKHS). We show that our method achieves reasonably competitive performance for image classification on some standard "deep learning" datasets such as CIFAR-10 and SVHN, and also for image super-resolution, demonstrating the applicability of our approach to a large variety of image-related tasks.
    Comment: to appear in Advances in Neural Information Processing Systems (NIPS)
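    To make the CKN idea more concrete, the following toy NumPy sketch implements one simplified layer in the spirit the abstract describes: image patches are l2-normalized and mapped through a Gaussian-kernel-inspired nonlinearity around a set of filters. The exact feature map, the `alpha` bandwidth, and the function names are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def ckn_layer(img, filters, patch=3, alpha=1.0):
    """One simplified CKN-style layer on a (H, W) grayscale image.
    `filters` rows are assumed l2-normalized; alpha is a bandwidth parameter."""
    H, W = img.shape
    out = []
    for i in range(H - patch + 1):
        row = []
        for j in range(W - patch + 1):
            p = img[i:i + patch, j:j + patch].ravel()
            n = np.linalg.norm(p) + 1e-8
            p = p / n
            # kernel-map nonlinearity: exp(alpha * (<p, w> - 1)), rescaled
            # by the patch norm so the map is homogeneous in patch energy
            row.append(n * np.exp(alpha * (filters @ p - 1.0)))
        out.append(row)
    return np.array(out)   # (H-patch+1, W-patch+1, n_filters)

filters = rng.standard_normal((4, 9))
filters /= np.linalg.norm(filters, axis=1, keepdims=True)
img = rng.standard_normal((8, 8))
feat = ckn_layer(img, filters)
print(feat.shape)          # (6, 6, 4)
```

In the supervised variant the paper introduces, the filters would be optimized by backpropagation instead of fixed at random.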

    Convolutional Patch Representations for Image Retrieval: an Unsupervised Approach

    Convolutional neural networks (CNNs) have recently received a lot of attention due to their ability to model local stationary structures in natural images in a multi-scale fashion when all model parameters are learned with supervision. While excellent performance has been achieved for image classification when large amounts of labeled visual data are available, their success on unsupervised tasks such as image retrieval has been moderate so far. Our paper focuses on this latter setting and explores several methods for learning patch descriptors without supervision, with application to matching and instance-level retrieval. To that effect, we propose a new family of convolutional descriptors for patch representation, based on the recently introduced convolutional kernel networks. We show that our descriptor, named Patch-CKN, performs better than SIFT as well as other convolutional networks learned by artificially introducing supervision, and is significantly faster to train. To demonstrate its effectiveness, we perform an extensive evaluation on standard benchmarks for patch and image retrieval, where we obtain state-of-the-art results. We also introduce a new dataset called RomePatches, which allows us to simultaneously study descriptor performance for patch and image retrieval.

    Convolutional neural network model in machine learning methods and computer vision for image recognition: a review

    Recently, convolutional neural networks (CNNs) have been used in a variety of areas, including image and pattern recognition, speech recognition, biometric embedded vision, food recognition, video analysis for surveillance, industrial robots, and autonomous cars. There are a number of reasons that CNNs are becoming important. In traditional models for image recognition, feature extractors are hand-designed; in CNNs, the weights of the convolutional layers used for feature extraction, as well as those of the fully connected layers used for classification, are determined during the training process. The objective of this paper is to review several machine learning methods for CNNs in image recognition; current approaches to image recognition make essential use of such methods. Based on a review of twenty-five journal articles, this paper focuses on the development trend of CNN models driven by various learning methods in image recognition since the 2000s, mainly from the aspects of capturing, verification, and clustering. Deep convolutional neural networks (DCNNs) have been highly successful in various machine learning and computer vision problems because they deliver significant quality gains at a modest increase in computational requirements. This training approach also allows models composed of multiple processing layers to learn representations of data with multiple levels of abstraction.
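    The pipeline this review describes, learned convolutional filters for feature extraction followed by a fully connected layer for classification, can be sketched end to end in a few lines of NumPy. This is a toy forward pass with random weights, purely to show the data flow; the layer sizes and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img, kernel):
    """Single-channel 'valid' 2-D convolution (cross-correlation form)."""
    kh, kw = kernel.shape
    H, W = img.shape
    return np.array([[np.sum(img[i:i + kh, j:j + kw] * kernel)
                      for j in range(W - kw + 1)]
                     for i in range(H - kh + 1)])

def forward(img, kernels, W_fc, b_fc):
    """Conv -> ReLU -> global max pool -> fully connected -> softmax."""
    feats = np.array([np.maximum(conv2d_valid(img, k), 0).max()
                      for k in kernels])
    logits = W_fc @ feats + b_fc
    e = np.exp(logits - logits.max())
    return e / e.sum()

img = rng.standard_normal((6, 6))
kernels = rng.standard_normal((3, 3, 3))          # three 3x3 filters
W_fc, b_fc = rng.standard_normal((2, 3)), np.zeros(2)
probs = forward(img, kernels, W_fc, b_fc)
print(probs.sum())                                # probabilities sum to 1
```

In an actual CNN, both `kernels` and the fully connected weights would be fitted by gradient descent on labeled data, which is exactly the point the review makes about CNNs replacing hand-designed feature extractors.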