Learning Convolutional Text Representations for Visual Question Answering

Author: Ji Shuiwang
Wang Zhengyang
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 18/04/2018

Visual question answering is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to that of traditional computer vision tasks, such as object recognition and image classification, visual question answering raises a different need for textual representation compared to other natural language processing tasks. In this work, we perform a detailed analysis of natural language questions in visual question answering. Based on the analysis, we propose to rely on convolutional neural networks for learning textual representations. By exploring the various properties of convolutional neural networks specialized for text data, such as width and depth, we present our "CNN Inception + Gate" model. We show that our model improves question representations and thus the overall accuracy of visual question answering models. We also show that the text representation requirement in visual question answering is more complicated and comprehensive than that in conventional natural language processing tasks, making it a better task for evaluating textual representation methods. Shallow models like fastText, which can obtain results comparable to deep learning models in tasks like text classification, are not suitable for visual question answering.
Comment: Conference paper at SDM 2018. https://github.com/divelab/sva

arXiv.org e-Print Archive

TOWARDS DEEP LEARNING FOR ARCHITECTURE: A MONUMENT RECOGNITION MOBILE APP

Author: V. Palma
Publication date: 01/01/2019

In recent years, the diffusion of large image datasets and unprecedented computational power have boosted the development of a class of artificial intelligence (AI) algorithms referred to as deep learning (DL). Among DL methods, convolutional neural networks (CNNs) have proven particularly effective in computer vision, finding applications in many disciplines. This paper introduces a project aimed at studying CNN techniques in the field of architectural heritage, a research stream that is still to be developed. The first steps and results in the development of a mobile app to recognize monuments are discussed. While AI is just beginning to interact with the built environment through mobile devices, heritage technologies have long been producing and exploring digital models and spatial archives. The interaction between DL algorithms and state-of-the-art information modeling is addressed as an opportunity both to exploit heritage collections and to optimize new object recognition techniques.

Directory of Open Access Journals

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Open Access Repository

Classification of Time-Series Images Using Deep Convolutional Neural Networks

Author: Abdel-Hamid
Armano
Bouvrie
Chen
Cui
Dalto
Deng
Eads
Graves
Hatami
Hatami
Krizhevsky
Krizhevsky
LeCun
LeCun
Lee
Nanopoulos
Rakthanmanon
Rodriguez
Senin
Simonyan
Souza
Souza
Wang
Wang
Wang
Xing
Yang
Zheng
Publication date: 07/10/2017

Convolutional Neural Networks (CNNs) have achieved great success in image recognition tasks by automatically learning a hierarchical feature representation from raw data. While the majority of the Time-Series Classification (TSC) literature focuses on 1D signals, this paper uses Recurrence Plots (RP) to transform time series into 2D texture images and then takes advantage of a deep CNN classifier. The image representation of time series introduces feature types that are not available for 1D signals, so TSC can be treated as a texture image recognition task. A CNN model also allows different levels of representation to be learned jointly and automatically together with a classifier. Therefore, using RP and CNN in a unified framework is expected to boost the recognition rate of TSC. Experimental results on the UCR time-series classification archive demonstrate competitive accuracy of the proposed approach compared not only to existing deep architectures but also to state-of-the-art TSC algorithms.
Comment: The 10th International Conference on Machine Vision (ICMV 2017)

arXiv.org e-Print Archive
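
The recurrence-plot step mentioned in the abstract above can be illustrated with a minimal sketch. It follows the textbook definition of a recurrence plot (time-delay embedding followed by pairwise distances), not necessarily the exact variant used in the paper. The function names `delay_embed` and `recurrence_plot` and the parameters `dim`, `delay`, and `eps` are assumptions chosen for illustration.

```python
import math


def delay_embed(series, dim=2, delay=1):
    """Build time-delay embedded states s_i = (x_i, x_{i+delay}, ...).

    `dim` and `delay` are illustrative defaults, not values from the paper.
    """
    n_states = len(series) - (dim - 1) * delay
    if n_states <= 0:
        raise ValueError("series too short for the chosen dim and delay")
    return [
        tuple(series[i + k * delay] for k in range(dim))
        for i in range(n_states)
    ]


def recurrence_plot(series, dim=2, delay=1, eps=None):
    """Turn a 1D series into a 2D matrix that can be treated as an image.

    With eps=None the matrix holds raw Euclidean distances between states
    (a grey-level image). With a numeric eps the matrix is thresholded to
    0/1, which is the classical binary recurrence plot.
    """
    states = delay_embed(series, dim, delay)
    dist = [[math.dist(a, b) for b in states] for a in states]
    if eps is None:
        return dist
    return [[1 if d <= eps else 0 for d in row] for row in dist]


if __name__ == "__main__":
    # A periodic toy signal: recurrences show up as diagonal stripes.
    signal = [math.sin(2 * math.pi * t / 8) for t in range(24)]
    image = recurrence_plot(signal, dim=2, delay=1, eps=0.1)
    for row in image:
        print("".join("#" if v else "." for v in row))
```

The resulting square matrix is what a 2D CNN classifier would take as input in place of the raw 1D signal. Resizing the image and choosing between the thresholded and unthresholded forms are design decisions this sketch leaves open.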