Learning Convolutional Text Representations for Visual Question Answering
Visual question answering is a recently proposed artificial intelligence task
that requires a deep understanding of both images and texts. In deep learning,
images are typically modeled through convolutional neural networks, and texts
are typically modeled through recurrent neural networks. While the requirement
for modeling images is similar to traditional computer vision tasks, such as
object recognition and image classification, visual question answering poses
different requirements for textual representation than other natural language
processing tasks. In this work, we perform a detailed analysis of natural
language questions in visual question answering. Based on the analysis, we
propose to rely on convolutional neural networks for learning textual
representations. By exploring the various properties of convolutional neural
networks specialized for text data, such as width and depth, we present our
"CNN Inception + Gate" model. We show that our model improves question
representations and thus the overall accuracy of visual question answering
models. We also show that the text representation requirement in visual
question answering is more complicated and comprehensive than that in
conventional natural language processing tasks, making it a better task to
evaluate textual representation methods. Shallow models like fastText, which
can achieve results comparable to those of deep learning models on tasks such
as text classification, are not suitable for visual question answering.
Comment: Conference paper at SDM 2018. https://github.com/divelab/sva
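The abstract describes convolutional question encoders with inception-style branches of several filter widths plus gating. A minimal NumPy sketch of that idea follows; the filter widths, channel count, and GLU-style sigmoid gating are illustrative assumptions, not the paper's exact "CNN Inception + Gate" configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w):
    """Valid 1D convolution over a sequence of word vectors.
    x: (seq_len, emb_dim), w: (width, emb_dim, out_dim) -> (seq_len-width+1, out_dim)."""
    width = w.shape[0]
    out_len = x.shape[0] - width + 1
    return np.stack([
        np.tensordot(x[i:i + width], w, axes=([0, 1], [0, 1]))
        for i in range(out_len)
    ])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def inception_gate_encode(x, widths=(1, 2, 3), out_dim=8):
    """Inception-style branches of different widths, each gated (GLU-style,
    an assumption), max-pooled over positions, then concatenated."""
    parts = []
    for w in widths:
        wc = rng.standard_normal((w, x.shape[1], out_dim)) * 0.1  # content filter
        wg = rng.standard_normal((w, x.shape[1], out_dim)) * 0.1  # gate filter
        h = conv1d(x, wc) * sigmoid(conv1d(x, wg))  # gated feature maps
        parts.append(h.max(axis=0))                 # max-pool over time
    return np.concatenate(parts)

# A toy "question" of 10 words with 16-dim embeddings.
question = rng.standard_normal((10, 16))
vec = inception_gate_encode(question)
print(vec.shape)  # (24,) = 3 widths x 8 channels
```

Varying the widths lets the encoder pick up phrases of different lengths in one layer, which is the property the abstract exploits for question representations.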
Spatial Variational Auto-Encoding via Matrix-Variate Normal Distributions
The key idea of variational auto-encoders (VAEs) resembles that of
traditional auto-encoder models in which spatial information is supposed to be
explicitly encoded in the latent space. However, the latent variables in VAEs
are vectors, which can be interpreted as multiple feature maps of size 1x1.
Such representations can only convey spatial information implicitly when
coupled with powerful decoders. In this work, we propose spatial VAEs that use
feature maps of larger size as latent variables to explicitly capture spatial
information. This is achieved by allowing the latent variables to be sampled
from matrix-variate normal (MVN) distributions whose parameters are computed
from the encoder network. To increase dependencies among locations on latent
feature maps and reduce the number of parameters, we further propose spatial
VAEs via low-rank MVN distributions. Experimental results show that the
proposed spatial VAEs outperform original VAEs in capturing rich structural and
spatial information.
Comment: Accepted by SDM 2019. Code is publicly available at
https://github.com/divelab/sva
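The low-rank matrix-variate normal idea can be sketched with the standard construction: if E has i.i.d. standard normal entries, then Z = M + A E Bᵀ follows MN(M, AAᵀ, BBᵀ). The map size, rank, and scaling below are illustrative assumptions; a real model would compute M, A, B from the encoder and typically add a diagonal term to keep the covariance non-singular.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_matrix_normal(M, A, B, rng):
    """Draw Z ~ MN(M, U, V) with row covariance U = A A^T and
    column covariance V = B B^T, via Z = M + A E B^T, E iid N(0, 1)."""
    E = rng.standard_normal((A.shape[1], B.shape[1]))
    return M + A @ E @ B.T

# Low-rank variant: an 8x8 latent feature map with rank-2 row/column
# factors, so covariance parameters scale with the rank, not the map size.
n, rank = 8, 2
M = np.zeros((n, n))                               # mean map (from the encoder)
A = rng.standard_normal((n, rank)) / np.sqrt(rank)  # row factor (assumption)
B = rng.standard_normal((n, rank)) / np.sqrt(rank)  # column factor (assumption)

Z = sample_matrix_normal(M, A, B, rng)
print(Z.shape)  # (8, 8)
```

The sampled Z is a full spatial feature map rather than a 1x1 vector, which is what lets the decoder receive explicit spatial structure.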
City Scene Super-Resolution via Geometric Error Minimization
Super-resolution techniques are crucial in improving image granularity,
particularly in complex urban scenes, where preserving geometric structures is
vital for data-informed cultural heritage applications. In this paper, we
propose a city scene super-resolution method via geometric error minimization.
The geometry-consistent mechanism leverages the Hough transform to extract
regular geometric features in city scenes, enabling the computation of
geometric errors between low-resolution and high-resolution images. By
minimizing a mixed mean-square error and geometric alignment error during the
super-resolution process, the proposed method efficiently restores details and
geometric regularities. Extensive validation on the Set14, BSD300, Cityscapes,
and GSV-Cities datasets demonstrates that the proposed method outperforms
existing state-of-the-art methods, especially in urban scenes.
Comment: 26 pages, 10 figures
Improved Approximation Ratios of Fixed-Price Mechanisms in Bilateral Trades
We continue the study of the performance of fixed-price mechanisms in the
bilateral trade problem, and improve the approximation ratios of welfare-optimal
mechanisms in several settings. Specifically, in the case where only the buyer
distribution is known, we prove that there exists a distribution over different
fixed-price mechanisms such that the approximation ratio lies in the
interval [0.71, 0.7381]. Furthermore, we show that the same approximation
ratio holds for the optimal fixed-price mechanism, when both buyer and seller
distributions are known. As a result, the previously best-known (1 -
1/e+0.0001)-approximation can be improved to . Additionally, we examine
randomized fixed-price mechanisms when we receive just one single sample from
the seller distribution, for both symmetric and asymmetric settings. Our
findings reveal that posting the single sample as the price remains optimal
among all randomized fixed-price mechanisms.
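The single-sample mechanism is easy to simulate: draw one fresh sample from the seller distribution, post it as the price, and trade exactly when buyer value ≥ price ≥ seller value. The sketch below estimates its welfare ratio on a toy symmetric Uniform[0, 1] instance; the distribution and the convention that the seller keeps the item (and its value) on no-trade are assumptions for illustration, while the paper's results are distribution-general.

```python
import numpy as np

rng = np.random.default_rng(0)

def single_sample_welfare_ratio(dist, n=200_000):
    """Monte Carlo estimate of the welfare approximation ratio of the
    mechanism that posts one fresh sample from the seller distribution
    as the fixed price (symmetric setting: buyer and seller share dist)."""
    s = dist(n)                            # seller values
    b = dist(n)                            # buyer values
    p = dist(n)                            # posted price: one seller sample
    trade = (b >= p) & (p >= s)
    welfare = np.where(trade, b, s)        # item moves to the buyer only on trade
    optimal = np.maximum(s, b)             # first-best: item to the higher value
    return welfare.mean() / optimal.mean()

ratio = single_sample_welfare_ratio(lambda n: rng.random(n))
print(round(ratio, 3))  # close to 7/8 for Uniform[0, 1] values
```

For i.i.d. Uniform[0, 1] values the ratio works out to 7/8 analytically (expected welfare 7/12 against first-best 2/3), so the simulation also serves as a sanity check on the mechanism's definition.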