Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension
In this work, we introduce a novel algorithm for solving the textbook question answering (TQA) task, which poses more realistic QA problems than other recent tasks. We focus on two related issues identified through an analysis of the TQA dataset. First, solving TQA problems requires comprehending multi-modal contexts in complicated input data. To tackle the issue of extracting knowledge features from long text lessons and merging them with visual features, we build a context graph from texts and images and propose a new module, f-GCN, based on graph convolutional networks (GCN). Second, in the TQA dataset, scientific terms are not spread across the chapters, and the subjects are split. To overcome this so-called "out-of-domain" issue, we introduce a novel self-supervised open-set learning process that requires no annotations and is applied before learning the QA problems. The experimental results show that our model significantly outperforms prior state-of-the-art methods. Moreover, ablation studies validate that both incorporating f-GCN to extract knowledge from multi-modal contexts and the newly proposed self-supervised learning process are effective for TQA problems.
Comment: ACL 2019 camera-ready
URNet: User-Resizable Residual Networks with Conditional Gating Module
Convolutional neural networks are widely used to process spatial scenes, but their computational cost is fixed and depends on the structure of the network used. There are methods that reduce the cost by compressing networks or by varying the computational path dynamically according to the input image. However, since a user cannot control the size of the learned model, it is difficult to respond dynamically when the amount of service requests suddenly increases. We propose User-Resizable Residual Networks (URNet), which allow users to adjust the scale of the network as needed during evaluation. URNet includes a Conditional Gating Module (CGM) that determines whether each residual block is used according to the input image and the desired scale. The CGM is trained in a supervised manner using a newly proposed scale loss and its corresponding training methods. URNet can control the amount of computation according to the user's demand without significantly degrading accuracy. It can also be used as a general compression method by fixing the scale during training. In experiments on ImageNet, URNet based on ResNet-101 maintains the accuracy of the baseline even when resized to approximately 80% of the original network, and shows only about 1% accuracy degradation when using about 65% of the computation.
Comment: 12 pages
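As a rough illustration of the gating mechanism described above, the sketch below shows a per-block gate conditioned on the block input and a user-requested scale, together with one plausible form of a scale loss. The CGM architecture and the actual scale loss used in URNet are not given in the abstract, so every name and formula here is an assumption.

```python
# Illustrative sketch: a conditional gate that decides how strongly a
# residual block contributes, driven by the block input and a requested scale.
import torch
import torch.nn as nn


class ConditionalGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Linear(channels + 1, 1)  # pooled features + scale -> gate logit

    def forward(self, x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        pooled = x.mean(dim=(2, 3))                               # global average pool
        logit = self.fc(torch.cat([pooled, scale.expand(x.size(0), 1)], dim=1))
        return torch.sigmoid(logit)                               # gate in (0, 1)


def scale_loss(gates: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # One plausible form: push the average gate activation toward the requested
    # scale so that roughly `scale` of the blocks are executed.
    return (gates.mean() - scale).pow(2)


# Usage: gate one residual block's output for a batch of feature maps.
x = torch.randn(8, 64, 14, 14)
scale = torch.tensor([0.8])
gate = ConditionalGate(64)(x, scale)             # shape (8, 1)
block_out = torch.randn(8, 64, 14, 14)           # stand-in for the block's output
gated = x + gate.view(-1, 1, 1, 1) * block_out   # skip path + gated residual
loss = scale_loss(gate, scale)
```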
Generalized mean for robust principal component analysis
In this paper, we propose a robust principal component analysis (PCA) to overcome the problem that PCA is prone to outliers included in the training set. Unlike other alternatives, which commonly replace the L2-norm with other distance measures, the proposed method alleviates the negative effect of outliers by using the characteristic of the generalized mean while keeping the Euclidean distance. The optimization problem based on the generalized mean is solved by a novel method. We also present a generalized sample mean, a generalization of the sample mean, to estimate a robust mean in the presence of outliers. The proposed method shows performance better than or equivalent to conventional PCAs in various problems such as face reconstruction, clustering, and object categorization.
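For readers unfamiliar with the generalized (power) mean, the short sketch below shows how choosing an exponent p < 1 damps the contribution of large (outlier) error values while the distances themselves remain Euclidean. The robust-PCA objective and its solver are not described in the abstract; this snippet only illustrates the averaging operator, and the example values are made up.

```python
# Sketch of the generalized (power) mean used to downweight outliers.
import numpy as np


def generalized_mean(values: np.ndarray, p: float) -> float:
    """Power mean M_p = ((1/n) * sum(v_i^p))^(1/p) for positive values."""
    return float(np.mean(values ** p) ** (1.0 / p))


# Squared Euclidean reconstruction errors, with one outlier.
errors = np.array([1.0, 1.2, 0.9, 1.1, 50.0])
print(np.mean(errors))                 # arithmetic mean, dominated by the outlier
print(generalized_mean(errors, 0.5))   # p < 1 downweights the large error
```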
Superpixel-based Semantic Segmentation Trained by Statistical Process Control
Semantic segmentation, like other fields of computer vision, has seen a remarkable performance advance through the use of deep convolutional neural networks. However, considering that neighboring pixels depend heavily on each other, both learning and testing of these methods involve many redundant operations. To resolve this problem, the proposed network is trained and tested with only 0.37% of the total pixels via superpixel-based sampling, which greatly reduces the complexity of the upsampling computation. The hypercolumn feature maps are constructed by a pyramid module in combination with the convolution layers of the base network. Since the proposed method uses a very small number of sampled pixels, end-to-end learning of the entire network is difficult with a common learning rate for all layers. To resolve this problem, the learning rate after sampling is controlled by statistical process control (SPC) of the gradients in each layer. The proposed method performs better than or on par with conventional methods that use far more samples on the PASCAL Context and SUN-RGBD datasets.
Comment: Accepted in British Machine Vision Conference (BMVC), 201
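The abstract mentions controlling per-layer learning rates via statistical process control of gradients but gives no concrete rule, so the sketch below is only one plausible reading: each layer keeps a running window of its gradient norms and damps its learning rate when the current norm leaves a mean ± 3σ control band. The class name, window size, and damping factor are assumptions, not the paper's settings.

```python
# Rough sketch of SPC-style per-layer learning-rate control on gradient norms.
import numpy as np


class SPCLearningRate:
    def __init__(self, base_lr: float, window: int = 50):
        self.base_lr = base_lr
        self.window = window
        self.history: list[float] = []

    def step(self, grad_norm: float) -> float:
        self.history.append(grad_norm)
        recent = self.history[-self.window:]
        if len(recent) < 10:                      # too little data: keep the base LR
            return self.base_lr
        mean, std = np.mean(recent), np.std(recent)
        lower, upper = mean - 3 * std, mean + 3 * std
        if grad_norm < lower or grad_norm > upper:
            return self.base_lr * 0.1             # out of control: damp this layer's LR
        return self.base_lr


# Usage: one controller per layer, fed with that layer's gradient norm each step.
ctrl = SPCLearningRate(base_lr=1e-3)
for norm in np.random.lognormal(mean=0.0, sigma=0.3, size=100):
    lr = ctrl.step(float(norm))
```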