Flow-based Intrinsic Curiosity Module
In this paper, we focus on a prediction-based novelty estimation strategy
built on the deep reinforcement learning (DRL) framework and present a flow-based
intrinsic curiosity module (FICM) to exploit the prediction errors from optical
flow estimation as exploration bonuses. We propose the concept of leveraging
motion features captured between consecutive observations to evaluate the
novelty of observations in an environment. FICM encourages a DRL agent to
explore observations with unfamiliar motion features, and requires only two
consecutive frames to obtain sufficient information when estimating the
novelty. We evaluate our method and compare it with a number of existing
methods on multiple benchmark environments, including Atari games, Super Mario
Bros., and ViZDoom. We demonstrate that FICM is well suited to tasks and
environments featuring moving objects, since these allow FICM to exploit the
motion features between consecutive observations. We further analyze the
encoding efficiency of FICM through ablation studies and discuss its applicable
domains in detail.
Comment: The SOLE copyright holder is IJCAI (International Joint Conferences
on Artificial Intelligence), all rights reserved. The link is provided as
follows: https://www.ijcai.org/Proceedings/2020/28
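The abstract describes the bonus only at a high level. A minimal sketch of the idea, assuming the bonus is the error of reconstructing the next frame by warping the current frame with a predicted optical flow field: `backward_warp`, `flow_curiosity_bonus`, the scale `eta`, and the toy frames are hypothetical, the flow predictor network itself is not shown, and nearest-neighbour warping stands in for the differentiable (e.g. bilinear) sampling a trained model would need. Only two consecutive frames enter the computation, matching the abstract.

```python
import numpy as np


def backward_warp(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Reconstruct the next frame by sampling `frame` along a dense flow field.

    frame: (H, W) grayscale image at time t.
    flow:  (H, W, 2) per-pixel offsets (dx, dy) such that
           next_frame[y, x] ~= frame[y + dy, x + dx].
    Uses nearest-neighbour sampling and clamps coordinates at the border.
    """
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]


def flow_curiosity_bonus(frame_t: np.ndarray, frame_t1: np.ndarray,
                         predicted_flow: np.ndarray, eta: float = 1.0) -> float:
    """Hypothetical exploration bonus: scaled mean squared error between the
    observed next frame and the current frame warped by the predicted flow."""
    reconstruction = backward_warp(frame_t, predicted_flow)
    return 0.5 * eta * float(np.mean((frame_t1 - reconstruction) ** 2))


# Toy example: a single bright pixel moves one column to the right.
frame_t = np.zeros((4, 4))
frame_t[1, 1] = 1.0
frame_t1 = np.zeros((4, 4))
frame_t1[1, 2] = 1.0

good_flow = np.zeros((4, 4, 2))
good_flow[..., 0] = -1.0  # each pixel of frame_t1 comes from one column to the left

print(flow_curiosity_bonus(frame_t, frame_t1, good_flow))            # 0.0: motion explained
print(flow_curiosity_bonus(frame_t, frame_t1, np.zeros((4, 4, 2))))  # 0.0625: motion unexplained
```

A predictor that already models a kind of motion well reconstructs the next frame accurately and earns little bonus, while unfamiliar motion yields a larger error and therefore a larger bonus; in a trainable version the flow predictor would be updated to reduce this same error.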
Excitation Backprop for RNNs
Deep models are state-of-the-art for many vision tasks including video action
recognition and video captioning. Models are trained to caption or classify
activity in videos, but little is known about the evidence used to make such
decisions. Grounding decisions made by deep networks has been studied in
spatial visual content, giving more insight into model predictions for images.
However, such studies are relatively lacking for models of spatiotemporal
visual content, namely videos. In this work, we devise a formulation that
simultaneously grounds evidence in space and time, in a single pass, using
top-down saliency. We visualize the spatiotemporal cues that contribute to a
deep model's classification/captioning output using the model's internal
representation. Based on these spatiotemporal cues, we are able to localize
segments within a video that correspond with a specific action, or phrase from
a caption, without explicitly training or optimizing for these tasks.
Comment: CVPR 2018 Camera Ready Version
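For orientation, a sketch of the basic excitation backprop step through a single fully connected layer, as we understand the top-down saliency rule this work builds on: a winning probability on each output unit is redistributed to the inputs in proportion to their positive (excitatory) contribution. The spatiotemporal extension through recurrent time steps described above is not shown, and the function and argument names are hypothetical.

```python
import numpy as np


def excitation_backprop_fc(activations: np.ndarray, weights: np.ndarray,
                           top_probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Propagate marginal winning probabilities down one fully connected layer.

    activations: (n_in,) non-negative bottom-layer activations a_j (e.g. after ReLU)
    weights:     (n_in, n_out) weight w_ji connecting input j to output i
    top_probs:   (n_out,) marginal winning probabilities P(a_i) of the top layer
    Returns (n_in,) marginal winning probabilities P(a_j) of the bottom layer.
    """
    w_pos = np.maximum(weights, 0.0)          # only excitatory connections pass signal
    contrib = activations[:, None] * w_pos    # a_j * w_ji^+
    norm = contrib.sum(axis=0)                # sum_j a_j * w_ji^+ for each output i
    cond = contrib / np.maximum(norm, eps)    # P(a_j | a_i); columns with norm > 0 sum to 1
    return cond @ top_probs                   # P(a_j) = sum_i P(a_j | a_i) * P(a_i)


# Toy layer with two inputs and two outputs, both outputs equally "winning".
a = np.array([1.0, 2.0])
W = np.array([[1.0, -1.0],
              [1.0, 1.0]])
print(excitation_backprop_fc(a, W, np.array([0.5, 0.5])))
```

Because each column of the conditional matrix sums to one, total probability is conserved from layer to layer (except for output units with no excitatory input), so repeating the step down to a feature map gives a normalized saliency map.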
Towards Building Deep Networks with Bayesian Factor Graphs
We propose a multi-layer network based on the Bayesian framework of Factor
Graphs in Reduced Normal Form (FGrn) applied to a two-dimensional
lattice. The Latent Variable Model (LVM) is the basic building block of a
quadtree hierarchy built on top of a bottom layer of random variables that
represent pixels of an image, a feature map, or more generally a collection of
spatially distributed discrete variables. The multi-layer architecture
implements a hierarchical data representation that, via belief propagation, can
be used for learning and inference. Typical uses are pattern completion,
correction and classification. The FGrn paradigm provides great flexibility and
modularity and appears to be a promising candidate for building deep networks: the
system can be easily extended by introducing new and different (in cardinality
and in type) variables. Prior knowledge, or supervised information, can be
introduced at different scales. The FGrn paradigm provides a handy way for
building all kinds of architectures by interconnecting only three types of
units: Single Input Single Output (SISO) blocks, Sources and Replicators. The
network is designed like a circuit diagram and the belief messages flow
bidirectionally in the whole system. The learning algorithms operate only
locally within each block. The framework is demonstrated in this paper in a
three-layer structure applied to images extracted from a standard data set.
Comment: Submitted for journal publication
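A sketch of sum-product message passing through the three FGrn unit types named above, assuming each SISO block is parameterised by a row-stochastic conditional probability matrix, a Source simply injects a prior message into a Replicator branch, and all messages are normalised discrete distributions. The function names and the two-child latent-variable example are hypothetical, and the local learning algorithms are not shown.

```python
import numpy as np


def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a non-negative message so that it sums to one."""
    return v / v.sum()


def siso_forward(P: np.ndarray, f_x: np.ndarray) -> np.ndarray:
    """Forward message through a SISO block, P[x, y] = p(y | x): f_y ∝ P^T f_x."""
    return normalize(P.T @ f_x)


def siso_backward(P: np.ndarray, b_y: np.ndarray) -> np.ndarray:
    """Backward message through the same block: b_x ∝ P b_y."""
    return normalize(P @ b_y)


def replicator(incoming) -> np.ndarray:
    """Message leaving each branch = product of the messages on all other branches."""
    incoming = np.asarray(incoming, dtype=float)
    out = [normalize(np.prod(np.delete(incoming, k, axis=0), axis=0))
           for k in range(len(incoming))]
    return np.array(out)


# Hypothetical latent-variable model: a binary latent S with a Source prior,
# replicated to two observed binary variables X1 and X2 through SISO blocks.
prior = np.array([0.5, 0.5])
P1 = np.array([[0.9, 0.1], [0.2, 0.8]])   # p(x1 | s)
P2 = np.array([[0.7, 0.3], [0.1, 0.9]])   # p(x2 | s)

b1 = siso_backward(P1, np.array([1.0, 0.0]))  # evidence: X1 observed in state 0
b2 = siso_backward(P2, np.array([1.0, 1.0]))  # X2 unobserved: uniform message
to_s = replicator([prior, b1, b2])            # messages leaving S on each branch
x2_pred = siso_forward(P2, to_s[2])           # pattern completion for X2
print(x2_pred)  # ≈ [0.591 0.409]
```

Every operation uses only the messages adjacent to a single block, which is what makes the bidirectional, circuit-like composition of larger hierarchies possible.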
Convolutional LSTM Networks for Subcellular Localization of Proteins
Machine learning is widely used to analyze biological sequence data.
Non-sequential models such as SVMs or feed-forward neural networks are often
used, although they have no natural way of handling sequences of varying length.
Recurrent neural networks such as the long short-term memory (LSTM) model, on
the other hand, are designed to handle sequences. In this study we demonstrate
that LSTM networks predict the subcellular location of proteins given only the
protein sequence with high accuracy (0.902), outperforming current
state-of-the-art algorithms. We further improve the performance by introducing convolutional
filters and experiment with an attention mechanism which lets the LSTM focus on
specific parts of the protein. Lastly, we introduce new visualizations of both
the convolutional filters and the attention mechanisms and show how they can be
used to extract biologically relevant knowledge from the LSTM networks.
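A minimal NumPy sketch of the input and pooling side of such a model, assuming proteins are one-hot encoded over the 20 standard amino acids: convolutional filters scan the encoded sequence, and a softmax attention layer pools a variable-length sequence of per-position states into one fixed-length vector. The LSTM layer between the two is omitted (the convolution outputs stand in for LSTM hidden states), and all names, shapes, the scoring vector `w`, and the toy sequence are hypothetical rather than the authors' architecture.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq: str) -> np.ndarray:
    """Encode a protein sequence as an (L, 20) one-hot matrix."""
    idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        if aa in idx:  # non-standard residues (e.g. 'X') stay all-zero
            x[pos, idx[aa]] = 1.0
    return x


def conv1d_same(x: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """x: (L, C); filters: (K, C, F) with odd K. Returns ReLU(conv) of shape (L, F)."""
    k = filters.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad so output length equals L
    out = np.stack([np.tensordot(xp[t:t + k], filters, axes=([0, 1], [0, 1]))
                    for t in range(x.shape[0])])
    return np.maximum(out, 0.0)


def attention_pool(h: np.ndarray, w: np.ndarray):
    """h: (L, D) per-position states; w: (D,) scoring vector.
    Returns the attention-weighted summary (D,) and the weights (L,)."""
    scores = h @ w
    alpha = np.exp(scores - scores.max())  # numerically stable softmax
    alpha /= alpha.sum()
    return alpha @ h, alpha


rng = np.random.default_rng(0)
x = one_hot("MKTAYIAK")                            # toy sequence, shape (8, 20)
h = conv1d_same(x, rng.normal(size=(3, 20, 4)))    # 4 filters of width 3 -> (8, 4)
summary, alpha = attention_pool(h, rng.normal(size=4))
print(summary.shape, alpha.round(3))               # fixed-size vector + per-residue weights
```

The pooled vector has the same size for any sequence length, and the per-residue weights `alpha` are the kind of quantity one would plot to see which parts of a protein the model attends to.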
- …