Towards Learning a Self-inverse Network for Bidirectional Image-to-image Translation
A one-to-one mapping is necessary for many bidirectional image-to-image
translation applications, such as MRI image synthesis, since MRI images are
unique to the patient. State-of-the-art approaches for image synthesis from
domain X to domain Y learn a convolutional neural network that maps between
the domains. A different network is typically trained to map in the opposite
direction, from Y to X. In this paper, we explore the possibility of using
only one network for bidirectional image synthesis; in other words, such a
network implements a self-inverse function. A self-inverse network offers
several distinct advantages: a single network instead of two, better
generalization, and a more restricted parameter space. Most
importantly, a self-inverse function guarantees a one-to-one mapping, a
property that cannot be guaranteed by earlier approaches that are not
self-inverse. Experiments on three datasets show that, compared with
baseline approaches that use two separate models for image synthesis along
the two directions, our self-inverse network achieves better synthesis
results in terms of standard metrics. Finally, our sensitivity analysis
confirms the feasibility of learning a self-inverse function for
bidirectional image translation.

Comment: 10 pages, 9 figures
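The one-to-one guarantee cited in this abstract follows from the self-inverse
property alone. A minimal sketch (a toy 1-D map, not the paper's network)
makes the argument concrete:

```python
# Toy illustration (not the paper's network): any self-inverse function
# f with f(f(x)) = x is automatically one-to-one, because f serves as
# its own inverse. Here f stands in for a "translation" between domains.

def make_self_inverse(c):
    """Return f(x) = c - x, a simple self-inverse map."""
    def f(x):
        return c - x
    return f

f = make_self_inverse(10.0)

# Self-inverse: applying f twice recovers the input exactly.
for x in [0.0, 3.5, 10.0, -2.0]:
    assert f(f(x)) == x

# Injectivity follows: if f(a) == f(b), then a == f(f(a)) == f(f(b)) == b.
a, b = 1.0, 4.0
assert f(a) != f(b)
print("self-inverse check passed")
```

Because f is its own inverse, f(a) = f(b) forces a = f(f(a)) = f(f(b)) = b, so
injectivity comes for free; a learned self-inverse network would inherit the
same guarantee by construction.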
A Compositional Textual Model for Recognition of Imperfect Word Images
Printed text recognition is an important problem for industrial OCR systems.
Printed text is constructed in a standard procedural fashion in most settings.
We develop a mathematical model for this process that can be applied to the
backward inference problem of text recognition from an image. Through ablation
experiments we show that this model is realistic and that a multi-task
objective setting can help to stabilize estimation of its free parameters,
enabling use of conventional deep learning methods. Furthermore, by directly
modeling the geometric perturbations of text synthesis, we show that our
model can help recover missing characters from incomplete text regions, the
bane of multi-component OCR systems, enabling recognition even when detection
returns incomplete information.
On the Origin of Deep Learning
This paper is a review of the evolutionary history of deep learning models.
It spans from the genesis of neural networks, when associationist modeling
of the brain was first studied, to the models that have dominated the last
decade of research in deep learning, such as convolutional neural networks,
deep belief networks, and
recurrent neural networks. In addition to a review of these models, this paper
primarily focuses on the precedents of the models above, examining how the
initial ideas are assembled to construct the early models and how these
preliminary models are developed into their current forms. Many of these
evolutionary paths last more than half a century and have a diversity of
directions. For example, CNNs build on prior knowledge of the biological
vision system; DBNs evolved from a trade-off between the modeling power and
computational complexity of graphical models; and many of today's models are
neural counterparts of classical linear models. This paper reviews these
evolutionary paths, offers a concise account of how these models developed,
and aims to provide a thorough background for deep learning. More
importantly, along the way it summarizes the gist behind these milestones
and proposes many directions to guide future research in deep learning.

Comment: 70 pages, 200 references
Improve Diverse Text Generation by Self Labeling Conditional Variational Auto Encoder
Diversity plays a vital role in many text generating applications. In recent
years, Conditional Variational Auto Encoders (CVAE) have shown promising
performance for this task. However, they often encounter the so-called
KL-vanishing problem. Previous work has mitigated this problem with
heuristic methods, such as strengthening the encoder or weakening the
decoder, while optimizing the CVAE objective function. Nevertheless, the
optimization direction of these methods is implicit, and it is hard to find
an appropriate degree to which they should be applied. In this paper, we
propose an explicit optimization objective that complements the CVAE and
directly pulls it away from KL-vanishing. In fact, this objective term
guides the encoder towards the "best
encoder" of the decoder to enhance the expressiveness. A labeling network is
introduced to estimate the "best encoder". It provides a continuous label in
the latent space of CVAE to help build a close connection between latent
variables and targets. The whole proposed method is named Self Labeling
CVAE (SLCVAE). To accelerate research on diverse text generation, we also
propose a large native one-to-many dataset. Extensive experiments are conducted
on two tasks, which show that our method greatly improves generation
diversity while achieving accuracy comparable to state-of-the-art
algorithms.

Comment: Accepted as a conference paper at ICASSP 2019; this copy is an
extended version of the submitted manuscript, with more theoretical analysis
and human evaluation.
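The KL-vanishing problem this abstract addresses can be seen directly from
the closed-form KL term of a Gaussian variational posterior. The sketch below
is a generic illustration of that term, not SLCVAE's proposed objective:

```python
import math

def kl_to_std_normal(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for a 1-D Gaussian."""
    return 0.5 * (sigma**2 + mu**2 - 1.0 - math.log(sigma**2))

# A posterior that encodes information about its input keeps KL > 0 ...
informative = kl_to_std_normal(mu=2.0, sigma=0.5)
assert informative > 0.0

# ... while a "collapsed" posterior exactly matches the prior, driving the
# KL term to zero: the latent variable then carries no information to the
# decoder, which is what the KL-vanishing problem refers to.
collapsed = kl_to_std_normal(mu=0.0, sigma=1.0)
assert collapsed == 0.0
print(informative, collapsed)
```

Heuristics that strengthen the encoder or weaken the decoder nudge this term
away from zero only indirectly; the abstract's point is that an explicit
objective can target the collapse directly.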
When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey
With widespread applications of artificial intelligence (AI), the
capabilities of the perception, understanding, decision-making and control for
autonomous systems have improved significantly in recent years. When
autonomous systems require both accuracy and transferability, several AI
methods, such as adversarial learning, reinforcement learning (RL), and
meta-learning, demonstrate strong performance. Here, we review
learning-based approaches in autonomous systems from the perspectives of
accuracy and transferability. Accuracy means that a well-trained model shows
good results during the testing phase, where the testing set shares the same
task or data distribution with the training set. Transferability means that
a well-trained model remains accurate when transferred to other testing
domains. Firstly, we introduce some basic concepts of transfer learning
and then present some preliminaries of adversarial learning, RL and
meta-learning. Secondly, we review accuracy, transferability, or both, to
show the advantages of adversarial learning, such as generative adversarial
networks (GANs), in typical computer vision tasks in autonomous systems,
including image style transfer, image super-resolution, image
deblurring/dehazing/rain removal, semantic segmentation, depth estimation,
pedestrian detection, and person re-identification (re-ID). Then, we further
review the performance of RL and meta-learning in terms of accuracy,
transferability, or both in autonomous systems, covering pedestrian
tracking, robot navigation, and robotic manipulation. Finally, we discuss
several challenges and future topics for using adversarial learning, RL, and
meta-learning in autonomous systems.
A Comprehensive Survey of Deep Learning for Image Captioning
Generating a description of an image is called image captioning. Image
captioning requires recognizing the important objects, their attributes, and
their relationships in an image. It also needs to generate syntactically and
semantically correct sentences. Deep learning-based techniques are capable of
handling the complexities and challenges of image captioning. In this survey
paper, we aim to present a comprehensive review of existing deep learning-based
image captioning techniques. We discuss the foundation of the techniques to
analyze their performances, strengths and limitations. We also discuss the
datasets and the evaluation metrics popularly used in deep learning based
automatic image captioning.

Comment: 36 pages, accepted as a journal paper in ACM Computing Surveys
(October 2018)
An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search
Globally normalized neural sequence models are considered superior to their
locally normalized equivalents because they may ameliorate the effects of label
bias. However, when considering high-capacity neural parametrizations that
condition on the whole input sequence, both model classes are theoretically
equivalent in terms of the distributions they are capable of representing.
Thus, the practical advantage of global normalization in the context of modern
neural methods remains unclear. In this paper, we attempt to shed light on this
problem through an empirical study. We extend an approach for search-aware
training via a continuous relaxation of beam search (Goyal et al., 2017b) in
order to enable training of globally normalized recurrent sequence models
through simple backpropagation. We then use this technique to conduct an
empirical study of the interaction between global normalization, high-capacity
encoders, and search-aware optimization. We observe that in the context of
inexact search, globally normalized neural models are still more effective than
their locally normalized counterparts. Further, since our training approach is
sensitive to warm-starting with pre-trained models, we also propose a novel
initialization strategy based on self-normalization for pre-training globally
normalized models. We analyze our approach on two tasks, CCG supertagging
and machine translation, and demonstrate the importance of global
normalization under different conditions when using search-aware training.

Comment: Long paper at NAACL 201
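The local/global distinction discussed above can be sketched on a toy
sequence model. This is illustrative only: the hand-written scores stand in
for a network and do not reflect the paper's models or training procedure.

```python
import math
from itertools import product

VOCAB = ["A", "B"]
LENGTH = 2

def step_score(i, tok):
    """Arbitrary per-step score (stands in for a network's logits)."""
    return (i + 1) * (1.0 if tok == "A" else 0.5)

def local_prob(seq):
    """Locally normalized: a softmax over the vocabulary at each step."""
    p = 1.0
    for i, tok in enumerate(seq):
        z = sum(math.exp(step_score(i, t)) for t in VOCAB)
        p *= math.exp(step_score(i, tok)) / z
    return p

def global_prob(seq):
    """Globally normalized: one softmax over all complete sequences."""
    total = lambda s: sum(step_score(i, t) for i, t in enumerate(s))
    z = sum(math.exp(total(s)) for s in product(VOCAB, repeat=LENGTH))
    return math.exp(total(seq)) / z

# Both define valid distributions over the 4 possible sequences. With a
# score that factorizes per step (as here) the two coincide, matching the
# theoretical-equivalence point; differences arise in practice under
# history-dependent scoring and inexact (beam) search.
seqs = list(product(VOCAB, repeat=LENGTH))
assert abs(sum(local_prob(s) for s in seqs) - 1.0) < 1e-9
assert abs(sum(global_prob(s) for s in seqs) - 1.0) < 1e-9
```

The global model pays its single normalization cost over the full sequence
space, which is why training it requires techniques like the continuous beam
search relaxation used in the paper.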
Self-Supervised Flow Estimation using Geometric Regularization with Applications to Camera Image and Grid Map Sequences
We present a self-supervised approach to estimate flow in camera image and
top-view grid map sequences using fully convolutional neural networks in the
domain of automated driving. We extend existing approaches for self-supervised
optical flow estimation by adding a regularizer expressing motion consistency
assuming a static environment. However, as this assumption is violated for
moving traffic participants, we also estimate a mask to scale this
regularization. Adding regularization towards motion consistency improves
convergence and flow estimation accuracy. Furthermore, we scale the errors
due to spatial flow inconsistency by a mask derived from the motion mask.
This improves accuracy in regions where the flow changes drastically, thanks
to a better separation between the static and dynamic environment. We apply
our approach
to optical flow estimation from camera image sequences, validate on odometry
estimation and suggest a method to iteratively increase optical flow estimation
accuracy using the generated motion masks. Finally, we provide quantitative and
qualitative results based on the KITTI odometry and tracking benchmark for
scene flow estimation based on grid map sequences. We show that we can improve
accuracy and convergence when applying motion and spatial consistency
regularization.

Comment: 6 pages, 5 figures
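The masked regularization idea above can be sketched per pixel. The function
name, weights, and toy values below are assumptions for illustration, not the
paper's exact formulation:

```python
# Sketch: a motion-consistency penalty scaled by a soft static/dynamic mask,
# so the static-world assumption is only enforced where the scene is static.

def regularized_loss(photo_err, consistency_err, static_mask, weight=0.1):
    """Combine photometric error with a masked motion-consistency error.

    photo_err, consistency_err, static_mask: flat lists of per-pixel values;
    static_mask entries in [0, 1], ~1 for static background, ~0 for movers.
    """
    n = len(photo_err)
    data_term = sum(photo_err) / n
    # Dynamic pixels (mask ~ 0) are exempt from the static-scene regularizer.
    reg_term = sum(m * e for m, e in zip(static_mask, consistency_err)) / n
    return data_term + weight * reg_term

# A moving object (last two pixels) violates motion consistency, but the
# mask suppresses its contribution to the regularizer.
photo = [0.1, 0.1, 0.2, 0.2]
consist = [0.0, 0.1, 5.0, 5.0]
mask = [1.0, 1.0, 0.0, 0.0]
assert regularized_loss(photo, consist, mask) < \
       regularized_loss(photo, consist, [1.0] * 4)
```

Without the mask, the large consistency errors of moving objects would
dominate the regularizer and penalize correct flow, which is the failure mode
the estimated mask is meant to avoid.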
Neural Allocentric Intuitive Physics Prediction from Real Videos
Humans are able to make rich predictions about the future dynamics of
physical objects from a glance. On the other hand, most existing computer
vision approaches require strong assumptions about the underlying system,
ad-hoc modeling, or annotated datasets, to carry out even simple predictions.
To tackle this gap, we propose a new perspective on the problem of learning
intuitive physics that is inspired by the spatial memory representation of
objects and spaces in human brains, in particular the co-existence of
egocentric and allocentric spatial representations. We present a generic
framework that learns a layered representation of the physical world, using a
cascade of invertible modules. In this framework, real images are first
converted to a synthetic domain representation that reduces complexity arising
from lighting and texture. Then, an allocentric viewpoint transformer removes
viewpoint complexity by projecting images to a canonical view. Finally, a novel
Recurrent Latent Variation Network (RLVN) architecture learns the dynamics of
the objects interacting with the environment and predicts future motion,
leveraging the availability of unlimited synthetic simulations. Predicted
frames are then projected back to the original camera view and translated back
to the real world domain. Experimental results show the ability of the
framework to consistently and accurately predict several frames in the future
and the ability to adapt to real images.

Comment: Added references, minor changes. arXiv admin note: text overlap
with arXiv:1506.02025 by other authors
Action Representations in Robotics: A Taxonomy and Systematic Classification
Understanding and defining the meaning of "action" is essential for robotics
research. This becomes especially evident when aiming to equip autonomous
robots with robust manipulation skills for action execution.
Unfortunately, to this day we still lack both a clear understanding of the
concept of an action and a set of established criteria that ultimately
characterize an action. In this survey we thus first review existing ideas and
theories on the notion and meaning of action. Subsequently, we discuss the
role of action in robotics and attempt to give a working definition of
action in accordance with its use in robotics research. Given this
definition, we then
introduce a taxonomy for categorizing action representations in robotics along
various dimensions. Finally, we provide a systematic literature survey on
action representations in robotics where we categorize relevant literature
along our taxonomy. After discussing the current state of the art we conclude
with an outlook on promising research directions.

Comment: 36 pages, 4 figures, 7 tables, submitted to the International
Journal of Robotics Research (IJRR)