Input Fast-Forwarding for Better Deep Learning
This paper introduces a new architectural framework, known as input
fast-forwarding, that can enhance the performance of deep networks. The main
idea is to incorporate a parallel path that sends representations of input
values forward to deeper network layers. This scheme is substantially different
from "deep supervision" in which the loss layer is re-introduced to earlier
layers. The parallel path provided by fast-forwarding enhances the training
process in two ways. First, it enables the individual layers to combine
higher-level information (from the standard processing path) with lower-level
information (from the fast-forward path). Second, this new architecture reduces
the problem of vanishing gradients substantially because the fast-forwarding
path provides a shorter route for gradient backpropagation. In order to
evaluate the utility of the proposed technique, a Fast-Forward Network (FFNet),
with 20 convolutional layers along with parallel fast-forward paths, has been
created and tested. The paper presents empirical results that demonstrate
improved learning capacity of FFNet due to fast-forwarding, as compared to
GoogLeNet (with deep supervision) and CaffeNet, which are 4x and 18x larger in
size, respectively. All of the source code and deep learning models described
in this paper will be made available to the entire research community.
Comment: Accepted in the 14th International Conference on Image Analysis and
Recognition (ICIAR) 2017, Montreal, Canada
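The idea of combining the standard processing path with a fast-forward path can be illustrated with a toy sketch. The abstract does not say how the two paths are merged, so concatenation followed by a ReLU layer is an assumption here, and the names `dense` and `fast_forward_block` and all weights are hypothetical.

```python
def dense(x, weights):
    """Toy fully connected layer with ReLU; weights is a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def fast_forward_block(hidden, raw_input, w_main, w_ff, w_merge):
    """Merge the standard path with a representation of the raw input.

    Assumption: the two paths are merged by concatenation; the paper
    may use a different operation.
    """
    main = dense(hidden, w_main)      # higher-level features (standard path)
    ff = dense(raw_input, w_ff)       # lower-level features (fast-forward path)
    return dense(main + ff, w_merge)  # list concatenation, then a merge layer


out = fast_forward_block(
    hidden=[1.0, 2.0],
    raw_input=[0.5, -1.0],
    w_main=[[1.0, 0.0], [0.0, 1.0]],
    w_ff=[[2.0, 0.0]],
    w_merge=[[1.0, 1.0, 1.0]],
)
```

Because the merge layer reads the input representation directly, a gradient can reach the input-side weights without passing through every intermediate layer, which is the shorter route the abstract refers to.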
Unsupervised Intuitive Physics from Visual Observations
While learning models of intuitive physics is an increasingly active area of
research, current approaches still fall short of natural intelligences in one
important regard: they require external supervision, such as explicit access to
physical states, at training and sometimes even at test time. Some authors
have relaxed such requirements by supplementing the model with a handcrafted
physical simulator. Still, the resulting methods are unable to automatically
learn new complex environments and to understand physical interactions within
them. In this work, we demonstrate for the first time that such predictors can be learned
directly from raw visual observations and without relying on simulators. We do
so in two steps: first, we learn to track mechanically-salient objects in
videos using causality and equivariance, two unsupervised learning principles
that do not require auto-encoding. Second, we demonstrate that the extracted
positions are sufficient to successfully train visual motion predictors that
can take the underlying environment into account. We validate our predictors on
synthetic datasets; then, we introduce a new dataset, ROLL4REAL, consisting of
real objects rolling on complex terrains (pool table, elliptical bowl, and
random height-field). We show that in all such cases it is possible to learn
reliable extrapolators of the object trajectories from raw videos alone,
without any form of external supervision and with no more prior knowledge than
the choice of a convolutional neural network architecture.
CompILE: Compositional Imitation Learning and Execution
We introduce Compositional Imitation Learning and Execution (CompILE): a
framework for learning reusable, variable-length segments of
hierarchically-structured behavior from demonstration data. CompILE uses a
novel unsupervised, fully-differentiable sequence segmentation module to learn
latent encodings of sequential data that can be re-composed and executed to
perform new tasks. Once trained, our model generalizes to sequences of longer
length and from environment instances not seen during training. We evaluate
CompILE in a challenging 2D multi-task environment and a continuous control
task, and show that it can find correct task boundaries and event encodings in
an unsupervised manner. Latent codes and associated behavior policies
discovered by CompILE can be used by a hierarchical agent, where the high-level
policy selects actions in the latent code space, and the low-level,
task-specific policies are simply the learned decoders. We found that our
CompILE-based agent could learn given only sparse rewards, where agents without
task-specific policies struggle.
Comment: ICML (2019)
Learning a Complete Image Indexing Pipeline
To work at scale, a complete image indexing system comprises two components:
an inverted file index to restrict the actual search to only a subset that
should contain most of the items relevant to the query; and an approximate distance
computation mechanism to rapidly scan these lists. While supervised deep
learning has recently enabled improvements to the latter, the former continues
to be based on unsupervised clustering in the literature. In this work, we
propose a first system that learns both components within a unifying neural
framework of structured binary encoding.
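The two components described above can be sketched in a few lines of plain Python. This is only an illustration of the classical two-stage design, not the learned system the paper proposes: the fixed centroids, the sign-based `binarize` codes, and the helper names `build_index` and `search` are all assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def binarize(v):
    """Hypothetical compact code: one bit per dimension (sign of the value)."""
    return tuple(1 if x > 0 else 0 for x in v)


def build_index(vectors, centroids):
    """Inverted file: each item is stored in the list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for item_id, v in enumerate(vectors):
        cell = min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))
        lists[cell].append((item_id, binarize(v)))
    return lists


def search(query, centroids, lists, n_probe=1, top_k=2):
    """Probe the closest lists, then rank candidates by Hamming distance."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    q_code = binarize(query)
    candidates = [
        (sum(a != b for a, b in zip(q_code, code)), item_id)
        for cell in order[:n_probe]
        for item_id, code in lists[cell]
    ]
    return [item_id for _, item_id in sorted(candidates)[:top_k]]


vectors = [(1.0, 1.0), (0.9, -0.2), (-1.0, -1.0), (-0.8, 0.3)]
centroids = [(1.0, 0.5), (-1.0, -0.5)]
lists = build_index(vectors, centroids)
result = search((0.8, 0.6), centroids, lists)
```

With `n_probe=1` only the two items in the first inverted list are scanned, so the exact distances to the other half of the database are never computed.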
Hashing as Tie-Aware Learning to Rank
Hashing, or learning binary embeddings of data, is frequently used in nearest
neighbor retrieval. In this paper, we develop learning to rank formulations for
hashing, aimed at directly optimizing ranking-based evaluation metrics such as
Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). We
first observe that the integer-valued Hamming distance often leads to tied
rankings, and propose to use tie-aware versions of AP and NDCG to evaluate
hashing for retrieval. Then, to optimize tie-aware ranking metrics, we derive
their continuous relaxations, and perform gradient-based optimization with deep
neural networks. Our results establish a new state of the art for image
retrieval by Hamming ranking on common benchmarks.
Comment: 15 pages, 3 figures. IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 201
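The tie problem is easy to reproduce on a small example. The sketch below treats tie-aware AP as the mean of ordinary AP over every ordering of items that share a Hamming distance; this brute-force averaging is an assumption made for illustration (the paper derives closed forms and continuous relaxations instead), and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations, product


def hamming(a, b):
    """Hamming distance between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


def average_precision(relevance):
    """Ordinary AP of a ranked list of booleans (True means relevant)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def tie_aware_ap(query, codes, relevant):
    """Mean AP over all orderings of items tied at the same distance."""
    buckets = defaultdict(list)
    for code, rel in zip(codes, relevant):
        buckets[hamming(query, code)].append(rel)
    # Enumerate every ordering inside each tie group (small inputs only).
    orderings = [list(permutations(buckets[d])) for d in sorted(buckets)]
    scores = [
        average_precision([rel for group in combo for rel in group])
        for combo in product(*orderings)
    ]
    return sum(scores) / len(scores)


codes = [0b0001, 0b0010, 0b0011, 0b0111]
relevant = [True, False, True, False]
score = tie_aware_ap(0b0000, codes, relevant)
```

Here the first two codes are both at distance 1 from the query, so plain AP would be 5/6 or 7/12 depending on an arbitrary tie-break; the averaged value is 17/24.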
Deep Fully Residual Convolutional Neural Network for Semantic Image Segmentation
Department of Computer Science and Engineering
The goal of semantic image segmentation is to partition the pixels of an image into semantically
meaningful parts and to classify those parts according to a predefined label set. Although object recognition
models have recently achieved remarkable performance, even surpassing humans' ability to recognize
objects, semantic segmentation models still lag behind. One reason semantic
segmentation is a relatively hard problem is that it requires understanding the image at the pixel level while
considering global context, as opposed to object recognition. Another challenge is transferring the knowledge of an object
recognition model to the task of semantic segmentation. In this thesis, we delineate some of the
main challenges we faced in approaching semantic image segmentation with machine learning algorithms.
Our main focus was on how to use deep learning algorithms for this task, since they require the
least amount of feature engineering and have been shown to scale to large
datasets with remarkable performance. More precisely, we worked on a variation of convolutional
neural networks (CNN) suitable for the semantic segmentation task. We proposed a model called deep
fully residual convolutional networks (DFRCN) to tackle this problem. Utilizing residual learning makes
training of deep models feasible, which ultimately leads to a rich, powerful visual representation.
Our model also benefits from skip-connections, which ease the propagation of information from the
encoder module to the decoder module. This enables our model to have fewer parameters in the
decoder module while also achieving better performance. We also benchmarked the effective variation
of the proposed model on a semantic segmentation benchmark.
We first thoroughly review current high-performance models and the problems one might
face when trying to replicate them, which mainly arise from a lack of sufficient provided
information. Then, we describe our own novel method, which we call the deep fully residual convolutional
network (DFRCN). We show that our method exhibits state-of-the-art performance on a challenging
benchmark for aerial image segmentation.