Robust learning with implicit residual networks
In this effort, we propose a new deep architecture utilizing residual blocks
inspired by implicit discretization schemes. In contrast to standard
feed-forward networks, the outputs of the proposed implicit residual blocks are
defined as the fixed points of appropriately chosen nonlinear
transformations. We show that this choice leads to improved stability of
both forward and backward propagation, has a favorable impact on
generalization, and makes it possible to control the robustness of the network
with only a few hyperparameters. In addition, the proposed reformulation of
ResNet does not introduce new parameters and can potentially lead to a
reduction in the number of required layers due to improved forward stability.
Finally, we derive a memory-efficient training algorithm, propose a stochastic
regularization technique, and provide numerical results in support of our
findings.
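
As a rough illustration of the fixed-point formulation, the sketch below shows a residual block whose output is obtained by iterating z ← x + f(z). The block structure, activation, iteration count, and the use of plain fixed-point iteration are illustrative assumptions, not the paper's actual implicit scheme or its memory-efficient backward pass.

```python
import torch
import torch.nn as nn

class ImplicitResidualBlock(nn.Module):
    """Sketch only: the block output z* is a fixed point of z = x + f(z),
    approximated here by a few steps of plain fixed-point iteration."""

    def __init__(self, dim, n_iters=10):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.n_iters = n_iters

    def forward(self, x):
        z = x
        for _ in range(self.n_iters):
            z = x + self.f(z)  # iterate towards the fixed point z* = x + f(z*)
        return z
```

Note that such an implicit block reuses the same parameters as an ordinary residual block f, consistent with the abstract's claim that the reformulation introduces no new parameters.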
Reversible GANs for Memory-efficient Image-to-Image Translation
The Pix2pix and CycleGAN losses have vastly improved the qualitative and
quantitative visual quality of results in image-to-image translation tasks. We
extend this framework by exploring approximately invertible architectures which
are well suited to these losses. These architectures are approximately
invertible by design and thus partially satisfy cycle-consistency before
training even begins. Furthermore, since invertible architectures have constant
memory complexity in depth, these models can be built arbitrarily deep. We
demonstrate superior quantitative results on the Cityscapes and Maps datasets
at a near-constant memory budget.
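
The constant-memory-in-depth property rests on reversible (coupling-style) blocks whose inputs can be reconstructed exactly from their outputs, so intermediate activations need not be cached for backpropagation. Below is a minimal additive-coupling sketch of such a block; the functions f and g and the two-stream split are generic RevNet-style choices for illustration, not the specific generators used in the paper.

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Additive coupling: (x1, x2) -> (y1, y2) with an exact inverse,
    so inputs can be recomputed from outputs instead of being stored."""

    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2
```

Because the inverse is exact, stacking many such blocks keeps activation memory roughly constant in depth, which is what allows these translation models to be built arbitrarily deep.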
Sparsely Aggregated Convolutional Networks
We explore a key architectural aspect of deep convolutional neural networks:
the pattern of internal skip connections used to aggregate outputs of earlier
layers for consumption by deeper layers. Such aggregation is critical to
facilitate training of very deep networks in an end-to-end manner. This is a
primary reason for the widespread adoption of residual networks, which
aggregate outputs via cumulative summation. While subsequent works investigate
alternative aggregation operations (e.g. concatenation), we focus on an
orthogonal question: which outputs to aggregate at a particular point in the
network. We propose a new internal connection structure which aggregates only a
sparse set of previous outputs at any given depth. Our experiments demonstrate
that this simple design change offers superior performance with fewer parameters and
lower computational requirements. Moreover, we show that sparse aggregation
allows networks to scale more robustly to 1000+ layers, thereby opening future
avenues for training long-running visual processes.
Comment: Accepted to ECCV 2018
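
As a toy illustration of sparse aggregation, the sketch below aggregates, at each layer, only the outputs lying 1, 2, 4, 8, ... steps back (one common exponentially spaced pattern). The fully connected layers, ReLU activations, and concatenation-based aggregation are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

def sparse_sources(position):
    """Indices of earlier outputs aggregated at `position`: step back by
    exponentially growing offsets 1, 2, 4, 8, ..."""
    sources, offset = [], 1
    while position - offset >= 0:
        sources.append(position - offset)
        offset *= 2
    return sources

class SparseAggregationNet(nn.Module):
    """Toy network: layer i consumes a concatenation of a sparse set of
    earlier outputs instead of all of them."""

    def __init__(self, dim, n_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(dim * len(sparse_sources(i + 1)), dim) for i in range(n_layers)]
        )

    def forward(self, x):
        outputs = [x]  # outputs[0] is the network input
        for i, layer in enumerate(self.layers):
            agg = torch.cat([outputs[j] for j in sparse_sources(i + 1)], dim=-1)
            outputs.append(torch.relu(layer(agg)))
        return outputs[-1]
```

With this pattern each layer aggregates only O(log depth) earlier outputs rather than the O(depth) of dense concatenation, which is what keeps parameters and computation low as networks scale to 1000+ layers.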