Many works have posited the benefit of depth in deep networks. However,
one of the problems encountered in the training of very deep networks is feature
reuse; that is, features are 'diluted' as they are forward propagated through
the model. Hence, later network layers receive less informative signals about the
input data, which makes training less effective. In this work, we address
this problem by taking inspiration from an earlier work that
employed residual learning to alleviate it. We propose
a modification of residual learning for training very deep networks to realize
improved generalization performance; for this, we allow stochastic shortcut connections
of identity mappings from the input to hidden layers. We perform extensive
experiments using the USPS and MNIST datasets. On the USPS dataset, we
achieve an error rate of 2.69% without employing any form of data augmentation
(or manipulation). On the MNIST dataset, we reach an error rate of 0.52%,
which is comparable to the state of the art. Notably, these results are
achieved without employing any explicit regularization technique.
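
The abstract gives only a one-line description of the proposed shortcut connections, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that every hidden layer has the same width as the input (so that an identity mapping is well defined), that each layer keeps the usual layer-to-layer residual connection, and that the stochastic input shortcut is a per-layer Bernoulli mask drawn once at construction. The names `sample_masks`, `forward`, and `keep_prob` are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sample_masks(rng, n_layers, dim, keep_prob):
    # Assumption: one binary mask per hidden layer, drawn once.
    # A 1 means that input unit is shortcut to the matching hidden unit.
    return (rng.random((n_layers, dim)) < keep_prob).astype(np.float64)


def forward(x, weights, biases, masks):
    # x has shape (batch, dim); every W has shape (dim, dim).
    h = x
    for W, b, m in zip(weights, biases, masks):
        residual = relu(h @ W + b)  # learned residual function
        # Layer-to-layer identity shortcut plus the masked identity
        # shortcut that feeds the raw input into this hidden layer.
        h = residual + h + m * x
    return h


rng = np.random.default_rng(0)
dim, n_layers = 8, 4
weights = [rng.normal(0.0, 0.1, (dim, dim)) for _ in range(n_layers)]
biases = [np.zeros(dim) for _ in range(n_layers)]
masks = sample_masks(rng, n_layers, dim, keep_prob=0.5)
x = rng.normal(size=(2, dim))
out = forward(x, weights, biases, masks)
```

Under these assumptions, every hidden layer sees a direct, undiluted copy of a random subset of the input units, which is the intuition the abstract gives for countering feature dilution in later layers.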

Oyebade K. Oyedotun

Abd El Rahman Shabayek

Djamila Aouada

Björn Ottersten

Training Very Deep Networks via Residual Learning with Stochastic Input Shortcut Connections

Peer reviewed

Open Repository and Bibliography - Luxembourg