Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
To extract the best possible performance from asynchronous
stochastic gradient descent (SGD), one must increase the mini-batch size and
scale the learning rate accordingly. To achieve further speedup, we introduce
a technique that delays gradient updates, effectively increasing the
mini-batch size. Unfortunately, increasing the mini-batch size worsens the
stale gradient problem in asynchronous SGD, which degrades model convergence.
We introduce local optimizers that mitigate the stale gradient problem;
together with fine-tuning our momentum, we are able to train a shallow machine
translation system 27% faster than an optimized baseline with negligible
penalty in BLEU.
Comment: To appear in EMNLP 2018 as a short paper
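
The delayed-update idea can be illustrated with gradient accumulation: gradients from several consecutive mini-batches are summed before a single optimizer step, multiplying the effective mini-batch size. The following is a minimal PyTorch-style sketch, not the paper's implementation; the names `train_with_delay`, `batches`, and `delay` are illustrative placeholders.

```python
import torch

def train_with_delay(model, optimizer, loss_fn, batches, delay=4):
    """Apply one optimizer step per `delay` mini-batches (sketch only)."""
    optimizer.zero_grad()
    for i, (x, y) in enumerate(batches):
        # Scale each loss so the accumulated gradient is an average,
        # matching the gradient of one large mini-batch.
        loss = loss_fn(model(x), y) / delay
        loss.backward()  # gradients accumulate in the .grad buffers
        if (i + 1) % delay == 0:
            optimizer.step()       # one update per `delay` mini-batches
            optimizer.zero_grad()  # clear accumulated gradients
```

With `delay=4`, the effective mini-batch size is four times the per-step batch size, which is why the learning rate must be scaled accordingly.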
Map Generation from Large Scale Incomplete and Inaccurate Data Labels
Accurately and globally mapping human infrastructure is an important and
challenging task, with applications in routing, regulation compliance
monitoring, and natural disaster response management. In this paper we
present progress in developing an algorithmic pipeline and distributed compute
system that automates the process of map creation using high-resolution aerial
images. Unlike previous studies, most of which use datasets that are available
only in a few cities across the world, we utilize publicly available imagery
and map data, both of which cover the contiguous United States (CONUS). We
approach the technical challenge of inaccurate and incomplete training data by
adopting state-of-the-art convolutional neural network architectures such as
the U-Net and the CycleGAN to incrementally generate maps with increasingly
accurate and complete labels of man-made infrastructure such as roads and
houses. Since scaling the mapping task to CONUS calls for parallelization, we
adopted an asynchronous distributed stochastic parallel gradient descent
training scheme to distribute the computational workload onto a cluster of
GPUs with nearly linear speed-up.
Comment: This paper is accepted by KDD 2020
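
The core of asynchronous parallel SGD is that workers compute gradients against a possibly stale snapshot of the shared parameters and apply updates without waiting for one another. The sketch below shows this Hogwild-style pattern with threads and a shared NumPy array; `grad_fn` and `data_shards` are assumed placeholders, and the paper's actual distributed multi-GPU scheme is not reproduced here.

```python
import threading
import numpy as np

def async_sgd(params, grad_fn, data_shards, lr=0.01, steps=100):
    """Unsynchronized parallel SGD on a shared parameter vector (sketch)."""
    def worker(shard):
        for _ in range(steps):
            snapshot = params.copy()   # may be stale by a few updates
            g = grad_fn(snapshot, shard)
            params[:] -= lr * g        # in-place, lock-free update

    threads = [threading.Thread(target=worker, args=(s,))
               for s in data_shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params
```

Because each worker may read parameters that other workers have already moved past, the applied gradients are stale; this is the convergence issue that larger effective mini-batches aggravate in the first abstract above.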