A block-random algorithm for learning on distributed, heterogeneous data
Most deep learning models are based on deep neural networks with multiple
layers between input and output. The parameters defining these layers are
initialized using random values and are "learned" from data, typically using
algorithms based on stochastic gradient descent (SGD). These algorithms rely on the
data being randomly shuffled before optimization. This shuffling of the data prior to
processing in batches, which is formally required for SGD to derive a useful deep
learning model, is expected to be prohibitively expensive for in situ model training
because of the resulting data communication across processor nodes. We show that
SGD can still make useful progress if
the batches are defined on a per-processor basis and processed in random order
even though (i) the batches are constructed from data samples from a single
class or specific flow region, and (ii) the overall data samples are
heterogeneous. We present block-random gradient descent, a new algorithm that
works on distributed, heterogeneous data without having to pre-shuffle it. This
algorithm enables in situ learning for exascale simulations. The performance of
this algorithm is demonstrated on a set of benchmark classification models and on the
construction of a subgrid-scale model for large eddy simulation (LES) of turbulent
channel flow, using a data model similar to that which will be encountered in
exascale simulations.
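
To make the block-random ordering described above concrete, the following is a minimal sketch, not the authors' implementation: each "processor" owns a homogeneous, single-class block of samples, no samples are ever shuffled across blocks, and only the order in which the blocks are visited is randomized each epoch. The toy data, the softmax-regression model, and the helper names (`softmax`, `blocks`) are assumptions introduced purely for illustration.

```python
# Illustrative sketch of block-random ordering (not the paper's reference code).
# Assumption: each "processor" holds one homogeneous block of single-class samples;
# only the block visitation order is randomized each epoch, never the samples.
import numpy as np

rng = np.random.default_rng(0)

n_per_block, n_features, n_classes = 256, 8, 4

# Heterogeneous data: block i holds samples from class i only.
blocks = []
for cls in range(n_classes):
    X = rng.normal(loc=cls, scale=1.0, size=(n_per_block, n_features))
    y = np.full(n_per_block, cls)
    blocks.append((X, y))

W = np.zeros((n_features, n_classes))   # softmax-regression weights
b = np.zeros(n_classes)                 # biases

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr, epochs = 0.05, 50
for epoch in range(epochs):
    # Block-random step: shuffle only the order in which blocks are visited.
    for i in rng.permutation(len(blocks)):
        X, y = blocks[i]                       # single-class "per-processor" batch
        p = softmax(X @ W + b)
        p[np.arange(len(y)), y] -= 1.0         # cross-entropy gradient w.r.t. logits
        W -= lr * (X.T @ p) / len(y)
        b -= lr * p.mean(axis=0)

# Sanity check: accuracy over all blocks after training.
X_all = np.vstack([X for X, _ in blocks])
y_all = np.concatenate([y for _, y in blocks])
acc = (softmax(X_all @ W + b).argmax(axis=1) == y_all).mean()
print(f"training accuracy: {acc:.3f}")
```

Standard SGD would first pool and globally shuffle the samples before forming batches; the sketch skips that step entirely, which is the cross-node data movement the abstract argues is prohibitively expensive for in situ training.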