Federated Optimization: Distributed Machine Learning for On-Device Intelligence
We introduce a new and increasingly relevant setting for distributed
optimization in machine learning, where the data defining the optimization are
unevenly distributed over an extremely large number of nodes. The goal is to
train a high-quality centralized model. We refer to this setting as Federated
Optimization. In this setting, communication efficiency is of the utmost
importance and minimizing the number of rounds of communication is the
principal goal.
A motivating example arises when we keep the training data locally on users'
mobile devices instead of logging it to a data center for training. In
federated optimization, the devices are used as compute nodes performing
computation on their local data in order to update a global model. We suppose
that we have an extremely large number of devices in the network --- as many as
the number of users of a given service, each of which has only a tiny fraction
of the total data available. In particular, we expect the number of data points
available locally to be much smaller than the number of devices. Additionally,
since different users generate data with different patterns, it is reasonable
to assume that no device has a representative sample of the overall
distribution.
We show that existing algorithms are not suitable for this setting, and
propose a new algorithm which shows encouraging experimental results for sparse
convex problems. This work also sets a path for future research needed in the
context of federated optimization.
Comment: 38 pages
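To make the setting concrete, here is a minimal sketch of one way communication rounds can be organized: each device runs a few gradient steps on its own data, and a server averages the results weighted by local sample count. This is a generic federated-averaging-style illustration on a toy least-squares problem, not the algorithm proposed in the paper. All names (`make_devices`, `local_update`, `federated_round`, `TRUE_W`) and all parameter values are hypothetical.

```python
import random

TRUE_W = [2.0, -1.0]  # hypothetical ground-truth weights for the toy problem


def make_devices(num_devices=50, seed=0):
    """Create small, unevenly sized, non-identically distributed local datasets."""
    rng = random.Random(seed)
    devices = []
    for _ in range(num_devices):
        n_local = rng.randint(1, 5)     # far fewer points per device than devices
        shift = rng.uniform(-1.0, 1.0)  # device-specific input pattern
        data = []
        for _ in range(n_local):
            x = [rng.gauss(shift, 1.0), 1.0]  # one feature plus a bias term
            y = sum(w * xi for w, xi in zip(TRUE_W, x))
            data.append((x, y))
        devices.append(data)
    return devices


def local_update(w, data, lr=0.05, steps=5):
    """Run a few gradient steps on one device's local least-squares loss."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(data)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_round(w_global, devices):
    """One communication round: local training, then a weighted server average."""
    total = sum(len(data) for data in devices)
    w_new = [0.0] * len(w_global)
    for data in devices:
        w_local = local_update(w_global, data)
        for j, wj in enumerate(w_local):
            w_new[j] += wj * len(data) / total
    return w_new


def distance_to_truth(w):
    """Euclidean distance between the current model and TRUE_W."""
    return sum((a - b) ** 2 for a, b in zip(w, TRUE_W)) ** 0.5


def train(rounds=20):
    """Start from zero weights and run a fixed number of communication rounds."""
    devices = make_devices()
    w = [0.0, 0.0]
    for _ in range(rounds):
        w = federated_round(w, devices)
    return w
```

Calling `train(20)` returns the global weights after 20 rounds. Only model weights cross the network in this sketch, never raw data points, which is why the number of rounds is the natural cost measure.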
A semi-implicit version of the MPAS-atmosphere dynamical core
An important question for atmospheric modeling is the viability of semi-implicit time integration schemes on massively parallel computing architectures. Semi-implicit schemes can provide increased stability and accuracy. However, they require the solution of an elliptic problem at each time step, creating concerns about their parallel efficiency and scalability. Here, a semi-implicit (SI) version of the Model for Prediction Across Scales (MPAS) is developed and compared with the original model version, which uses a split Runge-Kutta (SRK3) time integration scheme. The SI scheme is based on a quasi-Newton iteration toward a Crank-Nicolson scheme. Each Newton iteration requires the solution of a Helmholtz problem; here, the Helmholtz problem is derived, and its solution using a geometric multigrid method is described. On two standard test cases, a midlatitude baroclinic wave and a small-planet nonhydrostatic gravity wave, the SI and SRK3 versions produce almost identical results. On the baroclinic wave test, the SI version can use somewhat larger time steps (about 60%) than the SRK3 version before losing stability. The SI version costs 10%-20% more per step than the SRK3 version, and the weak and strong scalability characteristics of the two versions are very similar for the processor configurations the authors have been able to test (up to 1920 processors). Because of the spatial discretization of the pressure gradient in the lowest model layer, the SI version becomes unstable in the presence of realistic orography. Some further work will be needed to demonstrate the viability of the SI scheme in this case.
UK Natural Environment Research Council as part of the G8 ICOMEX project
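As a toy illustration of why an implicit treatment can tolerate larger time steps, the sketch below applies a Crank-Nicolson step to the scalar oscillation equation dy/dt = i*omega*y, a common stand-in for fast wave modes. This is not the MPAS scheme: the problem is linear, so the implicit solve reduces to a single division rather than a Helmholtz solve, and forward Euler appears only as a simple explicit contrast (it is not SRK3). The function names and the values of `omega` and `dt` are assumptions made for illustration.

```python
def crank_nicolson_step(y, omega, dt):
    """Solve y_new = y + (dt/2) * i*omega * (y + y_new) for y_new."""
    a = 0.5j * omega * dt
    return (1 + a) * y / (1 - a)


def forward_euler_step(y, omega, dt):
    """Explicit step: y_new = y + dt * i*omega * y."""
    return (1 + 1j * omega * dt) * y


def integrate(step, omega, dt, n_steps, y0=1 + 0j):
    """Apply the given one-step method n_steps times starting from y0."""
    y = y0
    for _ in range(n_steps):
        y = step(y, omega, dt)
    return y


if __name__ == "__main__":
    omega, dt, n_steps = 10.0, 0.5, 100  # deliberately large omega * dt
    print("Crank-Nicolson amplitude:",
          abs(integrate(crank_nicolson_step, omega, dt, n_steps)))
    print("Forward Euler amplitude:",
          abs(integrate(forward_euler_step, omega, dt, n_steps)))
```

The Crank-Nicolson amplification factor (1 + a) / (1 - a) has modulus one for any real `omega * dt`, so the amplitude stays bounded however large the step is. The forward Euler factor has modulus greater than one, so its amplitude grows at every step.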