Local SGD Converges Fast and Communicates Little
Mini-batch stochastic gradient descent (SGD) is state of the art in large
scale distributed training. The scheme can reach a linear speedup with respect
to the number of workers, but this is rarely seen in practice as the scheme
often suffers from large network delays and bandwidth limits. To overcome this
communication bottleneck recent works propose to reduce the communication
frequency. An algorithm of this type is local SGD that runs SGD independently
in parallel on different workers and averages the sequences only once in a
while.
This scheme shows promising results in practice, but has eluded thorough
theoretical analysis. We prove concise convergence rates for local SGD on
convex problems and show that it converges at the same rate as mini-batch SGD
in terms of number of evaluated gradients, that is, the scheme achieves linear
speedup in the number of workers and mini-batch size. The number of
communication rounds can be reduced by up to a factor of T^{1/2}, where T denotes
the total number of steps, compared to mini-batch SGD. This also holds for
asynchronous implementations. Local SGD can also be used for large scale
training of deep learning models.
The results shown here aim to serve as a guideline for further exploring the
theoretical and practical aspects of local SGD in these applications.
Comment: to appear at ICLR 2019, 19 pages
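The scheme described in this abstract (independent SGD on each worker, with only occasional averaging) can be illustrated with a minimal sketch. The toy objective, the function name `local_sgd`, and all parameter values below are assumptions chosen for illustration; they are not taken from the paper.

```python
import random


def local_sgd(data, num_workers=4, total_steps=200, sync_every=10, lr=0.05, seed=0):
    """Minimise f(x) = mean_i 0.5 * (x - a_i)^2 with a toy local SGD loop.

    Every worker runs plain SGD on its own copy of x. The copies are
    averaged only every `sync_every` steps (one communication round).
    """
    rng = random.Random(seed)
    workers = [0.0] * num_workers  # all workers start from x = 0
    rounds = 0
    for t in range(1, total_steps + 1):
        for k in range(num_workers):
            a = rng.choice(data)       # sample one data point
            grad = workers[k] - a      # stochastic gradient of 0.5 * (x - a)^2
            workers[k] -= lr * grad    # local SGD step, no communication
        if t % sync_every == 0:
            avg = sum(workers) / num_workers
            workers = [avg] * num_workers  # averaging = one communication round
            rounds += 1
    return sum(workers) / num_workers, rounds


data = [1.0, 2.0, 3.0, 4.0]            # the minimiser of f is the mean, 2.5
x_hat, rounds = local_sgd(data)
print(x_hat, rounds)
```

With `sync_every=10`, the 200 steps need 20 communication rounds; setting `sync_every=1` recovers a scheme that averages after every step, which is the mini-batch-style baseline the abstract compares against.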
Variable Metric Random Pursuit
We consider unconstrained randomized optimization of smooth convex objective
functions in the gradient-free setting. We analyze Random Pursuit (RP)
algorithms with fixed (F-RP) and variable metric (V-RP). The algorithms only
use zeroth-order information about the objective function and compute an
approximate solution by repeated optimization over randomly chosen
one-dimensional subspaces. The distribution of search directions is dictated by
the chosen metric.
Variable Metric RP uses novel variants of a randomized zeroth-order Hessian
approximation scheme recently introduced by Leventhal and Lewis (D. Leventhal
and A. S. Lewis, Optimization 60(3), 329--345, 2011). We here present (i) a
refined analysis of the expected single step progress of RP algorithms and
their global convergence on (strictly) convex functions and (ii) novel
convergence bounds for V-RP on strongly convex functions. We also quantify how
well the employed metric needs to match the local geometry of the function in
order for the RP algorithms to converge with the best possible rate.
Our theoretical results are accompanied by numerical experiments, comparing
V-RP with the derivative-free schemes CMA-ES, Implicit Filtering, Nelder-Mead,
NEWUOA, Pattern-Search and Nesterov's gradient-free algorithms.
Comment: 42 pages, 6 figures, 15 tables, submitted to journal, Version 3:
majorly revised second part, i.e. Section 5 and Appendi
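The basic idea of Random Pursuit, repeated one-dimensional optimization along random directions using function values only, can be sketched as follows. This is a fixed-metric illustration under stated assumptions: directions are drawn uniformly from the unit sphere, the line search is a plain golden-section search on a fixed bracket, and the names `golden_section`, `random_pursuit`, and `bracket` are hypothetical. It does not implement the variable-metric (V-RP) Hessian approximation.

```python
import math
import random


def golden_section(phi, lo, hi, iters=60):
    """Approximately minimise a unimodal 1-D function phi on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    for _ in range(iters):
        # Keep the sub-interval that must contain the minimiser.
        if phi(c) < phi(d):
            hi = d
        else:
            lo = c
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
    return (lo + hi) / 2


def random_pursuit(f, x0, steps=300, bracket=10.0, seed=0):
    """Zeroth-order minimisation: line search along random unit directions."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        # Uniform random direction on the unit sphere (normalised Gaussian).
        u = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
        # One-dimensional subproblem: only function values of f are used.
        step = golden_section(
            lambda s: f([xi + s * ui for xi, ui in zip(x, u)]), -bracket, bracket
        )
        x = [xi + step * ui for xi, ui in zip(x, u)]
    return x


def f(x):
    # Toy smooth convex objective with minimiser (1, -2); an assumption.
    return (x[0] - 1.0) ** 2 + 4.0 * (x[1] + 2.0) ** 2


x_hat = random_pursuit(f, [0.0, 0.0])
print(x_hat, f(x_hat))
```

The fixed bracket of 10 is adequate here only because the starting sublevel set of the toy objective is small; a general implementation would need an adaptive line search.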
Effective Theory of 3H and 3He
We present a new perturbative expansion for pionless effective field theory
with Coulomb interactions in which at leading order the spin-singlet
nucleon-nucleon channels are taken in the unitarity limit. Presenting results
up to next-to-leading order for the Phillips line and the neutron-deuteron
doublet-channel phase shift, we find that a perturbative expansion in the
inverse 1S0 scattering lengths converges rapidly. Using a new systematic
treatment of the proton-proton sector that isolates the divergence due to
one-photon exchange, we renormalize the corresponding contribution to the
3H-3He binding energy splitting and demonstrate that the Coulomb force in
pionless EFT is a completely perturbative effect in the trinucleon bound-state
regime. In our new expansion, the leading order is exactly isospin-symmetric.
At next-to-leading order, we include isospin breaking via the Coulomb force and
two-body scattering lengths, and find for the energy splitting
(E_B(3He)-E_B(3H))^NLO = (-0.86 +/- 0.17) MeV.
Comment: 37 pages, 14 figures, published version
Nuclear Physics Around the Unitarity Limit
We argue that many features of the structure of nuclei emerge from a strictly
perturbative expansion around the unitarity limit, where the two-nucleon S
waves have bound states at zero energy. In this limit, the gross features of
states in the nuclear chart are correlated to only one dimensionful parameter,
which is related to the breaking of scale invariance to a discrete scaling
symmetry and set by the triton binding energy. Observables are moved to their
physical values by small, perturbative corrections, much like in descriptions
of the fine structure of atomic spectra. We provide evidence in favor of the
conjecture that light, and possibly heavier, nuclei are bound weakly enough to
be insensitive to the details of the interactions but strongly enough to be
insensitive to the exact size of the two-nucleon system.
Comment: 6 pages, 3 figures, published version, rewritten for clarity
Profound effect of profiling platform and normalization strategy on detection of differentially expressed microRNAs
Adequate normalization minimizes the effects of systematic technical variation and is a prerequisite for detecting meaningful biological changes. However, reports on the performance of miRNA normalization methods, and the resulting recommendations, are inconsistent. We therefore investigated the impact of seven normalization methods (reference gene index [RGI], global geometric mean, quantile, invariant selection, loess, loessM, and generalized procrustes analysis [GPA]) on the intra- and inter-platform performance of two distinct and commonly used miRNA profiling platforms. We included data from miRNA profiling analyses derived from a hybridization-based platform (Agilent Technologies; AGL) and an RT-qPCR platform (Applied Biosystems TaqMan low-density arrays; TLDA). Furthermore, we validated a subset of miRNAs by individual RT-qPCR assays. Our analyses incorporated data on the effects of differentiation and tumor necrosis factor alpha treatment on primary human skeletal muscle cells and a murine skeletal muscle cell line. The normalization methods differed in their impact on (i) standard deviations, (ii) the area under the receiver operating characteristic (ROC) curve, and (iii) the similarity of differential expression. Loess, loessM, and quantile normalization were most effective in minimizing standard deviations on the AGL and TLDA platforms. Moreover, loess, loessM, invariant selection, and GPA increased the area under the ROC curve, a measure of the statistical performance of a test. The Jaccard index revealed that inter-platform concordance of differential expression tended to be increased by loess, loessM, quantile, and GPA normalization of AGL and TLDA data, as well as by RGI normalization of TLDA data.
We recommend loess or loessM, and GPA normalization for miRNA Agilent arrays and qPCR cards, as these approaches were shown to (i) effectively reduce standard deviations, (ii) increase the sensitivity and accuracy of differential miRNA expression detection, and (iii) increase inter-platform concordance. The results demonstrate the successful adoption of loessM and generalized procrustes analysis for one-color miRNA profiling experiments.
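The Jaccard index used above to measure inter-platform concordance is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows; the miRNA names and hit lists are made up for illustration and are not data from the study.

```python
def jaccard_index(a, b):
    """Return |A intersect B| / |A union B| for two collections of identifiers."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention for two empty sets (an assumption)
    return len(a & b) / len(a | b)


# Hypothetical lists of differentially expressed miRNAs on each platform.
agilent_hits = {"miR-1", "miR-133a", "miR-206", "miR-21"}
tlda_hits = {"miR-1", "miR-206", "miR-155"}

concordance = jaccard_index(agilent_hits, tlda_hits)
print(concordance)  # 2 shared / 5 distinct = 0.4
```

A value of 1 means both platforms flag exactly the same miRNAs, and 0 means the hit lists do not overlap at all.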
Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction
The recently proposed stochastic Polyak stepsize (SPS) and stochastic
line-search (SLS) for SGD have shown remarkable effectiveness when training
over-parameterized models. However, in non-interpolation settings, both
algorithms only guarantee convergence to a neighborhood of a solution which may
result in a worse output than the initial guess. While artificially decreasing
the adaptive stepsize has been proposed to address this issue (Orvieto et al.
[2022]), this approach results in slower convergence rates for convex and
over-parameterized models. In this work, we make two contributions: Firstly, we
propose two new variants of SPS and SLS, called AdaSPS and AdaSLS, which
guarantee convergence in non-interpolation settings and maintain sub-linear and
linear convergence rates for convex and strongly convex functions when training
over-parameterized models. AdaSLS requires no knowledge of problem-dependent
parameters, and AdaSPS requires only a lower bound of the optimal function
value as input. Secondly, we equip AdaSPS and AdaSLS with a novel variance
reduction technique and obtain algorithms that require
$\widetilde{\mathcal{O}}(n+1/\epsilon)$ gradient evaluations to achieve
an $\mathcal{O}(\epsilon)$-suboptimality for convex functions, which improves
upon the slower $\mathcal{O}(1/\epsilon^2)$ rates of AdaSPS and AdaSLS without
variance reduction in the non-interpolation regimes. Moreover, our result
matches the fast rates of AdaSVRG but removes the inner-outer-loop structure,
which is easier to implement and analyze. Finally, numerical experiments on
synthetic and real datasets validate our theory and demonstrate the
effectiveness and robustness of our algorithms.
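For orientation, the basic stochastic Polyak stepsize that this abstract builds on can be sketched as below. This is the plain capped SPS rule, step = (f_i(x) - lower bound) / (c * |grad f_i(x)|^2), and not the AdaSPS or AdaSLS variants proposed in the paper. The toy least-squares problem, the function name `sps_sgd`, and the values of `c` and `gamma_max` are assumptions for illustration.

```python
import random


def sps_sgd(samples, x0=0.0, steps=50, c=1.0, gamma_max=10.0, lower_bound=0.0, seed=0):
    """SGD with a capped stochastic Polyak stepsize on 1-D least squares.

    Each sample (a, b) defines f_i(x) = 0.5 * (a * x - b)^2.
    `lower_bound` stands in for the optimal value of each f_i.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        a, b = rng.choice(samples)
        residual = a * x - b
        loss = 0.5 * residual ** 2
        grad = a * residual
        if grad == 0.0:
            continue  # this sample is already fit exactly; no step needed
        # Polyak stepsize, capped at gamma_max.
        gamma = min((loss - lower_bound) / (c * grad ** 2), gamma_max)
        x -= gamma * grad
    return x


# Interpolation setting: every sample is fit exactly by x = 3.
samples = [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5)]
x_hat = sps_sgd(samples)
print(x_hat)
```

In this interpolation example every step halves the distance to the solution. When the samples cannot all be fit at once, the same rule only reaches a neighborhood of a solution, which is the limitation the abstract sets out to address.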