
    Local SGD Converges Fast and Communicates Little

    Mini-batch stochastic gradient descent (SGD) is state of the art in large-scale distributed training. The scheme can reach a linear speedup with respect to the number of workers, but this is rarely seen in practice because the scheme often suffers from large network delays and bandwidth limits. To overcome this communication bottleneck, recent works propose to reduce the communication frequency. An algorithm of this type is local SGD, which runs SGD independently in parallel on different workers and averages the sequences only once in a while. This scheme shows promising results in practice, but has eluded thorough theoretical analysis. We prove concise convergence rates for local SGD on convex problems and show that it converges at the same rate as mini-batch SGD in terms of the number of evaluated gradients, that is, the scheme achieves linear speedup in the number of workers and mini-batch size. The number of communication rounds can be reduced by up to a factor of T^{1/2}, where T denotes the total number of steps, compared to mini-batch SGD. This also holds for asynchronous implementations. Local SGD can also be used for large-scale training of deep learning models. The results shown here aim to serve as a guideline for further exploring the theoretical and practical aspects of local SGD in these applications.
    Comment: to appear at ICLR 2019, 19 pages
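
    The scheme described above is simple to state in code. Below is a minimal, illustrative sketch of local SGD under assumed interfaces (a generic `grad_fn` stochastic-gradient oracle and simulated workers); it is not the paper's reference implementation.

```python
import numpy as np

def local_sgd(grad_fn, x0, num_workers=8, num_rounds=100, local_steps=10, lr=0.1, seed=0):
    """Local SGD sketch: each worker runs `local_steps` SGD steps independently,
    then all iterates are averaged in a single communication round."""
    rng = np.random.default_rng(seed)
    workers = [np.asarray(x0, dtype=float).copy() for _ in range(num_workers)]
    for _ in range(num_rounds):
        for w in range(num_workers):
            for _ in range(local_steps):
                g = grad_fn(workers[w], rng)       # stochastic gradient at the local iterate
                workers[w] = workers[w] - lr * g   # local step, no communication
        avg = np.mean(workers, axis=0)             # the only communication: average the iterates
        workers = [avg.copy() for _ in range(num_workers)]
    return workers[0]

# Illustrative usage on a noisy quadratic f(x) = 0.5 * ||x||^2
if __name__ == "__main__":
    grad = lambda x, rng: x + 0.01 * rng.standard_normal(x.shape)
    print(local_sgd(grad, np.ones(5)))
```

    Compared to mini-batch SGD, the averaging step above happens only once every `local_steps` gradient evaluations per worker, which is where the communication savings come from.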

    Variable Metric Random Pursuit

    We consider unconstrained randomized optimization of smooth convex objective functions in the gradient-free setting. We analyze Random Pursuit (RP) algorithms with fixed (F-RP) and variable metric (V-RP). The algorithms use only zeroth-order information about the objective function and compute an approximate solution by repeated optimization over randomly chosen one-dimensional subspaces. The distribution of search directions is dictated by the chosen metric. Variable Metric RP uses novel variants of a randomized zeroth-order Hessian approximation scheme recently introduced by Leventhal and Lewis (D. Leventhal and A. S. Lewis, Optimization 60(3), 329--345, 2011). We present (i) a refined analysis of the expected single-step progress of RP algorithms and their global convergence on (strictly) convex functions and (ii) novel convergence bounds for V-RP on strongly convex functions. We also quantify how well the employed metric needs to match the local geometry of the function in order for the RP algorithms to converge with the best possible rate. Our theoretical results are accompanied by numerical experiments comparing V-RP with the derivative-free schemes CMA-ES, Implicit Filtering, Nelder-Mead, NEWUOA, Pattern-Search, and Nesterov's gradient-free algorithms.
    Comment: 42 pages, 6 figures, 15 tables, submitted to journal; Version 3: majorly revised second part, i.e. Section 5 and Appendix
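
    The core loop is easy to convey. The following is a minimal sketch of the fixed-metric variant under assumed interfaces; the `metric` argument shapes the distribution of search directions as described above, and the one-dimensional subproblems are solved with a generic zeroth-order line search (here scipy's `minimize_scalar`). It is illustrative only, not the paper's exact algorithm.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def random_pursuit(f, x0, metric=None, num_iters=200, seed=0):
    """Random Pursuit sketch: repeatedly draw a random direction (its distribution
    governed by `metric`) and minimize f along that line using only function values."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    n = x.size
    L = np.linalg.cholesky(metric) if metric is not None else np.eye(n)
    for _ in range(num_iters):
        u = L @ rng.standard_normal(n)                    # search direction shaped by the metric
        u /= np.linalg.norm(u)
        step = minimize_scalar(lambda t: f(x + t * u))    # 1-D zeroth-order subproblem
        x = x + step.x * u
    return x

# Illustrative usage on an ill-conditioned quadratic
if __name__ == "__main__":
    A = np.diag([1.0, 100.0])
    print(random_pursuit(lambda x: 0.5 * x @ A @ x, np.array([3.0, 2.0])))
```

    A variable-metric variant would additionally update `metric` between iterations from a randomized zeroth-order Hessian approximation, as described in the abstract.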

    Effective Theory of 3H and 3He

    We present a new perturbative expansion for pionless effective field theory with Coulomb interactions in which, at leading order, the spin-singlet nucleon-nucleon channels are taken in the unitarity limit. Presenting results up to next-to-leading order for the Phillips line and the neutron-deuteron doublet-channel phase shift, we find that a perturbative expansion in the inverse 1S0 scattering lengths converges rapidly. Using a new systematic treatment of the proton-proton sector that isolates the divergence due to one-photon exchange, we renormalize the corresponding contribution to the 3H-3He binding energy splitting and demonstrate that the Coulomb force in pionless EFT is a completely perturbative effect in the trinucleon bound-state regime. In our new expansion, the leading order is exactly isospin-symmetric. At next-to-leading order, we include isospin breaking via the Coulomb force and two-body scattering lengths, and find for the energy splitting (E_B(3He) - E_B(3H))^NLO = (-0.86 +/- 0.17) MeV.
    Comment: 37 pages, 14 figures, published version

    Nuclear Physics Around the Unitarity Limit

    We argue that many features of the structure of nuclei emerge from a strictly perturbative expansion around the unitarity limit, where the two-nucleon S waves have bound states at zero energy. In this limit, the gross features of states in the nuclear chart are correlated to only one dimensionful parameter, which is related to the breaking of scale invariance to a discrete scaling symmetry and is set by the triton binding energy. Observables are moved to their physical values by small, perturbative corrections, much like in descriptions of the fine structure of atomic spectra. We provide evidence in favor of the conjecture that light, and possibly heavier, nuclei are bound weakly enough to be insensitive to the details of the interactions but strongly enough to be insensitive to the exact size of the two-nucleon system.
    Comment: 6 pages, 3 figures, published version, rewritten for clarity

    Profound effect of profiling platform and normalization strategy on detection of differentially expressed microRNAs

    Adequate normalization minimizes the effects of systematic technical variation and is a prerequisite for detecting meaningful biological changes. However, reports on miRNA normalization performance and the resulting recommendations are inconsistent. We therefore investigated the impact of seven different normalization methods (reference gene index, global geometric mean, quantile, invariant selection, loess, loessM, and generalized procrustes analysis) on the intra- and inter-platform performance of two distinct and commonly used miRNA profiling platforms. We included data from miRNA profiling analyses derived from a hybridization-based platform (Agilent Technologies) and an RT-qPCR platform (Applied Biosystems; TaqMan Low Density Array, TLDA). Furthermore, we validated a subset of miRNAs by individual RT-qPCR assays. Our analyses incorporated data on the effect of differentiation and tumor necrosis factor alpha treatment on primary human skeletal muscle cells and a murine skeletal muscle cell line. The normalization methods differed in their impact on (i) standard deviations, (ii) the area under the receiver operating characteristic (ROC) curve, and (iii) the similarity of differential expression. Loess, loessM, and quantile normalization were most effective in minimizing standard deviations on the Agilent and TLDA platforms. Moreover, loess, loessM, invariant selection, and generalized procrustes analysis (GPA) increased the area under the ROC curve, a measure of the statistical performance of a test. The Jaccard index revealed that inter-platform concordance of differential expression tended to be increased by loess, loessM, quantile, and GPA normalization of Agilent and TLDA data, as well as by reference gene index (RGI) normalization of TLDA data. We recommend loess or loessM and GPA normalization for miRNA Agilent arrays and qPCR cards, as these approaches were shown to (i) effectively reduce standard deviations, (ii) increase the sensitivity and accuracy of differential miRNA expression detection, and (iii) increase inter-platform concordance. The results also demonstrate the successful adaptation of loessM and generalized procrustes analysis to one-color miRNA profiling experiments.
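
    Two of the building blocks named above lend themselves to a compact illustration: quantile normalization of an expression matrix and the Jaccard index used to quantify inter-platform concordance of differentially expressed miRNA lists. The sketch below is generic and assumes a miRNAs-by-samples matrix; it is not the study's analysis pipeline.

```python
import numpy as np

def quantile_normalize(expr):
    """Quantile normalization: force every sample (column) of an expression
    matrix (miRNAs x samples) onto the same empirical distribution."""
    ranks = np.argsort(np.argsort(expr, axis=0), axis=0)  # rank of each miRNA within its sample
    reference = np.sort(expr, axis=0).mean(axis=1)        # reference distribution: mean of sorted columns
    return reference[ranks]

def jaccard_index(set_a, set_b):
    """Jaccard index: overlap of two sets of differentially expressed miRNAs."""
    a, b = set(set_a), set(set_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Illustrative usage
if __name__ == "__main__":
    expr = np.random.default_rng(0).lognormal(size=(5, 3))
    print(quantile_normalize(expr))
    print(jaccard_index({"miR-1", "miR-133a"}, {"miR-1", "miR-206"}))
```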

    Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction

    The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for SGD have shown remarkable effectiveness when training over-parameterized models. However, in non-interpolation settings, both algorithms only guarantee convergence to a neighborhood of a solution, which may result in an output worse than the initial guess. While artificially decreasing the adaptive stepsize has been proposed to address this issue (Orvieto et al. [2022]), this approach results in slower convergence rates for convex and over-parameterized models. In this work, we make two contributions: Firstly, we propose two new variants of SPS and SLS, called AdaSPS and AdaSLS, which guarantee convergence in non-interpolation settings and maintain sub-linear and linear convergence rates for convex and strongly convex functions when training over-parameterized models. AdaSLS requires no knowledge of problem-dependent parameters, and AdaSPS requires only a lower bound on the optimal function value as input. Secondly, we equip AdaSPS and AdaSLS with a novel variance reduction technique and obtain algorithms that require $\widetilde{\mathcal{O}}(n + 1/\epsilon)$ gradient evaluations to achieve an $\mathcal{O}(\epsilon)$-suboptimality for convex functions, which improves upon the slower $\mathcal{O}(1/\epsilon^2)$ rates of AdaSPS and AdaSLS without variance reduction in the non-interpolation regime. Moreover, our result matches the fast rates of AdaSVRG but removes the inner-outer-loop structure, which makes the method easier to implement and analyze. Finally, numerical experiments on synthetic and real datasets validate our theory and demonstrate the effectiveness and robustness of our algorithms.
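
    For concreteness, here is a minimal sketch of the classic SPS rule that AdaSPS and AdaSLS build on, taking per-sample lower bounds l_i^* (often 0 for non-negative losses) as input; the adaptive variants proposed in the paper modify this stepsize, and all names in the sketch are illustrative.

```python
import numpy as np

def sgd_with_sps(loss_fn, grad_fn, x0, data, lower_bounds, epochs=10, c=0.5, gamma_max=10.0, seed=0):
    """SGD with the classic stochastic Polyak stepsize:
    gamma_t = min(gamma_max, (f_i(x) - l_i*) / (c * ||grad f_i(x)||^2))."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            g = grad_fn(x, data[i])
            gnorm2 = float(g @ g)
            if gnorm2 == 0.0:
                continue                      # already stationary for the sampled loss
            gamma = min(gamma_max, (loss_fn(x, data[i]) - lower_bounds[i]) / (c * gnorm2))
            x = x - gamma * g
    return x

# Illustrative usage: least squares with per-sample losses f_i(x) = 0.5 * (a_i @ x - b_i)^2, l_i* = 0
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A, b = rng.standard_normal((50, 5)), rng.standard_normal(50)
    data = list(zip(A, b))
    loss = lambda x, d: 0.5 * (d[0] @ x - d[1]) ** 2
    grad = lambda x, d: (d[0] @ x - d[1]) * d[0]
    print(sgd_with_sps(loss, grad, np.zeros(5), data, lower_bounds=[0.0] * len(data)))
```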