
    Constant Step Size Least-Mean-Square: Bias-Variance Trade-offs and Optimal Sampling Distributions

    We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent (a.k.a. least-mean-squares). In the strongly-convex case, we provide an asymptotic expansion up to explicit exponentially decaying terms. Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term that decays as O(1/n), independently of the step-size γ, and a bias term that decays as O(1/(γ²n²)); (c) when allowing non-uniform sampling, the choice of a good sampling density depends on whether the variance or bias term dominates. In particular, when the variance term dominates, optimal sampling densities do not lead to much gain, while when the bias term dominates, we can choose larger step-sizes that lead to significant improvements.
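    A minimal sketch of the algorithm the abstract analyses: constant-step-size SGD on a least-squares objective with Polyak-Ruppert averaging of the iterates. The step size and the synthetic data below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, gamma = 10_000, 5, 0.01            # gamma: constant step size (illustrative)

w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)        # current iterate
w_bar = np.zeros(d)    # running Polyak-Ruppert average of the iterates
for t in range(n):
    x_t, y_t = X[t], y[t]
    grad = (x_t @ w - y_t) * x_t          # stochastic gradient of 0.5*(x'w - y)^2
    w -= gamma * grad                     # constant-step-size update
    w_bar += (w - w_bar) / (t + 1)        # online average

print("error of averaged iterate:", np.linalg.norm(w_bar - w_true))
```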

    AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel Stochastic Gradient Methods

    We study a new aggregation operator for gradients coming from a mini-batch in stochastic gradient (SG) methods that allows a significant speed-up on sparse optimization problems. We call this method AdaBatch; it requires only a few lines of code change compared to regular mini-batch SGD. We provide theoretical insight into how this new class of algorithms performs and show that it is equivalent to an implicit per-coordinate rescaling of the gradients, similar to what Adagrad methods do. In theory and in practice, this new aggregation keeps the same sample efficiency as SG methods while increasing the batch size. Experimentally, we also show that in the case of smooth convex optimization, our procedure can even obtain a better loss when increasing the batch size for a fixed number of samples. We then apply this new algorithm to obtain a parallelizable stochastic gradient method that is synchronous but allows speed-up on par with Hogwild! methods, as convergence does not deteriorate as the batch size increases. The same approach can be used to make mini-batches provably efficient for variance-reduced SG methods such as SVRG.
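    A rough sketch of the aggregation idea the abstract describes, under the assumption that it averages each coordinate over the samples that actually touch it rather than over the full batch; the handling of untouched coordinates here is an illustrative guess, not the paper's exact rule.

```python
import numpy as np

def adabatch_aggregate(per_sample_grads):
    """per_sample_grads: array of shape (batch_size, d), mostly zeros for sparse problems."""
    g_sum = per_sample_grads.sum(axis=0)
    counts = (per_sample_grads != 0).sum(axis=0)   # support size per coordinate
    counts = np.maximum(counts, 1)                 # avoid division by zero
    return g_sum / counts                          # implicit per-coordinate rescaling

# Regular mini-batch SGD would instead return g_sum / batch_size,
# which shrinks rarely-active coordinates as the batch grows.
```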

    A sufficient condition for bicolorable hypergraphs

    In this note we prove Sterboul's conjecture, which provides a sufficient condition for the bicolorability of hypergraphs.
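    For context, a small illustrative checker (not from the note): a hypergraph is bicolorable when its vertices admit a 2-coloring with no monochromatic edge. The brute-force search below is only meant to make the definition concrete.

```python
from itertools import product

def is_bicolorable(vertices, edges):
    """Brute-force check; edges is an iterable of vertex sets."""
    vs = list(vertices)
    for colors in product((0, 1), repeat=len(vs)):
        color = dict(zip(vs, colors))
        # every edge must contain both colors
        if all(len({color[v] for v in e}) == 2 for e in edges):
            return True
    return False

# Example: the Fano plane is the classic non-bicolorable 3-uniform hypergraph.
fano = [{1,2,3}, {1,4,5}, {1,6,7}, {2,4,6}, {2,5,7}, {3,4,7}, {3,5,6}]
print(is_bicolorable(range(1, 8), fano))  # False
```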

    Tree resistance to wind: the effects of soil conditions on tree stability

    Wind damage represents more than 50% by volume of forest damage in Europe. Recent evidence suggests that wind damage could double or even quadruple by the end of the century, with dramatic consequences for the forest economy and the ecological functioning and survival of European forests. Most storm-damaged trees are uprooted. While a large amount of work has been done over the last decade on understanding the aerial tree response to turbulent wind flow, much less is known about the root-soil interface and the impact of soil moisture on tree uprooting. This paper investigates, at tree scale, the effects of soil conditions, such as water saturation during storms, on tree stability. Our analysis is based on (i) the critical bending moment that induces tree uprooting, measured from static pulling experiments; (ii) the soil mechanical properties as a function of climatic conditions, measured and modelled from laboratory measurements; and (iii) new techniques developed for studying the mechanics of the tree structure, incorporating 3D root architecture and numerical biomechanics modelling.
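    As a hedged back-of-the-envelope illustration (not the paper's model) of the quantity measured in a static pulling experiment: a cable anchored at height h on the stem pulls with force F at angle theta to the horizontal, producing a bending moment at the stem base.

```python
import math

def base_bending_moment(force_n, anchor_height_m, cable_angle_deg):
    """Moment at the stem base from the horizontal component of the cable force (N*m).

    Ignores the contribution of the overhanging stem weight, which a full
    analysis would add.
    """
    f_horizontal = force_n * math.cos(math.radians(cable_angle_deg))
    return f_horizontal * anchor_height_m

# e.g. 20 kN of cable tension at 30 degrees, anchored 5 m up the stem:
print(base_bending_moment(20_000, 5.0, 30))  # ~86.6 kN*m
```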

    Quantifying the effect of matric suction on the compressive properties of two agricultural soils using an osmotic oedometer

    The compaction of cultivated soils by agricultural machines considerably affects both the structure and physical properties of soil, and thus has a major impact on crop production and the environment. Soil mechanical resistance to compaction is highly variable in both time and space because it depends on soil type (texture), soil structure (porosity) and soil moisture (suction). This paper is devoted to the effect of soil suction on the compression index Cc, one of the mechanical parameters that describes soil resistance to compaction. We used oedometer compression tests with suction control, implemented via the osmotic technique, to study the compression index of a loamy soil and a sandy soil. Soil samples were prepared by compacting soil powder passed through a 2 mm sieve to a dry bulk density of 1.1 or 1.45 Mg m⁻³. The mechanical stress and suction ranges considered corresponded to field conditions, with vertical stress below 800 kPa and suction below 200 kPa. The results show that the compression index Cc changed little with suctions ranging from 10 to 200 kPa for the two soils at different initial densities. By contrast, the variation of Cc is significant when soil suction is close to zero for the loamy soil at an initial dry bulk density of 1.1 Mg m⁻³. From a practical point of view, this variation in compression index with suction is a useful result for modelling soil strain due to traffic and predicting the compaction of cultivated soils.
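    For readers unfamiliar with the parameter, the standard soil-mechanics definition of the compression index Cc is the slope of the void ratio versus log-stress curve along virgin compression; the readings below are hypothetical, not the paper's data.

```python
import math

def compression_index(e1, sigma1_kpa, e2, sigma2_kpa):
    """Cc = -(e2 - e1) / (log10(sigma2) - log10(sigma1)).

    e1, e2: void ratios at vertical effective stresses sigma1, sigma2 (kPa).
    """
    return -(e2 - e1) / (math.log10(sigma2_kpa) - math.log10(sigma1_kpa))

# Hypothetical oedometer readings: e drops from 0.80 at 100 kPa to 0.62 at 800 kPa.
print(compression_index(0.80, 100, 0.62, 800))  # ~0.199
```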

    On the Convergence of Adam and Adagrad

    We provide a simple proof of the convergence of the optimization algorithms Adam and Adagrad under the assumptions of smooth gradients and an almost-sure uniform bound on the ℓ∞ norm of the gradients. This work builds on the techniques introduced by Ward et al. (2019) and extends them to the Adam optimizer. We show that, in expectation, the squared norm of the objective gradient averaged over the trajectory has an upper bound which is explicit in the constants of the problem, the parameters of the optimizer and the total number of iterations N. This bound can be made arbitrarily small. In particular, Adam with a learning rate α = 1/√N and a momentum parameter on squared gradients β₂ = 1 − 1/N achieves the same rate of convergence O(ln(N)/√N) as Adagrad. Thus, it is possible to use Adam as a finite-horizon version of Adagrad, much like constant-step-size SGD can be used instead of its asymptotically convergent decaying-step-size version.
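    A minimal sketch of the standard Adam update instantiated with the finite-horizon parameters quoted in the abstract, α = 1/√N and β₂ = 1 − 1/N for a known iteration budget N; the test problem and epsilon are illustrative.

```python
import numpy as np

def adam(grad_fn, w, N, beta1=0.9, eps=1e-8):
    alpha, beta2 = 1.0 / np.sqrt(N), 1.0 - 1.0 / N
    m = np.zeros_like(w)   # first-moment (gradient) estimate
    v = np.zeros_like(w)   # second-moment (squared-gradient) estimate
    for t in range(1, N + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)            # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w

# e.g. minimize the quadratic 0.5*||w||^2, whose gradient is w:
print(adam(lambda w: w, np.ones(3), N=10_000))
```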

    Music Source Separation in the Waveform Domain

    Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song. Such components include voice, bass, drums and other accompaniments. Contrary to many audio synthesis tasks where the best performance is achieved by models that directly generate the waveform, the state of the art in music source separation is to compute masks on the magnitude spectrum. In this paper, we first show that an adaptation of Conv-Tasnet (Luo & Mesgarani, 2019), a waveform-to-waveform model for speech source separation, significantly beats the state of the art on the MusDB dataset, the standard benchmark for multi-instrument source separation. Second, we observe that Conv-Tasnet follows a masking approach on the input signal, which has the potential drawback of removing parts of the relevant source without the capacity to reconstruct them. We propose Demucs, a new waveform-to-waveform model, with an architecture closer to models for audio generation and more capacity in the decoder. Experiments on the MusDB dataset show that Demucs beats previously reported results in terms of signal-to-distortion ratio (SDR), though it remains below Conv-Tasnet. Human evaluations show that Demucs has significantly higher quality (as assessed by mean opinion score) than Conv-Tasnet, but slightly more contamination from other sources, which explains the difference in SDR. Additional experiments with a larger dataset suggest that the gap in SDR between Demucs and Conv-Tasnet shrinks, showing that our approach is promising.
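    Since the results here and in the next abstract are reported as SDR, the following shows the metric in its simplest form; the official MusDB evaluation uses the more involved BSS Eval variant, so this is only an illustration.

```python
import numpy as np

def sdr_db(reference, estimate):
    """Signal-to-distortion ratio: 10*log10(||ref||^2 / ||ref - est||^2). Higher is better."""
    num = np.sum(reference ** 2)
    den = np.sum((reference - estimate) ** 2) + 1e-12  # guard against division by zero
    return 10 * np.log10(num / den)

ref = np.sin(np.linspace(0, 100, 44_100))
est = ref + 0.05 * np.random.default_rng(0).normal(size=ref.shape)
print(sdr_db(ref, est))  # roughly 23 dB for this noise level
```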

    Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed

    We study the problem of source separation for music using deep learning with four known sources: drums, bass, vocals and other accompaniments. State-of-the-art approaches predict soft masks over mixture spectrograms, while methods working on the waveform lag behind as measured on the standard MusDB benchmark. Our contribution is twofold. (i) We introduce a simple convolutional and recurrent model that outperforms the state-of-the-art waveform model, Wave-U-Net, by 1.6 points of SDR (signal-to-distortion ratio). (ii) We propose a new scheme to leverage unlabeled music. We train a first model to extract parts of unlabeled tracks with at least one source silent, for instance without bass. We remix this extract with a bass line taken from the supervised dataset to form a new weakly supervised training example. Combining our architecture and scheme, we show that waveform methods can play in the same ballpark as spectrogram ones.
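    A rough sketch of the remixing scheme described above: take an unlabeled excerpt predicted to have a silent bass, add a bass stem from the supervised set, and use the pair as a weakly supervised example. The `silence_detector` callable and the threshold are hypothetical placeholders, not the paper's components.

```python
import numpy as np

def make_weak_example(unlabeled_mix, bass_stem, silence_detector, threshold=0.5):
    """Return (new_mix, bass_target) or None if the bass seems present.

    unlabeled_mix, bass_stem: 1-D waveform arrays of equal length.
    silence_detector: hypothetical model scoring how silent the bass is (higher = more silent).
    """
    if silence_detector(unlabeled_mix) < threshold:   # bass likely present: skip this excerpt
        return None
    new_mix = unlabeled_mix + bass_stem               # remix: add a known bass line
    return new_mix, bass_stem                         # target: recover the added bass
```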