Incremental Learning in Diagonal Linear Networks
Diagonal linear networks (DLNs) are a toy simplification of artificial neural
networks; they consist in a quadratic reparametrization of linear regression
inducing a sparse implicit regularization. In this paper, we describe the
trajectory of the gradient flow of DLNs in the limit of small initialization.
We show that incremental learning is effectively performed in the limit:
coordinates are successively activated, while the iterate is the minimizer of
the loss constrained to have support on the active coordinates only. This shows
that the sparse implicit regularization of DLNs decreases with time. This work
is restricted to the underparametrized regime with anti-correlated features for
technical reasons.
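The incremental activation described in this abstract can be illustrated with a minimal sketch. It assumes the parametrization beta = u * v (one common form of the quadratic reparametrization) and an identity design so that coordinates decouple; both are simplifying assumptions for illustration, not the setting analyzed in the paper. The names `train_dln`, `alpha`, and `activation_step` are hypothetical.

```python
def train_dln(targets, alpha=1e-3, lr=0.01, steps=5000):
    """Gradient descent on L(u, v) = 0.5 * sum((u_i * v_i - t_i) ** 2).

    Assumed toy setting: identity design, beta = u * v, small init u = alpha, v = 0.
    """
    u = [alpha] * len(targets)
    v = [0.0] * len(targets)
    activation_step = [None] * len(targets)
    for step in range(steps):
        for i, t in enumerate(targets):
            residual = u[i] * v[i] - t
            # Chain rule through the product parametrization beta_i = u_i * v_i.
            grad_u, grad_v = residual * v[i], residual * u[i]
            u[i] -= lr * grad_u
            v[i] -= lr * grad_v
            # Record the first step at which coordinate i reaches half its target.
            if activation_step[i] is None and abs(u[i] * v[i]) > 0.5 * abs(t):
                activation_step[i] = step
    beta = [ui * vi for ui, vi in zip(u, v)]
    return beta, activation_step


beta, activation_step = train_dln(targets=[2.0, 0.5])
print(beta, activation_step)
```

In this toy run the coordinate with the larger target leaves the neighbourhood of zero first, and shrinking `alpha` widens the gap between the two activation times.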
Accelerated Gossip in Networks of Given Dimension using Jacobi Polynomial Iterations
Consider a network of agents connected by communication links, where each
agent holds a real value. The gossip problem consists in estimating the average
of the values diffused in the network in a distributed manner. We develop a
method solving the gossip problem that depends only on the spectral dimension
of the network, that is, in the communication network set-up, the dimension of
the space in which the agents live. This contrasts with previous work that
required the spectral gap of the network as a parameter, or suffered from slow
mixing. Our method shows an important improvement over existing algorithms in
the non-asymptotic regime, i.e., when the values are far from being fully mixed
in the network. Our approach stems from a polynomial-based point of view on
gossip algorithms, as well as an approximation of the spectral measure of the
graphs with a Jacobi measure. We show the power of the approach with
simulations on various graphs, and with performance guarantees on graphs of
known spectral dimension, such as grids and random percolation bonds. An
extension of this work to distributed Laplacian solvers is discussed. As a side
result, we also use the polynomial-based point of view to show the convergence
of the message passing algorithm for gossip of Moallemi & Van Roy on regular
graphs. The explicit computation of the rate of the convergence shows that
message passing has a slow rate of convergence on graphs with small spectral
gap.
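For orientation, the gossip problem itself can be sketched with the plain, unaccelerated averaging iteration on a cycle graph. This is only a baseline under assumed choices (cycle topology, uniform weights of 1/3); it is not the Jacobi polynomial iteration of the paper. The function names are hypothetical.

```python
def gossip_round(values):
    """One synchronous round: each agent averages with its two cycle neighbours."""
    n = len(values)
    return [
        (values[i - 1] + values[i] + values[(i + 1) % n]) / 3
        for i in range(n)
    ]


def run_gossip(values, rounds):
    """Apply the averaging round repeatedly; the global mean is preserved."""
    for _ in range(rounds):
        values = gossip_round(values)
    return values


initial = [float(i) for i in range(10)]
final = run_gossip(initial, rounds=200)
print(final)
```

Every agent's value approaches the initial average (4.5 here). The speed of this baseline is governed by the spectral gap of the averaging matrix, which is the dependence the abstract's method is designed to avoid.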
Leveraging the two timescale regime to demonstrate convergence of neural networks
We study the training dynamics of shallow neural networks, in a two-timescale
regime in which the stepsizes for the inner layer are much smaller than those
for the outer layer. In this regime, we prove convergence of the gradient flow
to a global optimum of the non-convex optimization problem in a simple
univariate setting. The number of neurons need not be asymptotically large for
our result to hold, distinguishing our result from popular recent approaches
such as the neural tangent kernel or mean-field regimes. Experimental
illustration is provided, showing that the stochastic gradient descent behaves
according to our description of the gradient flow and thus converges to a
global optimum in the two-timescale regime, but can fail outside of this
regime. Comment: 33 pages, 7 figures
Graph-based Approximate Message Passing Iterations
Approximate message passing (AMP) algorithms have become an important element
of high-dimensional statistical inference, mostly due to their adaptability and
concentration properties, the state evolution (SE) equations. This is
demonstrated by the growing number of new iterations proposed for increasingly
complex problems, ranging from multi-layer inference to low-rank matrix
estimation with elaborate priors. In this paper, we address the following
questions: is there a structure underlying all AMP iterations that unifies them
in a common framework? Can we use such a structure to give a modular proof of
state evolution equations, adaptable to new AMP iterations without reproducing
the full argument each time? We propose an answer to both questions, showing
that AMP instances can be generically indexed by an oriented graph. This
enables a unified interpretation of these iterations, independent of
the problem they solve, and a way of composing them arbitrarily. We then show
that all AMP iterations indexed by such a graph admit rigorous SE equations,
extending the reach of previous proofs, and proving a number of recent
heuristic derivations of those equations. Our proof naturally includes
non-separable functions and we show how existing refinements, such as spatial
coupling or matrix-valued variables, can be combined with our framework. Comment: 59 pages, 24 main, 35 appendix
Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model
In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation between the random output and the random feature vector, a potentially non-linear transformation of the inputs. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model. The convergence of the iterates to the optimum and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum and of the feature vectors. We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph depending on its spectral dimension.
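The algorithm analyzed here, single-pass fixed step-size SGD on the least-squares risk, can be sketched in a finite-dimensional toy setting. The feature distribution (uniform on [-1, 1]^d), the step size 1/d, and all names below are assumptions made for illustration. In this well-conditioned finite-dimensional case the error decays geometrically, so the sketch does not reproduce the polynomial rates of the nonparametric setting studied in the paper.

```python
import random


def sgd_noiseless(theta_star, n_samples, step_size, seed=0):
    """Single-pass SGD on least squares with noiseless outputs y = <theta_star, x>."""
    rng = random.Random(seed)
    d = len(theta_star)
    theta = [0.0] * d
    errors = []
    for _ in range(n_samples):
        # Each sample is drawn once and never revisited (single pass).
        x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        y = sum(t * xi for t, xi in zip(theta_star, x))  # noiseless output
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        theta = [t - step_size * residual * xi for t, xi in zip(theta, x)]
        errors.append(sum((t - s) ** 2 for t, s in zip(theta, theta_star)))
    return theta, errors


theta_star = [1.0, -2.0, 0.5, 0.0, 3.0]
theta, errors = sgd_noiseless(
    theta_star, n_samples=2000, step_size=1.0 / len(theta_star)
)
print(errors[0], errors[-1])
```

The step size 1/d is chosen so that step_size * ||x||^2 <= 1 for every sample, which keeps each update non-expansive in the distance to `theta_star`.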
A Continuized View on Nesterov Acceleration
We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; but a discretization of the continuized process can be computed exactly with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters.
Massive Nest-Box Supplementation Boosts Fecundity, Survival and Even Immigration without Altering Mating and Reproductive Behaviour in a Rapidly Recovered Bird Population
Habitat restoration measures may result in artificially high breeding density, for instance when nest-boxes saturate the environment, which can negatively impact species' demography. Potential risks include changes in mating and reproductive behaviour such as increased extra-pair paternity, conspecific brood parasitism, and polygyny. Under particular circumstances, these mechanisms may disrupt reproduction, with populations dragged into an extinction vortex. With the use of nuclear microsatellite markers, we investigated the occurrence of these potentially negative effects in a recovered population of a rare secondary cavity-nesting farmland bird of Central Europe, the hoopoe (Upupa epops). High intensity farming in the study area has resulted in a total eradication of cavity trees, depriving hoopoes of breeding sites. An intensive nest-box campaign rectified this problem, resulting in a spectacular population recovery within only a few years. There was some concern, however, that the new, artificially induced high breeding density might alter hoopoe mating and reproductive behaviour. As the species underwent a serious demographic bottleneck in the 1970s-1990s, we also used the microsatellite markers to reconstitute the demo-genetic history of the population, looking in particular for signs of genetic erosion. We found i) a low occurrence of extra-pair paternity, polygyny and conspecific brood parasitism, ii) a high level of neutral genetic diversity (mean number of alleles and expected heterozygosity per locus: 13.8 and 83%, respectively) and, iii) evidence for genetic connectivity through recent immigration of individuals from well differentiated populations. The recent increase in breeding density has thus not so far induced any noticeable detrimental changes in mating and reproductive behaviour. The demographic bottleneck undergone by the population in the 1970s-1990s was furthermore not accompanied by any significant drop in neutral genetic diversity. Finally, genetic data converged with a concomitant demographic study to evidence that immigration strongly contributed to local population recovery.
Continuation vs Discontinuation of Renin-Angiotensin System Inhibitors Before Major Noncardiac Surgery
Importance: Before surgery, the best strategy for managing patients who are taking renin-angiotensin system inhibitors (RASIs) (angiotensin-converting enzyme inhibitors or angiotensin receptor blockers) is unknown. The lack of evidence leads to conflicting guidelines. Objective: To evaluate whether a continuation strategy vs a discontinuation strategy of RASIs before major noncardiac surgery results in decreased complications at 28 days after surgery. Design, setting, and participants: Randomized clinical trial that included patients who were being treated with a RASI for at least 3 months and were scheduled to undergo a major noncardiac surgery between January 2018 and April 2023 at 40 hospitals in France. Intervention: Patients were randomized to continue use of RASIs (n = 1107) until the day of surgery or to discontinue use of RASIs 48 hours prior to surgery (ie, they would take the last dose 3 days before surgery) (n = 1115). Main outcomes and measures: The primary outcome was a composite of all-cause mortality and major postoperative complications within 28 days after surgery. The key secondary outcomes were episodes of hypotension during surgery, acute kidney injury, postoperative organ failure, and length of stay in the hospital and intensive care unit during the 28 days after surgery. Results: Of the 2222 patients (mean age, 67 years [SD, 10 years]; 65% were male), 46% were being treated with angiotensin-converting enzyme inhibitors at baseline and 54% were being treated with angiotensin receptor blockers. The rate of all-cause mortality and major postoperative complications was 22% (245 of 1115 patients) in the RASI discontinuation group and 22% (247 of 1107 patients) in the RASI continuation group (risk ratio, 1.02 [95% CI, 0.87-1.19]; P = .85). Episodes of hypotension during surgery occurred in 41% of the patients in the RASI discontinuation group and in 54% of the patients in the RASI continuation group (risk ratio, 1.31 [95% CI, 1.19-1.44]). There were no other differences in the trial outcomes. Conclusions and relevance: Among patients who underwent major noncardiac surgery, a continuation strategy of RASIs before surgery was not associated with a higher rate of postoperative complications than a discontinuation strategy. Trial registration: ClinicalTrials.gov Identifier: NCT03374449.
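The primary-outcome risk ratio can be checked from the counts given in the abstract. The sketch below assumes an unadjusted ratio of continuation to discontinuation risk with a Wald interval on the log scale (z = 1.96); the trial's actual statistical method is not stated in the abstract and may differ. The function name is hypothetical.

```python
import math


def risk_ratio_with_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Unadjusted risk ratio (group A vs group B) with a log-scale Wald interval."""
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    low = math.exp(math.log(rr) - z * se_log)
    high = math.exp(math.log(rr) + z * se_log)
    return rr, low, high


# Counts from the abstract: continuation 247 of 1107, discontinuation 245 of 1115.
rr, low, high = risk_ratio_with_ci(247, 1107, 245, 1115)
print(f"{rr:.2f} ({low:.2f}-{high:.2f})")
```

Under these assumptions the result rounds to the reported 1.02 (0.87-1.19). The hypotension ratio cannot be rechecked the same way, since the abstract gives only rounded percentages for that outcome.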