63 research outputs found

    Spreads in Effective Learning Rates: The Perils of Batch Normalization During Early Training

    Excursions in gradient magnitude pose a persistent challenge when training deep networks. In this paper, we study the early training phases of deep normalized ReLU networks, accounting for the induced scale invariance by examining effective learning rates (LRs). Starting with the well-known fact that batch normalization (BN) leads to exponentially exploding gradients at initialization, we develop an ODE-based model to describe early training dynamics. Our model predicts that in the gradient flow, effective LRs will eventually equalize, aligning with empirical findings on warm-up training. Using large LRs is analogous to applying an explicit solver to a stiff non-linear ODE, causing overshooting and vanishing gradients in lower layers after the first step. Achieving overall balance demands careful tuning of LRs, depth, and (optionally) momentum. Our model predicts the formation of spreads in effective LRs, consistent with empirical measurements. Moreover, we observe that large spreads in effective LRs result in training issues concerning accuracy, indicating the importance of controlling these dynamics. To further support a causal relationship, we implement a simple scheduling scheme prescribing uniform effective LRs across layers and confirm accuracy benefits

    Nonlinear Advantage: Trained Networks Might Not Be As Complex as You Think

    We perform an empirical study of the behaviour of deep networks when fully linearizing some of their feature channels through a sparsity prior on the overall number of nonlinear units in the network. In experiments on image classification and machine translation tasks, we investigate how much we can simplify the network function towards linearity before performance collapses. First, we observe a significant performance gap when reducing nonlinearity in the network function early on as opposed to late in training, in line with recent observations on the time-evolution of the data-dependent NTK. Second, we find that after training, we are able to linearize a significant number of nonlinear units while maintaining high performance, indicating that much of a network's expressivity remains unused but helps gradient descent in early stages of training. To characterize the depth of the resulting partially linearized network, we introduce a measure called average path length, representing the average number of active nonlinearities encountered along a path in the network graph. Under sparsity pressure, we find that the remaining nonlinear units organize into distinct structures, forming core-networks of near constant effective depth and width, which in turn depend on task difficulty.

    MILK PRODUCTION IN COMMERCIAL CATTLE DAIRY FARMS IN KOSOVA

    A study was carried out on commercial dairy farms in Kosovo with the aim of contributing to the understanding of milk production and the factors affecting milk productivity. Seventeen dairy cattle farms were selected for the study. Fresh milk samples were collected and records were analysed according to the International Committee for Animal Recording A4 standard method, from August 2007 until September 2008. Over this period, 4694 milk samples from 461 individual cows were collected. Daily milk yield differed significantly between cow breeds (P < 0.0001), ranging from 18.92 ± 0.22 to 12.34 ± 0.53. The effects of farm and lactation number were also highly significant (P < 0.0001), showing large management variation from farm to farm (about 14.87 kg/day) and between lactations (16.91 ± 0.26 to 18.43 ± 0.24 kg/day). According to this study, although milk yield was in general fairly constant, cows in Kosovo tend to produce more milk in some months of the year. Large differences (about 29.06%) were also noticed within the same breed when comparing current production in Kosovo with production of that breed in its country of origin, Austria. It was concluded that milk yield was low for all breeds compared with their genetic potential. Furthermore, under current dairy farm management conditions in Kosovo, dual-purpose breeds tend to be more favourable than specialised dairy breeds.

    Retarded PDI diffusion and a reductive shift in poise of the calcium depleted endoplasmic reticulum

    Background: Endoplasmic reticulum (ER) lumenal protein thiol redox balance resists dramatic variation in unfolded protein load imposed by diverse physiological challenges, including compromise in the key upstream oxidases. Lumenal calcium depletion, incurred during normal cell signaling, stands out as a notable exception to this resilience, promoting a rapid and reversible shift towards a more reducing poise. Calcium-depletion-induced ER redox alterations are relevant to physiological conditions associated with calcium signaling, such as the response of pancreatic cells to secretagogues and neuronal activity. The core components of the ER redox machinery are well characterized; however, the molecular basis for the calcium-depletion-induced shift in redox balance is presently obscure.
    Results: In vitro, the core machinery for generating disulfides, consisting of ERO1 and the oxidizing protein disulfide isomerase, PDI1A, was indifferent to variation in calcium concentration within the physiological range. However, ER calcium depletion in vivo led to a selective 2.5-fold decline in PDI1A mobility, whereas the mobility of the reducing PDI family member, ERdj5, was unaffected. In vivo, fluorescence resonance energy transfer measurements revealed that declining PDI1A mobility correlated with formation of a complex with the abundant ER chaperone calreticulin, whose mobility was also inhibited by calcium depletion. The calcium depletion-mediated reductive shift was attenuated in cells lacking calreticulin. Measurements with purified proteins confirmed that the PDI1A-calreticulin complex dissociated as Ca2+ concentrations approached those normally found in the ER lumen ([Ca2+] K0.5max = 190 µM).
    Conclusions: Our findings suggest that selective sequestration of PDI1A in a calcium depletion-mediated complex with the abundant chaperone calreticulin attenuates the effective concentration of this major lumenal thiol oxidant, providing a plausible and simple mechanism for the observed shift in ER lumenal redox poise upon physiological calcium depletion.
    Funding: Wellcome Trust [Wellcome 084812/Z/08/Z]; European Commission (EU FP7 Beta-Bat) [277713]; Fundacao para a Ciencia e Tecnologia, Portugal [PTDC/QUI-BIQ/119677/2010].

    Official Discrepancies: Kosovo Independence and Western European Rhetoric

    This article examines approaches and official discrepancies characterising Western European rhetoric with regard to the Kosovo status question. Since the early 1980s, Kosovo has been increasingly present in European debates, culminating with the 1999 international intervention in the region and subsequent talks about its final status. Although the Kosovo Albanians proclaimed independence in February 2008 and the majority of EU Member States decided to recognise Kosovo as an independent state, Western European rhetoric has been rather divided. This article shows that in addition to five EU members who have decided not to recognise Kosovo from the very beginning, and thus are powerful enough to affect its further progress, both locally and internationally, some of the recognisers, although having abandoned the policy of ‘standards before status’, have also struggled to develop full support for the province – a discrepancy that surely questions the overall Western support for Kosovo’s independence