
    Toward Understanding Why Adam Converges Faster Than SGD for Transformers

    While stochastic gradient descent (SGD) is still the most popular optimization algorithm in deep learning, adaptive algorithms such as Adam have established empirical advantages over SGD in some deep learning applications, such as training transformers. However, it remains an open question why Adam converges significantly faster than SGD in these scenarios. In this paper, we propose one explanation of why Adam converges faster than SGD, using a new concept: directional sharpness. We argue that the performance of optimization algorithms is closely related to the directional sharpness of the update steps, and show that SGD has much worse directional sharpness than adaptive algorithms. We further observe that only a small fraction of the coordinates causes the bad sharpness and slow convergence of SGD, and propose coordinate-wise clipping as a solution for SGD and other optimization algorithms. We demonstrate the effect of coordinate-wise clipping on reducing sharpness and speeding up the convergence of optimization algorithms under various settings. We show that coordinate-wise clipping improves the local loss reduction when only a small fraction of the coordinates has bad sharpness. We conclude that the sharpness reduction effect of adaptive coordinate-wise scaling is the reason for Adam's success in practice, and suggest the use of coordinate-wise clipping as a universal technique to speed up deep learning optimization. Comment: 37 pages, 16 figures

    Will Ezhou become an air cargo superhub in China? A comparison to Memphis

    Purpose: As China boosts high-end manufacturing and modern services along with industrial relocation to its central and western regions, air cargo hubs become more critical for development in these regions. Meanwhile, aviation logistics has been listed as a new source of momentum for further economic growth in multiple Chinese cities, among which Ezhou is said to become Asia’s first and the world’s fourth professional cargo airport. This article assesses the possibility for Ezhou to realize this goal, based on a comparison to the busiest US air cargo hub, Memphis. Design/methodology: Factors under comparison include geographical location, city foundation, weather conditions, traffic connections, and policy support. This article also evaluates Ezhou’s advantages over other Chinese cities, taking Zhengzhou as an example. Findings: Ezhou is found to be better suited to become a Chinese Memphis. Research limitations/implications: No permission was given to conduct interviews with the ground handling personnel or to gather real-life data to analyze task durations and workers’ body movements. Originality/value: This article is the first in the English-language literature to analyze the possible rise of an air cargo hub in China. Peer Reviewed

    Increase in neuroexcitability of unmyelinated C-type vagal ganglion neurons during initial postnatal development of visceral afferent reflex functions

    BACKGROUND: Baroreflex gain increases to nearly the adult level during the initial postnatal weeks, and any interruption within this period will increase the risk of cardiovascular problems later in life. We hypothesize that this short period after birth might be critical for the postnatal development of vagal ganglion neurons (VGNs). METHODS: To evaluate neuroexcitability, as evidenced by discharge profiles and coordinated changes, ion currents were recorded from identified A- and C-type VGNs at different developmental stages using whole-cell patch clamping. RESULTS: C-type VGNs underwent a significant age-dependent transition from single action potential (AP) to repetitive discharge. The coordinated changes between TTX-S and TTX-R Na⁺ currents were also confirmed and well simulated by computer modeling. Although 4-AP or iberiotoxin age-dependently increased firing frequency, AP duration was prolonged in an opposite fashion, which paralleled postnatal changes in 4-AP- and iberiotoxin-sensitive K⁺ current activity, whereas fewer developmental changes were observed in A-types. CONCLUSION: These data demonstrate for the first time that the neuroexcitability of C-type VGNs increases significantly compared with A-types within the initial postnatal weeks, as evidenced by AP discharge profiles and coordinated ion channel changes, which explains, at least in part, why the initial postnatal weeks may be crucial for the ontogenesis of visceral afferent reflex function.