Toward Understanding Why Adam Converges Faster Than SGD for Transformers
While stochastic gradient descent (SGD) is still the most popular
optimization algorithm in deep learning, adaptive algorithms such as Adam have
established empirical advantages over SGD in some deep learning applications
such as training transformers. However, it remains an open question why Adam
converges significantly faster than SGD in these scenarios. In this paper, we
propose one explanation of why Adam converges faster than SGD using a new
concept, directional sharpness. We argue that the performance of optimization
algorithms is closely related to the directional sharpness of the update steps,
and show SGD has much worse directional sharpness compared to adaptive
algorithms. We further observe that only a small fraction of the coordinates
causes the bad sharpness and slow convergence of SGD, and propose to use
coordinate-wise clipping as a solution to SGD and other optimization
algorithms. We demonstrate the effect of coordinate-wise clipping on sharpness
reduction and speeding up the convergence of optimization algorithms under
various settings. We show that coordinate-wise clipping improves the local loss
reduction when only a small fraction of the coordinates has bad sharpness. We
conclude that the sharpness reduction effect of adaptive coordinate-wise
scaling is the reason for Adam's success in practice and suggest the use of
coordinate-wise clipping as a universal technique to speed up deep learning
optimization. Comment: 37 pages, 16 figures
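As a rough illustration of the idea in this abstract, the sketch below compares the curvature along a plain gradient step with the curvature along a coordinate-wise clipped step. It is not the authors' code. The abstract does not define directional sharpness, so the normalized curvature v^T H v / ||v||^2 used here is an assumption. The diagonal quadratic loss, the threshold `tau`, and all function names are likewise hypothetical.

```python
# Toy illustration, not the authors' code. Assumptions: a diagonal quadratic
# loss f(x) = 0.5 * sum(h[i] * x[i]**2), and directional sharpness taken as
# the normalized curvature v^T H v / ||v||^2 along an update direction v.

def gradient(h, x):
    """Gradient of the diagonal quadratic: g[i] = h[i] * x[i]."""
    return [hi * xi for hi, xi in zip(h, x)]

def clip_coordinates(v, tau):
    """Clip every coordinate of v independently to the interval [-tau, tau]."""
    return [max(-tau, min(tau, vi)) for vi in v]

def directional_sharpness(h, v):
    """Normalized curvature of the diagonal quadratic along direction v."""
    norm_sq = sum(vi * vi for vi in v)
    return sum(hi * vi * vi for hi, vi in zip(h, v)) / norm_sq

# One coordinate has much larger curvature than the rest.
h = [100.0, 1.0, 1.0, 1.0]
x = [1.0, 1.0, 1.0, 1.0]

g = gradient(h, x)
sgd_direction = [-gi for gi in g]                      # plain gradient step
clipped_direction = clip_coordinates(sgd_direction, tau=1.0)

raw_sharpness = directional_sharpness(h, sgd_direction)
clipped_sharpness = directional_sharpness(h, clipped_direction)
print(raw_sharpness, clipped_sharpness)
```

In this toy setting the gradient step is dominated by the single high-curvature coordinate, so its normalized curvature is close to the largest Hessian entry. Clipping spreads the step across all coordinates and lowers it.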
Will Ezhou become an air cargo superhub in China? A comparison to Memphis
Purpose: As China boosts high-end manufacturing and modern services along with industrial relocation to its central and western regions, air cargo hubs become more critical for development in these regions. Meanwhile, aviation logistics has been listed as a new source of momentum for further economic growth in multiple Chinese cities, among which Ezhou is said to become Asia's first and the world's fourth professional cargo airport. This article assesses the possibility for Ezhou to realize this goal, based on a comparison to the busiest US air cargo hub, Memphis. Design/methodology: Factors under comparison include geographical location, city foundation, weather conditions, traffic connections, and policy support. This article also evaluates Ezhou's advantages over other Chinese cities, taking Zhengzhou as an example. Findings: Ezhou is found to be better suited to become a Chinese Memphis. Research limitations/implications: No permission was given to interview ground handling personnel or to gather real-life data to analyze task durations and workers' body movements. Originality/value: This article is the first in the English literature to analyze the possible rise of an air cargo hub in China. Peer Reviewed
Increase in neuroexcitability of unmyelinated C-type vagal ganglion neurons during initial postnatal development of visceral afferent reflex functions
BACKGROUND:
Baroreflex gain increases to nearly the adult level during the initial postnatal weeks, and any interruption within this period will increase the risk of cardiovascular problems later in life. We hypothesize that this short period after birth might be critical for postnatal development of vagal ganglion neurons (VGNs).
METHODS:
To evaluate neuroexcitability, as evidenced by discharge profiles and coordinate changes, ion currents were recorded from identified A- and C-type VGNs at different developmental stages using whole-cell patch clamping.
RESULTS:
C-type VGNs underwent a significant age-dependent transition from single action potential (AP) to repetitive discharge. The coordinate changes between TTX-S and TTX-R Na(+) currents were also confirmed and well simulated by computer modeling. Although 4-AP or iberiotoxin age-dependently increased firing frequency, AP duration was prolonged in an opposite fashion, which paralleled postnatal changes in 4-AP- and iberiotoxin-sensitive K(+) current activity, whereas fewer developmental changes were observed in A-types.
CONCLUSION:
These data demonstrate for the first time that the neuroexcitability of C-type VGNs increases significantly compared with A-types within the initial postnatal weeks, as evidenced by AP discharge profiles and coordinate ion channel changes, which explains, at least in part, why the initial postnatal weeks may be crucial for ontogenesis of visceral afferent reflex function.