Global Momentum Compression for Sparse Communication in Distributed SGD
With the rapid growth of data, distributed stochastic gradient descent (DSGD)
has been widely used for solving large-scale machine learning problems. Due to
network latency and limited bandwidth, communication has become the
bottleneck of DSGD when training large-scale models such as deep neural
networks. Communication compression with sparsified gradients, abbreviated as
sparse communication, has been widely used for reducing communication
cost in DSGD. Recently, a method called deep gradient
compression (DGC) was proposed, which combines memory gradient and momentum
SGD for sparse communication. DGC has achieved promising performance in
practice. However, a convergence theory for DGC is lacking. In this paper, we
propose a novel method, called global momentum compression (GMC), for sparse
communication in DSGD. GMC also combines memory gradient and momentum SGD.
But unlike DGC, which adopts local momentum, GMC adopts global momentum. We
theoretically prove the
convergence rate of GMC for both convex and non-convex problems. To the best of
our knowledge, this is the first work that proves the convergence of
distributed momentum SGD (DMSGD) with sparse communication and memory gradient.
Empirical results show that, compared with the DMSGD counterpart without sparse
communication, GMC can reduce the communication cost by approximately 100-fold
without loss of generalization accuracy. GMC can also achieve
comparable (sometimes better) performance compared with DGC, with an extra
theoretical guarantee.
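The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea behind sparse communication with a memory gradient, not the GMC or DGC algorithm itself. It assumes top-k selection by magnitude as the sparsifier; the function name `sparsify_with_memory` and its parameters are hypothetical and chosen for illustration. Momentum (local or global) is deliberately left out.

```python
def sparsify_with_memory(gradient, memory, k):
    """One worker step of sparsified communication with a residual memory.

    gradient, memory: lists of floats of equal length.
    k: number of coordinates to transmit (assumption: top-k by magnitude).
    Returns (sparse_update, new_memory).
    """
    # Add the residual left over from earlier steps to the fresh gradient.
    accumulated = [g + m for g, m in zip(gradient, memory)]

    # Rank coordinates by absolute value and keep the k largest.
    order = sorted(range(len(accumulated)),
                   key=lambda i: abs(accumulated[i]),
                   reverse=True)
    keep = set(order[:k])

    # Only the selected coordinates are communicated, as {index: value}.
    sparse_update = {i: accumulated[i] for i in keep}

    # Everything that was not transmitted is remembered for later steps.
    new_memory = [0.0 if i in keep else v for i, v in enumerate(accumulated)]
    return sparse_update, new_memory


# Illustrative values only: transmit 2 of 4 coordinates.
update, residual = sparsify_with_memory(
    gradient=[0.5, -2.0, 0.25, 1.0],
    memory=[0.0, 0.0, 0.25, 0.0],
    k=2,
)
```

Under these assumptions, the coordinates that are not sent are not discarded: they accumulate in the memory and can be transmitted in a later step once they grow large enough.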
The Effects of a Non-Ferroelectric Slab on the Polarization and the Susceptibility of the Ferroelectric Multilayer
The polarization and the susceptibility of a ferroelectric multilayer with a
non-ferroelectric slab are investigated within the framework of the transverse
Ising model with a four-spin interaction term. The effects of the thickness and
the position of the non-ferroelectric slab are investigated in this paper. We
find that increasing the thickness of the non-ferroelectric slab decreases
the polarization and the susceptibility of the film. If the position of the
non-ferroelectric slab shifts from the center of the film to the surface, the
number of peaks in the susceptibility changes. A step-like
polarization curve is also found.
Comment: 15 pages, 4 figures
Stability studies of ZnO and AlN thin film acoustic wave devices in acid and alkali harsh environments
Surface acoustic wave (SAW) devices based on piezoelectric thin films such as ZnO and AlN are widely used in sensing, microfluidics and lab-on-a-chip applications. However, for many of these applications, the SAW devices will inevitably be used in harsh acid or alkali environments, which may cause early failure. In this work, we investigated the behavior and degradation mechanisms of thin-film-based SAW devices in harsh acid and alkali environments. Results show that under acid and alkali attack, chemical reaction and corrosion of ZnO devices are very fast (usually within 45 s). During the corrosion, the crystalline orientation of the ZnO film does not change, but its grain defects increase significantly and its grain sizes decrease. The velocity of ZnO-based SAW devices decreases due to the formation of porous structures induced by the chemical reactions. Although an AlN thin-film-based SAW device does not perform well in acid–alkali conditions, it might be able to maintain normal performance without obvious degradation for more than ten hours in acid or alkali solutions. This work could provide guidance for the application of both ZnO- and AlN-based SAW devices in harsh acid/alkali environments.
Contrastive Attention for Automatic Chest X-ray Report Generation
Recently, chest X-ray report generation, which aims to automatically generate
descriptions of given chest X-ray images, has received growing research
interest. The key challenge of chest X-ray report generation is to accurately
capture and describe the abnormal regions. In most cases, the normal regions
dominate the entire chest X-ray image, and the corresponding descriptions of
these normal regions dominate the final report. Due to such data bias,
learning-based models may fail to attend to abnormal regions. In this work, to
effectively capture and describe abnormal regions, we propose the Contrastive
Attention (CA) model. Instead of solely focusing on the current input image,
the CA model compares the current input image with normal images to distill the
contrastive information. The acquired contrastive information can better
represent the visual features of abnormal regions. According to the experiments
on the public IU-X-ray and MIMIC-CXR datasets, incorporating our CA into
several existing models can boost their performance across most metrics. In
addition, according to the analysis, the CA model can help existing models
better attend to the abnormal regions and provide more accurate descriptions
which are crucial for an interpretable diagnosis. Specifically, we achieve
state-of-the-art results on the two public datasets.
Comment: Appears in Findings of ACL 2021 (The Joint Conference of the 59th
Annual Meeting of the Association for Computational Linguistics and the 11th
International Joint Conference on Natural Language Processing (ACL-IJCNLP
2021))