156 research outputs found
Distributed Deep Learning Optimization of Heat Equation Inverse Problem Solvers
The inverse problem of partial differential equations plays a crucial role in cyber-physical systems applications. This paper presents a novel deep learning optimization approach to constructing a solver for the heat equation inverse problem. To improve computational efficiency in large-scale industrial applications, data and model parallelism are employed on a multi-GPU platform. The Ring-AllReduce architecture is harnessed to achieve an acceleration ratio of 3.46. Building on this architecture, a new multi-GPU distributed optimization method, GradReduce, is proposed. It replaces the original communication mechanism, which is scheduled by fixed timing and frequency, with a gradient transmission scheme obtained by solving a linear program. The experimental results show that the proposed method achieves an acceleration ratio of 3.84 on a heterogeneous system platform with two CPUs and four GPUs.
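To make the distributed setup concrete, here is a minimal sketch of Ring-AllReduce data-parallel training in PyTorch (NCCL's all-reduce is ring-based). It illustrates only the baseline architecture the paper builds on, not the GradReduce scheduler; HeatInverseNet, the layer sizes, and the synthetic data are hypothetical placeholders.

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    class HeatInverseNet(nn.Module):
        """Hypothetical stand-in for the heat-equation inversion solver."""
        def __init__(self, n_obs=128, n_coef=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(n_obs, 256), nn.ReLU(),
                                     nn.Linear(256, n_coef))

        def forward(self, x):
            return self.net(x)

    def train(rank: int, world_size: int, steps: int = 100):
        # Assumes launch via torchrun or similar, which provides the
        # rendezvous environment (MASTER_ADDR/MASTER_PORT). NCCL's
        # all-reduce is ring-based, matching the Ring-AllReduce
        # architecture described above.
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)
        device = f"cuda:{rank}"
        model = DDP(HeatInverseNet().to(device), device_ids=[rank])
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(steps):
            obs = torch.randn(32, 128, device=device)   # synthetic observations
            coef = torch.randn(32, 64, device=device)   # synthetic PDE coefficients
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(obs), coef)
            # DDP averages gradients via ring all-reduce, overlapped
            # with backpropagation.
            loss.backward()
            opt.step()
        dist.destroy_process_group()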
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
Current captioning approaches tend to generate correct but "generic"
descriptions that lack real-world knowledge, e.g., named entities and
contextual information. Considering that Vision-Language Pre-Training (VLP)
models absorb a massive amount of such knowledge from large-scale web-harvested data, it is
promising to utilize the generalizability of VLP models to incorporate
knowledge into image descriptions. However, using VLP models faces challenges:
zero-shot inference suffers from knowledge hallucination that leads to
low-quality descriptions, while the generic bias acquired during downstream fine-tuning
hinders the VLP model from expressing knowledge. To address these concerns, we
propose a simple yet effective method called Knowledge-guided Replay
(K-Replay), which enables the retention of pre-training knowledge during
fine-tuning. Our approach consists of two parts: (1) a knowledge prediction
task on automatically collected replay exemplars to continuously awaken the VLP
model's memory about knowledge, thus preventing the model from collapsing into
the generic pattern; (2) a knowledge distillation constraint to improve the
faithfulness of generated descriptions hence alleviating the knowledge
hallucination. To evaluate knowledge-enhanced descriptions, we construct a
novel captioning benchmark KnowCap, containing knowledge of landmarks, famous
brands, special foods and movie characters. Experimental results show that our
approach effectively incorporates knowledge into descriptions, outperforming
a strong VLP baseline by 20.9 points (78.7->99.6) in CIDEr score and 20.5
percentage points (34.0%->54.5%) in knowledge recognition accuracy. Our code
and data are available at https://github.com/njucckevin/KnowCap.
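As a rough illustration of how the two K-Replay components might combine with the ordinary fine-tuning objective, consider the sketch below. It assumes HuggingFace-style models whose forward pass returns .loss and .logits; the loss weights and field names are illustrative assumptions, not the authors' exact formulation.

    import torch
    import torch.nn.functional as F

    def k_replay_loss(model, teacher, batch, replay_batch,
                      lam_pred=1.0, lam_kd=0.5):
        # Hypothetical composite objective in the spirit of K-Replay.
        # `teacher` is the frozen pre-trained VLP model; lam_pred and
        # lam_kd are assumed weights, not the paper's values.

        # Ordinary captioning loss on the downstream data.
        caption_loss = model(batch["images"], labels=batch["captions"]).loss

        # (1) Knowledge prediction on automatically collected replay
        # exemplars, to keep pre-training knowledge active.
        out_replay = model(replay_batch["images"],
                           labels=replay_batch["knowledge_tokens"])
        knowledge_loss = out_replay.loss

        # (2) Distillation constraint toward the frozen teacher, to
        # curb knowledge hallucination.
        with torch.no_grad():
            t_logits = teacher(replay_batch["images"],
                               labels=replay_batch["knowledge_tokens"]).logits
        kd_loss = F.kl_div(F.log_softmax(out_replay.logits, dim=-1),
                           F.softmax(t_logits, dim=-1),
                           reduction="batchmean")

        return caption_loss + lam_pred * knowledge_loss + lam_kd * kd_loss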
DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator
The wide adoption and significant computing resource consumption of
attention-based Transformers, e.g., Vision Transformer and large language
models, have driven the demands for efficient hardware accelerators. While
electronic accelerators have been commonly used, there is a growing interest in
exploring photonics as an alternative technology due to its high energy
efficiency and ultra-fast processing speed. Optical neural networks (ONNs) have
demonstrated promising results for convolutional neural network (CNN) workloads
that only require weight-static linear operations. However, they fail to
efficiently support Transformer architectures with attention operations due to
the lack of ability to process dynamic full-range tensor multiplication. In
this work, we propose a customized high-performance and energy-efficient
photonic Transformer accelerator, DOTA. To overcome the fundamental limitation
of existing ONNs, we introduce a novel photonic tensor core, consisting of a
crossbar array of interference-based optical vector dot-product engines, that
supports highly-parallel, dynamic, and full-range matrix-matrix multiplication.
Our comprehensive evaluation demonstrates that DOTA achieves a >4x energy and a
>10x latency reduction compared to prior photonic accelerators, and delivers
over 20x energy reduction and 2 to 3 orders of magnitude lower latency compared
to electronic Transformer accelerators. Our work highlights the immense
potential of photonic computing for efficient hardware accelerators,
particularly for advanced machine learning workloads.
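To make the tensor-core idea concrete, the toy functional model below emulates a small crossbar of vector dot-product engines performing a dynamic matrix-matrix product, as in attention (e.g., queries times transposed keys), where both operands change every cycle. The 4x4 array size and tiling are illustrative assumptions, not DOTA's actual microarchitecture.

    import numpy as np

    def crossbar_matmul(A, B, rows=4, cols=4):
        # Toy model of an R x C crossbar of optical vector dot-product
        # engines: each engine produces one output element per "cycle",
        # and the full product is tiled across the array. Unlike a
        # weight-static ONN crossbar, both operands may change every
        # cycle (dynamic, full-range multiplication).
        M, K = A.shape
        K2, N = B.shape
        assert K == K2, "inner dimensions must match"
        out = np.zeros((M, N))
        cycles = 0
        for i0 in range(0, M, rows):          # tile output rows over engine rows
            for j0 in range(0, N, cols):      # tile output cols over engine cols
                for i in range(i0, min(i0 + rows, M)):
                    for j in range(j0, min(j0 + cols, N)):
                        out[i, j] = np.dot(A[i], B[:, j])
                cycles += 1                   # one tile of dot products per cycle
        return out, cycles

    A = np.random.randn(8, 16)    # e.g., a block of queries
    B = np.random.randn(16, 8)    # e.g., a block of transposed keys
    C, n_cycles = crossbar_matmul(A, B)
    assert np.allclose(C, A @ B)
    print(f"{n_cycles} crossbar cycles for an 8x8 output on a 4x4 array")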
Tunable quantum dots in monolithic Fabry-Perot microcavities for high-performance single-photon sources
Cavity-enhanced single quantum dots (QDs) are the main approach towards
ultra-high-performance solid-state quantum light sources for scalable photonic
quantum technologies. Nevertheless, harnessing the Purcell effect requires
precise spectral and spatial alignment of the QDs' emission with the cavity
mode, which is challenging for most cavities. Here we have successfully
integrated miniaturized Fabry-Perot microcavities with a piezoelectric
actuator, and demonstrated a bright single-photon source derived from a
deterministically coupled QD within this microcavity. Leveraging the
cavity-membrane structures, we have achieved large spectral tunability via
strain tuning. On resonance, we have obtained a high Purcell factor of
approximately 9. The source delivers single photons with simultaneous high
extraction efficiency of 0.58, high purity of 0.956(2) and high
indistinguishability of 0.922(4). Together with a small footprint, our scheme
facilitates the scalable integration of indistinguishable quantum light sources
on-chip, and therefore removes a major barrier to solid-state quantum information platforms based on QDs.
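For context, the Purcell enhancement of an emitter that is spectrally and spatially matched to the cavity mode follows the textbook expression

    F_P = \frac{3}{4\pi^{2}} \left(\frac{\lambda}{n}\right)^{3} \frac{Q}{V},

where Q is the cavity quality factor, V the mode volume, and \lambda/n the wavelength in the medium. Detuning or spatial mismatch reduces the enhancement below this value, which is why the spectral tuning described above is essential to reach the measured factor of approximately 9.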