Softmax Acceleration with Adaptive Numeric Format for both Training and Inference
The attention mechanism is a pivotal element within the Transformer
architecture, making a substantial contribution to its exceptional performance.
Within this attention mechanism, Softmax is an imperative component that
enables the model to assess the degree of correlation between various segments
of the input. Yet, prior research has shown that Softmax operations can
significantly increase processing latency and energy consumption in the
Transformer network due to their internal nonlinear operations and data
dependencies. In this work, we propose~\textit{Hyft}, a hardware-efficient
floating-point Softmax accelerator for both training and inference. Hyft aims
to reduce the implementation cost of the various nonlinear arithmetic operations
by adaptively converting intermediate results into the numeric format best
suited to each specific operation, yielding a reconfigurable accelerator with a
hybrid numeric format. The evaluation results show that Hyft achieves a
remarkable reduction in hardware resource utilization and processing latency,
all while maintaining a negligible impact on Transformer accuracy.
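As a rough software analogue of the idea (not Hyft's actual hardware datapath), the sketch below computes a numerically stable softmax while evaluating the exponentials in a narrow numeric format and accumulating the normalization sum in a wider one; the float16/float32 dtypes here are illustrative placeholders.

import numpy as np

def hybrid_softmax(x, exp_dtype=np.float16, acc_dtype=np.float32):
    """Softmax with exponentials in a narrow format and accumulation in a wide one.

    Illustrative software stand-in for a hybrid-numeric-format softmax; the
    dtypes are placeholders, not the formats used by the Hyft hardware.
    """
    x = np.asarray(x, dtype=acc_dtype)
    # Subtract the row maximum before exponentiation for numerical stability.
    shifted = x - x.max(axis=-1, keepdims=True)
    # Evaluate the exponentials in the cheaper, narrower format.
    exps = np.exp(shifted.astype(exp_dtype)).astype(acc_dtype)
    # Accumulate the denominator in the wider format to limit rounding error.
    return exps / exps.sum(axis=-1, keepdims=True)

scores = np.random.randn(2, 8)              # toy attention logits
print(hybrid_softmax(scores).sum(axis=-1))  # each row sums to ~1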
BIM: Block-Wise Self-Supervised Learning with Masked Image Modeling
Like masked language modeling (MLM) in natural language processing, masked
image modeling (MIM) aims to extract valuable insights from image patches to
enhance the feature extraction capabilities of the underlying deep neural
network (DNN). In contrast to other training paradigms such as supervised
learning and unsupervised contrastive learning, MIM pretraining typically
demands significant computational resources to handle large training batches
(e.g., 4096). The resulting memory and computation requirements pose a
considerable challenge to its broad adoption.
To mitigate this, we introduce a novel learning framework,
termed~\textit{Block-Wise Masked Image Modeling} (BIM). This framework involves
decomposing the MIM task into several sub-tasks with independent computation
patterns, enabling block-wise back-propagation instead of the
traditional end-to-end approach. Our proposed BIM maintains superior
performance compared to conventional MIM while greatly reducing peak memory
consumption. Moreover, BIM naturally enables the concurrent training of
numerous DNN backbones of varying depths. This leads to the creation of
multiple trained DNN backbones, each tailored to different hardware platforms
with distinct computing capabilities. This approach significantly reduces
computational costs in comparison with training each DNN backbone individually.
Our framework offers a promising solution for resource-constrained MIM
training.
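A minimal PyTorch sketch of the block-wise back-propagation idea, under assumed shapes and toy per-block reconstruction heads that are not the paper's architecture: the backbone is split into blocks, the input to each block is detached, and every block is updated from its own local loss, so gradients never cross block boundaries and only one block's activations are held for back-propagation at a time.

import torch
import torch.nn as nn

# Toy backbone split into blocks, each with its own lightweight decoder head
# and optimizer. Shapes and modules are made up for illustration.
blocks = nn.ModuleList([nn.Sequential(nn.Linear(64, 64), nn.GELU()) for _ in range(4)])
heads = nn.ModuleList([nn.Linear(64, 64) for _ in range(4)])
opts = [torch.optim.AdamW(list(b.parameters()) + list(h.parameters()), lr=1e-3)
        for b, h in zip(blocks, heads)]

def block_wise_step(masked_tokens, target_tokens):
    x = masked_tokens
    for block, head, opt in zip(blocks, heads, opts):
        x = block(x.detach())     # detach: no gradient flows back to earlier blocks
        loss = nn.functional.mse_loss(head(x), target_tokens)
        opt.zero_grad()
        loss.backward()           # back-propagation stays inside this block
        opt.step()
    return loss.item()

tokens = torch.randn(8, 64)       # stand-in for masked patch embeddings
targets = torch.randn(8, 64)      # stand-in for reconstruction targets
print(block_wise_step(tokens, targets))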
4,6-Dimethylpyrimidin-2(1H)-one–urea–water (1/1/1)
In the crystal structure of the title compound, C6H8N2O·CH4N2O·H2O, molecules are linked via N—H⋯O, O—H⋯N and O—H⋯O hydrogen bonds, forming a three-dimensional framework.
SphereFed: Hyperspherical Federated Learning
Federated Learning aims at training a global model from multiple
decentralized devices (i.e. clients) without exchanging their private local
data. A key challenge is handling non-i.i.d. (independent and identically
distributed) data across multiple clients, which may induce disparities in their
local features. We introduce the Hyperspherical Federated Learning (SphereFed)
framework to address the non-i.i.d. issue by constraining learned
representations of data points to be on a unit hypersphere shared by clients.
Specifically, all clients learn their local representations by minimizing the
loss with respect to a fixed classifier whose weights span the unit
hypersphere. After federated training has improved the global model, this
classifier is further calibrated with a closed-form solution by minimizing a
mean squared loss. We show that the calibration solution can be computed
efficiently and in a distributed manner, without direct access to local data. Extensive
experiments indicate that our SphereFed approach is able to improve the
accuracy of multiple existing federated learning algorithms by a considerable
margin (up to 6% on challenging datasets) with enhanced computation and
communication efficiency across datasets and model architectures.
Comment: European Conference on Computer Vision 202
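A minimal NumPy sketch of the two ingredients, with made-up dimensions and synthetic client data: a fixed classifier whose class weights are unit vectors on the hypersphere, and a closed-form least-squares calibration assembled from per-client sufficient statistics (F^T F and F^T Y), so the raw local features never have to leave the clients.

import numpy as np

rng = np.random.default_rng(0)
d, c = 32, 10                                  # feature dimension, number of classes

# Fixed classifier shared by all clients: each class weight is a unit vector.
W_fixed = rng.standard_normal((c, d))
W_fixed /= np.linalg.norm(W_fixed, axis=1, keepdims=True)

def client_statistics(features, labels):
    """Sufficient statistics a client would share for the closed-form calibration."""
    Y = np.eye(c)[labels]                      # one-hot targets
    return features.T @ features, features.T @ Y

# Synthetic "clients": (features, labels) pairs standing in for local representations.
clients = [(rng.standard_normal((50, d)), rng.integers(0, c, size=50)) for _ in range(3)]

# Aggregate F^T F and F^T Y across clients, then solve min_W ||F W^T - Y||^2.
A = sum(client_statistics(f, y)[0] for f, y in clients)
B = sum(client_statistics(f, y)[1] for f, y in clients)
W_calibrated = np.linalg.solve(A + 1e-6 * np.eye(d), B).T   # small ridge term for stability

print(W_calibrated.shape)                      # (c, d): calibrated classifier weights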