FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization
Large language models (LLMs) have demonstrated state-of-the-art performance
across various tasks. However, the latency of inference and the large GPU
memory consumption of LLMs restrict their deployment performance. Recently,
there have been some efficient attempts to quantize LLMs, yet inference with
large batch size or long sequence still has the issue of being compute-bound.
Fine-grained quantization methods have showcased their proficiency in achieving
low-bit quantization for LLMs, while requiring FP16 data type for linear layer
computations, which is time-consuming when dealing with large batch size or
long sequence. In this paper, we introduce a method called FlattenQuant, which
significantly reduces the maximum value of the tensor by flattening the large
channels in the tensor, to achieve low bit per-tensor quantization with minimal
accuracy loss. Our experiments show that FlattenQuant can directly apply 4-bit quantization to 48.29% of the linear layer calculations in LLMs, with the remaining layers using 8 bits. The 4-bit matrix multiplication introduced by FlattenQuant effectively addresses the compute-bound nature of large matrix calculations. Our work achieves up to 2x speedup and 2.3x memory reduction for LLMs with negligible loss in accuracy.
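The core idea, shrinking a tensor's global maximum by splitting outlier channels so that a single per-tensor scale suffices at low bit width, can be sketched as follows. This is a hedged illustration, not the authors' implementation: the `flatten_channels` helper, the threshold choice, and the symmetric 4-bit scheme are assumptions for demonstration only.

```python
import numpy as np

def flatten_channels(x, threshold):
    """Split any channel whose max |value| exceeds `threshold` into k
    repeated channels carrying x/k each, shrinking the tensor-wide max
    so one per-tensor scale can cover it at low bit width (assumed scheme)."""
    cols, mapping = [], []
    for j in range(x.shape[1]):
        col = x[:, j]
        k = max(1, int(np.ceil(np.abs(col).max() / threshold)))
        cols.extend([col / k] * k)   # k copies, each scaled by 1/k
        mapping.extend([j] * k)      # remember the source channel
    return np.stack(cols, axis=1), mapping

def quantize_per_tensor(x, bits=4):
    """Symmetric per-tensor quantization: a single scale for the whole tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale
```

Because each split channel carries 1/k of the original values, duplicating the matching weight rows (`W[mapping]`) preserves the linear layer's output: `x_flat @ W[mapping]` equals `x @ W`, while the flattened activations now fit a 4-bit per-tensor scale.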
ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches
The remarkable performance capabilities of AI accelerators offer promising opportunities for accelerating cryptographic algorithms, particularly in the context of lattice-based cryptography. However, current approaches to leveraging AI accelerators often remain at a rudimentary level of implementation, overlooking the intricate internal mechanisms of these devices. Consequently, a significant portion of their computational resources goes underutilized.
In this paper, we present a comprehensive exploration of NVIDIA Tensor Cores and introduce a novel framework tailored specifically for Kyber. Firstly, we propose two innovative approaches that efficiently break down Kyber's NTT into iterative matrix multiplications, resulting in approximately a 75% reduction in costs compared to the state-of-the-art scanning-based methods. Secondly, by reversing the internal mechanisms, we precisely manipulate the internal resources of Tensor Cores using assembly-level code instead of inefficient standard interfaces, eliminating memory accesses and redundant function calls. Finally, building upon our highly optimized NTT, we provide a complete implementation for all parameter sets of Kyber. Our implementation surpasses the state-of-the-art Tensor Core based work, achieving remarkable speed-ups of 1.93x, 1.65x, 1.22x and 3.55x for polyvec_ntt, KeyGen, Enc and Dec in Kyber-1024, respectively. Even when considering execution latency, our throughput-oriented full Kyber implementation maintains an acceptable execution latency. For instance, the execution latency ranges from 1.02 to 5.68 milliseconds for Kyber-1024 on R3080 when achieving the peak throughput.
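The first contribution, recasting the NTT as matrix multiplications so it maps onto Tensor Core GEMMs, can be illustrated in miniature. The sketch below uses hypothetical toy parameters (q = 17, n = 8), not Kyber's actual 256-coefficient negacyclic NTT over q = 3329, and shows only the NTT-as-matrix-product idea, not the paper's iterative decomposition or assembly-level Tensor Core code.

```python
def ntt_matrix(n, g, q):
    """V[i][j] = g^(i*j) mod q: with this matrix, the transform becomes one
    matrix-vector (or, batched over many polynomials, matrix-matrix) product."""
    return [[pow(g, i * j, q) for j in range(n)] for i in range(n)]

def matvec_mod(V, a, q):
    """Plain modular matrix-vector product -- the operand shape GEMM hardware likes."""
    n = len(a)
    return [sum(V[i][j] * a[j] for j in range(n)) % q for i in range(n)]

# Toy parameters: 2 is a primitive 8th root of unity mod 17 (2^8 = 256 = 1 mod 17).
q, n, g = 17, 8, 2
forward = ntt_matrix(n, g, q)
inverse = ntt_matrix(n, 9, q)   # 9 = 2^(-1) mod 17

a = [1, 2, 3, 4, 0, 0, 0, 0]
a_hat = matvec_mod(forward, a, q)
# Undo the transform: apply the inverse matrix, then scale by n^(-1) = 15 mod 17.
a_back = [(15 * c) % q for c in matvec_mod(inverse, a_hat, q)]
assert a_back == a
```

Conceptually, stacking many coefficient vectors as matrix columns turns an entire polynomial-vector NTT into a few low-precision GEMMs, which is exactly the workload Tensor Cores accelerate.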
DeepOpht: Medical Report Generation for Retinal Images via Deep Models and Visual Explanation
In this work, we propose an AI-based method that intends to improve the
conventional retinal disease treatment procedure and help ophthalmologists
increase diagnosis efficiency and accuracy. The proposed method is composed of
a deep neural networks-based (DNN-based) module, including a retinal disease
identifier and clinical description generator, and a DNN visual explanation
module. To train and validate the effectiveness of our DNN-based module, we
propose a large-scale retinal disease image dataset. Also, as ground truth, we
provide a retinal image dataset manually labeled by ophthalmologists to
qualitatively show that the proposed AI-based method is effective. With our
experimental results, we show that the proposed method is quantitatively and
qualitatively effective. Our method is capable of creating meaningful retinal
image descriptions and visual explanations that are clinically relevant. Comment: Accepted to IEEE WACV 202
Efficacy and safety of the compound Chinese medicine SaiLuoTong in vascular dementia: A randomized clinical trial
Introduction: No licensed medications are available to treat vascular dementia (VaD).
Methods: Patients were randomly assigned to experimental groups (SaiLuoTong [SLT] 360 mg or 240 mg for groups A and B, respectively, for 52 weeks) or a placebo group (group C, which received SLT 360 mg or 240 mg only from weeks 27 to 52).
Results: Three hundred twenty-five patients were included in the final analysis. At week 26, the difference in VaD Assessment Scale-cognitive subscale scores was 2.67 (95% confidence interval, 1.54 to 3.81) for groups A versus C, and 2.48 (1.34 to 3.62) for groups B versus C (both
Discussion: This study suggests that SLT is effective for the treatment of VaD, and this compound Chinese medicine may represent a better choice for treating VaD.
Mechanisms and recent advances in the diagnosis and treatment of nitrous oxide-induced peripheral neuropathy: a narrative review
Under standard conditions, nitrous oxide (N2O) manifests as a colorless, odorless gas with a mildly sweet taste. The compound finds applications in various fields, including its use as an aerosol propellant, an accelerant in motor racing, and an anesthetic in surgical procedures and dentistry. Unfortunately, the recreational misuse of N2O has become prevalent among young individuals due to its euphoric and hallucinogenic effects. Compounding this issue is the fact that nitrous oxide can be easily obtained from over-the-counter household items, facilitating its non-medical use. The global community has witnessed a surge in the recreational use of nitrous oxide gas in recent years. Despite the widespread non-medical abuse of N2O, there remains inadequate understanding of the potential adverse effects resulting from exposure to it. This paper provides an overview of management findings, laboratory and electrodiagnostic characteristics, as well as clinical presentations associated with neurological disorders induced by nitrous oxide use.