44 research outputs found

    Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?

    Full text link
    The inference of large language models (LLMs) requires immense computation and memory resources. To curtail these costs, quantisation has emerged as a promising solution, but existing LLM quantisation mainly focuses on 8-bit. In this work, we explore the statistical and learning properties of the LLM layer and attribute the bottleneck of LLM quantisation to numerical scaling offsets. To address this, we adapt block quantisations for LLMs, a family of methods that share scaling factors across packed numbers. Block quantisations efficiently reduce the numerical scaling offsets solely from an arithmetic perspective, without additional treatments in the computational path. Our nearly-lossless quantised 6-bit LLMs achieve a 19× higher arithmetic density and 5× higher memory density than the float32 baseline, surpassing the prior-art 8-bit quantisation by 2.5× in arithmetic density and 1.2× in memory density, without requiring any data calibration or re-training. We also share our insights into sub-8-bit LLM quantisation, including the mismatch between activation and weight distributions, optimal fine-tuning strategies, and a lower quantisation granularity inherent in the statistical properties of LLMs. The latter two tricks enable nearly-lossless 4-bit LLMs on downstream tasks. Our code is open-sourced. Comment: Accepted by EMNLP 2023.
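    Below is a minimal NumPy sketch of the general idea behind block quantisation, where numbers in the same block share one scaling factor; the block size, the symmetric rounding scheme, and the function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def block_quantise(x, block_size=16, n_bits=6):
    """Quantise a 1-D float tensor with one shared scale per block (toy sketch)."""
    pad = (-len(x)) % block_size                      # pad to a whole number of blocks
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    qmax = 2 ** (n_bits - 1) - 1                      # symmetric signed integer range
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                         # guard all-zero blocks
    q = np.clip(np.round(blocks / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales, pad

def block_dequantise(q, scales, pad):
    x = (q.astype(np.float32) * scales).reshape(-1)
    return x[:len(x) - pad] if pad else x

# Sharing one scale per 16 numbers tracks blocks with very different
# magnitudes, which a single per-tensor scale cannot do.
x = (np.random.randn(1024) * np.logspace(0, 2, 1024)).astype(np.float32)
q, s, pad = block_quantise(x)
print("mean abs error:", np.abs(block_dequantise(q, s, pad) - x).mean())
```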

    Fast Prototyping Next-Generation Accelerators for New ML Models using MASE: ML Accelerator System Exploration

    Full text link
    Machine learning (ML) accelerators have been studied and used extensively to compute ML models with high performance and low power. However, designing such accelerators normally takes a long time and requires significant effort. Unfortunately, the pace of development of ML software models is much faster than the accelerator design cycle, leading to frequent and drastic modifications in the model architecture, thus rendering many accelerators obsolete. Existing design tools and frameworks can provide quick accelerator prototyping, but only for a limited range of models that can fit into a single hardware device, such as an FPGA. Furthermore, with the emergence of large language models, such as GPT-3, there is an increased need for hardware prototyping of these large models within a many-accelerator system to ensure the hardware can scale with the ever-growing model sizes. In this paper, we propose an efficient and scalable approach for exploring accelerator systems to compute large ML models. We developed a tool named MASE that can directly map large ML models onto an efficient streaming accelerator system. Over a set of ML models, we show that MASE can achieve better energy efficiency than GPUs when computing inference for recent transformer models. Our tool will be open-sourced upon publication.
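    As a rough illustration of the mapping problem such a tool must solve (a toy heuristic, not MASE's actual algorithm or API), the sketch below splits a model's layers into contiguous pipeline stages across several devices while trying to balance per-stage cost:

```python
def partition_layers(costs, n_devices):
    """Greedily split a layer sequence into contiguous pipeline stages.

    Toy heuristic: close a stage once it reaches the ideal average cost,
    and force a close when the remaining layers are only just enough to
    give every remaining device one stage.
    """
    assert len(costs) >= n_devices
    target = sum(costs) / n_devices            # ideal per-stage cost
    stages, stage, acc = [], [], 0.0
    for i, c in enumerate(costs):
        stage.append(i)
        acc += c
        to_open = n_devices - len(stages) - 1  # stages still to open after this one
        layers_left = len(costs) - i - 1       # layers after layer i
        if to_open and (acc >= target or layers_left == to_open):
            stages.append(stage)
            stage, acc = [], 0.0
    if stage:
        stages.append(stage)
    return stages

# e.g. hypothetical per-layer cost estimates, split over 4 devices
print(partition_layers([4, 4, 8, 8, 8, 8, 4, 4], n_devices=4))
```

    Real exploration tools tackle this with ILP or dynamic programming and also account for on-chip memory and inter-device bandwidth, but the throughput-balancing objective is the same.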

    Effect of Tryptophan Hydroxylase-2 rs7305115 SNP on suicide attempts risk in major depression

    Get PDF
    Background: Suicide and major depressive disorder (MDD) are strongly associated, and genetic factors account for at least part of the variability in suicide risk. We investigated whether variation at the tryptophan hydroxylase-2 (TPH2) gene rs7305115 SNP may predispose to suicide attempts in MDD.
    Methods: We genotyped the TPH2 rs7305115 SNP in 215 MDD patients with suicide attempts and in matched MDD patients without suicide attempts. Differences in behavioral and personality traits according to genotypic variation were investigated by logistic regression analysis.
    Results: There were no significant differences between MDD patients with suicide attempts and controls in AG and GG genotype frequencies, but the distribution of the AA genotype differed significantly (14.4% vs. 29.3%, p < 0.001). The G-allele frequency was significantly higher in cases than in controls (58.1% vs. 45.6%, p < 0.001), whereas the A allele was correspondingly less frequent in patients with suicide attempts (41.9% vs. 54.4%, p < 0.001). Multivariate logistic regression indicated that the TPH2 rs7305115 AA genotype (OR 0.33, 95% CI 0.22-0.99), a family history of suicide (OR 2.98, 95% CI 1.17-5.04), negative life events in the past half year (OR 6.64, 95% CI 2.48-11.04), and hopelessness (OR 7.68, 95% CI 5.79-13.74) were significantly associated with suicidal behavior in MDD patients.
    Conclusions: The study suggests that hopelessness, negative life events, and a family history of suicide are risk factors for attempted suicide in MDD, while the TPH2 rs7305115 A allele remains a significant protective predictor of suicide attempts.
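    For readers unfamiliar with how such odds ratios are derived, the sketch below computes a univariate OR with a Woolf 95% confidence interval from a 2x2 genotype table; the counts are assumptions back-calculated from the reported AA frequencies (14.4% of 215 cases, 29.3% of controls), so the result only approximates the paper's covariate-adjusted OR of 0.33.

```python
import math

# Illustrative 2x2 table for the AA genotype (counts are assumptions,
# back-calculated from the reported percentages; the paper's exact
# counts and adjusted model may differ).
aa_cases, aa_controls = 31, 63           # AA carriers
other_cases, other_controls = 184, 152   # AG/GG carriers

odds_ratio = (aa_cases * other_controls) / (aa_controls * other_cases)
# Woolf's method: standard error of log(OR) from the four cell counts
se_log_or = math.sqrt(1/aa_cases + 1/aa_controls + 1/other_cases + 1/other_controls)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
print(f"OR = {odds_ratio:.2f}, 95% CI {lo:.2f}-{hi:.2f}")  # OR < 1: protective
```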

    Advances in research and application of artificial intelligence and radiomic predictive models based on intracranial aneurysm images

    Get PDF
    Intracranial aneurysms are a high-risk condition, and imaging plays a crucial role in their diagnosis and treatment. The rapid advancement of artificial intelligence in imaging technology holds promise for the development of AI-based radiomics predictive models. These models could enable the automatic detection and diagnosis of intracranial aneurysms, assess their status, and predict outcomes, thereby assisting in the creation of personalized treatment plans. In addition, these techniques could improve physicians' diagnostic efficiency and patients' prognoses. This article reviews the progress of artificial intelligence and radiomics in the study of intracranial aneurysms, addresses current challenges and future prospects, and aims to introduce new ideas for the precise diagnosis and treatment of intracranial aneurysms.

    A combined method for gas-bearing layer identification in a complex sandstone reservoir

    Get PDF
    The Langgu Depression is a mature oil and gas exploration area with complicated lithological and physical properties. Varying formation fluids, low-resistivity hydrocarbon-bearing reservoirs, and non-uniform logging series greatly increase the difficulty of gas reservoir identification. The Monte Carlo method is employed to simulate the neutron–gamma logging response to gas saturation and its influencing factors. Based on the results, a new gas identification chart that eliminates the influence of porosity and formation water salinity is proposed to identify gas reservoirs in old wells. At the same time, a fluid factor extracted from array acoustic logging and core measurement data is sensitive to gas-bearing layers and useful for identifying gas reservoirs in new wells logged with array acoustic tools. Field examples show that the new combined method greatly improves the ability to identify gas-bearing layers and works well in both old-well re-examination and new-well interpretation.
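    An identification chart of this kind amounts to locating each well interval in a two-dimensional indicator space and labelling the region where gas zones plot; the sketch below shows that general pattern with purely illustrative feature names and cutoffs, not the paper's actual chart boundaries.

```python
# Toy chart-based fluid typing: each interval is a point in a 2-D space,
# e.g. a porosity-normalised neutron-gamma indicator vs. an acoustic
# fluid factor. The cutoffs here are hypothetical placeholders.
NG_CUT, FF_CUT = 1.15, 0.8

def classify(ng_indicator, fluid_factor):
    """Label an interval 'gas' when both indicators exceed their cutoffs."""
    return "gas" if ng_indicator > NG_CUT and fluid_factor > FF_CUT else "water/tight"

# hypothetical (neutron-gamma indicator, fluid factor) pairs per interval
for ng, ff in [(1.30, 1.10), (1.05, 0.40), (1.22, 0.95)]:
    print(f"NG={ng:.2f}, FF={ff:.2f} -> {classify(ng, ff)}")
```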

    TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities

    Full text link
    Recently, the success of pre-training in the text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities show a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is its modular design. The toolkit uniformly divides pre-training models into five components: embedding, encoder, target embedding, decoder, and target. As almost all common modules are provided in each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
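    A minimal PyTorch sketch of this modular idea, using registries of interchangeable modules and only three of the five components (embedding, encoder, target) for brevity; the registry names and module choices are illustrative assumptions, not TencentPretrain's actual API.

```python
import torch
import torch.nn as nn

# Registries of interchangeable modules; a multimodal toolkit would add
# e.g. patch or speech embeddings and other encoders/targets here.
EMBEDDINGS = {"word": lambda d: nn.Embedding(30522, d)}
ENCODERS = {"transformer": lambda d: nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True),
    num_layers=2)}
TARGETS = {"mlm": lambda d: nn.Linear(d, 30522)}   # masked-LM head

class PretrainModel(nn.Module):
    """Assemble a pre-training model from named components."""
    def __init__(self, embedding, encoder, target, d=256):
        super().__init__()
        self.embedding = EMBEDDINGS[embedding](d)
        self.encoder = ENCODERS[encoder](d)
        self.target = TARGETS[target](d)

    def forward(self, token_ids):
        return self.target(self.encoder(self.embedding(token_ids)))

model = PretrainModel("word", "transformer", "mlm")
logits = model(torch.randint(0, 30522, (2, 16)))
print(logits.shape)   # (batch, seq, vocab)
```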