Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
The inference of Large language models (LLMs) requires immense computation
and memory resources. To curtail these costs, quantisation has emerged as a
promising solution, but existing LLM quantisation mainly focuses on 8-bit. In
this work, we explore the statistical and learning properties of the LLM layer
and attribute the bottleneck of LLM quantisation to numerical scaling offsets.
To address this, we adapt block quantisations for LLMs, a family of methods
that share scaling factors across packed numbers. Block quantisations
efficiently reduce the numerical scaling offsets solely from an arithmetic
perspective, without additional treatments in the computational path. Our
nearly-lossless quantised 6-bit LLMs achieve a higher arithmetic
density and memory density than the float32 baseline, surpassing
prior-art 8-bit quantisation in both arithmetic density and
memory density, without requiring any data calibration or
re-training. We also share our insights into sub-8-bit LLM quantisation,
including the mismatch between activation and weight distributions, optimal
fine-tuning strategies, and a lower quantisation granularity inherent in the
statistical properties of LLMs. The latter two tricks enable nearly-lossless
4-bit LLMs on downstream tasks. Our code is open-sourced. Comment: Accepted by EMNLP202
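The shared-scaling idea described above (one scaling factor per packed block of numbers) can be sketched as follows. This is a minimal toy implementation: the block size, bit width, and the power-of-two exponent choice are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def block_quantise(x, block_size=16, bits=6):
    """Toy block quantisation: split values into fixed-size blocks,
    give each block one shared power-of-two scaling factor, and round
    to a signed integer grid of the given bit width."""
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.concatenate([x, np.zeros(pad)]).reshape(-1, block_size)

    qmax = 2 ** (bits - 1) - 1  # e.g. 31 for a 6-bit signed grid
    # One shared exponent per block, derived from the block's max magnitude.
    max_abs = np.abs(blocks).max(axis=1, keepdims=True)
    max_abs[max_abs == 0] = 1.0
    scale = 2.0 ** np.ceil(np.log2(max_abs / qmax))

    q = np.clip(np.round(blocks / scale), -qmax - 1, qmax)
    return (q * scale).reshape(-1)[: len(x)]  # dequantised values

np.random.seed(0)
x = np.random.randn(100)
xq = block_quantise(x)
```

Because the scale is shared per block rather than per tensor, a block of small-magnitude values gets a small scale and keeps fine resolution, which is how block quantisation reduces the numerical scaling offsets the abstract refers to.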
Fast Prototyping Next-Generation Accelerators for New ML Models using MASE: ML Accelerator System Exploration
Machine learning (ML) accelerators have been studied and used extensively to
compute ML models with high performance and low power. However, designing such
accelerators normally takes a long time and requires significant effort.
Unfortunately, the pace of development of ML software models is much faster
than the accelerator design cycle, leading to frequent and drastic
modifications in the model architecture, thus rendering many accelerators
obsolete. Existing design tools and frameworks can provide quick accelerator
prototyping, but only for a limited range of models that can fit into a single
hardware device, such as an FPGA. Furthermore, with the emergence of large
language models, such as GPT-3, there is an increased need for hardware
prototyping of these large models within a many-accelerator system to ensure
the hardware can scale with the ever-growing model sizes. In this paper, we
propose an efficient and scalable approach for exploring accelerator systems to
compute large ML models. We developed a tool named MASE that can directly map
large ML models onto an efficient streaming accelerator system. Over a set of
ML models, we show that MASE can achieve better energy efficiency than GPUs when
computing inference for recent transformer models. Our tool will be open-sourced
upon publication.
Effect of Tryptophan Hydroxylase-2 rs7305115 SNP on suicide attempts risk in major depression
<p>Abstract</p> <p>Background</p> <p>Suicide and major depressive disorder (MDD) are strongly associated, and genetic factors account for at least part of the variability in suicide risk. We investigated whether variation at the tryptophan hydroxylase-2 (TPH2) gene rs7305115 SNP may predispose to suicide attempts in MDD.</p> <p>Methods</p> <p>We genotyped the TPH2 gene rs7305115 SNP in 215 MDD patients with suicide attempts and matched MDD patients without suicide attempts. Differences in behavioral and personality traits according to genotypic variation were investigated by logistic regression analysis.</p> <p>Results</p> <p>There were no significant differences between MDD patients with suicide attempts and controls in the frequencies of the AG and GG genotypes of rs7305115, but the distribution of the AA genotype differed significantly (14.4% vs. 29.3%, <it>p </it>< 0.001). The G-allele frequency was significantly higher in cases than in the control group (58.1% vs. 45.6%, <it>p </it>< 0.001), whereas A-allele carriers showed a decreasing trend among MDD patients with suicidal behavior relative to the control group (41.9% vs. 54.4%, <it>p </it>< 0.001). Multivariate logistic regression analysis indicated that the TPH2 rs7305115 AA genotype (OR 0.33, 95% CI 0.22-0.99), a family history of suicide (OR 2.98, 95% CI 1.17-5.04), negative life events in the preceding half year (OR 6.64, 95% CI 2.48-11.04), and hopelessness (OR 7.68, 95% CI 5.79-13.74) were significantly associated with suicidal behavior in MDD patients.</p> <p>Conclusions</p> <p>The study suggested that hopelessness, negative life events, and a family history of suicide were risk factors for attempted suicide in MDD, while the TPH2 rs7305115 A allele remained a significant protective predictor of suicide attempts.</p>
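The odds ratios and confidence intervals reported above come from logistic regression coefficients. A minimal sketch of that conversion (OR = exp(β), CI bounds = exp(β ± z·SE)) is shown below; the coefficient and standard error used here are hypothetical illustrative values, not the study's fitted model.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic-regression coefficient and its standard error
    into an odds ratio with an approximate 95% confidence interval."""
    or_ = math.exp(beta)
    lo = math.exp(beta - z * se)
    hi = math.exp(beta + z * se)
    return or_, lo, hi

# Hypothetical coefficient for a protective factor: beta < 0 gives OR < 1.
or_, lo, hi = odds_ratio_ci(beta=-1.11, se=0.28)
```

An OR below 1 with a confidence interval that excludes 1 is what identifies a factor as protective, as reported for the AA genotype in the abstract.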
Advances in research and application of artificial intelligence and radiomic predictive models based on intracranial aneurysm images
Intracranial aneurysms are a high-risk disease, and imaging plays a crucial role in their diagnosis and treatment. The rapid advancement of artificial intelligence in imaging technology holds promise for the development of AI-based radiomics predictive models. These models could potentially enable the automatic detection and diagnosis of intracranial aneurysms, assess their status, and predict outcomes, thereby assisting in the creation of personalized treatment plans. In addition, these techniques could improve diagnostic efficiency for physicians and patient prognoses. This article aims to review the progress of artificial intelligence radiomics in the study of intracranial aneurysms, addressing the challenges faced and future prospects, in the hope of introducing new ideas for the precise diagnosis and treatment of intracranial aneurysms.
A combined method for gas-bearing layer identification in a complex sandstone reservoir
The Langgu Depression is a mature oil and gas exploration area with complicated lithological and physical properties. Varying formation fluids, low-resistivity hydrocarbon-bearing reservoirs, and non-uniform logging series greatly increase the difficulty of gas reservoir identification. The Monte Carlo method is employed to simulate the neutron–gamma logging responses to gas saturation and the influencing factors. Based on these results, a new gas identification chart that eliminates the influence of porosity and formation water salinity is proposed to identify gas reservoirs in old wells. At the same time, a fluid factor extracted from array acoustic logging and core measurement data is sensitive to the development of gas-bearing layers and is useful for identifying gas reservoirs in new wells with array acoustic logging. Field examples show that the new combined method greatly improves the ability to identify gas-bearing layers and works well in both old-well reexamination and new-well interpretation.
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities
Recently, the success of pre-training in text domain has been fully extended
to vision, audio, and cross-modal scenarios. The proposed pre-training models
of different modalities are showing a rising trend of homogeneity in their
model structures, which brings the opportunity to implement different
pre-training models within a uniform framework. In this paper, we present
TencentPretrain, a toolkit supporting pre-training models of different
modalities. The core feature of TencentPretrain is the modular design. The
toolkit uniformly divides pre-training models into 5 components: embedding,
encoder, target embedding, decoder, and target. As almost all common modules
are provided in each component, users can choose the desired modules from
different components to build a complete pre-training model. The modular design
enables users to efficiently reproduce existing pre-training models or build
brand-new ones. We test the toolkit on text, vision, and audio benchmarks and
show that it can match the performance of the original implementations.
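The five-component modular split described above can be sketched as a simple pipeline in which each component is a pluggable stage. The class and the toy stand-in functions below are hypothetical illustrations of the design idea, not TencentPretrain's actual API.

```python
class Pipeline:
    """Compose a model from up to five pluggable components:
    embedding, encoder, target embedding, decoder, and target.
    Optional stages (e.g. the decoder) may simply be omitted."""

    def __init__(self, embedding, encoder, tgt_embedding=None,
                 decoder=None, target=None):
        self.stages = [s for s in (embedding, encoder, tgt_embedding,
                                   decoder, target) if s is not None]

    def __call__(self, x):
        # Feed the output of each stage into the next.
        for stage in self.stages:
            x = stage(x)
        return x

# Toy stand-ins: each component is just a function over token lists.
embedding = lambda toks: [len(t) for t in toks]  # tokens -> fake ids
encoder   = lambda ids: [i + 1 for i in ids]     # fake contextualisation
target    = lambda ids: sum(ids)                 # fake training objective

model = Pipeline(embedding, encoder, target=target)
out = model(["hello", "world"])
```

Swapping any stage for another module with the same interface yields a different pre-training model, which is the reuse the abstract's modular design enables.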