44 research outputs found

    Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?

    Full text link
    The inference of large language models (LLMs) requires immense computation and memory resources. To curtail these costs, quantisation has emerged as a promising solution, but existing LLM quantisation mainly focuses on 8-bit. In this work, we explore the statistical and learning properties of the LLM layer and attribute the bottleneck of LLM quantisation to numerical scaling offsets. To address this, we adapt block quantisations for LLMs, a family of methods that share scaling factors across packed numbers. Block quantisations efficiently reduce the numerical scaling offsets solely from an arithmetic perspective, without additional treatments in the computational path. Our nearly-lossless quantised 6-bit LLMs achieve a 19× higher arithmetic density and 5× higher memory density than the float32 baseline, surpassing the prior-art 8-bit quantisation by 2.5× in arithmetic density and 1.2× in memory density, without requiring any data calibration or re-training. We also share our insights into sub-8-bit LLM quantisation, including the mismatch between activation and weight distributions, optimal fine-tuning strategies, and a lower quantisation granularity inherent in the statistical properties of LLMs. The latter two tricks enable nearly-lossless 4-bit LLMs on downstream tasks. Our code is open-sourced. Comment: Accepted by EMNLP 2023.
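    Below is a minimal NumPy sketch of the general idea behind block quantisation, where numbers in the same block share one scaling factor; the block size, the symmetric rounding scheme, and the function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def block_quantise(x, block_size=16, n_bits=6):
    """Quantise a 1-D float tensor with one shared scale per block (toy sketch)."""
    pad = (-len(x)) % block_size                      # pad to a whole number of blocks
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    qmax = 2 ** (n_bits - 1) - 1                      # symmetric signed integer range
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                         # guard all-zero blocks
    q = np.clip(np.round(blocks / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales, pad

def block_dequantise(q, scales, pad):
    x = (q.astype(np.float32) * scales).reshape(-1)
    return x[:len(x) - pad] if pad else x

# Sharing one scale per 16 numbers tracks blocks with very different
# magnitudes, which a single per-tensor scale cannot do.
x = (np.random.randn(1024) * np.logspace(0, 2, 1024)).astype(np.float32)
q, s, pad = block_quantise(x)
print("mean abs error:", np.abs(block_dequantise(q, s, pad) - x).mean())
```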

    Fast Prototyping Next-Generation Accelerators for New ML Models using MASE: ML Accelerator System Exploration

    Full text link
    Machine learning (ML) accelerators have been studied and used extensively to compute ML models with high performance and low power. However, designing such accelerators normally takes a long time and requires significant effort. Unfortunately, the pace of development of ML software models is much faster than the accelerator design cycle, leading to frequent and drastic modifications in the model architecture, thus rendering many accelerators obsolete. Existing design tools and frameworks can provide quick accelerator prototyping, but only for a limited range of models that can fit into a single hardware device, such as an FPGA. Furthermore, with the emergence of large language models, such as GPT-3, there is an increased need for hardware prototyping of these large models within a many-accelerator system to ensure the hardware can scale with the ever-growing model sizes. In this paper, we propose an efficient and scalable approach for exploring accelerator systems to compute large ML models. We developed a tool named MASE that can directly map large ML models onto an efficient streaming accelerator system. Over a set of ML models, we show that MASE can achieve better energy efficiency than GPUs when computing inference for recent transformer models. Our tool will be open-sourced upon publication.
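    As a rough illustration of the mapping problem such a tool must solve (a toy heuristic, not MASE's actual algorithm or API), the sketch below splits a model's layers into contiguous pipeline stages across several devices while trying to balance per-stage cost:

```python
def partition_layers(costs, n_devices):
    """Greedily split a layer sequence into contiguous pipeline stages.

    Toy heuristic: close a stage once it reaches the ideal average cost,
    and force a close when the remaining layers are only just enough to
    give every remaining device one stage.
    """
    assert len(costs) >= n_devices
    target = sum(costs) / n_devices            # ideal per-stage cost
    stages, stage, acc = [], [], 0.0
    for i, c in enumerate(costs):
        stage.append(i)
        acc += c
        to_open = n_devices - len(stages) - 1  # stages still to open after this one
        layers_left = len(costs) - i - 1       # layers after layer i
        if to_open and (acc >= target or layers_left == to_open):
            stages.append(stage)
            stage, acc = [], 0.0
    if stage:
        stages.append(stage)
    return stages

# e.g. hypothetical per-layer cost estimates, split over 4 devices
print(partition_layers([4, 4, 8, 8, 8, 8, 4, 4], n_devices=4))
```

    Real exploration tools tackle this with ILP or dynamic programming and also account for on-chip memory and inter-device bandwidth, but the throughput-balancing objective is the same.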

    Effect of Tryptophan Hydroxylase-2 rs7305115 SNP on suicide attempts risk in major depression

    Get PDF
    Background: Suicide and major depressive disorder (MDD) are strongly associated, and genetic factors account for at least part of the variability in suicide risk. We investigated whether variation at the tryptophan hydroxylase-2 (TPH2) gene rs7305115 SNP may predispose to suicide attempts in MDD.
    Methods: We genotyped the TPH2 rs7305115 SNP in 215 MDD patients with suicide attempts and in matched MDD patients without suicide attempts. Differences in behavioral and personality traits according to genotypic variation were investigated by logistic regression analysis.
    Results: There were no significant differences between MDD patients with suicide attempts and controls in AG and GG genotype frequencies, but the distribution of the AA genotype differed significantly (14.4% vs. 29.3%, p < 0.001). The G-allele frequency was significantly higher in cases than in controls (58.1% vs. 45.6%, p < 0.001), whereas the A allele was correspondingly less frequent in patients with suicide attempts (41.9% vs. 54.4%, p < 0.001). Multivariate logistic regression indicated that the TPH2 rs7305115 AA genotype (OR 0.33, 95% CI 0.22-0.99), a family history of suicide (OR 2.98, 95% CI 1.17-5.04), negative life events in the past half year (OR 6.64, 95% CI 2.48-11.04), and hopelessness (OR 7.68, 95% CI 5.79-13.74) were significantly associated with suicidal behavior in MDD patients.
    Conclusions: The study suggests that hopelessness, negative life events, and a family history of suicide are risk factors for attempted suicide in MDD, while the TPH2 rs7305115 A allele remains a significant protective predictor of suicide attempts.
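    For readers unfamiliar with how such odds ratios are derived, the sketch below computes a univariate OR with a Woolf 95% confidence interval from a 2x2 genotype table; the counts are assumptions back-calculated from the reported AA frequencies (14.4% of 215 cases, 29.3% of controls), so the result only approximates the paper's covariate-adjusted OR of 0.33.

```python
import math

# Illustrative 2x2 table for the AA genotype (counts are assumptions,
# back-calculated from the reported percentages; the paper's exact
# counts and adjusted model may differ).
aa_cases, aa_controls = 31, 63           # AA carriers
other_cases, other_controls = 184, 152   # AG/GG carriers

odds_ratio = (aa_cases * other_controls) / (aa_controls * other_cases)
# Woolf's method: standard error of log(OR) from the four cell counts
se_log_or = math.sqrt(1/aa_cases + 1/aa_controls + 1/other_cases + 1/other_controls)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
print(f"OR = {odds_ratio:.2f}, 95% CI {lo:.2f}-{hi:.2f}")  # OR < 1: protective
```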

    Advances in research and application of artificial intelligence and radiomic predictive models based on intracranial aneurysm images

    Get PDF
    Intracranial aneurysms are a high-risk condition, and imaging plays a crucial role in their diagnosis and treatment. The rapid advancement of artificial intelligence in imaging technology holds promise for the development of AI-based radiomics predictive models. These models could enable the automatic detection and diagnosis of intracranial aneurysms, assess their status, and predict outcomes, thereby assisting in the creation of personalized treatment plans. In addition, these techniques could improve physicians' diagnostic efficiency and patients' prognoses. This article reviews the progress of artificial intelligence and radiomics in the study of intracranial aneurysms, addresses current challenges and future prospects, and aims to introduce new ideas for the precise diagnosis and treatment of intracranial aneurysms.

    A combined method for gas-bearing layer identification in a complex sandstone reservoir

    Get PDF
    The Langgu Depression is a mature oil and gas exploration area with complicated lithological and physical properties. Varying formation fluids, low-resistivity hydrocarbon-bearing reservoirs, and non-uniform logging series greatly increase the difficulty of gas reservoir identification. The Monte Carlo method is employed to simulate the neutron–gamma logging response to gas saturation and its influencing factors. Based on the results, a new gas identification chart that eliminates the influence of porosity and formation water salinity is proposed to identify gas reservoirs in old wells. At the same time, a fluid factor extracted from array acoustic logging and core measurement data is sensitive to gas-bearing layers and useful for identifying gas reservoirs in new wells logged with array acoustic tools. Field examples show that the new combined method greatly improves the ability to identify gas-bearing layers and works well in both old-well re-examination and new-well interpretation.
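    An identification chart of this kind amounts to locating each well interval in a two-dimensional indicator space and labelling the region where gas zones plot; the sketch below shows that general pattern with purely illustrative feature names and cutoffs, not the paper's actual chart boundaries.

```python
# Toy chart-based fluid typing: each interval is a point in a 2-D space,
# e.g. a porosity-normalised neutron-gamma indicator vs. an acoustic
# fluid factor. The cutoffs here are hypothetical placeholders.
NG_CUT, FF_CUT = 1.15, 0.8

def classify(ng_indicator, fluid_factor):
    """Label an interval 'gas' when both indicators exceed their cutoffs."""
    return "gas" if ng_indicator > NG_CUT and fluid_factor > FF_CUT else "water/tight"

# hypothetical (neutron-gamma indicator, fluid factor) pairs per interval
for ng, ff in [(1.30, 1.10), (1.05, 0.40), (1.22, 0.95)]:
    print(f"NG={ng:.2f}, FF={ff:.2f} -> {classify(ng, ff)}")
```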

    TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities

    Full text link
    Recently, the success of pre-training in the text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities show a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is its modular design. The toolkit uniformly divides pre-training models into five components: embedding, encoder, target embedding, decoder, and target. As almost all common modules are provided in each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
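    A minimal PyTorch sketch of this modular idea, using registries of interchangeable modules and only three of the five components (embedding, encoder, target) for brevity; the registry names and module choices are illustrative assumptions, not TencentPretrain's actual API.

```python
import torch
import torch.nn as nn

# Registries of interchangeable modules; a multimodal toolkit would add
# e.g. patch or speech embeddings and other encoders/targets here.
EMBEDDINGS = {"word": lambda d: nn.Embedding(30522, d)}
ENCODERS = {"transformer": lambda d: nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True),
    num_layers=2)}
TARGETS = {"mlm": lambda d: nn.Linear(d, 30522)}   # masked-LM head

class PretrainModel(nn.Module):
    """Assemble a pre-training model from named components."""
    def __init__(self, embedding, encoder, target, d=256):
        super().__init__()
        self.embedding = EMBEDDINGS[embedding](d)
        self.encoder = ENCODERS[encoder](d)
        self.target = TARGETS[target](d)

    def forward(self, token_ids):
        return self.target(self.encoder(self.embedding(token_ids)))

model = PretrainModel("word", "transformer", "mlm")
logits = model(torch.randint(0, 30522, (2, 16)))
print(logits.shape)   # (batch, seq, vocab)
```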