
27 research outputs found

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Author: Li Dong
Qin Zhen
Shen Xuyang
Sun Weigao
Sun Weixuan
Zhong Yiran
Publication date: 15/01/2024

Linear attention is an efficient attention mechanism that has recently emerged as a promising alternative to conventional softmax attention. Because it processes tokens with linear computational complexity, linear attention can, in theory, handle sequences of unlimited length without sacrificing speed, i.e., it maintains a constant training speed across sequence lengths with fixed memory consumption. However, due to the issue with cumulative summation (cumsum), current linear attention algorithms cannot demonstrate their theoretical advantage in a causal setting. In this paper, we present Lightning Attention-2, the first linear attention implementation that enables linear attention to realize its theoretical computational benefits. To achieve this, we leverage the idea of tiling, separately handling the intra-block and inter-block components in the linear attention calculation. Specifically, we use the conventional attention computation mechanism for the intra-blocks and apply linear attention kernel tricks for the inter-blocks. A tiling technique is adopted through both the forward and backward procedures to take full advantage of the GPU hardware. We implement our algorithm in Triton to make it IO-aware and hardware-friendly. Various experiments are conducted on different model sizes and sequence lengths. Lightning Attention-2 retains consistent training and inference speed regardless of input sequence length and is significantly faster than other attention mechanisms. The source code is available at https://github.com/OpenNLPLab/lightning-attention.

Comment: Technical Report. Yiran Zhong is the corresponding author.
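
The intra-block/inter-block split described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' Triton kernel: the function names, the unnormalized attention form, and the absence of any decay term are assumptions made for illustration. The sketch only shows that a tiled computation (masked attention inside each block, plus a running K^T V state carried across blocks) reproduces the quadratic causal reference.

```python
import random


def random_matrix(rows, cols):
    """Hypothetical helper: a rows x cols matrix of values in [-1, 1]."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def naive_causal_linear_attention(Q, K, V):
    """Reference: o_t = sum over s <= t of (q_t . k_s) * v_s (quadratic in length)."""
    d_v = len(V[0])
    out = []
    for t in range(len(Q)):
        o = [0.0] * d_v
        for s in range(t + 1):
            score = sum(a * b for a, b in zip(Q[t], K[s]))
            for j in range(d_v):
                o[j] += score * V[s][j]
        out.append(o)
    return out


def blockwise_causal_linear_attention(Q, K, V, block_size):
    """Tiled version: causal attention inside a block, running K^T V state across blocks."""
    n, d_k, d_v = len(Q), len(Q[0]), len(V[0])
    # Running sum of k_s^T v_s over all blocks that are already finished.
    kv = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        for t in range(start, end):
            # Inter-block part: q_t times the accumulated state (cost independent of t).
            o = [sum(Q[t][i] * kv[i][j] for i in range(d_k)) for j in range(d_v)]
            # Intra-block part: conventional causal attention restricted to this block.
            for s in range(start, t + 1):
                score = sum(a * b for a, b in zip(Q[t], K[s]))
                for j in range(d_v):
                    o[j] += score * V[s][j]
            out.append(o)
        # Fold this block's keys and values into the state once per block.
        for s in range(start, end):
            for i in range(d_k):
                for j in range(d_v):
                    kv[i][j] += K[s][i] * V[s][j]
    return out
```

With a fixed block size, the work per token is bounded by the block size and the state size rather than by the position in the sequence, which is the property the abstract attributes to tiling.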

arXiv.org e-Print Archive

TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

Author: Han Xiaodong
Li Dong
Luo Xiao
Lv Baohong
Qiao Yu
Qin Zhen
Shen Xuyang
Sun Weigao
Sun Weixuan
Wei Yunshen
Zhong Yiran
Publication date: 19/01/2024

We present TransNormerLLM, the first linear attention-based Large Language Model (LLM) that outperforms conventional softmax attention-based models in terms of both accuracy and efficiency. TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization. Specifically, we use LRPE together with an exponential decay to avoid attention dilution issues while allowing the model to retain global interactions between tokens. Additionally, we propose Lightning Attention, a cutting-edge technique that accelerates linear attention by more than two times in runtime and reduces memory usage by a remarkable four times. To further enhance the performance of TransNormer, we leverage a gating mechanism for smooth training and a new tensor normalization scheme to accelerate the model, resulting in an impressive acceleration of over 20%. Furthermore, we develop a robust inference algorithm that ensures numerical stability and consistent inference speed, regardless of the sequence length, showcasing superior efficiency during both training and inference stages. We also implement an efficient model parallel schema for TransNormerLLM, enabling seamless deployment on large-scale clusters and facilitating expansion to even more extensive models, i.e., LLMs with 175B parameters. We validate our model design through a series of ablations and train models with sizes of 385M, 1B, and 7B on our self-collected corpus. Benchmark results demonstrate that our models not only match the performance of state-of-the-art LLMs with Transformer but are also significantly faster. Code is released at: https://github.com/OpenNLPLab/TransnormerLLM.

Comment: Technical Report. Yiran Zhong is the corresponding author. Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen contribute equally to this paper.
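
The claim of consistent inference speed regardless of sequence length can be illustrated with a generic decayed linear attention recurrence. This is a hedged sketch, not TransNormerLLM's actual inference algorithm: the function names, the scalar `decay` parameter, and the unnormalized form are assumptions, and LRPE, gating, and normalization are omitted. It shows that a fixed-size state updated once per token yields the same outputs as summing over the whole history with exponentially decaying weights.

```python
def decayed_attention_step(state, q, k, v, decay):
    """One decoding step with a fixed d_k x d_v state instead of a growing cache.

    state_t = decay * state_{t-1} + k_t^T v_t, and o_t = q_t @ state_t.
    """
    d_k, d_v = len(k), len(v)
    new_state = [
        [decay * state[i][j] + k[i] * v[j] for j in range(d_v)]
        for i in range(d_k)
    ]
    out = [sum(q[i] * new_state[i][j] for i in range(d_k)) for j in range(d_v)]
    return new_state, out


def decayed_attention_reference(Q, K, V, decay):
    """Reference: o_t = sum over s <= t of decay**(t - s) * (q_t . k_s) * v_s."""
    d_v = len(V[0])
    out = []
    for t in range(len(Q)):
        o = [0.0] * d_v
        for s in range(t + 1):
            weight = decay ** (t - s) * sum(a * b for a, b in zip(Q[t], K[s]))
            for j in range(d_v):
                o[j] += weight * V[s][j]
        out.append(o)
    return out
```

Each call to `decayed_attention_step` touches only the d_k x d_v state, so the per-token cost does not grow with the number of tokens already processed.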

arXiv.org e-Print Archive


Stabilizing *CO2 Intermediates at the Acidic Interface using Molecularly Dispersed Cobalt Phthalocyanine as Catalysts for CO2 Reduction

Author: Cheng Dongfang
Fang Susu
Feng Shijia
Ji Liyao
Liang Yongye
Luo Yao
Sautet Philippe
Shen Mengxin
Wang Jingyang
Wang Xiaojun
Xu Weigao
Zhang Xing
Zhao Wei
Zheng Hongzhi
Zhu Jia
Publication venue: eScholarship, University of California
Publication date: 19/02/2024

CO2 electroreduction (CO2R) operating in acidic media circumvents the problems of carbonate formation and CO2 crossover in neutral/alkaline electrolyzers. Alkali cations have been universally recognized as indispensable components for acidic CO2R, but they cause the inevitable issue of salt precipitation. It is therefore desirable to realize alkali-cation-free CO2R in pure acid. However, without alkali cations, stabilizing *CO2 intermediates by the catalyst itself at the acidic interface poses a challenge. Herein, we first demonstrate that a carbon nanotube-supported molecularly dispersed cobalt phthalocyanine (CoPc@CNT) catalyst provides the Co single-atom active site with energetically localized d states to strengthen the adsorbate-surface interactions, which stabilizes *CO2 intermediates at the acidic interface (pH=1). As a result, we realize CO2 conversion to CO in pure acid with a faradaic efficiency of 60% at pH=2 in a flow cell. Furthermore, CO2 is successfully converted in a cation-exchange membrane-based electrode assembly with a faradaic efficiency of 73%. For CoPc@CNT, acidic conditions also promote the intrinsic activity of CO2R compared to alkaline conditions, since the potential-limiting step, *CO2 to *COOH, is pH-dependent. This work provides a new understanding of the stabilization of reaction intermediates and facilitates the design of catalysts and devices for acidic CO2R.

eScholarship - University of California

Immune analysis of expression of IL-17 relative ligands and their receptors in bladder cancer: comparison with polyp and cystitis

Author: A Angulo De
A Boltjes
A Dabrowska
A Takeuchi
AJ Robertson
B Ruffell
C Lamagna
H Takahashi
J Ferlay
Jing Jiang
JK Kolls
L Wang
Lijing Zhao
M Doroudchi
M Numasaki
M Ploeg
PM Hanno
Qinlong Hou
R Baharlou
R Divekar
R Pappu
RH Young
SH Chang
Sun Ying
T Benatar
T Starnes
Wanguo Yang
Weigao Shen
X Song
Y Liu
Y Logadottir
Yanbo Liu
Zhenjiang Wang
Zuowen Liang
Publication venue: Springer Science and Business Media LLC

Crossref

Oscillations of second-order nonlinear impulsive ordinary differential equations

Author: Alexander
Bainov
Bainov
Chen
Gopalsamy
Lakshmikantham
Shen
Weigao Ge
Zhang
Zhimin He
Publication venue: Elsevier BV

Crossref

On the Management of Large-Diameter Trees in China’s Forests

Author: Aihua Shen
Bo Jiang
Chuping Wu
Jiajia Liu
Shenhao Yao
Shuzhen Yang
Weigao Yuan
Publication venue: MDPI AG
Publication date: 16/01/2020

Large-diameter trees have mainly been used for timber production in forestry practices. Recently, their critical roles in biodiversity conservation and the maintenance of ecosystem functions have been recognized. However, current forestry policy on the management of large-diameter trees is weak. As China is the biggest consumer of large-diameter timber, how to maintain sustainable large-diameter timber resources while maximizing the ecological functions of forests is a critical question to address. Here we summarize historical uses, distribution patterns, and management strategies of large-diameter trees in China. We found that large-diameter trees are mainly distributed in old-growth forests. Although China’s forest cover has increased rapidly in the past decades, large-diameter trees are rarely found in plantation forests and secondary forests. We suggest that knowledge of large-diameter trees should be widely disseminated in local forestry departments, especially their irreplaceable value in terms of biodiversity conservation and ecosystem functions. Protection of large-diameter trees, especially those in old-growth forests, is critical for sustainable forestry. To meet the increasing demand for large-diameter timber, plantation forests and secondary forests should apply forest density management with thinning to cultivate more large-diameter trees.

Multidisciplinary Digital Publishing Institute

Expression and location of IL-17A, E, F and their receptors in colorectal adenocarcinoma: comparison with benign intestinal disease

Author: An Liping
Jiang Jing
Liu Yanbo
Shen Weigao
Sun Xuemei
Sun Ying
Wang Zhuxing
Yang Xueliang
Zhao Xiaohui
Publication venue: Elsevier BV
Publication date: 07/03/2018

Crossref

King's Research Portal

Spatiotemporal Variations of Aboveground Biomass under Different Terrain Conditions

Author: Aihua Shen
Bo Jiang
Chaofan Wu
Chuping Wu
Enyan Zhu
Jinsong Deng
Ke Wang
Shan He
Weigao Yuan
Yue Lin
Publication venue: MDPI AG
Publication date: 01/12/2018

Biomass is a key biophysical parameter used to estimate carbon storage and forest productivity. Spatially explicit estimation of biomass provides invaluable information for carbon stock calculation and scientific forest management. Nevertheless, there still exists large uncertainty concerning the relationship between biomass and influential factors. In this study, aboveground biomass (AGB) was estimated using the random forest algorithm based on remote sensing imagery (Landsat) and field data for three regions with different topographic conditions in Zhejiang Province, China. AGB distribution and change combined with stratified terrain classifications were analyzed to investigate the relations between AGB and topographic conditions. The results indicated that AGB in the three regions increased from 2010 to 2015 and that the magnitude of growth varied with elevation, slope, and aspect. In the basin region, slope had a greater influence on AGB, and we attributed this negative AGB-elevation relationship to ecological forest construction. In the mountain area, terrain features, especially elevation, showed significant relations with AGB. Moreover, AGB and its growth showed positive relations with elevation and slope. In the island region, slope also played a relatively more important role in explaining the relationship. These results demonstrate that AGB varies with terrain conditions and that its change is a consequence of interactions between the natural environment and anthropogenic behavior, implying that biomass retrieval based on Landsat imagery could provide important information for investigations of regional heterogeneity.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Thermal conductivity of suspended single crystal CH3NH3PbI3 platelets at room temperature

Author: Du Wenna
Ha Son Tung
Liu Xinfeng
Shang Qiuyu
Shen Chao
Wu Zhiyong
Xing Jun
Xiong Qihua
Xu Weigao
Zhang Qing
Publication venue: Royal Society of Chemistry (RSC)
Publication date: 01/01/2017

Recently, organic–inorganic lead halide perovskites have gained great attention for their breakthroughs in photovoltaics and optoelectronics. However, their thermal transport properties, which affect device lifetime and stability, are still rarely explored. In this work, the thermal conductivity properties of single crystal CH3NH3PbI3 platelets grown by chemical vapor deposition are studied via non-contact micro-photoluminescence (PL) spectroscopy. We developed a measurement methodology and derived expressions suitable for thermal conductivity extraction for micro-sized perovskites. The room temperature thermal conductivity of ∼0.14 ± 0.02 W m−1 K−1 is extracted from the dependence of the PL peak energy on the excitation laser power. On changing the film thickness from 80 to 400 nm, the thermal conductivity does not show noticeable variations, indicating minimal substrate effects due to the advantage of the suspended configuration. The ultra-low thermal conductivity of perovskites, especially thin films, suggests their promising applications for thermal isolation, such as thermal insulation and thermo-electricity.

NRF (Natl Research Foundation, S’pore); MOE (Min. of Education, S’pore). Accepted version.

Crossref

Knowledge Repository of SEMI,CAS

DR-NTU (Digital Repository of NTU)

Oscillatory criteria for third-order nonlinear difference equation with impulses

Author: Gimenes
Gimenes
Jankowski
Lakshmikantham
Li
Li
Liang
Luo
Peng
Peng
Peng
Shen
Wan
Wei
Wei Lu
Weigao Ge
Wu
Zhihong Zhao
Publication venue: Elsevier BV

Crossref