Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Linear attention is an efficient attention mechanism that has recently
emerged as a promising alternative to conventional softmax attention. With its
ability to process tokens in linear computational complexity, linear
attention, in theory, can handle sequences of unlimited length without
sacrificing speed, i.e., maintaining a constant training speed for various
sequence lengths with a fixed memory consumption. However, due to the issue
with cumulative summation (cumsum), current linear attention algorithms cannot
demonstrate their theoretical advantage in a causal setting. In this paper, we
present Lightning Attention-2, the first linear attention implementation that
enables linear attention to realize its theoretical computational benefits. To
achieve this, we leverage the idea of tiling, separately handling the
intra-block and inter-block components in linear attention calculation.
Specifically, we utilize the conventional attention computation mechanism for
the intra-blocks and apply linear attention kernel tricks for the inter-blocks.
A tiling technique is adopted through both forward and backward procedures to
take full advantage of the GPU hardware. We implement our algorithm in Triton
to make it IO-aware and hardware-friendly. Various experiments are conducted on
different model sizes and sequence lengths. Lightning Attention-2 retains
consistent training and inference speed regardless of input sequence length and
is significantly faster than other attention mechanisms. The source code is
available at https://github.com/OpenNLPLab/lightning-attention.
Comment: Technical Report. Yiran Zhong is the corresponding author.
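The intra-block/inter-block split described in the abstract above can be illustrated with a small sketch. This is not the authors' Triton kernel; it is a pure-Python toy that assumes unnormalized causal linear attention, O = mask(Q K^T) V, and the helper names (`matmul`, `causal_mask`, `reference`, `tiled`) are hypothetical. Within each block the masked product is computed directly; contributions from earlier blocks come from a running K^T V state, so no cumulative sum over individual tokens is needed.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def causal_mask(scores):
    """Zero the entries above the diagonal, i.e. attention to future tokens."""
    return [[s if j <= i else 0.0 for j, s in enumerate(row)]
            for i, row in enumerate(scores)]


def reference(q, k, v):
    """Quadratic-time baseline: O = mask(Q K^T) V, with no softmax."""
    return matmul(causal_mask(matmul(q, transpose(k))), v)


def tiled(q, k, v, block_size):
    """Blockwise evaluation of the same output (illustrative assumption)."""
    # Running sum of K^T V over all blocks already processed (d_k x d_v).
    kv = [[0.0] * len(v[0]) for _ in range(len(k[0]))]
    out = []
    for s in range(0, len(q), block_size):
        qb = q[s:s + block_size]
        kb = k[s:s + block_size]
        vb = v[s:s + block_size]
        # Intra-block: conventional masked attention product inside the tile.
        intra = matmul(causal_mask(matmul(qb, transpose(kb))), vb)
        # Inter-block: kernel trick, reuse the accumulated K^T V state.
        inter = matmul(qb, kv)
        out.extend(add(intra, inter))
        # Fold this block into the state for the blocks that follow.
        kv = add(kv, matmul(transpose(kb), vb))
    return out
```

For any block size, `tiled` should agree with `reference` up to floating-point rounding; the state `kv` has a fixed size that does not depend on the sequence length.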
TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer
We present TransNormerLLM, the first linear attention-based Large Language
Model (LLM) that outperforms conventional softmax attention-based models in
terms of both accuracy and efficiency. TransNormerLLM evolves from the previous
linear attention architecture TransNormer by making advanced modifications that
include positional embedding, linear attention acceleration, gating mechanisms,
tensor normalization, and inference acceleration and stabilization.
Specifically, we use LRPE together with an exponential decay to avoid attention
dilution issues while allowing the model to retain global interactions between
tokens. Additionally, we propose Lightning Attention, a cutting-edge technique
that accelerates linear attention by more than twice in runtime and reduces
memory usage by a remarkable four times. To further enhance the performance of
TransNormer, we leverage a gating mechanism for smooth training and a new
tensor normalization scheme to accelerate the model, resulting in an impressive
acceleration of over . Furthermore, we develop a robust inference
algorithm that ensures numerical stability and consistent inference speed,
regardless of the sequence length, showcasing superior efficiency during both
training and inference stages. We also implement an efficient model parallel
schema for TransNormerLLM, enabling seamless deployment on large-scale clusters
and facilitating expansion to even more extensive models, i.e., LLMs with 175B
parameters. We validate our model design through a series of ablations and
train models with sizes of 385M, 1B, and 7B on our self-collected corpus.
Benchmark results demonstrate that our models not only match the performance of
state-of-the-art LLMs with Transformer but are also significantly faster. Code
is released at: https://github.com/OpenNLPLab/TransnormerLLM.
Comment: Technical Report. Yiran Zhong is the corresponding author. Zhen Qin,
Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen contribute equally to this
paper.
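To make the claim of length-independent inference speed concrete, here is a generic sketch of recurrent decoding for linear attention with an exponential decay. The abstract does not specify TransNormerLLM's actual inference algorithm, so everything here is an assumption: the update rule S_t = decay * S_{t-1} + k_t^T v_t with output o_t = q_t S_t, the scalar `decay`, and the function names `decode_step` and `decode` are illustrative only. The point is that each step touches a fixed-size state, so per-token cost stays constant as the sequence grows.

```python
def decode_step(state, q_t, k_t, v_t, decay):
    """Advance the d_k x d_v state by one token; return (new_state, output)."""
    # Decay the old state, then add the outer product k_t^T v_t.
    new_state = [[decay * s + k_i * v_j for s, v_j in zip(row, v_t)]
                 for row, k_i in zip(state, k_t)]
    # Output is the query times the updated state: o_t = q_t S_t.
    out = [sum(q_i * row[j] for q_i, row in zip(q_t, new_state))
           for j in range(len(v_t))]
    return new_state, out


def decode(qs, ks, vs, decay):
    """Run the recurrence over a whole sequence, one token at a time."""
    state = [[0.0] * len(vs[0]) for _ in range(len(ks[0]))]
    outs = []
    for q_t, k_t, v_t in zip(qs, ks, vs):
        state, o_t = decode_step(state, q_t, k_t, v_t, decay)
        outs.append(o_t)
    return outs
```

Unrolling the recurrence gives o_t as the sum over s <= t of decay^(t-s) (q_t . k_s) v_s, so older tokens are down-weighted geometrically while still contributing to the output.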
Stabilizing *CO2 Intermediates at the Acidic Interface using Molecularly Dispersed Cobalt Phthalocyanine as Catalysts for CO2 Reduction
CO2 electroreduction (CO2R) operating in acidic media circumvents the problems of carbonate formation and CO2 crossover in neutral/alkaline electrolyzers. Alkali cations have been universally recognized as indispensable components for acidic CO2R, but they cause the inevitable issue of salt precipitation. It is therefore desirable to realize alkali-cation-free CO2R in pure acid. However, without alkali cations, stabilizing *CO2 intermediates by the catalyst itself at the acidic interface poses a challenge. Herein, we first demonstrate that a carbon nanotube-supported molecularly dispersed cobalt phthalocyanine (CoPc@CNT) catalyst provides the Co single-atom active site with energetically localized d states to strengthen the adsorbate-surface interactions, which stabilizes *CO2 intermediates at the acidic interface (pH=1). As a result, we realize CO2 conversion to CO in pure acid with a faradaic efficiency of 60% at pH=2 in a flow cell. Furthermore, CO2 is successfully converted in a cation-exchange membrane-based electrode assembly with a faradaic efficiency of 73%. For CoPc@CNT, acidic conditions also promote the intrinsic activity of CO2R compared to alkaline conditions, since the potential-limiting step, *CO2 to *COOH, is pH-dependent. This work provides a new understanding of the stabilization of reaction intermediates and facilitates the design of catalysts and devices for acidic CO2R.
On the Management of Large-Diameter Trees in China’s Forests
Large-diameter trees have mainly been used for timber production in forestry practices. Recently, their critical roles in biodiversity conservation and the maintenance of ecosystem functions have been recognized. However, current forestry policy on the management of large-diameter trees is weak. As China is the biggest consumer of large-diameter timber, how to maintain sustainable large-diameter timber resources while maximizing the ecological functions of forests is a critical question to address. Here we summarize historical uses, distribution patterns, and management strategies of large-diameter trees in China. We found that large-diameter trees are mainly distributed in old-growth forests. Although China’s forest cover has increased rapidly in the past decades, large-diameter trees are rarely found in plantation forests and secondary forests. We suggest that knowledge of large-diameter trees should be widely disseminated in local forestry departments, especially their irreplaceable value in terms of biodiversity conservation and ecosystem functions. Protection of large-diameter trees, especially those in old-growth forests, is critical for sustainable forestry. To meet the increasing demand for large-diameter timber, plantation forests and secondary forests should apply forest density management with thinning to cultivate more large-diameter trees.
Spatiotemporal Variations of Aboveground Biomass under Different Terrain Conditions
Biomass is a key biophysical parameter used to estimate carbon storage and forest productivity. Spatially explicit estimation of biomass provides invaluable information for carbon stock calculation and scientific forest management. Nevertheless, there is still large uncertainty concerning the relationship between biomass and influential factors. In this study, aboveground biomass (AGB) was estimated using the random forest algorithm based on remote sensing imagery (Landsat) and field data for three regions with different topographic conditions in Zhejiang Province, China. AGB distribution and change combined with stratified terrain classifications were analyzed to investigate the relations between AGB and topographic conditions. The results indicated that AGB in the three regions increased from 2010 to 2015 and the magnitude of growth varied with elevation, slope, and aspect. In the basin region, slope had a greater influence on AGB, and we attributed this negative AGB-elevation relationship to ecological forest construction. In the mountain area, terrain features, especially elevation, showed significant relations with AGB. Moreover, AGB and its growth showed positive relations with elevation and slope. In the island region, slope also played a relatively more important role in explaining the relationship. These results demonstrate that AGB varies with terrain conditions and its change is a consequence of interactions between the natural environment and anthropogenic behavior, implying that biomass retrieval based on Landsat imagery could provide important information for investigations of regional heterogeneity.
Thermal conductivity of suspended single crystal CH3NH3PbI3 platelets at room temperature
Recently, organic–inorganic lead halide perovskites have gained great attention for their breakthroughs in photovoltaics and optoelectronics. However, their thermal transport properties, which affect device lifetime and stability, are still rarely explored. In this work, the thermal conductivity of single crystal CH3NH3PbI3 platelets grown by chemical vapor deposition is studied via non-contact micro-photoluminescence (PL) spectroscopy. We developed a measurement methodology and derived expressions suitable for thermal conductivity extraction for micro-sized perovskites. The room temperature thermal conductivity of ∼0.14 ± 0.02 W m−1 K−1 is extracted from the dependence of the PL peak energy on the excitation laser power. On changing the film thickness from 80 to 400 nm, the thermal conductivity does not show noticeable variations, indicating minimal substrate effects due to the advantage of the suspended configuration. The ultra-low thermal conductivity of perovskites, especially thin films, suggests their promising applications for thermal isolation, such as thermal insulation and thermo-electricity.
NRF (Natl Research Foundation, S’pore); MOE (Min. of Education, S’pore); Accepted version