27 research outputs found

    Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

    Full text link
    Linear attention is an efficient attention mechanism that has recently emerged as a promising alternative to conventional softmax attention. With its ability to process tokens in linear computational complexity, linear attention can, in theory, handle sequences of unlimited length without sacrificing speed, i.e., maintaining a constant training speed for various sequence lengths with a fixed memory consumption. However, due to the issue with cumulative summation (cumsum), current linear attention algorithms cannot demonstrate their theoretical advantage in a causal setting. In this paper, we present Lightning Attention-2, the first linear attention implementation that enables linear attention to realize its theoretical computational benefits. To achieve this, we leverage the idea of tiling, separately handling the intra-block and inter-block components in linear attention calculation. Specifically, we utilize the conventional attention computation mechanism for the intra-blocks and apply linear attention kernel tricks for the inter-blocks. A tiling technique is adopted through both forward and backward procedures to take full advantage of the GPU hardware. We implement our algorithm in Triton to make it IO-aware and hardware-friendly. Various experiments are conducted on different model sizes and sequence lengths. Lightning Attention-2 retains consistent training and inference speed regardless of input sequence length and is significantly faster than other attention mechanisms. The source code is available at https://github.com/OpenNLPLab/lightning-attention. Comment: Technical Report. Yiran Zhong is the corresponding author.
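The intra-block / inter-block split described in this abstract can be illustrated with a minimal NumPy sketch. This is not the authors' Triton kernel: it assumes plain unnormalized causal linear attention, omits any decay, normalization, or GPU-specific tiling details, and the function names and `block_size` parameter are hypothetical.

```python
import numpy as np


def causal_linear_attention_naive(q, k, v):
    """Reference: masked (Q K^T) V, quadratic in sequence length."""
    n = q.shape[0]
    mask = np.tril(np.ones((n, n)))  # causal mask: token t sees tokens <= t
    return ((q @ k.T) * mask) @ v


def causal_linear_attention_tiled(q, k, v, block_size=4):
    """Block-wise version: masked attention inside a block, running K^T V across blocks."""
    n, d = q.shape
    out = np.zeros((n, v.shape[1]))
    kv = np.zeros((d, v.shape[1]))  # accumulated K^T V of all previous blocks
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        qb, kb, vb = q[start:end], k[start:end], v[start:end]
        # Intra-block: conventional masked attention on the small block.
        mask = np.tril(np.ones((end - start, end - start)))
        intra = ((qb @ kb.T) * mask) @ vb
        # Inter-block: kernel trick, Q times the accumulated K^T V state.
        inter = qb @ kv
        out[start:end] = intra + inter
        # Fold the current block into the state for later blocks.
        kv += kb.T @ vb
    return out


rng = np.random.default_rng(0)
q = rng.standard_normal((10, 3))
k = rng.standard_normal((10, 3))
v = rng.standard_normal((10, 5))
same = np.allclose(causal_linear_attention_tiled(q, k, v),
                   causal_linear_attention_naive(q, k, v))
```

In this sketch the state `kv` has a fixed shape regardless of sequence length, so the work per block stays constant as the sequence grows, which is the property the abstract attributes to the tiled formulation.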

    TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

    Full text link
    We present TransNormerLLM, the first linear attention-based Large Language Model (LLM) that outperforms conventional softmax attention-based models in terms of both accuracy and efficiency. TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization. Specifically, we use LRPE together with an exponential decay to avoid attention dilution issues while allowing the model to retain global interactions between tokens. Additionally, we propose Lightning Attention, a cutting-edge technique that makes linear attention more than twice as fast in runtime and reduces memory usage by a remarkable four times. To further enhance the performance of TransNormer, we leverage a gating mechanism for smooth training and a new tensor normalization scheme to accelerate the model, resulting in an impressive acceleration of over 20%. Furthermore, we develop a robust inference algorithm that ensures numerical stability and consistent inference speed, regardless of the sequence length, showcasing superior efficiency during both training and inference stages. We also implement an efficient model parallel schema for TransNormerLLM, enabling seamless deployment on large-scale clusters and facilitating expansion to even more extensive models, i.e., LLMs with 175B parameters. We validate our model design through a series of ablations and train models with sizes of 385M, 1B, and 7B on our self-collected corpus. Benchmark results demonstrate that our models not only match the performance of state-of-the-art LLMs with Transformer but are also significantly faster. Code is released at: https://github.com/OpenNLPLab/TransnormerLLM. Comment: Technical Report. Yiran Zhong is the corresponding author. Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen contribute equally to this paper.
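The claim of inference speed that does not depend on sequence length can be illustrated with a recurrent form of linear attention. The sketch below is an assumption about the general shape of such an algorithm, not the TransNormerLLM implementation: it uses a single scalar `decay` factor, leaves out LRPE, gating, and normalization, and all function names are hypothetical.

```python
import numpy as np


def decayed_attention_reference(q, k, v, decay):
    """Quadratic reference: weight of token s for query t is decay**(t - s), s <= t."""
    n = q.shape[0]
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]  # t - s for every (query, key) pair
    weights = np.where(diff >= 0, decay ** np.maximum(diff, 0), 0.0)
    return ((q @ k.T) * weights) @ v


def decayed_attention_recurrent(q, k, v, decay):
    """Token-by-token inference with a fixed-size state (assumed form)."""
    n, d = q.shape
    state = np.zeros((d, v.shape[1]))  # decayed running sum of outer(k_s, v_s)
    out = np.zeros((n, v.shape[1]))
    for t in range(n):
        # Older contributions fade by `decay`; the new token is added at full weight.
        state = decay * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out


rng = np.random.default_rng(0)
q = rng.standard_normal((12, 4))
k = rng.standard_normal((12, 4))
v = rng.standard_normal((12, 6))
matches = np.allclose(decayed_attention_recurrent(q, k, v, 0.9),
                      decayed_attention_reference(q, k, v, 0.9))
```

Each step touches only the current token and a state of fixed shape, so the per-token cost in this sketch is the same at position 10 as at position 10,000.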

    On the Management of Large-Diameter Trees in China's Forests

    No full text
    Large-diameter trees have mainly been used for timber production in forestry practices. Recently, their critical roles in biodiversity conservation and maintenance of ecosystem functions have been recognized. However, current forestry policy on the management of large-diameter trees is weak. As China is the biggest consumer of large-diameter timber, how to maintain sustainable large-diameter timber resources while maximizing the ecological functions of the forests is a critical question to address. Here we summarize historical uses, distribution patterns, and management strategies of large-diameter trees in China. We found that large-diameter trees are mainly distributed in old-growth forests. Although China's forest cover has increased rapidly in the past decades, large-diameter trees are rarely found in plantation forests and secondary forests. We suggest that knowledge of large-diameter trees should be widely disseminated in local forestry departments, especially their irreplaceable value in terms of biodiversity conservation and ecosystem functions. Protection of large-diameter trees, especially those in old-growth forests, is critical for sustainable forestry. To meet the increasing demand for large-diameter timber, plantation forests and secondary forests should apply forest density management with thinning to cultivate more large-diameter trees.

    Spatiotemporal Variations of Aboveground Biomass under Different Terrain Conditions

    No full text
    Biomass is a key biophysical parameter used to estimate carbon storage and forest productivity. Spatially-explicit estimation of biomass provides invaluable information for carbon stock calculation and scientific forest management. Nevertheless, there still exists large uncertainty concerning the relationship between biomass and influential factors. In this study, aboveground biomass (AGB) was estimated using the random forest algorithm based on remote sensing imagery (Landsat) and field data for three regions with different topographic conditions in Zhejiang Province, China. AGB distribution and change combined with stratified terrain classifications were analyzed to investigate the relations between AGB and topography conditions. The results indicated that AGB in three regions increased from 2010 to 2015 and the magnitude of growth varied with elevation, slope, and aspect. In the basin region, slope had a greater influence on AGB, and we attributed this negative AGB-elevation relationship to ecological forest construction. In the mountain area, terrain features, especially elevation, showed significant relations with AGB. Moreover, AGB and its growth showed positive relations with elevation and slope. In the island region, slope also played a relatively more important role in explaining the relationship. These results demonstrate that AGB varies with terrain conditions and its change is a consequence of interactions between the natural environment and anthropogenic behavior, implying that biomass retrieval based on Landsat imagery could provide important information for regional heterogeneity investigations.

    Thermal conductivity of suspended single crystal CH3NH3PbI3 platelets at room temperature

    No full text
    Recently, organic–inorganic lead halide perovskites have gained great attention for their breakthroughs in photovoltaics and optoelectronics. However, their thermal transport properties, which affect device lifetime and stability, are still rarely explored. In this work, the thermal conductivity properties of single crystal CH3NH3PbI3 platelets grown by chemical vapor deposition are studied via non-contact micro-photoluminescence (PL) spectroscopy. We developed a measurement methodology and derived expressions suitable for the thermal conductivity extraction for micro-sized perovskites. The room temperature thermal conductivity of ∼0.14 ± 0.02 W m⁻¹ K⁻¹ is extracted from the dependence of the PL peak energy on the excitation laser power. On changing the film thickness from 80 to 400 nm, the thermal conductivity does not show noticeable variations, indicating the minimal substrate effects due to the advantage of the suspended configuration. The ultra-low thermal conductivity of perovskites, especially thin films, suggests their promising applications for thermal isolation, such as thermal insulation and thermo-electricity. NRF (Natl Research Foundation, S'pore); MOE (Min. of Education, S'pore). Accepted version.