136 research outputs found

    Efficient Streaming Language Models with Attention Sinks

    Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, during the decoding stage, caching previous tokens' Key and Value states (KV) consumes extensive memory. Secondly, popular LLMs cannot generalize to longer texts than the training sequence length. Window attention, where only the most recent KVs are cached, is a natural approach -- but we show that it fails when the text length surpasses the cache size. We observe an interesting phenomenon, namely attention sink, that keeping the KV of initial tokens will largely recover the performance of window attention. In this paper, we first demonstrate that the emergence of attention sink is due to the strong attention scores towards initial tokens as a "sink" even if they are not semantically important. Based on the above analysis, we introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence lengths without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more. In addition, we discover that adding a placeholder token as a dedicated attention sink during pre-training can further improve streaming deployment. In streaming settings, StreamingLLM outperforms the sliding window recomputation baseline by up to 22.2x speedup. Code and datasets are provided at https://github.com/mit-han-lab/streaming-llm. Comment: ICLR 202
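
    The cache policy this abstract describes (keep the KV of the initial tokens, plus a window of the most recent ones) can be illustrated with a toy sketch. This is not the authors' implementation: the class name `SinkWindowCache`, the parameters `num_sink` and `window`, and the use of integers in place of real key/value tensors are all assumptions made for illustration.

```python
from collections import deque


class SinkWindowCache:
    """Toy KV cache: the first `num_sink` entries are kept forever,
    plus a rolling window of the `window` most recent entries.
    (Hypothetical names; a sketch, not the StreamingLLM code.)"""

    def __init__(self, num_sink=4, window=8):
        self.num_sink = num_sink
        self.sinks = []                     # entries for the initial tokens
        self.recent = deque(maxlen=window)  # rolling window of recent entries

    def append(self, kv):
        if len(self.sinks) < self.num_sink:
            self.sinks.append(kv)
        else:
            # A full deque silently drops its oldest entry on append.
            self.recent.append(kv)

    def entries(self):
        return self.sinks + list(self.recent)


cache = SinkWindowCache(num_sink=2, window=3)
for token_position in range(10):
    cache.append(token_position)  # an integer stands in for a (key, value) pair

print(cache.entries())  # [0, 1, 7, 8, 9]
```

    Plain window attention corresponds to `num_sink=0` here; the memory footprint stays bounded at `num_sink + window` entries regardless of how many tokens are streamed.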

    SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

    Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, for LLMs beyond 100 billion parameters, existing methods cannot maintain accuracy or do not run efficiently on hardware. We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs that can be implemented efficiently. We observe that systematic outliers appear at fixed activation channels. Based on the fact that weights are easy to quantize while activations are not, SmoothQuant smooths the activation outliers by offline migrating the quantization difficulty from activations to weights with a mathematically equivalent transformation. SmoothQuant enables an INT8 quantization of both weights and activations for all the GEMMs in LLMs, including OPT-175B, BLOOM-176B, and GLM-130B. SmoothQuant has better hardware efficiency than existing techniques using mixed-precision activation quantization or weight-only quantization. We demonstrate up to 1.56x speedup and 2x memory reduction for LLMs with negligible loss in accuracy. Thanks to the hardware-friendly design, we integrate SmoothQuant into FasterTransformer, a state-of-the-art LLM serving framework, and achieve faster inference speed with half the number of GPUs compared to FP16. Our work offers a turn-key solution that reduces hardware costs and democratizes LLMs. Code is available at: https://github.com/mit-han-lab/smoothquant. Comment: The first two authors contributed equally to this work.
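
    The "mathematically equivalent transformation" mentioned in this abstract can be sketched in a few lines: divide each activation channel by a per-channel factor and multiply the matching weight row by the same factor, so the product is unchanged while the activation outliers shrink. The sketch below is an illustration only, not the released code. The function names, the toy matrices, and the migration strength `alpha=0.5` are assumptions, and it assumes every channel has a nonzero maximum.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smooth(x, w, alpha=0.5):
    """Scale activation channel j down by s_j and weight row j up by s_j,
    with s_j = max|x[:, j]|**alpha / max|w[j, :]|**(1 - alpha).
    `alpha` is an assumed hyperparameter for this sketch."""
    n_channels = len(w)
    scales = []
    for j in range(n_channels):
        act_max = max(abs(row[j]) for row in x)
        w_max = max(abs(v) for v in w[j])
        scales.append(act_max ** alpha / w_max ** (1 - alpha))
    x_s = [[row[j] / scales[j] for j in range(n_channels)] for row in x]
    w_s = [[scales[j] * v for v in w[j]] for j in range(n_channels)]
    return x_s, w_s


# Toy data: activation channel 1 carries outliers, weights are well behaved.
x = [[0.5, 80.0], [-1.0, -60.0]]
w = [[0.2, -0.4], [0.1, 0.3]]

x_s, w_s = smooth(x, w)
y, y_s = matmul(x, w), matmul(x_s, w_s)

max_before = max(abs(v) for row in x for v in row)
max_after = max(abs(v) for row in x_s for v in row)
print(max_before, max_after)  # the activation range shrinks
print(y, y_s)                 # the two products agree up to rounding
```

    Because the scaling depends only on calibration statistics, it can be folded into the weights offline; the quantization step itself is omitted from this sketch.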

    The Ecological Restoration of Heavily Degraded Saline Wetland in the Yellow River Delta

    As a result of discontinuous water flow, agriculture, and increasing urban use of fresh water affecting the natural wetlands of the Yellow River Delta, these areas have experienced significant degradation in the past two decades, ultimately diminishing the overall natural wetland land area in the region. This study aimed to address the issue of decreasing fresh water in the Yellow River Delta by studying the effects of three different approaches to restoration on long-term wetland recovery. The results of the study demonstrated that soil salt and available Na contents significantly decreased in response to all three restoration treatments. Impacts of the restoration treatments were more significant in 2009 than in 2010, as shown by the high rate of activity in the reed debris group. The highest phosphatase activity of the experimental period was also observed in the reed debris group. Meanwhile, a marked variation in soil nutrient elements (total carbon (TC), total nitrogen (TN), available phosphorus, and available potassium) was observed in the restoration treatment plots throughout the experimental period. TC and TN contents were generally higher in the restoration treatment groups than in the control group. Moreover, urease and phosphatase activity levels were highly correlated with one another, as well as with soil nutrient elements. In 2009, the yield of the Suaeda salsa plant was highest in the reed debris treatment group and lowest in the ploughing treatment group. The S. salsa plant did show a positive response to all of the different restoration treatments. Taken together, these results suggest that restoration approaches that implement ploughing techniques aided in the restoration of degraded saline wetlands.

    BitDelta: Your Fine-Tune May Only Be Worth One Bit

    Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demand of pre-training, it's intuitive to assume that fine-tuning adds less new information to the model, and is thus more compressible. We explore this assumption by decomposing the weights of fine-tuned models into their pre-trained components and an additional delta. We introduce a simple method, BitDelta, which successfully quantizes this delta down to 1 bit without compromising performance. This interesting finding not only highlights the potential redundancy of information added during fine-tuning, but also has significant implications for the multi-tenant serving and multi-tenant storage of fine-tuned models. By enabling the use of a single high-precision base model accompanied by multiple 1-bit deltas, BitDelta dramatically reduces GPU memory requirements by more than 10x, which can also be translated to enhanced generation latency in multi-tenant settings. We validate BitDelta through experiments across Llama-2 and Mistral model families, and on models up to 70B parameters, showcasing minimal performance degradation over all tested settings.
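
    The decomposition this abstract describes (fine-tuned weights = pre-trained weights + delta, with the delta stored at 1 bit per weight) can be illustrated as below. This is a rough sketch rather than the BitDelta implementation: the function names and toy weight vectors are hypothetical, and using the mean absolute delta as the single shared scale is an assumption of this sketch.

```python
def compress_delta(base, fine):
    """Reduce the fine-tuning delta to one sign per weight plus one shared scale.
    The scale (mean absolute delta) is an assumed choice for this sketch."""
    delta = [f - b for f, b in zip(fine, base)]
    scale = sum(abs(d) for d in delta) / len(delta)
    signs = [1 if d >= 0 else -1 for d in delta]
    return signs, scale


def reconstruct(base, signs, scale):
    """Approximate the fine-tuned weights from the base model and the 1-bit delta."""
    return [b + scale * s for b, s in zip(base, signs)]


base = [0.10, -0.20, 0.30, 0.00]   # toy pre-trained weights
fine = [0.12, -0.23, 0.31, -0.02]  # toy fine-tuned weights

signs, scale = compress_delta(base, fine)
approx = reconstruct(base, signs, scale)

print(signs)   # [1, -1, 1, -1]
print(approx)  # close to `fine`, stored as 4 sign bits plus one float
```

    In a multi-tenant setting, one copy of `base` would be shared and only the `signs` and `scale` would be stored per fine-tuned model.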

    Moderate increase of precipitation stimulates CO2 production by regulating soil organic carbon in a saltmarsh

    Saltmarsh is widely recognized as a blue carbon ecosystem with great carbon storage potential. Yet soil respiration, a major contributor of atmospheric CO2, can offset its carbon sink function. To date, the mechanisms governing CO2 emissions from saltmarsh soil remain unclear. In particular, the effect of precipitation on soil CO2 emissions is unclear in coastal wetlands, due to the lack of field data under real conditions. We conducted a 7-year field manipulation experiment in a saltmarsh in the Yellow River Delta, China. Soil respiration in five treatments (−60%, −40%, +0%, +40%, and +60% of precipitation) was measured in the field. Topsoils from the last 3 years (2019–2021) were analyzed for CO2 production potential by microcosm experiments. Furthermore, the quality and quantity of soil organic carbon and microbial function were tested. Results show that only the moderate precipitation rise of +40% induced a 66.2% increase in CO2 production potential in the microcosm experiments, whereas the other treatments showed a weak impact. Consistently, soil respiration was also found to be strongest at +40%. The CO2 production potential is positively correlated with soil organic carbon, including carbon quantity and quality. But microbial diversity did not show any positive response to precipitation amounts. r-/K-strategy seemed to be a plausible explanation for biological factors. Overall, our findings reveal that a moderate precipitation increase, rather than a decrease or a robust increase, in a saltmarsh is likely to improve soil organic carbon quality and quantity, and the bacterial oligotroph:copiotroph ratio, ultimately leading to enhanced CO2 production.

    Atmospheric nitrous acid (HONO) at a rural coastal site in North China: Seasonal variations and effects of biomass burning

    Nitrous acid (HONO) plays a significant role in atmospheric chemistry due to its contribution to the hydroxyl radical (OH). However, no scientific consensus has been reached about the daytime HONO formation mechanisms. To identify the seasonal variations of HONO chemistry and the impacts of biomass burning (BB), we performed a two-phase field study in winter-spring and summer (covering a harvest season) in 2017 at a rural coastal site in North China. Though the mean HONO concentration in winter-spring (0.26 ± 0.28 ppbv) was higher than in summer (0.17 ± 0.19 ppbv), the maximum HONO concentrations were comparable (~2 ppbv) in the two campaigns. Both the HONO/NOx ratio and the nocturnal heterogeneous conversion efficiency of HONO (C_HONO) in summer were more than twice those in winter-spring. The daytime budget analysis also revealed that the strength of P_other (i.e., the HONO sources apart from the reaction of OH + NO) in summer was double that in winter-spring. BB affected the HONO concentration by enhancing the contribution of heterogeneous HONO production on the aerosol surface but weakening the role of photo-related HONO formation. HONO photolysis was a significant source of OH in both winter-spring and summer, and its contribution could be further enhanced during the BB episode in summer. Our study demonstrates the significant seasonal variations of HONO and the effects of BB, and suggests the need for more multi-season observations and consideration of BB, especially during the harvest time, in HONO research.

    Wet and Dry Atmospheric Depositions of Inorganic Nitrogen during Plant Growing Season in the Coastal Zone of Yellow River Delta

    The ecological problems caused by dry and wet deposition of atmospheric nitrogen have been of widespread concern around the world. In this study, wet and dry atmospheric depositions were monitored during the plant growing season in the coastal zone of the Yellow River Delta (YRD) using automatic sampling equipment. The results showed that SO₄²⁻ and Na⁺ were the predominant anion and cation, respectively, in both wet and dry atmospheric depositions. The total atmospheric nitrogen deposition was ~2264.24 mg m⁻², of which dry atmospheric nitrogen deposition was about 32.02%. The highest values of dry and wet atmospheric nitrogen deposition appeared in May and August, respectively. In the studied area, NO₃⁻-N was the main nitrogen form in dry deposition, while the predominant nitrogen form in wet atmospheric deposition was NH₄⁺-N, with ~56.51% of total wet atmospheric nitrogen deposition. The average monthly attribution rate of atmospheric deposition of NO₃⁻-N and NH₄⁺-N was ~31.38% and ~20.50% for the contents of NO₃⁻-N and NH₄⁺-N in the 0–10 cm soil layer, respectively, suggesting that atmospheric nitrogen was one of the main sources of soil nitrogen in the coastal zone of the YRD.

    Fungi and cercozoa regulate methane-associated prokaryotes in wetland methane emissions

    Wetlands are natural sources of methane (CH4) emissions, providing the largest contribution to the atmospheric CH4 pool. Changes in the ecohydrological environment of coastal salt marshes, especially the surface inundation level, cause instability in the CH4 emission levels of coastal ecosystems. Although soil methane-associated microorganisms play key roles in both CH4 generation and metabolism, how other microorganisms regulate methane emission and how they respond to inundation have not been investigated. Here, we studied the responses of prokaryotic, fungal and cercozoan communities following 5 years of inundation treatments in a wetland experimental site, and molecular ecological networks (MENs) were constructed to characterize the interdomain relationships. The results showed that the degree of inundation significantly altered the CH4 emissions, that the abundance of the pmoA gene for methanotrophs shifted more significantly than that of the mcrA gene for methanogens, and that both showed significant positive correlations with methane flux. Additionally, we found that inundation significantly altered the diversity of the prokaryotic and fungal communities, as well as the composition of key species in interactions within prokaryotic, fungal, and cercozoan communities. Mantel tests indicated that the structure of the three communities showed significant correlations with methane emissions (p < 0.05), suggesting that all three microbial communities directly or indirectly contributed to the methane emissions of this ecosystem. Correspondingly, the interdomain networks among microbial communities revealed that methane-associated prokaryotic and cercozoan OTUs were all keystone taxa. Methane-associated OTUs were more likely to interact in pairs and correlated negatively with the fungal and cercozoan communities. In addition, the modules significantly positively correlated with methane flux were affected by environmental stress (i.e., pH) and soil nutrients (i.e., total nitrogen, total phosphorus and organic matter), suggesting that these factors tend to positively regulate methane flux by regulating microbial relationships under inundation. Our findings demonstrated that inundation altered microbial communities in coastal wetlands, and that the fungal and cercozoan communities played vital roles in regulating methane emission through microbial interactions with the methane-associated community.

    The ecological adaptability of Phragmites australis to interactive effects of water level and salt stress in the Yellow River Delta

    Soil salinity and waterlogging are two major environmental problems in estuarine wetlands. To prevent typical wetland plants from degradation by soil salinization and salt waterlogging, and to use the plants more effectively to provide wetland ecosystem services, we examined the ecological adaptability of Phragmites australis, a characteristic plant species in the Yellow River Delta, to the interactive effects of water level and salt stress. The results showed that P. australis adapts to salt- and water-table-stressed environments by slowing down the growth rate, maintaining the tiller number, and adjusting the biomass allocation of different organs. The highest plant height and the largest leaf area were at the 0 cm water table treatment; the 0.5% NaCl treatment increased the aboveground biomass; a higher water table increased the fibrous root biomass allocation, but largely decreased the leaf biomass. The exclusion of toxic inorganic ions such as Na⁺ and Cl⁻ and the accumulation of organic solutes are also important mechanisms to aid survival in saline wetlands. On average, 35.1% of Cl⁻ and 53.9% of Na⁺ accumulated in belowground organs. The study could provide fundamental guidance for wetland restoration projects and sustainable wetland use in coastal zones such as the Yellow River Delta.