Extending LLMs' Context Window with 100 Samples
Large Language Models (LLMs) are known to have limited extrapolation ability
beyond their pre-trained context window, constraining their application in
downstream tasks with lengthy inputs. Recent studies have sought to extend
LLMs' context window by modifying rotary position embedding (RoPE), a popular
position encoding method adopted by well-known LLMs such as LLaMA, PaLM, and
GPT-NeoX. However, prior works like Position Interpolation (PI) and YaRN are
resource-intensive and lack comparative experiments to assess their
applicability. In this work, we identify the inherent need for LLMs' attention
entropy (i.e. the information entropy of attention scores) to maintain
stability and introduce a novel extension to RoPE which combines adjusting
RoPE's base frequency and scaling the attention logits to help LLMs efficiently
adapt to a larger context window. We validate the superiority of our method in
both fine-tuning performance and robustness across different context window
sizes on various context-demanding tasks. Notably, our method extends the
context window of LLaMA-2-7B-Chat to 16,384 with only 100 samples and 6
training steps, showcasing extraordinary efficiency. Finally, we also explore
how data compositions and training curricula affect context window extension
for specific downstream tasks, suggesting fine-tuning LLMs with lengthy
conversations as a good starting point. We release our code and SFT data at
https://github.com/GAIR-NLP/Entropy-ABF
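The two ingredients named above can be sketched independently. Below is a minimal, illustrative sketch, assuming a HuggingFace-style RoPE cache and standard scaled dot-product attention: (a) enlarging RoPE's base so low-frequency dimensions rotate more slowly across a longer window, and (b) rescaling the pre-softmax logits to keep attention entropy roughly stable. The `base_scale` value and the log-length scaling heuristic are assumptions for illustration, not the paper's exact recipe (see the repository above for that).

```python
import math
import torch

def rope_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    # Per-pair rotation frequencies used by rotary position embedding.
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

def extended_rope_cache(head_dim: int, max_pos: int,
                        base: float = 10000.0, base_scale: float = 50.0):
    # Adjusted base frequency: a larger base slows the low-frequency
    # rotations, so positions beyond the original window stay distinguishable.
    freqs = rope_frequencies(head_dim, base * base_scale)
    angles = torch.outer(torch.arange(max_pos).float(), freqs)
    return angles.cos(), angles.sin()  # substituted wherever RoPE is applied

def scaled_attention_logits(q, k, train_len: int, cur_len: int):
    # Scaled attention logits: rescale pre-softmax scores so attention
    # entropy stays near its value at the original training length;
    # log-length scaling is one common heuristic (an assumption here).
    scale = max(1.0, math.log(cur_len) / math.log(train_len))
    return scale * (q @ k.transpose(-2, -1)) / math.sqrt(q.size(-1))
```

In this toy form, `extended_rope_cache` would replace the usual cos/sin cache construction and `scaled_attention_logits` the score computation before softmax, ahead of a short fine-tuning run such as the 100-sample, 6-step one reported above.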
Self-Prompting Large Language Models for Zero-Shot Open-Domain QA
Open-Domain Question Answering (ODQA) aims to answer questions without
explicitly providing specific background documents. This task becomes notably
challenging in a zero-shot setting where no data is available to train tailored
retrieval-reader models. While recent Large Language Models (LLMs) like GPT-3
have demonstrated their effectiveness in zero-shot ODQA using direct prompting
methods, these methods still fall short of fully harnessing the potential of
LLMs when implicitly invoked. In this paper, we propose a Self-Prompting
framework to explicitly utilize the massive knowledge encoded in the parameters
of LLMs and their strong instruction understanding abilities. Concretely, we
prompt LLMs step by step to generate multiple pseudo QA pairs with background
passages and explanations entirely from scratch. These generated elements are
then utilized for in-context learning. Experimental results show that our
method significantly surpasses previous state-of-the-art zero-shot methods on
three widely-used ODQA datasets and even achieves comparable performance with
various customized fine-tuned models on full training data. Our code is
available at https://github.com/lockon-n/self-prompting.
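As a rough sketch of the pipeline described above, with `llm` standing in for any text-completion callable and the prompt wording an illustrative assumption rather than the paper's exact templates:

```python
from typing import Callable, Dict, List

def generate_pseudo_examples(llm: Callable[[str], str],
                             topic: str, n: int) -> List[Dict[str, str]]:
    # Step-by-step self-prompting: the LLM writes its own background
    # passage, QA pair, and explanation, entirely from scratch.
    examples = []
    for _ in range(n):
        passage = llm(f"Write a short Wikipedia-style passage about {topic}.")
        qa = llm(f"Passage: {passage}\nWrite one factual question about this "
                 f"passage and its short answer, formatted as 'Q: ... A: ...'.")
        explanation = llm(f"Passage: {passage}\n{qa}\nExplain in one sentence "
                          f"why the answer is correct.")
        examples.append({"passage": passage, "qa": qa,
                         "explanation": explanation})
    return examples

def answer_with_self_prompting(llm: Callable[[str], str],
                               examples: List[Dict[str, str]],
                               question: str) -> str:
    # The self-generated elements serve as in-context demonstrations.
    demos = "\n\n".join(f"{e['passage']}\n{e['qa']}\nExplanation: "
                        f"{e['explanation']}" for e in examples)
    return llm(f"{demos}\n\nQ: {question}\nA:")
```

The self-generated demonstrations play the role that retrieved documents and annotated QA pairs would play in a trained retrieval-reader system, which is what keeps the setup zero-shot.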
Government regulation of emergency supplies under the epidemic crisis
This paper constructs a multi-oligopoly model of emergency supplies and analyses the market equilibrium results under normal and epidemic conditions. The impacts of the degree of change in market demand, externalities, the material cost of emergency supplies, and government regulation on the equilibrium results, especially on the prices of emergency supplies, are discussed. The results show that an increase in material cost leads to lower output and social welfare and a higher price, under either normal or epidemic conditions. Moreover, under epidemic conditions, the degree of change in market demand, externalities, material cost, and the presence and mode of government regulation all have multiple and complex influences on the equilibrium results. Under epidemic conditions, both government output regulation and price regulation can increase the supply of emergency supplies. In addition, when market demand changes drastically, consumer surplus and social welfare can be enhanced by the implementation of regulations. In particular, price regulation is more effective when the material cost is high.
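The stated effect of material cost can be checked in a stripped-down version of such a model. The sketch below uses a textbook symmetric Cournot oligopoly with linear inverse demand P = a - bQ and constant marginal (material) cost c; it omits the paper's externality and regulation terms, so it only illustrates the first comparative static: a higher material cost lowers output and welfare and raises price.

```python
def cournot_equilibrium(a: float, b: float, c: float, n: int) -> dict:
    # Symmetric n-firm Cournot equilibrium under linear inverse demand
    # P = a - b*Q and constant marginal cost c (all values illustrative).
    q = (a - c) / (b * (n + 1))            # per-firm equilibrium output
    Q = n * q                              # total output
    p = a - b * Q                          # equilibrium price
    consumer_surplus = 0.5 * b * Q ** 2
    total_profit = (p - c) * Q
    return {"price": p, "output": Q,
            "welfare": consumer_surplus + total_profit}

low = cournot_equilibrium(a=100, b=1, c=10, n=3)   # low material cost
high = cournot_equilibrium(a=100, b=1, c=40, n=3)  # high material cost
assert high["price"] > low["price"]                # higher price
assert high["output"] < low["output"]              # lower output
assert high["welfare"] < low["welfare"]            # lower social welfare
```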
Further Development of the Improved QMD Model and its Applications to Fusion Reaction near Barrier
The Improved Quantum Molecular Dynamics model is further developed by introducing new parameters into the interaction potential energy functional, based on the Skyrme interactions of the SkM and SLy series. The ground-state properties of selected nuclei are reproduced very well. The Coulomb barriers for a
series of reaction systems are studied and compared with the results of the
proximity potential. The fusion excitation functions for a series of fusion
reactions are calculated and the results are in good agreement with
experimental data.
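For orientation, extracting a Coulomb barrier reduces to locating the maximum of the total nucleus-nucleus potential as a function of separation. The sketch below is a hedged stand-in rather than the paper's method: it combines the point-charge Coulomb term with a generic Woods-Saxon nuclear attraction, where V0, r0, and a are illustrative placeholders, not the Skyrme-based ImQMD or proximity potentials compared in the paper.

```python
import numpy as np

E2 = 1.44  # e^2 in MeV*fm

def coulomb_barrier(Z1: int, A1: int, Z2: int, A2: int,
                    V0: float = 70.0, r0: float = 1.17, a: float = 0.65):
    # Total potential = point-charge Coulomb repulsion + Woods-Saxon
    # nuclear attraction; the barrier is its maximum over separation r.
    R = r0 * (A1 ** (1 / 3) + A2 ** (1 / 3))   # touching radius (fm)
    r = np.linspace(0.8 * R, 20.0, 2000)
    v = Z1 * Z2 * E2 / r - V0 / (1.0 + np.exp((r - R) / a))
    i = np.argmax(v)
    return r[i], v[i]                           # position (fm), height (MeV)

# Example: 16O + 208Pb. This crude parameterization should land near the
# empirical barrier of order 75 MeV, but reproducing measured fusion
# excitation functions is where the microscopic potentials matter.
r_b, v_b = coulomb_barrier(8, 16, 82, 208)
```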
Spatial variation of perceived equity and its determinants in a gateway community of Giant Panda National Park, China
Social equity is essential in the governance of protected areas (PAs), as ignoring such considerations can lead to resistance and jeopardize conservation objectives. However, more research is required to understand the spatial heterogeneity of perceived social equity and its underlying spatial factors. Using a survey of 361 respondents, we present spatial distribution patterns of perceived equity by kernel density estimation (KDE) in Giant Panda National Park, China. The regression analysis shows that local residents who live closer to the PA boundary are more likely to develop negative responses, and those with easy access to tourism spots have more positive procedural and distributional perceptions. Notably, proximity to the PA authority decreases locals' perceptions of fairness in all aspects, potentially due to the opaque participative channels provided by the PA authority. We argue that these spatial differentials in fairness perceptions are driven by the intrinsic discrepancy of biodiversity protection requirements and the unevenly distributed consequences of management policies. Key steps to advance social equity considerations include multi-industry guidance, extending participative channels, and co-producing better compensation plans. This study thus calls for a greater focus on the spatial aspect of social equity issues in PAs.
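A minimal sketch of the KDE step, assuming respondents are georeferenced points carrying a Likert-style equity score; the coordinates and scores below are synthetic placeholders, not the study's survey data:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(2, 361))   # (x, y) of 361 respondents
scores = rng.integers(1, 6, size=361)        # 1-5 equity rating (synthetic)

# Separate density surfaces for negative (<=2) and positive (>=4)
# perceptions, evaluated on a regular grid for mapping.
gx, gy = np.mgrid[0:10:100j, 0:10:100j]
grid = np.vstack([gx.ravel(), gy.ravel()])
neg = gaussian_kde(coords[:, scores <= 2])(grid).reshape(100, 100)
pos = gaussian_kde(coords[:, scores >= 4])(grid).reshape(100, 100)
```

The resulting surfaces can then be related to distances from the PA boundary, tourism spots, and the PA authority, the spatial covariates examined in the regression.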
Task-specific Objectives of Pre-trained Language Models for Dialogue Adaptation
Pre-trained Language Models (PrLMs) have been widely used as backbones in
lots of Natural Language Processing (NLP) tasks. The common process of
utilizing PrLMs is first pre-training on large-scale general corpora with
task-independent LM training objectives, then fine-tuning on task datasets with
task-specific training objectives. Pre-training in a task-independent way
enables the models to learn language representations that are universal to some extent but fail to capture crucial task-specific features. This leads to an incompatibility between pre-training and fine-tuning. To address this issue, we introduce task-specific pre-training on in-domain task-related corpora with task-specific objectives. This procedure is placed between the original two stages to enhance the model's understanding of specific tasks. In this work, we focus on Dialogue-related Natural
Language Processing (DrNLP) tasks and design a Dialogue-Adaptive Pre-training
Objective (DAPO) based on important qualities for assessing dialogues that are usually ignored by general LM pre-training objectives. PrLMs with
DAPO on a large in-domain dialogue corpus are then fine-tuned for downstream
DrNLP tasks. Experimental results show that models with DAPO surpass those with
general LM pre-training objectives and other strong baselines on downstream
DrNLP tasks.
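The abstract does not spell out DAPO's loss itself, so the sketch below only shows where the extra stage sits in the three-stage pipeline, with plain masked-LM training on an in-domain dialogue corpus standing in for the dialogue-adaptive objective; the model name and dataset are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Stage 1: start from a generally pre-trained PrLM.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Stage 2: task-specific pre-training on an in-domain dialogue corpus
# (masked LM here is a stand-in for the dialogue-adaptive objective).
dialogs = load_dataset("daily_dialog", split="train")
def encode(batch):
    return tok([" ".join(d) for d in batch["dialog"]],
               truncation=True, max_length=128)
encoded = dialogs.map(encode, batched=True,
                      remove_columns=dialogs.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dapo-stage", max_steps=1000),
    train_dataset=encoded,
    data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15),
)
trainer.train()

# Stage 3: fine-tune `model` on the downstream DrNLP task as usual.
```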