Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning
Parameter regularization or allocation methods are effective in overcoming
catastrophic forgetting in lifelong learning. However, they treat all tasks in
a sequence uniformly and ignore differences in learning difficulty across
tasks. As a result, parameter regularization methods suffer significant
forgetting when learning a new task that differs greatly from previously
learned tasks, and parameter allocation methods incur unnecessary parameter
overhead when learning simple tasks. In this paper, we propose Parameter
Allocation & Regularization (PAR), which adaptively selects an appropriate
strategy, parameter allocation or regularization, for each
task based on its learning
difficulty.
difficulty. A task is easy for a model that has learned tasks related to it and
vice versa. We propose a divergence estimation method based on the
Nearest-Prototype distance to measure task relatedness using only features
of the new task. Moreover, we propose a time-efficient, relatedness-aware,
sampling-based architecture search strategy to reduce the parameter overhead
of allocation. Experimental results on multiple benchmarks demonstrate that,
compared with state-of-the-art methods, our approach is scalable and
significantly reduces model redundancy while improving performance. Further qualitative
analysis indicates that PAR obtains reasonable task relatedness.
Comment: Accepted by CVPR 2023. Code is available at https://github.com/WenjinW/PA
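To make the relatedness idea above concrete, here is a minimal sketch of divergence estimation via nearest-prototype distances, assuming each learned task is summarized by per-class feature prototypes; the function names, the Euclidean metric, and the threshold-based strategy choice are illustrative stand-ins rather than the paper's exact formulation.

```python
import numpy as np

def task_prototypes(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Mean feature vector (prototype) for each class of a learned task."""
    return np.stack([features[labels == c].mean(axis=0) for c in np.unique(labels)])

def nearest_prototype_distance(new_features: np.ndarray, prototypes: np.ndarray) -> float:
    """Average distance from each new-task feature to its nearest prototype of a
    previously learned task; smaller values indicate more related tasks."""
    # Pairwise Euclidean distances, shape (n_new, n_prototypes)
    dists = np.linalg.norm(new_features[:, None, :] - prototypes[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())

def choose_strategy(new_features, learned_tasks, threshold=1.0) -> str:
    """Pick regularization for a related (easy) new task, allocation otherwise.

    `learned_tasks` is a list of (features, labels) pairs for previous tasks;
    `threshold` is a hypothetical tuning knob, not a value from the paper.
    """
    relatedness = [nearest_prototype_distance(new_features, task_prototypes(f, y))
                   for f, y in learned_tasks]
    return "regularization" if min(relatedness) < threshold else "allocation"
```

This sketch only captures the distance computation and the resulting allocation-versus-regularization decision; it does not reproduce PAR's architecture search or training procedure.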
TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models
Understanding time is a pivotal aspect of human cognition, crucial in the
broader framework of grasping the intricacies of the world. Previous studies
typically focus on specific aspects of time, lacking a comprehensive temporal
reasoning benchmark. To address this issue, we propose TimeBench, a
comprehensive hierarchical temporal reasoning benchmark that covers a broad
spectrum of temporal reasoning phenomena and provides a thorough evaluation
of the temporal reasoning capabilities of large language models.
We conduct extensive experiments on popular LLMs, such as GPT-4, LLaMA2, and
Mistral, incorporating chain-of-thought prompting. Our experimental results
indicate a significant performance gap between the state-of-the-art LLMs and
humans, highlighting that there is still a considerable distance to cover in
temporal reasoning. We aspire for TimeBench to serve as a comprehensive
benchmark, fostering research in temporal reasoning for LLMs. Our resource is
available at https://github.com/zchuz/TimeBench.
Comment: Resources at https://github.com/zchuz/TimeBench
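As an illustration of the chain-of-thought prompting setup used in such evaluations, the sketch below shows a generic zero-shot CoT prompt and answer extraction; the prompt wording and the "Answer:" convention are assumptions, not TimeBench's actual evaluation harness.

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a temporal reasoning question in a zero-shot chain-of-thought prompt."""
    return (
        "Answer the following temporal reasoning question.\n"
        f"Question: {question}\n"
        "Let's think step by step, then give the final answer "
        "on a new line starting with 'Answer:'."
    )

def parse_answer(completion: str) -> str:
    """Extract the text after the final 'Answer:' marker, if present."""
    marker = "Answer:"
    return completion.rsplit(marker, 1)[-1].strip() if marker in completion else completion.strip()
```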
Mixed Distillation Helps Smaller Language Model Better Reasoning
While large language models (LLMs) have demonstrated exceptional performance
in recent natural language processing (NLP) tasks, their deployment poses
substantial challenges due to high computational and memory demands in
real-world applications. Recent studies have focused on enhancing smaller
models through knowledge distillation from LLMs, yielding promising results.
However, these models often struggle to match the performance of LLMs,
especially in tasks that require reasoning. In this work, we introduce the Mixed
Distillation (MD) framework, which capitalizes on the Program-of-Thought (PoT)
and Chain-of-Thought (CoT) capabilities of LLMs, combining multiple prompting
techniques and distilling these capabilities into smaller
models. Our experimental results show that MD significantly enhances the
single-path and multi-path reasoning ability of smaller models in various
tasks. In terms of accuracy and generality on reasoning tasks, a model distilled
with MD exceeds the overall performance of the two individually distilled models.
Notably, with MD, LLaMA2-7B and CodeLlama-7B achieve remarkable improvements,
reaching 84.5% and 85.5% respectively on the SVAMP benchmark and outperforming
GPT-3.5-Turbo by 2.5% and 3.5%.
Comment: Work in progress, 17 pages, 16 figures
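A hedged sketch of how a mixed CoT/PoT distillation set could be assembled, assuming a teacher LLM exposed through two placeholder callables (`teacher_cot`, `teacher_pot`); the data format and mode tags are illustrative, not the MD framework's exact recipe.

```python
from dataclasses import dataclass

@dataclass
class DistillExample:
    question: str
    target: str  # a chain-of-thought rationale or a program-of-thought solution
    mode: str    # "cot" or "pot"

def build_mixed_dataset(questions, teacher_cot, teacher_pot):
    """Pair each question with both a CoT rationale and a PoT program
    generated by a teacher LLM (placeholder callables)."""
    data = []
    for q in questions:
        data.append(DistillExample(q, teacher_cot(q), "cot"))
        data.append(DistillExample(q, teacher_pot(q), "pot"))
    return data

def format_for_student(ex: DistillExample) -> str:
    """Prefix the prompt so the student learns to emit either format on demand."""
    tag = "Reason step by step:" if ex.mode == "cot" else "Write a Python program:"
    return f"Question: {ex.question}\n{tag}\n{ex.target}"
```

A student model would then be fine-tuned on the formatted examples, learning to produce either a natural-language rationale or an executable program depending on the prompt prefix.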
Large Language Models Are Also Good Prototypical Commonsense Reasoners
Commonsense reasoning is a pivotal skill for large language models, yet it
presents persistent challenges in specific tasks requiring this competence.
Traditional fine-tuning approaches can be resource-intensive and potentially
compromise a model's generalization capacity. Furthermore, state-of-the-art
language models like GPT-3.5 and Claude are primarily accessible through API
calls, which makes fine-tuning models challenging. To address these challenges,
we draw inspiration from the outputs of large models on tailored tasks and
semi-automatically develop a set of novel prompts from several perspectives,
including task relevance, supportive evidence generation (e.g., chain-of-thought
and knowledge), and diverse path decoding to aid the model. Experimental results
on the ProtoQA dataset demonstrate that, with better-designed prompts, we achieve
a new state of the art (SOTA) on the ProtoQA leaderboard, improving the Max
Answer@1 score by 8% and the Max Incorrect@1 score by 4% (surpassing 50% for the
first time) compared to the previous SOTA model, and we also obtain improvements
on StrategyQA and CommonsenseQA 2.0 (3% and 1%, respectively). Furthermore, with
the generated Chain-of-Thought and knowledge, we can improve the
interpretability of the model while also surpassing the previous SOTA models.
We hope that our work can provide insight for the NLP community to develop
better prompts and explore the potential of large language models for more
complex reasoning tasks.
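The diverse path decoding mentioned above can be pictured as sampling several reasoning paths and aggregating their final answers; the sketch below assumes a generic `generate(prompt, temperature)` sampling call and an "Answer:" output convention, both hypothetical.

```python
from collections import Counter

def diverse_path_decode(generate, prompt: str, num_paths: int = 10) -> str:
    """Sample several reasoning paths and return the most frequent final answer.

    `generate(prompt, temperature)` stands in for any sampling-based LLM call
    that returns a completion ending with 'Answer: <text>'.
    """
    answers = []
    for _ in range(num_paths):
        completion = generate(prompt, temperature=0.7)
        answers.append(completion.rsplit("Answer:", 1)[-1].strip().lower())
    # Majority vote over the sampled final answers
    return Counter(answers).most_common(1)[0][0]
```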
Rethinking the Value of Gazetteer in Chinese Named Entity Recognition
Gazetteers are widely used in Chinese named entity recognition (NER) to enhance
span boundary detection and type classification. However, to further understand
the generalizability and effectiveness of gazetteers, the NLP community still
lacks a systematic analysis of the gazetteer-enhanced NER model. In this paper,
we first re-examine the effectiveness of several common practices in
gazetteer-enhanced NER models and carry out a series of detailed analyses to
evaluate the relationship between model performance and gazetteer
characteristics, which can guide us in building a more suitable gazetteer. The
findings of this paper are as follows: (1) the gazetteer improves performance
mainly in situations that traditional NER models find difficult to learn from
the datasets; (2) model performance benefits greatly from high-quality
pre-trained lexeme embeddings; (3) a good gazetteer should cover more entities
that can be matched in both the training set and the test set.
Comment: Accepted by NLPCC 202
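To illustrate what a gazetteer contributes, here is a minimal sketch of dictionary matching over a Chinese character sequence, producing span-level type hints that a gazetteer-enhanced NER model could consume as extra features; the data structures are simplified assumptions rather than any specific model from the paper.

```python
def gazetteer_matches(tokens, gazetteer, max_len=6):
    """Return (start, end, entity_type) spans whose surface form is in the gazetteer.

    `gazetteer` maps an entity string to its type; this is a simplified
    dictionary-matching view of how gazetteer features are injected into NER.
    """
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            surface = "".join(tokens[start:end])  # Chinese text: no spaces between characters
            if surface in gazetteer:
                spans.append((start, end, gazetteer[surface]))
    return spans

# Toy example with a hypothetical two-entry gazetteer
gaz = {"北京": "LOC", "北京大学": "ORG"}
print(gazetteer_matches(list("我在北京大学读书"), gaz))
# [(2, 4, 'LOC'), (2, 6, 'ORG')]
```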
A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Chain-of-thought reasoning, a cognitive process fundamental to human
intelligence, has garnered significant attention in the realm of artificial
intelligence and natural language processing. However, a comprehensive survey
of this area is still lacking. To this end, we take the first step and present
a thorough and broad survey of this research field.
We use X-of-Thought to refer to Chain-of-Thought in a broad sense. In detail,
we systematically organize the current research according to the taxonomies of
methods, including XoT construction, XoT structure variants, and enhanced XoT.
Additionally, we describe XoT with frontier applications, covering planning,
tool use, and distillation. Furthermore, we address challenges and discuss some
future directions, including faithfulness, multimodality, and theory. We hope
this survey serves as a valuable resource for researchers seeking to innovate
within the domain of chain-of-thought reasoning.
Comment: 26 pages. Resources are available at https://github.com/zchuz/CoT-Reasoning-Surve
Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications
Large language models (LLMs) exhibit superior performance on various natural
language tasks, but they are susceptible to issues stemming from outdated data
and domain-specific limitations. In order to address these challenges,
researchers have pursued two primary strategies, knowledge editing and
retrieval augmentation, to enhance LLMs by incorporating external information
from different aspects. Nevertheless, there is still a notable absence of a
comprehensive survey. In this paper, we present a review of the trends in the
integration of knowledge and large language models, including a taxonomy of
methods, benchmarks, and applications. In addition, we conduct an in-depth
analysis of different methods and point out potential research directions in
the future. We hope this survey offers the community quick access and a
comprehensive overview of this research area, with the intention of inspiring
future research endeavors.
Comment: Work in progress; 22 pages. This work has been submitted to the IEEE
for possible publication. Copyright may be transferred without notice, after
which this version may no longer be accessible.
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding
Recent efforts of multimodal Transformers have improved Visually Rich
Document Understanding (VrDU) tasks via incorporating visual and textual
information. However, existing approaches mainly focus on fine-grained elements
such as words and document image patches, making it hard for them to learn from
coarse-grained elements, including natural lexical units like phrases and
salient visual regions like prominent image regions. In this paper, we attach
more importance to coarse-grained elements containing high-density information
and consistent semantics, which are valuable for document understanding. First,
a document graph is proposed to model complex relationships among
multi-grained multimodal elements, in which salient visual regions are detected
by a cluster-based method. Then, a multi-grained multimodal Transformer called
mmLayout is proposed to incorporate coarse-grained information into existing
pre-trained fine-grained multimodal Transformers based on the graph. In
mmLayout, coarse-grained information is aggregated from fine-grained elements
and, after further processing, fused back into the fine-grained representations
for final prediction.
Furthermore, common sense enhancement is introduced to exploit the semantic
information of natural lexical units. Experimental results on four tasks,
including information extraction and document question answering, show that our
method can improve the performance of multimodal Transformers based on
fine-grained elements and achieve better performance with fewer parameters.
Qualitative analyses show that our method can capture consistent semantics in
coarse-grained elements.
Comment: Accepted by ACM Multimedia 202
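A minimal sketch of the aggregate-then-fuse pattern described above, assuming word-level token features and a precomputed assignment of tokens to coarse-grained nodes (phrases or salient regions); mean pooling and additive fusion are illustrative simplifications of the multi-grained Transformer.

```python
import torch

def aggregate_coarse(fine_feats: torch.Tensor, assignment: list) -> torch.Tensor:
    """Pool fine-grained features (e.g., word tokens) into coarse-grained nodes
    (e.g., phrases or salient regions) given the document-graph assignment,
    a list of index lists."""
    return torch.stack([fine_feats[idx].mean(dim=0) for idx in assignment])

def fuse_back(fine_feats: torch.Tensor, coarse_feats: torch.Tensor,
              assignment: list) -> torch.Tensor:
    """Add each coarse node's (processed) feature back to its member fine tokens."""
    fused = fine_feats.clone()
    for node_id, idx in enumerate(assignment):
        fused[idx] = fused[idx] + coarse_feats[node_id]
    return fused

# Toy example: 5 word tokens grouped into 2 phrase-level nodes
words = torch.randn(5, 16)
groups = [[0, 1, 2], [3, 4]]
phrases = aggregate_coarse(words, groups)
out = fuse_back(words, phrases, groups)
```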
Diatom distribution in an alpine basin (central China) in relation to environmental factors and substrata
This study examines the habitat preferences of diatom species for bogs, ponds and streams, and explores the effects of environmental variables and substrata on diatom distribution in an alpine basin (Dajiuhu Basin, central China). Ponds and streams were characterized by high pH and high ionic strength, while bogs were acidic and heavy metal-rich habitats. Diatom samples of the epiphyton (attached to Sphagnum), the epipelon (associated with the mud) and the epilithon (attached to stones) were collected from bogs, ponds and streams, respectively. Diatom assemblages in bogs were characterized by acid-tolerant species, such as Eunotia paludosa, Eunotia seminulum and Frustulia rhomboides. In streams, the indicator species preferred circumneutral or alkaline conditions, and included Achnanthidium minutissimum, Nitzschia perminuta and Reimeria sinuata. The characteristic taxa in ponds included Achnanthidium catenatum, Aulacoseira ambigua and Discostella pseudostelligera. Canonical correspondence analysis (CCA) revealed that variations in diatom communities were significantly correlated with two environmental factors (the concentration of Si and one other measured factor) and two substratum types (Sphagnum and stones). Substrata were found to influence diatom composition, probably by mediating the availability of microhabitats, moisture and nutrients. Our results point to the importance of substrata for diatom-based environmental monitoring. This study provides baseline information on diatom communities in the Dajiuhu Basin for future comparisons, highlighting the utility of diatoms for monitoring environmental change in alpine landscapes.
The complete chloroplast genome sequence of the Dioscorea persimilis Prain et Burkill (Dioscoreaceae)
Dioscorea persimilis belongs to the genus Dioscorea, which is considered one of the most popular foods and traditional folk medicines in China. The complete chloroplast genome of D. persimilis was determined in this study. The total genome size was 153,219 bp, containing a pair of inverted repeats (IRs) of 25,477 bp separated by a large single-copy (LSC) region of 83,448 bp and a small single-copy (SSC) region of 18,817 bp. The GC content is 37.01%. A total of 129 genes were predicted, including 84 protein-coding genes, eight rRNA genes and 37 tRNA genes. Phylogenetic analysis of 24 species in the genus Dioscorea indicated that D. persimilis is closely related to Chinese yam but only distantly related to Guinea yam.