24 research outputs found

    Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning

    Full text link
    Parameter regularization or allocation methods are effective in overcoming catastrophic forgetting in lifelong learning. However, they solve all tasks in a sequence uniformly and ignore the differences in learning difficulty across tasks. As a result, parameter regularization methods suffer significant forgetting when a new task is very different from previously learned tasks, and parameter allocation methods incur unnecessary parameter overhead when learning simple tasks. In this paper, we propose Parameter Allocation & Regularization (PAR), which adaptively selects an appropriate strategy for each task, either parameter allocation or regularization, based on its learning difficulty. A task is easy for a model that has already learned related tasks, and vice versa. We propose a divergence estimation method based on the Nearest-Prototype distance to measure task relatedness using only the features of the new task. Moreover, we propose a time-efficient, relatedness-aware, sampling-based architecture search strategy to reduce the parameter overhead of allocation. Experimental results on multiple benchmarks demonstrate that, compared with state-of-the-art methods, our method is scalable and significantly reduces the model's redundancy while improving its performance. Further qualitative analysis indicates that PAR obtains reasonable task relatedness.
    Comment: Accepted by CVPR2023. Code is available at https://github.com/WenjinW/PA
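    As a rough illustration of the Nearest-Prototype idea, the sketch below estimates how far a new task's features sit from the class prototypes of previously learned tasks; the Euclidean distance, the averaging, and all names are illustrative assumptions, not PAR's actual estimator.

```python
import numpy as np

def nearest_prototype_divergence(old_feats, old_labels, new_feats):
    # Class prototypes of previously learned tasks (mean feature per class).
    prototypes = np.stack([
        old_feats[old_labels == c].mean(axis=0)
        for c in np.unique(old_labels)
    ])
    # Distance from every new-task feature to every prototype, keep the nearest.
    dists = np.linalg.norm(new_feats[:, None, :] - prototypes[None, :, :], axis=-1)
    # Average nearest-prototype distance: lower means the new task is more
    # related to what the model has already learned (illustrative proxy only).
    return dists.min(axis=1).mean()

# Toy usage with random 16-dim features and 5 old classes.
rng = np.random.default_rng(0)
old_x, old_y = rng.normal(size=(100, 16)), rng.integers(0, 5, size=100)
new_x = rng.normal(loc=0.5, size=(50, 16))
print(nearest_prototype_divergence(old_x, old_y, new_x))
```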

    TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models

    Full text link
    Understanding time is a pivotal aspect of human cognition, crucial in the broader framework of grasping the intricacies of the world. Previous studies typically focus on specific aspects of time and lack a comprehensive temporal reasoning benchmark. To address this issue, we propose TimeBench, a comprehensive hierarchical temporal reasoning benchmark that covers a broad spectrum of temporal reasoning phenomena and provides a thorough evaluation of the temporal reasoning capabilities of large language models. We conduct extensive experiments on popular LLMs, such as GPT-4, LLaMA2, and Mistral, incorporating chain-of-thought prompting. Our experimental results indicate a significant performance gap between state-of-the-art LLMs and humans, highlighting that there is still a considerable distance to cover in temporal reasoning. We hope TimeBench serves as a comprehensive benchmark, fostering research in temporal reasoning for LLMs. Our resource is available at https://github.com/zchuz/TimeBench
    Comment: Resources at: https://github.com/zchuz/TimeBenc
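    A minimal sketch of the kind of chain-of-thought evaluation loop described above; the prompt template, the hypothetical `query_llm` stand-in, and the exact-match scoring are assumptions for illustration, not TimeBench's actual protocol.

```python
def build_cot_prompt(question: str) -> str:
    # Generic zero-shot chain-of-thought template (not TimeBench's exact wording).
    return f"Question: {question}\nLet's think step by step, then state the final answer."

def exact_match(prediction: str, gold: str) -> bool:
    # Crude scoring for illustration; real evaluations normalize answers more carefully.
    return prediction.strip().lower() == gold.strip().lower()

# Hypothetical usage, where `query_llm` stands in for any chat-completion API:
# accuracy = sum(exact_match(query_llm(build_cot_prompt(q)), a)
#                for q, a in dataset) / len(dataset)
```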

    Mixed Distillation Helps Smaller Language Model Better Reasoning

    Full text link
    While large language models (LLMs) have demonstrated exceptional performance in recent natural language processing (NLP) tasks, their deployment poses substantial challenges due to high computational and memory demands in real-world applications. Recent studies have focused on enhancing smaller models through knowledge distillation from LLMs, yielding promising results. However, these models often struggle to match the performance of LLMs, especially on tasks that require reasoning. In this work, we introduce the Mixed Distillation (MD) framework, which capitalizes on the Program-of-Thought (PoT) and Chain-of-Thought (CoT) capabilities within LLMs, combining multiple prompting techniques and distilling these capabilities into smaller models. Our experimental results show that MD significantly enhances the single-path and multi-path reasoning abilities of smaller models across various tasks. In terms of accuracy and generality on reasoning tasks, the model it produces exceeds the combined performance of the two individually distilled models. Notably, LLaMA2-7B and CodeLlama-7B with MD achieve remarkable improvements of 84.5% and 85.5%, respectively, outperforming GPT-3.5-Turbo by 2.5% and 3.5% on the SVAMP benchmark.
    Comment: Work in progress, 17 pages, 16 figures
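    The multi-path idea can be pictured with the sketch below: a student answers once via CoT and once via a PoT program whose execution yields the final number; the toy executor and the agreement rule are illustrative assumptions, not the MD framework itself.

```python
def run_pot_program(program: str) -> str:
    # Execute a Program-of-Thought snippet that assigns its result to `answer`.
    # Toy executor for illustration only; real systems sandbox generated code.
    scope: dict = {}
    exec(program, scope)
    return str(scope.get("answer"))

def multi_path_answer(cot_answer: str, pot_program: str) -> str:
    # Multi-path reasoning sketch: keep the CoT answer when both paths agree,
    # otherwise trust the executed program.
    pot_answer = run_pot_program(pot_program)
    return cot_answer if cot_answer == pot_answer else pot_answer

# Toy usage with hypothetical student outputs for "3 + 5".
print(multi_path_answer("8", "x = 3 + 5\nanswer = x"))   # -> "8"
```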

    Large Language Models Are Also Good Prototypical Commonsense Reasoners

    Full text link
    Commonsense reasoning is a pivotal skill for large language models, yet it presents persistent challenges in specific tasks requiring this competence. Traditional fine-tuning approaches can be resource-intensive and may compromise a model's generalization capacity. Furthermore, state-of-the-art language models like GPT-3.5 and Claude are primarily accessible through API calls, which makes fine-tuning challenging. To address these challenges, we draw inspiration from the outputs of large models on tailored tasks and semi-automatically develop a set of novel prompts from several perspectives, including task relevance, supportive evidence generation (e.g., chain-of-thought and knowledge), and diverse path decoding to aid the model. Experimental results on the ProtoQA dataset demonstrate that with better-designed prompts we achieve a new state of the art (SOTA) on the ProtoQA leaderboard, improving the Max Answer@1 score by 8% and the Max Incorrect@1 score by 4% (surpassing 50% for the first time) compared with the previous SOTA model, and achieving improvements on StrategyQA and CommonsenseQA2.0 (3% and 1%, respectively). Furthermore, with the generated chain-of-thought and knowledge, we improve the interpretability of the model while also surpassing the previous SOTA models. We hope that our work provides insight for the NLP community into developing better prompts and exploring the potential of large language models for more complex reasoning tasks.
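    Diverse path decoding can be sketched as follows: sample several completions and rank candidate answers by frequency, which suits ProtoQA's preference for prototypical answers; the light normalization and the toy samples are assumptions, not the paper's exact procedure.

```python
from collections import Counter

def aggregate_diverse_paths(sampled_answers):
    # Diverse path decoding sketch: sample many completions (temperature > 0),
    # normalize lightly, and rank candidate answers by frequency. Frequent
    # answers approximate the prototypical responses ProtoQA rewards.
    counts = Counter(answer.strip().lower() for answer in sampled_answers)
    return [answer for answer, _ in counts.most_common()]

# Hypothetical samples for "Name something people bring to a picnic".
samples = ["Sandwiches", "a blanket", "sandwiches", "drinks", "sandwiches"]
print(aggregate_diverse_paths(samples))   # "sandwiches" ranks first
```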

    Rethinking the Value of Gazetteer in Chinese Named Entity Recognition

    Full text link
    Gazetteers are widely used in Chinese named entity recognition (NER) to enhance span boundary detection and type classification. However, to further understand the generalizability and effectiveness of gazetteers, the NLP community still lacks a systematic analysis of gazetteer-enhanced NER models. In this paper, we first re-examine the effectiveness of several common practices of gazetteer-enhanced NER models and carry out a series of detailed analyses to evaluate the relationship between model performance and gazetteer characteristics, which can guide us in building a more suitable gazetteer. The findings of this paper are as follows: (1) the gazetteer improves performance in most of the situations that are difficult for a traditional NER model to learn from the datasets; (2) the model's performance greatly benefits from high-quality pre-trained lexeme embeddings; (3) a good gazetteer should cover more entities that can be matched in both the training and test sets.
    Comment: Accepted by NLPCC 202
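    For readers unfamiliar with gazetteer features, the sketch below shows the basic lookup that gazetteer-enhanced NER models build on: enumerate short character spans and record those found in the lexicon; the span-length limit and the mini-gazetteer are illustrative assumptions.

```python
def gazetteer_matches(chars, gazetteer, max_len=4):
    # Enumerate short character spans and report those found in the gazetteer.
    # Gazetteer-enhanced NER models feed such matches (or the corresponding
    # lexeme embeddings) into the encoder as extra features.
    matches = []
    for i in range(len(chars)):
        for j in range(i + 1, min(i + 1 + max_len, len(chars) + 1)):
            span = "".join(chars[i:j])
            if span in gazetteer:
                matches.append((i, j, gazetteer[span]))
    return matches

# Toy usage with a hypothetical two-entry gazetteer.
gaz = {"北京": "LOC", "北京大学": "ORG"}
print(gazetteer_matches(list("我在北京大学读书"), gaz))   # overlapping LOC and ORG candidates
```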

    A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future

    Full text link
    Chain-of-thought reasoning, a cognitive process fundamental to human intelligence, has garnered significant attention in the realms of artificial intelligence and natural language processing. However, this arena still lacks a comprehensive survey. To this end, we take the first step and present a thorough, wide-ranging survey of this research field. We use X-of-Thought to refer to Chain-of-Thought in a broad sense. In detail, we systematically organize the current research according to a taxonomy of methods, including XoT construction, XoT structure variants, and enhanced XoT. Additionally, we describe XoT's frontier applications, covering planning, tool use, and distillation. Furthermore, we address challenges and discuss future directions, including faithfulness, multi-modality, and theory. We hope this survey serves as a valuable resource for researchers seeking to innovate within the domain of chain-of-thought reasoning.
    Comment: 26 pages. Resources are available at https://github.com/zchuz/CoT-Reasoning-Surve

    Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications

    Full text link
    Large language models (LLMs) exhibit superior performance on various natural language tasks, but they are susceptible to issues stemming from outdated data and domain-specific limitations. To address these challenges, researchers have pursued two primary strategies, knowledge editing and retrieval augmentation, to enhance LLMs by incorporating external information from different aspects. Nevertheless, a comprehensive survey is still notably absent. In this paper, we present a review of the trends in the integration of knowledge and large language models, including a taxonomy of methods, benchmarks, and applications. In addition, we conduct an in-depth analysis of different methods and point out potential future research directions. We hope this survey offers the community quick access to, and a comprehensive overview of, this research area, with the intention of inspiring future research endeavors.
    Comment: Work in progress; 22 pages. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

    ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding

    Full text link
    Recent efforts on multimodal Transformers have improved Visually Rich Document Understanding (VrDU) tasks by incorporating visual and textual information. However, existing approaches mainly focus on fine-grained elements such as words and document image patches, making it hard for them to learn from coarse-grained elements, including natural lexical units like phrases and salient visual regions like prominent image regions. In this paper, we attach more importance to coarse-grained elements containing high-density information and consistent semantics, which are valuable for document understanding. First, a document graph is proposed to model complex relationships among multi-grained multimodal elements, in which salient visual regions are detected by a cluster-based method. Then, a multi-grained multimodal Transformer called mmLayout is proposed to incorporate coarse-grained information into existing pre-trained fine-grained multimodal Transformers based on the graph. In mmLayout, coarse-grained information is aggregated from fine-grained elements and then, after further processing, fused back into the fine-grained elements for final prediction. Furthermore, common sense enhancement is introduced to exploit the semantic information of natural lexical units. Experimental results on four tasks, including information extraction and document question answering, show that our method improves the performance of multimodal Transformers based on fine-grained elements and achieves better performance with fewer parameters. Qualitative analyses show that our method can capture consistent semantics in coarse-grained elements.
    Comment: Accepted by ACM Multimedia 202
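    A toy sketch of the coarse-fine interplay described above: fine-grained token features are pooled into coarse-grained group features (phrases or salient regions) and then fused back into their members; the mean pooling and convex-combination fusion are assumptions, since mmLayout actually uses attention over a document graph.

```python
import numpy as np

def aggregate_and_fuse(token_feats, groups, alpha=0.5):
    # Pool fine-grained member features into one coarse-grained feature per
    # group, then fuse that pooled feature back into each member. Mean pooling
    # and convex-combination fusion are stand-ins for mmLayout's graph attention.
    fused = token_feats.copy()
    for member_ids in groups:               # each group lists its member token indices
        coarse = token_feats[member_ids].mean(axis=0)
        fused[member_ids] = alpha * token_feats[member_ids] + (1 - alpha) * coarse
    return fused

# Toy usage: six 8-dim token features grouped into two "phrases".
feats = np.random.default_rng(0).normal(size=(6, 8))
print(aggregate_and_fuse(feats, groups=[[0, 1, 2], [3, 4, 5]]).shape)   # (6, 8)
```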

    Diatom distribution in an alpine basin (central China) in relation to environmental factors and substrata

    No full text
    This study examines the habitat preferences of diatom species in bogs, ponds and streams, and explores the effects of environmental variables and substrata on diatom distribution in an alpine basin (Dajiuhu Basin, central China). Ponds and streams were characterized by high pH and high ionic strength, while bogs were acidic, heavy-metal-rich habitats. Diatom samples of the epiphyton (attached to Sphagnum), the epipelon (associated with the mud) and the epilithon (attached to stones) were collected from bogs, ponds and streams, respectively. Diatom assemblages in bogs were characterized by acid-tolerant species, such as Eunotia paludosa, Eunotia seminulum and Frustulia rhomboides. In streams, the indicator species preferred circumneutral or alkaline conditions, and included Achnanthidium minutissimum, Nitzschia perminuta and Reimeria sinuata. The characteristic taxa in ponds included Achnanthidium catenatum, Aulacoseira ambigua and Discostella pseudostelligera. Canonical correspondence analysis (CCA) revealed that variations in diatom communities were significantly correlated with two environmental factors (i.e., concentrations of Si and ) and two substratum types (i.e., Sphagnum and stones). Substrata were found to influence diatom composition, probably by mediating the availability of microhabitats, moisture and nutrients. Our results point to the importance of substrata for diatom-based environmental monitoring. This study provides baseline information on diatom communities in the Dajiuhu Basin for future comparisons, highlighting the utility of diatoms for monitoring environmental change in alpine landscapes.

    The complete chloroplast genome sequence of the Dioscorea persimilis Prain et Burkill (Dioscoreaceae)

    No full text
    Dioscorea persimilis belongs to the genus Dioscorea and is regarded as one of the most popular foods and traditional folk medicines in China. The complete chloroplast genome of D. persimilis was determined in this study. The total genome size was 153,219 bp, containing a pair of inverted repeats (IRs) of 25,477 bp, which were separated by a large single copy (LSC) region of 83,448 bp and a small single copy (SSC) region of 18,817 bp. The GC content is 37.01%. A total of 129 genes were predicted, including 84 protein-coding genes, eight rRNA genes and 37 tRNA genes. Phylogenetic analysis of 24 species in the genus Dioscorea indicated that D. persimilis is closely related to Chinese yam but only distantly related to Guinea yam.
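    The reported segment lengths and gene counts are internally consistent, as the quick check below confirms (plain arithmetic using only the numbers quoted in the abstract):

```python
# Quick consistency check of the reported chloroplast genome figures.
lsc, ssc, ir = 83_448, 18_817, 25_477
assert lsc + ssc + 2 * ir == 153_219      # LSC + SSC + two IR copies = total length (bp)
assert 84 + 8 + 37 == 129                 # protein-coding + rRNA + tRNA = total annotated genes
```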