79 research outputs found

    Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation

    Full text link
    Lifelong sequence generation (LSG), a problem in continual learning, aims to continually train a model on a sequence of generation tasks to learn constantly emerging new generation patterns while avoiding the forgetting of previous knowledge. Existing LSG methods mainly focus on maintaining old knowledge while paying little attention to knowledge transfer across tasks. In contrast, humans can better learn new tasks by leveraging previously acquired knowledge from similar tasks. Inspired by the learning paradigm of humans, we propose Dynamic Module Expansion and Adaptation (DMEA), which enables the model to dynamically determine the architecture for acquiring new knowledge based on task correlation and to select the most similar previous tasks to facilitate adaptation to new tasks. In addition, as the learning process can easily be biased towards the current task, which might cause more severe forgetting of previously learned knowledge, we propose dynamic gradient scaling to balance the learning of the current task and replayed tasks. With extensive experiments, we demonstrate that DMEA can consistently outperform existing methods in different LSG settings.
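    The abstract does not spell out how the gradient scaling is computed, so the following is only a generic sketch of balancing a current-task loss against a replay loss by matching gradient norms; the model, optimizer, and `alpha` knob are placeholders, and DMEA's actual rule may differ.

```python
import torch

def balanced_replay_step(model, optimizer, loss_fn, cur_batch, replay_batch, alpha=1.0):
    """One update that balances the current task against replayed tasks.

    Generic gradient-scaling sketch (not DMEA's exact formulation): the
    replay loss is re-weighted so its gradient norm stays comparable to
    the current-task gradient norm, controlled by `alpha`.
    """
    optimizer.zero_grad()

    # Gradient norm of the current-task loss.
    cur_loss = loss_fn(model(cur_batch["x"]), cur_batch["y"])
    cur_grads = torch.autograd.grad(cur_loss, model.parameters(), retain_graph=True)
    cur_norm = torch.norm(torch.stack([g.norm() for g in cur_grads]))

    # Gradient norm of the replay (previous-task) loss.
    rep_loss = loss_fn(model(replay_batch["x"]), replay_batch["y"])
    rep_grads = torch.autograd.grad(rep_loss, model.parameters(), retain_graph=True)
    rep_norm = torch.norm(torch.stack([g.norm() for g in rep_grads]))

    # Scale the replay term so neither objective dominates the update.
    scale = alpha * cur_norm / (rep_norm + 1e-12)
    (cur_loss + scale.detach() * rep_loss).backward()
    optimizer.step()
    return cur_loss.item(), rep_loss.item()
```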

    Learning to Initialize: Can Meta Learning Improve Cross-task Generalization in Prompt Tuning?

    Full text link
    Prompt tuning (PT), which only tunes the embeddings of an additional sequence of tokens per task while keeping the pre-trained language model (PLM) frozen, has shown remarkable performance in few-shot learning. Despite this, PT has been shown to rely heavily on good initialization of the prompt embeddings. In this work, we study meta prompt tuning (MPT) to systematically explore how meta-learning can help improve (if it can) cross-task generalization in PT through learning to initialize the prompt embeddings from other relevant tasks. We empirically analyze a representative set of meta-learning algorithms in a wide range of adaptation settings with different source/target task configurations on a large set of few-shot tasks. With extensive experiments and analysis, we demonstrate the effectiveness of MPT. We find the improvement to be significant, particularly on classification tasks. For other kinds of tasks such as question answering, we observe that while MPT can outperform PT in most cases, it does not always outperform multi-task learning. We further provide an in-depth analysis from the perspective of task similarity.
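    As a concrete illustration of "learning to initialize" prompt embeddings from source tasks, here is a minimal Reptile-style sketch in which only the prompt is trainable and the PLM stays frozen. The `source_tasks` interface (each task exposing a `loss(prompt)` callable backed by the frozen PLM) is assumed for illustration; the paper studies a set of meta-learning algorithms rather than this particular one.

```python
import torch

def reptile_prompt_init(source_tasks, prompt_len=20, emb_dim=768,
                        inner_steps=5, inner_lr=1e-3, meta_lr=0.1, epochs=10):
    """Learn an initialization for prompt embeddings with a Reptile-style
    meta-update over source tasks; the PLM itself stays frozen.

    `source_tasks` is assumed to be a list of objects exposing
    `task.loss(prompt) -> scalar loss` computed with the frozen PLM.
    Illustrative sketch only, not the paper's exact procedure.
    """
    meta_prompt = torch.empty(prompt_len, emb_dim)
    torch.nn.init.normal_(meta_prompt, std=0.02)

    for _ in range(epochs):
        for task in source_tasks:
            # Inner loop: adapt a copy of the prompt to one source task.
            prompt = meta_prompt.clone().requires_grad_(True)
            opt = torch.optim.SGD([prompt], lr=inner_lr)
            for _ in range(inner_steps):
                opt.zero_grad()
                task.loss(prompt).backward()
                opt.step()
            # Outer (Reptile) step: move the meta-init toward the adapted prompt.
            with torch.no_grad():
                meta_prompt += meta_lr * (prompt - meta_prompt)
    return meta_prompt  # used to initialize PT on a new target task
```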

    In-Context Learning with Iterative Demonstration Selection

    Full text link
    Spurred by advancements in scale, large language models (LLMs) have demonstrated strong few-shot learning ability via in-context learning (ICL). However, the performance of ICL has been shown to be highly sensitive to the selection of few-shot demonstrations. Selecting the most suitable examples as context remains an ongoing challenge and an open problem. Existing literature has highlighted the importance of selecting examples that are diverse or semantically similar to the test sample, while ignoring the fact that the optimal selection dimension, i.e., diversity or similarity, is task-specific. Leveraging the merits of both dimensions, we propose Iterative Demonstration Selection (IDS). Using zero-shot chain-of-thought reasoning (Zero-shot-CoT), IDS iteratively selects examples that are diverse but still strongly correlated with the test sample as ICL demonstrations. Specifically, IDS applies Zero-shot-CoT to the test sample before demonstration selection. The output reasoning path is then used to choose demonstrations that are prepended to the test sample for inference. The generated answer is accompanied by its corresponding reasoning path for extracting a new set of demonstrations in the next iteration. After several iterations, IDS adopts majority voting to obtain the final result. Through extensive experiments on tasks including commonsense reasoning, question answering, topic classification, and sentiment analysis, we demonstrate that IDS can consistently outperform existing ICL demonstration selection methods.
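    The selection loop described above can be made concrete with a short sketch. The `llm` and `embed` callables, the candidate `pool`, and the cosine-threshold heuristic for keeping demonstrations diverse are assumptions made for illustration; the paper's actual ranking criterion is not reproduced here.

```python
import numpy as np
from collections import Counter

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def ids_inference(llm, embed, test_question, pool, k=4, iterations=3):
    """Iterative Demonstration Selection, sketched from the abstract.

    llm(prompt) -> (answer, reasoning_path); embed(text) -> vector;
    pool is a list of (question, answer) candidate demonstrations.
    """
    answers = []
    # Step 0: Zero-shot-CoT on the test sample yields an initial reasoning path.
    _, reasoning = llm(f"{test_question}\nLet's think step by step.")

    for _ in range(iterations):
        # Rank candidates by similarity to the current reasoning path, then
        # greedily keep ones that are not too close to already chosen demos.
        ranked = sorted(pool, key=lambda ex: -cosine(embed(ex[0]), embed(reasoning)))
        demos = []
        for q, a in ranked:
            if all(cosine(embed(q), embed(dq)) < 0.9 for dq, _ in demos):
                demos.append((q, a))
            if len(demos) == k:
                break

        # Prepend the selected demonstrations and query the LLM with CoT.
        prompt = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in demos)
        prompt += f"Q: {test_question}\nA: Let's think step by step."
        answer, reasoning = llm(prompt)  # new reasoning path drives the next round
        answers.append(answer)

    # Majority vote over the per-iteration answers gives the final prediction.
    return Counter(answers).most_common(1)[0][0]
```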

    Is ChatGPT a General-Purpose Natural Language Processing Task Solver?

    Full text link
    Spurred by advancements in scale, large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot, i.e., without adaptation on downstream data. Recently, the debut of ChatGPT has drawn a great deal of attention from the NLP community because it can generate high-quality responses to human input and self-correct previous mistakes based on subsequent conversations. However, it is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot. In this work, we empirically analyze the zero-shot learning ability of ChatGPT by evaluating it on 20 popular NLP datasets covering 7 representative task categories. With extensive empirical studies, we demonstrate both the effectiveness and limitations of the current version of ChatGPT. We find that ChatGPT performs well on many tasks favoring reasoning capabilities (e.g., arithmetic reasoning), while it still faces challenges when solving specific tasks such as sequence tagging. We additionally provide in-depth analysis through qualitative case studies.

    Shaping a subwavelength needle with ultra-long focal length by focusing azimuthally polarized light

    Get PDF
    Scientific Reports. DOI: 10.1038/srep09977

    Solution and type curves for the seepage model of the water-bearing coalbed with leakage recharge

    Get PDF
    To analyze the effects of the leakage recharge of the aquifer on the initial dewatering of coalbed methane (CBM) wells, a mathematical seepage model of water in the coalbed that accounts for aquifer leakage was established using the leakage coefficient, according to unsteady seepage theory. The model was solved after a Laplace transform, and the Stehfest numerical inversion was used to obtain the solution in real space. Then, log-log type curves of pressure and pressure derivative were created with new combinations of parameters. Based on the physical seepage mechanism, the influence of aquifer leakage on the curve shape was analyzed. It is found that the radial flow ends earlier as the leakage coefficient increases. Moreover, it was proposed to obtain reservoir permeability, skin factor, and leakage coefficient by type-curve matching. The type curves are useful for quantitatively evaluating the level of leakage, thereby guiding the adjustment of the subsequent production system for CBM wells.
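    The inversion step mentioned above, Stehfest's algorithm for bringing a Laplace-space solution back to real space, can be sketched as follows. The Laplace-space pressure function of the leakage model itself is not reproduced, so the example only demonstrates the inversion on a known transform.

```python
from math import factorial, log

def stehfest_invert(F, t, N=12):
    """Stehfest algorithm: numerically invert a Laplace-space function F(s)
    to approximate f(t) at a single time t. N must be even; 8-16 is typical.
    """
    def V(i):
        # Stehfest weight V_i.
        total = 0.0
        for k in range((i + 1) // 2, min(i, N // 2) + 1):
            total += (k ** (N // 2) * factorial(2 * k)) / (
                factorial(N // 2 - k) * factorial(k) * factorial(k - 1)
                * factorial(i - k) * factorial(2 * k - i)
            )
        return (-1) ** (i + N // 2) * total

    ln2_t = log(2.0) / t
    return ln2_t * sum(V(i) * F(i * ln2_t) for i in range(1, N + 1))

# Example with a known pair: F(s) = 1/(s + a) inverts to exp(-a*t).
# stehfest_invert(lambda s: 1.0 / (s + 0.5), t=2.0)  # ~ exp(-1.0) = 0.3679
```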

    Retrieving Multimodal Information for Augmented Generation: A Survey

    Full text link
    As Large Language Models (LLMs) become popular, an important trend has emerged of using multimodality to augment LLMs' generation ability, which enables LLMs to better interact with the world. However, there is no unified view of at which stage, and how, to incorporate different modalities. In this survey, we review methods that assist and augment generative models by retrieving multimodal knowledge, whose formats range from images, code, tables, and graphs to audio. Such methods offer a promising solution to important concerns such as factuality, reasoning, interpretability, and robustness. By providing an in-depth review, this survey is expected to provide scholars with a deeper understanding of the methods' applications and encourage them to adapt existing techniques to the fast-growing field of LLMs.
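    The surveyed methods share a retrieve-then-generate pattern; the sketch below shows that pattern in its simplest text-interface form, where retrieved items (image captions, serialized tables, code snippets, and so on) are prepended to the prompt. The `embed`, `generate`, and `knowledge_base` interfaces are placeholders, not any specific system from the survey.

```python
import numpy as np

def retrieve_augmented_generate(query, knowledge_base, embed, generate, top_k=3):
    """Minimal retrieve-then-generate loop over multimodal knowledge.

    knowledge_base: list of dicts like {"text": ..., "modality": "image_caption"}
    embed(text) -> vector; generate(prompt) -> string.
    """
    q_vec = embed(query)

    def score(item):
        # Cosine similarity between the query and a serialized knowledge item.
        v = embed(item["text"])
        return float(np.dot(q_vec, v) / (np.linalg.norm(q_vec) * np.linalg.norm(v) + 1e-12))

    retrieved = sorted(knowledge_base, key=score, reverse=True)[:top_k]

    # Serialize the retrieved multimodal knowledge into the prompt.
    context = "\n".join(f"[{item['modality']}] {item['text']}" for item in retrieved)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)
```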