
    Retentive Network: A Successor to Transformer for Large Language Models

    In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost O(1) inference, which improves decoding throughput and latency and reduces GPU memory usage without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded in parallel while the chunks are summarized recurrently. Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. These intriguing properties make RetNet a strong successor to Transformer for large language models. Code will be available at https://aka.ms/retnet
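    The parallel/recurrent duality the abstract describes can be illustrated with a minimal sketch: a simplified single-head retention with decay factor gamma (no scaling or normalization; the shapes and gamma value here are illustrative choices, not the paper's exact formulation). The parallel form computes all positions at once with a decay mask; the recurrent form carries a d×d state and produces the identical outputs one token at a time.

```python
import numpy as np

# Hedged sketch of retention's two equivalent forms (simplified: one head,
# no scaling/normalization; T, d, gamma are illustrative, not from the paper).
rng = np.random.default_rng(0)
T, d, gamma = 6, 4, 0.9
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Parallel form (training): O = (Q K^T * D) V, with decay mask
# D[n, m] = gamma^(n-m) for n >= m, else 0.
n, m = np.arange(T)[:, None], np.arange(T)[None, :]
D = np.where(n >= m, gamma ** (n - m), 0.0)
O_parallel = (Q @ K.T * D) @ V

# Recurrent form (O(1)-per-token inference): carry a single d x d state S,
# so decoding cost does not grow with sequence length.
S = np.zeros((d, d))
O_recurrent = np.zeros((T, d))
for t in range(T):
    S = gamma * S + np.outer(K[t], V[t])  # decayed state update
    O_recurrent[t] = Q[t] @ S             # readout for token t

# Both forms produce the same outputs.
assert np.allclose(O_parallel, O_recurrent)
```

The chunkwise form the abstract mentions interpolates between these two: positions within a chunk use the parallel path, while a recurrent state summarizes previous chunks, giving linear complexity in sequence length.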

    Can LLMs like GPT-4 outperform traditional AI tools in dementia diagnosis? Maybe, but not today

    Recent investigations show that large language models (LLMs), specifically GPT-4, not only have remarkable capabilities in common Natural Language Processing (NLP) tasks but also exhibit human-level performance on various professional and academic benchmarks. However, whether GPT-4 can be directly used in practical applications and replace traditional artificial intelligence (AI) tools in specialized domains requires further experimental validation. In this paper, we explore the potential of LLMs such as GPT-4 to outperform traditional AI tools in dementia diagnosis. Comprehensive comparisons between GPT-4 and traditional AI tools are conducted to examine their diagnostic accuracy in a clinical setting. Experimental results on two real clinical datasets show that, although LLMs like GPT-4 demonstrate potential for future advancements in dementia diagnosis, they currently do not surpass the performance of traditional AI tools. The interpretability and faithfulness of GPT-4 are also evaluated by comparison with real doctors. We discuss the limitations of GPT-4 in its current state and propose future research directions to enhance GPT-4 in dementia diagnosis.
    Comment: 16 pages, 6 figures

    FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering

    Knowledge base question answering (KBQA) is a critical yet challenging task due to the vast number of entities within knowledge bases and the diversity of natural language questions posed by users. Unfortunately, the performance of most KBQA models tends to decline significantly in real-world scenarios where high-quality annotated data is insufficient. To mitigate the burden associated with manual annotation, we introduce FlexKBQA, which utilizes Large Language Models (LLMs) as program translators to address the challenges inherent in the few-shot KBQA task. Specifically, FlexKBQA leverages automated algorithms to sample diverse programs, such as SPARQL queries, from the knowledge base, which are subsequently converted into natural language questions via LLMs. This synthetic dataset facilitates training a specialized lightweight model for the KB. Additionally, to reduce the barriers of distribution shift between synthetic data and real user questions, FlexKBQA introduces an execution-guided self-training method to iteratively leverage unlabeled user questions. Furthermore, we explore harnessing the inherent reasoning capability of LLMs to enhance the entire framework. Consequently, FlexKBQA delivers substantial flexibility, encompassing data annotation, deployment, and being domain agnostic. Through extensive experiments on GrailQA, WebQSP, and KQA Pro, we observe that under few-shot and even the more challenging zero-shot scenarios, FlexKBQA achieves impressive results with a few annotations, surpassing all previous baselines and even approaching the performance of supervised models, achieving a remarkable 93% performance relative to the fully-supervised models. We posit that FlexKBQA represents a significant advancement towards exploring better integration of large and lightweight models. The code is open-sourced.
    Comment: Accepted as AAAI-24 Oral paper; Knowledge Base Question Answering; Large Language Model; Data Generation; Few-Shot & Zero-Shot
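    The data-synthesis loop the abstract describes (sample KB programs, verbalize them with an LLM, train on the resulting pairs) can be sketched roughly as follows. Everything here is a hypothetical stand-in: `llm_translate` is a stub for the actual LLM call, and the template, relation, and entity pools are toy examples, not the paper's components.

```python
import random

# Toy pools standing in for programs sampled from a real knowledge base.
TEMPLATES = [
    "SELECT ?x WHERE {{ ?x <{rel}> <{ent}> }}",
    "SELECT (COUNT(?x) AS ?c) WHERE {{ ?x <{rel}> <{ent}> }}",
]
RELATIONS = ["capital_of", "author_of"]
ENTITIES = ["France", "Hamlet"]

def sample_program(rng):
    # Step 1: automatically sample a diverse, executable program (here a
    # SPARQL-style query) from the knowledge base.
    tpl = rng.choice(TEMPLATES)
    return tpl.format(rel=rng.choice(RELATIONS), ent=rng.choice(ENTITIES))

def llm_translate(program):
    # Step 2: an LLM would verbalize the program into a natural question.
    # Stubbed here; a real system would prompt an LLM with the program.
    return f"Natural-language question for: {program}"

def synthesize(n, seed=0):
    # Step 3: collect (question, program) pairs; these train a lightweight
    # KBQA model, and execution-guided self-training would then iterate
    # on unlabeled real user questions.
    rng = random.Random(seed)
    return [(llm_translate(p), p) for p in (sample_program(rng) for _ in range(n))]

pairs = synthesize(3)
```

The key design point the abstract highlights is the direction of translation: sampling programs first guarantees every synthetic question has an executable, verifiably correct logical form, sidestepping manual annotation entirely.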

    A Modeling Study of the Responses of Mesosphere and Lower Thermosphere Winds to Geomagnetic Storms at Middle Latitudes

    Thermosphere Ionosphere Mesosphere Electrodynamics General Circulation Model (TIMEGCM) simulations are diagnostically analyzed to investigate the causes of mesosphere and lower thermosphere (MLT) wind changes at middle latitudes during the 17 April 2002 storm. In the early phase of the storm, middle-latitude upper thermospheric wind changes are greater and occur earlier than MLT wind changes. The horizontal wind changes cause downward vertical wind changes, which are transmitted to the MLT region. Adiabatic heating and heat advection associated with downward vertical winds cause MLT temperature increases. The pressure gradient produced by these temperature changes and the Coriolis force then drive strong equatorward meridional wind changes at night, which expand toward lower latitudes. Momentum advection is minor. As the storm evolves, the enhanced MLT temperatures produce upward vertical winds. These upward winds then lead to a decreased temperature, which alters the MLT horizontal wind pattern and causes poleward wind disturbances at higher latitudes.