Retentive Network: A Successor to Transformer for Large Language Models
In this work, we propose Retentive Network (RetNet) as a foundation
architecture for large language models, simultaneously achieving training
parallelism, low-cost inference, and good performance. We theoretically derive
the connection between recurrence and attention. Then we propose the retention
mechanism for sequence modeling, which supports three computation paradigms,
i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel
representation allows for training parallelism. The recurrent representation
enables low-cost inference, improving decoding throughput and reducing latency
and GPU memory usage without sacrificing performance. The chunkwise recurrent
representation facilitates efficient long-sequence modeling with linear
complexity, where each chunk is encoded in parallel while chunk summaries are
carried over recurrently. Experimental results on language modeling show that
RetNet achieves favorable scaling results, parallel training, low-cost
deployment, and efficient inference. These intriguing properties make RetNet a
strong successor to Transformer for large language models. Code will be
available at https://aka.ms/retnet
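
The sketch below is a minimal, single-head NumPy illustration of the parallel and recurrent paradigms described in this abstract, assuming a scalar decay gamma and random Q/K/V; it omits the paper's multi-scale heads, position rotation, normalization, and gating, so it is a sketch of the core recurrence rather than the full RetNet layer.

```python
# Minimal single-head sketch of the retention mechanism, showing that the
# parallel and recurrent forms compute the same outputs. Shapes, gamma, and
# random Q/K/V are illustrative assumptions, not the paper's configuration.
import numpy as np

def retention_parallel(Q, K, V, gamma):
    """Parallel form: O = (Q K^T * D) V with causal decay mask D[n, m] = gamma**(n-m)."""
    T = Q.shape[0]
    n, m = np.arange(T)[:, None], np.arange(T)[None, :]
    D = np.where(n >= m, gamma ** (n - m), 0.0)        # causal, exponentially decaying mask
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: S_t = gamma * S_{t-1} + K_t^T V_t,  O_t = Q_t S_t."""
    T, d = Q.shape
    S = np.zeros((d, V.shape[1]))                      # constant-size recurrent state
    out = np.zeros((T, V.shape[1]))
    for t in range(T):
        S = gamma * S + np.outer(K[t], V[t])           # decay old state, add new key-value
        out[t] = Q[t] @ S
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d, gamma = 8, 4, 0.9
    Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
    print(np.allclose(retention_parallel(Q, K, V, gamma),
                      retention_recurrent(Q, K, V, gamma)))   # True
```

Because the two forms are equivalent, training can use the parallel form while decoding can use the constant-memory recurrent form; the chunkwise recurrent paradigm combines them, applying the parallel form within each chunk and the recurrent update across chunks.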
Can LLMs like GPT-4 outperform traditional AI tools in dementia diagnosis? Maybe, but not today
Recent investigations show that large language models (LLMs), specifically
GPT-4, not only have remarkable capabilities in common Natural Language
Processing (NLP) tasks but also exhibit human-level performance on various
professional and academic benchmarks. However, whether GPT-4 can be directly
used in practical applications and replace traditional artificial intelligence
(AI) tools in specialized domains requires further experimental validation. In
this paper, we explore the potential of LLMs such as GPT-4 to outperform
traditional AI tools in dementia diagnosis. Comprehensive comparisons between
GPT-4 and traditional AI tools are conducted to examine their diagnostic
accuracy in a clinical setting. Experimental results on two real clinical
datasets show that, although LLMs like GPT-4 demonstrate potential for future
advancements in dementia diagnosis, they currently do not surpass the
performance of traditional AI tools. The interpretability and faithfulness of
GPT-4 are also evaluated by comparison with real doctors. We discuss the
limitations of GPT-4 in its current state and propose future research
directions to enhance GPT-4 in dementia diagnosis.
Comment: 16 pages, 6 figures
FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering
Knowledge base question answering (KBQA) is a critical yet challenging task
due to the vast number of entities within knowledge bases and the diversity of
natural language questions posed by users. Unfortunately, the performance of
most KBQA models tends to decline significantly in real-world scenarios where
high-quality annotated data is insufficient. To mitigate the burden associated
with manual annotation, we introduce FlexKBQA, which utilizes Large Language
Models (LLMs) as program translators to address the challenges inherent in the
few-shot KBQA task. Specifically, FlexKBQA leverages automated algorithms
to sample diverse programs, such as SPARQL queries, from the knowledge base,
which are subsequently converted into natural language questions via LLMs. This
synthetic dataset facilitates training a specialized lightweight model for the
KB. Additionally, to mitigate the distribution shift between synthetic data and
real user questions, FlexKBQA introduces an execution-guided self-training
method to iteratively leverage unlabeled user questions.
Furthermore, we explore harnessing the inherent reasoning capability of LLMs to
enhance the entire framework. Consequently, FlexKBQA offers substantial
flexibility in data annotation and deployment while remaining domain-agnostic.
Through extensive experiments on GrailQA, WebQSP, and KQA Pro, we observe that
under few-shot and even the more challenging zero-shot scenarios, FlexKBQA
achieves impressive results with only a few annotations, surpassing all previous
baselines and approaching the performance of supervised models, reaching 93% of
fully supervised performance.
We posit that FlexKBQA represents a significant advancement towards exploring
better integration of large and lightweight models. The code is open-sourced.
Comment: Accepted as AAAI-24 Oral paper; Knowledge Base Question Answering;
Large Language Model; Data Generation; Few-Shot & Zero-Shot
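
As a rough illustration of the workflow this abstract describes, the Python sketch below wires the stages together: sampling programs from the KB, verbalizing them with an LLM, training a lightweight parser on the synthetic pairs, and execution-guided self-training on unlabeled user questions. Every callable passed in (sample_programs, verbalize, train, execute) is a hypothetical stand-in, not FlexKBQA's released API.

```python
# High-level sketch of a FlexKBQA-style loop; all stage functions are assumed
# to be supplied by the caller and are placeholders, not the paper's code.
def flexkbqa_pipeline(kb, unlabeled_questions, sample_programs, verbalize,
                      train, execute, rounds=3):
    # 1. Automatically sample diverse programs (e.g. SPARQL queries) from the KB.
    programs = sample_programs(kb)

    # 2. Convert each sampled program into a natural-language question via an LLM.
    synthetic = [(verbalize(p), p) for p in programs]

    # 3. Train a specialized lightweight parser on the synthetic (question, program) pairs.
    model = train(synthetic)

    # 4. Execution-guided self-training: pseudo-label real user questions and keep
    #    only predictions that execute successfully against the KB.
    for _ in range(rounds):
        pseudo = []
        for q in unlabeled_questions:
            pred = model(q)                       # lightweight model predicts a program
            if execute(kb, pred) is not None:     # keep it only if it executes
                pseudo.append((q, pred))
        model = train(synthetic + pseudo)
    return model
```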
A Modeling Study of the Responses of Mesosphere and Lower Thermosphere Winds to Geomagnetic Storms at Middle Latitudes
Thermosphere Ionosphere Mesosphere Electrodynamics General Circulation Model (TIMEGCM) simulations are diagnostically analyzed to investigate the causes of mesosphere and lower thermosphere (MLT) wind changes at middle latitudes during the 17 April 2002 storm. In the early phase of the storm, middle‐latitude upper thermospheric wind changes are greater and occur earlier than MLT wind changes. The horizontal wind changes cause downward vertical wind changes, which are transmitted to the MLT region. Adiabatic heating and heat advection associated with downward vertical winds cause MLT temperature increases. The pressure gradient produced by these temperature changes and the Coriolis force then drive strong equatorward meridional wind changes at night, which expand toward lower latitudes. Momentum advection is minor. As the storm evolves, the enhanced MLT temperatures produce upward vertical winds. These upward winds then lead to a decreased temperature, which alters the MLT horizontal wind pattern and causes poleward wind disturbances at higher latitudes