486 research outputs found
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
This paper presents a systematic overview and comparison of
parameter-efficient fine-tuning methods covering over 40 papers published
between February 2019 and February 2023. These methods aim to make fine-tuning
large language models feasible and practical by training only a small set of
parameters. We provide a taxonomy that covers a broad
range of methods and present a detailed method comparison with a specific focus
on real-life efficiency and fine-tuning multibillion-scale language models.
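One widely used family of methods in the surveyed taxonomy freezes the pretrained weights and trains only a small additive low-rank update (LoRA-style). The sketch below is a minimal illustration of that idea in plain numpy, not any paper's reference implementation; all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                          # hidden size, low rank (r << d)
W_frozen = rng.normal(size=(d, d))   # pretrained weight, never updated

# Only A and B are trained: d*r + r*d parameters instead of d*d.
A = rng.normal(scale=0.01, size=(d, r))
B = np.zeros((r, d))                 # zero init: the update starts at zero,
                                     # so the model initially behaves like
                                     # the frozen pretrained model

def forward(x):
    # Effective weight is W_frozen + A @ B; only the low-rank part is trainable.
    return x @ (W_frozen + A @ B)

x = rng.normal(size=(1, d))
print(forward(x).shape)              # (1, 8)
```

With d=8 and r=2 the trainable parameter count drops from 64 to 32; at realistic hidden sizes (thousands) the saving is orders of magnitude, which is the "real-life efficiency" axis the survey compares methods on.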
Instruct-Align: Teaching Novel Languages to LLMs through Alignment-based Cross-Lingual Instruction
Instruction-tuned large language models (LLMs) have shown remarkable
generalization capability over multiple tasks in multiple languages.
Nevertheless, their generalization across languages varies, especially for
underrepresented or even unseen languages. Prior work on language adaptation
finds that naively adapting instruction-tuned LLMs to new languages results in
catastrophic forgetting, which in turn causes the loss of multitasking ability
in these LLMs. To tackle this, we propose the Instruct-Align (IA) framework,
which enables
instruction-tuned LLMs to learn cross-lingual alignment between unseen and
previously learned languages via alignment-based cross-lingual
instruction-tuning. Our preliminary result on BLOOMZ-560M shows that (IA)
is able to learn a new language effectively with only a limited amount of
parallel data and at the same time prevent catastrophic forgetting by applying
continual instruction-tuning through experience replay. Our work contributes to
the progression of language adaptation methods for instruction-tuned LLMs and
opens up the possibility of adapting underrepresented low-resource languages
into existing instruction-tuned LLMs. Our code will be publicly released upon
acceptance.
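The anti-forgetting mechanism named above, continual instruction-tuning with experience replay, amounts to mixing stored examples from previously learned tasks into each new-language training batch. The following sketch shows only that batch-mixing idea; the batch size and replay ratio are illustrative assumptions, not the paper's recipe.

```python
import random

def replay_batch(new_lang_data, replay_buffer, batch_size=8, replay_ratio=0.25):
    """Mix stored old-task examples into each new-task batch so the model
    keeps rehearsing previously learned instructions (experience replay)."""
    n_replay = int(batch_size * replay_ratio)
    batch = random.sample(new_lang_data, batch_size - n_replay)
    batch += random.sample(replay_buffer, min(n_replay, len(replay_buffer)))
    random.shuffle(batch)
    return batch

old = [f"old-{i}" for i in range(100)]   # previously learned instruction data
new = [f"new-{i}" for i in range(100)]   # new-language instruction data
batch = replay_batch(new, old)
print(len(batch), sum(s.startswith("old") for s in batch))  # 8 2
```

Each batch keeps a fixed fraction of old-task data in the optimization signal, which is what counteracts catastrophic forgetting during continual tuning.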
Do Stochastic Parrots have Feelings Too? Improving Neural Detection of Synthetic Text via Emotion Recognition
Recent developments in generative AI have shone a spotlight on
high-performance synthetic text generation technologies. The now wide
availability and ease of use of such models highlights the urgent need to
provide equally powerful technologies capable of identifying synthetic text.
With this in mind, we draw inspiration from psychological studies which suggest
that people can be driven by emotion and encode emotion in the text they
compose. We hypothesize that pretrained language models (PLMs) have an
affective deficit because they lack such an emotional driver when generating
text and consequently may generate synthetic text which has affective
incoherence, i.e., lacking the kind of emotional coherence present in
human-authored text. We subsequently develop an emotionally aware detector by
fine-tuning a PLM on emotion. Experimental results indicate that our
emotionally aware detector achieves improvements across a range of synthetic
text generators, model sizes, datasets, and domains. Finally, we
compare our emotionally-aware synthetic text detector to ChatGPT in the task of
identification of its own output and show substantial gains, reinforcing the
potential of emotion as a signal to identify synthetic text. Code, models, and
datasets are available at https://github.com/alanagiasi/emoPLMsynth
Comment: Accepted to Findings of EMNLP 2023 (long paper). Camera-ready version.
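The detector itself is a fine-tuned PLM, but the underlying signal, that human text carries recoverable emotional content, can be illustrated with a toy lexicon-based feature extractor. The word lists and feature scheme below are invented for illustration only and are far cruder than the paper's approach.

```python
# Toy emotion-feature extractor: counts emotion words per category.
# The actual detector fine-tunes a pretrained language model on emotion;
# this lexicon and scheme are illustrative assumptions, not the paper's method.
EMOTION_LEXICON = {
    "joy":   {"happy", "delighted", "love"},
    "anger": {"angry", "furious", "hate"},
    "fear":  {"afraid", "scared", "worried"},
}

def emotion_features(text):
    tokens = text.lower().split()
    return {cat: sum(t.strip(".,!?") in words for t in tokens)
            for cat, words in EMOTION_LEXICON.items()}

print(emotion_features("I was so happy, then scared and worried."))
# {'joy': 1, 'anger': 0, 'fear': 2}
```

A detector built on such features would hypothesize that synthetic text shows flatter or less coherent emotion trajectories than human text, which is the affective-incoherence intuition the abstract describes.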
Efficient Methods for Natural Language Processing: A Survey
Recent work in natural language processing (NLP) has yielded appealing
results from scaling model parameters and training data; however, using only
scale to improve performance means that resource consumption also grows. Such
resources include data, time, storage, or energy, all of which are naturally
limited and unevenly distributed. This motivates research into efficient
methods that require fewer resources to achieve similar results. This survey
synthesizes and relates current methods and findings in efficient NLP. We aim
to provide both guidance for conducting NLP under limited resources, and point
towards promising research directions for developing more efficient methods.
Comment: Accepted at TACL, pre-publication version.
FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models
Collecting high-quality labeled data for model training is notoriously
time-consuming and labor-intensive for various NLP tasks. While numerous
solutions, such as active learning for small language models (SLMs) and
in-context learning for large language models (LLMs), have been proposed to
alleviate the labeling burden, their performance still depends on human
intervention. How to reduce annotation cost in the era of LLMs remains
underexplored. To bridge this gap, we
revolutionize traditional active learning and propose an innovative
collaborative learning framework FreeAL to interactively distill and filter the
task-specific knowledge from LLMs. During collaborative training, an LLM serves
as an active annotator imparting its coarse-grained knowledge, while a
downstream SLM acts as a student that selects high-quality in-context
samples and feeds them back to the LLM for subsequent label refinement. Extensive
experiments on eight benchmark datasets demonstrate that FreeAL largely
enhances zero-shot performance for both the SLM and the LLM without any human
supervision. The code is available at https://github.com/Justherozen/FreeAL.
Comment: Accepted to EMNLP 2023 (Main conference).
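The collaborative loop the abstract describes, an LLM annotating coarsely while an SLM filters the most reliable samples back as in-context demonstrations, can be caricatured as follows. Every function here is a hypothetical stand-in (a rule plays the LLM, a parity check plays the SLM); this is a sketch of the loop's shape, not FreeAL's implementation.

```python
def freeal_round(unlabeled, llm_annotate, slm_score, keep_top=0.5, rounds=3):
    """Alternate LLM annotation and SLM-based filtering (illustrative sketch)."""
    demos = []                                   # in-context examples for the LLM
    for _ in range(rounds):
        # LLM annotates everything, conditioned on current demonstrations.
        labeled = [(x, llm_annotate(x, demos)) for x in unlabeled]
        # SLM acts as student: keep the samples it scores as most reliable,
        # and feed them back as demonstrations for the next round.
        labeled.sort(key=lambda p: slm_score(*p), reverse=True)
        demos = labeled[: int(len(labeled) * keep_top)]
    return demos

# Toy task: label integers "even"/"odd". The "LLM" is a rule; the "SLM"
# scores a pair highly when the label matches the true parity.
llm = lambda x, demos: "even" if x % 2 == 0 else "odd"
slm = lambda x, y: 1.0 if (x % 2 == 0) == (y == "even") else 0.0
demos = freeal_round(list(range(10)), llm, slm)
print(len(demos))  # 5
```

The point of the loop is that neither model needs human labels: the LLM supplies noisy supervision and the SLM's filtering distills it into a cleaner demonstration set each round.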
On the Challenges and Opportunities in Generative AI
The field of deep generative modeling has grown rapidly and consistently over
the years. With the availability of massive amounts of training data coupled
with advances in scalable unsupervised learning paradigms, recent large-scale
generative models show tremendous promise in synthesizing high-resolution
images and text, as well as structured data such as videos and molecules.
However, we argue that current large-scale generative AI models do not
sufficiently address several fundamental issues that hinder their widespread
adoption across domains. In this work, we aim to identify key unresolved
challenges in modern generative AI paradigms that should be tackled to further
enhance their capabilities, versatility, and reliability. By identifying these
challenges, we aim to provide researchers with valuable insights for exploring
fruitful research directions, thereby fostering the development of more robust
and accessible generative AI solutions.
DataComp: In search of the next generation of multimodal datasets
Multimodal datasets are a critical component in recent breakthroughs such as
Stable Diffusion and GPT-4, yet their design does not receive the same research
attention as model architectures or training algorithms. To address this
shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset
experiments centered around a new candidate pool of 12.8 billion image-text
pairs from Common Crawl. Participants in our benchmark design new filtering
techniques or curate new data sources and then evaluate their new dataset by
running our standardized CLIP training code and testing the resulting model on
38 downstream test sets. Our benchmark consists of multiple compute scales
spanning four orders of magnitude, which enables the study of scaling trends
and makes the benchmark accessible to researchers with varying resources. Our
baseline experiments show that the DataComp workflow leads to better training
sets. In particular, our best baseline, DataComp-1B, enables training a CLIP
ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet, outperforming
OpenAI's CLIP ViT-L/14 by 3.7 percentage points while using the same training
procedure and compute. We release DataComp and all accompanying code at
www.datacomp.ai.
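A common shape for the filtering techniques DataComp participants design is a similarity threshold: keep only image-text pairs whose embeddings agree. The numpy sketch below uses random stand-in embeddings and an arbitrary threshold; it illustrates the filtering pattern only, not DataComp's actual baselines or CLIP itself.

```python
import numpy as np

def clip_score_filter(img_emb, txt_emb, threshold=0.3):
    """Keep indices of pairs whose image/text embedding cosine similarity
    exceeds `threshold` (a stand-in for CLIP-score-style filtering)."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sims = np.sum(img * txt, axis=1)          # per-pair cosine similarity
    return np.where(sims > threshold)[0]

rng = np.random.default_rng(0)
img = rng.normal(size=(1000, 64))
txt = 0.5 * img + rng.normal(size=(1000, 64))  # correlated, as real pairs are
kept = clip_score_filter(img, txt)
print(f"{kept.shape[0]} of 1000 pairs kept")
```

In the benchmark this kind of rule is evaluated end to end: the filtered pool is used to train CLIP with the standardized code, and the resulting model is scored on the 38 downstream test sets.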
Large Language Models in Finance: A Survey
Recent advances in large language models (LLMs) have opened new possibilities
for artificial intelligence applications in finance. In this paper, we provide
a practical survey focused on two key aspects of utilizing LLMs for financial
tasks: existing solutions and guidance for adoption.
First, we review current approaches employing LLMs in finance, including
leveraging pretrained models via zero-shot or few-shot learning, fine-tuning on
domain-specific data, and training custom LLMs from scratch. We summarize key
models and evaluate their performance improvements on financial natural
language processing tasks.
Second, we propose a decision framework to guide financial professionals in
selecting the appropriate LLM solution based on their use case constraints
around data, compute, and performance needs. The framework provides a pathway
from lightweight experimentation to heavy investment in customized LLMs.
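The pathway from lightweight experimentation to heavy investment can be encoded as a simple decision rule. The function below is a hypothetical rendering of that idea; the constraint categories and thresholds are illustrative assumptions, not the survey's exact criteria.

```python
def choose_llm_approach(has_labeled_data, data_is_sensitive, compute_budget):
    """Map use-case constraints to an adoption pathway, lightest option first.
    Categories and thresholds are illustrative, not the survey's exact rules."""
    if not has_labeled_data and compute_budget == "low":
        return "zero-/few-shot prompting of a pretrained LLM"
    if has_labeled_data and compute_budget in ("low", "medium"):
        return "fine-tune a pretrained LLM on domain data"
    if data_is_sensitive and compute_budget == "high":
        return "train a custom LLM from scratch"
    return "fine-tune a pretrained LLM on domain data"

print(choose_llm_approach(False, False, "low"))
# zero-/few-shot prompting of a pretrained LLM
```

The ordering matters: cheaper options are tried first, mirroring the survey's advice to start with prompting before committing to fine-tuning or custom training.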
Lastly, we discuss limitations and challenges around leveraging LLMs in
financial applications. Overall, this survey aims to synthesize the
state-of-the-art and provide a roadmap for responsibly applying LLMs to advance
financial AI.
Comment: Accepted by the 4th ACM International Conference on AI in Finance
(ICAIF-23), https://ai-finance.or