486 research outputs found

Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning

Author: Deshpande Vijeta
Lialin Vladislav
Rumshisky Anna
Publication date: 27/03/2023

This paper presents a systematic overview and comparison of parameter-efficient fine-tuning methods covering over 40 papers published between February 2019 and February 2023. These methods aim to resolve the infeasibility and impracticality of fine-tuning large language models by only training a small set of parameters. We provide a taxonomy that covers a broad range of methods and present a detailed method comparison with a specific focus on real-life efficiency and fine-tuning multibillion-scale language models.

arXiv.org e-Print Archive

Instruct-Align: Teaching Novel Languages with to LLMs through Alignment-based Cross-Lingual Instruction

Author: Cahyawijaya Samuel
Chung Willy
Fung Pascale
Lovenia Holy
Yu Tiezheng
Publication date: 22/05/2023

Instruction-tuned large language models (LLMs) have shown remarkable generalization capability over multiple tasks in multiple languages. Nevertheless, their generalization towards different languages varies especially to underrepresented languages or even to unseen languages. Prior works on adapting new languages to LLMs find that naively adapting new languages to instruction-tuned LLMs will result in catastrophic forgetting, which in turn causes the loss of multitasking ability in these LLMs. To tackle this, we propose the Instruct-Align a.k.a. (IA)^1 framework, which enables instruction-tuned LLMs to learn cross-lingual alignment between unseen and previously learned languages via alignment-based cross-lingual instruction-tuning. Our preliminary result on BLOOMZ-560M shows that (IA)^1 is able to learn a new language effectively with only a limited amount of parallel data and at the same time prevent catastrophic forgetting by applying continual instruction-tuning through experience replay. Our work contributes to the progression of language adaptation methods for instruction-tuned LLMs and opens up the possibility of adapting underrepresented low-resource languages into existing instruction-tuned LLMs. Our code will be publicly released upon acceptance.

arXiv.org e-Print Archive

Do Stochastic Parrots have Feelings Too? Improving Neural Detection of Synthetic Text via Emotion Recognition

Author: Cowap Alan
Foster Jennifer
Graham Yvette
Publication date: 24/10/2023

Recent developments in generative AI have shone a spotlight on high-performance synthetic text generation technologies. The now wide availability and ease of use of such models highlights the urgent need to provide equally powerful technologies capable of identifying synthetic text. With this in mind, we draw inspiration from psychological studies which suggest that people can be driven by emotion and encode emotion in the text they compose. We hypothesize that pretrained language models (PLMs) have an affective deficit because they lack such an emotional driver when generating text and consequently may generate synthetic text which has affective incoherence i.e. lacking the kind of emotional coherence present in human-authored text. We subsequently develop an emotionally aware detector by fine-tuning a PLM on emotion. Experiment results indicate that our emotionally-aware detector achieves improvements across a range of synthetic text generators, various sized models, datasets, and domains. Finally, we compare our emotionally-aware synthetic text detector to ChatGPT in the task of identification of its own output and show substantial gains, reinforcing the potential of emotion as a signal to identify synthetic text. Code, models, and datasets are available at https://github.com/alanagiasi/emoPLMsynth

Comment: Accepted to Findings of EMNLP 2023 (long paper). Camera ready version

arXiv.org e-Print Archive

Efficient Methods for Natural Language Processing: A Survey

Author: Balasubramanian Niranjan
Cao Qingqing
Ciosici Manuel R.
Derczynski Leon
Dodge Jesse
Forde Jessica Zosa
Gurevych Iryna
Hassid Michael
Heafield Kenneth
Hooker Sara
Ji Tianchu
Lee Ji-Ung
Martins André F. T.
Martins Pedro H.
Milder Peter
Raffel Colin
Schwartz Roy
Simpson Edwin
Slonim Noam
Strubell Emma
Treviso Marcos
van Aken Betty
Publication date: 01/01/2023

Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require fewer resources to achieve similar results. This survey synthesizes and relates current methods and findings in efficient NLP. We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.

Comment: Accepted at TACL, pre-publication version

arXiv.org e-Print Archive

TUbiblio

Explore Bristol Research

FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models

Author: Chen Gang
Dong Yiwen
Lin Minmin
Wang Haobo
Wu Runze
Xiao Ruixuan
Zhao Junbo
Publication date: 27/11/2023

Collecting high-quality labeled data for model training is notoriously time-consuming and labor-intensive for various NLP tasks. While copious solutions, such as active learning for small language models (SLMs) and prevalent in-context learning in the era of large language models (LLMs), have been proposed and alleviate the labeling burden to some extent, their performances are still subject to human intervention. It is still underexplored how to reduce the annotation cost in the LLMs era. To bridge this, we revolutionize traditional active learning and propose an innovative collaborative learning framework FreeAL to interactively distill and filter the task-specific knowledge from LLMs. During collaborative training, an LLM serves as an active annotator inculcating its coarse-grained knowledge, while a downstream SLM is incurred as a student to filter out high-quality in-context samples to feedback LLM for the subsequent label refinery. Extensive experiments on eight benchmark datasets demonstrate that FreeAL largely enhances the zero-shot performances for both SLM and LLM without any human supervision. The code is available at https://github.com/Justherozen/FreeAL

Comment: Accepted to EMNLP 2023 (Main conference)

arXiv.org e-Print Archive

On the Challenges and Opportunities in Generative AI

Author: Bamler Robert
Broeck Guy Van den
Cotterell Ryan
de Melo Gerard
Däubener Sina
Fellenz Sophie
Fischer Asja
Fortuin Vincent
Gärtner Thomas
Kirchler Matthias
Kloft Marius
Li Yingzhen
Lippert Christoph
Mandt Stephan
Manduchi Laura
Nalisnick Eric
Ommer Björn
Pandey Kushagra
Ranganath Rajesh
Rudolph Maja
Ullrich Karen
Vogt Julia E
Wang Yixin
Wenzel Florian
Wood Frank
Publication date: 28/02/2024

The field of deep generative modeling has grown rapidly and consistently over the years. With the availability of massive amounts of training data coupled with advances in scalable unsupervised learning paradigms, recent large-scale generative models show tremendous promise in synthesizing high-resolution images and text, as well as structured data such as videos and molecules. However, we argue that current large-scale generative AI models do not sufficiently address several fundamental issues that hinder their widespread adoption across domains. In this work, we aim to identify key unresolved challenges in modern generative AI paradigms that should be tackled to further enhance their capabilities, versatility, and reliability. By identifying these challenges, we aim to provide researchers with valuable insights for exploring fruitful research directions, thereby fostering the development of more robust and accessible generative AI solutions.

arXiv.org e-Print Archive

DataComp: In search of the next generation of multimodal datasets

Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing the resulting model on 38 downstream test sets. Our benchmark consists of multiple compute scales spanning four orders of magnitude, which enables the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow leads to better training sets. In particular, our best baseline, DataComp-1B, enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet, outperforming OpenAI's CLIP ViT-L/14 by 3.7 percentage points while using the same training procedure and compute. We release DataComp and all accompanying code at www.datacomp.ai

arXiv.org e-Print Archive

Large Language Models in Finance: A Survey

Author: Chen Hang
Ding Han
Li Yinheng
Wang Shaofei
Publication date: 28/09/2023

Recent advances in large language models (LLMs) have opened new possibilities for artificial intelligence applications in finance. In this paper, we provide a practical survey focused on two key aspects of utilizing LLMs for financial tasks: existing solutions and guidance for adoption. First, we review current approaches employing LLMs in finance, including leveraging pretrained models via zero-shot or few-shot learning, fine-tuning on domain-specific data, and training custom LLMs from scratch. We summarize key models and evaluate their performance improvements on financial natural language processing tasks. Second, we propose a decision framework to guide financial professionals in selecting the appropriate LLM solution based on their use case constraints around data, compute, and performance needs. The framework provides a pathway from lightweight experimentation to heavy investment in customized LLMs. Lastly, we discuss limitations and challenges around leveraging LLMs in financial applications. Overall, this survey aims to synthesize the state-of-the-art and provide a roadmap for responsibly applying LLMs to advance financial AI.

Comment: Accepted by 4th ACM International Conference on AI in Finance (ICAIF-23) https://ai-finance.or

arXiv.org e-Print Archive