Automated Few-shot Classification with Instruction-Finetuned Language Models
A particularly successful class of approaches for few-shot learning combines
language models with prompts -- hand-crafted task descriptions that complement
data samples. However, designing prompts by hand for each task commonly
requires domain knowledge and substantial guesswork. We observe, in the context
of classification tasks, that instruction finetuned language models exhibit
remarkable prompt robustness, and we subsequently propose AuT-Few, a simple
method that eliminates the need for handcrafted prompts. This approach
consists of (i) a prompt retrieval module that selects suitable task
instructions from the instruction-tuning knowledge base, and (ii) the
generation of two distinct, semantically meaningful class descriptions and a
selection mechanism via cross-validation. Across a range of datasets spanning
diverse classification tasks, we show that AuT-Few outperforms current
state-of-the-art few-shot learning methods. Moreover, AuT-Few is the
best-ranking method across datasets on the RAFT few-shot benchmark. Notably,
these results are achieved without task-specific handcrafted prompts on unseen
tasks.
Comment: EMNLP 2023 Findings
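To make the two components concrete, here is a minimal, runnable Python sketch of the selection logic: instruction templates are retrieved by similarity to the task, and candidate class descriptions are scored by accuracy over the few shots. The embed() and classify() helpers are toy stand-ins for the paper's sentence encoder and instruction-finetuned language model, not the authors' implementation.

```python
# Sketch of the AuT-Few selection logic; embed() and classify() are toy
# stand-ins, not the paper's instruction-finetuned model.
import re
from collections import Counter
from itertools import product

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a sentence encoder.
    return Counter(re.findall(r"\w+", text.lower()))

def sim(a, b):
    # Cosine similarity between two bag-of-words vectors.
    num = sum(a[w] * b[w] for w in a)
    den = (sum(v * v for v in a.values()) * sum(v * v for v in b.values())) ** 0.5
    return num / den if den else 0.0

def retrieve_templates(task_text, template_bank, k=2):
    # (i) Pick the k instruction templates most similar to the task.
    q = embed(task_text)
    return sorted(template_bank, key=lambda t: sim(q, embed(t)), reverse=True)[:k]

def classify(template, class_descs, example):
    # Stand-in scorer: choose the class whose description best matches the
    # prompted input; the paper instead scores prompts with a finetuned LM.
    x = embed(template + " " + example)
    return max(class_descs, key=lambda c: sim(x, embed(class_descs[c])))

def cv_accuracy(template, class_descs, shots):
    # (ii) Accuracy over the few shots; since this toy classifier does not
    # train on them, each shot acts as a held-out validation example.
    return sum(classify(template, class_descs, x) == y for x, y in shots) / len(shots)

def aut_few(task_text, template_bank, candidate_descs, shots):
    # Jointly select the best (template, class descriptions) configuration.
    return max(product(retrieve_templates(task_text, template_bank), candidate_descs),
               key=lambda pair: cv_accuracy(pair[0], pair[1], shots))

shots = [("the movie was wonderful", "pos"), ("dull and tedious plot", "neg")]
bank = ["Is the review positive or negative?", "Summarize the document."]
descs = [{"pos": "positive wonderful review", "neg": "negative dull review"}]
print(aut_few("classify movie review sentiment", bank, descs, shots))
```

In the paper, retrieval operates over the instruction-tuning collection's templates and selection uses cross-validation with the language model itself; the sketch mirrors only the overall structure.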
Parameter and Data Efficient Continual Pre-training for Robustness to Dialectal Variance in Arabic
The use of multilingual language models for tasks in low- and high-resource
languages has been a success story in deep learning. In recent times, Arabic
has been receiving widespread attention on account of its dialectal variance.
While prior research studies have tried to adapt these multilingual models for
dialectal variants of Arabic, it still remains a challenging problem owing to
the lack of sufficient monolingual dialectal data and parallel translation data
of such dialectal variants. Whether this limited dialectal data can be used
to improve Arabic-trained models on its dialectal variants remains an open
question. First, we show that multilingual-BERT (mBERT) incrementally
pretrained on Arabic monolingual data takes less training time and yields
comparable accuracy when compared to our custom monolingual Arabic model,
while beating existing models on average. We then explore two continual
pre-training methods: (1) using small amounts of dialectal data for continual
finetuning, and (2) using parallel Arabic-to-English data with a Translation
Language Modeling loss function. We show that both approaches yield average
gains on dialectal classification tasks when applied to monolingual models.
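As a concrete illustration of method (2), below is a minimal sketch of TLM-style continual pre-training, assuming the HuggingFace transformers and torch libraries: each parallel Arabic-English pair is encoded as a single sequence and trained with a masked language modeling loss, so masked tokens can be predicted from either language. The two-pair corpus and hyperparameters are illustrative only, not the paper's exact training recipe.

```python
# Minimal sketch of TLM-style continual pre-training of mBERT on parallel
# Arabic-English data; the tiny corpus and hyperparameters are illustrative.
import torch
from torch.utils.data import DataLoader
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Encoding each pair as (sentence_a, sentence_b) yields "[CLS] ar [SEP] en [SEP]",
# so masked Arabic tokens can be recovered from the English side and vice versa.
pairs = [("كيف حالك؟", "How are you?"),
         ("أنا بخير، شكراً", "I am fine, thank you")]
encodings = [tok(ar, en, truncation=True, max_length=128) for ar, en in pairs]

# The MLM collator masks 15% of tokens and builds the labels tensor.
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm_probability=0.15)
loader = DataLoader(encodings, batch_size=2, collate_fn=collator)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for batch in loader:
    loss = model(**batch).loss  # masked-LM loss over the concatenated pair
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"TLM loss: {loss.item():.3f}")
```

Method (1) is the same loop with monolingual dialectal sentences encoded individually instead of as pairs.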
Federated Learning's Blessing: FedAvg has Linear Speedup
Federated learning (FL) learns a model jointly from a set of participating
devices without sharing each other's privately held data. The characteristics
of non-iid data across the network, low device participation, and the mandate
that data remain private bring challenges in understanding the convergence of
FL algorithms, particularly with regard to how convergence scales with the
number of participating devices. In this paper, we focus on Federated
Averaging (FedAvg) -- the most widely used and effective FL algorithm today --
and provide a comprehensive study of its convergence rate. Although FedAvg has
recently been studied by an emerging line of literature, it remains open as to
how FedAvg's convergence scales with the number of participating devices in the
FL setting--a crucial question whose answer would shed light on the performance
of FedAvg in large FL systems. We fill this gap by establishing convergence
guarantees for FedAvg under three classes of problems: strongly convex smooth,
convex smooth, and overparameterized strongly convex smooth problems. We show
that FedAvg enjoys linear speedup in each case, although with different
convergence rates. For each class, we also characterize the corresponding
convergence rates for the Nesterov accelerated FedAvg algorithm in the FL
setting: to the best of our knowledge, these are the first linear speedup
guarantees for FedAvg when Nesterov acceleration is used. To accelerate FedAvg,
we also design a new momentum-based FL algorithm that further improves the
convergence rate in overparameterized linear regression problems. Empirical
studies of the algorithms in various settings support our theoretical
results.
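For intuition about the round structure analyzed here, the following is a minimal numpy sketch of FedAvg on a synthetic least-squares problem: the server samples a subset of devices, each runs local SGD from the current global model, and the server averages the returned iterates. The data generator and all constants are illustrative choices, not the paper's experimental setup.

```python
# Minimal FedAvg sketch on synthetic non-iid least squares; all constants
# are illustrative choices, not the paper's experimental setup.
import numpy as np

rng = np.random.default_rng(0)
d, n_devices, n_local, n_rounds = 5, 10, 20, 100
w_true = rng.normal(size=d)

# Non-iid device data: each device draws features from a shifted distribution.
devices = []
for i in range(n_devices):
    X = rng.normal(loc=0.5 * i, size=(50, d))
    y = X @ w_true + 0.1 * rng.normal(size=50)
    devices.append((X, y))

def local_sgd(w, X, y, steps, lr=1e-3):
    # Run `steps` stochastic gradient steps on one device's squared loss.
    for _ in range(steps):
        j = rng.integers(len(y))
        w = w - lr * 2 * (X[j] @ w - y[j]) * X[j]
    return w

w = np.zeros(d)
for _ in range(n_rounds):
    # Partial participation: only a sampled subset of devices trains each round.
    active = rng.choice(n_devices, size=5, replace=False)
    updates = [local_sgd(w.copy(), *devices[i], n_local) for i in active]
    w = np.mean(updates, axis=0)  # the server averages the returned models

print("distance to w_true:", np.linalg.norm(w - w_true))
```

Averaging over more participating devices reduces the variance of each round's update, which is the standard intuition behind linear-speedup results of this kind.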