InstructABSA: Instruction Learning for Aspect Based Sentiment Analysis
In this paper, we present InstructABSA, Aspect Based Sentiment Analysis
(ABSA) using the instruction learning paradigm for all ABSA subtasks: Aspect
Term Extraction (ATE), Aspect Term Sentiment Classification (ATSC), and Joint
Task modeling. Our method introduces positive, negative, and neutral examples
to each training sample, and instruction tunes the model (Tk-Instruct) for each
ABSA subtask, yielding significant performance improvements. Experimental
results on the SemEval 2014, 15, and 16 datasets demonstrate that InstructABSA
outperforms the previous state-of-the-art (SOTA) approaches on all three ABSA
subtasks (ATE, ATSC, and Joint Task) by a significant margin, while also
outperforming models 7x its size. In particular, InstructABSA surpasses the
SOTA on the Rest14 ATE subtask by 7.31% points and on the Lapt14 Joint Task by
8.63% points, and it also improves on the SOTA for the Rest15 ATSC subtask.
Our results also suggest a strong generalization ability to new domains across
all three subtasks.
Comment: 4 pages, 2 figures, 5 tables, 5 appendix pages
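
The instruction format sketched in this abstract is straightforward to
reproduce. Below is a minimal, hypothetical illustration of how an ATSC prompt
with one positive, one negative, and one neutral demonstration might be
assembled; the definition wording and example sentences are assumptions, not
the paper's exact templates, and the resulting string would be fed to a
Tk-Instruct-style seq2seq model for tuning.

    # Minimal sketch of an InstructABSA-style ATSC prompt. The wording below
    # is illustrative only; the paper's actual definitions and examples differ.
    DEFINITION = (
        "Definition: Given a sentence and an aspect term, classify the "
        "sentiment towards the aspect as positive, negative, or neutral.\n"
    )

    EXAMPLES = (  # hypothetical demonstrations, one per label
        "Example: sentence: The battery life is amazing. aspect: battery "
        "life. output: positive\n"
        "Example: sentence: The screen cracked within a week. aspect: "
        "screen. output: negative\n"
        "Example: sentence: It comes in a plain cardboard box. aspect: box. "
        "output: neutral\n"
    )

    def build_atsc_prompt(sentence: str, aspect: str) -> str:
        """Assemble the instruction-tuning input for one training sample."""
        return (
            DEFINITION + EXAMPLES
            + f"Now complete the following: sentence: {sentence} "
            + f"aspect: {aspect}. output:"
        )

    print(build_atsc_prompt("The pasta was bland.", "pasta"))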
Instruction Tuned Models are Quick Learners
Instruction tuning of language models has demonstrated the ability to enhance
model generalization to unseen tasks via in-context learning using a few
examples. However, typical supervised learning still requires a plethora of
downstream training data for finetuning. Often in real-world situations, there
is only a limited amount of data available for finetuning, falling somewhere
between the few-shot inference and fully supervised finetuning regimes. In
this work, we demonstrate
the sample efficiency of instruction-tuned models across various tasks by
estimating the minimal downstream training data required by them to perform
transfer learning and match the performance of state-of-the-art (SOTA)
supervised models. We conduct experiments on 119 tasks from
Super-NaturalInstructions (SuperNI) in both the single-task learning (STL) and
multi-task learning (MTL) settings. Our findings reveal that, in the STL
setting, instruction-tuned models equipped with 25% of the downstream training
data surpass
the SOTA performance on the downstream tasks. In the MTL setting, an
instruction-tuned model trained on only 6% of the downstream training data
achieves SOTA, while using 100% of the training data results in a 3.69% points
improvement (ROUGE-L 74.68) over the previous SOTA. We conduct an analysis on
T5 vs Tk-Instruct by developing several baselines to demonstrate that
instruction tuning aids in increasing both sample efficiency and transfer
learning. Additionally, we observe a consistent ~4% performance increase in
both settings when pre-finetuning is performed with instructions. Finally, we
conduct a categorical study and find that contrary to previous results, tasks
in the question rewriting and title generation categories suffer from
instruction tuning.
Comment: 9 pages, 5 figures, 19 tables (including appendix), 12 pages of
appendix
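
To make the data-budget protocol concrete, here is a minimal sketch of how a
fixed-fraction downstream subsample might be drawn before finetuning; the
instance fields and the loader are illustrative assumptions, not the paper's
actual experimental harness.

    import random

    def subsample(train_set, fraction, seed=42):
        """Draw a fixed fraction of a task's training instances.

        A sketch of the sample-efficiency protocol only: the paper reports
        that ~6% suffices to match SOTA in the MTL setting and ~25% in the
        STL setting.
        """
        rng = random.Random(seed)  # fixed seed for reproducible budgets
        k = max(1, round(len(train_set) * fraction))
        return rng.sample(train_set, k)

    # Hypothetical SuperNI-style instances (instruction + input + target).
    train_set = [
        {"instruction": "Rewrite the question.", "input": f"q{i}",
         "target": f"r{i}"}
        for i in range(1000)
    ]

    mtl_budget = subsample(train_set, 0.06)  # 6% budget from the MTL result
    stl_budget = subsample(train_set, 0.25)  # 25% budget from the STL result
    print(len(mtl_budget), len(stl_budget))  # -> 60 250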
TarGEN: Targeted Data Generation with Large Language Models
The rapid advancement of large language models (LLMs) has sparked interest in
data synthesis techniques, aiming to generate diverse and high-quality
synthetic datasets. However, these synthetic datasets often suffer from a lack
of diversity and added noise. In this paper, we present TarGEN, a multi-step
prompting strategy for generating high-quality synthetic datasets utilizing an
LLM. An advantage of TarGEN is its seedless nature; it does not require
specific task instances, broadening its applicability beyond task replication.
We augment TarGEN with a method known as self-correction, empowering LLMs to
rectify inaccurately labeled instances during dataset creation, ensuring
reliable labels. To assess our technique's effectiveness, we emulate 8 tasks
from the SuperGLUE benchmark and finetune various language models, including
encoder-only, encoder-decoder, and decoder-only models on both synthetic and
original training sets. Evaluation on the original test set reveals that models
trained on datasets generated by TarGEN perform approximately 1-2% points
better than those trained on the original datasets (82.84% on synthetic vs.
81.12% on original data with Flan-T5). When incorporating instruction tuning,
the performance increases to 84.54% on synthetic vs. 81.49% on original data
with Flan-T5. A
comprehensive analysis of the synthetic dataset compared to the original
dataset reveals that the synthetic dataset demonstrates similar or higher
levels of dataset complexity and diversity. Furthermore, the synthetic dataset
displays a bias level that aligns closely with the original dataset. Finally,
when pre-finetuned on our synthetic SuperGLUE dataset, T5-3B yields impressive
results on the OpenLLM leaderboard, surpassing the model trained on the
Self-Instruct dataset by 4.14% points. We hope that TarGEN can be helpful for
high-quality data generation and for reducing the human effort required to
create complex benchmarks.
Comment: 10 pages, 6 tables, 5 figures, 5 pages of references, 17 pages of
appendix
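
A rough sketch of how a seedless, self-correcting generation loop of this kind
could look is given below; call_llm is a hypothetical stand-in for any
text-completion API, and the prompt wording is an assumption rather than
TarGEN's actual multi-step templates.

    from typing import Callable

    def seedless_generate(task_description: str, label_space: list,
                          n: int, call_llm: Callable[[str], str]) -> list:
        """Sketch of a TarGEN-like pipeline: synthesize labeled instances
        from a task description alone (no seed instances), then run a
        self-correction pass to rectify inaccurately labeled instances."""
        dataset = []
        for i in range(n):
            # Step 1: generate an instance targeting a cycled label.
            label = label_space[i % len(label_space)]
            text = call_llm(
                f"Task: {task_description}\n"
                f"Write one new input whose correct label is '{label}'."
            )
            # Step 2: self-correction; keep the verified label if valid.
            verified = call_llm(
                f"Task: {task_description}\nInput: {text}\n"
                f"Proposed label: {label}. Reply with the correct label "
                f"from {label_space}."
            ).strip()
            dataset.append({"input": text,
                            "label": verified if verified in label_space
                            else label})
        return dataset

    # Dummy stand-in LLM so the sketch runs without an API key.
    data = seedless_generate("Classify movie reviews.",
                             ["positive", "negative"],
                             n=4, call_llm=lambda prompt: "positive")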
