DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
Prompt tuning (PT), where a small set of trainable soft (continuous)
prompt vectors is affixed to the input of language models (LMs), has shown
promising results across various tasks and models for parameter-efficient
fine-tuning (PEFT). PT stands out from other PEFT approaches because it
maintains competitive performance with fewer trainable parameters and does not
drastically scale up its parameters as the model size expands. However, PT
introduces additional soft prompt tokens, leading to longer input sequences,
which significantly impacts training and inference time and memory usage due to
the Transformer's quadratic complexity. This is particularly concerning for
Large Language Models (LLMs) that face heavy daily querying. To address this issue,
we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt
into a shorter soft prompt and a pair of low-rank matrices that are then
optimised with two different learning rates. This allows DePT to achieve better
performance while saving over 20% memory and time costs compared to vanilla PT
and its variants, without changing trainable parameter sizes. Through extensive
experiments on 23 natural language processing (NLP) and vision-language (VL)
tasks, we demonstrate that DePT outperforms state-of-the-art PEFT approaches,
including the full fine-tuning baseline in some scenarios. Additionally, we
empirically show that DePT grows more efficient as the model size increases.
Our further study reveals that DePT integrates seamlessly with
parameter-efficient transfer learning in the few-shot learning setting and
highlights its adaptability to various model architectures and sizes.
Comment: Code is available at https://github.com/ZhengxiangShi/DeP
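As a rough illustration of the decomposition just described, the sketch below is a minimal, hypothetical PyTorch rendering (not the authors' released implementation): a shortened soft prompt is prepended to the frozen input embeddings, a pair of low-rank matrices produces an additive update to those embeddings, and the two components sit in separate optimiser groups with different learning rates. All shapes and learning-rate values are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DecomposedPrompt(nn.Module):
    """Hypothetical sketch of a DePT-style decomposition: a short soft prompt
    plus a low-rank update to the (frozen) input token embeddings."""

    def __init__(self, embed_dim=768, prompt_len=40, max_seq_len=256, rank=8):
        super().__init__()
        # Shortened soft prompt (vanilla PT would typically use a longer one).
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)
        # Pair of low-rank matrices whose product updates the input embeddings.
        self.lora_a = nn.Parameter(torch.randn(max_seq_len, rank) * 0.02)
        self.lora_b = nn.Parameter(torch.zeros(rank, embed_dim))

    def forward(self, input_embeds):
        # input_embeds: (batch, max_seq_len, embed_dim) from the frozen LM embedding layer.
        update = self.lora_a @ self.lora_b              # (max_seq_len, embed_dim)
        updated = input_embeds + update                 # broadcast over the batch
        prompt = self.soft_prompt.expand(input_embeds.size(0), -1, -1)
        return torch.cat([prompt, updated], dim=1)      # prepend the short prompt

dept = DecomposedPrompt()
fake_embeds = torch.randn(4, 256, 768)                  # stand-in for frozen LM embeddings
out = dept(fake_embeds)                                 # (4, 40 + 256, 768)

# Two learning rates (values are illustrative): one for the soft prompt,
# another for the low-rank pair, as described in the abstract.
optimizer = torch.optim.AdamW([
    {"params": [dept.soft_prompt], "lr": 3e-1},
    {"params": [dept.lora_a, dept.lora_b], "lr": 5e-4},
])
```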
Priming and Actions: An Analysis in Conversational Search Systems
In order to accurately simulate users in conversational systems, it is essential to comprehend the factors that influence their behaviour. This is a critical challenge for the Information Retrieval (IR) field, as conventional methods are not well-suited to the interactive and unique sequential structure of conversational contexts. In this study, we employed the concept of priming effects from the psychology literature to identify core stimuli for each abstracted effect. We then examined these stimuli on various datasets to investigate their correlations with users' actions. Finally, we trained Logistic Regression (LR) models based on these stimuli to anticipate users' actions. Our findings offer a basis for creating more realistic user models and simulators, as we identified the subset of stimuli with strong relationships to users' actions. Additionally, we built a model that can predict users' actions.
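As a rough sketch of the final modelling step, the snippet below fits a logistic regression over a toy feature matrix standing in for priming stimuli and predicts a binary user action. The features, labels, and data are invented for illustration and are not the paper's stimuli or datasets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
# Hypothetical stimulus features per conversation turn (e.g. term repetition,
# system answer length, position in the conversation).
X = rng.normal(size=(1000, 3))
# Hypothetical user actions: 0 = reformulate the query, 1 = accept the result.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```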
Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis
In recent years, language models (LMs) have made remarkable progress in
advancing the field of natural language processing (NLP). However, the impact
of data augmentation (DA) techniques on the fine-tuning (FT) performance of
these LMs has been a topic of ongoing debate. In this study, we evaluate the
effectiveness of three different FT methods in conjunction with
back-translation across an array of 7 diverse NLP tasks, including
classification and regression types, covering single-sentence and sentence-pair
tasks. Contrary to prior assumptions that DA does not contribute to the
enhancement of LMs' FT performance, our findings reveal that continued
pre-training on augmented data can effectively improve the FT performance on
downstream tasks. In the most favourable case, continued pre-training
improves the performance of FT by more than 10% in the few-shot learning
setting. Our findings highlight the potential of DA as a powerful tool for
bolstering LMs' performance.
Comment: Accepted at ESANN 202
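A minimal sketch of the back-translation augmentation step is given below, assuming publicly available MarianMT checkpoints (Helsinki-NLP/opus-mt-en-fr and opus-mt-fr-en) as the round-trip translators; these model choices and generation settings are assumptions, not necessarily those used in the study.

```python
from transformers import MarianMTModel, MarianTokenizer

def back_translate(sentences, src2tgt="Helsinki-NLP/opus-mt-en-fr",
                   tgt2src="Helsinki-NLP/opus-mt-fr-en"):
    """Round-trip translate English -> French -> English to create paraphrases."""
    def translate(texts, model_name):
        tok = MarianTokenizer.from_pretrained(model_name)
        model = MarianMTModel.from_pretrained(model_name)
        batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
        out = model.generate(**batch, max_length=128)
        return [tok.decode(t, skip_special_tokens=True) for t in out]

    return translate(translate(sentences, src2tgt), tgt2src)

# The paraphrased outputs can then be added to the corpus used for
# continued pre-training before fine-tuning.
print(back_translate(["The movie was surprisingly good."]))
```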
Learning to execute or ask clarification questions
Collaborative tasks are ubiquitous activities in which some form of communication is required to reach a joint goal. Collaborative building is one such task. We wish to develop an intelligent builder agent in a simulated building environment (Minecraft) that can build whatever users wish to build just by talking to the agent. To achieve this goal, such agents need to be able to take the initiative by asking clarification questions when further information is needed. Existing works on the Minecraft Corpus Dataset only learn to execute instructions, neglecting the importance of asking for clarification. In this paper, we extend the Minecraft Corpus Dataset by annotating all builder utterances into eight types, including clarification questions, and propose a new builder agent model capable of determining when to ask or execute instructions. Experimental results show that our model achieves state-of-the-art performance on the collaborative building task with a substantial improvement. We also define two new tasks: the learning-to-ask task and the joint learning task. The latter consists of solving both the collaborative building and learning-to-ask tasks jointly.
Evaluating the Cranfield Paradigm for Conversational Search Systems
Due to the sequential and interactive nature of conversations, the
application of traditional Information Retrieval (IR) methods like
the Cranfield paradigm requires stronger assumptions. When building a test collection for ad hoc search, it is fair to assume that the
relevance judgments provided by an annotator correlate well with
the relevance judgments perceived by an actual user of the search
engine. However, when building a test collection for conversational
search, we do not know if it is fair to assume that the relevance judgments provided by an annotator correlate well with the relevance
judgments perceived by an actual user of the conversational search
system. In this paper, we perform a crowdsourcing study to evaluate
the applicability of the Cranfield paradigm to conversational search
systems. Our main aim is to understand the level of agreement, in
terms of user satisfaction, between the users performing a search
task in a conversational search system (i.e., directly assessing the
system) and the users observing the search task being performed
(i.e., indirectly assessing the system). The result of this study is
paramount because it underpins and guides 1) the development of
more realistic user models and simulators, and 2) the design of more
reliable and robust evaluation measures for conversational search
systems. Our results show that there is a fair agreement between
direct and indirect assessments in terms of user satisfaction and
that these two kinds of assessments share similar conversational
patterns. Indeed, by collecting relevance assessments for each system utterance, we tested several conversational patterns that show
a promising ability to predict user satisfaction.
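A minimal sketch of one way to quantify the agreement described above, assuming hypothetical ordinal satisfaction labels collected from direct users and indirect observers; the ratings and the choice of weighted Cohen's kappa are illustrative assumptions, not the paper's exact methodology.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical satisfaction ratings (1-5) for the same conversations:
# one set from users performing the task, one from observers of the task.
direct_satisfaction = [4, 5, 2, 3, 4, 1, 5, 3, 2, 4]
indirect_satisfaction = [4, 4, 2, 3, 5, 1, 5, 2, 2, 4]

# Quadratic weighting accounts for the ordinal nature of satisfaction scales.
kappa = cohen_kappa_score(direct_satisfaction, indirect_satisfaction,
                          weights="quadratic")
print(f"Quadratic-weighted Cohen's kappa: {kappa:.2f}")
```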
Flood susceptibility assessment using artificial neural networks in Indonesia
Flood incidents can massively damage and disrupt a city's economic or governing core. However, flood risk can be mitigated through event planning and city-wide preparation to reduce damage. For governments, firms, and civilians to make such preparations, flood susceptibility predictions are required. To predict flood susceptibility, nine environment-related factors have been identified: elevation, slope, curvature, topographical wetness index (TWI), Euclidean distance from a river, land cover, stream power index (SPI), soil type, and precipitation. This work uses these environment-related factors alongside Sentinel-1 satellite imagery in a model intercomparison study to back-predict flood susceptibility in Jakarta for the January 2020 historic flood event across 260 key locations. For each location, this study uses current environmental conditions to predict flood status in the following month. Considering the imbalance between instances of flooded and non-flooded conditions, the Synthetic Minority Oversampling Technique (SMOTE) has been implemented to balance both classes in the training set. This work compares predictions from artificial neural networks (ANN), k-Nearest Neighbors algorithms (k-NN), and Support Vector Machines (SVM) against a random baseline. The effects of SMOTE are also assessed by training each model on balanced and imbalanced datasets. The ANN is found to be superior to the other machine learning models.
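A rough sketch of the resampling-and-comparison pipeline described above, using scikit-learn and imbalanced-learn on synthetic placeholder data; the feature values, sample sizes, and model hyperparameters are assumptions rather than the study's actual configuration.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(42)
# Placeholder design matrix: nine environment-related factors per sample.
X = rng.normal(size=(2000, 9))
y = (rng.random(2000) < 0.1).astype(int)     # imbalanced flooded / non-flooded labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Balance the training classes only; the test set keeps its natural imbalance.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_tr, y_tr)

models = {
    "ANN": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=42),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(),
    "Random baseline": DummyClassifier(strategy="uniform", random_state=42),
}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    print(f"{name}: F1 = {f1_score(y_te, model.predict(X_te)):.3f}")
```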
Self Contrastive Learning for Session-based Recommendation
Session-based recommendation, which aims to predict the next item of interest
to users given an existing sequence of item interactions, has attracted
growing applications of Contrastive Learning (CL) with improved user and item
representations. However, these contrastive objectives: (1) serve a similar
role as the cross-entropy loss while ignoring the item representation space
optimisation; and (2) commonly require complicated modelling, including complex
positive/negative sample constructions and extra data augmentation. In this
work, we introduce Self-Contrastive Learning (SCL), which simplifies the
application of CL and enhances the performance of state-of-the-art CL-based
recommendation techniques. Specifically, SCL is formulated as an objective
function that directly promotes a uniform distribution among item
representations and efficiently replaces all the existing contrastive objective
components of state-of-the-art models. Unlike previous works, SCL eliminates
the need for any positive/negative sample construction or data augmentation,
leading to enhanced interpretability of the item representation space and
facilitating its extensibility to existing recommender systems. Through
experiments on three benchmark datasets, we demonstrate that SCL consistently
improves the performance of state-of-the-art models with statistical
significance. Notably, our experiments show that SCL improves the performance
of two best-performing models by 8.2% and 9.5% in P@10 (Precision) and 9.9% and
11.2% in MRR@10 (Mean Reciprocal Rank) on average across different benchmarks.
Additionally, our analysis elucidates the improvement in terms of alignment and
uniformity of representations, as well as the effectiveness of SCL with a low
computational cost.
Comment: Technical Report
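As a hedged illustration of an objective that directly promotes a uniform distribution among item representations, the sketch below implements a simplified uniformity-style loss over normalised item embeddings in PyTorch; the temperature and the exact formulation are assumptions, not necessarily the paper's definition of SCL.

```python
import torch
import torch.nn.functional as F

def self_contrastive_loss(item_emb: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Push normalised item representations apart to promote a uniform
    distribution on the hypersphere (simplified, illustrative formulation)."""
    z = F.normalize(item_emb, dim=-1)                   # (num_items, dim)
    sim = z @ z.T / temperature                         # pairwise cosine similarities
    # Exclude each item's similarity with itself from the sum.
    mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))
    # Minimising the log-sum-exp of similarities spreads items apart.
    return torch.logsumexp(sim, dim=-1).mean()

items = torch.randn(128, 64, requires_grad=True)        # toy item embedding table
loss = self_contrastive_loss(items)
loss.backward()
```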