57 research outputs found

DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning

Author: Lipani Aldo
Shi Zhengxiang
Publication date: 10/09/2023

Prompt tuning (PT), where a small number of trainable soft (continuous) prompt vectors are affixed to the input of language models (LMs), has shown promising results across various tasks and models for parameter-efficient fine-tuning (PEFT). PT stands out from other PEFT approaches because it maintains competitive performance with fewer trainable parameters and does not drastically scale up its parameters as the model size expands. However, PT introduces additional soft prompt tokens, leading to longer input sequences, which significantly impacts training and inference time and memory usage due to the Transformer's quadratic complexity. This is particularly concerning for Large Language Models (LLMs) that face heavy daily querying. To address this issue, we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates. This allows DePT to achieve better performance while saving over 20% memory and time costs compared to vanilla PT and its variants, without changing trainable parameter sizes. Through extensive experiments on 23 natural language processing (NLP) and vision-language (VL) tasks, we demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline in some scenarios. Additionally, we empirically show that DePT grows more efficient as the model size increases. Our further study reveals that DePT integrates seamlessly with parameter-efficient transfer learning in the few-shot learning setting and highlights its adaptability to various model architectures and sizes.

Comment: Code is available at https://github.com/ZhengxiangShi/DeP
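
The decomposition described in this abstract can be illustrated with a minimal NumPy sketch. It assumes that the low-rank product is added to the frozen input token embeddings and that the shorter prompt is prepended, which the abstract itself does not spell out. All sizes and names (`dept_input`, `trainable_params`, `d`, `s`, `m`, `r`) are hypothetical, chosen so that the trainable parameter count matches a vanilla prompt of length 100.

```python
import numpy as np

def dept_input(token_emb, short_prompt, A, B):
    """Return [short prompt ; token embeddings + A @ B] as one input matrix."""
    updated = token_emb + A @ B  # low-rank update of the frozen embeddings, shape (s, d)
    return np.concatenate([short_prompt, updated], axis=0)

def trainable_params(m, s, d, r):
    """Short prompt (m x d) plus the low-rank pair A (s x r) and B (r x d)."""
    return m * d + (s + d) * r

# Hypothetical sizes: hidden size d, max input length s,
# vanilla prompt length l, shorter prompt length m, rank r.
d, s, l = 768, 256, 100
m, r = 40, 45

rng = np.random.default_rng(0)
token_emb = rng.normal(size=(s, d))     # frozen input embeddings (illustrative values)
short_prompt = rng.normal(size=(m, d))  # trainable
A = rng.normal(size=(s, r))             # trainable
B = np.zeros((r, d))                    # trainable; zero start leaves the embeddings unchanged

x = dept_input(token_emb, short_prompt, A, B)
vanilla_params = l * d
dept_params = trainable_params(m, s, d, r)
```

With these sizes the input length drops from `l + s = 356` to `m + s = 296` rows while both variants train the same number of parameters. The two learning rates mentioned in the abstract would apply to `short_prompt` and to the pair `(A, B)` respectively; that part is not shown here.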

arXiv.org e-Print Archive

Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

Author: Lipani Aldo
Shi Zhengxiang
Publication venue: ESANN
Publication date: 01/01/2023

In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has been a topic of ongoing debate. In this study, we evaluate the effectiveness of three different FT methods in conjunction with back-translation across an array of 7 diverse NLP tasks, including classification and regression types, covering single-sentence and sentence-pair tasks. Contrary to prior assumptions that DA does not contribute to the enhancement of LMs' FT performance, our findings reveal that continued pre-training on augmented data can effectively improve the FT performance of the downstream tasks. In the most favourable case, continued pre-training improves the performance of FT by more than 10% in the few-shot learning setting. Our finding highlights the potential of DA as a powerful tool for bolstering LMs' performance.

UCL Discovery

Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

Author: Lipani Aldo
Shi Zhengxiang
Publication date: 13/06/2023

arXiv.org e-Print Archive

Self Contrastive Learning for Session-based Recommendation

Author: Lipani Aldo
Shi Zhengxiang
Wang Xi
Publication date: 02/06/2023

Session-based recommendation, which aims to predict the next item of interest to a user from an existing sequence of item interactions, has seen growing use of Contrastive Learning (CL) to improve user and item representations. However, these contrastive objectives: (1) serve a similar role as the cross-entropy loss while ignoring the item representation space optimisation; and (2) commonly require complicated modelling, including complex positive/negative sample constructions and extra data augmentation. In this work, we introduce Self-Contrastive Learning (SCL), which simplifies the application of CL and enhances the performance of state-of-the-art CL-based recommendation techniques. Specifically, SCL is formulated as an objective function that directly promotes a uniform distribution among item representations and efficiently replaces all the existing contrastive objective components of state-of-the-art models. Unlike previous works, SCL eliminates the need for any positive/negative sample construction or data augmentation, leading to enhanced interpretability of the item representation space and facilitating its extensibility to existing recommender systems. Through experiments on three benchmark datasets, we demonstrate that SCL consistently improves the performance of state-of-the-art models with statistical significance. Notably, our experiments show that SCL improves the performance of the two best-performing models by 8.2% and 9.5% in P@10 (Precision) and 9.9% and 11.2% in MRR@10 (Mean Reciprocal Rank) on average across different benchmarks. Additionally, our analysis elucidates the improvement in terms of alignment and uniformity of representations, as well as the effectiveness of SCL with a low computational cost.

Comment: Technical Report
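
One way to picture an objective that "directly promotes a uniform distribution among item representations" without positive/negative sample construction is sketched below. It is an assumption made for illustration, not a formulation confirmed by the abstract: an InfoNCE-style loss in which each item acts as its own positive and every other item in the table acts as a negative. The function name `self_contrastive_loss` and the temperature `tau` are hypothetical.

```python
import numpy as np

def self_contrastive_loss(items, tau=0.1):
    """InfoNCE-style loss where item i is its own positive (illustrative assumption)."""
    z = items / np.linalg.norm(items, axis=1, keepdims=True)  # unit-normalise rows
    logits = z @ z.T / tau                                    # pairwise cosine similarity / tau
    logits = logits - logits.max(axis=1, keepdims=True)       # shift for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                        # only the self-pairs are positives

# Collapsed representations: every item has the same embedding.
collapsed = np.ones((4, 4))
# Spread-out representations: mutually orthogonal embeddings.
spread = np.eye(4)

loss_collapsed = self_contrastive_loss(collapsed)
loss_spread = self_contrastive_loss(spread)
```

Under this sketch the loss is largest when all item embeddings coincide and shrinks as they spread apart on the unit sphere, which is the uniformity behaviour the abstract refers to.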

arXiv.org e-Print Archive

Learning to execute or ask clarification questions

Author: Feng Yue
Lipani Aldo
Shi Zhengxiang
Publication venue: ACL Anthology
Publication date: 01/07/2022

Collaborative tasks are ubiquitous activities where a form of communication is required in order to reach a joint goal. Collaborative building is one such task. We wish to develop an intelligent builder agent in a simulated building environment (Minecraft) that can build whatever users wish to build by just talking to the agent. In order to achieve this goal, such agents need to be able to take the initiative by asking clarification questions when further information is needed. Existing works on the Minecraft Corpus Dataset only learn to execute instructions, neglecting the importance of asking for clarifications. In this paper, we extend the Minecraft Corpus Dataset by annotating all builder utterances into eight types, including clarification questions, and propose a new builder agent model capable of determining when to ask or execute instructions. Experimental results show that our model achieves state-of-the-art performance on the collaborative building task with a substantial improvement. We also define two new tasks, the learning to ask task and the joint learning task. The latter consists of solving both the collaborative building and learning to ask tasks jointly.

UCL Discovery

StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts

Author: Lipani Aldo
Shi Zhengxiang
Zhang Qiang
Publication date: 18/04/2022

Inferring spatial relations in natural language is a crucial ability an intelligent system should possess. The bAbI dataset tries to capture tasks relevant to this domain (tasks 17 and 19). However, these tasks have several limitations. Most importantly, they are limited to fixed expressions, they are limited in the number of reasoning steps required to solve them, and they fail to test the robustness of models to input that contains irrelevant or redundant information. In this paper, we present a new Question-Answering dataset called StepGame for robust multi-hop spatial reasoning in texts. Our experiments demonstrate that state-of-the-art models on the bAbI dataset struggle on the StepGame dataset. Moreover, we propose a Tensor-Product based Memory-Augmented Neural Network (TP-MANN) specialized for spatial reasoning tasks. Experimental results on both datasets show that our model outperforms all the baselines with superior generalization and robustness performance.

Comment: AAAI 2022 Camera Ready

arXiv.org e-Print Archive

UCL Discovery

Association for the Advancement of Artificial Intelligence: AAAI Publications

LucidDraw: Efficiently visualizing complex biochemical networks within MATLAB

Author: He Sheng
Li Weijiang
Mei Juan
Shi Guiyang
Wang Zhengxiang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Biochemical networks play an essential role in systems biology. Rapidly growing network data and versatile research activities call for convenient visualization tools that help users intuitively perceive the abstract structures of networks and gain insights into their functional implications. There are various kinds of network visualization software, but they are usually not adequate for visual analysis of complex biological networks, mainly for two reasons: 1) most existing drawing methods suitable for biochemical networks have high computation loads and can hardly achieve near real-time visualization; 2) available network visualization tools are designed for working in certain network modeling platforms, so they are not convenient for general analyses because they lack a broader range of readily accessible numerical utilities. Results: We present LucidDraw as a visual analysis tool, which features (a) speed: typical biological networks with several hundreds of nodes can be drawn in a few seconds through a new layout algorithm; (b) ease of use: working within MATLAB makes it convenient to manipulate and analyze the network data using a broad spectrum of sophisticated numerical functions; (c) flexibility: layout styles and incorporation of other available information about functional modules can be controlled by users with little effort, and the output drawings are interactively modifiable. Conclusions: Equipped with a new grid layout algorithm proposed here, LucidDraw serves as an auxiliary network analysis tool capable of visualizing complex biological networks in near real-time with controllable layout styles and drawing details. The framework of the algorithm enables easy incorporation of extra biological information, if available, to influence the output layouts with predefined node grouping features.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Rethinking Semi-supervised Learning with Language Models

Author: Aletras Nikolaos
Jiao Yunlong
Kazai Gabriella
Shi Zhengxiang
Tonolini Francesco
Yilmaz Emine
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2023

Semi-supervised learning (SSL) is a popular setting aiming to effectively utilize unlabelled data to improve model performance in downstream natural language processing (NLP) tasks. Currently, there are two popular approaches to make use of unlabelled data: Self-training (ST) and Task-adaptive pre-training (TAPT). ST uses a teacher model to assign pseudo-labels to the unlabelled data, while TAPT continues pre-training on the unlabelled data before fine-tuning. To the best of our knowledge, the effectiveness of TAPT in SSL tasks has not been systematically studied, and no previous work has directly compared TAPT and ST in terms of their ability to utilize the pool of unlabelled data. In this paper, we provide an extensive empirical study comparing five state-of-the-art ST approaches and TAPT across various NLP tasks and data sizes, including in- and out-of-domain settings. Surprisingly, we find that TAPT is a strong and more robust SSL learner, even when using just a few hundred unlabelled samples or in the presence of domain shifts, compared to more sophisticated ST approaches, and tends to bring greater improvements in SSL than in fully-supervised settings. Our further analysis demonstrates the risks of using ST approaches when the size of labelled or unlabelled data is small or when domain shifts exist. We offer a fresh perspective for future SSL research, suggesting the use of unsupervised pre-training objectives over dependency on pseudo-labels.
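
The pseudo-labelling step of self-training mentioned in this abstract can be sketched in a few lines. This is a generic illustration, not one of the five ST approaches the study compares: the confidence threshold and the names `pseudo_label` and `teacher_probs` are assumptions introduced here.

```python
import numpy as np

def pseudo_label(teacher_probs, threshold=0.9):
    """Keep unlabelled examples whose top teacher probability meets the threshold.

    Returns the indices of the kept examples and their pseudo-labels.
    The threshold value is a hypothetical choice for illustration.
    """
    confidence = teacher_probs.max(axis=1)  # top class probability per example
    labels = teacher_probs.argmax(axis=1)   # predicted class per example
    keep = confidence >= threshold
    return np.flatnonzero(keep), labels[keep]

# Illustrative teacher outputs for three unlabelled examples and two classes.
teacher_probs = np.array([
    [0.95, 0.05],
    [0.60, 0.40],
    [0.08, 0.92],
])
kept_idx, kept_labels = pseudo_label(teacher_probs)
```

A student model would then be trained on the labelled data together with the kept pseudo-labelled examples. When the teacher is trained on very little labelled data its confident predictions can still be wrong, which is one way to read the risks the abstract reports for small data sizes.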

UCL Discovery

Classifying Ingestive Behavior of Dairy Cows via Automatic Sound Recognition

Author: Du Qian
Gates Richard
Li Guoming
Shi Zhengxiang
Xiong Yijie
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/08/2021

Determining ingestive behaviors of dairy cows is critical to evaluate their productivity and health status. The objectives of this research were to (1) develop the relationship between forage species/heights and sound characteristics of three different ingestive behaviors (bites, chews, and chew-bites); (2) comparatively evaluate three deep learning models and optimization strategies for classifying the three behaviors; and (3) examine the ability of deep learning modeling for classifying the three ingestive behaviors under various forage characteristics. The results show that the amplitude and duration of the bite, chew, and chew-bite sounds were mostly larger for tall forages (tall fescue and alfalfa) compared to their counterparts. The long short-term memory network using a filtered dataset with balanced duration and imbalanced audio files offered better performance than its counterparts. The best classification performance was over 0.93, and the best and poorest performance difference was 0.4–0.5 under different forage species and heights. In conclusion, the deep learning technique could classify the dairy cow ingestive behaviors but was unable to differentiate between them under some forage characteristics using acoustic signals. Thus, while the developed tool is useful to support precision dairy cow management, it requires further improvement.

Digital Repository @ Iowa State University (ISU)

DigitalCommons@University of Nebraska

Directory of Open Access Journals

PubMed Central