56 research outputs found
Statistical NLG for Generating the Content and Form of Referring Expressions
Acknowledgments: We gratefully acknowledge the anonymous reviewers for their very helpful comments.
Improving Variational Autoencoder for Text Modelling with Timestep-Wise Regularisation
Accepted by COLING 2020, final camera-ready version.
Effective Distillation of Table-based Reasoning Ability from LLMs
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, their enormous parameter size and extremely high requirements for compute power pose challenges for their practical deployment. Recent research has revealed that specific capabilities of LLMs, such as numerical reasoning, can be transferred to smaller models through distillation. Some studies explore the potential of leveraging LLMs to perform table-based reasoning. However, there has been no prior work focusing on table reasoning skills in smaller models specifically tailored for scientific table-to-text generation tasks. In this paper, we propose a novel table-based reasoning distillation approach, with the aim of distilling LLMs into tailored smaller models. Our experimental results show that a 220 million parameter model (Flan-T5-base) fine-tuned using distilled data not only achieves a significant improvement over traditionally fine-tuned baselines, but also surpasses specific LLMs on a scientific table-to-text generation dataset. Our code is available at https://github.com/Bernard-Yang/DistillTableCoT
A Dual-Attention Hierarchical Recurrent Neural Network for Dialogue Act Classification
Acknowledgment: This work is supported by the award made by the UK Engineering and Physical Sciences Research Council (Grant number: EP/P011829/1).
Effective Distillation of Table-based Reasoning Ability from LLMs
Large Language Models (LLMs) have demonstrated remarkable performance across
a wide range of natural language processing tasks. However, their enormous
parameter size and high demand for computing resources pose challenges for
their practical deployment. Recent research has revealed that specific
capabilities of LLMs, such as numerical reasoning, can be transferred to
smaller models through distillation. Some studies explore the potential of
leveraging LLMs to perform table-based reasoning. Nevertheless, prior to our
work, there has been no investigation into specialising table reasoning
skills in smaller models specifically tailored for table-to-text generation
tasks. In this paper, we propose a novel table-based reasoning distillation
approach, with the aim of distilling LLMs into tailored, smaller models
specifically designed for table-based reasoning tasks. Experimental results
show that a 0.22 billion parameter model (Flan-T5-base) fine-tuned using
distilled data not only achieves a significant improvement over traditionally
fine-tuned baselines but also surpasses specific LLMs such as gpt-3.5-turbo
on the scientific table-to-text generation dataset (SciGen). The code and
data are released at https://github.com/Bernard-Yang/TableDistill
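As a rough illustration of the distillation recipe described above, the sketch below fine-tunes Flan-T5-base on reasoning-augmented examples previously generated by a teacher LLM. The file name, field names, and prompt format are illustrative assumptions, not the released pipeline.

```python
# Minimal sketch: fine-tuning Flan-T5-base on LLM-distilled table-reasoning data.
# Assumption (not from the paper): the teacher LLM's outputs are stored in
# "distilled.json" with "table", "reasoning", and "description" fields.
import json
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Each record pairs a linearised table with the teacher's reasoning and description.
records = json.load(open("distilled.json"))  # hypothetical file produced by the teacher LLM

def collate(batch):
    inputs = tokenizer(
        ["describe the table with reasoning: " + r["table"] for r in batch],
        padding=True, truncation=True, max_length=1024, return_tensors="pt")
    labels = tokenizer(
        [r["reasoning"] + " Therefore, " + r["description"] for r in batch],
        padding=True, truncation=True, max_length=512, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    return inputs, labels

loader = DataLoader(records, batch_size=8, shuffle=True, collate_fn=collate)
optim = torch.optim.AdamW(model.parameters(), lr=3e-5)

model.train()
for epoch in range(3):
    for inputs, labels in loader:
        loss = model(**inputs, labels=labels).loss
        loss.backward()
        optim.step()
        optim.zero_grad()
```

The student is trained to emit the chain of reasoning before the final description, which is the general shape of reasoning distillation; the exact prompting and filtering steps are deliberately omitted here.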
Length is a Curse and a Blessing for Document-level Semantics
In recent years, contrastive learning (CL) has been extensively utilized to
recover sentence and document-level encoding capability from pre-trained
language models. In this work, we question the length generalizability of
CL-based models, i.e., their vulnerability towards length-induced semantic
shift. We verify not only that length vulnerability is a significant yet
overlooked research gap, but also that unsupervised CL methods can be devised solely
depending on the semantic signal provided by document length. We first derive
the theoretical foundations underlying length attacks, showing that elongating
a document would intensify the high intra-document similarity that is already
brought by CL. Moreover, we find that the isotropy promised by CL is highly
dependent on the length range of text exposed in training. Inspired by these
findings, we introduce a simple yet universal document representation learning
framework, LA(SER): length-agnostic self-reference for semantically
robust sentence representation learning, achieving state-of-the-art
unsupervised performance on the standard information retrieval benchmark.
Comment: Accepted at EMNLP 2023. Our code is publicly available at
https://github.com/gowitheflow-1998/LA-SER-cube
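As an illustration of how a length-only self-reference signal could drive unsupervised contrastive training, the sketch below pairs each sentence with an elongated copy of itself under an InfoNCE-style loss. The encoder, the repetition-based elongation, and the temperature are assumptions made for illustration, not the exact LA(SER) method.

```python
# Sketch: unsupervised contrastive learning where the only "augmentation" is
# document length, i.e. a text and an elongated copy of itself form the positive pair.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = encoder(**enc).last_hidden_state
    mask = enc["attention_mask"].unsqueeze(-1)
    return (out * mask).sum(1) / mask.sum(1)      # mean pooling over tokens

def info_nce(anchors, positives, temperature=0.05):
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature              # in-batch negatives
    labels = torch.arange(a.size(0))
    return F.cross_entropy(logits, labels)

batch = ["contrastive learning recovers sentence encoders",
         "length can shift document-level semantics"]
elongated = [" ".join([s] * 3) for s in batch]    # length-only self-reference
loss = info_nce(embed(batch), embed(elongated))
loss.backward()
```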
Audio Contrastive based Fine-tuning
Audio classification plays a crucial role in speech and sound processing
tasks with a wide range of applications. A challenge remains in striking the right
balance between fitting the model to the training data (avoiding overfitting) and
enabling it to generalise well to a new domain.
Leveraging the transferability of contrastive learning, we introduce Audio
Contrastive-based Fine-tuning (AudioConFit), an efficient approach
characterised by robust generalisability. Empirical experiments on a variety of
audio classification tasks demonstrate the effectiveness and robustness of our
approach, which achieves state-of-the-art results in various settings.
Comment: Under review
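To make the contrastive fine-tuning idea concrete, the sketch below combines a supervised contrastive term over clip embeddings of the same class with a standard cross-entropy term. The random embeddings standing in for an audio encoder and the equal loss weighting are assumptions for illustration; this is not the AudioConFit implementation.

```python
# Sketch: supervised contrastive fine-tuning for audio classification.
# Embeddings of clips sharing a label are pulled together; a classifier head
# is trained jointly with cross-entropy.
import torch
import torch.nn.functional as F

def sup_con_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive loss over a batch of clip embeddings."""
    z = F.normalize(embeddings, dim=-1)
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool)
    sim = (z @ z.t() / temperature).masked_fill(eye, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # positives: other clips in the batch with the same label
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    pos_log_prob = torch.where(pos, log_prob, torch.zeros_like(log_prob))
    return -(pos_log_prob.sum(1) / pos.sum(1).clamp(min=1)).mean()

# Toy usage: random vectors stand in for a pretrained audio encoder's outputs.
emb = torch.randn(8, 128, requires_grad=True)
labels = torch.randint(0, 3, (8,))
classifier = torch.nn.Linear(128, 3)
loss = sup_con_loss(emb, labels) + F.cross_entropy(classifier(emb), labels)
loss.backward()
```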
Improving variational autoencoder for text modelling with timestep-wise regularisation
The Variational Autoencoder (VAE) is a popular and powerful model applied to text modelling to generate diverse sentences. However, an issue known as posterior collapse (or KL loss vanishing) arises when the VAE is used for text modelling: the approximate posterior collapses to the prior, and the model completely ignores the latent variables, degrading into a plain language model during text generation. This issue is particularly prevalent when RNN-based VAE models are employed for text modelling. In this paper, we propose a simple, generic architecture called Timestep-Wise Regularisation VAE (TWR-VAE), which can effectively avoid posterior collapse and can be applied to any RNN-based VAE models. The effectiveness and versatility of our model are demonstrated on different tasks, including language modelling and dialogue response generation.
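The sketch below illustrates the timestep-wise idea: a Gaussian posterior is derived from the encoder's hidden state at every timestep and each is KL-regularised towards the prior, rather than regularising only the final state. Layer sizes, the decoder, and the way the latent seeds decoding are simplified assumptions, not the TWR-VAE implementation.

```python
# Sketch: timestep-wise KL regularisation for an RNN-based text VAE.
import torch
import torch.nn as nn

class TWRSketchVAE(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hid_dim=128, z_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, z_dim)
        self.to_logvar = nn.Linear(hid_dim, z_dim)
        self.decoder = nn.GRU(emb_dim, z_dim, batch_first=True)
        self.out = nn.Linear(z_dim, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        h, _ = self.encoder(x)                    # hidden states at every timestep
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterisation
        # timestep-wise KL: averaged over all timesteps, not only the last one
        kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp())).sum(-1).mean()
        dec, _ = self.decoder(x, z[:, -1].unsqueeze(0).contiguous())  # final-step z seeds decoding
        return self.out(dec), kl

model = TWRSketchVAE()
tokens = torch.randint(0, 1000, (4, 12))          # toy batch of token ids
logits, kl = model(tokens)
recon = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, 1000),
                                    tokens[:, 1:].reshape(-1))
loss = recon + kl
loss.backward()
```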
DGST: a dual-generator network for text style transfer
We propose DGST, a novel and simple Dual-Generator network architecture for text Style Transfer. Our model employs only two generators and does not rely on any discriminators or parallel corpora for training. Both quantitative and qualitative experiments on the Yelp and IMDb datasets show that our model gives competitive performance compared to several strong baselines with more complicated architecture designs.
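A skeleton of the two-generator, discriminator-free setup is sketched below: one generator maps towards the target style and the other maps back, and training combines a cycle term (transfer, then transfer back) with a reconstruction term. Continuous vectors stand in for sentences so the example stays differentiable; the architectures and losses are placeholders, not DGST's actual objectives.

```python
# Sketch: dual-generator style transfer trained without discriminators.
import torch
import torch.nn as nn

dim = 64
g_xy = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))  # style X -> Y
g_yx = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))  # style Y -> X
optim = torch.optim.Adam(list(g_xy.parameters()) + list(g_yx.parameters()), lr=1e-3)
mse = nn.MSELoss()

for step in range(100):
    x = torch.randn(32, dim)   # toy batch of style-X "sentence" vectors
    y = torch.randn(32, dim)   # toy batch of style-Y "sentence" vectors

    # cycle: transfer to the other style and back, then reconstruct the original
    cycle_loss = mse(g_yx(g_xy(x)), x) + mse(g_xy(g_yx(y)), y)
    # reconstruction: each generator should leave sentences of its target style unchanged
    recon_loss = mse(g_xy(y), y) + mse(g_yx(x), x)

    loss = cycle_loss + recon_loss
    optim.zero_grad()
    loss.backward()
    optim.step()
```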
Ethyne Reducing Metal-Organic Frameworks to Control Fabrications of Core/shell Nanoparticles as Catalysts
An approach using cobalt metal-organic frameworks (Co-MOF) as precursors is established for the fabrication of cobalt nanoparticles in porous carbon shells (core/shell Co@C). Chemical vapor deposition of ethyne is used to control the reduction of cobalt nanoclusters in the MOF and the spontaneous formation of the porous carbon shells. The metallic cobalt cores formed are up to 4-6 nm, with the crystal phase varying between hexagonal close-packed (hcp) and face-centred cubic (fcc). The porous carbon shells change from amorphous to graphene as the ethyne deposition temperature increases from 400 to 600 °C. The core/shell Co@C nanoparticles exhibit high catalytic activity in selectively converting syngas (CTY: 254.1-312.1 μmolCO·gCo⁻¹·s⁻¹) into hydrocarbons (4.0-5.2 gHC·gcat⁻¹·h⁻¹) at 260 °C. As well as the crystal size and phase, the coordination numbers of cobalt to oxygen and to other cobalt atoms on the surface of the cobalt nanoparticles, and the permeability of the porous carbon shell, have been related to the catalytic performance in Fischer-Tropsch synthesis (FTS) reactions.