56 research outputs found

    Statistical NLG for Generating the Content and Form of Referring Expressions

    Acknowledgments: We gratefully acknowledge the anonymous reviewers for their very helpful comments.

    Improving Variational Autoencoder for Text Modelling with Timestep-Wise Regularisation

    Accepted by COLING 2020, final camera-ready version.

    Effective Distillation of Table-based Reasoning Ability from LLMs

    Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, their enormous parameter size and extremely high compute requirements pose challenges for practical deployment. Recent research has revealed that specific capabilities of LLMs, such as numerical reasoning, can be transferred to smaller models through distillation. Some studies explore the potential of leveraging LLMs to perform table-based reasoning. However, there has been no prior work focusing on table reasoning skills in smaller models specifically tailored for scientific table-to-text generation tasks. In this paper, we propose a novel table-based reasoning distillation approach, with the aim of distilling LLMs into tailored smaller models. Our experimental results show that a 220-million-parameter model (Flan-T5-base) fine-tuned on the distilled data not only achieves a significant improvement over traditionally fine-tuned baselines, but also surpasses specific LLMs on a scientific table-to-text generation dataset. Our code is available at https://github.com/Bernard-Yang/DistillTableCoT
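
    As a concrete illustration of the distillation recipe described above, the sketch below fine-tunes Flan-T5-base on teacher-generated reasoning chains paired with table descriptions. It is a minimal sketch in Python: the example record, prompt format, field names, and hyperparameters are illustrative assumptions and do not reproduce the released DistillTableCoT pipeline.

    # Hedged sketch: fine-tune a small seq2seq student on distilled reasoning data.
    # The distilled examples pair a linearised table (plus instruction) with the
    # teacher LLM's reasoning chain followed by the target description.
    import torch
    from torch.utils.data import DataLoader
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

    # Hypothetical distilled record (not taken from the actual dataset).
    distilled = [
        {
            "source": "Describe the table: | model | BLEU | ROUGE | ...",
            "target": "Reasoning: row 2 has the highest BLEU ... "
                      "Description: Model B outperforms both baselines.",
        },
    ]

    def collate(batch):
        enc = tokenizer([b["source"] for b in batch], padding=True,
                        truncation=True, return_tensors="pt")
        labels = tokenizer([b["target"] for b in batch], padding=True,
                           truncation=True, return_tensors="pt").input_ids
        labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
        enc["labels"] = labels
        return enc

    loader = DataLoader(distilled, batch_size=8, shuffle=True, collate_fn=collate)
    optim = torch.optim.AdamW(model.parameters(), lr=3e-5)

    model.train()
    for _ in range(3):                      # a few passes over the distilled data
        for batch in loader:
            loss = model(**batch).loss      # standard seq2seq cross-entropy
            loss.backward()
            optim.step()
            optim.zero_grad()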

    A Dual-Attention Hierarchical Recurrent Neural Network for Dialogue Act Classification

    Acknowledgment: This work is supported by an award made by the UK Engineering and Physical Sciences Research Council (Grant number: EP/P011829/1).

    Effective Distillation of Table-based Reasoning Ability from LLMs

    Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, their enormous parameter size and high demands on computing resources pose challenges for practical deployment. Recent research has revealed that specific capabilities of LLMs, such as numerical reasoning, can be transferred to smaller models through distillation. Some studies explore the potential of leveraging LLMs to perform table-based reasoning. Nevertheless, prior to our work, there had been no investigation into specialising table reasoning skills in smaller models tailored for table-to-text generation tasks. In this paper, we propose a novel table-based reasoning distillation approach, with the aim of distilling LLMs into tailored smaller models specifically designed for table-based reasoning tasks. Experimental results show that a 0.22-billion-parameter model (Flan-T5-base) fine-tuned on the distilled data not only achieves a significant improvement over traditionally fine-tuned baselines, but also surpasses specific LLMs such as gpt-3.5-turbo on the scientific table-to-text generation dataset SciGen. The code and data are released at https://github.com/Bernard-Yang/TableDistill

    Length is a Curse and a Blessing for Document-level Semantics

    In recent years, contrastive learning (CL) has been extensively utilized to recover sentence- and document-level encoding capability from pre-trained language models. In this work, we question the length generalizability of CL-based models, i.e., their vulnerability to length-induced semantic shift. We verify not only that length vulnerability is a significant yet overlooked research gap, but also that unsupervised CL methods can be devised relying solely on the semantic signal provided by document length. We first derive the theoretical foundations underlying length attacks, showing that elongating a document intensifies the already high intra-document similarity brought about by CL. Moreover, we find that the isotropy promised by CL is highly dependent on the length range of the text exposed in training. Inspired by these findings, we introduce a simple yet universal document representation learning framework, LA(SER)³: length-agnostic self-reference for semantically robust sentence representation learning, achieving state-of-the-art unsupervised performance on the standard information retrieval benchmark. Comment: Accepted at EMNLP 2023. Our code is publicly available at https://github.com/gowitheflow-1998/LA-SER-cube
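
    The self-reference idea can be pictured with a small unsupervised contrastive sketch in which each document's positive view is a shorter sub-span of itself, so the only supervision comes from document length. This is one plausible reading of the abstract, not the exact LA(SER)³ recipe; the encoder choice, mean pooling, span sampling, and temperature are assumptions.

    # Hedged sketch: contrast each document against a shorter self-referencing view.
    import random
    import torch
    import torch.nn.functional as F
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased")

    def embed(texts):
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = encoder(**batch).last_hidden_state
        mask = batch["attention_mask"].unsqueeze(-1)
        return (hidden * mask).sum(1) / mask.sum(1)        # mean pooling over tokens

    def self_reference_view(doc, min_words=5):
        # Positive view: a random contiguous sub-span of the same document.
        words = doc.split()
        if len(words) <= min_words:
            return doc
        span = random.randint(min_words, len(words))
        start = random.randint(0, len(words) - span)
        return " ".join(words[start:start + span])

    def info_nce(anchors, positives, temperature=0.05):
        a = F.normalize(anchors, dim=-1)
        p = F.normalize(positives, dim=-1)
        logits = a @ p.t() / temperature                   # in-batch negatives
        return F.cross_entropy(logits, torch.arange(a.size(0)))

    docs = ["a long document about dense retrieval and document encoders ...",
            "another long document about contrastive sentence representations ..."]
    loss = info_nce(embed(docs), embed([self_reference_view(d) for d in docs]))
    loss.backward()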

    Audio Contrastive based Fine-tuning

    Audio classification plays a crucial role in speech and sound processing tasks, with a wide range of applications. A challenge remains in striking the right balance between fitting the model to the training data (avoiding overfitting) and enabling it to generalise well to a new domain. Leveraging the transferability of contrastive learning, we introduce Audio Contrastive-based Fine-tuning (AudioConFit), an efficient approach characterised by robust generalisability. Empirical experiments on a variety of audio classification tasks demonstrate the effectiveness and robustness of our approach, which achieves state-of-the-art results in various settings. Comment: Under review.
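
    The abstract does not spell out the training objective, so the sketch below shows one common way to realise contrastive fine-tuning for classification: embeddings of clips that share a label are pulled together with a supervised contrastive loss. The toy convolutional encoder, loss form, and hyperparameters are assumptions, not the actual AudioConFit method.

    # Hedged sketch: supervised contrastive fine-tuning on audio embeddings,
    # where clips with the same class label act as positives for each other.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Toy waveform encoder standing in for a pretrained audio backbone.
    encoder = nn.Sequential(
        nn.Conv1d(1, 64, kernel_size=80, stride=16), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),             # -> (batch, 64)
    )

    def supcon_loss(embeddings, labels, temperature=0.07):
        z = F.normalize(embeddings, dim=-1)
        sim = z @ z.t() / temperature
        self_mask = torch.eye(z.size(0), dtype=torch.bool)
        sim = sim.masked_fill(self_mask, float("-inf"))    # drop self-pairs
        log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
        pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        pos_counts = pos_mask.sum(1)
        valid = pos_counts > 0                             # anchors with >= 1 positive
        pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(1)
        return -(pos_log_prob[valid] / pos_counts[valid]).mean()

    waveforms = torch.randn(8, 1, 16000)                   # toy batch of 1-second clips
    labels = torch.randint(0, 4, (8,))                     # toy class labels
    loss = supcon_loss(encoder(waveforms), labels)
    loss.backward()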

    Improving variational autoencoder for text modelling with timestep-wise regularisation

    The Variational Autoencoder (VAE) is a popular and powerful model applied to text modelling to generate diverse sentences. However, an issue known as posterior collapse (or KL loss vanishing) arises when the VAE is used for text modelling: the approximate posterior collapses to the prior, and the model totally ignores the latent variables, degenerating into a plain language model during text generation. This issue is particularly prevalent when RNN-based VAE models are employed for text modelling. In this paper, we propose a simple, generic architecture called Timestep-Wise Regularisation VAE (TWR-VAE), which can effectively avoid posterior collapse and can be applied to any RNN-based VAE model. The effectiveness and versatility of our model are demonstrated on different tasks, including language modelling and dialogue response generation.
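
    The timestep-wise idea can be sketched roughly as imposing a KL term at every RNN timestep rather than only on the final hidden state. The module sizes, the way per-timestep KL terms are averaged, and the decoder side are assumptions rather than the paper's exact TWR-VAE formulation.

    # Hedged sketch: an RNN encoder that emits a latent and a KL penalty per timestep.
    import torch
    import torch.nn as nn

    class TimestepWiseVAEEncoder(nn.Module):
        def __init__(self, vocab_size=10000, emb_dim=256, hid_dim=512, z_dim=32):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
            self.to_mu = nn.Linear(hid_dim, z_dim)
            self.to_logvar = nn.Linear(hid_dim, z_dim)

        def forward(self, tokens):
            hidden, _ = self.rnn(self.embed(tokens))       # (batch, time, hid_dim)
            mu = self.to_mu(hidden)                        # latent at every timestep
            logvar = self.to_logvar(hidden)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterise
            # KL(q(z_t | x) || N(0, I)) summed over latent dims, averaged over
            # timesteps and batch: regularising every timestep is what discourages
            # the posterior from collapsing to the prior.
            kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
            return z, kl

    encoder = TimestepWiseVAEEncoder()
    tokens = torch.randint(0, 10000, (4, 20))      # toy batch of token ids
    z, kl_loss = encoder(tokens)                   # z feeds a decoder; kl adds to the loss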

    DGST: a dual-generator network for text style transfer

    We propose DGST, a novel and simple Dual-Generator network architecture for text Style Transfer. Our model employs only two generators and does not rely on any discriminators or parallel corpora for training. Both quantitative and qualitative experiments on the Yelp and IMDb datasets show that our model achieves competitive performance compared to several strong baselines with more complicated architecture designs.
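
    One way to picture a discriminator-free dual-generator setup is a back-translation-style cycle: one generator transfers a sentence to the target style, the other is trained to reconstruct the original from that output. The sketch below is illustrative only; the base models, the discrete first hop, and the loss composition are assumptions, not the exact DGST training objective.

    # Hedged sketch: two generators trained with a cycle/reconstruction signal.
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    g_xy = AutoModelForSeq2SeqLM.from_pretrained("t5-small")   # style X -> style Y
    g_yx = AutoModelForSeq2SeqLM.from_pretrained("t5-small")   # style Y -> style X

    optim = torch.optim.AdamW(list(g_xy.parameters()) + list(g_yx.parameters()), lr=1e-4)

    def encode(texts):
        return tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    x_batch = ["the food was terrible and cold"]      # toy style-X sentences

    # Hop 1: transfer X -> Y' (no gradient flows through the discrete generation).
    with torch.no_grad():
        y_ids = g_xy.generate(**encode(x_batch), max_new_tokens=32)
    y_texts = tokenizer.batch_decode(y_ids, skip_special_tokens=True)

    # Hop 2: cycle reconstruction Y' -> X, trained against the original sentences.
    inputs = encode(y_texts)
    labels = encode(x_batch).input_ids
    labels[labels == tokenizer.pad_token_id] = -100
    cycle_loss = g_yx(**inputs, labels=labels).loss

    cycle_loss.backward()        # the symmetric Y -> X' -> Y pass would mirror this
    optim.step()
    optim.zero_grad()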

    Ethyne Reducing Metal-Organic Frameworks to Control Fabrications of Core/shell Nanoparticles as Catalysts

    An approach using cobalt metal-organic frameworks (Co-MOF) as precursors is established for the fabrication of cobalt nanoparticles in porous carbon shells (core/shell Co@C). Chemical vapor deposition of ethyne is used to control the reduction of cobalt nanoclusters in the MOF and the spontaneous formation of the porous carbon shells. The metallic cobalt cores formed are up to 4–6 nm, with the crystal phase varying between hexagonal close-packed (hcp) and face-centred cubic (fcc). The porous carbon shells change from amorphous to graphene as the ethyne deposition temperature increases from 400 to 600 °C. The core/shell Co@C nanoparticles exhibit high catalytic activity in selectively converting syngas (CTY: 254.1–312.1 μmolCO·gCo⁻¹·s⁻¹) into hydrocarbons (4.0–5.2 gHC·gcat⁻¹·h⁻¹) at 260 °C. As well as the crystal size and phase, the coordination numbers of cobalt to oxygen and to other cobalt atoms on the surface of the cobalt nanoparticles, and the permeability of the porous carbon shell, have been related to the catalytic performance in Fischer-Tropsch synthesis (FTS) reactions.