Customizing General-Purpose Foundation Models for Medical Report Generation
Medical caption prediction, which can be regarded as a medical report
generation (MRG) task, requires the automatic generation of coherent and
accurate captions for given medical images. However, the scarcity of labelled
medical image-report pairs presents great challenges in the development of deep
and large-scale neural networks capable of harnessing the potential artificial
general intelligence power like large language models (LLMs). In this work, we
propose customizing off-the-shelf general-purpose large-scale pre-trained
models, i.e., foundation models (FMs), in computer vision and natural language
processing with a specific focus on medical report generation. Specifically,
following BLIP-2, a state-of-the-art vision-language pre-training approach, we
introduce our encoder-decoder-based MRG model. This model utilizes a
lightweight query Transformer to connect two FMs: the giant vision Transformer
EVA-ViT-g and a bilingual LLM trained to align with human intentions (referred
to as ChatGLM-6B). Furthermore, we conduct ablative experiments on the
trainable components of the model to identify the crucial factors for effective
transfer learning. Our findings demonstrate that unfreezing EVA-ViT-g to learn
medical image representations, followed by parameter-efficient training of
ChatGLM-6B to capture the writing styles of medical reports, is essential for
achieving optimal results. Our best submission (PCLmed Team) ranked 4th on
BERTScore and 2nd on ROUGE-1 out of 13 participating teams in the
ImageCLEFmedical Caption 2023 Caption Prediction Task competition. Comment: 14
pages, 3 figures
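The connector design described above can be sketched at toy scale: a small set of learnable query vectors cross-attends over frozen image features, and the pooled outputs act as a soft visual prefix for the LLM. The following is a minimal single-head sketch of that idea; the dimensions, values, and helper names are illustrative, not the actual Q-Former.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, image_feats):
    """Each learnable query attends over the frozen image features and
    returns a convex combination of them (single head, toy dimensions)."""
    pooled = []
    for q in queries:
        scores = [sum(qi * fi for qi, fi in zip(q, f)) for f in image_feats]
        weights = softmax(scores)
        pooled.append([
            sum(w * f[d] for w, f in zip(weights, image_feats))
            for d in range(len(image_feats[0]))
        ])
    return pooled

# Frozen patch features (stand-ins for EVA-ViT-g outputs):
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Two learnable queries -- the trainable connector parameters in this sketch:
queries = [[2.0, 0.0], [0.0, 2.0]]
soft_prompt = cross_attend(queries, image_feats)
# soft_prompt would then be linearly projected and prepended to the LLM's
# token embeddings as a visual prefix.
```

In the full model, the ablation above suggests additionally unfreezing the vision encoder and tuning the LLM parameter-efficiently, rather than training the connector alone.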
Generation-Distillation for Efficient Natural Language Understanding in Low-Data Settings
Over the past year, the emergence of transfer learning with large-scale
language models (LM) has led to dramatic performance improvements across a
broad range of natural language understanding tasks. However, the size and
memory footprint of these large LMs makes them difficult to deploy in many
scenarios (e.g. on mobile phones). Recent research points to knowledge
distillation as a potential solution, showing that when training data for a
given task is abundant, it is possible to distill a large (teacher) LM into a
small task-specific (student) network with minimal loss of performance.
However, when such data is scarce, there remains a significant performance gap
between large pretrained LMs and smaller task-specific models, even when
training via distillation. In this paper, we bridge this gap with a novel
training approach, called generation-distillation, that leverages large
finetuned LMs in two ways: (1) to generate new (unlabeled) training examples,
and (2) to distill their knowledge into a small network using these examples.
Across three low-resource text classification datasets, we achieve comparable
performance to BERT while using 300x fewer parameters, and we outperform prior
approaches to distillation for text classification while using 3x fewer
parameters. Comment: EMNLP 2019 Workshop on Deep Learning for Low-resource NLP
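The two-step generation-distillation loop can be sketched with toy stand-ins; the keyword-based teacher, the seed-recombination generator, and the bag-of-words student below are all hypothetical, chosen only to make the data flow concrete:

```python
import random

def teacher_probs(text):
    """Stand-in for a large finetuned teacher LM's soft class
    distribution (hypothetical keyword rule, not a real model)."""
    return [0.9, 0.1] if "good" in text else [0.2, 0.8]

def generate_examples(seed_texts, n):
    """Step 1: the finetuned LM would sample new in-domain texts;
    here we simply recombine seed phrases as a stand-in."""
    rng = random.Random(0)
    return [" ".join(rng.sample(seed_texts, 2)) for _ in range(n)]

def distill(examples):
    """Step 2: 'train' a tiny student on the teacher's soft labels.
    The student here is a per-word average of teacher probabilities."""
    word_probs = {}
    for text in examples:
        probs = teacher_probs(text)
        for w in text.split():
            word_probs.setdefault(w, []).append(probs)
    return {
        w: [sum(p[i] for p in ps) / len(ps) for i in range(2)]
        for w, ps in word_probs.items()
    }

seeds = ["good movie", "bad plot", "good acting"]
synthetic = generate_examples(seeds, 4)   # new unlabeled training texts
student = distill(synthetic)              # student fit to soft labels
```

The point of the sketch is the data flow: the student never sees gold labels, only teacher-assigned distributions over teacher-generated examples.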
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Program synthesis or code generation aims to generate a program that
satisfies a problem specification. Recent approaches using large-scale
pretrained language models (LMs) have shown promising results, yet they have
some critical limitations. In particular, they often follow a standard
supervised fine-tuning procedure to train a code generation model only from the
pairs of natural-language problem descriptions and ground-truth programs. Such
a paradigm largely ignores important but potentially useful signals in the
problem specification, such as unit tests, and thus often results in poor
performance on complex unseen coding tasks. To address these
limitations, we propose "CodeRL", a new framework for program synthesis tasks
through pretrained LMs and deep reinforcement learning (RL). Specifically,
during training, we treat the code-generating LM as an actor network, and
introduce a critic network that is trained to predict the functional
correctness of generated programs and provide dense feedback signals to the
actor. During inference, we introduce a new generation procedure with a
critical sampling strategy that allows a model to automatically regenerate
programs based on feedback from example unit tests and critic scores. For the
model backbones, we extend the encoder-decoder architecture of CodeT5 with
enhanced learning objectives, larger model sizes, and better pretraining data.
Our method not only achieves new SOTA results on the challenging APPS
benchmark, but also shows strong zero-shot transfer capability with new SOTA
results on the simpler MBPP benchmark.
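The test-driven reward signal and critic-style selection at inference can be sketched as follows; the candidate programs, example tests, and the argmax-instead-of-regeneration shortcut are illustrative, not CodeRL's actual actor or critic networks:

```python
def run_unit_tests(program, tests):
    """Functional-correctness signal: the fraction of example unit tests
    a candidate program passes. This plays the role of the reward the
    critic network is trained to predict."""
    passed = 0
    for inp, expected in tests:
        try:
            if program(inp) == expected:
                passed += 1
        except Exception:
            pass  # a crashing program simply earns no reward
    return passed / len(tests)

# Toy "actor" samples: candidate programs for "square the input".
candidates = [lambda x: x + x, lambda x: x * x, lambda x: x ** 3]
example_tests = [(2, 4), (3, 9)]

# Critic-guided selection at inference: score candidates against the
# example tests and keep the highest-scoring one (the real procedure
# regenerates from feedback; a simple argmax stands in here).
best = max(candidates, key=lambda p: run_unit_tests(p, example_tests))
```

During training, the same pass/fail signal provides the dense feedback that shapes the actor's generation policy.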
Automatic Image Captioning with Style
This thesis connects two core topics in machine learning, vision
and language. The problem of choice is image caption generation:
automatically constructing natural language descriptions of image
content. Previous research into image caption generation has
focused on generating purely descriptive captions; I focus on
generating visually relevant captions with a distinct linguistic
style. Captions with style have the potential to ease
communication and add a new layer of personalisation.
First, I consider naming variations in image captions, and
propose a method for predicting context-dependent names that
takes into account visual and linguistic information. This method
makes use of a large-scale image caption dataset, which I also
use to explore naming conventions and report naming conventions
for hundreds of animal classes. Next I propose the SentiCap
model, which relies on recent advances in artificial neural
networks to generate visually relevant image captions with
positive or negative sentiment. To balance descriptiveness and
sentiment, the SentiCap model dynamically switches between two
recurrent neural networks, one tuned for descriptive words and
one for sentiment words. As the first published model for
generating captions with sentiment, SentiCap has influenced a
number of subsequent works. I then investigate the sub-task of
modelling styled sentences without images. The specific task
chosen is sentence simplification: rewriting news article
sentences to make them easier to understand.
For this task I design a neural sequence-to-sequence model that can work with
limited training data, using novel adaptations for word copying and sharing
word embeddings. Finally, I present SemStyle, a system for generating visually
relevant image captions in the style of an arbitrary text corpus. A shared
term space allows a neural network for vision and content planning to
communicate with a network for styled language generation. SemStyle achieves
competitive results in human and automatic evaluations of descriptiveness and
style.
As a whole, this thesis presents two complete systems for styled
caption generation that are first of their kind and demonstrate,
for the first time, that automatic style transfer for image
captions is achievable. Contributions also include novel ideas
for object naming and sentence simplification. This thesis opens
up inquiries into highly personalised image captions; large scale
visually grounded concept naming; and more generally, styled text
generation with content control.
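The per-word switching idea behind SentiCap can be illustrated with a toy greedy decoder; hand-written next-word tables stand in for the two recurrent networks, and the switch schedule is supplied by hand rather than learned:

```python
# Two toy "language models" as next-word tables (illustrative only):
descriptive = {"<s>": "a", "a": "dog", "cute": "dog",
               "dog": "runs", "runs": "<e>"}
sentiment = {"a": "cute"}

def decode(switch_steps, max_len=10):
    """Greedy decoding that, at each step, emits the next word from
    either the descriptive table or the sentiment table, mimicking
    SentiCap's per-word switching between two RNNs."""
    word, out = "<s>", []
    for step in range(max_len):
        table = sentiment if step in switch_steps else descriptive
        # Fall back to the descriptive model when the sentiment model
        # has no continuation for the current word:
        word = table.get(word) or descriptive[word]
        if word == "<e>":
            break
        out.append(word)
    return " ".join(out)

plain = decode(set())   # purely descriptive decoding
styled = decode({1})    # switch to the sentiment model at step 1
```

In the real model the switch is a learned gate balancing descriptiveness against sentiment, not a fixed schedule.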
Text generation for small data regimes
In Natural Language Processing (NLP), applications trained on downstream text classification tasks usually require enormous amounts of data to perform well. Neural network (NN) models in particular benefit from scale: deep NNs are known to be data hungry, so more training samples are almost always beneficial, and a classification model may require thousands or even millions of textual training examples to perform well.

Transfer learning enables us to leverage knowledge gained from general data collections to perform well on target tasks. In NLP, training language models on large data collections has been shown to achieve great results when tuned to different task-specific datasets (Wang et al., 2019, 2018a). However, even with transfer learning, adequate training data remains a precondition for training machine learning models. Nonetheless, we show that small textual datasets can be augmented to a degree sufficient to improve classification performance.

In this thesis, we make multiple contributions to data augmentation. Firstly, we transform the data generation task into an optimization problem that maximizes the usefulness of the generated output, using Monte Carlo Tree Search (MCTS) as the optimization strategy and incorporating entropy as one of the optimization criteria. Secondly, we propose a language generation approach for targeted data generation with the participation of the training classifier. With a user in the loop, we find that manual annotation of a small proportion of the generated data is enough to boost classification performance. Thirdly, under a self-learning scheme, we replace the user with an automated approach in which the classifier is trained on its own pseudo-labels. Finally, we extend the data generation approach to the knowledge distillation domain, by generating samples that a teacher model can confidently label, but its student cannot.
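The self-learning scheme from the third contribution can be sketched as a confidence-filtered pseudo-labeling step; the keyword-based classifier and threshold below are hypothetical stand-ins for the trained classifier:

```python
def pseudo_label(classifier, generated, threshold=0.8):
    """Self-learning filter: keep generated texts that the current
    classifier labels with high confidence, using its own predictions
    as training labels (pseudo-labels)."""
    kept = []
    for text in generated:
        label, conf = classifier(text)
        if conf >= threshold:
            kept.append((text, label))
    return kept

def toy_classifier(text):
    """Hypothetical classifier: keyword counts stand in for learned
    confidence scores."""
    score = text.count("great") - text.count("awful")
    label = "pos" if score >= 0 else "neg"
    conf = min(1.0, 0.5 + 0.25 * abs(score))
    return label, conf

generated = ["great great movie", "a plot", "awful awful ending"]
augmented = pseudo_label(toy_classifier, generated)
# Confident samples join the training set; "a plot" is discarded
# because the classifier is unsure about it.
```

The user-in-the-loop variant replaces the threshold filter with manual annotation of a small proportion of the generated samples.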
Semi-Supervised Learning for Neural Keyphrase Generation
We study the problem of generating keyphrases that summarize the key points
for a given document. While sequence-to-sequence (seq2seq) models have achieved
remarkable performance on this task (Meng et al., 2017), model training often
relies on large amounts of labeled data, which is only applicable to
resource-rich domains. In this paper, we propose semi-supervised keyphrase
generation methods by leveraging both labeled data and large-scale unlabeled
samples for learning. Two strategies are proposed. First, unlabeled documents
are tagged with synthetic keyphrases obtained from unsupervised keyphrase
extraction methods or a self-learning algorithm, and then combined with labeled
samples for training. Furthermore, we investigate a multi-task learning
framework to jointly learn to generate keyphrases as well as the titles of the
articles. Experimental results show that our semi-supervised learning-based
methods outperform a state-of-the-art model trained with labeled data only. Comment: To appear in EMNLP 2018 (12 pages, 7 figures, 6 tables)
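The first strategy, tagging unlabeled documents with synthetic keyphrases before mixing them with gold-labeled pairs, can be sketched with a frequency-based stand-in for the unsupervised extractor (the stopword list, documents, and `k` are illustrative):

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "we", "is"}

def synthetic_keyphrases(doc, k=2):
    """Stand-in for an unsupervised extractor (e.g. a TF-IDF-style
    ranker): take the k most frequent non-stopword tokens as
    synthetic keyphrase labels."""
    counts = Counter(w for w in doc.lower().split() if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

# A gold-labeled pair plus an unlabeled document tagged synthetically,
# merged into one training set (documents are illustrative):
labeled = [("neural keyphrase generation models", ["keyphrase generation"])]
unlabeled = ["graph neural networks for graph classification"]
combined = labeled + [(d, synthetic_keyphrases(d)) for d in unlabeled]
```

A seq2seq keyphrase generator would then be trained on `combined`, optionally alongside the title-generation auxiliary task described above.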