A Survey of Natural Language Generation
This paper offers a comprehensive review of research on Natural Language Generation (NLG) over the past two decades, focusing on deep learning methods for data-to-text and text-to-text generation as well as new applications of NLG technology. The survey aims to (a) give the latest synthesis of deep learning research on the core NLG tasks and the architectures adopted in the field; (b) detail meticulously and comprehensively the various NLG tasks and datasets, and draw attention to the challenges of NLG evaluation, covering different evaluation methods and their relationships; and (c) highlight future emphases and relatively recent research issues arising from the increasing synergy between NLG and other areas of artificial intelligence, such as computer vision, text, and computational creativity.
Comment: Accepted by ACM Computing Surveys (CSUR) 202
Modeling Relationships Between Sentences Using Deep Neural Network-Based Sentence Encoders
Thesis (Ph.D.)--Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2020. Advisor: Sang-goo Lee.
Sentence matching is the task of predicting the degree of match between two sentences.
Since a high level of natural language understanding is needed for a model to identify the relationship between two sentences, sentence matching is an important component of various natural language processing applications.
In this dissertation, we seek to improve the sentence matching module along three dimensions: the sentence encoder, the matching function, and semi-supervised learning.
To enhance the sentence encoder, the network responsible for extracting useful features from a sentence, we propose two new sentence encoder architectures: Gumbel Tree-LSTM and Cell-aware Stacked LSTM (CAS-LSTM).
Gumbel Tree-LSTM is based on the recursive neural network (RvNN) architecture; however, unlike typical RvNN architectures, it does not need structured input.
Instead, it learns from data a parsing strategy that is optimized for a specific task.
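For intuition, here is a minimal NumPy sketch of the straight-through Gumbel-Softmax trick that lets such a model make discrete merge decisions while remaining trainable; the merge scores below are hypothetical placeholders, not the thesis's actual model:

    import numpy as np

    def gumbel_softmax(logits, temperature=0.5, rng=np.random.default_rng(0)):
        # Add Gumbel noise and apply a temperature-controlled softmax, giving a
        # differentiable relaxation of sampling one candidate.
        g = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
        y = (logits + g) / temperature
        y = np.exp(y - y.max())
        return y / y.sum()

    # Hypothetical scores for three candidate pairs of adjacent nodes; the hard
    # argmax is used in the forward pass, the soft weights carry the gradient.
    scores = np.array([0.2, 1.5, -0.3])
    weights = gumbel_softmax(scores)
    merge_index = int(np.argmax(weights))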
The latter, CAS-LSTM, extends the stacked long short-term memory (LSTM) architecture by introducing an additional forget gate for better handling of vertical information flow.
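The following sketch shows the core idea under assumed shapes and parameterisation (not the thesis's exact equations): besides the usual gates, an extra forget gate decides how much of the lower layer's cell state enters the current layer's cell:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cas_lstm_step(h_below, c_below, h_prev, c_prev, W):
        # One step of one layer. Gates: input i, forget f, output o, plus an
        # extra vertical forget gate v applied to the lower layer's cell state.
        z = W @ np.concatenate([h_below, h_prev])
        i, f, o, v, u = np.split(z, 5)
        c = sigmoid(f) * c_prev + sigmoid(v) * c_below + sigmoid(i) * np.tanh(u)
        h = sigmoid(o) * np.tanh(c)
        return h, c

    d = 4
    rng = np.random.default_rng(0)
    W = 0.1 * rng.normal(size=(5 * d, 2 * d))
    h, c = cas_lstm_step(rng.normal(size=d), rng.normal(size=d),
                         np.zeros(d), np.zeros(d), W)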
Next, as a new matching function, we present the element-wise bilinear sentence matching (ElBiS) function.
It aims to automatically find an aggregation scheme that fuses two sentence representations into a single one suitable for a specific task.
From the fact that the same sentence encoder is shared across inputs, we hypothesize, and empirically verify, that considering only the element-wise bilinear interaction is sufficient for comparing two sentence vectors.
By restricting the interaction, we can greatly reduce the number of required parameters compared with full bilinear pooling, without losing the advantage of automatically discovering useful aggregation schemes.
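As a rough illustration, one plausible per-dimension parameterisation (not necessarily the exact form used in the thesis) keeps only interactions between matching dimensions, so the parameter count grows linearly with the dimensionality rather than cubically as in full bilinear pooling:

    import numpy as np

    def elbis_like_fusion(u, v, a, b, c, d):
        # Each output element depends only on the matching pair (u_i, v_i):
        # a bilinear term plus linear and bias terms, all per dimension.
        return a * u * v + b * u + c * v + d

    dim = 300
    rng = np.random.default_rng(0)
    u, v = rng.normal(size=dim), rng.normal(size=dim)
    a, b, c, d = (rng.normal(size=dim) for _ in range(4))
    fused = elbis_like_fusion(u, v, a, b, c, d)  # 4*dim parameters vs O(dim**3)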
Finally, to facilitate semi-supervised training, i.e., to make use of both labeled and unlabeled data during training, we propose the cross-sentence latent variable model (CS-LVM).
Its generative model assumes that a target sentence is generated from the latent representation of a source sentence together with a variable indicating the relationship between the source and the target.
Because it considers the two sentences of a pair together in a single model, its training objectives are defined more naturally than in prior approaches based on the variational auto-encoder (VAE).
We also define semantic constraints that push the generator toward semantically more plausible sentences.
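The generative story can be illustrated with a toy, runnable sketch; the decoder parameters here are hypothetical, and a unit-norm latent stands in for the von Mises-Fisher latent referenced in Chapter 6:

    import numpy as np

    NUM_LABELS, LATENT_DIM, VOCAB_SIZE, LENGTH = 3, 8, 50, 10
    rng = np.random.default_rng(0)
    # Hypothetical per-label decoder weights mapping the latent to token logits.
    W = rng.normal(size=(NUM_LABELS, VOCAB_SIZE, LATENT_DIM))

    def generate_target(z_source, y):
        # Target tokens are generated from the source latent z and the relation
        # label y together, which is the core assumption of a CS-LVM-style model.
        logits = W[y] @ z_source
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(VOCAB_SIZE, size=LENGTH, p=probs)

    z = rng.normal(size=LATENT_DIM)
    z /= np.linalg.norm(z)              # unit-norm latent (vMF-like support)
    y = rng.integers(NUM_LABELS)        # relation label (e.g., entailment)
    target_tokens = generate_target(z, y)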
We believe that the improvements proposed in this dissertation will advance the effectiveness of various natural language processing applications that involve modeling sentence pairs.

Chapter 1 Introduction
1.1 Sentence Matching
1.2 Deep Neural Networks for Sentence Matching
1.3 Scope of the Dissertation
Chapter 2 Background and Related Work
2.1 Sentence Encoders
2.2 Matching Functions
2.3 Semi-Supervised Training
Chapter 3 Sentence Encoder: Gumbel Tree-LSTM
3.1 Motivation
3.2 Preliminaries
3.2.1 Recursive Neural Networks
3.2.2 Training RvNNs without Tree Information
3.3 Model Description
3.3.1 Tree-LSTM
3.3.2 Gumbel-Softmax
3.3.3 Gumbel Tree-LSTM
3.4 Implementation Details
3.5 Experiments
3.5.1 Natural Language Inference
3.5.2 Sentiment Analysis
3.5.3 Qualitative Analysis
3.6 Summary
Chapter 4 Sentence Encoder: Cell-aware Stacked LSTM
4.1 Motivation
4.2 Related Work
4.3 Model Description
4.3.1 Stacked LSTMs
4.3.2 Cell-aware Stacked LSTMs
4.3.3 Sentence Encoders
4.4 Experiments
4.4.1 Natural Language Inference
4.4.2 Paraphrase Identification
4.4.3 Sentiment Classification
4.4.4 Machine Translation
4.4.5 Forget Gate Analysis
4.4.6 Model Variations
4.5 Summary
Chapter 5 Matching Function: Element-wise Bilinear Sentence Matching
5.1 Motivation
5.2 Proposed Method: ElBiS
5.3 Experiments
5.3.1 Natural Language Inference
5.3.2 Paraphrase Identification
5.4 Summary and Discussion
Chapter 6 Semi-Supervised Training: Cross-Sentence Latent Variable Model
6.1 Motivation
6.2 Preliminaries
6.2.1 Variational Auto-Encoders
6.2.2 von Mises-Fisher Distribution
6.3 Proposed Framework: CS-LVM
6.3.1 Cross-Sentence Latent Variable Model
6.3.2 Architecture
6.3.3 Optimization
6.4 Experiments
6.4.1 Natural Language Inference
6.4.2 Paraphrase Identification
6.4.3 Ablation Study
6.4.4 Generated Sentences
6.4.5 Implementation Details
6.5 Summary and Discussion
Chapter 7 Conclusion
Appendix A Appendix
A.1 Sentences Generated from CS-LVM
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in the computer vision community because it plays a significant role in video surveillance. Many algorithms have been proposed to handle this task. The goal of this paper is to review existing works, whether based on traditional methods or on deep learning networks. Firstly, we introduce the background of pedestrian attribute recognition (PAR, for short), including the fundamental concepts of pedestrian attributes and the corresponding challenges. Secondly, we introduce existing benchmarks, including popular datasets and evaluation criteria. Thirdly, we analyse the concepts of multi-task learning and multi-label learning, and explain the relations between these two learning paradigms and pedestrian attribute recognition. We also review some popular network architectures that have been widely applied in the deep learning community. Fourthly, we analyse popular solutions for this task, such as attribute grouping, part-based methods, \emph{etc}. Fifthly, we show some applications that take pedestrian attributes into consideration and achieve better performance. Finally, we summarize the paper and give several possible research directions for pedestrian attribute recognition. The project page for this paper can be found at the following website:
\url{https://sites.google.com/view/ahu-pedestrianattributes/}.
Comment: Check our project page for a high-resolution version of this survey: https://sites.google.com/view/ahu-pedestrianattributes
Deep Generative Models for Natural Language
Generative models aim to simulate the process by which a set of data is generated. They are intuitive, interpretable, and naturally suited to learning from unlabelled data. This is particularly appealing in natural language processing, where labels are often costly to obtain and can require significant manual input from trained annotators. However, traditional generative modelling approaches can often be inflexible due to the need to maintain tractable maximum likelihood training. On the other hand, deep learning methods are powerful, flexible, and have achieved significant success on a wide variety of natural language processing tasks. In recent years, algorithms have been developed for training generative models that incorporate neural networks to parametrise their conditional distributions. These approaches aim to combine the intuitiveness and interpretability of generative models with the power and flexibility of deep learning. In this work, we investigate how to leverage such algorithms to develop deep generative models for natural language. Firstly, we present an attention-based latent variable model, trained on unlabelled data, for learning representations of sentences. Experiments such as missing word imputation and sentence similarity matching suggest that the representations capture semantic information about the sentences. We then present an RNN-based latent variable model for performing machine translation. Trained using semi-supervised learning, our approach achieves strong results even with very limited labelled data. Finally, we present a locally-contextual conditional random field for performing sequence labelling tasks. Our method consistently outperforms the linear-chain conditional random field and achieves state-of-the-art performance on two of the four tasks evaluated.
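On the last point, a toy scorer may help show what "locally-contextual" could mean in practice. In this sketch (an assumed parameterisation, not the thesis's model), the emission potential at each position sees the previous, current, and next word embeddings, while tag-tag transitions remain those of a linear-chain CRF:

    import numpy as np

    def score_sequence(emb, tags, E, T):
        # Score one tag sequence: locally-contextual emissions plus transitions.
        n = len(tags)
        padded = np.vstack([np.zeros_like(emb[0]), emb, np.zeros_like(emb[0])])
        total = 0.0
        for t in range(n):
            context = np.concatenate([padded[t], padded[t + 1], padded[t + 2]])
            total += E[tags[t]] @ context            # emission sees neighbours
            if t > 0:
                total += T[tags[t - 1], tags[t]]     # linear-chain transition
        return total

    dim, n_tags, n = 5, 4, 6
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(n, dim))                  # toy word embeddings
    E = rng.normal(size=(n_tags, 3 * dim))           # per-tag emission weights
    T = rng.normal(size=(n_tags, n_tags))            # tag transition scores
    print(score_sequence(emb, rng.integers(n_tags, size=n), E, T))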
Automatic Distractor Generation for Multiple Choice Questions in Standard Tests
To assess the knowledge proficiency of a learner, the multiple choice question is an efficient and widespread format in standard tests. However, composing a multiple choice question, and especially constructing its distractors, is quite challenging. The distractors are required to be both incorrect and plausible enough to confuse learners who have not mastered the knowledge. Currently, distractors are created by domain experts, which is both expensive and time-consuming. This motivates automatic distractor generation, which can benefit standard tests across a wide range of domains. In this paper, we propose a question and answer guided distractor generation (EDGE) framework to automate distractor generation. EDGE consists of three major modules: (1) the Reforming Question Module and the Reforming Passage Module apply gate layers to guarantee the inherent incorrectness of the generated distractors; (2) the Distractor Generator Module applies an attention mechanism to control the level of plausibility.
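To make the gating idea concrete, here is a schematic sketch with hypothetical shapes and inputs (the paper's exact gating computation may differ): a gate computed from each hidden state and the answer representation down-weights answer-revealing content so that generated distractors stay incorrect:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def reforming_gate(h, answer_vec, W, b):
        # Gate values in (0, 1) computed from the token state and the answer
        # representation; multiplying suppresses answer-revealing dimensions.
        gate = sigmoid(W @ np.concatenate([h, answer_vec]) + b)
        return gate * h

    dim = 8
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(dim, 2 * dim)), np.zeros(dim)
    h_reformed = reforming_gate(rng.normal(size=dim), rng.normal(size=dim), W, b)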
Experimental results on a large-scale public dataset demonstrate that our model significantly outperforms existing models and achieves a new state of the art.
Comment: accepted by COLING202
Language modelling for clinical natural language understanding and generation
One of the long-standing objectives of Artificial Intelligence (AI) is to design and develop algorithms for social good, including tackling public health challenges. In the era of digitisation, with an unprecedented amount of healthcare data being captured in digital form, analysing healthcare data at scale can lead to better research on diseases, better monitoring of patient conditions and, more importantly, improved patient outcomes. However, many AI-based analytic algorithms rely solely on structured healthcare data, such as bedside measurements and test results, which account for only 20% of all healthcare data; the remaining 80% is unstructured and includes textual data such as clinical notes and discharge summaries, which remain underexplored.
Conventional Natural Language Processing (NLP) algorithms designed for clinical applications rely on shallow matching, templates and non-contextualised word embeddings, which leads to a limited understanding of contextual semantics. Although recent advances in NLP have demonstrated promising performance on a variety of tasks in the general domain using contextualised language models, most of these generic algorithms struggle on clinical NLP tasks that require biomedical knowledge and reasoning. Moreover, there is limited research on generative NLP algorithms that automatically produce clinical reports and summaries by attending to salient clinical information.
This thesis aims to design and develop novel NLP algorithms, especially clinical-driven contextualised language models, to understand textual healthcare data and generate clinical narratives that can potentially support clinicians, medical scientists and patients. The first contribution of this thesis focuses on capturing phenotypic information of patients from clinical notes, which is important for profiling a patient's situation and improving patient outcomes. The thesis proposes a novel self-supervised language model, named Phenotypic Intelligence Extraction (PIE), to annotate phenotypes from clinical notes, with detection of contextual synonyms and enhanced reasoning over numerical values. The second contribution is to demonstrate the utility and benefits of using phenotypic features of patients in clinical use cases, by predicting patient outcomes in Intensive Care Units (ICU) and identifying patients at risk of specific diseases with better accuracy and model interpretability. The third contribution is to propose generative models that produce clinical narratives, automating and accelerating the report writing and summarisation done by clinicians. The thesis first proposes a novel summarisation language model named PEGASUS, which surpasses or is on par with the state of the art on 12 downstream datasets, including biomedical literature from PubMed. PEGASUS is further extended to generate medical scientific documents from input tabular data.
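As a toy illustration of the gap-sentence pretraining idea behind PEGASUS (using a crude word-overlap score in place of the ROUGE-based sentence selection of the original work):

    def make_gsg_example(sentences, num_masked=1):
        # Gap-sentence generation in miniature: mask whole sentences from a
        # document and train a seq2seq model to regenerate them, which mimics
        # abstractive summarisation.
        def overlap(i):
            rest = set(w for j, s in enumerate(sentences) if j != i
                       for w in s.split())
            words = sentences[i].split()
            return sum(w in rest for w in words) / max(len(words), 1)
        ranked = sorted(range(len(sentences)), key=overlap, reverse=True)
        masked = set(ranked[:num_masked])
        src = " ".join("<MASK>" if i in masked else s
                       for i, s in enumerate(sentences))
        tgt = " ".join(sentences[i] for i in sorted(masked))
        return src, tgt

    doc = ["The patient was admitted with chest pain.",
           "An ECG showed no acute changes.",
           "The patient was discharged with follow-up advice."]
    print(make_gsg_example(doc))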