Exploring Different Dimensions of Attention for Uncertainty Detection
Neural networks with attention have proven effective for many natural
language processing tasks. In this paper, we develop attention mechanisms for
uncertainty detection. In particular, we generalize standard attention
mechanisms by introducing external attention and sequence-preserving attention.
These novel architectures differ from standard approaches in that they use
external resources to compute attention weights and preserve sequence
information. We compare them to other configurations along different dimensions
of attention. Our novel architectures set a new state of the art on a
Wikipedia benchmark dataset and perform similarly to the state-of-the-art model
on a biomedical benchmark that uses a large set of linguistic features.
Comment: accepted at EACL 2017
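A minimal sketch of the external-attention idea described above, assuming an additive attention layer over encoder states and a hypothetical per-token cue score taken from an external lexicon; it illustrates mixing learned and external attention signals and is not the paper's exact architecture:

    # Sketch only: standard learned attention whose scores are shifted by
    # per-token scores from an external resource (e.g. an uncertainty-cue lexicon).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ExternalAttention(nn.Module):
        def __init__(self, hidden_dim):
            super().__init__()
            self.score = nn.Linear(hidden_dim, 1)               # learned attention score
            self.cue_weight = nn.Parameter(torch.tensor(1.0))   # weight on external scores

        def forward(self, hidden_states, external_scores):
            # hidden_states: (batch, seq_len, hidden_dim) from any encoder
            # external_scores: (batch, seq_len), e.g. 1.0 if token is a lexicon cue
            learned = self.score(hidden_states).squeeze(-1)         # (batch, seq_len)
            scores = learned + self.cue_weight * external_scores    # mix both signals
            alpha = F.softmax(scores, dim=-1)                       # attention weights
            context = torch.bmm(alpha.unsqueeze(1), hidden_states)  # (batch, 1, hidden_dim)
            return context.squeeze(1), alpha

    # toy usage with random encoder states and a binary cue indicator
    h = torch.randn(2, 5, 16)
    cues = torch.tensor([[0., 1., 0., 0., 0.], [0., 0., 0., 1., 1.]])
    context, alpha = ExternalAttention(16)(h, cues)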
Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques
specifically developed for analyzing and understanding the inner workings and
representations acquired by neural models of language. Approaches included:
systematic manipulation of input to neural networks and investigating the
impact on their performance, testing whether interpretable knowledge can be
decoded from intermediate representations acquired by neural networks,
proposing modifications to neural network architectures to make their knowledge
state or generated output more explainable, and examining the performance of
networks on simplified or formal languages. Here we review a number of
representative studies in each category.
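One of the approaches listed above, decoding interpretable knowledge from intermediate representations, is commonly realized as a probing (diagnostic) classifier. The sketch below illustrates the idea on synthetic vectors standing in for frozen hidden states; it is a generic illustration, not tied to any particular workshop study:

    # Sketch only: a linear probe trained to predict a (made-up) linguistic label
    # from frozen intermediate representations.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    hidden_states = rng.normal(size=(1000, 768))      # stand-in for layer activations
    labels = (hidden_states[:, 0] > 0).astype(int)    # stand-in linguistic property

    X_train, X_test, y_train, y_test = train_test_split(hidden_states, labels, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("probe accuracy:", probe.score(X_test, y_test))
    # High accuracy suggests the property is linearly decodable from the representations.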
A Study on the Construction and Extension of a Korean Pre-trained Language Model: Focusing on Sentiment Analysis
Thesis (Ph.D.) -- Seoul National University Graduate School, Department of Linguistics, College of Humanities, February 2021. Advisor: Hyopil Shin.
Recently, as interest in the Bidirectional Encoder Representations from Transformers (BERT) model has increased, many studies based on it have been actively conducted in natural language processing. Such sentence-level contextualized embedding models are generally known to capture and model lexical, syntactic, and semantic information in sentences during training. Therefore, such models, including ELMo, GPT, and BERT, function as universal models that can perform impressively on a wide range of NLP tasks.
This study proposes a monolingual BERT model trained on Korean texts. The first released BERT model that could handle Korean was Google Research's multilingual BERT (M-BERT), which was trained with data and a vocabulary covering 104 languages, including Korean and English, so that a single model can process text in any of those languages. However, despite the advantages of multilingualism, this model does not fully reflect the characteristics of each language, and its text-processing performance in each language is therefore lower than that of a monolingual model. To mitigate these shortcomings, we built monolingual models using training data and a vocabulary organized to better capture the linguistic knowledge in Korean texts.
Therefore, in this study, a model named KR-BERT was built using training data composed of Korean Wikipedia text and news articles, and was released through GitHub so that it can be used for processing Korean texts. Additionally, we trained a KR-BERT-MEDIUM model on expanded data, adding comments and legal texts to the training data of KR-BERT. Each model used as its vocabulary a list of tokens composed mainly of Hangul characters, built with the WordPiece algorithm from the corresponding training data. These models achieved competitive performance on various Korean NLP tasks such as Named Entity Recognition, Question Answering, Semantic Textual Similarity, and Sentiment Analysis.
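For readers who want to try the released checkpoints, a minimal usage sketch with the Hugging Face transformers library follows. The Hub model identifier is an assumption about where the GitHub-released KR-BERT checkpoint is mirrored; substitute the location given in the authors' repository if it differs:

    # Sketch only: load the (assumed) KR-BERT checkpoint and encode one sentence.
    from transformers import AutoTokenizer, AutoModel

    model_id = "snunlp/KR-BERT-char16424"   # assumed Hub ID; see the authors' GitHub release
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)

    inputs = tokenizer("이 영화 정말 재미있다", return_tensors="pt")   # "This movie is really fun"
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)   # (1, num_wordpiece_tokens, hidden_size)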
In addition, we added sentiment features to the BERT model to specialize it for sentiment analysis. We constructed a sentiment-combined model in which the features consist of polarity and intensity values assigned to each token of the training data according to the Korean Sentiment Analysis Corpus (KOSAC). The sentiment features assigned to each token form polarity and intensity embeddings, which are added to the basic BERT input embeddings. The sentiment-combined model is then constructed by training the BERT model with these embeddings.
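A minimal sketch of the sentiment-combined input embeddings described above: per-token polarity and intensity IDs get their own embedding tables and are summed with the usual token, segment, and position embeddings. Vocabulary size, class counts, and hidden size below are illustrative assumptions, not the dissertation's exact configuration:

    # Sketch only: BERT-style input embeddings extended with polarity/intensity tables.
    import torch
    import torch.nn as nn

    class SentimentCombinedEmbeddings(nn.Module):
        def __init__(self, vocab_size=16424, hidden=768, max_len=512,
                     n_polarity=5, n_intensity=5):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, hidden)
            self.pos = nn.Embedding(max_len, hidden)
            self.seg = nn.Embedding(2, hidden)
            self.polarity = nn.Embedding(n_polarity, hidden)    # e.g. negative/neutral/positive classes
            self.intensity = nn.Embedding(n_intensity, hidden)  # e.g. low/medium/high classes
            self.norm = nn.LayerNorm(hidden)

        def forward(self, token_ids, segment_ids, polarity_ids, intensity_ids):
            positions = torch.arange(token_ids.size(1), device=token_ids.device)
            x = (self.tok(token_ids) + self.pos(positions) + self.seg(segment_ids)
                 + self.polarity(polarity_ids) + self.intensity(intensity_ids))
            return self.norm(x)   # fed into the Transformer encoder stack

    # toy usage
    emb = SentimentCombinedEmbeddings()
    ids = torch.randint(0, 16424, (1, 8))
    zeros = torch.zeros(1, 8, dtype=torch.long)
    print(emb(ids, zeros, zeros, zeros).shape)   # torch.Size([1, 8, 768])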
We trained a model named KR-BERT-KOSAC that contains these sentiment features while keeping the same training data, vocabulary, and model configuration as KR-BERT, and distributed it through GitHub. We then analyzed the effect of the sentiment features by comparing the model with KR-BERT on language modeling during training and on sentiment analysis tasks. We also determined how much each of the polarity and intensity features contributes to model performance by training separate models that use each feature individually. Using both sentiment features yielded some improvement in language modeling and sentiment analysis performance over models with other feature compositions. The sentiment analysis tasks considered here are binary positivity classification of movie reviews and hate speech detection on offensive comments.
On the other hand, training these embedding models requires a great deal of training time and hardware resources. This study therefore proposes a simple model-fusing method that requires relatively little time. We trained a smaller-scale sentiment-combined model, with fewer encoder layers, fewer attention heads, and smaller hidden sizes, for a few steps, and combined it with an existing pre-trained BERT model. Since such pre-trained models are expected to function as universal models that handle various NLP problems through good language modeling, this combination allows two models with different advantages to interact and yields better text-processing capability. Experiments on sentiment analysis problems confirm that combining the two models is efficient in training time and hardware usage while producing more accurate predictions than single models that do not include sentiment features.
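A minimal sketch of the fusing idea: pooled vectors from a large, frozen pre-trained encoder and from a small sentiment-aware encoder are concatenated and fed to a single classification head. The encoders below are stand-ins for any modules that return a fixed-size sentence vector, and the dimensions are illustrative assumptions rather than the dissertation's external fusing method in detail:

    # Sketch only: combine a frozen large encoder with a small sentiment encoder.
    import torch
    import torch.nn as nn

    class FusedClassifier(nn.Module):
        def __init__(self, big_encoder, small_encoder, big_dim=768, small_dim=256, n_labels=2):
            super().__init__()
            self.big = big_encoder
            self.small = small_encoder
            for p in self.big.parameters():      # keep the expensive model frozen
                p.requires_grad = False
            self.head = nn.Linear(big_dim + small_dim, n_labels)

        def forward(self, x_big, x_small):
            fused = torch.cat([self.big(x_big), self.small(x_small)], dim=-1)
            return self.head(fused)

    # toy usage with dummy encoders that already return pooled vectors
    big = nn.Sequential(nn.Linear(100, 768))     # stand-in for a frozen pre-trained BERT pooler
    small = nn.Sequential(nn.Linear(100, 256))   # stand-in for the small sentiment model
    logits = FusedClassifier(big, small)(torch.randn(4, 100), torch.randn(4, 100))
    print(logits.shape)   # torch.Size([4, 2])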
1 Introduction
1.1 Objectives
1.2 Contribution
1.3 Dissertation Structure
2 Related Work
2.1 Language Modeling and the Attention Mechanism
2.2 BERT-based Models
2.2.1 BERT and Variation Models
2.2.2 Korean-Specific BERT Models
2.2.3 Task-Specific BERT Models
2.3 Sentiment Analysis
2.4 Chapter Summary
3 BERT Architecture and Evaluations
3.1 Bidirectional Encoder Representations from Transformers (BERT)
3.1.1 Transformers and the Multi-Head Self-Attention Mechanism
3.1.2 Tokenization and Embeddings of BERT
3.1.3 Training and Fine-Tuning BERT
3.2 Evaluation of BERT
3.2.1 NLP Tasks
3.2.2 Metrics
3.3 Chapter Summary
4 Pre-Training of Korean BERT-based Model
4.1 The Need for a Korean Monolingual Model
4.2 Pre-Training Korean-specific BERT Model
4.3 Chapter Summary
5 Performances of Korean-Specific BERT Models
5.1 Task Datasets
5.1.1 Named Entity Recognition
5.1.2 Question Answering
5.1.3 Natural Language Inference
5.1.4 Semantic Textual Similarity
5.1.5 Sentiment Analysis
5.2 Experiments
5.2.1 Experiment Details
5.2.2 Task Results
5.3 Chapter Summary
6 An Extended Study to Sentiment Analysis
6.1 Sentiment Features
6.1.1 Sources of Sentiment Features
6.1.2 Assigning Prior Sentiment Values
6.2 Composition of Sentiment Embeddings
6.3 Training the Sentiment-Combined Model
6.4 Effect of Sentiment Features
6.5 Chapter Summary
7 Combining Two BERT Models
7.1 External Fusing Method
7.2 Experiments and Results
7.3 Chapter Summary
8 Conclusion
8.1 Summary of Contribution and Results
8.1.1 Construction of Korean Pre-trained BERT Models
8.1.2 Construction of a Sentiment-Combined Model
8.1.3 External Fusing of Two Pre-Trained Models to Gain Performance and Cost Advantages
8.2 Future Directions and Open Problems
8.2.1 More Training of KR-BERT-MEDIUM for Convergence of Performance
8.2.2 Observation of Changes Depending on the Domain of Training Data
8.2.3 Overlap of Sentiment Features with Linguistic Knowledge that BERT Learns
8.2.4 The Specific Process of Sentiment Features Helping the Language Modeling of BERT is Unknown
Bibliography
Appendices
A. Python Sources
A.1 Construction of Polarity and Intensity Embeddings
A.2 External Fusing of Different Pre-Trained Models
B. Examples of Experiment Outputs
C. Model Releases through GitHub
ATP: A holistic attention integrated approach to enhance ABSA
Aspect-based sentiment analysis (ABSA) deals with identifying the sentiment
polarity of a review sentence towards a given aspect. Deep learning sequential
models like RNNs, LSTMs, and GRUs are the current state-of-the-art methods for
inferring sentiment polarity. These methods capture the contextual
relationships between the words of a review sentence well, but they fall short
in capturing long-term dependencies. The attention mechanism plays a
significant role by focusing only on the most crucial parts of the sentence. In
the case of ABSA, aspect position plays a vital role: words near the aspect
contribute more when determining the sentiment towards it. Therefore, we
propose a method that captures position-based information using the dependency
parse tree and feeds it to the attention mechanism. Using this type of position
information instead of simple word-distance-based positions enhances the deep
learning model's performance. We performed experiments on the SemEval'14
dataset to demonstrate the effect of dependency-parse-relation-based attention
for ABSA.
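A minimal sketch of dependency-based position weighting of the kind described above: given a dependency tree as a list of head indices, each token's tree distance to the aspect term is converted into a weight that can scale attention scores, so that syntactically closer tokens count more. The sentence, parse, and weighting function are illustrative assumptions, not the paper's exact formulation:

    # Sketch only: per-token dependency-tree distance to the aspect, turned into weights.
    from collections import deque

    def tree_distances(heads, aspect_idx):
        """heads[i] = index of token i's syntactic head (-1 for the root)."""
        n = len(heads)
        adj = [[] for _ in range(n)]
        for child, head in enumerate(heads):
            if head >= 0:
                adj[child].append(head)
                adj[head].append(child)
        dist = [None] * n
        dist[aspect_idx] = 0
        queue = deque([aspect_idx])
        while queue:                      # BFS over the undirected tree
            node = queue.popleft()
            for nb in adj[node]:
                if dist[nb] is None:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        return dist

    tokens = ["the", "battery", "life", "is", "surprisingly", "good"]
    heads  = [2, 2, 3, -1, 5, 3]                         # toy parse: "is" is the root
    dist = tree_distances(heads, aspect_idx=1)           # aspect = "battery"
    weights = [1.0 / (1 + d) for d in dist]              # position weights for attention
    print(list(zip(tokens, dist, [round(w, 2) for w in weights])))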