262 research outputs found

    A Robust Transformation-Based Learning Approach Using Ripple Down Rules for Part-of-Speech Tagging

    Full text link
    In this paper, we propose a new approach to constructing a system of transformation rules for the Part-of-Speech (POS) tagging task. Our approach is based on an incremental knowledge acquisition method in which rules are stored in an exception structure and new rules are added only to correct the errors of existing rules, allowing systematic control of the interaction between rules. Experimental results on 13 languages show that our approach is fast in terms of training time and tagging speed. Furthermore, it obtains very competitive accuracy in comparison to state-of-the-art POS and morphological taggers. Comment: Version 1: 13 pages. Version 2: Submitted to AI Communications - the European Journal on Artificial Intelligence. Version 3: Resubmitted after major revisions. Version 4: Resubmitted after minor revisions. Version 5: To appear in AI Communications (accepted for publication on 3/12/2015).
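    The exception structure described above can be pictured as a tree of rules in which a child rule fires only to correct its parent's output. A minimal sketch in Python, with made-up conditions and tags rather than rules from the paper:

```python
class RDRNode:
    """One rule in a ripple-down-rules exception tree (illustrative sketch).

    A node fires when its condition matches the current word/context; its
    children are exceptions added later to correct this node's errors,
    so existing rules are never modified, only patched.
    """

    def __init__(self, condition, tag):
        self.condition = condition      # callable: (word, context) -> bool
        self.tag = tag                  # tag proposed when this rule fires
        self.exceptions = []            # child rules, tried in insertion order

    def add_exception(self, condition, tag):
        """Patch an observed tagging error without touching existing rules."""
        child = RDRNode(condition, tag)
        self.exceptions.append(child)
        return child

    def classify(self, word, context):
        """Return the tag of the deepest matching rule on this branch, or None."""
        if not self.condition(word, context):
            return None
        result = self.tag
        for child in self.exceptions:
            corrected = child.classify(word, context)
            if corrected is not None:
                result = corrected      # an exception overrides its parent
        return result


# Hypothetical default rule: everything is a NOUN unless an exception fires.
root = RDRNode(lambda w, ctx: True, "NOUN")
# Exception learned from an error: words ending in "-ly" are adverbs ...
ly_rule = root.add_exception(lambda w, ctx: w.endswith("ly"), "ADV")
# ... except "supply", which the -ly rule mis-tags, so patch only that case.
ly_rule.add_exception(lambda w, ctx: w == "supply", "NOUN")

print(root.classify("quickly", {}))  # ADV
print(root.classify("supply", {}))   # NOUN
```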

    ํ•œ๊ตญ์–ด ์‚ฌ์ „ํ•™์Šต๋ชจ๋ธ ๊ตฌ์ถ•๊ณผ ํ™•์žฅ ์—ฐ๊ตฌ: ๊ฐ์ •๋ถ„์„์„ ์ค‘์‹ฌ์œผ๋กœ

    Get PDF
    Doctoral dissertation (Ph.D.), Seoul National University Graduate School, Department of Linguistics, College of Humanities, February 2021. Advisor: Hyopil Shin. Recently, as interest in the Bidirectional Encoder Representations from Transformers (BERT) model has increased, many Natural Language Processing studies based on it have been conducted. Such sentence-level contextualized embedding models are generally known to capture lexical, syntactic, and semantic information in sentences during training. Models of this kind, including ELMo, GPT, and BERT, therefore function as universal models that perform impressively across a wide range of NLP tasks. This study proposes a monolingual BERT model trained on Korean texts. The first released BERT model that could handle Korean was Google Research's multilingual BERT (M-BERT), trained with data and a vocabulary covering 104 languages, including Korean and English, so that a single model can process text in any of those languages. Despite the advantages of multilingualism, however, this model does not fully reflect each language's characteristics, so its text-processing performance in each language is lower than that of a monolingual model. To mitigate these shortcomings, we built monolingual models using training data and a vocabulary organized to better capture the linguistic knowledge in Korean texts. Specifically, a model named KR-BERT was trained on Korean Wikipedia text and news articles and released through GitHub for processing Korean texts. Additionally, we trained a KR-BERT-MEDIUM model on expanded data, adding comments and legal texts to KR-BERT's training data. Each model used as its vocabulary a list of tokens composed mainly of Hangul characters, built with the WordPiece algorithm on the corresponding training data. These models achieved competitive performance on various Korean NLP tasks such as Named Entity Recognition, Question Answering, Semantic Textual Similarity, and Sentiment Analysis. In addition, we added sentiment features to the BERT model to specialize it for sentiment analysis. We constructed a sentiment-combined model whose features consist of polarity and intensity values assigned to each token in the training data based on the Korean Sentiment Analysis Corpus (KOSAC). The sentiment features assigned to each token form polarity and intensity embeddings, which are added to the basic BERT input embeddings, and the sentiment-combined model is obtained by training the BERT model with these embeddings. We trained a model named KR-BERT-KOSAC that contains sentiment features while keeping the same training data, vocabulary, and model configuration as KR-BERT, and distributed it through GitHub. We then analyzed the effect of the sentiment features by comparing language-modeling performance during training and sentiment analysis performance against KR-BERT. We also measured how much each of the polarity and intensity features contributes to model performance by training separate models that use only one of them. Using both sentiment features yielded gains in language modeling and sentiment analysis over models with other feature compositions.
The sentiment analysis tasks here included binary polarity classification of movie reviews and hate speech detection on offensive comments. On the other hand, pre-training such embedding models requires substantial training time and hardware resources. This study therefore proposes a simple model-fusing method that requires relatively little time. We trained a smaller sentiment-combined model, with fewer encoder layers, fewer attention heads, and smaller hidden sizes, for only a few steps, and combined it with an existing pre-trained BERT model. Since pre-trained models are expected to function universally across various NLP problems thanks to good language modeling, this combination lets two models with different strengths interact and yields better text-processing capability. Experiments on sentiment analysis problems confirmed that combining the two models is efficient in training time and hardware usage while producing more accurate predictions than single models that do not include sentiment features.
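A minimal PyTorch sketch of the input construction described above: per-token polarity and intensity IDs get their own embedding tables, and their vectors are added to the usual BERT token, position, and segment embeddings. The layer sizes and the number of polarity/intensity classes are illustrative assumptions, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

class SentimentCombinedEmbeddings(nn.Module):
    """BERT-style input embeddings with added polarity/intensity embeddings (sketch)."""

    def __init__(self, vocab_size=20000, hidden=768, max_len=512,
                 n_polarity=5, n_intensity=5):          # class counts are assumptions
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)
        self.position = nn.Embedding(max_len, hidden)
        self.segment = nn.Embedding(2, hidden)
        self.polarity = nn.Embedding(n_polarity, hidden)    # e.g. NEG / NEUT / POS / ...
        self.intensity = nn.Embedding(n_intensity, hidden)  # e.g. low / medium / high / ...
        self.norm = nn.LayerNorm(hidden)
        self.dropout = nn.Dropout(0.1)

    def forward(self, token_ids, segment_ids, polarity_ids, intensity_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = (self.token(token_ids)
             + self.position(positions)[None, :, :]
             + self.segment(segment_ids)
             + self.polarity(polarity_ids)      # sentiment features are simply summed
             + self.intensity(intensity_ids))   # into the input representation
        return self.dropout(self.norm(x))

# Toy usage: batch of 2 sequences, 8 tokens each, all-neutral sentiment IDs.
emb = SentimentCombinedEmbeddings()
ids = torch.randint(0, 20000, (2, 8))
zeros = torch.zeros(2, 8, dtype=torch.long)
print(emb(ids, zeros, zeros, zeros).shape)  # torch.Size([2, 8, 768])
```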
๊ฐ์ • ์ž์งˆ์„ ํฌํ•จํ•˜์—ฌ ๋ณ„๋„์˜ ์ž„๋ฒ ๋”ฉ ๋ชจ๋ธ์„ ํ•™์Šต์‹œ์ผฐ๋Š”๋ฐ, ์ด๋•Œ ๊ฐ์ • ์ž์งˆ์€ ๋ฌธ์žฅ ๋‚ด์˜ ๊ฐ ํ† ํฐ์— ํ•œ๊ตญ์–ด ๊ฐ์ • ๋ถ„์„ ์ฝ”ํผ์Šค (KOSAC)์— ๋Œ€์‘ํ•˜๋Š” ๊ฐ์ • ๊ทน์„ฑ(polarity)๊ณผ ๊ฐ•๋„(intensity) ๊ฐ’์„ ๋ถ€์—ฌํ•œ ๊ฒƒ์ด๋‹ค. ๊ฐ ํ† ํฐ์— ๋ถ€์—ฌ๋œ ์ž์งˆ์€ ๊ทธ ์ž์ฒด๋กœ ๊ทน์„ฑ ์ž„๋ฒ ๋”ฉ๊ณผ ๊ฐ•๋„ ์ž„๋ฒ ๋”ฉ์„ ๊ตฌ์„ฑํ•˜๊ณ , BERT๊ฐ€ ๊ธฐ๋ณธ์œผ๋กœ ํ•˜๋Š” ํ† ํฐ ์ž„๋ฒ ๋”ฉ์— ๋”ํ•ด์ง„๋‹ค. ์ด๋ ‡๊ฒŒ ๋งŒ๋“ค์–ด์ง„ ์ž„๋ฒ ๋”ฉ์„ ํ•™์Šตํ•œ ๊ฒƒ์ด ๊ฐ์ • ์ž์งˆ ๋ชจ๋ธ(sentiment-combined model)์ด ๋œ๋‹ค. KR-BERT์™€ ๊ฐ™์€ ํ•™์Šต ๋ฐ์ดํ„ฐ์™€ ๋ชจ๋ธ ๊ตฌ์„ฑ์„ ์œ ์ง€ํ•˜๋ฉด์„œ ๊ฐ์ • ์ž์งˆ์„ ๊ฒฐํ•ฉํ•œ ๋ชจ๋ธ์ธ KR-BERT-KOSAC๋ฅผ ๊ตฌํ˜„ํ•˜๊ณ , ์ด๋ฅผ GitHub์„ ํ†ตํ•ด ๋ฐฐํฌํ•˜์˜€๋‹ค. ๋˜ํ•œ ๊ทธ๋กœ๋ถ€ํ„ฐ ํ•™์Šต ๊ณผ์ • ๋‚ด ์–ธ์–ด ๋ชจ๋ธ๋ง๊ณผ ๊ฐ์ • ๋ถ„์„ ๊ณผ์ œ์—์„œ์˜ ์„ฑ๋Šฅ์„ ์–ป์€ ๋’ค KR-BERT์™€ ๋น„๊ตํ•˜์—ฌ ๊ฐ์ • ์ž์งˆ ์ถ”๊ฐ€์˜ ํšจ๊ณผ๋ฅผ ์‚ดํŽด๋ณด์•˜๋‹ค. ๋˜ํ•œ ๊ฐ์ • ์ž์งˆ ์ค‘ ๊ทน์„ฑ๊ณผ ๊ฐ•๋„ ๊ฐ’์„ ๊ฐ๊ฐ ์ ์šฉํ•œ ๋ชจ๋ธ์„ ๋ณ„๋„ ๊ตฌ์„ฑํ•˜์—ฌ ๊ฐ ์ž์งˆ์ด ๋ชจ๋ธ ์„ฑ๋Šฅ ํ–ฅ์ƒ์— ์–ผ๋งˆ๋‚˜ ๊ธฐ์—ฌํ•˜๋Š”์ง€๋„ ํ™•์ธํ•˜์˜€๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๋‘ ๊ฐ€์ง€ ๊ฐ์ • ์ž์งˆ์„ ๋ชจ๋‘ ์ถ”๊ฐ€ํ•œ ๊ฒฝ์šฐ์—, ๊ทธ๋ ‡์ง€ ์•Š์€ ๋‹ค๋ฅธ ๋ชจ๋ธ๋“ค์— ๋น„ํ•˜์—ฌ ์–ธ์–ด ๋ชจ๋ธ๋ง์ด๋‚˜ ๊ฐ์ • ๋ถ„์„ ๋ฌธ์ œ์—์„œ ์„ฑ๋Šฅ์ด ์–ด๋Š ์ •๋„ ํ–ฅ์ƒ๋˜๋Š” ๊ฒƒ์„ ๊ด€์ฐฐํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ์ด๋•Œ ๊ฐ์ • ๋ถ„์„ ๋ฌธ์ œ๋กœ๋Š” ์˜ํ™”ํ‰์˜ ๊ธ๋ถ€์ • ์—ฌ๋ถ€ ๋ถ„๋ฅ˜์™€ ๋Œ“๊ธ€์˜ ์•…ํ”Œ ์—ฌ๋ถ€ ๋ถ„๋ฅ˜๋ฅผ ํฌํ•จํ•˜์˜€๋‹ค. ๊ทธ๋Ÿฐ๋ฐ ์œ„์™€ ๊ฐ™์€ ์ž„๋ฒ ๋”ฉ ๋ชจ๋ธ์„ ์‚ฌ์ „ํ•™์Šตํ•˜๋Š” ๊ฒƒ์€ ๋งŽ์€ ์‹œ๊ฐ„๊ณผ ํ•˜๋“œ์›จ์–ด ๋“ฑ์˜ ์ž์›์„ ์š”๊ตฌํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ๋น„๊ต์  ์ ์€ ์‹œ๊ฐ„๊ณผ ์ž์›์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฐ„๋‹จํ•œ ๋ชจ๋ธ ๊ฒฐํ•ฉ ๋ฐฉ๋ฒ•์„ ์ œ์‹œํ•œ๋‹ค. ์ ์€ ์ˆ˜์˜ ์ธ์ฝ”๋” ๋ ˆ์ด์–ด, ์–ดํ…์…˜ ํ—ค๋“œ, ์ ์€ ์ž„๋ฒ ๋”ฉ ์ฐจ์› ์ˆ˜๋กœ ๊ตฌ์„ฑํ•œ ๊ฐ์ • ์ž์งˆ ๋ชจ๋ธ์„ ์ ์€ ์Šคํ… ์ˆ˜๊นŒ์ง€๋งŒ ํ•™์Šตํ•˜๊ณ , ์ด๋ฅผ ๊ธฐ์กด์— ํฐ ๊ทœ๋ชจ๋กœ ์‚ฌ์ „ํ•™์Šต๋˜์–ด ์žˆ๋Š” ์ž„๋ฒ ๋”ฉ ๋ชจ๋ธ๊ณผ ๊ฒฐํ•ฉํ•œ๋‹ค. ๊ธฐ์กด์˜ ์‚ฌ์ „ํ•™์Šต๋ชจ๋ธ์—๋Š” ์ถฉ๋ถ„ํ•œ ์–ธ์–ด ๋ชจ๋ธ๋ง์„ ํ†ตํ•ด ๋‹ค์–‘ํ•œ ์–ธ์–ด ์ฒ˜๋ฆฌ ๋ฌธ์ œ๋ฅผ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ๋Š” ๋ณดํŽธ์ ์ธ ๊ธฐ๋Šฅ์ด ๊ธฐ๋Œ€๋˜๋ฏ€๋กœ, ์ด๋Ÿฌํ•œ ๊ฒฐํ•ฉ์€ ์„œ๋กœ ๋‹ค๋ฅธ ์žฅ์ ์„ ๊ฐ–๋Š” ๋‘ ๋ชจ๋ธ์ด ์ƒํ˜ธ์ž‘์šฉํ•˜์—ฌ ๋” ์šฐ์ˆ˜ํ•œ ์ž์—ฐ์–ด์ฒ˜๋ฆฌ ๋Šฅ๋ ฅ์„ ๊ฐ–๋„๋ก ํ•  ๊ฒƒ์ด๋‹ค. 
๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ๊ฐ์ • ๋ถ„์„ ๋ฌธ์ œ๋“ค์— ๋Œ€ํ•œ ์‹คํ—˜์„ ํ†ตํ•ด ๋‘ ๊ฐ€์ง€ ๋ชจ๋ธ์˜ ๊ฒฐํ•ฉ์ด ํ•™์Šต ์‹œ๊ฐ„์— ์žˆ์–ด ํšจ์œจ์ ์ด๋ฉด์„œ๋„, ๊ฐ์ • ์ž์งˆ์„ ๋”ํ•˜์ง€ ์•Š์€ ๋ชจ๋ธ๋ณด๋‹ค ๋” ์ •ํ™•ํ•œ ์˜ˆ์ธก์„ ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ํ™•์ธํ•˜์˜€๋‹ค.1 Introduction 1 1.1 Objectives 3 1.2 Contribution 9 1.3 Dissertation Structure 10 2 Related Work 13 2.1 Language Modeling and the Attention Mechanism 13 2.2 BERT-based Models 16 2.2.1 BERT and Variation Models 16 2.2.2 Korean-Specific BERT Models 19 2.2.3 Task-Specific BERT Models 22 2.3 Sentiment Analysis 24 2.4 Chapter Summary 30 3 BERT Architecture and Evaluations 33 3.1 Bidirectional Encoder Representations from Transformers (BERT) 33 3.1.1 Transformers and the Multi-Head Self-Attention Mechanism 34 3.1.2 Tokenization and Embeddings of BERT 39 3.1.3 Training and Fine-Tuning BERT 42 3.2 Evaluation of BERT 47 3.2.1 NLP Tasks 47 3.2.2 Metrics 50 3.3 Chapter Summary 52 4 Pre-Training of Korean BERT-based Model 55 4.1 The Need for a Korean Monolingual Model 55 4.2 Pre-Training Korean-specific BERT Model 58 4.3 Chapter Summary 70 5 Performances of Korean-Specific BERT Models 71 5.1 Task Datasets 71 5.1.1 Named Entity Recognition 71 5.1.2 Question Answering 73 5.1.3 Natural Language Inference 74 5.1.4 Semantic Textual Similarity 78 5.1.5 Sentiment Analysis 80 5.2 Experiments 81 5.2.1 Experiment Details 81 5.2.2 Task Results 83 5.3 Chapter Summary 89 6 An Extended Study to Sentiment Analysis 91 6.1 Sentiment Features 91 6.1.1 Sources of Sentiment Features 91 6.1.2 Assigning Prior Sentiment Values 94 6.2 Composition of Sentiment Embeddings 103 6.3 Training the Sentiment-Combined Model 109 6.4 Effect of Sentiment Features 113 6.5 Chapter Summary 121 7 Combining Two BERT Models 123 7.1 External Fusing Method 123 7.2 Experiments and Results 130 7.3 Chapter Summary 135 8 Conclusion 137 8.1 Summary of Contribution and Results 138 8.1.1 Construction of Korean Pre-trained BERT Models 138 8.1.2 Construction of a Sentiment-Combined Model 138 8.1.3 External Fusing of Two Pre-Trained Models to Gain Performance and Cost Advantages 139 8.2 Future Directions and Open Problems 140 8.2.1 More Training of KR-BERT-MEDIUM for Convergence of Performance 140 8.2.2 Observation of Changes Depending on the Domain of Training Data 141 8.2.3 Overlap of Sentiment Features with Linguistic Knowledge that BERT Learns 142 8.2.4 The Specific Process of Sentiment Features Helping the Language Modeling of BERT is Unknown 143 Bibliography 145 Appendices 157 A. Python Sources 157 A.1 Construction of Polarity and Intensity Embeddings 157 A.2 External Fusing of Different Pre-Trained Models 158 B. Examples of Experiment Outputs 162 C. Model Releases through GitHub 165Docto

    Meta-learning for fast cross-lingual adaptation in dependency parsing

    Get PDF
    Meta-learning, or learning to learn, is a technique that can help to overcome resource scarcity in cross-lingual NLP problems by enabling fast adaptation to new tasks. We apply model-agnostic meta-learning (MAML) to the task of cross-lingual dependency parsing. We train our model on a diverse set of languages to learn a parameter initialization that can adapt quickly to new languages. We find that meta-learning with pre-training can significantly improve upon the performance of language-transfer and standard supervised learning baselines for a variety of unseen, typologically diverse, and low-resource languages in a few-shot learning setup.
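    For reference, a minimal sketch of the MAML inner/outer loop that such cross-lingual adaptation builds on, with a toy regression model standing in for the dependency parser and random tensors standing in for per-language support/query sets (all names and sizes are illustrative; requires PyTorch 2.x for torch.func):

```python
import torch
import torch.nn as nn

def maml_step(model, tasks, inner_lr=1e-2, meta_opt=None, inner_steps=1):
    """One meta-update: adapt a copy of the parameters per task (inner loop),
    then update the shared initialization from post-adaptation losses (outer loop)."""
    meta_loss = 0.0
    for support_x, support_y, query_x, query_y in tasks:
        # Inner loop: a few gradient steps on the task's support set,
        # keeping the graph so gradients flow back to the initialization.
        fast = {name: p for name, p in model.named_parameters()}
        for _ in range(inner_steps):
            pred = torch.func.functional_call(model, fast, (support_x,))
            loss = nn.functional.mse_loss(pred, support_y)
            grads = torch.autograd.grad(loss, list(fast.values()), create_graph=True)
            fast = {name: p - inner_lr * g
                    for (name, p), g in zip(fast.items(), grads)}
        # Outer loop: evaluate the adapted parameters on the task's query set.
        query_pred = torch.func.functional_call(model, fast, (query_x,))
        meta_loss = meta_loss + nn.functional.mse_loss(query_pred, query_y)
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
    return meta_loss.item()

# Toy usage: each "language" is a random regression task with support/query splits.
model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
tasks = [(torch.randn(8, 4), torch.randn(8, 1),
          torch.randn(8, 4), torch.randn(8, 1)) for _ in range(4)]
print(maml_step(model, tasks, meta_opt=opt))
```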

    Human-machine Translation Model Evaluation Based on Artificial Intelligence Translation

    Get PDF
    As artificial intelligence (AI) translation technology advances, big data, cloud computing, and other emerging technologies have driven the progress of the data industry over the past several decades. Human-machine translation has become a new mode of interaction between humans and machines and plays an essential role in transmitting information. Nevertheless, existing translation models have drawbacks and limitations, such as high error rates and inaccuracy, and cannot adapt to the varied demands of different user groups. Taking the AI-based translation model as the research object, this study analyzed attention mechanisms and related technical means, examined the shortcomings of conventional translation models, and proposed an AI-based translation model that produces clear, high-quality translations, offering a reference for further improving AI-based translation models. Manual and automated evaluation showed that the human-machine translation model reduced mismatches between texts and contexts and improved the accuracy and efficiency of intelligent recognition and expression. The evaluation used a 1-10 scale with 30 language users as participants, with a score of 6 or above considered effective. The results showed that the language fluency score rose from 4.9667 for conventional Statistical Machine Translation to 6.6333 for the AI-based translation model. The human-machine translation model thus improved the efficiency, speed, precision, and accuracy of language input to a certain degree, strengthened the correlation between semantic characteristics and intelligent recognition, and advanced intelligent recognition. It can provide accurate, high-quality translation for language users and supports understanding and automatic processing of natural language input and output.
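    The abstract leans on attention mechanisms without spelling them out; for reference, the generic scaled dot-product attention that underlies most neural translation models (a textbook form, not this paper's specific model) can be written as follows.

```python
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Generic attention: softmax(QK^T / sqrt(d)) V; not this paper's model."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

# Toy usage: self-attention over a batch of 2 sequences, 5 positions, dimension 16.
q = torch.randn(2, 5, 16)
out, attn = scaled_dot_product_attention(q, q, q)
print(out.shape, attn.shape)  # torch.Size([2, 5, 16]) torch.Size([2, 5, 5])
```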

    MEGA: Multilingual Evaluation of Generative AI

    Full text link
    Generative AI models have shown impressive performance on many Natural Language Processing tasks such as language understanding, reasoning, and language generation. An important question being asked by the AI community today concerns the capabilities and limits of these models, and it is clear that evaluating generative AI is very challenging. Most studies of generative LLMs have been restricted to English, and it is unclear how capable these models are at understanding and generating text in other languages. We present MEGA, the first comprehensive benchmarking of generative LLMs, which evaluates models on standard NLP benchmarks covering 16 NLP datasets across 70 typologically diverse languages. We compare the performance of generative LLMs, including ChatGPT and GPT-4, to state-of-the-art (SOTA) non-autoregressive models on these tasks to determine how well generative models perform compared to the previous generation of LLMs. We present a thorough analysis of model performance across languages and tasks and discuss challenges in improving the performance of generative LLMs on low-resource languages. We create a framework for evaluating generative LLMs in the multilingual setting and provide directions for future progress in the field. Comment: EMNLP 2023.
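    A minimal sketch of the kind of per-language evaluation loop such a benchmark implies; the dataset fields, the generate callable, and the exact-match metric below are illustrative assumptions, not MEGA's actual harness.

```python
from collections import defaultdict

def evaluate_multilingual(generate, dataset):
    """Compute exact-match accuracy per language.

    `generate(prompt) -> str` is any model wrapper (hypothetical);
    `dataset` is an iterable of dicts with 'language', 'prompt', and 'answer'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for example in dataset:
        lang = example["language"]
        prediction = generate(example["prompt"]).strip().lower()
        total[lang] += 1
        correct[lang] += int(prediction == example["answer"].strip().lower())
    return {lang: correct[lang] / total[lang] for lang in total}

# Toy usage with a dummy "model" that always answers "yes".
toy_data = [
    {"language": "sw", "prompt": "Is water wet?", "answer": "yes"},
    {"language": "sw", "prompt": "Is fire cold?", "answer": "no"},
    {"language": "ta", "prompt": "Is the sky blue?", "answer": "yes"},
]
print(evaluate_multilingual(lambda p: "yes", toy_data))  # {'sw': 0.5, 'ta': 1.0}
```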
    • โ€ฆ