Search CORE

4,659 research outputs found

Econometrics meets sentiment : an overview of methodology and applications

Author: Algaba Andres
Ardia David
Bluteau Keven
Borms Samuel
Boudt Kris
Publication venue: 'Wiley'
Publication date: 01/01/2020
Field of study

The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software

VU Research Portal

Crossref

Ghent University Academic Bibliography

한국어 사전학습모델 구축과 확장 연구: 감정분석을 중심으로

Author: 이상아
Publication venue: 서울대학교 대학원
Publication date: 01/02/2021
Field of study

학위논문 (박사) -- 서울대학교 대학원 : 인문대학 언어학과, 2021. 2. 신효필.Recently, as interest in the Bidirectional Encoder Representations from Transformers (BERT) model has increased, many studies have also been actively conducted in Natural Language Processing based on the model. Such sentence-level contextualized embedding models are generally known to capture and model lexical, syntactic, and semantic information in sentences during training. Therefore, such models, including ELMo, GPT, and BERT, function as a universal model that can impressively perform a wide range of NLP tasks. This study proposes a monolingual BERT model trained based on Korean texts. The first released BERT model that can handle the Korean language was Google Research’s multilingual BERT (M-BERT), which was constructed with training data and a vocabulary composed of 104 languages, including Korean and English, and can handle the text of any language contained in the single model. However, despite the advantages of multilingualism, this model does not fully reflect each language’s characteristics, so that its text processing performance in each language is lower than that of a monolingual model. While mitigating those shortcomings, we built monolingual models using the training data and a vocabulary organized to better capture Korean texts’ linguistic knowledge. Therefore, in this study, a model named KR-BERT was built using training data composed of Korean Wikipedia text and news articles, and was released through GitHub so that it could be used for processing Korean texts. Additionally, we trained a KR-BERT-MEDIUM model based on expanded data by adding comments and legal texts to the training data of KR-BERT. Each model used a list of tokens composed mainly of Hangul characters as its vocabulary, organized using WordPiece algorithms based on the corresponding training data. These models reported competent performances in various Korean NLP tasks such as Named Entity Recognition, Question Answering, Semantic Textual Similarity, and Sentiment Analysis. In addition, we added sentiment features to the BERT model to specialize it to better function in sentiment analysis. We constructed a sentiment-combined model including sentiment features, where the features consist of polarity and intensity values assigned to each token in the training data corresponding to that of Korean Sentiment Analysis Corpus (KOSAC). The sentiment features assigned to each token compose polarity and intensity embeddings and are infused to the basic BERT input embeddings. The sentiment-combined model is constructed by training the BERT model with these embeddings. We trained a model named KR-BERT-KOSAC that contains sentiment features while maintaining the same training data, vocabulary, and model configurations as KR-BERT and distributed it through GitHub. Then we analyzed the effects of using sentiment features in comparison to KR-BERT by observing their performance in language modeling during the training process and sentiment analysis tasks. Additionally, we determined how much each of the polarity and intensity features contributes to improving the model performance by separately organizing a model that utilizes each of the features, respectively. We obtained some increase in language modeling and sentiment analysis performances by using both the sentiment features, compared to other models with different feature composition. Here, we included the problems of binary positivity classification of movie reviews and hate speech detection on offensive comments as the sentiment analysis tasks. On the other hand, training these embedding models requires a lot of training time and hardware resources. Therefore, this study proposes a simple model fusing method that requires relatively little time. We trained a smaller-scaled sentiment-combined model consisting of a smaller number of encoder layers and attention heads and smaller hidden sizes for a few steps, combining it with an existing pre-trained BERT model. Since those pre-trained models are expected to function universally to handle various NLP problems based on good language modeling, this combination will allow two models with different advantages to interact and have better text processing capabilities. In this study, experiments on sentiment analysis problems have confirmed that combining the two models is efficient in training time and usage of hardware resources, while it can produce more accurate predictions than single models that do not include sentiment features.최근 트랜스포머 양방향 인코더 표현 (Bidirectional Encoder Representations from Transformers, BERT) 모델에 대한 관심이 높아지면서 자연어처리 분야에서 이에 기반한 연구 역시 활발히 이루어지고 있다. 이러한 문장 단위의 임베딩을 위한 모델들은 보통 학습 과정에서 문장 내 어휘, 통사, 의미 정보를 포착하여 모델링한다고 알려져 있다. 따라서 ELMo, GPT, BERT 등은 그 자체가 다양한 자연어처리 문제를 해결할 수 있는 보편적인 모델로서 기능한다. 본 연구는 한국어 자료로 학습한 단일 언어 BERT 모델을 제안한다. 가장 먼저 공개된 한국어를 다룰 수 있는 BERT 모델은 Google Research의 multilingual BERT (M-BERT)였다. 이는 한국어와 영어를 포함하여 104개 언어로 구성된 학습 데이터와 어휘 목록을 가지고 학습한 모델이며, 모델 하나로 포함된 모든 언어의 텍스트를 처리할 수 있다. 그러나 이는 그 다중언어성이 갖는 장점에도 불구하고, 각 언어의 특성을 충분히 반영하지 못하여 단일 언어 모델보다 각 언어의 텍스트 처리 성능이 낮다는 단점을 보인다. 본 연구는 그러한 단점들을 완화하면서 텍스트에 포함되어 있는 언어 정보를 보다 잘 포착할 수 있도록 구성된 데이터와 어휘 목록을 이용하여 모델을 구축하고자 하였다. 따라서 본 연구에서는 한국어 Wikipedia 텍스트와 뉴스 기사로 구성된 데이터를 이용하여 KR-BERT 모델을 구현하고, 이를 GitHub을 통해 공개하여 한국어 정보처리를 위해 사용될 수 있도록 하였다. 또한 해당 학습 데이터에 댓글 데이터와 법조문과 판결문을 덧붙여 확장한 텍스트에 기반해서 다시 KR-BERT-MEDIUM 모델을 학습하였다. 이 모델은 해당 학습 데이터로부터 WordPiece 알고리즘을 이용해 구성한 한글 중심의 토큰 목록을 사전으로 이용하였다. 이들 모델은 개체명 인식, 질의응답, 문장 유사도 판단, 감정 분석 등의 다양한 한국어 자연어처리 문제에 적용되어 우수한 성능을 보고했다. 또한 본 연구에서는 BERT 모델에 감정 자질을 추가하여 그것이 감정 분석에 특화된 모델로서 확장된 기능을 하도록 하였다. 감정 자질을 포함하여 별도의 임베딩 모델을 학습시켰는데, 이때 감정 자질은 문장 내의 각 토큰에 한국어 감정 분석 코퍼스 (KOSAC)에 대응하는 감정 극성(polarity)과 강도(intensity) 값을 부여한 것이다. 각 토큰에 부여된 자질은 그 자체로 극성 임베딩과 강도 임베딩을 구성하고, BERT가 기본으로 하는 토큰 임베딩에 더해진다. 이렇게 만들어진 임베딩을 학습한 것이 감정 자질 모델(sentiment-combined model)이 된다. KR-BERT와 같은 학습 데이터와 모델 구성을 유지하면서 감정 자질을 결합한 모델인 KR-BERT-KOSAC를 구현하고, 이를 GitHub을 통해 배포하였다. 또한 그로부터 학습 과정 내 언어 모델링과 감정 분석 과제에서의 성능을 얻은 뒤 KR-BERT와 비교하여 감정 자질 추가의 효과를 살펴보았다. 또한 감정 자질 중 극성과 강도 값을 각각 적용한 모델을 별도 구성하여 각 자질이 모델 성능 향상에 얼마나 기여하는지도 확인하였다. 이를 통해 두 가지 감정 자질을 모두 추가한 경우에, 그렇지 않은 다른 모델들에 비하여 언어 모델링이나 감정 분석 문제에서 성능이 어느 정도 향상되는 것을 관찰할 수 있었다. 이때 감정 분석 문제로는 영화평의 긍부정 여부 분류와 댓글의 악플 여부 분류를 포함하였다. 그런데 위와 같은 임베딩 모델을 사전학습하는 것은 많은 시간과 하드웨어 등의 자원을 요구한다. 따라서 본 연구에서는 비교적 적은 시간과 자원을 사용하는 간단한 모델 결합 방법을 제시한다. 적은 수의 인코더 레이어, 어텐션 헤드, 적은 임베딩 차원 수로 구성한 감정 자질 모델을 적은 스텝 수까지만 학습하고, 이를 기존에 큰 규모로 사전학습되어 있는 임베딩 모델과 결합한다. 기존의 사전학습모델에는 충분한 언어 모델링을 통해 다양한 언어 처리 문제를 처리할 수 있는 보편적인 기능이 기대되므로, 이러한 결합은 서로 다른 장점을 갖는 두 모델이 상호작용하여 더 우수한 자연어처리 능력을 갖도록 할 것이다. 본 연구에서는 감정 분석 문제들에 대한 실험을 통해 두 가지 모델의 결합이 학습 시간에 있어 효율적이면서도, 감정 자질을 더하지 않은 모델보다 더 정확한 예측을 할 수 있다는 것을 확인하였다.1 Introduction 1 1.1 Objectives 3 1.2 Contribution 9 1.3 Dissertation Structure 10 2 Related Work 13 2.1 Language Modeling and the Attention Mechanism 13 2.2 BERT-based Models 16 2.2.1 BERT and Variation Models 16 2.2.2 Korean-Specific BERT Models 19 2.2.3 Task-Specific BERT Models 22 2.3 Sentiment Analysis 24 2.4 Chapter Summary 30 3 BERT Architecture and Evaluations 33 3.1 Bidirectional Encoder Representations from Transformers (BERT) 33 3.1.1 Transformers and the Multi-Head Self-Attention Mechanism 34 3.1.2 Tokenization and Embeddings of BERT 39 3.1.3 Training and Fine-Tuning BERT 42 3.2 Evaluation of BERT 47 3.2.1 NLP Tasks 47 3.2.2 Metrics 50 3.3 Chapter Summary 52 4 Pre-Training of Korean BERT-based Model 55 4.1 The Need for a Korean Monolingual Model 55 4.2 Pre-Training Korean-specific BERT Model 58 4.3 Chapter Summary 70 5 Performances of Korean-Specific BERT Models 71 5.1 Task Datasets 71 5.1.1 Named Entity Recognition 71 5.1.2 Question Answering 73 5.1.3 Natural Language Inference 74 5.1.4 Semantic Textual Similarity 78 5.1.5 Sentiment Analysis 80 5.2 Experiments 81 5.2.1 Experiment Details 81 5.2.2 Task Results 83 5.3 Chapter Summary 89 6 An Extended Study to Sentiment Analysis 91 6.1 Sentiment Features 91 6.1.1 Sources of Sentiment Features 91 6.1.2 Assigning Prior Sentiment Values 94 6.2 Composition of Sentiment Embeddings 103 6.3 Training the Sentiment-Combined Model 109 6.4 Effect of Sentiment Features 113 6.5 Chapter Summary 121 7 Combining Two BERT Models 123 7.1 External Fusing Method 123 7.2 Experiments and Results 130 7.3 Chapter Summary 135 8 Conclusion 137 8.1 Summary of Contribution and Results 138 8.1.1 Construction of Korean Pre-trained BERT Models 138 8.1.2 Construction of a Sentiment-Combined Model 138 8.1.3 External Fusing of Two Pre-Trained Models to Gain Performance and Cost Advantages 139 8.2 Future Directions and Open Problems 140 8.2.1 More Training of KR-BERT-MEDIUM for Convergence of Performance 140 8.2.2 Observation of Changes Depending on the Domain of Training Data 141 8.2.3 Overlap of Sentiment Features with Linguistic Knowledge that BERT Learns 142 8.2.4 The Specific Process of Sentiment Features Helping the Language Modeling of BERT is Unknown 143 Bibliography 145 Appendices 157 A. Python Sources 157 A.1 Construction of Polarity and Intensity Embeddings 157 A.2 External Fusing of Different Pre-Trained Models 158 B. Examples of Experiment Outputs 162 C. Model Releases through GitHub 165Docto

SNU Open Repository and Archive

Role of sentiment classification in sentiment analysis: a survey

Author: J Prabhu
M R Pavan Kumar
Publication venue: Annals of Library and Information Studies (ALIS)
Publication date: 06/11/2018
Field of study

Through a survey of literature, the role of sentiment classification in sentiment analysis has been reviewed. The review identifies the research challenges involved in tackling sentiment classification. A total of 68 articles during 2015 – 2017 have been reviewed on six dimensions viz., sentiment classification, feature extraction, cross-lingual sentiment classification, cross-domain sentiment classification, lexica and corpora creation and multi-label sentiment classification. This study discusses the prominence and effects of sentiment classification in sentiment evaluation and a lot of further research needs to be done for productive results

Online Publishing @ NISCAIR

What attracts vehicle consumers’ buying:A Saaty scale-based VIKOR (SSC-VIKOR) approach from after-sales textual perspective?

Author: He Yandong
Lim Ming
Pratap Saurabh
Zhou Fuli
Publication venue: 'Emerald'
Publication date: 01/01/2019
Field of study

Purpose: The increasingly booming e-commerce development has stimulated vehicle consumers to express individual reviews through online forum. The purpose of this paper is to probe into the vehicle consumer consumption behavior and make recommendations for potential consumers from textual comments viewpoint. Design/methodology/approach: A big data analytic-based approach is designed to discover vehicle consumer consumption behavior from online perspective. To reduce subjectivity of expert-based approaches, a parallel Naïve Bayes approach is designed to analyze the sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain specific sentimental value of attribute class, contributing to the multi-grade sentiment classification. To achieve the intelligent recommendation for potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytical viewpoint. Findings: The big data analytics argue that “cost-effectiveness” characteristic is the most important factor that vehicle consumers care, and the data mining results enable automakers to better understand consumer consumption behavior. Research limitations/implications: The case study illustrates the effectiveness of the integrated method, contributing to much more precise operations management on marketing strategy, quality improvement and intelligent recommendation. Originality/value: Researches of consumer consumption behavior are usually based on survey-based methods, and mostly previous studies about comments analysis focus on binary analysis. The hybrid SSC-VIKOR approach is developed to fill the gap from the big data perspective

Coventry University Pure Portal

Enlighten

Entity-sensitive attention and fusion network for entity-level multimodal sentiment classification

Author: JIANG Jing
YU Jianfei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

National Research Foundation (NRF) Singapor

Institutional Knowledge at Singapore Management University

Positionless aspect based sentiment analysis using attention mechanism.

Author: Goodwin Morten
Granmo Ole-Christoffer
Lei Jiao
Yadav Rohan Kumar
Publication venue: 'Elsevier BV'
Publication date: 01/01/2021
Field of study

publishedVersio

Agder University Research Archive

A Deep Multi-Level Attentive network for Multimodal Sentiment Analysis

Author: Vishwakarma Dinesh Kumar
Yadav Ashima
Publication venue
Publication date: 15/12/2020
Field of study

Multimodal sentiment analysis has attracted increasing attention with broad application prospects. The existing methods focuses on single modality, which fails to capture the social media content for multiple modalities. Moreover, in multi-modal learning, most of the works have focused on simply combining the two modalities, without exploring the complicated correlations between them. This resulted in dissatisfying performance for multimodal sentiment classification. Motivated by the status quo, we propose a Deep Multi-Level Attentive network, which exploits the correlation between image and text modalities to improve multimodal learning. Specifically, we generate the bi-attentive visual map along the spatial and channel dimensions to magnify CNNs representation power. Then we model the correlation between the image regions and semantics of the word by extracting the textual features related to the bi-attentive visual features by applying semantic attention. Finally, self-attention is employed to automatically fetch the sentiment-rich multimodal features for the classification. We conduct extensive evaluations on four real-world datasets, namely, MVSA-Single, MVSA-Multiple, Flickr, and Getty Images, which verifies the superiority of our method.Comment: 11 pages, 7 figure

arXiv.org e-Print Archive

Multimodal sentiment analysis in real-life videos

Author: Stappen Lukas
Publication venue
Publication date: 24/11/2022
Field of study

This thesis extends the emerging field of multimodal sentiment analysis of real-life videos, taking two components into consideration: the emotion and the emotion's target. The emotion component of media is traditionally represented as a segment-based intensity model of emotion classes. This representation is replaced here by a value- and time-continuous view. Adjacent research fields, such as affective computing, have largely neglected the linguistic information available from automatic transcripts of audio-video material. As is demonstrated here, this text modality is well-suited for time- and value-continuous prediction. Moreover, source-specific problems, such as trustworthiness, have been largely unexplored so far. This work examines perceived trustworthiness of the source, and its quantification, in user-generated video data and presents a possible modelling path. Furthermore, the transfer between the continuous and discrete emotion representations is explored in order to summarise the emotional context at a segment level. The other component deals with the target of the emotion, for example, the topic the speaker is addressing. Emotion targets in a video dataset can, as is shown here, be coherently extracted based on automatic transcripts without limiting a priori parameters, such as the expected number of targets. Furthermore, alternatives to purely linguistic investigation in predicting targets, such as knowledge-bases and multimodal systems, are investigated. A new dataset is designed for this investigation, and, in conjunction with proposed novel deep neural networks, extensive experiments are conducted to explore the components described above. The developed systems show robust prediction results and demonstrate strengths of the respective modalities, feature sets, and modelling techniques. Finally, foundations are laid for cross-modal information prediction systems with applications to the correction of corrupted in-the-wild signals from real-life videos

OPUS Augsburg