1,507 research outputs found
The Business Impact of Social Media - Sentiment Analysis Approach -
The purpose of this study is to verify the reliability of seven sentiment domains extracted from social media as data for sentiment analysis experiments on predicting automotive market share, and to examine how customer opinions affect corporate performance. The study was conducted in three phases. The first phase, constructing a sentiment lexicon, crawled a total of 45,447 Voice of the Customer (VOC) posts about 26 automobile manufacturers in the United States, posted between January 1, 2013 and December 31, 2015, from automotive communities; after part-of-speech (POS) tagging, the frequencies of negative and positive sentiment words were measured to build a sentiment lexicon, and their polarity was measured to form seven sentiment domains. The second phase, a reliability analysis of the data, verified the data's suitability for the experiment through auto-correlation analysis and principal component analysis (PCA). In the third phase, two linear regression models were used to test how the seven sentiment domains affect corporate performance, i.e., automotive market share, for four automobile manufacturers in the US market: GM, Ford, FCA, and Volkswagen. As a result, we extracted 4,815 negative and 2,021 positive sentiment words to build the sentiment lexicon; based on this lexicon, the extracted and classified negative and positive words were combined with automotive-industry terms, and the characteristics of the sentiments were examined through auto-correlation analysis and PCA. The auto-correlation analysis revealed a consistent pattern in the sentiment data: the sentiment in each domain was auto-correlated, and time-series behavior of sentiment was also observed. The PCA showed that the seven sentiment domains load onto negative, positive, and neutral principal components. Building on the reliability of the VOC sentiment data established by these analyses, two linear regression models were constructed and tested. The first model took the negative sentiments Sadness, Anger, and Fear and the positive sentiment domains Delight and Satisfaction, as identified by the PCA, as independent variables, with market share as the dependent variable. The second model added Shame and Frustration, which the PCA had assigned to the neutral component, as independent variables, to examine whether sentiments of higher neutrality significantly affect market share. The analysis found that, for each company, certain sentiments have a significant effect on market share, and that the influence of the sentiments differs between Model 1 and Model 2. This study confirms that sentiment, as latent information in data, can herald changes in the automotive market based on its past values. Moreover, if the auto-correlation of sentiment and automotive market information can be exploited when applying market data, this work can contribute substantially not only to sentiment analysis research but also, in various ways, to business performance in real markets.
List of Tables iv
List of Figures v
Abstract 1
1. Introduction
1.1 Background 3
1.2 Necessity of Study 6
1.3 Purpose & Questions 8
1.4 Structure 9
2. Literature Reviews of VOC Analysis
2.1 Importance of VOC 11
2.2 Data Mining 15
2.2.1 Concept & Functionalities 15
2.2.2 Methodologies of Data mining 20
2.3 Text Mining 24
2.4 Sentiment Analysis 26
2.5 Research Trend in Korea 30
3. Methodology
3.1 Research Flow 32
3.2 Proposed Methodologies 34
3.2.1 Sentiment Analysis 34
3.2.2 Auto-correlation Analysis 37
3.2.3 Principal Component Analysis (PCA) 38
3.2.4 Linear Regression 40
4. Experiment & Analysis
4.1 Phase I: Constructing Sentiment Lexicon & 7 Sentiment Domains 43
4.1.1 The Subject of Analysis & Crawling Data 43
4.1.2 Extracting POS Information 44
4.1.3 Review Extracting POS Information 46
4.2 Phase II: Reliability Analysis 49
4.2.1 Auto-correlation Analysis of Sentiment 51
4.2.2 Principal Component Analysis of Sentiment 55
4.3 Phase III: Influence on Automotive Market Share 58
4.3.1 Linear Regression Model 58
4.3.2 Definition of Variables 60
4.3.3 The Result of Linear Regression Analysis 62
5. Conclusion
5.1 Summary of Study 73
5.2 Managerial Implication and Limitation 75
5.3 Future Study 77
References 79
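The regression setup described in the abstract above can be sketched in a few lines: sentiment-domain scores act as independent variables and market share as the dependent variable. This is a minimal ordinary-least-squares sketch on synthetic data; the variable names (anger, delight) echo the thesis's sentiment domains, but the numbers are invented for illustration.

```python
# Minimal OLS fit via the normal equations (X'X)b = X'y, pure stdlib.
def ols_fit(X, y):
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[pivot] = xtx[pivot], xtx[col]
        xty[col], xty[pivot] = xty[pivot], xty[col]
        for r in range(col + 1, k):
            f = xtx[r][col] / xtx[col][col]
            for c in range(col, k):
                xtx[r][c] -= f * xtx[col][c]
            xty[r] -= f * xty[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (xty[r] - sum(xtx[r][c] * beta[c] for c in range(r + 1, k))) / xtx[r][r]
    return beta

# Synthetic monthly observations: [intercept, anger_score, delight_score].
X = [[1.0, a, d] for a, d in [(0.2, 0.8), (0.5, 0.6), (0.7, 0.4),
                              (0.9, 0.3), (0.3, 0.9), (0.6, 0.5)]]
# Market share generated as 10 - 3*anger + 2*delight (noise-free, so OLS recovers it).
y = [10 - 3 * a + 2 * d for _, a, d in X]
beta = ols_fit(X, y)
print([round(b, 3) for b in beta])  # → [10.0, -3.0, 2.0]
```

In practice a statistics package (e.g. `statsmodels.OLS`) would also report the significance levels the study relies on; the sketch only recovers the coefficients.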
Adaptive sentiment analysis
Domain dependency is one of the most challenging problems in the field of sentiment analysis. Although most sentiment analysis methods perform decently when targeted at a specific domain and writing style, they do not usually work well with texts that originate outside their domain boundaries. Often there is a need to perform sentiment analysis in a domain where no labelled document is available. To address this scenario, researchers have proposed many domain adaptation or unsupervised sentiment analysis methods. However, there is still much room for improvement, as those methods typically cannot match conventional supervised sentiment analysis methods. In this thesis, we propose a novel aspect-level sentiment analysis method that seamlessly integrates lexicon- and learning-based methods. While its performance is comparable to existing approaches, it is less sensitive to domain boundaries and can be applied to cross-domain sentiment analysis when the target domain is similar to the source domain. It also offers more structured and readable results by detecting individual topic aspects and determining their sentiment strengths. Furthermore, we investigate a novel approach to automatically constructing domain-specific sentiment lexicons based on distributed word representations (aka word embeddings). The induced lexicon has quality on a par with a handcrafted one and could be used directly in a lexicon-based algorithm for sentiment analysis, but we find that a two-stage bootstrapping strategy could further boost the sentiment classification performance. Compared to existing methods, such an end-to-end, nearly-unsupervised approach to domain-specific sentiment analysis works out of the box for any target domain, requires no handcrafted lexicon or labelled corpus, and achieves sentiment classification accuracy comparable to that of fully supervised approaches. Overall, the contribution of this Ph.D. work to the research field of sentiment analysis is twofold. First, we develop a new sentiment analysis system which can, in a nearly-unsupervised manner, adapt to the domain at hand and perform sentiment analysis with minimal loss of performance. Second, we showcase this system in several areas (including finance, politics, and e-business), and investigate particularly the temporal dynamics of sentiment in such contexts.
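The lexicon-induction idea sketched in this abstract, scoring candidate words by their embedding similarity to positive versus negative seed words, can be illustrated in a few lines. The 3-d vectors and word list below are invented for illustration; the thesis would use real trained word embeddings and a larger seed set.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy embeddings: first dimension loosely encodes polarity.
embeddings = {
    "good":      [0.9, 0.1, 0.0],
    "excellent": [0.8, 0.2, 0.1],
    "bad":       [-0.9, 0.1, 0.0],
    "terrible":  [-0.8, 0.0, 0.2],
    "reliable":  [0.7, 0.3, 0.0],   # candidate, leans positive
    "faulty":    [-0.7, 0.2, 0.1],  # candidate, leans negative
}
pos_seeds, neg_seeds = ["good", "excellent"], ["bad", "terrible"]

def polarity(word):
    """Mean similarity to positive seeds minus mean similarity to negative seeds."""
    v = embeddings[word]
    pos = sum(cosine(v, embeddings[s]) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(v, embeddings[s]) for s in neg_seeds) / len(neg_seeds)
    return pos - neg

lexicon = {w: polarity(w) for w in ["reliable", "faulty"]}
print(lexicon["reliable"] > 0, lexicon["faulty"] < 0)  # → True True
```

The two-stage bootstrapping mentioned in the abstract would then re-seed this procedure with the highest-scoring induced words, a refinement omitted here.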
A survey on deep learning in image polarity detection: Balancing generalization performances and computational costs
Deep convolutional neural networks (CNNs) provide an effective tool to extract complex information from images. In the area of image polarity detection, CNNs are customarily utilized in combination with transfer learning techniques to tackle a major problem: the unavailability of large sets of labeled data. Thus, polarity predictors in general exploit a pre-trained CNN as the feature extractor that in turn feeds a classification unit. While the latter unit is trained from scratch, the pre-trained CNN is subject to fine-tuning. As a result, the specific CNN architecture employed as the feature extractor strongly affects the overall performance of the model. This paper analyses state-of-the-art literature on image polarity detection and identifies the most reliable CNN architectures. Moreover, the paper provides an experimental protocol that should allow assessing the role played by the baseline architecture in the polarity detection task. Performance is evaluated in terms of both generalization abilities and computational complexity. The latter attribute becomes critical as polarity predictors, in the era of social networks, might need to be updated within hours or even minutes. In this regard, the paper gives practical hints on the advantages and disadvantages of the examined architectures both in terms of generalization and computational cost.
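The transfer-learning recipe described above, a frozen pre-trained feature extractor feeding a classification head trained from scratch, can be sketched with a stand-in extractor. In practice the extractor would be a real pre-trained CNN (e.g. a torchvision backbone with frozen weights); here it is a fixed function so the sketch stays self-contained, and all inputs are invented.

```python
import math

def frozen_extractor(pixel_sum, edge_count):
    # Stand-in for the pre-trained CNN: maps raw inputs to a feature
    # vector and is never updated during training.
    return [pixel_sum / 10.0, edge_count / 5.0]

def train_head(samples, labels, lr=0.5, epochs=200):
    """Train only the classification head (logistic regression) on extracted features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = w[0] * x[0] + w[1] * x[1] + b
            p = 1 / (1 + math.exp(-z))
            g = p - y  # gradient of the log-loss w.r.t. z
            w = [w[0] - lr * g * x[0], w[1] - lr * g * x[1]]
            b -= lr * g
    return w, b

raw = [(2, 1), (3, 1), (8, 4), (9, 5)]  # toy "images"
labels = [0, 0, 1, 1]                   # negative / positive polarity
feats = [frozen_extractor(*r) for r in raw]
w, b = train_head(feats, labels)

def predict(r):
    x = frozen_extractor(*r)
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

print([predict(r) for r in raw])  # → [0, 0, 1, 1]
```

The paper's point is that swapping the extractor (the backbone architecture) changes both accuracy and the cost of the periodic retraining of the head.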
Sentiment Analysis in the Era of Large Language Models: A Reality Check
Sentiment analysis (SA) has been a long-standing research area in natural
language processing. It can offer rich insights into human sentiments and
opinions and has thus seen considerable interest from both academia and
industry. With the advent of large language models (LLMs) such as ChatGPT,
there is a great potential for their employment on SA problems. However, the
extent to which existing LLMs can be leveraged for different sentiment analysis
tasks remains unclear. This paper aims to provide a comprehensive investigation
into the capabilities of LLMs in performing various sentiment analysis tasks,
from conventional sentiment classification to aspect-based sentiment analysis
and multifaceted analysis of subjective texts. We evaluate performance across
13 tasks on 26 datasets and compare the results against small language models
(SLMs) trained on domain-specific datasets. Our study reveals that while LLMs
demonstrate satisfactory performance in simpler tasks, they lag behind in more
complex tasks requiring deeper understanding or structured sentiment
information. However, LLMs significantly outperform SLMs in few-shot learning
settings, suggesting their potential when annotation resources are limited. We
also highlight the limitations of current evaluation practices in assessing
LLMs' SA abilities and propose a novel benchmark, \textsc{SentiEval}, for a
more comprehensive and realistic evaluation. Data and code during our
investigations are available at
\url{https://github.com/DAMO-NLP-SG/LLM-Sentiment}.
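The few-shot setting highlighted above amounts to assembling a prompt from a handful of labelled demonstrations plus the query. The template and example reviews below are illustrative only, not the paper's actual SentiEval prompts.

```python
def build_few_shot_prompt(demos, query):
    """demos: list of (text, label) pairs; query: the text to classify."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in demos:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The LLM is expected to complete the final "Sentiment:" line.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

demos = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]
prompt = build_few_shot_prompt(demos, "Setup was quick and painless.")
print(prompt)
```

With annotation budgets this small, a fine-tuned SLM has almost nothing to train on, which is where the paper finds LLMs pulling ahead.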
Using Tsetlin Machine to discover interpretable rules in natural language processing applications
Navigating the Environmental, Social, and Governance (ESG) landscape: constructing a robust and reliable scoring engine - insights into Data Source Selection, Indicator Determination, Weighting and Aggregation Techniques, and Validation Processes for Comprehensive ESG Scoring Systems
This white paper explores the construction of a reliable Environmental, Social, and Governance (ESG) scoring engine, with a focus on the importance of data sources and quality, selection of ESG indicators, weighting and aggregation methodologies, and the necessary validation and benchmarking procedures. The current challenges in ESG scoring and the importance of a robust ESG scoring system are addressed, citing its increasing relevance to stakeholders. Furthermore, different data types, namely self-reported data, third-party data, and alternative data, are critically evaluated for their respective merits and limitations. The paper further elucidates the complexities and implications involved in the choice of ESG indicators, illustrating the trade-offs between standardized and customized approaches. Various weighting methodologies including equal weighting, factor weighting, and multi-criteria decision analysis are dissected. The paper culminates in outlining processes for validating the ESG scoring engine, emphasizing the correlation with financial performance, and conducting robustness and sensitivity analyses. Practical examples through case studies exemplify the implementation of the discussed techniques. The white paper aims to provide insights and guidelines for practitioners, academics, and policy makers in designing and implementing robust ESG scoring systems.
This ESG white paper explores the interplay between Environmental, Social, and Governance (ESG) factors and green finance. We begin by defining ESG and green finance, exploring their evolution, and discussing their importance in financial markets. The paper emphasises the role of green finance in driving sustainable development. Next, we delve into the ESG scoring landscape. We outline various methodologies, key players in ESG ratings, and present challenges and criticisms of current ESG scoring systems. In the third section, we propose a blueprint for a reliable ESG scoring engine. This includes discussion on various data sources and the selection of ESG indicators, highlighting the role of materiality assessment, and the balance between standardized and customized indicators. We then discuss different methodologies for weighting and aggregating these indicators. The paper concludes with the necessity of validation and benchmarking of ESG scores, particularly correlating them with financial performance and performing robustness and sensitivity analyses
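The weighting-and-aggregation step discussed in both papers above can be sketched as: normalise each indicator across companies, then combine with equal or custom weights. Company names, indicator names, and values below are invented, and for simplicity every indicator is treated as higher-is-better (a real engine would invert, e.g., emissions).

```python
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def esg_scores(companies, weights):
    """companies: {name: {indicator: raw value}}; weights: {indicator: weight}."""
    indicators = list(weights)
    norm = {}
    # Normalise each indicator column across all companies.
    for ind in indicators:
        col = min_max([companies[c][ind] for c in companies])
        for c, v in zip(companies, col):
            norm.setdefault(c, {})[ind] = v
    total = sum(weights.values())
    return {c: sum(weights[i] * norm[c][i] for i in indicators) / total
            for c in companies}

companies = {
    "A": {"emissions": 30, "board_diversity": 0.4, "audit_quality": 7},
    "B": {"emissions": 80, "board_diversity": 0.2, "audit_quality": 9},
    "C": {"emissions": 50, "board_diversity": 0.5, "audit_quality": 6},
}
equal = {"emissions": 1, "board_diversity": 1, "audit_quality": 1}
scores = esg_scores(companies, equal)
```

Swapping `equal` for materiality-based weights is the "factor weighting" choice the papers discuss; the aggregation machinery stays the same.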
Real-Time Online Stock Forecasting Utilizing Integrated Quantitative and Qualitative Analysis
The application of Machine learning to finance has become a familiar
approach, even more so in stock market forecasting. The stock market is highly
volatile, and huge amounts of data are generated every minute globally. The
extraction of effective intelligence from this data is of critical importance.
However, a collaboration of numerical stock data with qualitative text data can
be a challenging task. In this work, we accomplish this by providing an
unprecedented, publicly available dataset with technical and fundamental data
and sentiment that we gathered from news archives, TV news captions, radio
transcripts, tweets, daily financial newspapers, etc. The text data entries
used for sentiment extraction total more than 1.4 Million. The dataset consists
of daily entries from January 2018 to December 2022 for eight companies
representing diverse industrial sectors and the Dow Jones Industrial Average
(DJIA) as a whole. Holistic Fundamental and Technical data is provided training
ready for Model learning and deployment. Most importantly, the data generated
could be used for incremental online learning with real-time data points
retrieved daily since no stagnant data was utilized. All the data was retrieved
from APIs or self-designed robust information retrieval technologies with
extremely low latency and zero monetary cost. These adaptable technologies
facilitate data extraction for any stock. Moreover, the utilization of
Spearman's rank correlation over real-time data, linking stock returns with
sentiment analysis has produced noteworthy results for the DJIA and the eight
other stocks, achieving accuracy levels surpassing 60%. The dataset is made
available at https://github.com/batking24/Huge-Stock-Dataset
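The Spearman rank correlation used above to link daily sentiment with stock returns is straightforward to compute: rank both series (averaging tied ranks), then take the Pearson correlation of the ranks. The two series below are synthetic; a real analysis would feed in the dataset's daily sentiment and return columns (`scipy.stats.spearmanr` performs the same computation).

```python
def rank(values):
    """Average 1-based ranks, with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

sentiment = [0.1, 0.4, 0.2, 0.8, 0.6]
returns   = [-0.5, 0.3, -0.1, 1.2, 0.7]  # same rank ordering as sentiment
print(round(spearman(sentiment, returns), 3))  # → 1.0
```

Because Spearman depends only on rank ordering, it tolerates the heavy-tailed, non-linear relationship typical of returns-versus-sentiment data better than Pearson correlation would.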
Coping with low data availability for social media crisis message categorisation
During crisis situations, social media allows people to quickly share
information, including messages requesting help. This can be valuable to
emergency responders, who need to categorise and prioritise these messages
based on the type of assistance being requested. However, the high volume of
messages makes it difficult to filter and prioritise them without the use of
computational techniques. Fully supervised filtering techniques for crisis
message categorisation typically require a large amount of annotated training
data, but this can be difficult to obtain during an ongoing crisis and is
expensive in terms of time and labour to create.
This thesis focuses on addressing the challenge of low data availability when
categorising crisis messages for emergency response. It first presents domain
adaptation as a solution for this problem, which involves learning a
categorisation model from annotated data from past crisis events (source
domain) and adapting it to categorise messages from an ongoing crisis event
(target domain). In many-to-many adaptation, where the model is trained on
multiple past events and adapted to multiple ongoing events, a multi-task
learning approach is proposed using pre-trained language models. This approach
outperforms baselines, and an ensemble approach further improves performance.
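The many-to-many multi-task idea above can be caricatured as one shared weight vector learned across all crisis events plus a small task-specific vector per event, so that knowledge transfers between events. A real system would put per-event heads on a pre-trained language model; the two events and two-feature messages below are invented stand-ins.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_multitask(tasks, dim, lr=0.5, epochs=300):
    """tasks: {event_name: [(feature_vector, label), ...]}."""
    shared = [0.0] * dim                      # updated by every event
    heads = {t: [0.0] * dim for t in tasks}   # updated only by its own event
    for _ in range(epochs):
        for t, data in tasks.items():
            for x, y in data:
                w = [s + h for s, h in zip(shared, heads[t])]
                p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
                g = p - y
                shared = [s - lr * g * xi for s, xi in zip(shared, x)]
                heads[t] = [h - lr * g * xi for h, xi in zip(heads[t], x)]
    return shared, heads

# Features: [mentions_help, mentions_damage]; label 1 = request-for-help.
tasks = {
    "event_flood":      [([1, 0], 1), ([0, 1], 0)],
    "event_earthquake": [([1, 1], 1), ([0, 0], 0)],
}
shared, heads = train_multitask(tasks, dim=2)

def predict(event, x):
    w = [s + h for s, h in zip(shared, heads[event])]
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5 else 0

print(predict("event_flood", [1, 0]))  # → 1
```

The shared component is what lets annotation from past events help an ongoing one; the per-event head absorbs event-specific quirks.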
Clinical Document Classification Using Labeled and Unlabeled Data Across Hospitals
Reviewing radiology reports in emergency departments is an essential but
laborious task. Timely follow-up of patients with abnormal cases in their
radiology reports may dramatically affect the patient's outcome, especially if
they have been discharged with a different initial diagnosis. Machine learning
approaches have been devised to expedite the process and detect the cases that
demand instant follow up. However, these approaches require a large amount of
labeled data to train reliable predictive models. Preparing such a large
dataset, which needs to be manually annotated by health professionals, is
costly and time-consuming. This paper investigates a semi-supervised learning
framework for radiology report classification across three hospitals. The main
goal is to leverage clinical unlabeled data in order to augment the learning
process where limited labeled data is available. To further improve the
classification performance, we also integrate a transfer learning technique
into the semi-supervised learning pipeline. Our experimental findings show
that (1) convolutional neural networks (CNNs), while being independent of any
problem-specific feature engineering, achieve significantly higher
effectiveness compared to conventional supervised learning approaches, (2)
leveraging unlabeled data in training a CNN-based classifier reduces the
dependency on labeled data by more than 50% to reach the same performance of a
fully supervised CNN, and (3) transferring the knowledge gained from available
labeled data in an external source hospital significantly improves the
performance of a semi-supervised CNN model over their fully supervised
counterparts in a target hospital.
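The semi-supervised loop described above, train on the small labelled set, pseudo-label the unlabelled pool where the model is confident, retrain on the enlarged set, can be sketched with a nearest-centroid classifier standing in for the paper's CNN. All "report feature" points and the confidence margin are invented for illustration.

```python
def centroids(points, labels):
    out = {}
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        out[lab] = tuple(sum(c) / len(members) for c in zip(*members))
    return out

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def self_train(labeled, labels, unlabeled, margin=0.5):
    cents = centroids(labeled, labels)
    pts, labs = list(labeled), list(labels)
    for p in unlabeled:
        d = sorted((dist(p, c), lab) for lab, c in cents.items())
        # Pseudo-label only when the nearest class wins by a clear margin.
        if d[1][0] - d[0][0] > margin:
            pts.append(p)
            labs.append(d[0][1])
    return centroids(pts, labs)  # "retrain" on labelled + pseudo-labelled data

labeled = [(0.0, 0.0), (4.0, 4.0)]
labels = ["normal", "follow-up"]
unlabeled = [(0.5, 0.2), (3.8, 4.1), (2.0, 2.0)]  # the last point is ambiguous and skipped
cents = self_train(labeled, labels, unlabeled)

def predict(p):
    return min(cents, key=lambda lab: dist(p, cents[lab]))

print(predict((3.5, 3.5)))  # → follow-up
```

The transfer-learning step in the paper would additionally initialise the model from a source hospital before this loop runs on the target hospital's data.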
- โฆ