1,507 research outputs found
The Business Impact of Social Media - Sentiment Analysis Approach -
The purpose of this study is to verify the reliability of seven sentiment domains extracted from social media as data for sentiment analysis experiments on predicting automotive market share, and to examine how customer opinions affect corporate performance. The study was conducted in three phases. The first phase, constructing a sentiment lexicon, crawled a total of 45,447 Voice of the Customer (VOC) posts about 26 automobile manufacturers in the United States, posted between January 1, 2013 and December 31, 2015, from automotive communities; after part-of-speech (POS) tagging, the frequencies of negative and positive sentiment words were measured to build a sentiment lexicon, and their polarity was measured to form seven sentiment domains. The second phase, a reliability analysis of the data, verified the data's suitability for the experiment through auto-correlation analysis and principal component analysis (PCA). In the third phase, two linear regression models were used to test how the seven sentiment domains affect corporate performance, i.e., automotive market share, for four automobile manufacturers in the US market: GM, Ford, FCA, and Volkswagen. As a result, we extracted 4,815 negative and 2,021 positive sentiment words to build the sentiment lexicon; based on this lexicon, the extracted and classified negative and positive words were combined with automotive-industry terms, and the characteristics of the sentiments were examined through auto-correlation analysis and PCA. The auto-correlation analysis revealed a consistent pattern in the sentiment data: the sentiment in each domain was auto-correlated, and time-series behavior of sentiment was also observed. The PCA showed that the seven sentiment domains load onto negative, positive, and neutral principal components. Building on the reliability of the VOC sentiment data established by these analyses, two linear regression models were constructed and tested. The first model took the negative sentiments Sadness, Anger, and Fear and the positive sentiment domains Delight and Satisfaction, as identified by the PCA, as independent variables, with market share as the dependent variable. The second model added Shame and Frustration, which the PCA had assigned to the neutral component, as independent variables, to examine whether sentiments of higher neutrality significantly affect market share. The analysis found that, for each company, certain sentiments have a significant effect on market share, and that the influence of the sentiments differs between Model 1 and Model 2. This study confirms that sentiment, as latent information in data, can herald changes in the automotive market based on its past values. Moreover, if the auto-correlation of sentiment and automotive market information can be exploited when applying market data, this work can contribute substantially not only to sentiment analysis research but also, in various ways, to business performance in real markets.
List of Tables iv
List of Figures v
Abstract 1
1. Introduction
1.1 Background 3
1.2 Necessity of Study 6
1.3 Purpose & Questions 8
1.4 Structure 9
2. Literature Reviews of VOC Analysis
2.1 Importance of VOC 11
2.2 Data Mining 15
2.2.1 Concept & Functionalities 15
2.2.2 Methodologies of Data mining 20
2.3 Text Mining 24
2.4 Sentiment Analysis 26
2.5 Research Trend in Korea 30
3. Methodology
3.1 Research Flow 32
3.2 Proposed Methodologies 34
3.2.1 Sentiment Analysis 34
3.2.2 Auto-correlation Analysis 37
3.2.3 Principal Component Analysis (PCA) 38
3.2.4 Linear Regression 40
4. Experiment & Analysis
4.1 Phase I: Constructing Sentiment Lexicon & 7 Sentiment Domains 43
4.1.1 The Subject of Analysis & Crawling Data 43
4.1.2 Extracting POS Information 44
4.1.3 Review Extracting POS Information 46
4.2 Phase II: Reliability Analysis 49
4.2.1 Auto-correlation Analysis of Sentiment 51
4.2.2 Principal Component Analysis of Sentiment 55
4.3 Phase III: Influence on Automotive Market Share 58
4.3.1 Linear Regression Model 58
4.3.2 Definition of Variables 60
4.3.3 The Result of Linear Regression Analysis 62
5. Conclusion
5.1 Summary of Study 73
5.2 Managerial Implication and Limitation 75
5.3 Future Study 77
References 79
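The regression setup described in the abstract above can be sketched in a few lines: sentiment-domain scores act as independent variables and market share as the dependent variable. This is a minimal ordinary-least-squares sketch on synthetic data; the variable names (anger, delight) echo the thesis's sentiment domains, but the numbers are invented for illustration.

```python
# Minimal OLS fit via the normal equations (X'X)b = X'y, pure stdlib.
def ols_fit(X, y):
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[pivot] = xtx[pivot], xtx[col]
        xty[col], xty[pivot] = xty[pivot], xty[col]
        for r in range(col + 1, k):
            f = xtx[r][col] / xtx[col][col]
            for c in range(col, k):
                xtx[r][c] -= f * xtx[col][c]
            xty[r] -= f * xty[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (xty[r] - sum(xtx[r][c] * beta[c] for c in range(r + 1, k))) / xtx[r][r]
    return beta

# Synthetic monthly observations: [intercept, anger_score, delight_score].
X = [[1.0, a, d] for a, d in [(0.2, 0.8), (0.5, 0.6), (0.7, 0.4),
                              (0.9, 0.3), (0.3, 0.9), (0.6, 0.5)]]
# Market share generated as 10 - 3*anger + 2*delight (noise-free, so OLS recovers it).
y = [10 - 3 * a + 2 * d for _, a, d in X]
beta = ols_fit(X, y)
print([round(b, 3) for b in beta])  # → [10.0, -3.0, 2.0]
```

In practice a statistics package (e.g. `statsmodels.OLS`) would also report the significance levels the study relies on; the sketch only recovers the coefficients.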
Adaptive sentiment analysis
Domain dependency is one of the most challenging problems in the field of sentiment analysis. Although most sentiment analysis methods perform decently when targeted at a specific domain and writing style, they do not usually work well with texts that originate outside their domain boundaries. Often there is a need to perform sentiment analysis in a domain where no labelled document is available. To address this scenario, researchers have proposed many domain adaptation or unsupervised sentiment analysis methods. However, there is still much room for improvement, as those methods typically cannot match conventional supervised sentiment analysis methods. In this thesis, we propose a novel aspect-level sentiment analysis method that seamlessly integrates lexicon- and learning-based methods. While its performance is comparable to existing approaches, it is less sensitive to domain boundaries and can be applied to cross-domain sentiment analysis when the target domain is similar to the source domain. It also offers more structured and readable results by detecting individual topic aspects and determining their sentiment strengths. Furthermore, we investigate a novel approach to automatically constructing domain-specific sentiment lexicons based on distributed word representations (aka word embeddings). The induced lexicon has quality on a par with a handcrafted one and could be used directly in a lexicon-based algorithm for sentiment analysis, but we find that a two-stage bootstrapping strategy could further boost the sentiment classification performance. Compared to existing methods, such an end-to-end, nearly-unsupervised approach to domain-specific sentiment analysis works out of the box for any target domain, requires no handcrafted lexicon or labelled corpus, and achieves sentiment classification accuracy comparable to that of fully supervised approaches. Overall, the contribution of this Ph.D. work to the research field of sentiment analysis is twofold. First, we develop a new sentiment analysis system which can, in a nearly-unsupervised manner, adapt to the domain at hand and perform sentiment analysis with minimal loss of performance. Second, we showcase this system in several areas (including finance, politics, and e-business), and investigate particularly the temporal dynamics of sentiment in such contexts.
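The lexicon-induction idea sketched in this abstract, scoring candidate words by their embedding similarity to positive versus negative seed words, can be illustrated in a few lines. The 3-d vectors and word list below are invented for illustration; the thesis would use real trained word embeddings and a larger seed set.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy embeddings: first dimension loosely encodes polarity.
embeddings = {
    "good":      [0.9, 0.1, 0.0],
    "excellent": [0.8, 0.2, 0.1],
    "bad":       [-0.9, 0.1, 0.0],
    "terrible":  [-0.8, 0.0, 0.2],
    "reliable":  [0.7, 0.3, 0.0],   # candidate, leans positive
    "faulty":    [-0.7, 0.2, 0.1],  # candidate, leans negative
}
pos_seeds, neg_seeds = ["good", "excellent"], ["bad", "terrible"]

def polarity(word):
    """Mean similarity to positive seeds minus mean similarity to negative seeds."""
    v = embeddings[word]
    pos = sum(cosine(v, embeddings[s]) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(v, embeddings[s]) for s in neg_seeds) / len(neg_seeds)
    return pos - neg

lexicon = {w: polarity(w) for w in ["reliable", "faulty"]}
print(lexicon["reliable"] > 0, lexicon["faulty"] < 0)  # → True True
```

The two-stage bootstrapping mentioned in the abstract would then re-seed this procedure with the highest-scoring induced words, a refinement omitted here.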
A survey on deep learning in image polarity detection: Balancing generalization performances and computational costs
Deep convolutional neural networks (CNNs) provide an effective tool to extract complex information from images. In the area of image polarity detection, CNNs are customarily utilized in combination with transfer learning techniques to tackle a major problem: the unavailability of large sets of labeled data. Thus, polarity predictors in general exploit a pre-trained CNN as the feature extractor that in turn feeds a classification unit. While the latter unit is trained from scratch, the pre-trained CNN is subject to fine-tuning. As a result, the specific CNN architecture employed as the feature extractor strongly affects the overall performance of the model. This paper analyses state-of-the-art literature on image polarity detection and identifies the most reliable CNN architectures. Moreover, the paper provides an experimental protocol that should allow assessing the role played by the baseline architecture in the polarity detection task. Performance is evaluated in terms of both generalization abilities and computational complexity. The latter attribute becomes critical as polarity predictors, in the era of social networks, might need to be updated within hours or even minutes. In this regard, the paper gives practical hints on the advantages and disadvantages of the examined architectures both in terms of generalization and computational cost.
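The transfer-learning recipe described above, a frozen pre-trained feature extractor feeding a classification head trained from scratch, can be sketched with a stand-in extractor. In practice the extractor would be a real pre-trained CNN (e.g. a torchvision backbone with frozen weights); here it is a fixed function so the sketch stays self-contained, and all inputs are invented.

```python
import math

def frozen_extractor(pixel_sum, edge_count):
    # Stand-in for the pre-trained CNN: maps raw inputs to a feature
    # vector and is never updated during training.
    return [pixel_sum / 10.0, edge_count / 5.0]

def train_head(samples, labels, lr=0.5, epochs=200):
    """Train only the classification head (logistic regression) on extracted features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = w[0] * x[0] + w[1] * x[1] + b
            p = 1 / (1 + math.exp(-z))
            g = p - y  # gradient of the log-loss w.r.t. z
            w = [w[0] - lr * g * x[0], w[1] - lr * g * x[1]]
            b -= lr * g
    return w, b

raw = [(2, 1), (3, 1), (8, 4), (9, 5)]  # toy "images"
labels = [0, 0, 1, 1]                   # negative / positive polarity
feats = [frozen_extractor(*r) for r in raw]
w, b = train_head(feats, labels)

def predict(r):
    x = frozen_extractor(*r)
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

print([predict(r) for r in raw])  # → [0, 0, 1, 1]
```

The paper's point is that swapping the extractor (the backbone architecture) changes both accuracy and the cost of the periodic retraining of the head.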
Sentiment Analysis in the Era of Large Language Models: A Reality Check
Sentiment analysis (SA) has been a long-standing research area in natural
language processing. It can offer rich insights into human sentiments and
opinions and has thus seen considerable interest from both academia and
industry. With the advent of large language models (LLMs) such as ChatGPT,
there is a great potential for their employment on SA problems. However, the
extent to which existing LLMs can be leveraged for different sentiment analysis
tasks remains unclear. This paper aims to provide a comprehensive investigation
into the capabilities of LLMs in performing various sentiment analysis tasks,
from conventional sentiment classification to aspect-based sentiment analysis
and multifaceted analysis of subjective texts. We evaluate performance across
13 tasks on 26 datasets and compare the results against small language models
(SLMs) trained on domain-specific datasets. Our study reveals that while LLMs
demonstrate satisfactory performance in simpler tasks, they lag behind in more
complex tasks requiring deeper understanding or structured sentiment
information. However, LLMs significantly outperform SLMs in few-shot learning
settings, suggesting their potential when annotation resources are limited. We
also highlight the limitations of current evaluation practices in assessing
LLMs' SA abilities and propose a novel benchmark, \textsc{SentiEval}, for a
more comprehensive and realistic evaluation. Data and code during our
investigations are available at
\url{https://github.com/DAMO-NLP-SG/LLM-Sentiment}.
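The few-shot setting highlighted above amounts to assembling a prompt from a handful of labelled demonstrations plus the query. The template and example reviews below are illustrative only, not the paper's actual SentiEval prompts.

```python
def build_few_shot_prompt(demos, query):
    """demos: list of (text, label) pairs; query: the text to classify."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in demos:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The LLM is expected to complete the final "Sentiment:" line.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

demos = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]
prompt = build_few_shot_prompt(demos, "Setup was quick and painless.")
print(prompt)
```

With annotation budgets this small, a fine-tuned SLM has almost nothing to train on, which is where the paper finds LLMs pulling ahead.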
Using Tsetlin Machine to discover interpretable rules in natural language processing applications
Navigating the Environmental, Social, and Governance (ESG) landscape: constructing a robust and reliable scoring engine - insights into Data Source Selection, Indicator Determination, Weighting and Aggregation Techniques, and Validation Processes for Comprehensive ESG Scoring Systems
This white paper explores the construction of a reliable Environmental, Social, and Governance (ESG) scoring engine, with a focus on the importance of data sources and quality, selection of ESG indicators, weighting and aggregation methodologies, and the necessary validation and benchmarking procedures. The current challenges in ESG scoring and the importance of a robust ESG scoring system are addressed, citing its increasing relevance to stakeholders. Furthermore, different data types, namely self-reported data, third-party data, and alternative data, are critically evaluated for their respective merits and limitations. The paper further elucidates the complexities and implications involved in the choice of ESG indicators, illustrating the trade-offs between standardized and customized approaches. Various weighting methodologies including equal weighting, factor weighting, and multi-criteria decision analysis are dissected. The paper culminates in outlining processes for validating the ESG scoring engine, emphasizing the correlation with financial performance, and conducting robustness and sensitivity analyses. Practical examples through case studies exemplify the implementation of the discussed techniques. The white paper aims to provide insights and guidelines for practitioners, academics, and policy makers in designing and implementing robust ESG scoring systems.
This ESG white paper explores the interplay between Environmental, Social, and Governance (ESG) factors and green finance. We begin by defining ESG and green finance, exploring their evolution, and discussing their importance in financial markets. The paper emphasises the role of green finance in driving sustainable development. Next, we delve into the ESG scoring landscape. We outline various methodologies, key players in ESG ratings, and present challenges and criticisms of current ESG scoring systems. In the third section, we propose a blueprint for a reliable ESG scoring engine. This includes discussion on various data sources and the selection of ESG indicators, highlighting the role of materiality assessment, and the balance between standardized and customized indicators. We then discuss different methodologies for weighting and aggregating these indicators. The paper concludes with the necessity of validation and benchmarking of ESG scores, particularly correlating them with financial performance and performing robustness and sensitivity analyses
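The weighting-and-aggregation step discussed in both papers above can be sketched as: normalise each indicator across companies, then combine with equal or custom weights. Company names, indicator names, and values below are invented, and for simplicity every indicator is treated as higher-is-better (a real engine would invert, e.g., emissions).

```python
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def esg_scores(companies, weights):
    """companies: {name: {indicator: raw value}}; weights: {indicator: weight}."""
    indicators = list(weights)
    norm = {}
    # Normalise each indicator column across all companies.
    for ind in indicators:
        col = min_max([companies[c][ind] for c in companies])
        for c, v in zip(companies, col):
            norm.setdefault(c, {})[ind] = v
    total = sum(weights.values())
    return {c: sum(weights[i] * norm[c][i] for i in indicators) / total
            for c in companies}

companies = {
    "A": {"emissions": 30, "board_diversity": 0.4, "audit_quality": 7},
    "B": {"emissions": 80, "board_diversity": 0.2, "audit_quality": 9},
    "C": {"emissions": 50, "board_diversity": 0.5, "audit_quality": 6},
}
equal = {"emissions": 1, "board_diversity": 1, "audit_quality": 1}
scores = esg_scores(companies, equal)
```

Swapping `equal` for materiality-based weights is the "factor weighting" choice the papers discuss; the aggregation machinery stays the same.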
Real-Time Online Stock Forecasting Utilizing Integrated Quantitative and Qualitative Analysis
The application of Machine learning to finance has become a familiar
approach, even more so in stock market forecasting. The stock market is highly
volatile, and huge amounts of data are generated every minute globally. The
extraction of effective intelligence from this data is of critical importance.
However, a collaboration of numerical stock data with qualitative text data can
be a challenging task. In this work, we accomplish this by providing an
unprecedented, publicly available dataset with technical and fundamental data
and sentiment that we gathered from news archives, TV news captions, radio
transcripts, tweets, daily financial newspapers, etc. The text data entries
used for sentiment extraction total more than 1.4 Million. The dataset consists
of daily entries from January 2018 to December 2022 for eight companies
representing diverse industrial sectors and the Dow Jones Industrial Average
(DJIA) as a whole. Holistic Fundamental and Technical data is provided training
ready for Model learning and deployment. Most importantly, the data generated
could be used for incremental online learning with real-time data points
retrieved daily since no stagnant data was utilized. All the data was retrieved
from APIs or self-designed robust information retrieval technologies with
extremely low latency and zero monetary cost. These adaptable technologies
facilitate data extraction for any stock. Moreover, the utilization of
Spearman's rank correlation over real-time data, linking stock returns with
sentiment analysis has produced noteworthy results for the DJIA and the eight
other stocks, achieving accuracy levels surpassing 60%. The dataset is made
available at https://github.com/batking24/Huge-Stock-Dataset
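The Spearman rank correlation used above to link daily sentiment with stock returns is straightforward to compute: rank both series (averaging tied ranks), then take the Pearson correlation of the ranks. The two series below are synthetic; a real analysis would feed in the dataset's daily sentiment and return columns (`scipy.stats.spearmanr` performs the same computation).

```python
def rank(values):
    """Average 1-based ranks, with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

sentiment = [0.1, 0.4, 0.2, 0.8, 0.6]
returns   = [-0.5, 0.3, -0.1, 1.2, 0.7]  # same rank ordering as sentiment
print(round(spearman(sentiment, returns), 3))  # → 1.0
```

Because Spearman depends only on rank ordering, it tolerates the heavy-tailed, non-linear relationship typical of returns-versus-sentiment data better than Pearson correlation would.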
Coping with low data availability for social media crisis message categorisation
During crisis situations, social media allows people to quickly share
information, including messages requesting help. This can be valuable to
emergency responders, who need to categorise and prioritise these messages
based on the type of assistance being requested. However, the high volume of
messages makes it difficult to filter and prioritise them without the use of
computational techniques. Fully supervised filtering techniques for crisis
message categorisation typically require a large amount of annotated training
data, but this can be difficult to obtain during an ongoing crisis and is
expensive in terms of time and labour to create.
This thesis focuses on addressing the challenge of low data availability when
categorising crisis messages for emergency response. It first presents domain
adaptation as a solution for this problem, which involves learning a
categorisation model from annotated data from past crisis events (source
domain) and adapting it to categorise messages from an ongoing crisis event
(target domain). In many-to-many adaptation, where the model is trained on
multiple past events and adapted to multiple ongoing events, a multi-task
learning approach is proposed using pre-trained language models. This approach
outperforms baselines, and an ensemble approach further improves performance.
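The many-to-many multi-task idea above can be caricatured as one shared weight vector learned across all crisis events plus a small task-specific vector per event, so that knowledge transfers between events. A real system would put per-event heads on a pre-trained language model; the two events and two-feature messages below are invented stand-ins.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_multitask(tasks, dim, lr=0.5, epochs=300):
    """tasks: {event_name: [(feature_vector, label), ...]}."""
    shared = [0.0] * dim                      # updated by every event
    heads = {t: [0.0] * dim for t in tasks}   # updated only by its own event
    for _ in range(epochs):
        for t, data in tasks.items():
            for x, y in data:
                w = [s + h for s, h in zip(shared, heads[t])]
                p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
                g = p - y
                shared = [s - lr * g * xi for s, xi in zip(shared, x)]
                heads[t] = [h - lr * g * xi for h, xi in zip(heads[t], x)]
    return shared, heads

# Features: [mentions_help, mentions_damage]; label 1 = request-for-help.
tasks = {
    "event_flood":      [([1, 0], 1), ([0, 1], 0)],
    "event_earthquake": [([1, 1], 1), ([0, 0], 0)],
}
shared, heads = train_multitask(tasks, dim=2)

def predict(event, x):
    w = [s + h for s, h in zip(shared, heads[event])]
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5 else 0

print(predict("event_flood", [1, 0]))  # → 1
```

The shared component is what lets annotation from past events help an ongoing one; the per-event head absorbs event-specific quirks.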
Clinical Document Classification Using Labeled and Unlabeled Data Across Hospitals
Reviewing radiology reports in emergency departments is an essential but
laborious task. Timely follow-up of patients with abnormal cases in their
radiology reports may dramatically affect the patient's outcome, especially if
they have been discharged with a different initial diagnosis. Machine learning
approaches have been devised to expedite the process and detect the cases that
demand instant follow up. However, these approaches require a large amount of
labeled data to train reliable predictive models. Preparing such a large
dataset, which needs to be manually annotated by health professionals, is
costly and time-consuming. This paper investigates a semi-supervised learning
framework for radiology report classification across three hospitals. The main
goal is to leverage clinical unlabeled data in order to augment the learning
process where limited labeled data is available. To further improve the
classification performance, we also integrate a transfer learning technique
into the semi-supervised learning pipeline. Our experimental findings show
that (1) convolutional neural networks (CNNs), while being independent of any
problem-specific feature engineering, achieve significantly higher
effectiveness compared to conventional supervised learning approaches, (2)
leveraging unlabeled data in training a CNN-based classifier reduces the
dependency on labeled data by more than 50% to reach the same performance of a
fully supervised CNN, and (3) transferring the knowledge gained from available
labeled data in an external source hospital significantly improves the
performance of a semi-supervised CNN model over their fully supervised
counterparts in a target hospital.
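The semi-supervised loop described above, train on the small labelled set, pseudo-label the unlabelled pool where the model is confident, retrain on the enlarged set, can be sketched with a nearest-centroid classifier standing in for the paper's CNN. All "report feature" points and the confidence margin are invented for illustration.

```python
def centroids(points, labels):
    out = {}
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        out[lab] = tuple(sum(c) / len(members) for c in zip(*members))
    return out

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def self_train(labeled, labels, unlabeled, margin=0.5):
    cents = centroids(labeled, labels)
    pts, labs = list(labeled), list(labels)
    for p in unlabeled:
        d = sorted((dist(p, c), lab) for lab, c in cents.items())
        # Pseudo-label only when the nearest class wins by a clear margin.
        if d[1][0] - d[0][0] > margin:
            pts.append(p)
            labs.append(d[0][1])
    return centroids(pts, labs)  # "retrain" on labelled + pseudo-labelled data

labeled = [(0.0, 0.0), (4.0, 4.0)]
labels = ["normal", "follow-up"]
unlabeled = [(0.5, 0.2), (3.8, 4.1), (2.0, 2.0)]  # the last point is ambiguous and skipped
cents = self_train(labeled, labels, unlabeled)

def predict(p):
    return min(cents, key=lambda lab: dist(p, cents[lab]))

print(predict((3.5, 3.5)))  # → follow-up
```

The transfer-learning step in the paper would additionally initialise the model from a source hospital before this loop runs on the target hospital's data.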
- โฆ