
    The Business Impact of Social Media - Sentiment Analysis Approach -

    The purpose of this study is to verify whether seven sentiment domains extracted from social media are reliable enough to serve as data for sentiment analysis experiments on predicting automotive market share, and to examine how customer opinions affect corporate performance. The study was conducted in three phases. The first phase built a sentiment lexicon. A total of 45,447 Voice of the Customer (VOC) posts about 26 automobile manufacturers in the United States, dated from January 1, 2013 to December 31, 2015, were crawled from automotive communities and passed through a tagging process that extracts POS (part-of-speech) information. The frequencies of negative and positive sentiment words were then measured to build the lexicon, and their polarity was measured to define seven sentiment domains. The second phase was a reliability analysis, which used auto-correlation analysis and principal component analysis (PCA) to verify whether the data were suitable for the experiment. In the third phase, two linear regression models were used to test how the seven sentiment domains affect the performance, namely the automotive market share, of four manufacturers selected from those operating in the U.S.: GM, Ford, FCA, and Volkswagen. As a result, we extracted 4,815 negative and 2,021 positive sentiment words to construct the sentiment lexicon. Based on this lexicon, the extracted and classified negative and positive words were combined with words related to the automotive industry, and the characteristics of the sentiments were examined through auto-correlation analysis and PCA.
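Phase II tests the sentiment data for autocorrelation. As a minimal sketch, assuming each sentiment domain has been aggregated into a regularly spaced time series (the abstract does not state the aggregation interval), the sample autocorrelation at lag k can be computed as below. The helper names, the weekly AR(1) series, and the ±1.96/√n white-noise band are illustrative assumptions, not the thesis's procedure.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (r_0 is always 1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    d = x - x.mean()
    denom = d @ d  # total sum of squares; zero (undefined ACF) for a constant series
    return np.array([(d[: n - k] @ d[k:]) / denom for k in range(max_lag + 1)])


def significant_lags(acf, n, z=1.96):
    """Lags k >= 1 whose |r_k| exceeds the approximate 95% white-noise band z / sqrt(n)."""
    bound = z / np.sqrt(n)
    return [k for k in range(1, len(acf)) if abs(acf[k]) > bound]


# Hypothetical weekly scores for one sentiment domain (156 weeks, about 3 years),
# simulated as an AR(1) process; this is NOT the thesis data.
rng = np.random.default_rng(0)
series = np.zeros(156)
for t in range(1, len(series)):
    series[t] = 0.7 * series[t - 1] + rng.normal()

acf = sample_acf(series, max_lag=10)
print(np.round(acf[:4], 2))
print(significant_lags(acf, len(series)))
```

Lags that fall outside the band suggest the series carries information about its own past values, which is the property the thesis relies on before regression.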
The experimental results show that the auto-correlation analysis revealed a consistent pattern in the sentiment data: the sentiment in each domain is autocorrelated, and the sentiments also exhibit time-series behavior. The PCA results confirmed that the seven sentiment domains are linked through negativity, positivity, and neutrality as principal components. Having established the reliability of the VOC sentiment data through auto-correlation analysis and PCA, we built two linear regression models. The first model used the negative sentiment domains Sadness, Anger, and Fear and the positive sentiment domains Delight and Satisfaction, as identified by the PCA, as independent variables, with market share as the dependent variable. The second model added Shame and Frustration, whose principal component turned out to be neutral, to the independent variables of the first model, to check whether neutral sentiments have a significant effect on market share. The analysis found that each company has sentiments that significantly affect its market share, and that the influence of the sentiments differs between Model 1 and Model 2. This study confirms that sentiment, as information present in the data, can accompany changes in the automotive market based on its past values. Furthermore, when market data are available, making good use of automotive market information and the autocorrelation of sentiment is expected not only to contribute substantially to sentiment analysis research but also to benefit business performance in the real market in various ways.

List of Tables
List of Figures
Abstract
1. Introduction
1.1 Background
1.2 Necessity of Study
1.3 Purpose & Questions
1.4 Structure
2. Literature Reviews of VOC Analysis
2.1 Importance of VOC
2.2 Data Mining
2.2.1 Concept & Functionalities
2.2.2 Methodologies of Data Mining
2.3 Text Mining
2.4 Sentiment Analysis
2.5 Research Trend in Korea
3. Methodology
3.1 Research Flow
3.2 Proposed Methodologies
3.2.1 Sentiment Analysis
3.2.2 Auto-correlation Analysis
3.2.3 Principal Component Analysis (PCA)
3.2.4 Linear Regression
4. Experiment & Analysis
4.1 Phase I: Constructing Sentiment Lexicon & 7 Sentiment Domains
4.1.1 The Subject of Analysis & Crawling Data
4.1.2 Extracting POS Information
4.1.3 Review Extracting POS Information
4.2 Phase II: Reliability Analysis
4.2.1 Auto-correlation Analysis of Sentiment
4.2.2 Principal Component Analysis of Sentiment
4.3 Phase III: Influence on Automotive Market Share
4.3.1 Linear Regression Model
4.3.2 Definition of Variables
4.3.3 The Result of Linear Regression Analysis
5. Conclusion
5.1 Summary of Study
5.2 Managerial Implication and Limitation
5.3 Future Study
References
Docto
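Phase III compares two nested linear regression models. The sketch below, assuming monthly market-share observations and synthetic sentiment scores (both hypothetical; no thesis data is reproduced), fits Model 1 (Sadness, Anger, Fear, Delight, Satisfaction) and Model 2 (Model 1 plus Shame and Frustration) by ordinary least squares with NumPy and reports R² and adjusted R². It does not compute the significance tests the thesis relies on; the results object of `statsmodels`' `OLS` exposes p-values for that purpose.

```python
import numpy as np

MODEL1 = ["Sadness", "Anger", "Fear", "Delight", "Satisfaction"]
MODEL2 = MODEL1 + ["Shame", "Frustration"]  # adds the "neutral" domains


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2, adjusted R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])  # first coefficient is the intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    r2 = 1 - (resid @ resid) / (centered @ centered)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2


# Hypothetical setup: 36 monthly observations (2013-2015) of synthetic sentiment
# scores and one company's market share; none of these numbers come from the thesis.
rng = np.random.default_rng(42)
n_months = 36
data = {name: rng.normal(size=n_months) for name in MODEL2}
share = 10 + 0.5 * data["Delight"] - 0.4 * data["Anger"] + rng.normal(scale=0.3, size=n_months)

results = {}
for label, cols in [("Model 1", MODEL1), ("Model 2", MODEL2)]:
    X = np.column_stack([data[c] for c in cols])
    _, r2, adj = fit_ols(X, share)
    results[label] = (r2, adj)
    print(f"{label}: R^2={r2:.3f}, adjusted R^2={adj:.3f}")
```

Because Model 2 nests Model 1, its R² can never be lower; adjusted R² penalizes the two extra predictors, which is one rough way to judge whether the neutral sentiments add explanatory power.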

    Adaptive sentiment analysis

    Domain dependency is one of the most challenging problems in the field of sentiment analysis. Although most sentiment analysis methods perform decently when targeted at a specific domain and writing style, they usually do not work well on texts that originate outside their domain boundaries. Often there is a need to perform sentiment analysis in a domain where no labelled documents are available. To address this scenario, researchers have proposed many domain adaptation or unsupervised sentiment analysis methods. However, there is still much room for improvement, as those methods typically cannot match conventional supervised sentiment analysis methods. In this thesis, we propose a novel aspect-level sentiment analysis method that seamlessly integrates lexicon- and learning-based methods. While its performance is comparable to existing approaches, it is less sensitive to domain boundaries and can be applied to cross-domain sentiment analysis when the target domain is similar to the source domain. It also offers more structured and readable results by detecting individual topic aspects and determining their sentiment strengths. Furthermore, we investigate a novel approach to automatically constructing domain-specific sentiment lexicons based on distributed word representations (aka word embeddings). The induced lexicon is of a quality on a par with a handcrafted one and could be used directly in a lexicon-based algorithm for sentiment analysis, but we find that a two-stage bootstrapping strategy could further boost sentiment classification performance. Compared to existing methods, such an end-to-end, nearly unsupervised approach to domain-specific sentiment analysis works out of the box for any target domain, requires no handcrafted lexicon or labelled corpus, and achieves sentiment classification accuracy comparable to that of fully supervised approaches. Overall, the contribution of this Ph.D. work to the research field of sentiment analysis is twofold.
First, we develop a new sentiment analysis system that can, in a nearly unsupervised manner, adapt to the domain at hand and perform sentiment analysis with minimal loss of performance. Second, we showcase this system in several areas (including finance, politics, and e-business) and investigate in particular the temporal dynamics of sentiment in such contexts.
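The thesis induces a domain-specific lexicon from word embeddings and then applies bootstrapping; its exact procedure is not given in the abstract. One simple, commonly used variant scores each word by its mean cosine similarity to positive seed words minus its mean similarity to negative seed words. The sketch below assumes that variant, uses hypothetical two-dimensional toy vectors in place of trained embeddings, and omits the two-stage bootstrapping step entirely.

```python
import numpy as np


def induce_lexicon(embeddings, pos_seeds, neg_seeds):
    """Score = mean cosine similarity to positive seeds minus mean similarity to negative seeds."""
    words = list(embeddings)
    M = np.array([embeddings[w] for w in words], dtype=float)
    M /= np.linalg.norm(M, axis=1, keepdims=True)  # unit vectors, so dot product = cosine
    row = {w: i for i, w in enumerate(words)}
    pos = M[[row[w] for w in pos_seeds]]
    neg = M[[row[w] for w in neg_seeds]]
    scores = (M @ pos.T).mean(axis=1) - (M @ neg.T).mean(axis=1)
    return {w: float(s) for w, s in zip(words, scores)}


def classify(tokens, lexicon):
    """Lexicon-based polarity: sign of the summed scores of in-lexicon tokens."""
    total = sum(lexicon.get(t, 0.0) for t in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


# Hypothetical 2-D "embeddings"; real ones would come from a trained embedding model.
toy = {
    "good": [1.0, 0.1], "great": [0.9, 0.2],
    "bad": [-1.0, 0.1], "awful": [-0.9, 0.2],
    "bullish": [0.8, 0.5], "bearish": [-0.7, 0.6], "stock": [0.05, 1.0],
}
lexicon = induce_lexicon(toy, pos_seeds=["good", "great"], neg_seeds=["bad", "awful"])
print(classify(["stock", "looks", "bullish"], lexicon))  # positive
```

The appeal of this family of methods is that only a handful of seed words and an unlabelled in-domain corpus (to train the embeddings) are needed, which matches the "no handcrafted lexicon or labelled corpus" setting the thesis targets.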

    A survey on deep learning in image polarity detection: Balancing generalization performances and computational costs

    Deep convolutional neural networks (CNNs) provide an effective tool to extract complex information from images. In the area of image polarity detection, CNNs are customarily utilized in combination with transfer learning techniques to tackle a major problem: the unavailability of large sets of labeled data. Thus, polarity predictors in general exploit a pre-trained CNN as the feature extractor that in turn feeds a classification unit. While the latter unit is trained from scratch, the pre-trained CNN is subject to fine-tuning. As a result, the specific CNN architecture employed as the feature extractor strongly affects the overall performance of the model. This paper analyses state-of-the-art literature on image polarity detection and identifies the most reliable CNN architectures. Moreover, the paper provides an experimental protocol that should allow assessing the role played by the baseline architecture in the polarity detection task. Performance is evaluated in terms of both generalization ability and computational complexity. The latter attribute becomes critical because polarity predictors, in the era of social networks, might need to be updated within hours or even minutes. In this regard, the paper gives practical hints on the advantages and disadvantages of the examined architectures in terms of both generalization and computational cost.
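The survey weighs generalization against computational cost. A common back-of-the-envelope cost measure is the number of parameters and multiply-accumulate operations (MACs) per convolutional layer. The sketch below assumes square kernels, counts only convolutions (no pooling, activations, batch normalization, or classifier head), and uses the first layers of VGG-16 and ResNet-50 purely as worked examples; it is not the paper's evaluation protocol.

```python
def conv2d_cost(h, w, c_in, c_out, k, stride=1, padding=0, bias=True):
    """Output size, parameter count and multiply-accumulate (MAC) count of one 2-D conv
    layer with a square k x k kernel."""
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1
    params = c_out * c_in * k * k + (c_out if bias else 0)
    macs = h_out * w_out * c_out * c_in * k * k  # one MAC per kernel weight per output position
    return (h_out, w_out), params, macs


def stack_cost(h, w, c_in, layers):
    """Total parameters and MACs of a plain chain of conv layers, each given as
    (c_out, k, stride, padding); pooling and fully connected layers are ignored."""
    total_params = total_macs = 0
    for c_out, k, stride, padding in layers:
        (h, w), params, macs = conv2d_cost(h, w, c_in, c_out, k, stride, padding)
        total_params += params
        total_macs += macs
        c_in = c_out
    return total_params, total_macs


# Worked examples on a 224 x 224 RGB input.
print(conv2d_cost(224, 224, 3, 64, k=3, padding=1))                       # VGG-16 conv1_1: ((224, 224), 1792, 86704128)
print(conv2d_cost(224, 224, 3, 64, k=7, stride=2, padding=3, bias=False))  # ResNet-50 conv1: ((112, 112), 9408, 118013952)
print(stack_cost(224, 224, 3, [(64, 3, 1, 1), (64, 3, 1, 1)]))             # VGG-16 block 1: (38720, 1936392192)
```

Counts like these explain why a large backbone can be expensive to fine-tune repeatedly: the second 3 x 3 layer alone costs about 21 times the MACs of the first, because it sees 64 input channels instead of 3.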

    Sentiment Analysis in the Era of Large Language Models: A Reality Check

    Sentiment analysis (SA) has been a long-standing research area in natural language processing. It can offer rich insights into human sentiments and opinions and has thus seen considerable interest from both academia and industry. With the advent of large language models (LLMs) such as ChatGPT, there is great potential for their use on SA problems. However, the extent to which existing LLMs can be leveraged for different sentiment analysis tasks remains unclear. This paper aims to provide a comprehensive investigation into the capabilities of LLMs in performing various sentiment analysis tasks, from conventional sentiment classification to aspect-based sentiment analysis and multifaceted analysis of subjective texts. We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets. Our study reveals that while LLMs demonstrate satisfactory performance in simpler tasks, they lag behind in more complex tasks requiring deeper understanding or structured sentiment information. However, LLMs significantly outperform SLMs in few-shot learning settings, suggesting their potential when annotation resources are limited. We also highlight the limitations of current evaluation practices in assessing LLMs' SA abilities and propose a novel benchmark, SentiEval, for a more comprehensive and realistic evaluation. Data and code from our investigations are available at https://github.com/DAMO-NLP-SG/LLM-Sentiment
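The paper finds LLMs strongest relative to SLMs in few-shot settings. To illustrate what a few-shot sentiment prompt can look like, the sketch below assembles one from labeled examples and maps a model's completion back to a label. The prompt wording, the three-label set, and the helper names are assumptions; the sketch does not call any model API and does not reproduce the SentiEval format.

```python
LABELS = ("positive", "negative", "neutral")


def build_few_shot_prompt(examples, query, labels=LABELS):
    """Assemble a plain-text few-shot classification prompt from (text, label) pairs."""
    lines = [f"Classify the sentiment of each text as one of: {', '.join(labels)}.", ""]
    for text, label in examples:
        if label not in labels:
            raise ValueError(f"unknown label: {label}")
        lines += [f"Text: {text}", f"Sentiment: {label}", ""]
    lines += [f"Text: {query}", "Sentiment:"]  # the model is expected to complete this line
    return "\n".join(lines)


def parse_label(completion, labels=LABELS):
    """Return the allowed label the completion starts with, or None if it starts with none."""
    words = completion.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,:;!\"'")
    return first if first in labels else None


shots = [
    ("The battery life is fantastic.", "positive"),
    ("It stopped working after two days.", "negative"),
]
prompt = build_few_shot_prompt(shots, "Delivery was on time.")
print(prompt)
print(parse_label(" Positive."))  # positive
```

Parsing only the first word keeps the mapping strict: completions that do not start with an allowed label come back as None and can be counted as errors rather than silently coerced into a class.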

    Navigating the Environmental, Social, and Governance (ESG) landscape: constructing a robust and reliable scoring engine - insights into Data Source Selection, Indicator Determination, Weighting and Aggregation Techniques, and Validation Processes for Comprehensive ESG Scoring Systems

    This white paper explores the construction of a reliable Environmental, Social, and Governance (ESG) scoring engine, with a focus on the importance of data sources and quality, selection of ESG indicators, weighting and aggregation methodologies, and the necessary validation and benchmarking procedures. The current challenges in ESG scoring and the importance of a robust ESG scoring system are addressed, with emphasis on its increasing relevance to stakeholders. Furthermore, different data types, namely self-reported data, third-party data, and alternative data, are critically evaluated for their respective merits and limitations. The paper further elucidates the complexities and implications involved in the choice of ESG indicators, illustrating the trade-offs between standardized and customized approaches. Various weighting methodologies, including equal weighting, factor weighting, and multi-criteria decision analysis, are dissected. The paper culminates in outlining processes for validating the ESG scoring engine, emphasizing the correlation with financial performance, and conducting robustness and sensitivity analyses. Case studies provide practical examples of implementing the discussed techniques. The white paper aims to provide insights and guidelines for practitioners, academics, and policy makers in designing and implementing robust ESG scoring systems. This ESG white paper explores the interplay between Environmental, Social, and Governance (ESG) factors and green finance. We begin by defining ESG and green finance, exploring their evolution, and discussing their importance in financial markets. The paper emphasises the role of green finance in driving sustainable development. Next, we delve into the ESG scoring landscape. We outline various methodologies, key players in ESG ratings, and present challenges and criticisms of current ESG scoring systems. In the third section, we propose a blueprint for a reliable ESG scoring engine.
This includes discussion of various data sources and the selection of ESG indicators, highlighting the role of materiality assessment and the balance between standardized and customized indicators. We then discuss different methodologies for weighting and aggregating these indicators. The paper concludes with the necessity of validating and benchmarking ESG scores, particularly correlating them with financial performance and performing robustness and sensitivity analyses.
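Among the weighting schemes discussed are equal weighting as well as factor-based and multi-criteria weights. A minimal normalize-then-aggregate sketch follows, assuming min-max scaling to [0, 1], a direction flag for indicators where lower raw values are better, and a weighted arithmetic mean on a 0-100 scale. The three indicators, their values, and the custom weights are hypothetical; how the weights are derived (factor model, MCDA) is left outside the sketch.

```python
import numpy as np


def min_max(col, higher_is_better=True):
    """Scale one indicator column to [0, 1]; flip it when lower raw values are better."""
    col = np.asarray(col, dtype=float)
    lo, hi = col.min(), col.max()
    if hi == lo:
        return np.full_like(col, 0.5)  # no spread: every company gets the midpoint
    scaled = (col - lo) / (hi - lo)
    return scaled if higher_is_better else 1.0 - scaled


def esg_scores(X, higher_is_better, weights=None):
    """Normalize each indicator, then aggregate with a weighted arithmetic mean on a
    0-100 scale. weights=None means equal weighting."""
    X = np.asarray(X, dtype=float)
    Z = np.column_stack([min_max(X[:, j], d) for j, d in enumerate(higher_is_better)])
    w = np.ones(X.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    return 100 * Z @ (w / w.sum())


# Hypothetical indicators for three companies:
# carbon intensity (lower is better), board independence % (higher is better),
# employee turnover % (lower is better).
X = [[120, 60, 10],
     [80, 75, 15],
     [200, 50, 8]]
directions = [False, True, False]
print(np.round(esg_scores(X, directions), 2))                             # equal weights
print(np.round(esg_scores(X, directions, weights=[0.5, 0.25, 0.25]), 2))  # environment-heavy weights
```

Comparing the two printed rankings is a small-scale version of the sensitivity analysis the paper recommends: if the ordering of companies flips under plausible alternative weights, the score is fragile.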

    Real-Time Online Stock Forecasting Utilizing Integrated Quantitative and Qualitative Analysis

    The application of machine learning to finance has become a familiar approach, especially in stock market forecasting. The stock market is highly volatile, and huge amounts of data are generated globally every minute. Extracting effective intelligence from this data is of critical importance. However, combining numerical stock data with qualitative text data can be a challenging task. In this work, we accomplish this by providing an unprecedented, publicly available dataset with technical and fundamental data and sentiment that we gathered from news archives, TV news captions, radio transcripts, tweets, daily financial newspapers, etc. The text entries used for sentiment extraction total more than 1.4 million. The dataset consists of daily entries from January 2018 to December 2022 for eight companies representing diverse industrial sectors and for the Dow Jones Industrial Average (DJIA) as a whole. Holistic fundamental and technical data are provided in training-ready form for model learning and deployment. Most importantly, the data generated could be used for incremental online learning with real-time data points retrieved daily, since no static data was used. All the data was retrieved from APIs or self-designed robust information retrieval technologies with extremely low latency and zero monetary cost. These adaptable technologies facilitate data extraction for any stock. Moreover, applying Spearman's rank correlation to real-time data to link stock returns with sentiment analysis produced noteworthy results for the DJIA and the eight other stocks, achieving accuracy levels above 60%. The dataset is made available at https://github.com/batking24/Huge-Stock-Dataset
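Spearman's rank correlation, used in the paper to link returns and sentiment, is the Pearson correlation of the ranks, with tied values receiving the average of the ranks they span. The sketch below implements that definition in NumPy; the sentiment and return values are made up for illustration. In practice `scipy.stats.spearmanr` computes the same statistic and also returns a p-value.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1  # extend the run of tied values
        ranks[order[i : j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Made-up daily sentiment scores and same-day returns (%), for illustration only.
sentiment = [0.2, -0.1, 0.4, 0.0, 0.3]
returns = [0.5, -0.3, 0.9, 0.1, 0.2]
print(round(spearman(sentiment, returns), 3))  # 0.9
```

Working on ranks makes the measure insensitive to the very different scales of sentiment scores and returns and robust to occasional extreme return days.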

    Coping with low data availability for social media crisis message categorisation

    During crisis situations, social media allows people to quickly share information, including messages requesting help. This can be valuable to emergency responders, who need to categorise and prioritise these messages based on the type of assistance being requested. However, the high volume of messages makes it difficult to filter and prioritise them without the use of computational techniques. Fully supervised filtering techniques for crisis message categorisation typically require a large amount of annotated training data, but this can be difficult to obtain during an ongoing crisis and is expensive to create in terms of time and labour. This thesis focuses on addressing the challenge of low data availability when categorising crisis messages for emergency response. It first presents domain adaptation as a solution to this problem, which involves learning a categorisation model from annotated data from past crisis events (source domain) and adapting it to categorise messages from an ongoing crisis event (target domain). For many-to-many adaptation, where the model is trained on multiple past events and adapted to multiple ongoing events, a multi-task learning approach using pre-trained language models is proposed. This approach outperforms the baselines, and an ensemble approach further improves performance.
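The abstract reports that an ensemble further improves categorisation but does not say how the predictions are combined. One common scheme is soft voting: average the class probabilities produced by several models (optionally weighted) and pick the most probable class. The sketch assumes that scheme; the category names and probabilities are hypothetical.

```python
import numpy as np

CATEGORIES = ["request_help", "infrastructure_damage", "other"]  # hypothetical label set


def soft_vote(prob_matrices, weights=None):
    """Weighted average of per-model class probabilities, then argmax.
    prob_matrices has shape (n_models, n_messages, n_classes)."""
    P = np.asarray(prob_matrices, dtype=float)
    w = np.ones(P.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    avg = np.tensordot(w / w.sum(), P, axes=1)  # -> (n_messages, n_classes)
    return avg.argmax(axis=1), avg


# Hypothetical probabilities from two models for two messages.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
pred, avg = soft_vote([model_a, model_b])
print([CATEGORIES[i] for i in pred])  # ['infrastructure_damage', 'other']
```

Averaging probabilities rather than hard votes lets a confident model outweigh an uncertain one, which matters when, as here, only two models disagree.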

    Clinical Document Classification Using Labeled and Unlabeled Data Across Hospitals

    Reviewing radiology reports in emergency departments is an essential but laborious task. Timely follow-up of patients with abnormal findings in their radiology reports may dramatically affect patient outcomes, especially if they have been discharged with a different initial diagnosis. Machine learning approaches have been devised to expedite the process and detect the cases that demand immediate follow-up. However, these approaches require a large amount of labeled data to train reliable predictive models. Preparing such a large dataset, which must be manually annotated by health professionals, is costly and time-consuming. This paper investigates a semi-supervised learning framework for radiology report classification across three hospitals. The main goal is to leverage unlabeled clinical data to augment the learning process when only limited labeled data is available. To further improve classification performance, we also integrate a transfer learning technique into the semi-supervised learning pipeline. Our experimental findings show that (1) convolutional neural networks (CNNs), while being independent of any problem-specific feature engineering, achieve significantly higher effectiveness than conventional supervised learning approaches, (2) leveraging unlabeled data in training a CNN-based classifier reduces the dependency on labeled data by more than 50% while reaching the same performance as a fully supervised CNN, and (3) transferring the knowledge gained from available labeled data in an external source hospital significantly improves the performance of a semi-supervised CNN model over its fully supervised counterpart in a target hospital.
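To show the general idea of learning from unlabeled data without a deep-learning dependency, the sketch below uses self-training (pseudo-labeling) with a nearest-centroid classifier standing in for the paper's CNN: unlabeled points predicted with high confidence are added to the training set and the model is refit. The classifier, the 0.9 confidence threshold, and the two-cluster synthetic data are all assumptions, not the paper's framework.

```python
import numpy as np


def fit_centroids(X, y, n_classes):
    """Class means; every class must have at least one (pseudo-)labeled example."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])


def predict_proba(centroids, X):
    """Softmax over negative squared Euclidean distances to each class centroid."""
    logits = -((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def self_train(X_lab, y_lab, X_unlab, n_classes, threshold=0.9, max_rounds=5):
    """Pseudo-labeling: repeatedly add confidently predicted unlabeled points and refit."""
    X, y, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(max_rounds):
        if len(pool) == 0:
            break
        proba = predict_proba(fit_centroids(X, y, n_classes), pool)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        X = np.vstack([X, pool[confident]])
        y = np.concatenate([y, proba[confident].argmax(axis=1)])
        pool = pool[~confident]
    return fit_centroids(X, y, n_classes)


# Synthetic two-class data: 4 labeled points, 200 unlabeled, 200 held out for testing.
rng = np.random.default_rng(1)
X_lab = np.array([[-3.0, 0.0], [-2.5, 0.5], [3.0, 0.0], [2.5, -0.5]])
y_lab = np.array([0, 0, 1, 1])
X_unlab = np.vstack([rng.normal([-3, 0], 1, (100, 2)), rng.normal([3, 0], 1, (100, 2))])
X_test = np.vstack([rng.normal([-3, 0], 1, (100, 2)), rng.normal([3, 0], 1, (100, 2))])
y_test = np.repeat([0, 1], 100)

centroids = self_train(X_lab, y_lab, X_unlab, n_classes=2)
accuracy = (predict_proba(centroids, X_test).argmax(axis=1) == y_test).mean()
print(f"test accuracy: {accuracy:.3f}")
```

The confidence threshold is the key design knob: set too low, early mistakes get baked into the training set as pseudo-labels; set too high, little unlabeled data is ever used.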