3,017 research outputs found

    A Survey on Awesome Korean NLP Datasets

    English-based datasets are commonly available from Kaggle, GitHub, or recently published papers. Although benchmark tests with English datasets are sufficient to show off the performance of new models and methods, researchers still need to train and validate models on Korean-based datasets to produce a technology or product suitable for Korean processing. This paper introduces 15 popular Korean-based NLP datasets with summarized details such as volume, license, repositories, and other research results inspired by the datasets. I also provide detailed instructions with samples or statistics of the datasets. The main characteristics of the datasets are presented in a single table to give researchers a rapid summary. Comment: 11 pages, 1 horizontal page for large tabl

    The Effects of International Political Conflict on the Economies of North and South Korea: An Analysis of the Stock Market and Trade

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Social Sciences, Department of Economics, 2020. 2. Kim Byung-Yeon.
    This dissertation investigates the economic impacts of international conflict, focusing on the case of the two Koreas. Specifically, it examines the effects of North Korea-related risks on the South Korean stock market and of economic sanctions on North Korea's foreign trade. It consists of three empirical studies, each covering a separate subtopic. The first chapter analyzes how South Korean stock returns respond to North Korea-related risk. To do this, a monthly index of geopolitical risk from North Korea is constructed using a database of South Korean media coverage. The index is based on the frequency of articles containing keywords that are likely to appear in the media when inter-Korean tensions escalate or ease. Analysis of media coverage from 1999 to 2018 shows that the geopolitical risk index rises sharply when nuclear tests, missile launches, and military confrontations occur, and falls significantly around the times of summit meetings and multilateral talks. The regression analysis finds that geopolitical risk related to North Korea has more negative effects on the stock returns of firms with a higher share of domestic investors, larger assets, and a higher proportion of fixed assets. Stock prices of companies involved in inter-Korean economic cooperation also respond more sensitively to North Korea risk. Chapter Ⅱ explores the impact of economic sanctions on North Korea's foreign trade, focusing on the quality of trade. It decomposes trade between North Korea and China over 1998-2018 into the extensive margin, relative unit price, and quantity. It then estimates sanction-induced changes in the first two of these elements for North Korea's exports to China. The decomposition results show that the growth of North Korea's exports to China is mostly attributable to growth in quantity rather than quality. In the regression analysis, sanctions imposed by South Korea, Japan, and the United Nations Security Council (UNSC) are used as key treatments, and the estimation uses a dynamic panel model that includes the lagged dependent variable. The UN sanctions of 2017 are found to have reduced the extensive margin of North Korean exports, and the Japanese sanctions of 2003 to have lowered the relative prices of North Korean products in the Chinese import market. The price effects of sanctions are found to be associated with China's bargaining power over North Korea. The findings suggest that trade sanctions against North Korea have created implicit costs by preventing North Korea from trading with alternative partners and increasing its reliance on China. The last chapter estimates the size of the transit trade between North Korea and China used to circumvent the sanctions imposed by South Korea. It exploits firm-product level variation in Chinese trade data to present micro evidence of sanction-bypassing trade. Specifically, transit trade is identified only when a firm imports a product from North Korea and exports the same product to South Korea in the same period. The difference-in-difference estimates show that indirect exports from North Korea to South Korea via China increased significantly after the 5.24 measures in 2010. The increase in North Korea's indirect exports of apparel, in particular, accounts for 25% of the decrease in North Korea's direct exports to South Korea.
    Contents:
    Introduction
    Chapter Ⅰ. Geopolitical Risk from North Korea and Stock Market Reaction: 1. Introduction; 2. Related Literature (2.1. News-based Uncertainty Index; 2.2. The Effects of Geopolitical Risk from North Korea); 3. Measuring Geopolitical Risk from North Korea (3.1. Definition and Scope of Geopolitical Risk; 3.2. Data and Methodology; 3.3. Evaluating the GPRNK Index); 4. Geopolitical Risk and Firm-level Stock Returns (4.1. Empirical Framework; 4.2. Data and Descriptive Statistics; 4.3. Baseline Results; 4.4. Robustness Check); 5. Conclusion
    Chapter Ⅱ. Decomposing North Korea's Trade with China and Revisiting Sanction Effects: 1. Introduction; 2. Decomposing North Korea's Trade with China (2.1. Data; 2.2. Methodology; 2.3. Decomposition Results); 3. Panel Regression Analysis (3.1. Empirical Framework; 3.2. Baseline Results; 3.3. Possible Channels; 3.4. Robustness Check); 4. Conclusion
    Chapter Ⅲ. The Role of Chinese firms in Bypassing Sanctions on North Korea: 1. Introduction; 2. Data (2.1. Chinese Custom Trade Data; 2.2. Stylized Facts); 3. Empirical Strategy; 4. Regression Results (4.1. North Korea's Indirect Exports via China; 4.2. The Effects of Sanctions on the Indirect Exports); 5. Conclusion
    Concluding Remarks
    References
    Appendices: A1. Supplementary Materials for Chapter 1; A2. Supplementary Materials for Chapter Ⅱ; A3. Supplementary Materials for Chapter Ⅲ
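    The abstract above describes a monthly risk index built from the frequency of news articles containing tension-related keywords. A minimal sketch of that idea follows, assuming articles arrive as (month, text) pairs. The function name, the sample keywords, and the rescaling to a mean of 100 are illustrative assumptions, not the dissertation's actual procedure.

```python
from collections import defaultdict


def monthly_keyword_index(articles, keywords):
    """Return {month: index}, where the index is the share of that month's
    articles mentioning at least one keyword, rescaled so that the average
    month equals 100 (the rescaling is an assumption of this sketch)."""
    totals = defaultdict(int)  # articles seen per month
    hits = defaultdict(int)    # articles containing any keyword per month
    lowered_keywords = [k.lower() for k in keywords]

    for month, text in articles:
        totals[month] += 1
        lowered_text = text.lower()
        if any(k in lowered_text for k in lowered_keywords):
            hits[month] += 1

    shares = {m: hits[m] / totals[m] for m in totals}
    mean_share = sum(shares.values()) / len(shares)
    if mean_share == 0:
        return {m: 0.0 for m in sorted(shares)}
    return {m: 100.0 * shares[m] / mean_share for m in sorted(shares)}


# Hypothetical sample data, for illustration only.
sample_articles = [
    ("2017-08", "Missile launch raises tension"),
    ("2017-08", "Stock market update"),
    ("2018-04", "Summit meeting held"),
    ("2018-04", "Weather report"),
]
sample_index = monthly_keyword_index(sample_articles, ["missile", "nuclear"])
```

    Normalizing by the number of articles per month keeps the index from rising just because more news was published in that month.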

    Understanding Social Media through Large Volume Measurements

    The amount of user-generated web content has grown drastically in the past 15 years, and many social media services are now exceedingly popular. In this thesis we study social media content creation and consumption through large volume measurements of three prominent social media services, namely Twitter, YouTube, and Wikipedia. Common to the services is that they have millions of users, they are free to use, and their users can both create and consume content. The motivation behind this thesis is to examine how users create and consume social media content, to investigate why social media services are as popular as they are and what drives people to contribute to them, and to see whether it is possible to model the conduct of the users. We study how various aspects of social media content, such as its creation, consumption, and popularity, can be measured, characterized, and linked to real-world occurrences. We have gathered more than 20 million tweets, metadata of more than 10 million YouTube videos, and a complete six-year page view history of 19 different Wikipedia language editions. We show, for example, daily and hourly patterns of content creation and consumption, content popularity distributions, characteristics of popular content, and user statistics. We also compare social media with traditional news services and show the interaction among social media, news, and stock prices. In addition, we combine natural language processing with social media analysis and discover interesting correlations between news and social media content. Moreover, we discuss the importance of correct measurement methods and show the effects of different sampling methods using YouTube measurements as an example.
    Finnish abstract, translated: The popularity of social media and the amount of content created by its users have grown enormously over the past 15 years, and services such as Facebook, Instagram, Twitter, YouTube, and Wikipedia are extremely popular. This dissertation examines the creation and consumption patterns of social media content through large-volume measurement data from Twitter, YouTube, and Wikipedia. What these three services have in common is, among other things, that they have millions of users, they are free to use, and users can both create and consume content. The measurement data comprises more than 20 million tweets, metadata on more than ten million YouTube videos, and complete article page-view records covering six years for 19 Wikipedia language editions. The aim of the study is to examine how users create and consume content and to find related patterns that can be exploited in information sharing, replication, and storage. The study thus seeks to determine why social media services are as popular as they are, what leads users to produce content for them, and whether use of the services can be modeled and predicted. The dissertation also compares the creation and consumption patterns of social media and traditional news services, and shows how social media content, news, and stock prices interact. It also discusses the choice and use of the correct measurement method and uses YouTube measurement data to show how different measurement methods affect the results.

    Dirichlet belief networks for topic structure learning

    Recently, considerable research effort has been devoted to developing deep architectures for topic models to learn topic structures. Although several deep models have been proposed to learn better topic proportions of documents, how to leverage the benefits of deep structures for learning word distributions of topics has not yet been rigorously studied. Here we propose a new multi-layer generative process on word distributions of topics, where each layer consists of a set of topics and each topic is drawn from a mixture of the topics of the layer above. As the topics in all layers can be directly interpreted by words, the proposed model is able to discover interpretable topic hierarchies. As a self-contained module, our model can be flexibly adapted to different kinds of topic models to improve their modelling accuracy and interpretability. Extensive experiments on text corpora demonstrate the advantages of the proposed model. Comment: accepted in NIPS 201
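    The multi-layer generative process in the abstract above, where each topic is drawn from a mixture of the topics in the layer above, can be illustrated with a small forward-sampling sketch. This is a simplified illustration, not the paper's model or inference procedure. The symmetric prior `eta`, the Dirichlet-distributed mixing weights, the `concentration` scale, and the numerical floor are all assumptions made here for the sake of a runnable example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_topic_hierarchy(vocab_size, layer_sizes, eta=0.5,
                           concentration=50.0, seed=0):
    """Forward-sample word distributions for a stack of topic layers.

    layer_sizes[0] is the top layer. Every topic below the top layer is
    drawn from a Dirichlet centred on a random mixture of its parent topics.
    """
    rng = random.Random(seed)
    # Top layer: topics drawn from a symmetric Dirichlet prior (assumed).
    layers = [[sample_dirichlet([eta] * vocab_size, rng)
               for _ in range(layer_sizes[0])]]

    for size in layer_sizes[1:]:
        parents = layers[-1]
        layer = []
        for _ in range(size):
            # Mixing weights over the parent topics (assumed Dirichlet).
            weights = sample_dirichlet([1.0] * len(parents), rng)
            mixture = [sum(w * p[v] for w, p in zip(weights, parents))
                       for v in range(vocab_size)]
            # Small floor keeps every Dirichlet parameter strictly positive.
            alpha = [max(concentration * m, 1e-3) for m in mixture]
            layer.append(sample_dirichlet(alpha, rng))
        layers.append(layer)
    return layers


hierarchy = sample_topic_hierarchy(vocab_size=6, layer_sizes=[2, 3, 4], seed=1)
```

    Because every topic in every layer is a distribution over the same vocabulary, each one can be read off directly as a ranked word list, which is what makes the resulting hierarchy interpretable.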

    Impact of online public opinion regarding the Japanese nuclear wastewater incident on stock market based on the SOR model

    The exposure of the Japanese nuclear wastewater incident has shaped online public opinion and has also affected stocks in the aquaculture and feed industries. To explore the impact of online public opinion caused by public emergencies on relevant stocks, this paper uses the stimulus-organism-response (SOR) model to construct a framework for the path by which online public opinion affects the financial stock market, and it uses sentiment analysis, LDA, and grounded theory methods to conduct empirical analysis. The study draws a new conclusion about the impact of online public opinion on the performance of relevant stocks in the context of the Japanese nuclear wastewater incident. A positive change in media sentiment leads to a decline in stock returns and an increase in volatility. A positive change in public sentiment leads to a decline in stock returns in the current period and an increase in stock returns in the lag period. We also show that media attention, public opinion theme, and prospect theory value have certain influences on stock performance in the context of the Japanese nuclear wastewater incident. The conclusion shows that, after a public emergency, the government and investors need to pay attention to the changes in online public opinion caused by the event in order to avoid possible stock market risks.

    Semantic Knowledge Graphs for the News: A Review

    ICT platforms for news production, distribution, and consumption must exploit the ever-growing availability of digital data. These data originate from different sources and in different formats; they arrive at different velocities and in different volumes. Semantic knowledge graphs (KGs) are an established technique for integrating such heterogeneous information. They are therefore well aligned with the needs of news producers and distributors, and they are likely to become increasingly important for the news industry. This article reviews the research on using semantic knowledge graphs for production, distribution, and consumption of news. The purpose is to present an overview of the field; to investigate what it means; and to suggest opportunities and needs for further research and development.

    Journal of Asian Finance, Economics and Business, v. 4, no. 1


    Deep Learning-based Information Fusion Frameworks for Stock Price Movement Prediction

    The challenges of modeling the behaviour of financial markets, such as their high volatility, poor predictability, and non-stationary nature, have continuously attracted researchers to employ advanced engineering methods. Within the context of financial econometrics, stock market movement prediction is a key and challenging problem. The research reported in this thesis is motivated by the potential of Artificial Intelligence (AI) and Machine Learning (ML)-based models, especially Deep Neural Network (DNN) architectures, for stock movement prediction. Given recent progress in the design and implementation of advanced DNN-based models, there has been a surge of interest in their application to predicting stock trends. In particular, the focus of the thesis is on using information fusion to combine Twitter data with extended-horizon historical market data for the task of price movement prediction. In this regard, the thesis makes a number of contributions. First, the Noisy Deep Stock Movement Prediction Fusion (ND-SMPF) framework is proposed to extract news-level temporal information; identify the relevant words with the highest correlation with and effect on stock trends; and perform information fusion with historical price data. A real dataset is used to evaluate the performance of the proposed ND-SMPF framework. In addition, given that the recent COVID-19 pandemic has negatively affected financial econometrics and stock markets across the globe, a unique COVID-19 related PRIce MOvement prediction dataset is constructed. The objective is to incorporate the effects of social media trends related to COVID-19 on stock market price movements. A novel hybrid and parallel DNN-based framework is then designed that integrates different and diversified learning architectures. Referred to as the COVID-19 adopted Hybrid and Parallel deep fusion framework for Stock price Movement Prediction, it uses innovative fusion strategies to combine scattered social media news related to COVID-19 with historical market data and perform accurate price movement prediction during a pandemic crisis.