438 research outputs found

    A Study on Extracting Clinical Information from Unstructured Text for Pharmacovigilance

    Thesis (Ph.D.) -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Applied Bioengineering, February 2023. 이형기.
    Pharmacovigilance is a scientific activity to detect, evaluate and understand the occurrence of adverse drug events or other problems related to drug safety. However, concerns have been raised over the quality of drug safety information for pharmacovigilance, and there is also a need to secure new data sources for acquiring drug safety information. Meanwhile, the rise of pre-trained language models based on the transformer architecture has accelerated the application of natural language processing (NLP) techniques in diverse domains. In this context, I define two problems in pharmacovigilance as NLP tasks and provide baseline models for them: 1) extracting comprehensive drug safety information from adverse drug event narratives reported through a spontaneous reporting system (SRS) and 2) extracting drug-food interaction information from abstracts of biomedical articles. I developed annotation guidelines and performed manual annotation, demonstrating that strong NLP models can be trained to extract clinical information from unstructured free text by fine-tuning transformer-based language models on a high-quality annotated corpus. Finally, I discuss issues to consider when developing annotation guidelines for extracting clinical information related to pharmacovigilance. The annotated corpora and the NLP models in this dissertation can streamline pharmacovigilance activities by enhancing the data quality of reported drug safety information and expanding the data sources.
    Contents:
    Chapter 1: 1.1 Contributions of this dissertation; 1.2 Overview of this dissertation; 1.3 Other works
    Chapter 2: 2.1 Pharmacovigilance; 2.2 Biomedical NLP for pharmacovigilance; 2.2.1 Pre-trained language models; 2.2.2 Corpora to extract clinical information for pharmacovigilance
    Chapter 3: 3.1 Motivation; 3.2 Proposed Methods; 3.2.1 Data source and text corpus; 3.2.2 Annotation of ADE narratives; 3.2.3 Quality control of annotation; 3.2.4 Pretraining KAERS-BERT; 3.2.6 Named entity recognition; 3.2.7 Entity label classification and sentence extraction; 3.2.8 Relation extraction; 3.2.9 Model evaluation; 3.2.10 Ablation experiment; 3.3 Results; 3.3.1 Annotated ICSRs; 3.3.2 Corpus statistics; 3.3.3 Performance of NLP models to extract drug safety information; 3.3.4 Ablation experiment; 3.4 Discussion; 3.5 Conclusion
    Chapter 4: 4.1 Motivation; 4.2 Proposed Methods; 4.2.1 Data source; 4.2.2 Annotation; 4.2.3 Quality control of annotation; 4.2.4 Baseline model development; 4.3 Results; 4.3.1 Corpus statistics; 4.3.2 Annotation Quality; 4.3.3 Performance of baseline models; 4.3.4 Qualitative error analysis; 4.4 Discussion; 4.5 Conclusion
    Chapter 5: 5.1 Issues around defining a word entity; 5.2 Issues around defining a relation between word entities; 5.3 Issues around defining entity labels; 5.4 Issues around selecting and preprocessing annotated documents
    Chapter 6: 6.1 Dissertation summary; 6.2 Limitation and future works; 6.2.1 Development of end-to-end information extraction models from free-texts to database based on existing structured information; 6.2.2 Application of in-context learning framework in clinical information extraction
    Chapter 7: 7.1 Annotation Guideline for "Extraction of Comprehensive Drug Safety Information from Adverse Event Narratives Reported through Spontaneous Reporting System"; 7.2 Annotation Guideline for "Extraction of Drug-Food Interactions from the Abstracts of Biomedical Articles"

    Visual Question Answering: A SURVEY

    Visual Question Answering (VQA) is an emerging field in computer vision and natural language processing that aims to enable machines to understand the content of images and answer natural language questions about them. Recently, there has been increasing interest in integrating Semantic Web technologies into VQA systems to enhance their performance and scalability. In this context, knowledge graphs, which represent structured knowledge in the form of entities and their relationships, have shown great potential in providing rich semantic information for VQA. This paper provides an overview of the state-of-the-art research on VQA using Semantic Web technologies, including knowledge graph based VQA, medical VQA with semantic segmentation, and multi-modal fusion with recurrent neural networks. The paper also highlights the challenges and future directions in this area, such as improving the accuracy of knowledge graph based VQA, addressing the semantic gap between image content and natural language, and designing more effective multimodal fusion strategies. Overall, this paper emphasizes the importance and potential of using Semantic Web technologies in VQA and encourages further research in this area.

    Methods for Capturing Important Tokens and Designing Sequence Encoders for Token-Level Classification Models

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2022. 정교민.
    With the development of the internet, a great volume of data has accumulated over time. Therefore, dealing with long sequential data can become a core problem in web services. For example, streaming services such as YouTube, Netflix and TikTok have used the user's viewing history sequence to recommend videos that users may like. Such systems treat each video a user has viewed as an item or token and predict which item or token will be viewed next. These tasks have been defined as Token-Level Classification (TLC) tasks. Given a sequence of tokens, TLC identifies the labels of tokens in the required portion of the sequence. As mentioned above, TLC can be applied to various recommendation systems. In addition, most Natural Language Processing (NLP) tasks can also be formulated as TLC problems. For example, a sentence and each word within it can be expressed as a token-level sequence. In particular, information extraction can be recast as a TLC task that distinguishes whether a specific word span in the sentence is information. TLC datasets are characteristically very sparse and long. Therefore, it is very important to extract only the important information from the sequences and encode it properly. In this thesis, we propose methods to address two research questions of TLC in recommendation systems and information extraction: 1) how to capture important tokens from the token sequence and 2) how to encode a token sequence into a model. As deep neural networks (DNNs) have shown outstanding performance in various web application tasks, we design RNN- and Transformer-based models for recommendation systems and information extraction. In this dissertation, we propose novel models that can extract important tokens for recommendation systems and information extraction systems. In recommendation systems, we design a BART-based system that can capture important portions of the token sequence through self-attention mechanisms and consider both bidirectional and left-to-right directional information. In information extraction systems, we present relation network-based models that focus on important parts such as the opinion target and neighboring words.
    Contents:
    1. Introduction
    2. Token-level Classification in Recommendation Systems: 2.1 Overview; 2.2 Hierarchical RNN-based Recommendation Systems; 2.3 Entangled Bidirectional Encoder to Auto-regressive Decoder for Sequential Recommendation
    3. Token-level Classification in Information Extraction: 3.1 Overview; 3.2 RABERT: Relation-Aware BERT for Target-Oriented Opinion Words Extraction; 3.3 Gated Relational Target-aware Encoder and Local Context-aware Decoder for Target-oriented Opinion Words Extraction
    4. Conclusion

    Modeling Changing Stock Relations for Web Search Volume-Based Stock Price Movement Prediction

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Interdisciplinary Program in Artificial Intelligence, February 2023. 강유.
    Given historical stock prices and web search volumes of selected keywords, how can we accurately predict stock price movements? Stock price movement prediction is an attractive task for its applicability in real-world investments. Even a slight improvement in performance can lead to enormous profit. However, the task is extremely challenging due to the inherently volatile and random nature of the stock market. To overcome such difficulties, many researchers have tried to utilize relationships between stocks to make predictions. Despite the effort, previous works have failed to incorporate the dynamic characteristic of stock relationships, as they relied heavily on predefined concepts to find stock correlations. However, correlations between stocks change over time and are not dependent on a single criterion. In this paper, we propose GFS (Graph-based Framework using changing relations for Stock price movement prediction), a novel framework for stock price movement prediction using web search volumes to capture the changing relations between stocks. GFS combines relationship information from stationary connections based on predefined concepts with variable connections made from the correlations of each stock's web search volumes collected using tickers. In addition, because stock prices are affected by global trends, we collect web search volumes of 5 keywords that best represent a common denominator of the target stocks. Experimental results on a 1-year dataset of semiconductor stocks listed in the U.S. stock market show that our model achieves higher accuracy than its baselines.
    Contents:
    I. Introduction
    II. Related Work: 2.1 Individual Stock Price Prediction; 2.2 Correlated Stock Price Prediction
    III. Proposed Method: 3.1 Overview; 3.2 Attentive Feature Extraction; 3.3 Utilization of Stationary and Trend Graphs; 3.4 Keyword-based global trend extraction; 3.5 Stock Price Movement Prediction
    IV. Experiment: 4.1 Experiment Settings; 4.2 Classification Performance; 4.3 Ablation Study
    V. Conclusion
    References
    Abstract in Korean
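
The idea of "variable connections made from the correlations of each stock's web search volumes" can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the tickers, the search-volume numbers, the function names, and the use of a Pearson-correlation threshold are hypothetical, and GFS itself may build its graph differently.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def correlation_edges(
    volumes: Dict[str, List[float]], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return (ticker_a, ticker_b, r) for every pair with |r| >= threshold.

    The threshold value is an illustrative assumption, not taken from the thesis.
    """
    tickers = sorted(volumes)
    edges = []
    for i, a in enumerate(tickers):
        for b in tickers[i + 1:]:
            r = pearson(volumes[a], volumes[b])
            if abs(r) >= threshold:
                edges.append((a, b, r))
    return edges


# Hypothetical weekly search volumes for three made-up tickers.
search_volumes = {
    "AAA": [1, 2, 3, 4, 5],
    "BBB": [2, 4, 6, 8, 10],
    "CCC": [5, 1, 4, 2, 3],
}
print(correlation_edges(search_volumes))
```

Recomputing the edge list on each new window of search volumes would give a graph that changes over time, in contrast to a fixed graph built from predefined concepts.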

    Event Extraction: A Survey

    Extracting reported events from text is one of the key research themes in natural language processing. This process includes several tasks such as event detection, argument extraction, and role labeling. As one of the most important topics in natural language processing and natural language understanding, event extraction has applications that span a wide range of domains such as newswire, the biomedical domain, history and the humanities, and cyber security. This report presents a comprehensive survey of event detection from textual documents. In this report, we provide the task definition, the evaluation method, as well as the benchmark datasets and a taxonomy of methodologies for event extraction. We also present our vision of future research directions in event detection. Comment: 20 pages

    Natural Language Processing: Emerging Neural Approaches and Applications

    This Special Issue highlights the most recent research being carried out in the NLP field and discusses related open issues, with a particular focus on emerging approaches for language learning, understanding, production, and grounding, interactively or autonomously from data, in cognitive and neural systems, as well as on their potential or real applications in different domains.

    Named Entity Recognition in Electronic Health Records: A Methodological Review

    Objectives: A substantial portion of the data contained in Electronic Health Records (EHR) is unstructured, often appearing as free text. This format restricts its potential utility in clinical decision-making. Named entity recognition (NER) methods address the challenge of extracting pertinent information from unstructured text. The aim of this study was to outline the current NER methods and trace their evolution from 2011 to 2022. Methods: We conducted a methodological literature review of NER methods, with a focus on distinguishing the classification models, the types of tagging systems, and the languages employed in various corpora. Results: Several methods have been documented for automatically extracting relevant information from EHRs using natural language processing techniques such as NER and relation extraction (RE). These methods can automatically extract concepts, events, attributes, and other data, as well as the relationships between them. Most NER studies conducted thus far have utilized corpora in English or Chinese. Additionally, the bidirectional encoder representations from transformers (BERT) architecture combined with the BIO tagging system is the most frequently reported classification scheme. We found only a limited number of papers on the implementation of NER or RE tasks in EHRs within a specific clinical domain. Conclusions: EHRs play a pivotal role in gathering clinical information and could serve as the primary source for automated clinical decision support systems. However, the creation of new corpora from EHRs in specific clinical domains is essential to facilitate the swift development of NER and RE models applied to EHRs for use in clinical practice.
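
As a rough illustration of the BIO tagging system mentioned in the abstract above, the sketch below groups per-token tags into entity spans (B- opens an entity, I- continues it, O is outside any entity). The function name, the example sentence, and the DRUG/DISEASE labels are hypothetical and are not taken from the reviewed studies.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, label) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def close_span() -> None:
        # Emit the entity collected so far, if any, and reset the buffer.
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            spans.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            close_span()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_label:
            # An I- tag extends the open entity only if the label matches.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag ends the open entity; the token is skipped.
            close_span()
    close_span()
    return spans


# Hypothetical example sentence and tags.
tokens = ["Patient", "received", "metformin", "for", "type", "2", "diabetes", "."]
tags = ["O", "O", "B-DRUG", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "O"]
print(bio_to_spans(tokens, tags))
```

In a BERT-style pipeline the tags would come from a token-classification head; here they are written by hand so the decoding step can be read on its own.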

    A Semantic Information Management Approach for Improving Bridge Maintenance based on Advanced Constraint Management

    Bridge rehabilitation projects are important for transportation infrastructure. This research proposes a novel information management approach based on state-of-the-art deep learning models and ontologies. The approach can automatically extract, integrate, complete, and search for project knowledge buried in unstructured text documents. On the one hand, the approach facilitates the implementation of modern management approaches, i.e., advanced work packaging, to deliver successful bridge rehabilitation projects; on the other hand, it improves information management practices in the construction industry.